CN112861474B - Information labeling method, device, equipment and computer readable storage medium

Information labeling method, device, equipment and computer readable storage medium

Info

Publication number
CN112861474B
Authority
CN
China
Prior art keywords
sample
ith
anchor
positive
region
Prior art date
Legal status
Active
Application number
CN202110439827.2A
Other languages
Chinese (zh)
Other versions
CN112861474A (en)
Inventor
田上萱
蔡成飞
赵文哲
孔伟杰
王红法
刘威
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110439827.2A priority Critical patent/CN112861474B/en
Publication of CN112861474A publication Critical patent/CN112861474A/en
Application granted granted Critical
Publication of CN112861474B publication Critical patent/CN112861474B/en

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F40/00 Handling natural language data
                    • G06F40/10 Text processing
                        • G06F40/103 Formatting, i.e. changing of presentation of documents
                        • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
                        • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
                    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
                        • G06F16/53 Querying
                            • G06F16/532 Query formulation, e.g. graphical querying
                        • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                            • G06F16/5866 Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
                • G06F18/00 Pattern recognition
                    • G06F18/20 Analysing
                        • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06F18/22 Matching criteria, e.g. proximity measures
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00 Arrangements for image or video recognition or understanding
                    • G06V10/40 Extraction of image or video features
                        • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an information labeling method, apparatus, device, and computer-readable storage medium. The method relates to cloud technology and artificial intelligence technology, and includes: training the (i-1)-th metric model based on the i-th positive sample, the i-th anchor sample, and the i-th negative sample of an image data set to obtain the i-th metric model; comparing the i-th positive sample with the i-th anchor sample based on the i-th metric model; based on the comparison result, performing subject cropping on the i-th positive sample to obtain the (i+1)-th positive sample, and performing subject cropping on the i-th anchor sample to obtain the (i+1)-th anchor sample; and continuing to train the i-th metric model based on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample, and performing subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample based on the trained i-th metric model, until the subject labeling regions of the image data set are obtained when a cutoff condition is met. Through the present application, labeling efficiency in the field of subject detection can be improved.

Description

Information labeling method, device, equipment and computer readable storage medium
Technical Field
The present application relates to image processing technology in the field of big data, and in particular to an information labeling method, apparatus, device, and computer-readable storage medium.
Background
Subject detection is the process of determining a subject region (a salient region, i.e., the region where the subject is located) from an image to be detected. Subject detection provides data support for image processing and for downstream applications based on images; for example, image-based advertisement delivery, target localization based on surveillance images, and search-by-image are all built on subject detection.
Generally, to implement subject detection, subject boxes are first annotated, a subject detection model is trained on the annotated subject boxes, and the subject region of an image to be detected is then determined by the subject detection model. However, the subject boxes are annotated manually, so labeling efficiency is low.
Disclosure of Invention
The embodiments of the present application provide an information labeling method, apparatus, device, and computer-readable storage medium that can improve labeling efficiency.
The technical solutions of the embodiments of the present application are realized as follows:
the embodiment of the application provides an information labeling method, which comprises the following steps:
training the (i-1)-th metric model based on the i-th positive sample, the i-th anchor sample, and the i-th negative sample of an image data set to obtain the i-th metric model, where i is a positive integer;
comparing the i-th positive sample with the i-th anchor sample based on the i-th metric model;
based on the comparison result, performing subject cropping on the i-th positive sample to obtain the (i+1)-th positive sample, and performing subject cropping on the i-th anchor sample to obtain the (i+1)-th anchor sample;
continuing to train the i-th metric model based on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample, and performing subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample based on the trained i-th metric model, until the n-th positive sample and the n-th anchor sample are obtained when a cutoff condition is met, where n is an integer greater than i;
and determining the n-th positive sample and the n-th anchor sample as the subject labeling regions of the image data set.
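Viewed end to end, the method is a single train-and-crop loop. The following Python sketch summarizes it; `build_triplets`, `train_step`, `compare`, `crop_to_subject`, `crops_stable`, and `rebuild_dataset` are hypothetical helper names standing in for the sampling, metric-model update, comparison, cropping, cutoff test, and data-set update defined above, not functions taken from the patent.

```python
# Illustrative sketch of the iterative annotate-by-cropping loop described above.
# All helper functions are hypothetical stand-ins for the operations the method
# defines; this is not the patented implementation itself.

def annotate_subjects(dataset, model, max_rounds=10):
    """dataset: mapping from subject category to a list of images."""
    for _ in range(max_rounds):
        pos, anc, neg = build_triplets(dataset)        # i-th positive/anchor/negative samples
        model = train_step(model, pos, anc, neg)       # (i-1)-th metric model -> i-th
        pos_maps, anc_maps = compare(model, pos, anc)  # feature response maps
        new_pos = [crop_to_subject(p, m) for p, m in zip(pos, pos_maps)]
        new_anc = [crop_to_subject(a, m) for a, m in zip(anc, anc_maps)]
        if crops_stable(pos, new_pos) and crops_stable(anc, new_anc):
            break                                      # cutoff condition met
        dataset = rebuild_dataset(dataset, new_pos, new_anc)
    return new_pos, new_anc                            # n-th samples = subject labeling regions
```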
An embodiment of the present application provides an information labeling apparatus, including:
a model training module, configured to train the (i-1)-th metric model based on the i-th positive sample, the i-th anchor sample, and the i-th negative sample of an image data set to obtain the i-th metric model, where i is a positive integer;
a similarity measurement module, configured to compare the i-th positive sample with the i-th anchor sample based on the i-th metric model;
a subject cropping module, configured to perform subject cropping on the i-th positive sample based on the comparison result to obtain the (i+1)-th positive sample, and perform subject cropping on the i-th anchor sample to obtain the (i+1)-th anchor sample;
an iterative processing module, configured to continue to train the i-th metric model based on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample, and perform subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample based on the trained i-th metric model, until the n-th positive sample and the n-th anchor sample are obtained when a cutoff condition is met, where n is an integer greater than i;
and an information labeling module, configured to determine the n-th positive sample and the n-th anchor sample as the subject labeling regions of the image data set.
In the embodiments of the present application, the comparison result includes a positive feature response map of the i-th positive sample and an anchor feature response map of the i-th anchor sample, where the positive feature response map is a similarity map over the i-th positive sample of its region most similar to the i-th anchor sample, and the anchor feature response map is a similarity map over the i-th anchor sample of its region most similar to the i-th positive sample. The subject cropping module is further configured to perform subject cropping on the i-th positive sample based on the positive feature response map to obtain the (i+1)-th positive sample, and to perform subject cropping on the i-th anchor sample based on the anchor feature response map to obtain the (i+1)-th anchor sample.
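One plausible realization of these response maps, assuming the metric model exposes a dense (H, W, C) feature map per image, is sketched below: each location of one image is scored by its best cosine match over all locations of the other, so the shared subject responds strongly while unshared background responds weakly. The patent does not fix this exact computation.

```python
import numpy as np

def response_maps(feat_a, feat_b, eps=1e-8):
    """Cross-image similarity response maps for a positive pair.

    feat_a, feat_b: (H, W, C) dense feature maps of the positive and anchor
    samples. Every location in one image is scored by its best cosine match
    over all locations of the other image.
    """
    a = feat_a.reshape(-1, feat_a.shape[-1])
    b = feat_b.reshape(-1, feat_b.shape[-1])
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    sim = a @ b.T                                        # pairwise cosine similarities
    map_a = sim.max(axis=1).reshape(feat_a.shape[:2])    # positive feature response map
    map_b = sim.max(axis=0).reshape(feat_b.shape[:2])    # anchor feature response map
    return map_a, map_b
```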
In the embodiments of the present application, the information labeling apparatus further includes a noise cleaning module, configured to determine a first similar region in the i-th positive sample based on a similarity threshold and the positive feature response map, and a second similar region in the i-th anchor sample based on the similarity threshold and the anchor feature response map; when the area of at least one of the first similar region and the second similar region is smaller than an area threshold, acquire a third similar area of the i-th positive sample against the other images of the current image set under the current subject category, and a fourth similar area of the i-th anchor sample against those other images; and when the third similar area is larger than the fourth similar area, determine that the i-th anchor sample is a noise image and delete it from the current image set.
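The cleaning rule can be sketched as follows; `area_vs_others` is a hypothetical helper measuring an image's similar-region area against the remaining images of the current subject category, and the branch that deletes the positive sample is the symmetric case implied, though not spelled out, by the text.

```python
def clean_noise(pos_img, anc_img, pos_map, anc_map,
                sim_thresh, area_thresh, area_vs_others):
    """Return the image to delete as noise, or None if the pair is consistent."""
    first_region = pos_map >= sim_thresh          # first similar region
    second_region = anc_map >= sim_thresh         # second similar region
    if min(first_region.sum(), second_region.sum()) >= area_thresh:
        return None                               # pair is consistent: keep both
    # One sample barely matches its partner: compare both against the rest
    # of the category to decide which image is the noise.
    third_area = area_vs_others(pos_img)          # positive vs. other images
    fourth_area = area_vs_others(anc_img)         # anchor vs. other images
    if third_area > fourth_area:
        return anc_img                            # anchor sample is noise: delete it
    return pos_img                                # symmetric case (implied by the text)
```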
In the embodiments of the present application, the model training module is further configured to obtain, based on the (i-1)-th metric model, the positive sample features of the i-th positive sample, the anchor sample features of the i-th anchor sample, and the negative sample features of the i-th negative sample of the image data set, and to train the (i-1)-th metric model based on the difference between the positive sample features and the anchor sample features and the difference between the anchor sample features and the negative sample features, to obtain the i-th metric model.
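A standard way to train on these two feature differences is a triplet-margin objective, sketched below with PyTorch; the text requires only that positive/anchor features be pulled together and anchor/negative features pushed apart, so the specific loss is an assumption.

```python
import torch.nn.functional as F

def metric_train_step(model, optimizer, pos, anc, neg, margin=0.2):
    """One training step of the metric model on a batch of (positive, anchor,
    negative) samples; the triplet-margin loss is a choice, not a quote."""
    f_pos, f_anc, f_neg = model(pos), model(anc), model(neg)   # sample features
    loss = F.triplet_margin_loss(f_anc, f_pos, f_neg, margin=margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model, loss.item()
```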
In the embodiments of the present application, the information labeling apparatus further includes a sample construction module, configured to obtain the i-th image data set corresponding to the image data set, where the i-th image data set consists of the image sets under each subject category and is obtained by performing i-1 rounds of iterative subject cropping on the image data set; take two images from the current image set under the current subject category of the i-th image data set as the i-th positive sample and the i-th anchor sample, where the current subject category is any one of the subject categories; and determine, as the i-th negative sample, one image of the i-th image data set taken from an image set under a subject category other than the current one.
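A minimal sampling sketch, assuming the i-th image data set is held as a mapping from subject category to a list of images:

```python
import random

def build_triplet(images_by_category):
    """Sample an i-th (positive, anchor, negative) triplet: two images of one
    subject category form the positive pair, and an image of any other
    category is the negative sample. The category labels are the only (weak)
    supervision used; no boxes are required."""
    category = random.choice(list(images_by_category))
    positive, anchor = random.sample(images_by_category[category], 2)
    other = random.choice([c for c in images_by_category if c != category])
    negative = random.choice(images_by_category[other])
    return positive, anchor, negative
```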
In the embodiments of the present application, the information labeling apparatus further includes a condition determination module, configured to obtain the n-th positive area difference between the n-th positive sample and the (n-1)-th positive sample, and the n-th anchor area difference between the n-th anchor sample and the (n-1)-th anchor sample; obtain the positive area differences and the anchor area differences of the previous j rounds, where j is a positive integer; and determine that the cutoff condition is satisfied when the difference between the previous j positive area differences and the n-th positive area difference is less than a difference threshold, and/or the difference between the previous j anchor area differences and the n-th anchor area difference is less than the difference threshold.
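One reasonable reading of this stabilization test is sketched below; how the previous j area differences are aggregated before the comparison is not fixed by the text, so the mean used here is an assumption.

```python
def cutoff_reached(areas, j=3, diff_thresh=1e-3):
    """Sketch of the stabilization cutoff. `areas` is the sequence of crop
    areas over the iterations (tracked separately for the positive and the
    anchor samples). The latest area change is compared with the mean of the
    previous j changes; once the change stops shrinking, cropping has
    stabilized."""
    if len(areas) < j + 2:
        return False                          # not enough history yet
    diffs = [abs(a - b) for a, b in zip(areas[1:], areas[:-1])]
    latest, previous = diffs[-1], diffs[-1 - j:-1]
    return abs(sum(previous) / j - latest) < diff_thresh
```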
In an embodiment of the present application, the condition determination module is further configured to determine that the cutoff condition is satisfied when at least one of the proportion of the first similar region in the n-th positive sample and the proportion of the second similar region in the n-th anchor sample is greater than a proportion threshold.
In an embodiment of the present application, the information labeling apparatus further includes a subject detection module, configured to train a subject detection model based on the image data set and the subject labeling regions; when a subject detection request is received, acquire an image to be detected in response to the request; and perform subject detection on the image to be detected based on the subject detection model to obtain a subject region.
In an embodiment of the present application, the subject detection module is further configured to determine predicted subject regions of images in the image data set based on an original subject detection model, and to iteratively train the original subject detection model based on the difference between the predicted subject regions and the subject labeling regions until a training cutoff condition is met, to obtain the subject detection model.
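A generic training loop for this step might look as follows; `detector` and `box_loss` are stand-ins for whatever detection architecture and loss are chosen (the patent does not prescribe them), the tensors are assumed to support autograd in the PyTorch style, and a fixed epoch count stands in for the training cutoff condition.

```python
def train_subject_detector(detector, optimizer, images, label_boxes, box_loss,
                           epochs=10):
    """Train the subject detection model on the automatically obtained
    subject labeling regions."""
    for _ in range(epochs):                       # stand-in training cutoff
        for image, target in zip(images, label_boxes):
            predicted = detector(image)           # predicted subject region
            loss = box_loss(predicted, target)    # difference to the labeling region
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return detector
```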
In an embodiment of the present application, the subject region is at least one of a retrieval object region, a delivery object region, a detection object region, and a scene region.
In the embodiments of the present application, the information labeling apparatus further includes an application module, configured to, when the subject region is the retrieval object region, extract features of the subject labeling regions to obtain a feature search library corresponding to the image data set; extract the feature to be retrieved of the retrieval object region; determine, from the feature search library, the matching feature that matches the feature to be retrieved; and determine the image in the image data set corresponding to the matching feature as the retrieval result for the image to be detected, and perform recommendation processing based on the retrieval result.
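The matching step reduces to a nearest-neighbour lookup in the feature search library; a sketch follows, assuming cosine similarity as the matching criterion, which the text leaves open.

```python
import numpy as np

def retrieve(query_feat, library_feats, library_images):
    """Nearest-neighbour lookup in the feature search library built from the
    subject labeling regions. Cosine similarity is an assumed matching
    criterion; the patent does not fix the distance function."""
    q = query_feat / np.linalg.norm(query_feat)
    lib = library_feats / np.linalg.norm(library_feats, axis=1, keepdims=True)
    best = int(np.argmax(lib @ q))                # index of the matching feature
    return library_images[best]                   # retrieval result to recommend from
```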
In the embodiments of the present application, the application module is further configured to, when the subject region is the delivery object region, extract delivery object features of the delivery object region; and either determine delivery attributes based on the delivery object features, determine the information to be delivered based on the delivery attributes, and perform delivery processing based on that information, where the delivery attributes include at least one of a delivery category and a delivery label; or determine a delivery conversion rate based on the delivery object features, perform delivery processing on the image to be detected when the delivery conversion rate is greater than a conversion rate threshold, and end delivery processing for the image to be detected when the delivery conversion rate is less than or equal to that threshold.
An embodiment of the present application provides an information labeling apparatus, including:
a memory for storing executable instructions;
and a processor, configured to implement the information labeling method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
The embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to perform the information labeling method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects: the i-th metric model is trained on the positive pair and negative pair formed by the i-th positive sample, the i-th anchor sample, and the i-th negative sample, and the i-th metric model is then used to crop the regions where the subject is located in the i-th positive sample and the i-th anchor sample; training and cropping continue until the subject labeling regions (the n-th positive sample and the n-th anchor sample) are cropped out. The subject labeling regions are thus acquired automatically, which improves labeling efficiency.
Drawings
FIG. 1 is a schematic diagram of an exemplary subject detection process;
FIG. 2 is a schematic diagram of an optional architecture of an information labeling system provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another optional architecture of an information labeling system provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the structure of the server in FIG. 2 provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of an optional information labeling method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of exemplary information labeling provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an exemplary comparison result obtained according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of an exemplary iterative subject cropping process provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of another optional information labeling method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an exemplary information labeling process provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of exemplary training of a metric model provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of another exemplary process for labeling commodity regions in an image provided by an embodiment of the present application;
FIG. 13 is a schematic flowchart of exemplary subject detection provided by an embodiment of the present application;
FIG. 14 is a schematic flowchart of another example of subject detection provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third/fourth" are used only to distinguish similar objects and do not denote a particular order or importance; it is to be understood that "first/second/third/fourth" may be interchanged in a particular order or sequence where permissible, so that the embodiments of the present application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms used in the embodiments of the present application are explained as follows.
1) Artificial Intelligence (AI) is the theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
2) Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how existing knowledge structures can be reorganized to continually improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning typically includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
3) Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence that studies theories and methods for effective communication between humans and computers using natural language. It is a science integrating linguistics, computer science, and mathematics, so research in this field involves natural language, the language people use daily, and is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graphs. The information labeling method provided by the embodiments of the present application can be applied in the field of natural language processing.
4) Neural Networks (NN) are mathematical models that mimic the structure and function of biological neural networks; exemplary artificial neural network structures herein include Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), for example the feature extraction model and the subject detection model in the embodiments of the present application.
5) Deep Metric Learning (DML) is a method of metric learning that aims to learn a mapping from original features to a low-dimensional dense vector space (called the embedding space), such that, under common distance functions on the embedding space (e.g., Euclidean distance, cosine distance), objects of the same class are relatively close while objects of different classes are relatively far apart.
6) Weakly Supervised Learning (WSL) is a branch of machine learning that, in contrast to traditional supervised learning, trains model parameters with limited, noisy, or inaccurately labeled data.
7) A salient region is the region located in an image that most attracts human visual attention.
8) Face recognition is a biometric technology for identity recognition based on facial features; it covers a series of related techniques for detecting and tracking faces in images and then recognizing the detected faces.
9) Person re-identification (also called pedestrian re-identification) is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence.
10) Blockchain: the storage structure of encrypted, chained transactions formed from blocks.
11) Blockchain network: the set of nodes that incorporate new blocks into a blockchain by consensus.
It should be noted that artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
In addition, artificial intelligence is a comprehensive subject involving a wide range of fields, covering both hardware-level and software-level technology. The basic artificial intelligence infrastructure includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. As the technology develops, artificial intelligence will be applied in still more fields and deliver ever greater value. The application of artificial intelligence to the field of subject detection in the embodiments of the present application is described later.
It should be noted that big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a reasonable time frame: massive, fast-growing, and diversified information assets that require new processing modes to yield stronger decision-making power, insight, and process optimization. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to process large volumes of data within a tolerable time. Technologies suited to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
Generally speaking, subject detection can be realized in a fully supervised manner: subject boxes are annotated first, a subject detection model is trained on the annotated boxes, and the subject region of an image to be detected is then determined by the model; however, the subject boxes are annotated manually, and labeling efficiency is low. For example, the data sets "VOC", "MSCOCO", "ImageNet", "Objects365", and others used to train subject detection models are all labeled manually; referring to Table 1, Table 1 shows the manual annotation statistics for these data sets:
[Table 1: manual annotation statistics of the above data sets; reproduced as an image in the original publication]
As Table 1 shows, up to millions of images and tens of millions of detection boxes need to be labeled; manually annotating subject boxes takes weeks or even months, and newly added subject categories must be labeled all over again, so manual annotation scales poorly and is inefficient. Moreover, to guarantee labeling quality, the whole labeling pipeline is complex, covering data collection and cleaning, annotator training, and spot checks and verification to ensure accuracy; the labeling process is therefore time-consuming and labeling efficiency is low.
In addition, subject detection can be realized with a weakly supervised method based on a generic pre-trained detection model: an initial salient region of the image to be detected is obtained with the generic pre-trained detection model, the initial salient region is adjusted, the generic pre-trained detection model is iteratively trained based on the adjusted region to obtain a target detection model, and the target detection model finally determines the salient region of the image to be detected. Illustratively, referring to FIG. 1, FIG. 1 is a schematic diagram of an exemplary subject detection process. As shown in FIG. 1, after the image to be detected 1-1 is input into the generic pre-trained detection model 1-2, the salient region 1-3 is obtained; the salient region 1-3 is adjusted based on threshold filtering and/or a conditional random field; whether the salient region is stable is judged based on the adjusted salient region 1-4; if so, the adjusted salient region 1-4 is taken as the final detection result 1-5; if not, the generic pre-trained detection model 1-2 is iteratively fine-tuned based on the difference between the adjusted salient region 1-4 and the salient region 1-3, until the salient region is stable and the final detection result 1-5 is obtained.
However, although this approach needs no labeling, on the one hand, because the input to the generic pre-trained detection model is a whole picture, the output saliency map may also respond strongly to noise regions, so salient regions and noise regions cannot be distinguished; the generic pre-trained detection model is easily affected by noise during training, and the accuracy of subject detection is therefore low. On the other hand, a generic pre-trained model is required: an initial result from such a model is usually needed as the starting point for subsequent optimization iterations. The initial result cannot be too poor, which requires data support for the generic pre-trained model, and that data must match the distribution of the target data, otherwise iterative optimization is difficult. Consequently, if the target data is not generic, it is hard to find a suitable existing pre-trained model; for example, for commodity subject categories such as wine bottles, toys, and milk powder, no corresponding generic pre-trained model exists. Subject detection then still falls back on annotation, but the subject boxes are labeled manually and labeling efficiency is low.
On this basis, the embodiments of the present application provide an information labeling method, apparatus, device, and computer-readable storage medium that can improve labeling efficiency for subject detection. An exemplary application of the information labeling device provided by the embodiments of the present application is described below: the device may be implemented as various types of user terminals, such as a notebook computer, tablet computer, desktop computer, set-top box, or mobile device (e.g., a mobile phone, portable music player, personal digital assistant, dedicated messaging device, or portable game device), and may also be implemented as a server. Next, an exemplary application in which the information labeling device is implemented as a server is described.
Referring to FIG. 2, FIG. 2 is a schematic diagram of an optional architecture of the information labeling system provided by an embodiment of the present application. As shown in FIG. 2, to support an information labeling application, in the information labeling system 100 the terminals 400 (terminal 400-1 and terminal 400-2 are shown as examples) are connected to the server 200 (the information labeling device) through the network 301, which may be a wide area network, a local area network, or a combination of the two. The information labeling system 100 further includes a database 302 that provides data support for the server 200 when it performs information labeling.
The server 200 is configured to train the (i-1)-th metric model based on the i-th positive sample, the i-th anchor sample, and the i-th negative sample of an image data set to obtain the i-th metric model, where i is a positive integer; compare the i-th positive sample with the i-th anchor sample based on the i-th metric model; based on the comparison result, perform subject cropping on the i-th positive sample to obtain the (i+1)-th positive sample, and perform subject cropping on the i-th anchor sample to obtain the (i+1)-th anchor sample; continue to train the i-th metric model based on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample, and perform subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample based on the trained i-th metric model, stopping when a cutoff condition is met, to obtain the n-th positive sample and the n-th anchor sample, where n is an integer greater than i; and determine the n-th positive sample and the n-th anchor sample as the subject labeling regions of the image data set. The server is further configured to train a subject detection model based on the subject labeling regions, perform subject detection with that model on the image to be detected sent by the terminal 400 through the network 301, and send the detected subject region to the terminal 400 through the network 301.
The terminal 400 is configured to send an image to be detected to the server 200 through the network 301, and to receive, through the network 301, the subject region sent back by the server 200 and display the subject region of the image to be detected.
The embodiments of the present application may be implemented by means of cloud technology, a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network, information, integration, management-platform, and application technologies built on the cloud computing business model; it can form a pool of resources that is used on demand, flexibly and conveniently. Cloud computing technology will become an important backbone, since the background services of technical network systems require large amounts of computing and storage resources.
As an example, the server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The embodiments of the present application can also be implemented with blockchain technology. Referring to FIG. 3, FIG. 3 is a schematic diagram of another optional architecture of the information labeling system provided by an embodiment of the present application. In the information labeling system 100 shown in FIG. 3, the server 200 performs information labeling, and the server 200 may also perform subject detection for a plurality of terminals (terminal 400-1 and terminal 400-2 are shown as examples in FIG. 3).
In some embodiments, the server 200, terminal 400-1, and terminal 400-2 may join the blockchain network 500 as nodes. The type of the blockchain network 500 is flexible; it may be, for example, any of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, the electronic device of any business entity may access the blockchain network 500 without authorization and serve as a common node of the blockchain network 500; for example, terminal 400-1 maps to the common node 500-1 in the blockchain network 500, the server 200 maps to the common node 500-2, and terminal 400-2 maps to the common node 500-3.
Taking the blockchain network 500 as a consortium chain as an example, the server 200, terminal 400-1, and terminal 400-2 may access the blockchain network 500 and become nodes after obtaining authorization. After the server 200 acquires the image data set, obtains the subject labeling regions of the image data set, and trains the subject detection model based on the subject labeling regions, it can perform subject detection on the images to be detected sent by terminal 400-1 and terminal 400-2 by executing a smart contract, obtain the subject regions of the images to be detected, and send the subject regions to the blockchain network 500 for consensus. When consensus passes and the subject region indeed represents the region where the subject of the image to be detected is located, the subject region is determined as the subject detection result of the image to be detected. In this way, the detection result is agreed upon by multiple nodes of the blockchain network before the subject detection result is determined; this consensus mechanism avoids the influence of erroneous detection by a single server and further improves the reliability and accuracy of subject detection.
Referring to fig. 4, fig. 4 is a schematic diagram of a component structure of the server in fig. 2 according to an embodiment of the present disclosure, where the server 200 shown in fig. 4 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 4.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252, configured to communicate with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the information labeling apparatus provided by the embodiments of the present application may be implemented in software; FIG. 4 shows an information labeling apparatus 255 stored in the memory 250, which may be software in the form of programs, plug-ins, and the like, and includes the following software modules: a model training module 2551, a similarity measurement module 2552, a subject cropping module 2553, an iterative processing module 2554, an information labeling module 2555, a noise cleaning module 2556, a sample construction module 2557, a condition determination module 2558, a subject detection module 2559, and an application module 25510. These modules are logical and may therefore be combined or further split in any way according to the functions implemented. The functions of the modules are explained below.
In other embodiments, the information labeling apparatus provided by the embodiments of the present application may be implemented in hardware; for example, it may be a processor in the form of a hardware decoding processor programmed to perform the information labeling method provided by the embodiments of the present application. For example, a processor in the form of a hardware decoding processor may be implemented by one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
In the following, the information annotation method provided by the embodiment of the present application will be described in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present application.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of an optional information labeling method provided by an embodiment of the present application, described below with reference to the steps shown in FIG. 5.
S501, training the (i-1)-th metric model based on the i-th positive sample, the i-th anchor sample, and the i-th negative sample of the image data set to obtain the i-th metric model.
In the embodiments of the present application, the information labeling device acquires an image data set when it obtains a data set for training a network model to be used for subject detection. The image data set is an unlabeled image set in which weak supervision information exists between every two images: the subject category or subject label to which each belongs, e.g., whether image A and image B show the same delivery object or the same commodity (a subject category or subject label such as a name or class). To label the subject boxes of the image data set, the information labeling device iteratively trains a metric model and, during this iterative training, gradually determines the subject box of each image in the image data set. For the i-th round of training of the metric model, the corresponding data set is the image data set after i-1 rounds of processing; the information labeling device extracts two images of the same subject category or subject label from this data set to form a positive pair, namely the i-th positive sample and the i-th anchor sample, and extracts two images of different subject categories or subject labels to form a negative pair, namely the i-th anchor sample and the i-th negative sample. Here, i is a positive integer.
It should be noted that the metric model is used to determine the subject similarity between sample images of the image data set. Accordingly, the (i-1)-th metric model is the metric model after i-1 rounds of training, the i-th metric model is the metric model after i rounds of training, and the information labeling device obtains the i-th metric model by training the (i-1)-th metric model on the positive and negative pairs formed by the i-th positive sample, the i-th anchor sample, and the i-th negative sample. The image data set is a data set of structured image type, the structure being whether the subject categories or subject labels of the images are the same.
It should also be noted that the data set obtained by processing the image data set i-1 times may be the original image data set itself (when i is 1), or may be obtained by processing the image data set, or a data set derived from it, at least once with the (i-1)-th metric model (when i is greater than 1).
S502, comparing the i-th positive sample with the i-th anchor sample based on the i-th metric model.
In the embodiments of the present application, after the information labeling device obtains the i-th metric model, since the subject categories or subject labels of the i-th positive sample and the i-th anchor sample are the same, the i-th metric model compares the two samples. In the resulting similarity distribution between the i-th positive sample and the i-th anchor sample, regions of high similarity correspond to the inside of the subject box, and regions of low similarity correspond to the outside of the subject box; this similarity distribution is the comparison result of the i-th positive sample and the i-th anchor sample. That is, the comparison result consists of the subject-based feature response maps of the i-th positive sample and the i-th anchor sample.
It can be understood that when the metric model compares the i-th positive sample with the i-th anchor sample, because their subject categories or subject labels are the same, the comparison result responds strongly in the region where the subject is located while non-subject regions are suppressed to a low response; noise data can thus be cleaned effectively, the region where the subject is located can be determined accurately, and the labeling accuracy of the subject box is improved.
S503, based on the comparison result, performing subject cropping on the i-th positive sample to obtain the (i+1)-th positive sample, and performing subject cropping on the i-th anchor sample to obtain the (i+1)-th anchor sample.
In the embodiments of the present application, after obtaining the comparison result, the information labeling device crops away the regions outside the subject box in the i-th positive sample and in the i-th anchor sample, based on the similarity of the region where the subject is located as characterized by the comparison result; this realizes the subject cropping of the i-th positive sample and the i-th anchor sample and yields the (i+1)-th positive sample corresponding to the i-th positive sample and the (i+1)-th anchor sample corresponding to the i-th anchor sample.
Note that the processing of the image data set described in S501 is subject cropping, i.e., cropping out the region where the subject is located. When the information labeling device performs subject cropping on the i-th positive sample and the i-th anchor sample based on the comparison result, the cropping may be performed against a cropping threshold, so that each step only fine-tunes the samples: as much of the similar region as possible is retained, only a small amount of background or noise is deleted, erroneous deletion is prevented, and labeling accuracy improves.
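A conservative, threshold-based crop of this kind might look as follows; the threshold value and the bounding-box construction are illustrative assumptions rather than the patented procedure:

```python
import numpy as np

def crop_by_threshold(image, response_map, crop_thresh=0.3):
    """Crop to the bounding box of all response-map locations above the
    cropping threshold. A low threshold keeps most of the similar region and
    removes background conservatively, matching the fine-tuning behaviour
    described above."""
    ys, xs = np.where(response_map >= crop_thresh)
    if ys.size == 0:
        return image                                  # nothing confident: keep as-is
    scale_y = image.shape[0] / response_map.shape[0]  # map-to-image scaling
    scale_x = image.shape[1] / response_map.shape[1]
    top, bottom = int(ys.min() * scale_y), int((ys.max() + 1) * scale_y)
    left, right = int(xs.min() * scale_x), int((xs.max() + 1) * scale_x)
    return image[top:bottom, left:right]
```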
It can be understood that the (i+1)-th positive sample is closer than the i-th positive sample to the region where the subject of the positive sample is located, and the (i+1)-th anchor sample is closer than the i-th anchor sample to the region where the subject of the anchor sample is located.
S504, continuing to train the i-th metric model based on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample, and performing subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample based on the trained i-th metric model, until the n-th positive sample and the n-th anchor sample are obtained when a cutoff condition is met.
In the embodiments of the present application, once the information labeling device has obtained the (i+1)-th positive sample and the (i+1)-th anchor sample, the image data set has been subject-cropped i times with the metric model. The information labeling device extracts, from the data set obtained by these i rounds of subject cropping, an image whose subject category or subject label differs from that of the (i+1)-th anchor sample, thereby obtaining the (i+1)-th negative sample. Here, the (i+1)-th negative sample may be the image obtained by cropping the i-th negative sample with the i-th metric model, or an image obtained by cropping some other negative sample different from the i-th negative sample with the i-th metric model; the embodiments of the present application do not specifically limit this. In addition, when the information labeling device performs the i-th round of subject cropping on the image data set, the subject cropping of the i-th negative sample is realized as well. The information labeling device stops subject cropping when it determines that the cutoff condition is met.
It should be noted that the process by which the information labeling device trains the i-th metric model on the (i+1)-th positive sample, the (i+1)-th anchor sample, and the (i+1)-th negative sample to obtain the trained i-th metric model is similar to the training process in S501, and the process by which it performs subject cropping on the (i+1)-th positive sample and the (i+1)-th anchor sample with the trained i-th metric model is similar to the cropping process in S503; neither is repeated here.
The information labeling device keeps training the trained i-th metric model in subsequent rounds, and after each round of training uses the newly trained metric model to subject-crop the positive pairs of the training samples. When the subject boxes of the cropped positive pairs stabilize (the variation of the subject box is smaller than a variation threshold) or a certain condition is met (the area proportion of the subject within the subject box is larger than a proportion threshold), the information labeling device determines that the cutoff condition is met; at this point, the results of the subject cropping of the image data set are the n-th positive sample and the n-th anchor sample, where n is an integer greater than i. It is easy to see that the n-th positive sample and the n-th anchor sample are images obtained by n-1 rounds of subject cropping, that the i-th positive samples include the n-th positive sample, and that the i-th anchor samples include the n-th anchor sample. The cutoff condition may also be a fixed number of subject-cropping rounds.
And S505, determining the nth positive sample and the nth anchor sample as the main body labeling area of the image data set.
It should be noted that, after the information labeling device obtains the nth positive sample and the nth anchor sample, the nth positive sample and the nth anchor sample are regions of the body frame of the image data set; therefore, the nth positive sample and the nth anchor sample are main body labeling areas of the image data set, and at the moment, labeling of the area where the main body of the image data set is located is achieved. It is easy to know that the main body labeling area is the labeling information of the image data set for the main body.
Referring to fig. 6, fig. 6 is a schematic flowchart of an exemplary information labeling provided in an embodiment of the present application. As shown in fig. 6, for the image data set 6-11, corresponding positive sample pairs and negative sample pairs are constructed to train the metric model 6-20, resulting in the metric model 6-21; the metric model 6-21 is used to subject-crop the positive sample pairs in the image data set 6-11, obtaining the image data set 6-12; for the image data set 6-12, corresponding positive sample pairs and negative sample pairs are constructed to train the metric model 6-21, obtaining the metric model 6-22; the metric model 6-22 is used to subject-crop the positive sample pairs in the image data set 6-12, obtaining the image data set 6-13; and so on, until the image data set 6-1n is obtained. Here, the positive sample pair in the image data set 6-11 is the 1st positive sample and the 1st anchor sample, the positive sample pair in the image data set 6-12 is the 2nd positive sample and the 2nd anchor sample, and the positive sample pair in the image data set 6-1n is the nth positive sample and the nth anchor sample. In the figure, dotted arrows denote subject cropping and solid arrows denote training of the metric model.
It can be understood that the ith metric model is trained with the positive sample pair and the negative sample pair composed of the ith positive sample, the ith anchor sample and the ith negative sample, and the ith metric model is used to crop the region where the subject is located in the ith positive sample and the ith anchor sample; training and cropping continue until the subject labeling region (the nth positive sample and the nth anchor sample) is cropped out. In this way, the subject labeling region is acquired automatically, which can improve the labeling efficiency in subject detection.
In the embodiment of the application, the comparison result includes a positive feature response map of the ith positive sample and an anchor feature response map of the ith anchor sample; the positive feature response map is a similarity distribution map, over the ith positive sample, of the region most similar to the ith anchor sample, and the anchor feature response map is a similarity distribution map, over the ith anchor sample, of the region most similar to the ith positive sample. At this time, S503 may be implemented through S5031 and S5032; that is, based on the comparison result, the information labeling device subject-crops the ith positive sample to obtain the (i+1)th positive sample and subject-crops the ith anchor sample to obtain the (i+1)th anchor sample through S5031 and S5032, which are described below.
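The patent does not prescribe how the response maps are computed. One plausible realisation, sketched below in PyTorch under the assumption that the metric model exposes spatial feature maps before pooling (all names are illustrative), scores every spatial position of one image against the pooled descriptor of the other:

```python
import torch
import torch.nn.functional as F

def response_maps(feat_pos: torch.Tensor, feat_anc: torch.Tensor):
    """feat_pos: (C, Hp, Wp) feature map of the ith positive sample;
    feat_anc: (C, Ha, Wa) feature map of the ith anchor sample (assumed shapes)."""
    # Pool each map into a single global descriptor of its image.
    vec_pos = F.normalize(feat_pos.mean(dim=(1, 2)), dim=0)   # (C,)
    vec_anc = F.normalize(feat_anc.mean(dim=(1, 2)), dim=0)   # (C,)
    # Cosine similarity of every position against the other image's descriptor:
    pos_map = torch.einsum('chw,c->hw', F.normalize(feat_pos, dim=0), vec_anc)
    anc_map = torch.einsum('chw,c->hw', F.normalize(feat_anc, dim=0), vec_pos)
    return pos_map, anc_map   # positive / anchor feature response maps
```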
S5031, performing subject cropping on the ith positive sample based on the positive feature response map to obtain the (i+1)th positive sample.
In the embodiment of the present application, the metric model produces its strongest response for the most similar regions of the two images in the positive sample pair, and weaker responses for less similar regions; that is, the response values of the feature response maps (the positive feature response map and the anchor feature response map) are positively correlated with the similarity between the ith positive sample and the ith anchor sample. The information labeling device superimposes the positive feature response map on the ith positive sample, takes the region with the strongest response (the largest similarity) in the positive feature response map, namely the region where the subject is located, as the center of the ith positive sample, and crops around this center; the cropped ith positive sample is the (i+1)th positive sample.
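A minimal sketch of this center-and-crop step follows; the kept crop fraction and the coordinate mapping from response-map space back to image space are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def crop_around_peak(image: np.ndarray, response: np.ndarray, keep: float = 0.6):
    """Crop `image` (H, W, 3) around the peak of `response` (h, w),
    keeping a window that covers `keep` of each image dimension (assumed value)."""
    H, W = image.shape[:2]
    h, w = response.shape
    # Map the strongest-response position back to image coordinates.
    py, px = np.unravel_index(int(np.argmax(response)), response.shape)
    cy, cx = int(py * H / h), int(px * W / w)
    half_h, half_w = int(H * keep / 2), int(W * keep / 2)
    top = min(max(cy - half_h, 0), H - 2 * half_h)
    left = min(max(cx - half_w, 0), W - 2 * half_w)
    return image[top:top + 2 * half_h, left:left + 2 * half_w]
```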
It should be noted that the information labeling device may also use a random field or threshold filtering, in combination with the feature response map, to determine the region for subject cropping; this is not specifically limited in the embodiment of the present application.
S5032, performing subject cropping on the ith anchor sample based on the anchor feature response map to obtain the (i+1)th anchor sample.
It should be noted that the implementation of S5032 is similar to that of S5031 and is not repeated here.
Referring to fig. 7, fig. 7 is a schematic diagram of an exemplary comparison result provided in an embodiment of the present application. As shown in fig. 7, the positive sample pair 7-1 includes an ith positive sample 7-11 and an ith anchor sample 7-12; when the ith metric model is used to compare the ith positive sample 7-11 with the ith anchor sample 7-12, a positive feature response map 7-21 corresponding to the ith positive sample 7-11 and an anchor feature response map 7-22 corresponding to the ith anchor sample 7-12 are obtained; the positive feature response map 7-21 and the anchor feature response map 7-22 constitute the comparison result 7-2. Here, 7-31 shows the effect of superimposing the positive feature response map 7-21 on the ith positive sample 7-11, and 7-32 shows the effect of superimposing the anchor feature response map 7-22 on the ith anchor sample 7-12.
In the following, subject cropping is described taking as an example iterative subject cropping of the ith positive sample 7-11 based on the positive feature response map 7-21. Referring to fig. 8, fig. 8 is a schematic flowchart of an exemplary iterative subject cropping process provided in an embodiment of the present application. As shown in fig. 8, the ith positive sample 7-11 is subject-cropped based on the positive feature response map 7-21, obtaining the (i+1)th positive sample 8-1; then, based on the feature response map for the (i+1)th positive sample 8-1 obtained by the metric model, the (i+1)th positive sample 8-1 is subject-cropped, obtaining the (i+2)th positive sample 8-2; subject cropping then continues on the (i+2)th positive sample 8-2 based on its feature response map obtained by the metric model, obtaining the (i+3)th positive sample 8-3 (the nth positive sample).
In the embodiment of the present application, S502 may be followed by S506 to S508; that is, after the information labeling device compares the ith positive sample and the ith anchor sample based on the ith metric model, the information labeling method further includes S506 to S508, which are described below.
S506, determining a first similar region in the ith positive sample based on a similarity threshold and the positive feature response map, and determining a second similar region in the ith anchor sample based on the similarity threshold and the anchor feature response map.
In the embodiment of the application, the information labeling device can obtain a similarity threshold, which is used to determine the similar regions between the two images in the positive sample pair. By comparing each similarity value in the positive feature response map with the similarity threshold, the information labeling device obtains the region of the ith positive sample that is similar to the ith anchor sample, referred to as the first similar region; similarly, by comparing each similarity value in the anchor feature response map with the similarity threshold, it obtains the region of the ith anchor sample that is similar to the ith positive sample, referred to as the second similar region.
And S507, when the area value of at least one of the first similar region and the second similar region is smaller than an area threshold, acquiring a third similar region of the ith positive sample with respect to the other images in the current image set under the current subject category, and acquiring a fourth similar region of the ith anchor sample with respect to the other images.
In the embodiment of the application, the information labeling device can obtain an area threshold, which is used to determine whether the ith positive sample and the ith anchor sample are similar to each other. After obtaining the first similar region and the second similar region, the information labeling device compares their respective area values with the area threshold. If neither area value is smaller than the area threshold, the ith positive sample and the ith anchor sample are similar.
If the area value of at least one of the first similar region and the second similar region is smaller than the area threshold, the ith positive sample and the ith anchor sample are not similar; this indicates that one of them is a noise image (for example, an image containing only a part of the subject, or an image presenting content merely associated with the subject). The information labeling device therefore further determines which of the ith positive sample and the ith anchor sample is the noise image.
It should be noted that the ith positive sample and the ith anchor sample are two images in the current image set under the current subject category. The information labeling device determines the third similar region based on the similar regions between the ith positive sample and the other images in the current image set (for example, by averaging the similar regions between the ith positive sample and each of the other images); similarly, it determines the fourth similar region of the ith anchor sample with respect to the other images.
And S508, when the third similar region is larger than the fourth similar region, determining that the ith anchor sample is a noise image, and deleting the ith anchor sample from the current image set.
It should be noted that, after obtaining the third similar region and the fourth similar region, the information labeling device compares them: if the third similar region is larger than the fourth similar region, the ith anchor sample is the noise image and is deleted from the current image set; if the third similar region is smaller than the fourth similar region, the ith positive sample is the noise image and is deleted from the current image set. Subsequently, positive sample pairs are constructed based on the current image set with the noise image deleted, and subject cropping is performed.
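The noise check of S506 to S508 can be summarised with the following sketch; the thresholds and the precomputed per-set average areas are hypothetical inputs, not values fixed by the patent.

```python
import numpy as np

def similar_area(response: np.ndarray, sim_threshold: float) -> int:
    """Area (in positions) of the region whose similarity exceeds the threshold."""
    return int((response > sim_threshold).sum())

def noise_sample(resp_pos, resp_anc, mean_area_pos, mean_area_anc,
                 sim_threshold=0.5, area_threshold=100):
    """Return 'positive' or 'anchor' when one of the pair looks like noise, else None.
    mean_area_pos / mean_area_anc: average similar-region areas of each sample
    against all other images of the same subject (the third / fourth regions)."""
    first_region = similar_area(resp_pos, sim_threshold)    # first similar region
    second_region = similar_area(resp_anc, sim_threshold)   # second similar region
    if min(first_region, second_region) >= area_threshold:
        return None                    # the pair is similar enough; keep both
    # The pair is dissimilar: the sample that agrees less with the rest of its
    # image set is treated as the noise image and deleted.
    return 'anchor' if mean_area_pos > mean_area_anc else 'positive'
```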
It can be understood that cleaning noise images during subject cropping can improve the quality of the training samples used to train the subject detection model, and in turn improve the labeling effect and the subject detection accuracy. It is easy to understand that, based on the method of S506 to S508, the similarity between images under different subject categories can also be detected; when images under two different subject categories are similar, the two subject categories are in fact the same subject category, and their image sets can be merged.
In the embodiment of the present application, S501 may be implemented through S5011 and S5012; that is, the information labeling device trains the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain the ith metric model through S5011 and S5012, which are described below.
S5011, acquiring the positive sample features of the ith positive sample, the anchor sample features of the ith anchor sample and the negative sample features of the ith negative sample of the image data set based on the (i-1)th metric model.
In the embodiment of the application, the information labeling device uses the (i-1)th metric model to extract features from the ith positive sample, obtaining the positive sample features; to extract features from the ith anchor sample, obtaining the anchor sample features; and to extract features from the ith negative sample, obtaining the negative sample features.
S5012, training the (i-1)th metric model based on the difference between the positive sample features and the anchor sample features and the difference between the anchor sample features and the negative sample features, to obtain the ith metric model.
It should be noted that the information labeling device obtains a loss function value based on the positive sample features, the anchor sample features and the negative sample features, and trains the (i-1)th metric model based on the loss function value, thereby obtaining the ith metric model. Here, when a triplet loss function (Triplet Loss) is used to obtain the loss function value corresponding to the positive sample features, the anchor sample features and the negative sample features, the difference between the positive sample features and the anchor sample features and the difference between the anchor sample features and the negative sample features together constitute the loss function value.
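For concreteness, one training step with a standard triplet loss might look like the following PyTorch sketch; the embedding model, the optimizer and the margin value are assumed to exist and are not prescribed by the patent.

```python
import torch
import torch.nn.functional as F

def triplet_step(model, optimizer, positive, anchor, negative, margin=0.2):
    """One update of the metric model; `model` maps image batches to embeddings."""
    f_pos, f_anc, f_neg = model(positive), model(anchor), model(negative)
    # Pull the anchor toward the positive, push it away from the negative.
    loss = F.triplet_margin_loss(f_anc, f_pos, f_neg, margin=margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```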
In the embodiment of the present application, S509 to S511 may further be included before S501; that is, before the information labeling device trains the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain the ith metric model, the information labeling method further includes S509 to S511, which are described below.
And S509, acquiring the ith image data set corresponding to the image data set.
It should be noted that the ith image data set is composed of the image sets under each subject category, and is obtained by performing i-1 iterations of subject cropping on the image data set. It is easy to understand that the image data set is composed of a plurality of images under each subject category.
S510, taking two images in a current image set under the current subject category in the ith image data set as an ith positive sample and an ith anchor sample.
It should be noted that the current subject category is any one of the subject categories; a subject category, which may also be referred to herein as a subject label, characterizes the category of the subject.
S511, determining an image in the ith image data set that comes from an image set under a subject category different from the current subject category as the ith negative sample.
In the embodiment of the present application, before the information labeling device determines in S504 that the cutoff condition is satisfied and obtains the nth positive sample and the nth anchor sample, the information labeling method further includes S5041 to S5043, which are described below.
S5041, acquiring the nth positive region difference between the nth positive sample and the (n-1)th positive sample, and acquiring the nth anchor region difference between the nth anchor sample and the (n-1)th anchor sample.
The nth positive region difference refers to the degree to which the (n-1)th positive sample is adjusted by cropping; the nth anchor region difference refers to the degree to which the (n-1)th anchor sample is adjusted by cropping.
S5042, acquiring the positive region differences of the previous j times and the anchor region differences of the previous j times, wherein j is a positive integer.
It should be noted that the positive region differences of the previous j times comprise the cropping adjustment degrees for the (n-j-1)th through (n-2)th positive samples, and the anchor region differences of the previous j times comprise the cropping adjustment degrees for the (n-j-1)th through (n-2)th anchor samples.
And S5043, determining that the cutoff condition is satisfied when the difference between the positive region differences of the previous j times and the nth positive region difference is smaller than a difference threshold, and/or the difference between the anchor region differences of the previous j times and the nth anchor region difference is smaller than the difference threshold.
It should be noted that, by comparing the positive region differences of the previous j times with the nth positive region difference, the information labeling device determines that the similar region corresponding to the nth positive sample is stable when their difference is smaller than the difference threshold; likewise, by comparing the anchor region differences of the previous j times with the nth anchor region difference, it determines that the similar region corresponding to the nth anchor sample is stable when their difference is smaller than the difference threshold. When at least one of the similar regions corresponding to the nth positive sample and the nth anchor sample is stable, the cutoff condition is determined to be satisfied.
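As a sketch, with each round's region difference stored in a list; j and the difference threshold are placeholder values chosen for illustration.

```python
def region_stable(diffs, j=3, diff_threshold=0.01):
    """diffs[k] is the (k+1)-th region difference; the last entry is the nth one."""
    if len(diffs) < j + 1:
        return False
    last, previous = diffs[-1], diffs[-j - 1:-1]
    return all(abs(d - last) < diff_threshold for d in previous)

def cutoff_met(pos_diffs, anc_diffs):
    # Satisfied when the positive crops and/or the anchor crops have stabilised.
    return region_stable(pos_diffs) or region_stable(anc_diffs)
```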
In the embodiment of the present application, before the information labeling device determines in S504 that the cutoff condition is satisfied and obtains the nth positive sample and the nth anchor sample, the information labeling method further includes S5044, which is described below.
S5044, determining that the cutoff condition is satisfied when a first ratio of the first similar region in the nth positive sample is greater than a ratio threshold and/or a second ratio of the second similar region in the nth anchor sample is greater than the ratio threshold.
It should be noted that the information labeling device calculates a ratio with the area value of the first similar region as the numerator and the area of the nth positive sample as the denominator, obtaining the first ratio, which characterizes how precisely the nth positive sample covers the region where the subject is located; therefore, when the first ratio is greater than the ratio threshold, the nth positive sample is a relatively precise region of the subject.
Similarly, the information labeling device calculates a ratio with the area value of the second similar region as the numerator and the area of the nth anchor sample as the denominator, obtaining the second ratio, which characterizes how precisely the nth anchor sample covers the region where the subject is located; when the second ratio is greater than the ratio threshold, the nth anchor sample is a relatively precise region of the subject. Thus, the information labeling device determines that the cutoff condition is satisfied when at least one of the first ratio and the second ratio is greater than the ratio threshold.
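The ratio check itself is simple; a sketch with an assumed threshold value follows.

```python
def ratio_cutoff(first_region_area, nth_positive_area,
                 second_region_area, nth_anchor_area, ratio_threshold=0.8):
    """Cutoff when the similar region fills most of either cropped sample
    (the 0.8 threshold is an assumed value, not taken from the patent)."""
    first_ratio = first_region_area / nth_positive_area    # subject coverage, positive
    second_ratio = second_region_area / nth_anchor_area    # subject coverage, anchor
    return first_ratio > ratio_threshold or second_ratio > ratio_threshold
```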
Referring to fig. 9, fig. 9 is a schematic flowchart of another optional information labeling method provided in the embodiment of the present application. As shown in fig. 9, after the information labeling device determines the nth positive sample and the nth anchor sample as the subject labeling region of the image data set, the information labeling method further includes S512 to S514, which are described below.
And S512, training a subject detection model based on the image data set and the subject labeling region.
In the embodiment of the application, the information labeling device determines the predicted subject regions of the images in the image data set based on an original subject detection model, and iteratively trains the original subject detection model based on the difference between the predicted subject regions and the subject labeling region until a training cutoff condition is satisfied, thereby obtaining the subject detection model. In other words, the information labeling device predicts the subject regions of the images in the image data set with the original subject detection model, iteratively trains it on the difference between the predicted regions and the subject labeling region, and determines the original subject detection model after the current round of iterative training as the subject detection model once the training cutoff condition is satisfied.
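The patent does not prescribe a detector architecture. As an assumed example, a one-class detector could be fine-tuned on the auto-labelled boxes with torchvision's Faster R-CNN; the optimizer settings are illustrative.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)  # background + subject
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_one_batch(images, boxes):
    """images: list of (3, H, W) float tensors; boxes: list of (k, 4) tensors
    holding the subject labeling regions produced by the cropping stage."""
    targets = [{'boxes': b, 'labels': torch.ones(len(b), dtype=torch.int64)}
               for b in boxes]
    model.train()
    loss_dict = model(images, targets)     # torchvision returns per-head losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```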
It should be noted that the subject detection model is used to determine the region where the subject is located in the image, that is, to perform subject detection on the image.
S513, when a subject detection request is received, acquiring an image to be detected in response to the subject detection request.
In the embodiment of the application, when it is determined that subject detection is to be performed on an image, the information labeling device obtains a subject detection request; it then obtains the image to be detected from the subject detection request, or according to an image acquisition address carried in the subject detection request. It is easy to see that the image to be detected is the image on which subject detection is to be performed.
And S514, performing subject detection on the image to be detected based on the subject detection model to obtain a subject region.
It should be noted that, since the information labeling device has already obtained the subject detection model for performing subject detection on images, after obtaining the image to be detected it inputs the image into the subject detection model, obtaining the region where the subject of the image to be detected is located, referred to as the subject region.
In the embodiment of the present application, the information labeling, the training of the subject detection model and the subject detection may be performed by different devices or by the same device; this is not specifically limited in the embodiment of the present application.
It can be understood that the ith metric model is trained with the positive sample pair and the negative sample pair composed of the ith positive sample, the ith anchor sample and the ith negative sample, and the ith metric model is used to crop the region where the subject is located in the ith positive sample and the ith anchor sample; training and cropping continue until the subject detection boxes (the nth positive sample and the nth anchor sample) are cropped out. In this way, the subject detection boxes are acquired automatically; when the subject detection model is trained based on the cropped subject detection boxes and then used for subject detection, the efficiency of subject detection can be improved.
In the embodiment of the present application, an application scenario of subject detection with the subject detection model may be image retrieval, for example commodity retrieval or searching for images by image, in which case the subject region is a retrieval object region; multimedia information delivery, for example advertisement delivery, in which case the subject region is a delivery object region; target recognition, for example face recognition, pedestrian re-identification or general object recognition, in which case the subject region is a detection target region; or scene segmentation, in which case the subject region is a scene region.
In the embodiment of the application, when the subject region is the retrieval object region, S505 may be followed by S515; that is, after the information labeling device determines the nth positive sample and the nth anchor sample as the subject labeling region of the image data set, the information labeling method further includes S515, which is described below.
And S515, extracting features of the subject labeling region to obtain a feature retrieval library corresponding to the image data set.
Correspondingly, in the embodiment of the present application, S514 may be followed by S516 to S518; that is, after the information labeling device performs subject detection on the image to be detected based on the subject detection model to obtain the subject region, the information labeling method further includes S516 to S518, which are described below.
And S516, extracting the feature to be retrieved of the retrieval object region.
It should be noted that the feature to be retrieved is of the same feature type as the features in the feature retrieval library, and is extracted in the same way as the features of the subject labeling region, for example with the same feature extraction model.
And S517, determining the matching feature that matches the feature to be retrieved from the feature retrieval library.
In the embodiment of the application, the information labeling device compares the feature to be retrieved with the features in the feature retrieval library one by one, thereby obtaining the matching feature. When no matching feature is found in the feature retrieval library, prompt information indicating that there is no match is generated.
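A minimal nearest-neighbour matcher over L2-normalised embeddings is sketched below; the similarity floor used to decide "no match" is an assumption.

```python
import numpy as np

def match_feature(query: np.ndarray, gallery: np.ndarray, min_sim: float = 0.5):
    """query: (D,) normalised feature to be retrieved;
    gallery: (N, D) normalised features of the feature retrieval library.
    Returns the index of the matching feature, or None when nothing matches."""
    sims = gallery @ query                 # cosine similarity per library entry
    best = int(np.argmax(sims))
    return best if sims[best] >= min_sim else None
```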
And S518, determining the image in the image data set corresponding to the matching feature as the retrieval result of the image to be detected, and performing recommendation processing based on the retrieval result.
It should be noted that the retrieval result is an image similar to the image to be detected. The information labeling device may use the retrieval result as recommendation information to implement recommendation, or directly display the retrieval result to implement retrieval.
In the embodiment of the application, when the subject region is the delivery object region, S514 may be followed by S519 to S521; that is, after the information labeling device performs subject detection on the image to be detected based on the subject detection model to obtain the subject region, the information labeling method further includes S519 to S521, which are described below.
And S519, extracting the delivery object features of the delivery object region.
S520, determining delivery attributes based on the delivery object features, determining information to be delivered based on the delivery attributes, and performing delivery processing based on the information to be delivered.
It should be noted that the delivery attributes include at least one of a delivery category and a delivery label.
And S521, determining a delivery conversion rate based on the delivery object features; when the delivery conversion rate is greater than a delivery conversion rate threshold, performing delivery processing on the image to be detected, and when the delivery conversion rate is less than or equal to the delivery conversion rate threshold, ending the delivery processing of the image to be detected.
It should be noted that S520 and S521 are two independent processing steps with no fixed order of execution.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating an exemplary information labeling process provided in an embodiment of the present application; as shown in fig. 10, the exemplary information labeling process describes a process of labeling a commodity region in a picture, including:
and S1001, starting.
S1002, commodity training data D1 (image data set) is acquired.
It should be noted that a plurality of pictures of each commodity at different angles or in different usage scenarios are acquired; the pictures corresponding to each of the plurality of commodities together form the commodity training data D1.
S1003, constructing a positive sample pair (i-th positive sample and i-th anchor sample) and a negative sample pair (i-th anchor sample and i-th negative sample) based on the commodity training data D1.
Note that, from the commodity training data D1, two pictures under the same commodity (current subject category) are selected to construct a positive sample pair, and two pictures under different commodities are selected to construct a negative sample pair.
S1004, training the depth metric model M0 (the (i-1)th metric model) based on the positive sample pairs and negative sample pairs constructed from the commodity training data D1, to obtain the depth metric model M1 (the ith metric model).
Referring to fig. 11, fig. 11 is a schematic flowchart of an exemplary metric model training provided in an embodiment of the present application. As shown in fig. 11, first, a data set 11-1 is sampled 11-2 to obtain a batch of commodity training data 11-3 (an image data set, such as a batch of the commodity training data D1); next, mining 11-4 is performed on the commodity training data 11-3 to obtain positive samples 11-51 (the ith positive sample), anchor samples 11-52 (the ith anchor sample) and negative samples 11-53 (the ith negative sample); then, the metric model 11-6 (the (i-1)th metric model, such as the depth metric model M0) is used to extract embedding features of the positive samples 11-51, the anchor samples 11-52 and the negative samples 11-53, obtaining features 11-71 (positive sample features), features 11-72 (anchor sample features) and features 11-73 (negative sample features); finally, the triplet loss function is used to compute the metric model loss value 11-8 corresponding to the features 11-71, 11-72 and 11-73 (the difference between the positive sample features and the anchor sample features, and the difference between the anchor sample features and the negative sample features), and the metric model 11-6 is updated based on the metric model loss value 11-8 (the updated metric model 11-6 is the ith metric model, such as the depth metric model M1).
S1005, inputting the positive sample pairs constructed from the commodity training data D1 into the depth metric model M1 to obtain the response feature map corresponding to each picture in a positive sample pair (see the positive feature response map 7-21 and the anchor feature response map 7-22 in fig. 7), and cropping each picture in the positive sample pair based on its response feature map (see the process in fig. 8 of subject-cropping the ith positive sample 7-11 to obtain the (i+1)th positive sample 8-1), thereby obtaining the commodity training data D2 (the 2nd image data set).
It should be noted that, after the response feature maps are obtained, they can also be used to clean noise pictures, such as commodity detail pictures or packaging pictures. When the predicted similar regions of two pictures belonging to the same commodity are very small (the area value of at least one of the first similar region and the second similar region is smaller than the area threshold), one of the pictures may be a noise picture (a commodity detail picture, a packaging picture, or the like). In this case, similar-region prediction and comparison are performed between each of the two pictures and all the other pictures (the other images) under the same commodity (the current subject category); if the similar region of one picture with all the other pictures (the third similar region, which may be the average of the similar regions between that picture and all the other pictures) is significantly larger than that of the other picture (the fourth similar region), the other picture can be determined to be a noise picture and discarded.
S1006, continuing to train the depth metric model M1 based on the positive sample pairs (the (i+1)th positive sample and the (i+1)th anchor sample) and negative sample pairs (the (i+1)th anchor sample and the (i+1)th negative sample) constructed from the commodity training data D2 to obtain the depth metric model M2, and cropping the positive sample pairs constructed from the commodity training data D2 with the depth metric model M2, and so on, until the cropped pictures tend to be stable (the cutoff condition is satisfied), obtaining the commodity region pictures (the nth positive sample and the nth anchor sample).
And S1007, completing commodity labeling of the commodity training data D1 based on the commodity region picture.
And S1008, ending.
Based on fig. 10 and referring to fig. 12, fig. 12 shows another exemplary process for labeling commodity regions in pictures according to an embodiment of the present application. As shown in fig. 12, positive and negative sample pairs 12-1 (including the positive sample pair 12-11 and the negative sample pair 12-12) are used to train the depth metric model 12-2; the depth metric model 12-2 is used to obtain the feature response maps 12-3 of the positive sample pair 12-11; the similar regions 12-4 of the positive sample pair 12-11 are adjusted based on the feature response maps 12-3 (the subject cropping process); whether the similar regions are stable is judged based on the adjusted similar regions 12-4; if so, the labeling result 12-5 (the subject labeling region, such as the commodity region pictures) is obtained; if not, the depth metric model 12-2 is fine-tuned (trained) based on the adjusted positive sample pair 12-11. Here, the process of fine-tuning the depth metric model 12-2 is the process of updating the metric model 11-6 in fig. 11.
Referring to fig. 13, fig. 13 is a schematic flowchart of an exemplary subject detection provided in the embodiments of the present application. As shown in fig. 13, commodity pictures grouped by commodity are extracted from the commodity library 13-1 to obtain weakly supervised structured information 13-2 (the image data set); pictures under the same commodity in the weakly supervised structured information 13-2 are taken as positive sample pairs, and pictures from different commodities as negative sample pairs; the information labeling process of fig. 10 is used to subject-crop the positive and negative sample pairs, obtaining the commodity detection boxes 13-3 of all commodity pictures in the weakly supervised structured information 13-2. On the one hand, fine-grained embedding features are extracted from the commodity detection boxes 13-3 to build a retrieval library 13-4 (the feature retrieval library); on the other hand, a commodity detection model 13-5 (the subject detection model) is trained based on the commodity detection boxes 13-3. When a picture retrieval request 13-7 is acquired, a picture to be retrieved 13-8 (the image to be detected) is acquired in response to the picture retrieval request 13-7; the commodity detection model 13-5 is used to perform subject detection on the picture to be retrieved 13-8 to obtain a commodity detection box 13-9, and fine-grained embedding features are extracted from the commodity detection box 13-9 to retrieve the matching picture 13-10 (the retrieval result) from the retrieval library 13-4. This picture detection process can improve the retrieval effect.
Referring to fig. 14, fig. 14 is a schematic flowchart of another exemplary subject detection provided in the embodiments of the present application. As shown in fig. 14, all advertisement pictures 14-1 are acquired and grouped by delivery object, obtaining weakly supervised structured information 14-2 (the image data set); pictures under the same delivery object in the weakly supervised structured information 14-2 are taken as positive sample pairs, and pictures between different delivery objects as negative sample pairs; the information labeling process of fig. 10 is used to subject-crop the positive and negative sample pairs, obtaining the delivery object detection boxes 14-3 of all advertisement pictures in the weakly supervised structured information 14-2; a delivery object detection model 14-4 (the subject detection model) is trained based on the delivery object detection boxes 14-3. When a delivery request 14-5 is acquired, a picture to be delivered 14-6 (the image to be detected) is acquired in response to the delivery request 14-5; the delivery object detection model 14-4 is used to perform subject detection on the picture to be delivered 14-6 to obtain a delivery object detection box 14-7, and coarse-grained embedding features are extracted from the delivery object detection box 14-7 to determine attribute information such as the category and label of the delivery object for delivery processing 14-8. The extracted features can also be fed into an advertisement recommendation model to estimate conversion rates such as the click-through rate. This information delivery process can improve the delivery effect.
It can be understood that, when the information labeling method provided by the embodiment of the application is used for subject detection, on the one hand, no detection boxes need to be labeled manually, so the resource consumption of information labeling is low: the information labeling in subject detection is realized automatically through the positive sample pairs and negative sample pairs constructed from weakly supervised data (for example, whether two pictures show the same commodity). On the other hand, no pre-trained model is needed, so the range of application scenarios is wide; for example, commodity or object detection can be performed on uncommon data sets. The depth metric model can be trained directly on the structured weakly supervised data, after which the data is progressively cleaned and the model adjusted according to the results, iterating and optimizing continuously, which avoids initializing from a pre-trained model. Finally, the processing based on image pairs can robustly capture the similar regions in two images while effectively suppressing non-subject regions, realizing effective cleaning of noise, and can therefore improve the accuracy of the subject detection model.
Continuing with the exemplary structure of the information labeling device 255 provided in the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 4, the software modules stored in the information labeling device 255 of the memory 250 may include:
the model training module 2551, configured to train the (i-1)th metric model based on an ith positive sample, an ith anchor sample and an ith negative sample of the image data set to obtain an ith metric model, where i is a positive integer;
a similarity measurement module 2552, configured to compare the ith positive sample and the ith anchor sample based on the ith metric model;
a subject cropping module 2553, configured to perform, based on the comparison result, subject cropping on the ith positive sample to obtain an (i+1)th positive sample, and perform subject cropping on the ith anchor sample to obtain an (i+1)th anchor sample;
an iterative processing module 2554, configured to continue to train the ith metric model based on the (i+1)th positive sample, the (i+1)th anchor sample and the (i+1)th negative sample, and perform subject cropping on the (i+1)th positive sample and the (i+1)th anchor sample based on the trained ith metric model, respectively, until a cutoff condition is satisfied, obtaining an nth positive sample and an nth anchor sample, where n is an integer greater than i;
an information labeling module 2555, configured to determine the nth positive sample and the nth anchor sample as the subject labeling region of the image data set.
In the embodiment of the present application, the comparison result includes a positive feature response map of the ith positive sample and an anchor feature response map of the ith anchor sample, where the positive feature response map is a similarity map of the most similar region between the ith positive sample and the ith anchor sample, and the anchor feature response map is a similarity map of the most similar region between the ith anchor sample and the ith positive sample; the subject cropping module 2553 is further configured to perform subject cropping on the ith positive sample based on the positive feature response map to obtain the (i+1)th positive sample, and perform subject cropping on the ith anchor sample based on the anchor feature response map to obtain the (i+1)th anchor sample.
In the embodiment of the application, the information labeling device 255 further includes a noise cleaning module 2556, configured to determine a first similar region in the ith positive sample based on a similarity threshold and the positive feature response map, and determine a second similar region in the ith anchor sample based on the similarity threshold and the anchor feature response map; when the area value of at least one of the first similar region and the second similar region is smaller than an area threshold, acquire a third similar region of the ith positive sample with respect to the other images in the current image set under the current subject category, and acquire a fourth similar region of the ith anchor sample with respect to the other images; and when the third similar region is larger than the fourth similar region, determine that the ith anchor sample is a noise image and delete the ith anchor sample from the current image set.
In the embodiment of the application, the model training module 2551 is further configured to acquire, based on the (i-1)th metric model, the positive sample features of the ith positive sample, the anchor sample features of the ith anchor sample and the negative sample features of the ith negative sample of the image data set; and train the (i-1)th metric model based on the difference between the positive sample features and the anchor sample features and the difference between the anchor sample features and the negative sample features, to obtain the ith metric model.
In the embodiment of the application, the information labeling device 255 further includes a sample construction module 2557, configured to acquire an ith image data set corresponding to the image data set, where the ith image data set is composed of the image sets under each subject category and is obtained by performing i-1 iterations of subject cropping on the image data set; take two images in the current image set under the current subject category in the ith image data set as the ith positive sample and the ith anchor sample, where the current subject category is any one of the subject categories; and determine an image in the ith image data set that comes from an image set under a subject category different from the current subject category as the ith negative sample.
In the embodiment of the application, the information labeling device 255 further includes a condition determining module 2558, configured to acquire the nth positive region difference between the nth positive sample and the (n-1)th positive sample, and acquire the nth anchor region difference between the nth anchor sample and the (n-1)th anchor sample; acquire the positive region differences of the previous j times and the anchor region differences of the previous j times, where j is a positive integer; and determine that the cutoff condition is satisfied when the difference between the positive region differences of the previous j times and the nth positive region difference is smaller than a difference threshold, and/or the difference between the anchor region differences of the previous j times and the nth anchor region difference is smaller than the difference threshold.
In the embodiment of the application, the condition determining module 2558 is further configured to determine that the cutoff condition is satisfied when at least one of a first ratio of the first similar region in the nth positive sample and a second ratio of the second similar region in the nth anchor sample is greater than a ratio threshold.
In the embodiment of the application, the information labeling device 255 further includes a subject detection module 2559, configured to train a subject detection model based on the image data set and the subject labeling region; when a subject detection request is received, acquire an image to be detected in response to the subject detection request; and perform subject detection on the image to be detected based on the subject detection model to obtain a subject region.
In the embodiment of the application, the subject detection module 2559 is further configured to determine the predicted subject regions of the images in the image data set based on an original subject detection model, and iteratively train the original subject detection model based on the difference between the predicted subject regions and the subject labeling region until a training cutoff condition is satisfied, obtaining the subject detection model.
In the embodiment of the present application, the subject region is at least one of a retrieval object region, a delivery object region, a detection target region and a scene region.
In the embodiment of the application, the information labeling device 255 further includes an application module 25510, configured to, when the subject region is the retrieval object region, extract features of the subject labeling region to obtain a feature retrieval library corresponding to the image data set; extract the feature to be retrieved of the retrieval object region; determine the matching feature that matches the feature to be retrieved from the feature retrieval library; and determine the image in the image data set corresponding to the matching feature as the retrieval result of the image to be detected, and perform recommendation processing based on the retrieval result.
In the embodiment of the application, the application module 25510 is further configured to, when the subject region is the delivery object region, extract the delivery object features of the delivery object region; determine delivery attributes based on the delivery object features, determine information to be delivered based on the delivery attributes, and perform delivery processing based on the information to be delivered, where the delivery attributes include at least one of a delivery category and a delivery label; or determine a delivery conversion rate based on the delivery object features, perform delivery processing on the image to be detected when the delivery conversion rate is greater than a delivery conversion rate threshold, and end the delivery processing of the image to be detected when the delivery conversion rate is less than or equal to the delivery conversion rate threshold.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the information labeling method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform an information labeling method provided by embodiments of the present application, for example, the method shown in fig. 5.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiment of the present application, the ith metric model is trained with the positive sample pair and the negative sample pair composed of the ith positive sample, the ith anchor sample and the ith negative sample, and the ith metric model is used to crop the region where the subject is located in the ith positive sample and the ith anchor sample; training and cropping continue until the subject labeling region (the nth positive sample and the nth anchor sample) is cropped out. In this way, the subject labeling region is acquired automatically, and the labeling efficiency can be improved. In addition, training a subject detection model on the subject labeling region can improve the accuracy of subject detection; and subject cropping of the positive sample pairs effectively determines the similar regions of the two images in each pair while suppressing non-subject regions and the background, so that the obtained labeling information is more accurate, the subject detection model trained on this labeling information is more accurate, and the accuracy of subject detection can be improved.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. An information labeling method, comprising:
training the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain an ith metric model, wherein i is a positive integer;
comparing the ith positive sample and the ith anchor sample based on the ith metric model;
based on the comparison result, performing subject cropping on the ith positive sample to obtain an (i+1)th positive sample, and performing subject cropping on the ith anchor sample to obtain an (i+1)th anchor sample;
continuing to train the ith metric model based on the (i+1)th positive sample, the (i+1)th anchor sample and the (i+1)th negative sample, and performing subject cropping on the (i+1)th positive sample and the (i+1)th anchor sample respectively based on the trained ith metric model, until a cutoff condition is satisfied, obtaining an nth positive sample and an nth anchor sample, wherein n is an integer greater than i;
and determining the nth positive sample and the nth anchor sample as a subject labeling region of the image data set.
2. The method according to claim 1, wherein the comparison result comprises a positive feature response map of the ith positive sample and an anchor feature response map of the ith anchor sample;
wherein the positive feature response map is a similarity map of the most similar region between the ith positive sample and the ith anchor sample, and the anchor feature response map is a similarity map of the most similar region between the ith anchor sample and the ith positive sample;
based on the comparison result, performing subject cropping on the ith positive sample to obtain an (i+1)th positive sample, and performing subject cropping on the ith anchor sample to obtain an (i+1)th anchor sample, comprises:
performing subject cropping on the ith positive sample based on the positive feature response map to obtain the (i+1)th positive sample;
and performing subject cropping on the ith anchor sample based on the anchor feature response map to obtain the (i+1)th anchor sample.
3. The method according to claim 2, wherein after comparing the ith positive sample and the ith anchor sample based on the ith metric model, the method further comprises:
determining a first similar region in the ith positive sample based on a similarity threshold and the positive feature response map, and determining a second similar region in the ith anchor sample based on the similarity threshold and the anchor feature response map;
when the area value of at least one of the first similar region and the second similar region is smaller than an area threshold, acquiring a third similar region of the ith positive sample with respect to the other images in the current image set under the current subject category, and acquiring a fourth similar region of the ith anchor sample with respect to the other images;
and when the third similar region is larger than the fourth similar region, determining that the ith anchor sample is a noise image, and deleting the ith anchor sample from the current image set.
4. The method according to any one of claims 1 to 3, wherein training the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain the ith metric model comprises:
acquiring the positive sample features of the ith positive sample, the anchor sample features of the ith anchor sample and the negative sample features of the ith negative sample of the image data set based on the (i-1)th metric model;
training the (i-1)th metric model based on the difference between the positive sample features and the anchor sample features and the difference between the anchor sample features and the negative sample features to obtain the ith metric model.
5. The method according to any one of claims 1 to 3, wherein before training the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain the ith metric model, the method further comprises:
acquiring an ith image data set corresponding to the image data set, wherein the ith image data set is composed of the image sets under each subject category and is obtained by performing i-1 iterations of subject cropping on the image data set;
taking two images in the current image set under the current subject category in the ith image data set as the ith positive sample and the ith anchor sample, wherein the current subject category is any one of the subject categories;
determining an image in the ith image data set that comes from an image set under a subject category different from the current subject category as the ith negative sample.
6. The method according to any one of claims 1 to 3, wherein before obtaining the nth positive sample and the nth anchor sample when the cutoff condition is satisfied, the method further comprises:
acquiring the nth positive region difference between the nth positive sample and the (n-1)th positive sample, and acquiring the nth anchor region difference between the nth anchor sample and the (n-1)th anchor sample;
acquiring the positive region differences and the anchor region differences of the previous j rounds, wherein j is a positive integer;
and determining that the cutoff condition is satisfied when the difference between each of the previous j positive region differences and the nth positive region difference is less than a difference threshold, and/or the difference between each of the previous j anchor region differences and the nth anchor region difference is less than the difference threshold.
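One plausible reading of the claim-6 cutoff is a stagnation test on the clipped-region areas: stop when the latest round-to-round change is close to each of the previous j changes. The sketch below assumes a scalar area per round and an arbitrary difference threshold.

```python
def cutoff_reached(areas, j=3, diff_threshold=0.01):
    """`areas[k]` is the clipped-region area after round k. Returns True
    when clipping has stagnated over the last j rounds."""
    if len(areas) < j + 2:
        return False   # not enough rounds to form j previous differences
    diffs = [abs(areas[k] - areas[k - 1]) for k in range(1, len(areas))]
    latest, previous = diffs[-1], diffs[-(j + 1):-1]
    return all(abs(d - latest) < diff_threshold for d in previous)
```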
7. The method according to claim 3, wherein before obtaining the nth positive sample and the nth anchor sample when the cutoff condition is satisfied, the method further comprises:
determining that the cutoff condition is satisfied when at least one of a first fraction occupied by the first similar region in the nth positive sample and a second fraction occupied by the second similar region in the nth anchor sample is greater than a fraction threshold.
8. The method according to any one of claims 1 to 3, wherein after determining the nth positive sample and the nth anchor sample as the subject labeling region of the image data set, the method further comprises:
training a subject detection model based on the image data set and the subject labeling region;
when a subject detection request is received, acquiring an image to be detected in response to the subject detection request;
and performing subject detection on the image to be detected based on the subject detection model to obtain a subject region.
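Serving a subject detection request as in claim 8 could look like the sketch below, assuming a torchvision-style detection model (such as the stand-in sketched after claim 9 below); the score threshold and the output dictionary format belong to that stand-in, not to the patent.

```python
import torch

@torch.no_grad()
def handle_detection_request(model, image, score_threshold=0.5):
    """Run the trained subject detection model on one image (a CHW float
    tensor) and return the boxes scoring above the threshold."""
    model.eval()
    output = model([image])[0]          # torchvision detection output dict
    keep = output["scores"] >= score_threshold
    return output["boxes"][keep]
```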
9. The method of claim 8, wherein training a subject detection model based on the image data set and the subject labeling region comprises:
determining a predicted subject region of an image in the image data set based on an original subject detection model;
and iteratively training the original subject detection model based on the difference between the predicted subject region and the subject labeling region until a training cutoff condition is satisfied, so as to obtain the subject detection model.
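Since the patent leaves the "original subject detection model" unspecified, the following sketch uses torchvision's Faster R-CNN as a stand-in and treats the clipped subject regions as pseudo ground-truth boxes; the loader format, the two-class setup, and the hyperparameters are all assumptions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def train_subject_detector(loader, epochs=10, lr=1e-4):
    """`loader` yields (images, targets) in torchvision detection format:
    images is a list of CHW float tensors, each target a dict with
    'boxes' [N, 4] and 'labels' [N] built from the subject labeling regions."""
    model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)  # bg + subject
    model.train()
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            loss = sum(model(images, targets).values())  # detection loss dict
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```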
10. The method of claim 8, wherein the subject region is at least one of a retrieval object region, a delivery object region, a detection target region, and a scene region.
11. The method according to claim 10, wherein when the subject region is the retrieval object region, after determining the nth positive sample and the nth anchor sample as the subject labeling region of the image data set, the method further comprises:
extracting features of the subject labeling region to obtain a feature retrieval library corresponding to the image data set;
and after performing subject detection on the image to be detected based on the subject detection model to obtain the subject region, the method further comprises:
extracting a feature to be retrieved of the retrieval object region;
determining a matching feature that matches the feature to be retrieved from the feature retrieval library;
and determining the image in the image data set corresponding to the matching feature as a retrieval result of the image to be detected, and performing recommendation processing based on the retrieval result.
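Under the common cosine-similarity reading, the retrieval flow of claim 11 reduces to normalising the library once and taking top-k dot products; the sketch below assumes fixed-length feature vectors and uses illustrative names throughout.

```python
import numpy as np

def build_feature_library(features):
    """L2-normalise subject-region features once, so matching reduces to a
    dot product. `features` has shape [num_images, dim]."""
    lib = np.asarray(features, dtype=np.float32)
    return lib / (np.linalg.norm(lib, axis=1, keepdims=True) + 1e-8)

def retrieve(library, query, top_k=5):
    """Indices of the library entries most similar to the query
    subject-region feature, best match first."""
    q = np.asarray(query, dtype=np.float32)
    q = q / (np.linalg.norm(q) + 1e-8)
    return np.argsort(-(library @ q))[:top_k]
```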
12. The method according to claim 10, wherein when the subject region is the delivery object region, after performing subject detection on the image to be detected based on the subject detection model to obtain the subject region, the method further comprises:
extracting delivery object features of the delivery object region;
determining a delivery attribute based on the delivery object features, determining information to be delivered based on the delivery attribute, and performing delivery processing based on the information to be delivered, wherein the delivery attribute comprises at least one of a delivery category and a delivery label; or,
determining a delivery conversion rate based on the delivery object features, performing delivery processing on the image to be detected when the delivery conversion rate is greater than a delivery conversion rate threshold, and ending the delivery processing of the image to be detected when the delivery conversion rate is less than or equal to the delivery conversion rate threshold.
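The conversion-rate branch of claim 12 is a simple threshold gate; in the sketch below, predict_conversion stands for any model that maps delivery-object features to an estimated conversion rate, and the 2% threshold is an arbitrary placeholder.

```python
def delivery_decision(features, predict_conversion, threshold=0.02):
    """Deliver only when the estimated conversion rate clears the threshold,
    as in the second branch of claim 12."""
    rate = predict_conversion(features)
    return "deliver" if rate > threshold else "skip"
```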
13. An information labeling apparatus, comprising:
the model training module is used for training the (i-1)th metric model based on the ith positive sample, the ith anchor sample and the ith negative sample of the image data set to obtain the ith metric model, wherein i is a positive integer;
the similarity measurement module is used for comparing the ith positive sample with the ith anchor sample based on the ith metric model;
the subject clipping module is used for performing subject clipping on the ith positive sample based on the comparison result to obtain an (i+1)th positive sample, and performing subject clipping on the ith anchor sample to obtain an (i+1)th anchor sample;
the iterative processing module is used for continuing to train the ith metric model based on the (i+1)th positive sample, the (i+1)th anchor sample and the (i+1)th negative sample, and performing subject clipping on the (i+1)th positive sample and the (i+1)th anchor sample respectively based on the trained ith metric model until a cutoff condition is satisfied, so as to obtain an nth positive sample and an nth anchor sample, wherein n is an integer greater than i;
and the information labeling module is used for determining the nth positive sample and the nth anchor sample as a subject labeling region of the image data set.
14. An information labeling device, comprising:
a memory for storing executable instructions;
and a processor for implementing the method of any one of claims 1 to 12 when executing the executable instructions stored in the memory.
15. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 12.
CN202110439827.2A 2021-04-23 2021-04-23 Information labeling method, device, equipment and computer readable storage medium Active CN112861474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110439827.2A CN112861474B (en) 2021-04-23 2021-04-23 Information labeling method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110439827.2A CN112861474B (en) 2021-04-23 2021-04-23 Information labeling method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112861474A CN112861474A (en) 2021-05-28
CN112861474B (en) 2021-07-02

Family

ID=75992823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110439827.2A Active CN112861474B (en) 2021-04-23 2021-04-23 Information labeling method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112861474B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887234B (en) * 2021-09-15 2023-01-06 北京三快在线科技有限公司 Model training and recommending method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220618A (en) * 2017-05-25 2017-09-29 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN110516734A (en) * 2019-08-23 2019-11-29 腾讯科技(深圳)有限公司 A kind of image matching method, device, equipment and storage medium
CN110532880A (en) * 2019-07-29 2019-12-03 深圳大学 Screening sample and expression recognition method, neural network, equipment and storage medium
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
WO2020136297A1 (en) * 2018-12-24 2020-07-02 Universidad Autónoma de Madrid Method for removing bias in biometric recognition systems
CN112329826A (en) * 2020-10-24 2021-02-05 中国人民解放军空军军医大学 Training method of image recognition model, image recognition method and device
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416377B (en) * 2018-02-26 2021-12-10 阿博茨德(北京)科技有限公司 Information extraction method and device in histogram

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Image Matching Methods; Jia Di et al.; Journal of Image and Graphics; 2019-12-31; Vol. 24, No. 5; pp. 677-699 *

Also Published As

Publication number Publication date
CN112861474A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
WO2020228376A1 (en) Text processing method and model training method and apparatus
WO2022007823A1 (en) Text data processing method and device
WO2021103761A1 (en) Compound property analysis method and apparatus, compound property analysis model training method, and storage medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN113761153B (en) Picture-based question-answering processing method and device, readable medium and electronic equipment
CN113704460B (en) Text classification method and device, electronic equipment and storage medium
WO2020073533A1 (en) Automatic question answering method and device
WO2023179429A1 (en) Video data processing method and apparatus, electronic device, and storage medium
WO2024083121A1 (en) Data processing method and apparatus
CN111783903A (en) Text processing method, text model processing method and device and computer equipment
CN114298122A (en) Data classification method, device, equipment, storage medium and computer program product
CN111324773A (en) Background music construction method and device, electronic equipment and storage medium
CN114330703A (en) Method, device and equipment for updating search model and computer-readable storage medium
CN112115131A (en) Data denoising method, device and equipment and computer readable storage medium
CN112861474B (en) Information labeling method, device, equipment and computer readable storage medium
CN116628345A (en) Content recommendation method and device, electronic equipment and storage medium
CN116977701A (en) Video classification model training method, video classification method and device
CN116958852A (en) Video and text matching method and device, electronic equipment and storage medium
CN116756676A (en) Abstract generation method and related device
CN114579876A (en) False information detection method, device, equipment and medium
CN115526177A (en) Training of object association models
CN112149692B (en) Visual relationship identification method and device based on artificial intelligence and electronic equipment
CN116737756B (en) Data query method, device, equipment and storage medium
CN117711001B (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40044530

Country of ref document: HK