CN113939791A - Image labeling method, device, equipment and medium


Info

Publication number: CN113939791A
Application number: CN202080023639.5A
Authority: CN (China)
Prior art keywords: image, annotation, target, information, model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 高瑞阳, 王正, 刘艳琳
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition


Abstract

This application provides an image annotation method applied in the field of Artificial Intelligence (AI). The method includes: an image annotation system presents an image to a user through a graphical user interface (GUI) and receives annotation information through the GUI, where the annotation information includes position information of an annotation point derived from the user's single-point annotation of a target in the image; the image annotation system then obtains an annotated image based on the image and the annotation information. In this way, an image with annotated targets can be obtained with only a simple single-point annotation of each target by the user, which simplifies user operations and improves annotation efficiency.

Description

Image labeling method, device, equipment and medium

Technical Field
The present application relates to the technical field of Artificial Intelligence (AI), and in particular, to an image annotation method, apparatus, device, and computer-readable storage medium.
Background
With the rise of deep learning, target detection technology has made great progress. Target detection is now widely applied in scenarios such as face detection, license plate recognition, and vehicle positioning.
Applying target detection to these scenarios requires the support of a large amount of data. Specifically, target detection is implemented by an algorithm model, and training, validation, and testing of the algorithm model all rely on data from real scenes. Such data generally needs to be labeled before it can be used for training, validation, and testing, that is, labels are set for the data to guide the learning of the algorithm model. Fast and efficient annotation of real-scene data is therefore a popular research topic, and various annotation tools have emerged.
Existing annotation tools improve annotation efficiency to a certain extent, but still consume considerable manpower and time. A more efficient image annotation method is therefore highly desirable.
Disclosure of Invention
In view of this, the present application provides an image annotation method that enables a user to obtain an annotated image through simple single-point annotation, which addresses the heavy manpower and time consumption of related approaches and improves annotation efficiency. Corresponding apparatus, devices, computer-readable storage media, and computer program products are also provided.
In a first aspect, the present application provides an image annotation method. The method may be performed by an image annotation system. The image annotation system can be deployed in a cloud environment, an edge environment, or an end device. The image annotation system presents an image to a user through a graphical user interface (GUI), and the user can perform single-point annotation on a target in the image. The image annotation system receives annotation information through the GUI, where the annotation information includes position information of an annotation point, and the position information of the annotation point is derived from the user's single-point annotation of the target in the image; the image annotation system then automatically obtains an annotated image based on the image and the annotation information. In this method, the user can obtain an annotated image through simple single-point annotation, which simplifies user operations and improves annotation efficiency.
In some possible implementations, the image annotation system may further present the annotated image to the user through the GUI, where the annotated image includes a target box, and the target box includes an image area representing the same target.
For example, the image includes a plurality of targets such as a cat, a dog, a flower, and the like, the image annotation system annotates the targets in the image to obtain an annotated image, the annotated image includes a plurality of target frames, and each target frame includes an image area representing the same target. In this example, the annotated image includes three target boxes, which include image areas representing a cat, a dog, and a flower, respectively.
In some possible implementations, the annotated image may be used in scenarios such as model training or model testing to complete tasks such as target detection and target recognition. Specifically, the image region representing the same target in the target frame is used for learning by an Artificial Intelligence (AI) model.
In some possible implementations, the annotated images may be used in educational or entertainment scenarios, for example, in a childhood educational scenario, where the annotated images may help the child recognize objects or parts of objects in the physical world.
In some possible implementations, when the annotated image has a deviation, for example, a deviation in the position and/or size of the target frame, the user may also correct the annotated image through the GUI. Specifically, the image annotation system may receive, through the GUI, correction information of the annotated image by the user, where the correction information is derived from modification of a target frame in the annotated image by the user, so as to improve annotation accuracy of the annotated image.
In some possible implementations, for example, in an application such as object classification or object recognition, the image annotation system may further receive attribute information of the object, and associate the attribute information of the object with the object in the annotated image. By performing model training or model testing on the associated images, not only target detection (determination of the position and size of the target) but also target recognition (determination of attribute information of the target) can be achieved.
In other applications such as education or entertainment, the image annotation system can also receive attribute information of a target, and associate the attribute information of the target with the target in the annotated image, so as to provide richer information for a user and improve user experience.
In some possible implementations, the image annotation system may input the image and the position information of the annotation point into an annotation model to obtain the annotated image, where the annotation model is configured to infer the position of the target corresponding to the annotation point in the image to obtain a target frame, and the target frame contains the annotation point from the user's single-point annotation.
In some possible implementations, the annotation model may be an end-to-end model that takes the image and the annotation information as inputs and the annotated image as output. After the image annotation system inputs the image and the annotation information into the annotation model, the annotation model can preprocess the image according to the annotation information, perform inference on the preprocessed image to obtain a target region that satisfies preset conditions, and then map the inferred target region back onto the original image (that is, the image before single-point annotation), thereby determining the position and size of the target frame in the original image and obtaining the annotated image from the original image and the target frame. Compared with conventional annotation methods, the annotation model first preprocesses the image and then performs inference on the preprocessed image, which improves annotation precision.
In some possible implementations, the annotation model may be a center-sensitive model that takes as input an image whose target appears in the middle region. Based on this, when the target in the image to be annotated is not in the middle region, the image annotation system needs to preprocess the image and then input the preprocessed image into the annotation model. After obtaining the target region satisfying the preset conditions inferred by the annotation model, the image annotation system performs post-processing: the target region satisfying the preset conditions is mapped back onto the original image to determine the position and size of the target frame in the original image, and the annotated image is obtained from the original image and the target frame. Compared with conventional annotation methods, the annotation model can focus on the middle region and therefore achieves higher annotation precision.
In some possible implementations, the annotation model may also be a pixel contour recognition model. The pixel contour recognition model obtains the contour of the target according to the pixel value of the annotation point and the pixel values of the points around the annotation point.
In some possible implementations, in order to further simplify user operations and improve annotation efficiency, the image annotation system can also train an initial detection model using the annotated image to obtain an intermediate detection model; obtain, through the intermediate detection model, coarse positioning information of a target to be annotated in a newly added image; and determine, using the coarse positioning information and the annotation model, a target frame corresponding to the target to be annotated in the newly added image, thereby obtaining an annotated newly added image. This realizes fully automatic annotation and further improves annotation efficiency. Furthermore, the image annotation system can train the intermediate detection model using the annotated newly added image to obtain a target detection model, thereby realizing automatic training of the intermediate detection model.
In a second aspect, the present application provides an image annotation apparatus. The device comprises:
an interface module, configured to present an image to a user through a GUI and receive annotation information through the GUI, where the annotation information includes position information of an annotation point, and the position information of the annotation point is derived from the user's single-point annotation of a target in the image;
an annotation module, configured to obtain an annotated image based on the image and the annotation information.
In some possible implementations, the interface module is further to:
and presenting the marked image to the user through the GUI, wherein the marked image comprises a target frame, and the target frame comprises an image area representing the same target.
In some possible implementations, the image regions in the target box that represent the same target are used for learning by the AI model.
In some possible implementations, the interface module is further to:
and receiving correction information of the annotated image by the user through the GUI, wherein the correction information comes from the modification of the target frame in the annotated image by the user.
In some possible implementations, the interface module is further to:
receiving attribute information of the target;
the labeling module is further configured to:
and associating the attribute information of the target with the target in the image after the labeling is finished.
In some possible implementations, the labeling module is specifically configured to:
and inputting the image and the position information of the annotation point into an annotation model to obtain an annotated image, wherein the annotation model is used for reasoning the position of a target corresponding to the annotation point in the image to obtain a target frame, and the target frame comprises the annotation point.
In some possible implementations, the apparatus further includes:
a training module, configured to train an initial detection model using the annotated image to obtain an intermediate detection model;
the annotation module is further configured to obtain, through the intermediate detection model, coarse positioning information of a target to be annotated in a newly added image, determine, using the coarse positioning information and the annotation model, a target frame corresponding to the target to be annotated in the newly added image, and obtain an annotated newly added image;
and the training module is further configured to train the intermediate detection model using the annotated newly added image to obtain a target detection model.
In a third aspect, the present application provides a computing device comprising a processor, a memory, and a display. The processor and the memory are in communication with each other. The processor is configured to execute instructions stored in the memory to cause the computing device to perform the image annotation method as in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein instructions, which, when executed on a computing device, cause the computing device to execute the image annotation method according to the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the present application provides a computer program product containing instructions that, when run on a computing device, cause the computing device to perform the image annotation method according to the first aspect or any implementation manner of the first aspect.
The implementations provided by the above aspects can be further combined to provide more implementations.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the embodiments are briefly described below.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another system architecture according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an image annotation system according to an embodiment of the present application;
FIG. 4 is a schematic view of an interface provided by an embodiment of the present application;
FIG. 5 is a schematic view of an interface provided by an embodiment of the present application;
FIG. 6 is a schematic view of an interface provided by an embodiment of the present application;
FIG. 7 is a schematic view of an interface provided by an embodiment of the present application;
fig. 8 is a flowchart of an image annotation method according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating an embodiment of a method for preprocessing an image;
FIG. 10 is a flowchart of a method for training a target detection model according to an embodiment of the present disclosure;
fig. 11 is a flowchart of an image annotation method according to an embodiment of the present application;
fig. 12 is a schematic diagram illustrating an image being cropped into K candidate images according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of an image annotation apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The scheme in the embodiments provided in the present application will be described below with reference to the drawings in the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished.
In the field of image processing, target detection refers to detecting a specific object or a part of a specific object in an image. A target is such a specific object or a part of it. In some implementations, the target may be an animal, such as a human, a cat, or a dog, or a part of such an animal, such as a human face. In other implementations, the target may be a vehicle, a license plate, or the like. Different application scenarios involve different targets to be detected. Target detection is also involved in applications such as classification and recognition of targets in images.
Target detection is typically achieved by relying on a target detection model. The target detection model is a mathematical model that simulates how human eyes identify targets, thereby realizing artificial intelligence. With the development of deep learning, a target detection model can be obtained by training a neural network model with annotated images. Specifically, a neural network model is selected as an initial detection model and an image is input into the initial detection model; the initial detection model predicts the position information of a target in the image, a loss value is then determined from the predicted position information and the position information pre-annotated in the image used for training, and the weights of the initial detection model are updated based on the loss value, thereby training the initial detection model. The target detection model is obtained by training the initial detection model in this way.
To obtain a good target detection model, a large number of annotated images is usually required to train the initial detection model. The annotated images may be generated with the aid of an annotation tool. For example, the annotation tool presents an image to be annotated to a user through a graphical user interface (GUI), the user annotates a target in the image in the interface by entering content or dragging the mouse, and the backend of the annotation tool generates a label for the target in the image. Such annotation tools improve annotation efficiency to a certain extent. However, for each image, the user still needs to perform tedious operations to ensure annotation precision, so annotating images requires a large amount of time and labor, increases annotation cost, and yields an annotation efficiency that cannot meet business requirements.
In view of this, the present application provides an image annotation method. The image annotation method can be implemented by an image annotation system. Specifically, the image annotation system presents an image to a user through a GUI, and for each target to be annotated in the image, the user performs at most one point-marking operation, that is, single-point annotation of the target to be annotated in the image. The image annotation system receives annotation information through the GUI, where the annotation information is the position information of the annotation point, and then automatically obtains an annotated image based on the image and the annotation information. The image after single-point annotation only roughly marks the target to be annotated and cannot be used as training data or test data, whereas the annotated image accurately marks the target to be annotated and can be used as training data or test data for model training or model testing.
In this method, the user can obtain an annotated image through simple single-point annotation, which simplifies user operations and improves annotation efficiency. Furthermore, after obtaining the position information of the annotation point, the image annotation system can precisely locate the target to be annotated through a customized annotation model to obtain the annotated image, which improves annotation precision.
Further, the image annotation system can also train an initial detection model using the annotated image to obtain an intermediate detection model. The accuracy of this intermediate detection model may not yet meet the requirements of a specific application, where a specific application refers to an application with strict requirements on the miss rate and precision. That is, the intermediate detection model still needs to be trained with more annotated images so that its performance meets the requirements of the specific application.
The intermediate detection model can assist in image annotation. Specifically, the image annotation system obtains, through the intermediate detection model, coarse positioning information of the target to be annotated in a newly added image, and determines, using the coarse positioning information and the annotation model, a target frame corresponding to the target to be annotated in the newly added image, thereby obtaining an annotated newly added image. The user is therefore not required to perform single-point annotation on the newly added image, which realizes fully automatic annotation without manual involvement and further improves annotation efficiency. The image annotation system can further train the intermediate detection model using the annotated newly added image to obtain a target detection model, thereby realizing automatic training of the intermediate detection model.
In some implementations, the image annotation system can be used not only as a system for assisting model training or testing, but also in other actual scenes, such as scenes for education, entertainment, and the like. For example, in an application such as child education, the image annotation system may also present an image to a user through a GUI, the user (for example, a child or a guardian) performs single-point annotation on an object in the image through the GUI, and the image annotation system receives annotation information through the GUI and obtains an annotated image according to the image and the annotation information. The image annotation system then presents the annotated image to the user via the GUI to assist the child in recognizing the object or portion of the object in the physical world. Compared with the traditional labeling method, the method has higher labeling efficiency and labeling precision, so that the user experience can be improved.
For convenience of description, the following description illustrates a scenario in which the image annotation method is applied to model training or model testing.
As shown in FIG. 1, the image annotation system can be deployed in a cloud environment, specifically on one or more computing devices (e.g., a central server) in the cloud environment. The image annotation system can also be deployed in an edge environment, specifically on one or more computing devices (edge computing devices) in the edge environment, where the edge computing devices can be servers, computing boxes, and the like. The cloud environment denotes a central cluster of computing devices owned by a cloud service provider for providing computing, storage, and communication resources; the edge environment denotes a cluster of edge computing devices geographically close to end devices for providing computing, storage, and communication resources.
In some implementations, the image annotation system can also be deployed on the end device. The end device includes, but is not limited to, a desktop computer, a notebook computer, a smart phone, and other user terminals. Image annotation can be achieved by running an image annotation system on these user terminals. The end device can also be used as an image providing device for providing images to the image labeling system so as to label the images. When the end device is used only for providing images, the end device may also be a camera, a radar, an infrared camera, or the like.
When the image annotation system is deployed in a cloud environment or an edge environment, the image annotation system can be provided to users as a service. Specifically, a user can access the cloud environment or the edge environment through a browser, create an instance of the image annotation system in the cloud environment or the edge environment, and then interact with the instance of the image annotation system through the browser to achieve image annotation.
The image annotation system can also be deployed on an end device and provided to a user in the form of a client. Specifically, the end device obtains an installation package of the image annotation system and runs the installation package to install the client of the image annotation system on the end device. The end device then achieves image annotation by running the client.
As shown in FIG. 2, the image annotation system comprises multiple parts (e.g., multiple subsystems, each subsystem comprising multiple units), and the parts of the image annotation system can therefore also be deployed in a distributed manner across different environments. For example, parts of the image annotation system can be deployed separately across all three of a cloud environment, an edge environment, and an end device, or across any two of them.
The image annotation system is used to annotate the position of a target in an image to obtain an annotated image. The subsystems and units in the image annotation system can be divided in various ways, which are not limited in this application; fig. 3 shows an exemplary division. As shown in fig. 3, the image annotation system 100 includes a user interaction subsystem 120 and an image annotation subsystem 140. The functions of each subsystem and its functional units are briefly described below.
The user interaction subsystem 120 is used to present images to a user through the GUI. In this way, the user can perform single-point annotation on a target in the image through the GUI, for example, by marking a point in the region where the target is located, such as the center point (the target may be approximated as a polygon, and the geometric center of the polygon may be regarded as the center point of the target) or a point near the center point.
Fig. 4 provides an interface schematic diagram of a GUI, as shown in fig. 4, an image 402 is presented in an interface 400, the image 402 includes a target 4021, a user may mark a point at a position in the target 4021 by mouse click, stylus click, or the like, and a marked point 404 formed by user marking is displayed in the interface 400.
Correspondingly, the user interaction subsystem 120 may receive annotation information via the GUI, the annotation information including location information of an annotation point, the location information of the annotation point being derived from a single-point annotation of a target in the image by a user. For example, the position information of the annotation point may be the coordinates of the annotation point 404 in fig. 4 (specifically, the coordinates in the coordinate system established based on the image 402).
In some implementations, the user interaction subsystem 120 can also present the annotated image to the user via the GUI after obtaining the annotated image.
It should be understood that the annotated image in the present application may be the image that originally needed to be annotated together with a target box; when the annotated image is presented through the GUI, the target box encloses the image area belonging to the annotated target. In some embodiments, the annotated target may be a target used for learning by an AI model, such as a vehicle or a license plate, and the target frame is formed according to the annotation information obtained after the user's single-point annotation.
The target frame may be a line frame or a scattered point frame composed of a plurality of scattered points. The target frame may be a rectangular frame, a circular frame, an oval frame, or a frame of other shape, for example the target frame may be a frame that conforms to the shape of the target. In some implementations, the target box may also be a box proximate to the target, such as an outline box of the target.
It should be understood that in a background storage system, the annotated image may include two pieces of information: the information of the original image to be marked and the information of the target frame corresponding to the image.
Optionally, the annotated image may further include attribute information of the target. The attribute information of the target may be sent to the image annotation system by the user through keyboard input, voice command, touch-screen input, interface selection, or the like, and the image annotation system associates the received attribute information of the target with the annotated target in the annotated image. The image annotation system may also present the received attribute information of the target to the user through the GUI.
In some implementations, the attribute information of the target may also be obtained by classifying or identifying the target in the image. For example, when the image annotation system coarsely locates the target in the image by using the intermediate detection model, the intermediate detection model may further include a classifier, and the image annotation system may classify the target by using the classifier of the intermediate detection model, so as to obtain the attribute information of the target, and then associate the attribute information of the target with the annotated target in the annotated image.
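For illustration only (not part of the original disclosure), the following Python sketch shows one way the annotated image described above could be stored in a backend system: a reference to the original image, the target frames, and optional per-target attribute information. All names and the exact field layout are assumptions.

```python
# Illustrative sketch of how an annotated image could be stored in a backend system:
# the original image reference, the target frames, and optional per-target attributes.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TargetBox:
    x: float            # center x of the target frame, in image coordinates (assumed convention)
    y: float            # center y of the target frame, in image coordinates
    width: float
    height: float
    attribute: Optional[str] = None   # e.g. "cat", filled in when the user supplies attribute information

@dataclass
class AnnotatedImage:
    image_path: str                                                   # information of the original image to be annotated
    boxes: List[TargetBox] = field(default_factory=list)              # target frames for this image
    annotation_points: List[Tuple[float, float]] = field(default_factory=list)  # user single-point annotations

# Example: one annotated image with a single rectangular target frame
example = AnnotatedImage(
    image_path="images/0001.jpg",
    boxes=[TargetBox(x=120.0, y=80.0, width=64.0, height=48.0, attribute="cat")],
    annotation_points=[(118.0, 83.0)],
)
```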
Fig. 5 and 6 each provide an interface schematic of a GUI. As shown in fig. 5, the interface 500 presents an annotated image 502; the annotated image 502 includes a target 5021 and a target box 5023, and the target box 5023 encloses the target 5021. In the example shown in fig. 5, the target box 5023 is a rectangular line box. As shown in fig. 6, the interface 600 presents an annotated image 602; the annotated image 602 includes a target 6021 and a target frame 6023, and the target frame 6023 encloses the target 6021. In the example shown in fig. 6, the target box 6023 is a scatter-point contour box formed by a plurality of discrete points on the contour of the target.
The annotated image 502 shown in fig. 5 and the annotated image 602 shown in fig. 6 are generated from the image 402, and the target 4021, the target 5021, and the target 6021 are the same target.
In order to better present the annotated image, the image annotation system can also present the image after the single-point annotation and the annotated image in the same interface. FIG. 7 also provides an interface schematic of another GUI. As shown in FIG. 7, a user single-point annotated image 702 and an annotated image 704 are presented in the interface 700. The user can view the user's single-point annotated image 702 and the annotated image 704 simultaneously through the interface 700.
In some implementations, when the annotated image has errors, the user can also correct the annotated image through the GUI, for example, correct the position of the target frame or correct the size of the target frame. As shown in fig. 5 or fig. 6, the user may switch the annotated image to a correction mode through a correction control, such as the correction control 504 or the correction control 604, and then correct the position or size of the target frame by dragging the mouse or the like. Correspondingly, the user interaction subsystem 120 may also receive, through the GUI, the user's correction information for the annotated image. The correction information is derived from the user's modification of the target frame in the annotated image.
After the annotated image is corrected, the user may also trigger a save operation via a save control, such as the save control 506 or the save control 606, to save the corrected annotated image.
The user interaction subsystem 120 includes a plurality of functional units. Among them, the communication unit 121 is used to receive images, and the display unit 123 is used to present images to a user through a GUI. The communication unit 121 is further configured to receive labeling information obtained by a user performing single-point labeling on an image through a GUI, where the labeling information specifically includes position information of a labeling point. The image and annotation information can be provided to the image annotation subsystem 140 for reading and use.
In some implementations, the display unit 123 is further configured to display the annotated image via the GUI, and the communication unit 121 is further configured to receive correction information obtained by correcting the annotated image via the GUI by the user. The correction information can be provided to the image annotation subsystem 140 for reading and use.
The image annotation subsystem 140 is used for obtaining an annotated image according to the image and the annotation information. Further, when the annotated image has an error, the image annotation subsystem 140 can obtain a corrected and annotated image according to the correction information.
In some implementations, the image annotation subsystem 140 can also train the initial detection model with the annotated image to obtain an intermediate detection model. The intermediate detection model can be used for pre-reasoning the position of the target to be marked in the input image to obtain coarse positioning information. The image annotation can be assisted by the coarse positioning information.
Specifically, the image annotation subsystem 140 may input the new image into an intermediate detection model, and the intermediate detection model performs pre-inference on the new image to obtain coarse positioning information of the target in the new image. The coarse positioning information may be target position information obtained by performing coarse positioning on the target by the intermediate detection model. The intermediate detection model can realize coarse positioning of the target by reasoning and regressing the position information of the target frame. That is, the coarse positioning information may be position information of the target frame regressed by the intermediate detection model.
The image annotation subsystem 140 can determine annotation information for the newly added image based on the coarse positioning information. For example, the image annotation subsystem 140 determines the position information of a point (e.g., the center point) in the target frame regressed by the intermediate detection model as the annotation information of the newly added image. The image annotation subsystem 140 can then obtain the annotated newly added image from the newly added image and the annotation information. This avoids the need for the user to manually perform single-point annotation on the newly added image, realizes fully automatic annotation without manual involvement, and further improves annotation efficiency. In addition, the image annotation subsystem 140 may also train the intermediate detection model using the annotated newly added image to obtain a target detection model, thereby realizing automatic training of the intermediate detection model.
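As an illustrative sketch (the box format and helper name are assumptions, not part of the disclosure), the annotation information for a newly added image can be derived from the coarse positioning information by taking the center of each regressed target frame as the annotation point:

```python
# Hedged sketch: deriving annotation information from coarse positioning information.
# The (x, y, w, h) box format is an assumed convention; the text only states that a
# point (e.g. the center point) of the regressed target frame serves as the annotation point.
from typing import List, Tuple

def annotation_points_from_coarse_boxes(
    coarse_boxes: List[Tuple[float, float, float, float]]
) -> List[Tuple[float, float]]:
    """Take the center of each coarsely regressed target frame as an annotation point."""
    points = []
    for x, y, w, h in coarse_boxes:   # (x, y) = top-left corner, (w, h) = width/height (assumed)
        points.append((x + w / 2.0, y + h / 2.0))
    return points

# Example: two coarse boxes regressed by the intermediate detection model
print(annotation_points_from_coarse_boxes([(10, 20, 40, 30), (100, 60, 24, 24)]))
# -> [(30.0, 35.0), (112.0, 72.0)]
```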
The image annotation subsystem 140 includes a number of functional units. The communication unit 141 is configured to obtain an image and annotation information, and the image annotation unit 143 is configured to obtain an annotated image according to the image and annotation information. Further, the communication unit 141 is further configured to obtain correction information, and the image annotation subsystem 140 further includes a correction unit 145, where the correction unit 145 is configured to obtain a corrected and annotated image according to the correction information.
In some implementations, the image annotation subsystem 140 further includes a training unit 147 for training the initial detection model with the annotated image to obtain an intermediate detection model. The image annotation subsystem 140 may further include a coarse positioning unit 149, configured to perform pre-inference on the newly added image by using the intermediate detection model, so as to obtain coarse positioning information of the target in the newly added image.
Next, an image annotation method provided in the embodiment of the present application is described.
Referring to the flowchart of the image annotation method shown in fig. 8, the method includes:
S802: The image annotation system presents images to a user via a GUI.
The image annotation system can present at least one image to be annotated to a user via a GUI. Specifically, the image annotation system may present the image to be annotated in a manner of presenting the image one by one, or in a manner of presenting a plurality of images at a time.
S804: The image annotation system receives annotation information through the GUI.
Specifically, a user may mark a point in the region where a target in the image is located through the GUI by a mouse click, a stylus tap, a touch, or a voice command, thereby realizing single-point annotation of the target. In some possible implementations, the GUI may also provide a single-point annotation control, which may be a control visible to the user. After the user triggers the control by mouse click, touch, voice command, or the like, a point can be marked in the region where the target is located in the image presented by the GUI.
When the user marks a point in the region of the target in the image, the point may be marked at a center point of the target (i.e., a geometric center of the target), or may be marked at a point near the center point (e.g., a point within a first preset distance from the center point).
The image annotation system receives annotation information via the GUI, the annotation information including location information of the annotation point. The position information of the annotation point comes from the single-point annotation of the user on the target in the image, and the position information can be characterized by the coordinate of the annotation point (specifically, the coordinate in a coordinate system established based on the image).
S806: The image annotation system obtains the annotated image according to the image and the annotation information.
Specifically, the annotation information may be used to coarsely locate the target in the image; on this basis, the image annotation system may finely locate the target in the image based on the features of the image and the coarse positioning result, and obtain the annotated image from the fine positioning result.
In some implementations, the image annotation system inputs the image and the position information of the annotation point into the annotation model to obtain the annotated image. The annotation model is configured to infer the position of the target corresponding to the annotation point in the image to obtain a target frame, and the target frame contains the annotation point from the user's single-point annotation.
Specifically, the annotation model may be an end-to-end model that takes the image and the annotation information as inputs and the annotated image as output. After the image annotation system inputs the image and the annotation information into the annotation model, the annotation model can preprocess the image according to the annotation information, perform inference on the preprocessed image to obtain a target region that satisfies preset conditions, and then map the inferred target region back onto the original image (that is, the image before single-point annotation), thereby determining the position and size of the target frame in the original image and obtaining the annotated image from the original image and the target frame.
When the annotation model preprocesses the image according to the annotation information, the image can be expanded and/or cropped according to the position information of the annotation point in the annotation information. Specifically, the annotation model may expand and/or crop the image according to the position information of the annotation point, so that the annotation point lies in the middle region of the expanded and/or cropped image. The middle region is a region whose distance from the edge of the image is greater than a second preset distance. The second preset distance may be set according to the size of the image: when the image is large, the second preset distance may be set to a large value, and when the image is small, the second preset distance may be set to a small value.
For ease of understanding, the preprocessing process is described below with reference to a specific example. As shown in fig. 9, the annotation model determines the maximum distance max_d from the annotation point (in this example, the center point of the target) to the edge of the image, and then expands the image into a square with a side length of 2×max_d based on the annotation point, where the expanded portion may be filled with a preset background color, such as black. Thus, the annotation point lies in the middle region of the expanded image. Fig. 9 illustrates an example of expanding the image; in another possible implementation, the annotation model may crop the image.
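The expansion step of fig. 9 can be sketched as follows, assuming an HxWx3 numpy image and (row, col) pixel coordinates; the padding color and placement convention are illustrative choices rather than requirements of this application.

```python
# Minimal numpy sketch of the expansion preprocessing described above (illustrative only).
import numpy as np

def expand_around_point(image: np.ndarray, point_rc: tuple) -> tuple:
    """Expand `image` into a square of side 2*max_d centered on the annotation point.

    Returns the expanded image and the (row, col) offset of the original image
    inside the expanded canvas, which is needed later to map results back.
    """
    h, w = image.shape[:2]
    r, c = point_rc
    # Maximum distance from the annotation point to any image edge.
    max_d = int(max(r, h - r, c, w - c))
    side = 2 * max_d
    canvas = np.zeros((side, side, image.shape[2]), dtype=image.dtype)  # black background fill
    # Place the original image so that the annotation point lands at the canvas center.
    off_r, off_c = max_d - int(r), max_d - int(c)
    canvas[off_r:off_r + h, off_c:off_c + w] = image
    return canvas, (off_r, off_c)

# Example: a 120x200 dummy image with an annotation point at row 40, col 150
img = np.zeros((120, 200, 3), dtype=np.uint8)
expanded, offset = expand_around_point(img, (40, 150))
print(expanded.shape, offset)   # (300, 300, 3) (110, 0)
```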
When the annotation point is in the middle region of the preprocessed image, the target is highly likely to appear in the middle region of the preprocessed image. For this purpose, the annotation model may use an inference network dedicated to images whose target appears in the middle region to perform inference, obtaining a target region that satisfies preset conditions. The preset conditions may be that the confidence is greater than a confidence threshold and that the target-to-image ratio (i.e., the area ratio of the inferred target region to the image) is within a preset range.
The target region satisfying the preset conditions may be represented by an offset and a scale of the target in the preprocessed image, where the offset and the scale each include components along the x-axis and the y-axis. That is, the target region satisfying the preset conditions may be characterized as (dx, dy, rx, ry). The annotation model maps the target region satisfying the preset conditions back onto the original image, determines the offset and the scale of the target in the original image, determines the position and size of the target frame from the offset and the scale, and obtains the annotated image from the original image and the target frame.
The annotation model maps the target region satisfying the preset conditions onto the original image according to the pixel relationship between the image and the preprocessed image. The pixel relationship describes the correspondence between pixels of the image and pixels of the preprocessed image. In the example of fig. 9, the image and the preprocessed image each establish a coordinate system with the lower-left corner as the origin, and a pixel with coordinates (x, y) in the preprocessed image corresponds to a pixel with coordinates (x', y') in the image, where x' = x and y' = y - max_d.
Based on this, the annotation model can determine the feature point (e.g., the center point) of the target according to the offset inferred by the inference network. Assuming the coordinates of the annotation point are (x0, y0), the annotation model determines, from the coordinates of the annotation point and the offset, that the coordinates of the center point of the target in the preprocessed image are (x0+dx, y0+dy). Then, by mapping according to the pixel relationship, the coordinates of the center point of the target in the image can be determined as (x0+dx, y0+dy-max_d). Since the scaling coefficients are unchanged between the image and the preprocessed image, the annotation model maps the target region satisfying the preset conditions in the preprocessed image onto the image, obtaining a region centered at (x0+dx, y0+dy-max_d) with scales rx and ry along the x-axis and the y-axis. The position and size of the target frame can then be determined from the mapped region, and the annotated image can be obtained from the original image and the target frame.
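A hedged sketch of the mapping just described: given the annotation point, the inferred offset (dx, dy) and scales (rx, ry) in the preprocessed image, and the offset of the original image inside the expanded canvas, the target frame in original-image coordinates can be computed. The generic (off_x, off_y) offset generalizes the x' = x, y' = y - max_d relation of the fig. 9 example; the box-size convention is an assumption.

```python
# Illustrative post-processing mapping (not the claimed implementation).
from typing import Tuple

def map_box_to_original(
    point_xy: Tuple[float, float],                     # annotation point in preprocessed-image coordinates
    offset_scale: Tuple[float, float, float, float],   # (dx, dy, rx, ry) inferred by the network
    pre_size: Tuple[float, float],                     # (width, height) of the preprocessed image
    origin_offset: Tuple[float, float],                # (off_x, off_y): position of the original image in the canvas
) -> Tuple[float, float, float, float]:
    x0, y0 = point_xy
    dx, dy, rx, ry = offset_scale
    pre_w, pre_h = pre_size
    # Center of the target in the preprocessed image, then in the original image.
    cx, cy = x0 + dx, y0 + dy
    cx_orig, cy_orig = cx - origin_offset[0], cy - origin_offset[1]
    # Scales are unchanged by the mapping; box size as scaled preprocessed size is an assumed convention.
    w, h = rx * pre_w, ry * pre_h
    return (cx_orig, cy_orig, w, h)      # target frame as (center_x, center_y, width, height)

# Example roughly matching Fig. 9, with the original image offset only along y
print(map_box_to_original((150.0, 150.0), (4.0, -6.0, 0.2, 0.3), (300.0, 300.0), (0.0, 110.0)))
# -> (154.0, 34.0, 60.0, 90.0)
```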
In some implementations, the annotation model can also be a non-end-to-end model. Specifically, the annotation model may include only the inference network, that is, the annotation model may be a center-sensitive model that is sensitive to the middle region of the image. Based on this, before the image and the annotation information are input into the annotation model, the image annotation system needs to preprocess the image according to the annotation information and then input the preprocessed image into the annotation model. After the output of the annotation model (i.e., the target region satisfying the preset conditions) is obtained, the image annotation system maps that region to determine the position and size of the target frame, so that the annotated image can be obtained from the image and the target frame.
The inference network can focus on the middle region and is dedicated to performing inference on images whose target is in the middle region, so it has higher inference precision and the annotation precision of the annotation model can be improved. In addition, when the annotation point is in the middle region, the target is more likely to appear in the middle region; therefore, after the inference network down-samples the preprocessed image through a convolutional neural network (CNN) to obtain a feature map, the inference network can select only the feature points adjacent to the annotation point for classification (that is, classifying whether the anchor box corresponding to a feature point contains the target) and regression, rather than performing classification and regression on all feature points.
As an example, the inference network may classify and regress only the anchor boxes corresponding to the s² feature points within a square of side length s centered on the annotation point, instead of classifying and regressing all feature points of the m×n feature map, so that annotation efficiency can be improved and service requirements can be met.
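For illustration, a minimal sketch (assuming a feature map of shape (m, n, c) and a known stride; both are assumptions) of restricting classification and regression to the s×s feature points around the annotation point rather than all m×n feature points:

```python
# Illustrative selection of the local feature points around the annotation point.
import numpy as np

def select_local_features(feature_map: np.ndarray, point_xy: tuple, stride: int, s: int) -> np.ndarray:
    """Return the (at most) s x s block of feature points centered on the annotation point."""
    m, n = feature_map.shape[:2]
    # Map the annotation point from image coordinates to feature-map coordinates.
    fx, fy = int(point_xy[0] // stride), int(point_xy[1] // stride)
    half = s // 2
    r0, r1 = max(0, fy - half), min(m, fy + half + 1)
    c0, c1 = max(0, fx - half), min(n, fx + half + 1)
    return feature_map[r0:r1, c0:c1]

# Example: 64x64 feature map (stride 8), keep only a 5x5 neighborhood of the point
fmap = np.random.rand(64, 64, 256).astype(np.float32)
local = select_local_features(fmap, point_xy=(256, 200), stride=8, s=5)
print(local.shape)   # (5, 5, 256)
```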
The annotation model can be obtained through training. The core of the annotation model is the inference network. The image annotation system can train to obtain the inference network, connect a preprocessing layer to the front end of the inference network, and connect a post-processing layer to its back end, thereby obtaining the annotation model. The preprocessing layer is used to preprocess the input image, and the post-processing layer is used to map the target region inferred by the inference network to determine the position and size of the target frame and obtain the annotated image from the image and the target frame.
Specifically, the image annotation system may construct an initial inference network based on a region proposal network (RPN); the initial inference network is specifically a network that selects the feature points adjacent to the annotation point (rather than all feature points) for position regression.
Then, the image annotation system can acquire a general data set for target detection and preprocess each image in the general data set: specifically, single-point annotation is performed on each image, and the image is then expanded and/or cropped according to the position information of the annotation point so that the annotation point lies in the middle region of the expanded and/or cropped image, thereby obtaining a target data set. Each piece of data in the target data set is an annotated image whose target is in the middle region. The image annotation system trains the initial inference network using the target data set; when the initial inference network meets a training end condition, for example, when the loss value of the network tends to converge, training can be stopped, and the trained network can be used as the inference network.
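The construction of the target data set can be sketched as follows; simulating the single-point annotation by jittering the center of a ground-truth box, as well as the box layout, are assumptions made only for illustration.

```python
# Hedged sketch of building one sample of the target data set from a general detection data set.
import random
import numpy as np

def build_target_sample(image: np.ndarray, gt_box: tuple, jitter: float = 0.1):
    """gt_box = (x, y, w, h) with (x, y) the top-left corner (assumed convention)."""
    x, y, w, h = gt_box
    # Simulated single-point annotation: a point near the center of the ground-truth box.
    px = x + w / 2 + random.uniform(-jitter, jitter) * w
    py = y + h / 2 + random.uniform(-jitter, jitter) * h
    # Expand so the simulated annotation point lies at the center of a square canvas.
    H, W = image.shape[:2]
    max_d = int(max(px, W - px, py, H - py))
    canvas = np.zeros((2 * max_d, 2 * max_d, image.shape[2]), dtype=image.dtype)
    off_r, off_c = max_d - int(py), max_d - int(px)
    canvas[off_r:off_r + H, off_c:off_c + W] = image
    # The expanded image plus the shifted box and point form one training sample.
    shifted_box = (x + off_c, y + off_r, w, h)
    return canvas, (px + off_c, py + off_r), shifted_box
```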
It should be noted that, when constructing the initial inference network, the image annotation system may also select a corresponding type of RPN-based network according to the service requirements. Specifically, when there is a target segmentation requirement, for example, when the target frame in the annotated image is required to closely fit the target, the image annotation system may use Mask R-CNN to construct the initial inference network. When there is no target segmentation requirement, for example, when the target frame in the annotated image is only required to contain the target, the image annotation system can use R-CNN or Faster R-CNN to construct the initial inference network.
In some implementations, the image annotation system can also directly acquire the trained annotation model. For example, the image annotation system can obtain the trained annotation model used by other annotation tasks for the current annotation task. Therefore, the time for training the model can be saved, and the image labeling efficiency is improved.
In some implementations, the annotation model includes, but is not limited to, a model based on a center-sensitive inference network. For example, the annotation model may also be a pixel contour recognition model, which obtains the contour of the target according to the pixel value of the annotation point and the pixel values of the points around the annotation point.
S808: The image annotation system presents the annotated image to the user through the GUI.
The annotated image includes a target frame, and the target frame contains a target to be learned by the AI model. The AI model detects the target by learning the features of the target and regressing the position of the target. The position of the target can be visually presented through the target box.
When the image annotation system uses different AI models to finely locate the target, the shape of the target frame in the annotated image can differ. In some possible implementations, when the image annotation system uses an R-CNN- or Faster R-CNN-based AI model to finely locate the target, the shape of the target frame is a preset shape such as a rectangle, a circle, or an ellipse; as shown in fig. 5, the target frame is rectangular. In other possible implementations, when the image annotation system uses a Mask R-CNN-based AI model to finely locate the target, the shape of the target frame is close to the shape of the target, for example, the shape of the contour of the target, that is, the target frame is the contour box of the target; as shown in fig. 6, the shape of the target frame is the shape of the contour of the target.
It should be further noted that the target frame may be a line frame, and specifically includes a line frame formed by a solid line or a dashed line. In some implementations, the target box may also be a scatter box formed of a plurality of discrete dots. The embodiments of the present application do not limit this.
It should be understood that, in some implementations, the image annotation method may be performed without performing S808.
S810: The image annotation system receives, through the GUI, the user's correction information for the annotated image.
When the annotated image has errors, the user can also correct the annotated image through the GUI. Specifically, the user may adjust a target frame in the annotated image through the GUI, for example, adjust the size and/or the position of the target frame, so as to correct the annotated image.
Correspondingly, the image annotation system receives, through the GUI, the user's correction information for the annotated image. The correction information is derived specifically from the user's modification of the target frame in the annotated image. As one example, the correction information may include an offset and/or a scale of the target box. The image annotation system can obtain the corrected annotated image according to the correction information, which further improves the accuracy of the annotated image.
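A minimal sketch of applying such correction information to a target frame, assuming a (center_x, center_y, width, height) box and an offset/scale correction format (both assumptions):

```python
# Illustrative application of correction information to an annotated target frame.
from typing import Tuple

def apply_correction(
    box: Tuple[float, float, float, float],        # (center_x, center_y, width, height)
    offset: Tuple[float, float] = (0.0, 0.0),      # user drag of the frame center
    scale: Tuple[float, float] = (1.0, 1.0),       # user resize of the frame
) -> Tuple[float, float, float, float]:
    cx, cy, w, h = box
    return (cx + offset[0], cy + offset[1], w * scale[0], h * scale[1])

# Example: the user drags the frame 5 px right, 3 px down and enlarges its width by 50%
print(apply_correction((154.0, 34.0, 60.0, 90.0), offset=(5.0, 3.0), scale=(1.5, 1.0)))
# -> (159.0, 37.0, 90.0, 90.0)
```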
It should be understood that, in some implementations, the image annotation method may be performed without performing S810.
Based on the above description, the image annotation method provided in the embodiments of the application enables a user to obtain an annotated image through simple single-point annotation, which simplifies user operations and improves annotation efficiency. Moreover, the user can correct the annotated image through the GUI, which improves the accuracy of the annotated image.
To further simplify user operation and improve annotation efficiency, the image annotation system can also input a newly added image into the trained intermediate detection model and use the trained intermediate detection model to pre-infer the target in the newly added image, so as to obtain coarse positioning information of the target in the newly added image. The image annotation system then determines the annotation information of the newly added image according to the coarse positioning information, so that manual single-point annotation is not needed to obtain the annotation information, and obtains the annotated image according to the newly added image and the annotation information. Fully automatic annotation is thus realized, which further improves annotation efficiency.
The intermediate detection model can be obtained by training the initial detection model with annotated images. The annotated image may be the annotated image output in S806 of the embodiment shown in fig. 8, or the corrected annotated image obtained by the image annotation system after receiving the correction information in the embodiment shown in fig. 8. Of course, the annotated images used to train the intermediate detection model may also include images obtained in other ways, for example, images annotated manually.
For ease of understanding, the following description uses the annotated image output in S806 to train the intermediate detection model.
Referring to fig. 10, which shows a schematic flow chart of obtaining an intermediate detection model by training with annotated images, the method includes:
S1002: the image annotation system inputs the annotated image into the initial detection model to obtain the target position information inferred by the initial detection model from the annotated image.
The image annotation system can input the annotated images into the initial detection model in batches according to a preset batch size; the initial detection model learns the features of the annotated images and performs inference on these features to obtain the target position information.
S1004: the image annotation system determines a loss value according to the inferred target position information and the pre-annotated target position information of the annotated image, and updates the parameters of the initial detection model according to the loss value to obtain the intermediate detection model.
The image annotation system can substitute the inferred target position information and the pre-annotated target position information into a loss function to determine the loss value, and update the parameters of the initial detection model according to the loss value so as to reduce the loss value of the model. The intermediate detection model is used to pre-infer the position of the target to be annotated in an input newly added image, so as to obtain coarse positioning information.
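For ease of understanding, the following is a minimal training sketch of S1002 to S1004, assuming a PyTorch detection model in the torchvision style whose forward pass in training mode returns a dictionary of loss terms between the inferred and pre-annotated target positions; the model, data loader and hyper-parameters are illustrative assumptions, not the embodiment's actual implementation.

    import torch

    def train_intermediate_model(initial_model, loader, epochs=10, lr=1e-4):
        # S1002: input annotated images in batches; S1004: compute the loss between
        # inferred and pre-annotated target positions and update the parameters.
        optimizer = torch.optim.SGD(initial_model.parameters(), lr=lr)
        initial_model.train()
        for _ in range(epochs):
            for images, targets in loader:             # targets hold the annotated frames
                loss_dict = initial_model(images, targets)
                loss = sum(loss_dict.values())          # e.g. classification + box regression
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                        # update parameters to reduce the loss
        return initial_model                            # used as the intermediate detection model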
Considering training efficiency, in some implementations the image annotation system may also obtain a general-purpose target detection model as the initial detection model and obtain the intermediate detection model by fine-tuning the initial detection model.
Further, the image annotation system may also split the annotated images into a training set and a validation set according to a preset ratio, such as 80%:20%. The image annotation system inputs the annotated images in the training set into the initial detection model to train or fine-tune the initial detection model. When the trained or fine-tuned model meets an end condition, the image annotation system can further validate it with the annotated images in the validation set; when the model meets a specific condition, for example the intersection over union (IoU) on the validation set is greater than a preset value, the intermediate detection model is obtained. The preset value is set according to an empirical value. As an example, the preset value may be set to 70%.
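A minimal sketch of the 80%:20% split and the IoU check described above follows, assuming target frames are given as (x1, y1, x2, y2); all names are illustrative, not part of the embodiment.

    import random

    def split_dataset(samples, train_ratio=0.8, seed=0):
        # Split the annotated images into a training set and a validation set.
        shuffled = list(samples)
        random.seed(seed)
        random.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        return shuffled[:cut], shuffled[cut:]

    def iou(box_a, box_b):
        # Intersection over union of two frames given as (x1, y1, x2, y2).
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    # The fine-tuned model can be accepted as the intermediate detection model when,
    # for example, the mean IoU on the validation set exceeds the preset value (e.g. 0.7).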
After the intermediate detection model is obtained, the image annotation system can use the intermediate detection model and the annotation model to annotate the target to be annotated in a newly added image, so as to obtain the annotated newly added image. Specifically, the image annotation system inputs the newly added image into the intermediate detection model, which performs pre-inference on the newly added image to obtain coarse positioning information of the target in the newly added image. The coarse positioning information is the target position information obtained by the intermediate detection model coarsely positioning the target in the newly added image. The intermediate detection model can realize the coarse positioning by inferring and regressing a positioning frame for the target. Based on this, the coarse positioning information may be the position information of the target frame regressed by the intermediate detection model.
The image annotation system can determine the position of the intersection of the diagonals of the target frame from the position information of the target frame regressed by the intermediate detection model, and then input the newly added image and the position of that intersection (namely the annotation information of the newly added image) into the annotation model to obtain the annotated newly added image. In this way, fully automatic annotation of the newly added image is realized, which further improves annotation efficiency.
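For ease of understanding, a minimal sketch of deriving the annotation point from a regressed target frame follows, assuming the frame is given as (x1, y1, x2, y2); the diagonals of an axis-aligned rectangular frame intersect at its center. The model objects in the commented lines are placeholders, not the embodiment's actual interfaces.

    def box_to_annotation_point(box):
        # The diagonals of a rectangular target frame intersect at its center,
        # which serves as the single-point annotation of the newly added image.
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    # Fully automatic annotation of a newly added image (placeholder calls):
    # coarse_boxes = intermediate_model.predict(new_image)          # coarse positioning
    # points = [box_to_annotation_point(b) for b in coarse_boxes]   # annotation information
    # annotated_image = annotation_model(new_image, points)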
FIG. 11 shows a flow chart of an image annotation process. As shown in fig. 11, the number of images to be annotated is M, where M is a positive integer. The process specifically includes the following steps:
S1102: the user performs single-point annotation on image 1, image 2, ..., image N in sequence.
S1104: the image annotation system inputs image 1, image 2, ..., image N and the annotation information of each image into the annotation model, and obtains annotated image 1, annotated image 2, ..., annotated image N output by the annotation model.
S1106: the image annotation system inputs annotated images 1 to N into the initial detection model, determines a loss value according to the target position information inferred by the initial detection model from annotated images 1 to N and the annotated target position information in annotated images 1 to N, and updates the parameters of the initial detection model according to the loss value, thereby training the initial detection model. When the trained initial detection model meets a specific condition, the intermediate detection model is obtained.
S1108: the image annotation system inputs images N+1 to M into the intermediate detection model, which performs inference on these images to obtain coarse positioning information of the targets in images N+1 to M. The coarse positioning information includes the position information of the target frames regressed by the intermediate detection model.
S1110: the image annotation system determines the position of the intersection of the diagonals of each target frame according to the position information of the target frame, thereby obtaining the annotation information.
S1112: the image annotation system inputs images N+1 to M and the annotation information of each image into the annotation model, and obtains annotated images N+1 to M output by the annotation model.
S1114: the image annotation system inputs annotated images N+1 to M into the intermediate detection model, determines a loss value according to the target position information inferred by the intermediate detection model from annotated images N+1 to M and the annotated target position information in these images, and updates the parameters of the intermediate detection model according to the loss value, thereby training the intermediate detection model. When the trained intermediate detection model meets a specific condition, the target detection model is obtained.
It should be understood that, in some implementations, the image annotation method may be performed without executing S1106 to S1114.
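For ease of understanding, the overall flow of fig. 11 can be sketched as follows. The helpers train, predict, annotation_model and box_to_annotation_point (from the sketch above) are placeholders for the embodiment's models; the code only illustrates the order of S1102 to S1114 under these assumptions.

    def annotate_dataset(images, user_points, annotation_model, initial_model, n_manual):
        # S1102-S1104: images 1..N are annotated from the user's single-point annotations.
        annotated = [annotation_model(img, pts)
                     for img, pts in zip(images[:n_manual], user_points)]

        # S1106: train the initial detection model on the annotated images
        # to obtain the intermediate detection model.
        intermediate_model = train(initial_model, annotated)

        # S1108-S1112: pre-inference on images N+1..M, derive annotation points from
        # the regressed target frames, then annotate with the annotation model.
        for img in images[n_manual:]:
            boxes = intermediate_model.predict(img)                 # coarse positioning
            points = [box_to_annotation_point(b) for b in boxes]    # annotation information
            annotated.append(annotation_model(img, points))

        # S1114: train the intermediate detection model on the newly annotated images
        # to obtain the target detection model.
        target_model = train(intermediate_model, annotated)
        return annotated, target_model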
In applications such as classification and recognition of targets in an image, for example face recognition, it is necessary not only to detect the targets, that is, to regress their positions (and, in some cases, their sizes), but also to classify the targets to obtain their attribute information. Based on this, the image annotation system may further receive attribute information of a target and associate the attribute information of the target with the target in the annotated image.
The attribute information can further describe and explain the target and increases the amount of information contained in the annotated image, so that the annotated image can be applied in more scenarios. The attribute information may vary with the specific application scenario.
As one example, the target may be a vehicle, and the attribute information of the target may include one or more of the following information: the type of the vehicle, the color of the vehicle, the license plate number of the vehicle, the model of the vehicle, the running speed of the vehicle, the owner identity information and the registration and maintenance information of the vehicle.
In some implementations, the intermediate detection model can also include a classifier by which the intermediate detection model can infer attribute information of the target. For example, in the example of fig. 4, the intermediate detection model may infer attribute information of the type of the vehicle for the target vehicle, which is specifically "car" as shown in fig. 4. The image annotation system can associate the attribute information with a target in the annotated image for target classification, target identification, or other application scenarios.
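As an illustration only, the association between a target in an annotated image and its attribute information could be recorded in a structure such as the following; the field names and values are hypothetical and not prescribed by the embodiment.

    annotated_target = {
        "image_id": "frame_000123.jpg",
        "target_frame": [412, 228, 685, 402],    # (x1, y1, x2, y2) of the target frame
        "attributes": {
            "type": "car",                       # inferred by the model's classifier
            "color": "white",
            "license_plate": None,               # supplied by the user or another model
        },
    }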
Based on the above description, the image annotation system can also train an initial detection model with the annotated images to obtain an intermediate detection model, pre-infer the input image with the intermediate detection model to obtain coarse positioning information of the target in the input image, and determine the annotation information of the input image according to the coarse positioning information. Manual single-point annotation is thus no longer needed to obtain the annotation information, and the annotated image can be obtained based on the input image and the annotation information, so that fully automatic annotation is realized and annotation efficiency is further improved.
Considering the diversity of target sizes in an image, in order to achieve higher annotation precision for targets of different sizes, the image annotation system can also expand and/or crop the image according to the annotation information to obtain images of different sizes. Specifically, the image annotation system may expand the image according to the position information of the annotation point (the expanded image may refer to fig. 9), and then crop the expanded image multiple times to obtain K candidate images of different sizes. As shown in fig. 12, the annotation point can be in the middle region of each candidate image.
The image annotation system inputs the K candidate images of different sizes into the annotation model, which avoids missed detection or large regression error caused by an excessively small target, thereby improving annotation precision. It should be noted that the number K of candidate images may be set according to the target size. For example, when the target is small, K may be set to a larger value. As an example, K may be set to 8.
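For ease of understanding, a minimal sketch of expanding the image and cutting candidate images of different sizes around the annotation point follows, assuming the image is a NumPy array of shape H x W x C; the crop sizes are illustrative values, not those of the embodiment.

    import numpy as np

    def candidate_crops(image, point, sizes=(64, 128, 256, 512)):
        # Expand (pad) the image so that every crop keeps the annotation point
        # in its middle region, even when the point lies near the image border.
        x, y = point
        pad = max(sizes) // 2
        padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
        px, py = int(x) + pad, int(y) + pad
        crops = []
        for s in sizes:
            half = s // 2
            crops.append(padded[py - half:py + half, px - half:px + half])
        return crops   # K candidate images of different sizes, input together to the annotation model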
The image annotation method provided by the embodiment of the present application is described above with reference to fig. 1 to 12, and the image annotation system, the image annotation apparatus, and the computing device for implementing the function of the image annotation apparatus provided by the embodiment of the present application are described next with reference to the accompanying drawings.
Referring to fig. 3, an embodiment of the present application provides an image annotation system 100, which is configured to perform steps S802 to S810 in the foregoing method embodiment and, optionally, the optional methods in the foregoing steps, and further optionally to perform steps S1002 to S1004 and the optional methods in those steps. The system includes a user interaction subsystem 120 and an image annotation subsystem 140.
As shown in fig. 13, an embodiment of the present application further provides an image annotation apparatus 1300, where the apparatus 1300 is configured to perform the image annotation method. The embodiment of the present application does not limit the division of the functional modules in the apparatus 1300; the following provides an exemplary division of the functional modules:
the image annotation apparatus 1300 includes an interface module 1302 and an annotation module 1304.
The interface module 1302 is configured to present an image to a user through a GUI, and receive annotation information through the GUI, where the annotation information includes location information of an annotation point, and the location information of the annotation point is from a single point annotation of a target in the image by the user;
the labeling module 1304 is configured to obtain an image with a completed label according to the image and the labeling information.
In some possible implementations, the interface module 1302 is further configured to:
and presenting the marked image to the user through the GUI, wherein the marked image comprises a target frame, and the target frame comprises an image area representing the same target.
In some possible implementations, image regions in the object box that represent the same object are used for learning by the AI model.
In some possible implementations, the interface module 1302 is further configured to:
and receiving correction information of the annotated image by the user through the GUI, wherein the correction information comes from the modification of the target frame in the annotated image by the user.
In some possible implementations, the interface module 1302 is further configured to:
receiving attribute information of the target;
the tagging module 1304 is further configured to:
and associating the attribute information of the target with the target in the image with the labeling completion.
In some possible implementations, the tagging module 1304 is specifically configured to:
and inputting the image and the position information of the annotation point into an annotation model to obtain an annotated image, wherein the annotation model is used for reasoning the position of a target corresponding to the annotation point in the image to obtain a target frame, and the target frame comprises the annotation point. In some possible implementations, the apparatus 1300 further includes:
a training module 1306, configured to train an initial detection model using the labeled image to obtain an intermediate detection model;
the labeling module 1304 is further configured to obtain coarse positioning information of a target to be labeled in the newly added image through the intermediate detection model, determine a target frame corresponding to the target to be labeled in the newly added image by using the coarse positioning information and the labeling model, and obtain the newly added image after labeling;
the training module 1306 is further configured to train the intermediate detection model by using the newly added image after the labeling, so as to obtain a target detection model.
The image annotation apparatus 1300 according to the embodiment of the present application may correspond to performing the method described in the embodiment of the present application, and the above and other operations and/or functions of each module of the image annotation apparatus 1300 are respectively for implementing corresponding flows of each method in fig. 8 and fig. 10, and are not described herein again for brevity.
The image annotation apparatus 1300 can be implemented by a computing device. Fig. 14 provides a computing device, and as shown in fig. 14, the computing device 1400 may be specifically used to implement the functions of the image annotation device 1300 in the embodiment shown in fig. 13.
The computing device 1400 includes a bus 1401, a processor 1402, a display 1403, and a memory 1404. Communication between the processor 1402, memory 1404, and display 1403 occurs via bus 1401.
The bus 1401 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 14, but this is not intended to represent only one bus or type of bus.
The processor 1402 may be any one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Microprocessor (MP), a Digital Signal Processor (DSP), and the like.
The display 1403 is an input/output (I/O) device. The device can display electronic documents such as images and text on a screen for a user to view. According to the manufacturing material, the display 1403 may be classified as a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, and the like. Specifically, the display 1403 may display an image through the GUI, receive annotation information through the GUI, display the annotated image through the GUI, and the like.
The memory 1404 may include a volatile memory (volatile memory), such as a Random Access Memory (RAM). The memory 1404 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a hard drive (HDD), or a Solid State Drive (SSD).
The memory 1404 stores executable program code, and the processor 1402 executes the executable program code to perform the image labeling method. Specifically, the processor 1402 executes the program codes to control the display 1403 to present an image to a user through a GUI, and receive annotation information through the GUI, where the annotation information includes position information of an annotation point, and the position information of the annotation point is from a single point annotation of a target in the image by the user, and then the processor 1402 obtains an annotated image according to the image and the annotation information.
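As an illustration only, a GUI that presents an image and receives a single-point annotation could be sketched with matplotlib as follows; this is a stand-in for the display 1403 and the GUI of the embodiment, not its actual implementation, and all names are illustrative.

    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    def collect_single_point_annotation(image_path):
        # Present the image and capture one mouse click on the target;
        # the click position is returned as the annotation point.
        points = []

        def on_click(event):
            if event.xdata is not None and event.ydata is not None:
                points.append((event.xdata, event.ydata))
                plt.close()                   # a single click completes the annotation

        fig, ax = plt.subplots()
        ax.imshow(mpimg.imread(image_path))
        ax.set_title("Click once on the target to annotate")
        fig.canvas.mpl_connect("button_press_event", on_click)
        plt.show()                            # blocks until the window is closed
        return points[0] if points else None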
In some possible implementations, the processor may further control another interface to receive attribute information of the target and associate the attribute information with the annotated target in the annotated image. The other interface may be, for example, a microphone; specifically, the microphone may receive attribute information expressed in voice form.
The embodiment of the application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be stored by a computing device, or a data storage device, such as a data center, containing one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others. The computer-readable storage medium includes instructions that instruct a computing device to perform the image annotation method applied to the image annotation apparatus.
The embodiment of the application also provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions described in the embodiments of the application are produced in whole or in part.
The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website site, computer, or data center to another website site, computer, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.).
The computer program product may be a software installation package, which may be downloaded and executed on a computing device when any of the foregoing image annotation methods needs to be used.
The description of the flow or structure corresponding to each of the above drawings has its own emphasis; for a part not described in detail in a certain flow or structure, reference may be made to the related description of other flows or structures.

Claims (16)

  1. An image annotation method, characterized in that the method comprises:
    presenting an image to a user through a Graphical User Interface (GUI);
    receiving annotation information through the GUI, wherein the annotation information comprises position information of an annotation point, and the position information of the annotation point is from a single-point annotation of a target in the image by the user;
    and obtaining the image with the label according to the image and the label information.
  2. The method of claim 1, further comprising:
    and presenting the marked image to the user through the GUI, wherein the marked image comprises a target frame, and the target frame comprises an image area representing the same target.
  3. The method of claim 2, wherein image regions in the object box representing the same object are used for learning by an Artificial Intelligence (AI) model.
  4. A method according to claim 2 or 3, characterized in that the method further comprises:
    and receiving correction information of the annotated image by the user through the GUI, wherein the correction information comes from the modification of the target frame in the annotated image by the user.
  5. The method according to any one of claims 1 to 4, further comprising: and receiving the attribute information of the target, and associating the attribute information of the target with the target in the image after the labeling is finished.
  6. The method according to any one of claims 1 to 5, wherein the obtaining the annotated image from the image and the annotation information comprises:
    and inputting the image and the position information of the annotation point into an annotation model to obtain an annotated image, wherein the annotation model is used for reasoning the position of a target corresponding to the annotation point in the image to obtain a target frame, and the target frame comprises the annotation point.
  7. The method according to any one of claims 1 to 6, further comprising:
    training an initial detection model by using the marked image to obtain an intermediate detection model;
    obtaining coarse positioning information of the target to be marked in the newly added image through the intermediate detection model;
    determining a target frame corresponding to a target to be labeled in the newly added image by using the coarse positioning information and the labeling model, and obtaining the newly added image after labeling;
    and training the intermediate detection model by using the newly added image after the labeling to obtain a target detection model.
  8. An image annotation apparatus, characterized in that the apparatus comprises:
    the system comprises an interface module, a display module and a display module, wherein the interface module is used for presenting an image to a user through a Graphical User Interface (GUI) and receiving annotation information through the GUI, the annotation information comprises position information of an annotation point, and the position information of the annotation point is from a single-point annotation of a target in the image by the user;
    and the marking module is used for obtaining the marked image according to the image and the marking information.
  9. The apparatus of claim 8, wherein the interface module is further configured to:
    and presenting the marked image to the user through the GUI, wherein the marked image comprises a target frame, and the target frame comprises an image area representing the same target.
  10. The apparatus of claim 9, wherein image regions in the target box that represent a same target are used for learning by an Artificial Intelligence (AI) model.
  11. The apparatus of claim 9 or 10, wherein the interface module is further configured to:
    and receiving correction information of the annotated image by the user through the GUI, wherein the correction information comes from the modification of the target frame in the annotated image by the user.
  12. The apparatus of any of claims 8 to 11, wherein the interface module is further configured to:
    receiving attribute information of the target;
    the labeling module is further configured to:
    and associating the attribute information of the target with the target in the image after the labeling is finished.
  13. The apparatus according to any one of claims 8 to 12, wherein the labeling module is specifically configured to:
    and inputting the image and the position information of the annotation point into an annotation model to obtain an annotated image, wherein the annotation model is used for reasoning the position of a target corresponding to the annotation point in the image to obtain a target frame, and the target frame comprises the annotation point.
  14. The apparatus of any one of claims 8 to 13, further comprising:
    the training module is used for training an initial detection model by using the marked image to obtain an intermediate detection model;
    the labeling module is further used for obtaining rough positioning information of a target to be labeled in the newly added image through the intermediate detection model, determining a target frame corresponding to the target to be labeled in the newly added image by using the rough positioning information and the labeling model, and obtaining the newly added image after labeling;
    and the training module is also used for training the intermediate detection model by using the newly added image after the labeling to obtain a target detection model.
  15. A computing device comprising a processor, a memory, and a display;
    the processor is to execute instructions stored in the memory to cause the computing device to perform the method of any of claims 1 to 7.
  16. A computer-readable storage medium comprising instructions that, when executed on a computing device, cause the computing device to perform the method of any of claims 1 to 7.
CN202080023639.5A 2020-04-30 2020-04-30 Image labeling method, device, equipment and medium Pending CN113939791A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/087969 WO2021217543A1 (en) 2020-04-30 2020-04-30 Image annotation method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
CN113939791A true CN113939791A (en) 2022-01-14

Family

ID=78331626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080023639.5A Pending CN113939791A (en) 2020-04-30 2020-04-30 Image labeling method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113939791A (en)
WO (1) WO2021217543A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113970992A (en) * 2021-11-15 2022-01-25 上海闪马智能科技有限公司 Image labeling method, device and system, storage medium and electronic device
CN114485717A (en) * 2021-12-30 2022-05-13 中智行(苏州)科技有限公司 Road image ground marking method
CN114462536A (en) * 2022-02-09 2022-05-10 国网宁夏电力有限公司吴忠供电公司 Method and system for generating labeled data set in entity scene
CN115048004A (en) * 2022-08-16 2022-09-13 浙江大华技术股份有限公司 Labeling method, labeling device, electronic equipment and computer-readable storage medium
CN115265881B (en) * 2022-09-28 2022-12-20 宁波普瑞均胜汽车电子有限公司 Pressure detection method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10013436B1 (en) * 2014-06-17 2018-07-03 Google Llc Image annotation based on label consensus
CN105608319B (en) * 2015-12-21 2019-03-08 玖壹叁陆零医学科技南京有限公司 A kind of mask method and annotation equipment of digital pathological section
CN106776939A (en) * 2016-12-01 2017-05-31 山东师范大学 A kind of image lossless mask method and system
CN108805959A (en) * 2018-04-27 2018-11-13 淘然视界(杭州)科技有限公司 A kind of image labeling method and system
CN111367445B (en) * 2020-03-31 2021-07-09 中国建设银行股份有限公司 Image annotation method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115689277A (en) * 2022-10-12 2023-02-03 北京思路智园科技有限公司 Chemical industry park risk early warning system under cloud limit collaborative technology
CN115689277B (en) * 2022-10-12 2024-05-07 北京思路智园科技有限公司 Chemical industry garden risk early warning system under cloud edge cooperation technology

Also Published As

Publication number Publication date
WO2021217543A1 (en) 2021-11-04

Similar Documents

Publication Publication Date Title
CN113939791A (en) Image labeling method, device, equipment and medium
CN112232293B (en) Image processing model training method, image processing method and related equipment
US9349076B1 (en) Template-based target object detection in an image
US11823443B2 (en) Segmenting objects by refining shape priors
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN108805058B (en) Target object change posture recognition method and device and computer equipment
CN113656582B (en) Training method of neural network model, image retrieval method, device and medium
CN109919077B (en) Gesture recognition method, device, medium and computing equipment
US11170581B1 (en) Supervised domain adaptation
CN112529026A (en) Method for providing AI model, AI platform, computing device and storage medium
CN111488873B (en) Character level scene text detection method and device based on weak supervision learning
CN111767831B (en) Method, apparatus, device and storage medium for processing image
US20210166009A1 (en) Action localization using relational features
JP6787831B2 (en) Target detection device, detection model generation device, program and method that can be learned by search results
CN115131604A (en) Multi-label image classification method and device, electronic equipment and storage medium
JP2022168167A (en) Image processing method, device, electronic apparatus, and storage medium
CN109241893B (en) Road selection method and device based on artificial intelligence technology and readable storage medium
CN110580478A (en) Active segmentation of scanned images based on depth-enhanced learning for OCR applications
CN116704264B (en) Animal classification method, classification model training method, storage medium, and electronic device
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
US20200279103A1 (en) Information processing apparatus, control method, and program
KR102026475B1 (en) Processing visual input
CN114140813A (en) High-precision map marking method, device, equipment and storage medium
CN114241202A (en) Method and device for training dressing classification model and method and device for dressing classification
CN114387600A (en) Text feature recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination