CN117275025A - Processing system for batch image annotation


Info

Publication number: CN117275025A
Application number: CN202311438309.4A
Authority: CN (China)
Prior art keywords: detection frame, image, annotation, frame, detection
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 张学森, 孙涤非, 任轶
Current Assignee: Beijing Daoyi Shuhui Technology Co., Ltd.
Original Assignee: Beijing Daoyi Shuhui Technology Co., Ltd.
Application filed by Beijing Daoyi Shuhui Technology Co., Ltd.

Classifications

    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING (G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING); G06V 30/00 — Character recognition; recognising digital ink; document-oriented image-based pattern recognition
    • G06V 30/414 — Analysis of document content: extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06V 30/1448 — Image acquisition: selective acquisition, locating or processing of specific regions, based on markings or identifiers characterising the document or the area
    • G06V 30/148 — Image acquisition: segmentation of character regions
    • G06V 30/19007 — Recognition using electronic means: matching; proximity measures

Abstract

The embodiment of the invention relates to a processing system for batch image annotation, which comprises: a task scheduling module, a task input module, a manual labeling module, a manual auditing module, a task output module, a multi-mode target detection model, an image feature learning model and an image segmentation model. The system can shorten the working time of the labeling work, improve the working efficiency of the labeling work and reduce the labeling cost of the labeling work.

Description

Processing system for batch image annotation
Technical Field
The invention relates to the field of data processing, in particular to a processing system for batch image annotation.
Background
In the field of automatic driving, massive numbers of images must be acquired for training various models, and these acquired images must then be annotated. At present, conventional image annotation is performed manually; this way of working has low efficiency, long annotation times and high annotation costs.
Disclosure of Invention
The object of the present invention, addressing the defects of the prior art, is to provide a processing system for batch image annotation, the system comprising: a task scheduling module, a task input module, a manual labeling module, a manual auditing module, a task output module, a multi-mode target detection model, an image feature learning model and an image segmentation model; the task scheduling module is used for sorting the image sequence to be detected out of the labeling task received by the task input module; the manual annotation module is used for confirming the target type text through interaction with the user according to the annotation mode, and for selecting part of the images to be detected from the image sequence to be detected for pre-annotation processing; the task scheduling module then invokes the multi-mode target detection model, the image feature learning model and the image segmentation model, and performs target detection, low-score detection frame filtering and semantic segmentation processing on the image sequence to be detected according to the target type text sequence and the annotation frame data set output by the manual annotation module, to obtain a corresponding detection frame segmentation data set; the manual auditing module carries out manual auditing according to the image sequence to be detected, the detection frame data set and the detection frame segmentation data set; and the task scheduling module forms the audit output of the manual auditing module into a corresponding task output data packet and outputs the data packet through the task output module. Each time the system processes a massive image labeling task, only a few images need to be selected from the mass of images and pre-annotated according to the target types to be labeled; the system then automatically labels the remaining images according to the pre-annotated target types and annotation frames, and provides a manual auditing interface for auditing the labeling results. The system can shorten the working time of the labeling work, improve the working efficiency of the labeling work and reduce the labeling cost of the labeling work.
To achieve the above object, an embodiment of the present invention provides a processing system for batch image annotation, the system including: a task scheduling module, a task input module, a manual labeling module, a manual auditing module, a task output module, a multi-mode target detection model, an image feature learning model and an image segmentation model;
the task scheduling module is respectively connected with the task input module, the manual annotation module, the manual auditing module, the task output module, the multi-mode target detection model, the image feature learning model and the image segmentation model; the multi-mode target detection model defaults to a Grounding DINO model; the image feature learning model defaults to a DINOv2 model; the image segmentation model adopts a SAM model by default;
the task input module is used for sending a first labeling task input by a user to the task scheduling module; the first labeling task comprises a first labeling mode, a first task data type and first task data; the first annotation mode comprises a simple annotation mode and a complex annotation mode; the first task data type comprises an image type and a video type; when the first task data type is an image type the corresponding first task data is an image sequence, and when the first task data type is a video type the corresponding first task data is video data;
The task scheduling module is used for extracting the corresponding first annotation mode, the first task data type and the first task data from the received first labeling task; the first task data type is identified, and if the first task data type is an image type, the first task data is used as the corresponding first image sequence to be detected, while if the first task data type is a video type, video frame extraction processing is carried out on the first task data and all the extracted images are formed into the corresponding first image sequence to be detected in time order; and the first labeling mode and the first image sequence to be detected are sent to the manual labeling module;
the manual annotation module is used for confirming the target type text through interaction with the user according to the first annotation mode to obtain a corresponding first target type text sequence when the first annotation mode and the first image sequence to be detected are received; selecting part of the images to be detected from the first image sequence to be detected according to the first labeling mode, the first target type text sequence and user interaction, and performing pre-labeling processing to obtain a corresponding first annotation frame data set; and returning the first target type text sequence and the first annotation frame data set to the task scheduling module;
The task scheduling module is further used for calling the multi-mode target detection model to perform target detection processing on the first image sequence to be detected according to the first target type text sequence to obtain a corresponding first detection frame data set when the first target type text sequence and the first annotation frame data set are received; calling the image feature learning model to respectively carry out corresponding annotation/detection frame image feature recognition processing on the first annotation frame data set and the first detection frame data set to obtain a corresponding first annotation frame feature set and a corresponding first detection frame feature set; performing low-score detection frame filtering processing on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; invoking the image segmentation model to carry out detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain a corresponding first detection frame segmentation data set; and sending the first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set to the manual auditing module;
The manual auditing module is used for conducting manual auditing processing according to the received first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set, outputting a corresponding first audited image sequence, first audited detection frame data set and first audited detection frame segmentation data set, and sending these back to the task scheduling module;
the task scheduling module is further used for forming the received first audited image sequence, first audited detection frame data set and first audited detection frame segmentation data set into a corresponding first task output data packet; and outputting the first task output data packet to the user through the task output module.
Preferably, the first image sequence to be detected includes a plurality of first images to be detected, and each first image to be detected corresponds to a first image identifier;
the first target type text sequence includes one or more first target type texts; when the first annotation mode is a simple annotation mode, the first target type text sequence consists of a plurality of first target type texts, and each first target type text is a target type noun without attributive modifiers; when the first annotation mode is a complex annotation mode, the first target type text sequence only comprises one first target type text, and this unique first target type text is a target type noun phrase with one or more attributive modifiers;
The first annotation frame data set comprises a plurality of first annotation frame data; the first annotation frame data comprises a first parent image identifier, a first annotation frame image, a first annotation frame center point coordinate, a first annotation frame size, a first annotation frame orientation and a first annotation frame type; the first parent image identifier corresponds to one first image identifier; the first annotation frame type corresponds to one first target type text;
the first detection frame data set comprises a plurality of first detection frame data; the first detection frame data comprises a second parent image identifier, a first detection frame image, a first detection frame center point coordinate, a first detection frame size, a first detection frame orientation and a first detection frame type; the second parent image identifier corresponds to one of the first image identifiers; the first detection frame type corresponds to one first target type text;
the first detection frame segmentation data set comprises a plurality of first detection frame segmentation data; the first detection frame segmentation data comprise a second detection frame identifier and a first detection frame semantic segmentation map; the second detection frame identifier corresponds to one of the first detection frame identifiers; the pixel semantics of the first detection frame semantic segmentation map include foreground semantics and background semantics, and the foreground semantics correspond to one of the first detection frame types.
Preferably, the manual labeling module is specifically configured, when performing target type text confirmation through interaction with the user according to the first labeling mode to obtain a corresponding first target type text sequence, to identify the first labeling mode;
if the first annotation mode is a simple annotation mode, providing a first simple target type input page for a user; receiving a plurality of target type nouns input by a user through the first simple target type input page, taking each input target type noun as a corresponding first target type text, and forming a corresponding first target type text sequence by all obtained first target type texts;
if the first annotation mode is a complex annotation mode, providing a first complex target type input page for the user; and receiving a target type noun phrase with one or more attributive modifiers input by the user through the first complex target type input page as the corresponding first target type text, the corresponding first target type text sequence being formed from this unique first target type text.
Preferably, the manual labeling module is specifically configured, when selecting part of the images to be detected from the first image sequence to be detected according to the first labeling mode, the first target type text sequence and user interaction and performing pre-labeling processing to obtain the corresponding first annotation frame data set, to provide a first pre-annotation page for the user and to arrange and display all the first images to be detected of the first image sequence to be detected on the first pre-annotation page;
When any one of the first images to be detected is selected by the user, the currently selected first image to be detected is used as the corresponding current image; an annotation frame drawing function is provided for the user to draw annotation frames on the current image so as to obtain one or more corresponding first annotation frames; the first image identifier of the current image is used as the first parent image identifier of each first annotation frame; the annotation frame images of the first annotation frames on the current image are extracted as the corresponding first annotation frame images; and the annotation frame center point coordinate, annotation frame size and annotation frame orientation of each first annotation frame on the current image are used as the corresponding first annotation frame center point coordinate, first annotation frame size and first annotation frame orientation;
when any first annotation frame is selected by a user, taking the currently selected first annotation frame as a corresponding current annotation frame; identifying the first labeling mode; if the first annotation mode is a simple annotation mode, providing an annotation frame type marking function for a user to optionally select one first target type text from the first target type text sequence as a corresponding first annotation frame type to mark the current annotation frame; if the first annotation mode is a complex annotation mode, taking a unique first target type text in the first target type text sequence as a corresponding current target type text, displaying a first prompt message with a confirmation option and a cancel option to a user, prompting whether the current target type text is to be used as the first annotation frame type corresponding to the current annotation frame or not through the first prompt message, and setting the first annotation frame type corresponding to the current annotation frame as the corresponding current target type text when the user selects the confirmation option of the first prompt message;
When a pre-annotation submitting option preset on the first pre-annotation page is selected by the user, the first parent image identifier, the first annotation frame image, the first annotation frame center point coordinate, the first annotation frame size, the first annotation frame orientation and the first annotation frame type corresponding to each first annotation frame form the corresponding first annotation frame data; and the corresponding first annotation frame data set is composed of all the obtained first annotation frame data.
Preferably, the task scheduling module is specifically configured to traverse the first to-be-detected image of the first to-be-detected image sequence when the multi-mode target detection model is invoked to perform target detection processing on the first to-be-detected image sequence according to the first target type text sequence to obtain a corresponding first detection frame data set; the first image to be detected which is traversed currently is used as a corresponding current image to be detected, and the first image identifier corresponding to the current image to be detected is used as a corresponding current image identifier; inputting the first target type text sequence and the current to-be-detected image into the multi-mode target detection model, and carrying out directional target detection on the current to-be-detected image by the multi-mode target detection model according to one or more first target type texts in the first target type text sequence and outputting a corresponding first detection frame-text pair set; if the first detection frame-text pair set is not empty, carrying out detection frame data assembly according to the current image identification, the current image to be detected and the first detection frame-text pair set to obtain a corresponding first detection frame data subset; when the traversing is finished, combining all the obtained first detection frame data subsets to form a corresponding first detection frame data set;
Wherein the first set of detection box-text pairs comprises a plurality of first detection box-text pairs; the first detection box-text pair comprises a first target detection box and a first text; the first target detection frame comprises a first target detection frame center point coordinate, a first target detection frame size and a first target detection frame orientation; when the first target type text sequence contains more than one first target type text, the first text corresponds to one of the first target type texts in the sequence; and when the first target type text sequence contains only one first target type text, the first text corresponds to that unique first target type text.
Further, the task scheduling module is specifically configured to traverse the first detection frame-text pairs of the first detection frame-text pair set when detection frame data assembly is performed according to the current image identifier, the current image to be detected and the first detection frame-text pair set to obtain the corresponding first detection frame data subset; during the traversal, the first detection frame-text pair currently traversed is used as the corresponding current detection frame-text pair; the current image identifier is used as the corresponding second parent image identifier; a unique detection frame identifier is allocated to the first target detection frame of the current detection frame-text pair as the corresponding first detection frame identifier; a detection frame image of the first target detection frame of the current detection frame-text pair is extracted from the current image to be detected as the corresponding first detection frame image; the first target detection frame center point coordinate, first target detection frame size and first target detection frame orientation of the first target detection frame of the current detection frame-text pair are used as the corresponding first detection frame center point coordinate, first detection frame size and first detection frame orientation; the first text of the current detection frame-text pair is used as the corresponding first detection frame type; the obtained second parent image identifier, first detection frame image, first detection frame center point coordinate, first detection frame size, first detection frame orientation and first detection frame type form the corresponding first detection frame data; and when the traversal is finished, the corresponding first detection frame data subset is formed from all the obtained first detection frame data.
Preferably, the task scheduling module is specifically configured to, when the invoking the image feature learning model performs corresponding labeling/detection frame image feature recognition processing on the first labeling frame data set and the first detection frame data set to obtain a corresponding first labeling frame feature set and a first detection frame feature set, input the first labeling frame images of the first labeling frame data set into the image feature learning model, and perform image feature extraction processing on the first labeling frame images by using the image feature learning model to obtain corresponding first labeling frame features; inputting the first detection frame images of the first detection frame data set into the image feature learning model, and carrying out image feature extraction processing on the first detection frame images by the image feature learning model to obtain corresponding first detection frame features; and the corresponding first labeling frame feature set is formed by all the obtained first labeling frame features, and the corresponding first detection frame feature set is formed by all the obtained first detection frame features.
Preferably, the task scheduling module is specifically configured to traverse the first detection frame features of the first detection frame feature set when the low-score detection frame filtering processing is performed on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; the first detection frame feature currently traversed is used as the corresponding current detection frame feature, and the first detection frame type of the first detection frame data corresponding to the current detection frame feature is used as the corresponding current detection frame type; each first annotation frame data in the first annotation frame data set whose first annotation frame type matches the current detection frame type is taken as corresponding matched annotation frame data; the first annotation frame features in the first annotation frame feature set corresponding to the matched annotation frame data are taken as corresponding same-type annotation frame features; matching and scoring are performed between the current detection frame feature and each same-type annotation frame feature based on the Hungarian matching algorithm to obtain corresponding first scores, and all the obtained first scores are averaged to generate a corresponding first average score; and the first detection frame data corresponding to the current detection frame feature is deleted from the first detection frame data set when the first average score is lower than a preset scoring threshold.
Preferably, the task scheduling module is specifically configured to traverse the first detection frame data of the first detection frame data set when the image segmentation model is invoked to perform detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain a corresponding first detection frame segmentation data set; during the traversal, the first detection frame data currently traversed is used as the corresponding current detection frame data; the first detection frame image of the current detection frame data is input into the image segmentation model, and the image segmentation model performs pixel-level foreground and background pixel semantic segmentation processing on the first detection frame image to generate a corresponding first detection frame semantic segmentation map; each pixel point on the first detection frame semantic segmentation map whose pixel semantics are not background semantics is marked as a corresponding first foreground pixel point, and the pixel semantics of each first foreground pixel point are set to the first detection frame type of the current detection frame data; the first detection frame identifier of the current detection frame data is used as the corresponding second detection frame identifier; the obtained second detection frame identifier and first detection frame semantic segmentation map form the corresponding first detection frame segmentation data; and when the traversal is finished, the corresponding first detection frame segmentation data set is formed from all the obtained first detection frame segmentation data.
Preferably, the manual auditing module is specifically configured, when performing manual auditing processing according to the received first image sequence to be detected, first detection frame data set and first detection frame segmentation data set and outputting a corresponding first audited image sequence, first audited detection frame data set and first audited detection frame segmentation data set to send back to the task scheduling module, to merge the first detection frame data set and the first detection frame segmentation data set according to the correspondence of detection frame identifiers to obtain a corresponding second detection frame data set; wherein the second detection frame data set includes a plurality of second detection frame data; the second detection frame data comprises the second parent image identifier, the first detection frame image, the first detection frame center point coordinate, the first detection frame size, the first detection frame orientation, the first detection frame type and the first detection frame semantic segmentation map;
traversing each first image to be detected of the first image sequence to be detected; during the traversal, the first image to be detected currently traversed is used as the corresponding current image to be detected; the first image identifier corresponding to the current image to be detected is used as the corresponding current image identifier; the second detection frame data in the second detection frame data set whose second parent image identifier matches the current image identifier are recorded as corresponding first matching detection frame data; identifying whether the number of the first matching detection frame data is zero; if the number of the first matching detection frame data is zero, marking the current image to be detected as a corresponding first image to be filtered; if the number of the first matching detection frame data is not zero, performing corresponding detection frame drawing, foreground semantic pixel coloring and text prompt frame drawing processing on the current image to be detected according to all the first matching detection frame data to obtain a corresponding first review image; and when the traversal is finished, providing a first image review page for the user and displaying all the first review images in an array on the first image review page;
When any one of the first review images is selected by the user, displaying a second prompt message with a confirmation option and a cancel option to the user, prompting through the second prompt message whether the currently selected first review image is to be marked as a disqualified image, and marking the currently selected first review image as the corresponding first image to be filtered when the user selects the confirmation option of the second prompt message;
when a review ending option preset on the first image review page is selected by the user, deleting the first images to be detected corresponding to the first images to be filtered from the first image sequence to be detected, and taking the image sequence obtained after deletion as the corresponding first audited image sequence; the first detection frame data in the first detection frame data set whose second parent image identifier corresponds to one of the first images to be filtered are taken as corresponding first detection frame data to be deleted; the first detection frame segmentation data in the first detection frame segmentation data set whose second detection frame identifier corresponds to one of the first detection frame data to be deleted are deleted, and the data set obtained after deletion is taken as the corresponding first audited detection frame segmentation data set; all the first detection frame data to be deleted in the first detection frame data set are deleted, and the data set obtained after deletion is taken as the corresponding first audited detection frame data set; and the obtained first audited image sequence, first audited detection frame data set and first audited detection frame segmentation data set are sent back to the task scheduling module.
Further, the manual auditing module is specifically configured to traverse each first matching detection frame data when the corresponding detection frame drawing, foreground semantic pixel coloring and text prompt box drawing processing are performed on the current image to be detected according to all the first matching detection frame data to obtain the corresponding first review image; during the traversal, the first matching detection frame data currently traversed is used as the corresponding current matching detection frame data; a detection frame is drawn on the current image to be detected according to the first detection frame center point coordinate, the first detection frame size and the first detection frame orientation of the current matching detection frame data to obtain a corresponding first drawing frame; foreground semantic pixel point marking is performed on the image inside the first drawing frame according to the first detection frame semantic segmentation map of the current matching detection frame data, and a preset first color is used to set the color of the foreground semantic pixel points of the first drawing frame; a text prompt box is drawn at a designated position on the first drawing frame as a corresponding first text box, and the text content of the first text box is set to the first detection frame type of the current matching detection frame data; and when the traversal is finished, the current image to be detected with the added drawing information is taken as the corresponding first review image.
The embodiment of the invention provides a processing system for batch image annotation, which comprises: a task scheduling module, a task input module, a manual labeling module, a manual auditing module, a task output module, a multi-mode target detection model, an image feature learning model and an image segmentation model; the task scheduling module is used for sorting the image sequence to be detected out of the labeling task received by the task input module; the manual annotation module is used for confirming the target type text through interaction with the user according to the annotation mode, and for selecting part of the images to be detected from the image sequence to be detected for pre-annotation processing; the task scheduling module then invokes the multi-mode target detection model, the image feature learning model and the image segmentation model, and performs target detection, low-score detection frame filtering and semantic segmentation processing on the image sequence to be detected according to the target type text sequence and the annotation frame data set output by the manual annotation module, to obtain a corresponding detection frame segmentation data set; the manual auditing module carries out manual auditing according to the image sequence to be detected, the detection frame data set and the detection frame segmentation data set; and the task scheduling module forms the audit output of the manual auditing module into a corresponding task output data packet and outputs the data packet through the task output module. Each time the system processes a massive image labeling task, only a few images need to be selected from the mass of images and pre-annotated according to the target types to be labeled; the system then automatically labels the remaining images according to the pre-annotated target types and annotation frames, and provides a manual auditing interface for auditing the labeling results. The system not only shortens the working time of the labeling work, but also improves the working efficiency of the labeling work and reduces the labeling cost of the labeling work.
Drawings
FIG. 1 is a schematic block diagram of a processing system for batch image annotation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a schematic block diagram of a processing system for batch image labeling according to an embodiment of the present invention, where, as shown in FIG. 1, the system includes: the system comprises a task scheduling module 1, a task input module 2, a manual labeling module 3, a manual auditing module 4, a task output module 5, a multi-mode target detection model 6, an image feature learning model 7 and an image segmentation model 8. The task scheduling module 1 is respectively connected with the task input module 2, the manual annotation module 3, the manual auditing module 4, the task output module 5, the multi-mode target detection model 6, the image feature learning model 7 and the image segmentation model 8.
The task input module 2 is configured to send a first labeling task input by a user to the task scheduling module 1. The first labeling task comprises a first labeling mode, a first task data type and first task data; the first annotation mode comprises a simple annotation mode and a complex annotation mode; the first task data type includes an image type and a video type; when the first task data type is an image type the corresponding first task data is an image sequence, and when the first task data type is a video type the corresponding first task data is video data.
The task scheduling module 1 is used for extracting the corresponding first labeling mode, first task data type and first task data from the received first labeling task; the first task data type is identified, and if the first task data type is the image type, the first task data is used as the corresponding first image sequence to be detected, while if the first task data type is the video type, video frame extraction processing is carried out on the first task data and all the extracted images are formed into the corresponding first image sequence to be detected in time order; and the first annotation mode and the first image sequence to be detected are sent to the manual annotation module 3. The first image sequence to be detected comprises a plurality of first images to be detected, and each first image to be detected corresponds to one first image identifier.
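The patent does not name a concrete frame-extraction implementation; as a minimal sketch, assuming OpenCV is used for video decoding, the video frame extraction step might look like the following (the stride parameter is an illustrative addition, not part of the patent):

import cv2

def extract_frames(video_path: str, stride: int = 1) -> list:
    """Split a video into a time-ordered image sequence.

    `stride` (a hypothetical parameter) keeps every Nth frame to
    bound the size of the image sequence to be detected.
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)  # frames stay in time order
        index += 1
    cap.release()
    return frames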
The manual annotation module 3 is used, when the first annotation mode and the first image sequence to be detected are received, for confirming the target type text through interaction with the user according to the first annotation mode to obtain a corresponding first target type text sequence; selecting part of the images to be detected from the first image sequence to be detected according to the first annotation mode, the first target type text sequence and user interaction and performing pre-annotation processing to obtain a corresponding first annotation frame data set; and returning the first target type text sequence and the first annotation frame data set to the task scheduling module 1.
Wherein the first target type text sequence comprises one or more first target type texts; when the first annotation mode is a simple annotation mode, the first target type text sequence consists of a plurality of first target type texts, and each first target type text is a target type noun without attributive modifiers; when the first annotation mode is a complex annotation mode, the first target type text sequence comprises only one first target type text, and this unique first target type text is a target type noun phrase with one or more attributive modifiers. The first annotation frame data set comprises a plurality of first annotation frame data; the first annotation frame data comprises a first parent image identifier, a first annotation frame image, a first annotation frame center point coordinate, a first annotation frame size, a first annotation frame orientation and a first annotation frame type; the first parent image identifier corresponds to a first image identifier; the first annotation frame type corresponds to a first target type text.
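For readability, the annotation frame records defined above, and the detection frame records defined later in the description, can be modelled as plain data structures; the field names below are hypothetical, chosen only to mirror the fields the patent lists:

from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class AnnotationBox:
    parent_image_id: str          # first parent image identifier
    box_image: np.ndarray         # first annotation frame image (crop)
    center: Tuple[float, float]   # first annotation frame center point coordinate
    size: Tuple[float, float]     # first annotation frame size (width, height)
    orientation: float            # first annotation frame orientation
    box_type: str                 # first annotation frame type (a target type text)

@dataclass
class DetectionBox:
    parent_image_id: str          # second parent image identifier
    box_id: str                   # first detection frame identifier (assigned on detection)
    box_image: np.ndarray         # first detection frame image (crop)
    center: Tuple[float, float]
    size: Tuple[float, float]
    orientation: float
    box_type: str                 # first detection frame type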
In a specific implementation manner of the embodiment of the present invention, the manual labeling module 3 is specifically configured to, when performing target type text confirmation with user interaction according to the first labeling mode to obtain a corresponding first target type text sequence:
step A1, identifying a first labeling mode;
step A2, if the first labeling mode is a simple labeling mode, providing a first simple target type input page for a user; receiving a plurality of target type nouns input by a user through a first simple target type input page, taking each input target type noun as a corresponding first target type text, and forming a corresponding first target type text sequence by all obtained first target type texts;
for example, in the case that the first annotation mode is a simple annotation mode, the first simple object type input page receives three object type nouns input by the user: "automobile", "tree", "pedestrian"; then the first target type text sequence is { "car", "tree", "pedestrian" };
step A3, if the first labeling mode is a complex labeling mode, providing a first complex target type input page for a user; and receiving a target type noun phrase with one or more fixed languages input by a user through the first complex target type input page as a corresponding first target type text, and forming a corresponding first target type text sequence by the unique first target type text.
For example, in the case that the first annotation mode is a complex annotation mode, the first complex object type input page receives a noun phrase of an object type input by the user: pedestrians on a lane; the resulting text sequence of the first target type is { "pedestrian on lane" }.
In another specific implementation manner of the embodiment of the present invention, the manual labeling module 3 is specifically configured to, when selecting a portion of an image to be detected from the first image sequence to be detected according to the first labeling mode and the first target type text sequence and user interaction, perform pre-labeling processing to obtain a corresponding first labeling frame data set:
step B1, providing a first pre-annotation page for the user, and displaying all the first images to be detected of the first image sequence to be detected in an array on the first pre-annotation page;
step B2, when any one of the first images to be detected is selected by the user, taking the currently selected first image to be detected as the corresponding current image; providing an annotation frame drawing function for the user to draw annotation frames on the current image so as to obtain one or more corresponding first annotation frames; the first image identifier of the current image is used as the first parent image identifier of each first annotation frame; the annotation frame images of the first annotation frames on the current image are extracted as the corresponding first annotation frame images; and the annotation frame center point coordinate, annotation frame size and annotation frame orientation of each first annotation frame on the current image are used as the corresponding first annotation frame center point coordinate, first annotation frame size and first annotation frame orientation;
Step B3, when any first annotation frame is selected by a user, taking the currently selected first annotation frame as a corresponding current annotation frame; identifying the first labeling mode; if the first annotation mode is a simple annotation mode, providing an annotation frame type marking function for a user to select a first target type text from a first target type text sequence as a corresponding first annotation frame type to mark the current annotation frame; if the first annotation mode is a complex annotation mode, taking a unique first target type text in a first target type text sequence as a corresponding current target type text, displaying a first prompt message with a confirmation option and a cancel option to a user, prompting whether the current target type text is to be used as a first annotation frame type corresponding to the current annotation frame or not through the first prompt message, and setting the first annotation frame type corresponding to the current annotation frame as a corresponding current target type text when the user selects the confirmation option of the first prompt message;
step B4, when a pre-annotation submitting option preset on the first pre-annotation page is selected by the user, the first parent image identifier, first annotation frame image, first annotation frame center point coordinate, first annotation frame size, first annotation frame orientation and first annotation frame type corresponding to each first annotation frame form the corresponding first annotation frame data; and the corresponding first annotation frame data set is formed from all the obtained first annotation frame data.
The task scheduling module 1 is further configured to, when receiving the first target type text sequence and the first annotation frame data set, invoke the multi-mode target detection model 6 to perform target detection processing on the first image sequence to be detected according to the first target type text sequence to obtain a corresponding first detection frame data set; call the image feature learning model 7 to perform corresponding annotation/detection frame image feature recognition processing on the first annotation frame data set and the first detection frame data set respectively, obtaining a corresponding first annotation frame feature set and first detection frame feature set; perform low-score detection frame filtering processing on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; invoke the image segmentation model 8 to perform detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain a corresponding first detection frame segmentation data set; and send the first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set to the manual auditing module 4.
Wherein, the multi-mode target detection model 6 defaults to a Grounding DINO model; the image feature learning model 7 adopts a DINOv2 model by default; the image segmentation model 8 defaults to the SAM model. The first detection frame data set comprises a plurality of first detection frame data; the first detection frame data comprises a second parent image identifier, a first detection frame image, a first detection frame center point coordinate, a first detection frame size, a first detection frame orientation and a first detection frame type; the second parent image identifier corresponds to a first image identifier; the first detection frame type corresponds to a first target type text. The first detection frame segmentation data set comprises a plurality of first detection frame segmentation data; the first detection frame segmentation data comprise a second detection frame identifier and a first detection frame semantic segmentation map; the second detection frame identifier corresponds to one first detection frame identifier; the pixel semantics of the first detection frame semantic segmentation map include foreground semantics and background semantics, the foreground semantics corresponding to one first detection frame type.
In another specific implementation manner of the embodiment of the present invention, the task scheduling module 1 is specifically configured to traverse a first to-be-detected image of the first to-be-detected image sequence when invoking the multi-mode target detection model 6 to perform target detection processing on the first to-be-detected image sequence according to the first target type text sequence to obtain a corresponding first detection frame data set; the first image to be detected which is traversed at present is used as a corresponding current image to be detected, and a first image identifier corresponding to the current image to be detected is used as a corresponding current image identifier; inputting a first target type text sequence and a current image to be detected into a multi-mode target detection model 6, and carrying out directional target detection on the current image to be detected by the multi-mode target detection model 6 according to one or more first target type texts in the first target type text sequence and outputting a corresponding first detection frame-text pair set; if the first detection frame-text pair set is not empty, carrying out detection frame data assembly according to the current image identification, the current image to be detected and the first detection frame-text pair set to obtain a corresponding first detection frame data subset; when the traversing is finished, combining all the obtained first detection frame data subsets to form a corresponding first detection frame data set;
Wherein the first set of detection box-text pairs comprises a plurality of first detection box-text pairs; the first detection box-text pair comprises a first target detection box and a first text; the first target detection frame comprises a first target detection frame center point coordinate, a first target detection frame size and a first target detection frame orientation; when the first target type text sequence contains more than one first target type text, the first text corresponds to one of the first target type texts in the sequence; when the first target type text sequence contains only one first target type text, the first text corresponds to that unique first target type text.
Here, the multi-mode target detection model 6 in the embodiment of the present invention adopts a Grounding DINO model by default. Grounding DINO is a large multi-modal object detection model built on the Transformer architecture; as described in the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection", the model is composed of an image feature extraction module, a text feature extraction module, a feature enhancement module, a language-guided query selection module and a cross-modality decoder module. The model performs dual-modality (text, image) feature extraction and fusion on the input target type text and input image, performs object detection based on the fused features, and combines the input target type text and each detected object detection box (bbox) into a detection box-text pair for output.
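As a sketch of how this directed detection could be driven, assuming the reference GroundingDINO repository and its inference helpers are used (the config and checkpoint paths below are placeholders for locally downloaded files):

from groundingdino.util.inference import load_model, load_image, predict

# Placeholder paths for the Grounding DINO config and weights.
model = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")

def detect(image_path, target_type_texts, box_thr=0.35, text_thr=0.25):
    """Return (boxes, phrases): Grounding DINO pairs each detected box
    with the matching fragment of the input text prompt."""
    image_source, image = load_image(image_path)
    # Grounding DINO expects the categories joined into one caption,
    # conventionally separated by " . ".
    caption = " . ".join(target_type_texts)
    boxes, logits, phrases = predict(
        model=model, image=image, caption=caption,
        box_threshold=box_thr, text_threshold=text_thr)
    return boxes, phrases  # boxes are normalized cxcywh tensors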
In another specific implementation manner of the embodiment of the present invention, the task scheduling module 1 is specifically configured to traverse the first detection frame-text pairs of the first detection frame-text pair set when detection frame data assembly is performed according to the current image identifier, the current image to be detected and the first detection frame-text pair set to obtain the corresponding first detection frame data subset; during the traversal, the first detection frame-text pair currently traversed is used as the corresponding current detection frame-text pair; the current image identifier is used as the corresponding second parent image identifier; a unique detection frame identifier is allocated to the first target detection frame of the current detection frame-text pair as the corresponding first detection frame identifier; a detection frame image of the first target detection frame of the current detection frame-text pair is extracted from the current image to be detected as the corresponding first detection frame image; the first target detection frame center point coordinate, first target detection frame size and first target detection frame orientation of the first target detection frame of the current detection frame-text pair are used as the corresponding first detection frame center point coordinate, first detection frame size and first detection frame orientation; the first text of the current detection frame-text pair is used as the corresponding first detection frame type; the obtained second parent image identifier, first detection frame image, first detection frame center point coordinate, first detection frame size, first detection frame orientation and first detection frame type form the corresponding first detection frame data; and when the traversal is finished, the corresponding first detection frame data subset is formed from all the obtained first detection frame data.
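This assembly step amounts to converting each normalized box to pixel coordinates, cropping the detection frame image and attaching an identifier; a sketch reusing the hypothetical DetectionBox record from the earlier sketch (the UUID identifier scheme is an assumption):

import uuid

def assemble_detection_data(image_id, image, boxes, phrases):
    """Build first-detection-frame records from one image's box-text pairs."""
    h, w = image.shape[:2]
    records = []
    for (cx, cy, bw, bh), text in zip(boxes.tolist(), phrases):
        # normalized cxcywh -> pixel corner coordinates
        x0 = int((cx - bw / 2) * w); y0 = int((cy - bh / 2) * h)
        x1 = int((cx + bw / 2) * w); y1 = int((cy + bh / 2) * h)
        records.append(DetectionBox(
            parent_image_id=image_id,
            box_id=uuid.uuid4().hex,        # unique detection frame identifier
            box_image=image[y0:y1, x0:x1],  # cropped detection frame image
            center=(cx * w, cy * h),
            size=(bw * w, bh * h),
            orientation=0.0,                # axis-aligned by default
            box_type=text))
    return records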
In another specific implementation manner of the embodiment of the present invention, the task scheduling module 1 is specifically configured to, when the image feature learning model 7 is invoked, perform corresponding labeling/detection frame image feature recognition processing on the first labeling frame data set and the first detection frame data set to obtain a corresponding first labeling frame feature set and a corresponding first detection frame feature set, input first labeling frame images of each first labeling frame data of the first labeling frame data set into the image feature learning model 7, and perform image feature extraction processing on each first labeling frame image by the image feature learning model 7 to obtain a corresponding first labeling frame feature; inputting first detection frame images of the first detection frame data set into an image feature learning model 7, and carrying out image feature extraction processing on the first detection frame images by the image feature learning model 7 to obtain corresponding first detection frame features; and the corresponding first marking frame feature set is formed by all the obtained first marking frame features, and the corresponding first detection frame feature set is formed by all the obtained first detection frame features.
Here, the image feature learning model 7 in the embodiment of the present invention adopts a DINOv2 model by default. DINOv2 is a large vision model, described in the paper "DINOv2: Learning Robust Visual Features without Supervision", which can perform image feature learning (extraction) on input images of arbitrary scale.
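A minimal feature-extraction sketch, assuming the publicly released DINOv2 weights are fetched through torch.hub (the fixed 224x224 input size is an assumption; DINOv2 only requires dimensions divisible by its 14-pixel patch size):

import torch
from torchvision import transforms

# Weights fetched via torch.hub from the official DINOv2 release.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
dinov2.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224), antialias=True),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def box_feature(box_image) -> torch.Tensor:
    """Extract one global feature vector for a cropped box image."""
    x = preprocess(box_image).unsqueeze(0)
    return dinov2(x).squeeze(0)  # class-token embedding, 384-d for ViT-S/14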
In another specific implementation manner of the embodiment of the present invention, the task scheduling module 1 is specifically configured to traverse the first detection frame features of the first detection frame feature set when performing low-score detection frame filtering processing on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; the first detection frame feature currently traversed is used as the corresponding current detection frame feature, and the first detection frame type of the first detection frame data corresponding to the current detection frame feature is used as the corresponding current detection frame type; each first annotation frame data in the first annotation frame data set whose first annotation frame type matches the current detection frame type is taken as corresponding matched annotation frame data; the first annotation frame features in the first annotation frame feature set corresponding to the matched annotation frame data are taken as corresponding same-type annotation frame features; matching and scoring are performed between the current detection frame feature and each same-type annotation frame feature based on the Hungarian matching algorithm to obtain corresponding first scores, and all the obtained first scores are averaged to generate a corresponding first average score; and the first detection frame data corresponding to the current detection frame feature is deleted from the first detection frame data set when the first average score is lower than a preset scoring threshold.
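The patent names the Hungarian matching algorithm for scoring but does not state the underlying cost; the sketch below assumes cosine similarity between the extracted features as the pairwise score, and the threshold value is illustrative rather than taken from the patent:

import torch
import torch.nn.functional as F

SCORE_THRESHOLD = 0.5  # assumed value; the patent only says "preset"

def filter_low_score(det_records, det_feats, ann_records, ann_feats):
    """Drop detection frames whose average similarity to the same-type
    annotation frame features falls below the scoring threshold."""
    kept = []
    for rec, feat in zip(det_records, det_feats):
        same_type = [f for r, f in zip(ann_records, ann_feats)
                     if r.box_type == rec.box_type]
        if not same_type:
            continue  # no reference features of this type
        sims = [F.cosine_similarity(feat, f, dim=0).item() for f in same_type]
        if sum(sims) / len(sims) >= SCORE_THRESHOLD:
            kept.append(rec)
    return kept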
In another specific implementation manner of the embodiment of the present invention, the task scheduling module 1 is specifically configured to traverse the first detection frame data of the first detection frame data set when the image segmentation model 8 is invoked to perform detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain a corresponding first detection frame segmentation data set; during the traversal, the first detection frame data currently traversed is used as the corresponding current detection frame data; the first detection frame image of the current detection frame data is input into the image segmentation model 8, and the image segmentation model 8 performs pixel-level foreground and background pixel semantic segmentation processing on the first detection frame image to generate a corresponding first detection frame semantic segmentation map; each pixel point on the first detection frame semantic segmentation map whose pixel semantics are not background semantics is marked as a corresponding first foreground pixel point, and the pixel semantics of each first foreground pixel point are set to the first detection frame type of the current detection frame data; the first detection frame identifier of the current detection frame data is used as the corresponding second detection frame identifier; the obtained second detection frame identifier and first detection frame semantic segmentation map form the corresponding first detection frame segmentation data; and when the traversal is finished, all the obtained first detection frame segmentation data form the corresponding first detection frame segmentation data set.
Here, the image segmentation model 8 in the embodiment of the present invention adopts the SAM model by default; SAM is short for Segment Anything Model, a large model for image segmentation.
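A sketch of how the SAM model could produce the foreground/background segmentation for one detection frame image is shown below. The model type and checkpoint filename are assumptions based on the public facebookresearch/segment-anything release; in practice the foreground pixels would then carry the detection frame type as their semantics, as described above.

```python
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Model type and checkpoint name are assumptions from the public SAM release.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def segment_detection_frame(frame_image: np.ndarray) -> np.ndarray:
    """Return a boolean foreground mask for one detection frame image.

    frame_image is an HxWx3 uint8 RGB array; True pixels carry foreground
    semantics (the detection frame type), False pixels are background.
    """
    masks = mask_generator.generate(frame_image)
    foreground = np.zeros(frame_image.shape[:2], dtype=bool)
    for mask in masks:
        foreground |= mask["segmentation"]  # union of all SAM-proposed regions
    return foreground
```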
The manual auditing module 4 is configured to perform manual auditing processing according to the received first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set, to output a corresponding first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set, and to send them back to the task scheduling module 1, specifically:
step C1, merging the first detection frame data set and the first detection frame segmentation data set according to the correspondence of detection frame identifiers to obtain a corresponding second detection frame data set (a data-handling sketch for this merge, and for the filtering in step C4, follows step C5 below);
wherein the second detection frame data set includes a plurality of second detection frame data; the second detection frame data comprises a second parent image identifier, a first detection frame image, first detection frame center point coordinates, a first detection frame size, a first detection frame orientation, a first detection frame type and a first detection frame semantic segmentation map;
step C2, traversing each first to-be-detected image of the first to-be-detected image sequence; during the traversal, taking the currently traversed first to-be-detected image as the corresponding current to-be-detected image; taking the first image identifier corresponding to the current to-be-detected image as the corresponding current image identifier; recording each second detection frame data in the second detection frame data set whose second parent image identifier matches the current image identifier as corresponding first matching detection frame data; identifying whether the number of first matching detection frame data is zero; if the number of first matching detection frame data is zero, marking the current to-be-detected image as a corresponding first image to be filtered; if the number of first matching detection frame data is not zero, performing corresponding detection frame drawing, foreground semantic pixel coloring and text prompt box drawing processing on the current to-be-detected image according to all the first matching detection frame data to obtain a corresponding first review image; and, when the traversal is finished, providing a first image review page for the user and displaying all the first review images in a tiled arrangement on the first image review page;
In another specific implementation manner of the embodiment of the present invention, the manual auditing module 4 is specifically configured, when performing corresponding detection frame drawing, foreground semantic pixel coloring and text prompt box drawing processing on the current to-be-detected image according to all the first matching detection frame data to obtain the corresponding first review image, to traverse each first matching detection frame data; during the traversal, to take the currently traversed first matching detection frame data as the corresponding current matching detection frame data; to draw a detection frame on the current to-be-detected image according to the first detection frame center point coordinates, first detection frame size and first detection frame orientation of the current matching detection frame data to obtain a corresponding first drawing frame; to mark the foreground semantic pixel points of the image inside the first drawing frame according to the first detection frame semantic segmentation map of the current matching detection frame data, and to set the color of the foreground semantic pixel points of the first drawing frame with a preset first color; to draw a text prompt box at a designated position on the first drawing frame as a corresponding first text box, and to set the text content of the first text box to the first detection frame type of the current matching detection frame data; and, when the traversal is finished, to take the current to-be-detected image with the added drawing information as the corresponding first review image;
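A rendering sketch for one review image, using OpenCV under stated assumptions, is given below. The field names of the matching detection frame data are illustrative, the foreground color is an arbitrary stand-in for the preset first color, and the segmentation mask is assumed to be stored frame-locally and axis-aligned for simplicity.

```python
import cv2
import numpy as np

FOREGROUND_COLOR = (0, 0, 255)  # stand-in for the preset first color (BGR)

def draw_review_image(image: np.ndarray, matches: list) -> np.ndarray:
    """Render a review image: box outline, foreground tint, and type label.

    Each match is assumed to carry 'center' (x, y), 'size' (w, h),
    'angle' (degrees), 'type' (str) and 'mask' (frame-local HxW bool array).
    """
    canvas = image.copy()
    for m in matches:
        # Detection frame drawing: rotated rectangle from center, size, orientation.
        box = cv2.boxPoints((m["center"], m["size"], m["angle"])).astype(np.int32)
        cv2.polylines(canvas, [box.reshape(-1, 1, 2)], isClosed=True,
                      color=(0, 255, 0), thickness=2)

        # Foreground semantic pixel coloring inside the frame's bounding region.
        x, y, w, h = cv2.boundingRect(box)
        roi = canvas[y:y + h, x:x + w]
        mh = min(m["mask"].shape[0], roi.shape[0])
        mw = min(m["mask"].shape[1], roi.shape[1])
        roi[:mh, :mw][m["mask"][:mh, :mw]] = FOREGROUND_COLOR

        # Text prompt box: the detection frame type above the drawing frame.
        cv2.putText(canvas, m["type"], (x, max(y - 5, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return canvas
```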
step C3, when any first review image is selected by the user, displaying a second prompt message with a confirmation option and a cancel option to the user, prompting the user through the second prompt message whether the currently selected first review image is to be marked as an unqualified image, and marking the currently selected first review image as a corresponding first image to be filtered when the user selects the confirmation option of the second prompt message;
step C4, when the preset end-of-review option on the first image review page is selected by the user, deleting the first to-be-detected images corresponding to the first images to be filtered from the first to-be-detected image sequence, and taking the deleted image sequence as the corresponding first audit image sequence; taking each first detection frame data in the first detection frame data set whose second parent image identifier corresponds to a first image to be filtered as corresponding first detection frame data to be deleted; deleting from the first detection frame segmentation data set the first detection frame segmentation data whose second detection frame identifier corresponds to each first detection frame data to be deleted, and taking the deleted data set as the corresponding first audit detection frame segmentation data set; deleting all the first detection frame data to be deleted from the first detection frame data set, and taking the deleted data set as the corresponding first audit detection frame data set;
and step C5, sending the obtained first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set back to the task scheduling module 1.
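The data handling in steps C1 and C4 amounts to a join on the detection frame identifier followed by a filtered deletion; a minimal sketch under assumed dictionary layouts (all field and variable names here are illustrative, not prescribed by the embodiment) follows.

```python
def merge_detection_data(detections: dict, segmentations: dict) -> dict:
    """Step C1 sketch: join detection frame data with segmentation data
    on the detection frame identifier.

    detections maps frame_id -> detection frame fields (including an assumed
    'parent_image_id'); segmentations maps frame_id -> semantic segmentation map.
    """
    merged = {}
    for frame_id, det in detections.items():
        seg_map = segmentations.get(frame_id)
        if seg_map is not None:
            merged[frame_id] = {**det, "semantic_segmentation": seg_map}
    return merged

def drop_filtered_images(images: dict, merged: dict, filtered_ids: set):
    """Step C4 sketch: remove rejected images and every detection frame
    whose parent image was rejected."""
    audited_images = {i: img for i, img in images.items() if i not in filtered_ids}
    audited_frames = {fid: d for fid, d in merged.items()
                      if d["parent_image_id"] not in filtered_ids}
    return audited_images, audited_frames
```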
The task scheduling module 1 is further configured to assemble the received first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set into a corresponding first task output data packet, and to output the first task output data packet to the user through the task output module 5.
The embodiment of the invention provides a processing system for batch image annotation, which comprises: a task scheduling module, a task input module, a manual annotation module, a manual auditing module, a task output module, a multi-modal target detection model, an image feature learning model and an image segmentation model. The task scheduling module extracts the image sequence to be detected from the annotation task received by the task input module; the manual annotation module performs target type text confirmation through interaction with the user according to the annotation mode, and selects part of the images to be detected from the image sequence to be detected for pre-annotation processing; the task scheduling module then invokes the multi-modal target detection model, the image feature learning model and the image segmentation model, and performs target detection, low-resolution detection frame filtering and semantic segmentation processing on the image sequence to be detected according to the target type text sequence and the annotation frame data set output by the manual annotation module, obtaining a corresponding detection frame segmentation data set; the manual auditing module performs manual auditing according to the image sequence to be detected, the detection frame data set and the detection frame segmentation data set; and the task scheduling module assembles the audit output of the manual auditing module into a corresponding task output data packet and outputs it through the task output module. Each time the system processes a massive image annotation task, only a small number of images need to be selected from the mass of images and pre-annotated according to the target types to be labeled; the system then automatically annotates the remaining images according to the pre-annotated target types and annotation frames, and provides a manual auditing interface for reviewing the annotation results. The system thus shortens the working time of the annotation work, improves its working efficiency, and reduces its cost.
Those of skill in the art will further appreciate that the systems, modules, units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a system, module, unit or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments has been provided to illustrate the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within its scope.

Claims (11)

1. A processing system for batch image annotation, the system comprising: the system comprises a task scheduling module, a task input module, a manual labeling module, a manual auditing module, a task output module, a multi-mode target detection model, an image feature learning model and an image segmentation model;
the task scheduling module is respectively connected with the task input module, the manual annotation module, the manual auditing module, the task output module, the multi-mode target detection model, the image feature learning model and the image segmentation model; the multi-mode target detection model defaults to a Grounding DINO model; the image feature learning model defaults to a DINOv2 model; the image segmentation model adopts a SAM model by default;
The task input module is used for sending a first annotation task input by a user to the task scheduling module; the first annotation task comprises a first annotation mode, a first task data type and first task data; the first annotation mode comprises a simple annotation mode and a complex annotation mode; the first task data type comprises an image type and a video type; when the first task data type is an image type the corresponding first task data is an image sequence, and when the first task data type is a video type the corresponding first task data is video data;
the task scheduling module is used for extracting the corresponding first annotation mode, first task data type and first task data from the received first annotation task; identifying the first task data type, and, if the first task data type is an image type, taking the first task data as a corresponding first image sequence to be detected, or, if the first task data type is a video type, performing video frame extraction processing on the first task data and forming all the extracted images into the corresponding first image sequence to be detected in chronological order; and sending the first annotation mode and the first image sequence to be detected to the manual annotation module;
The manual annotation module is used for, when the first annotation mode and the first image sequence to be detected are received, performing target type text confirmation through interaction with the user according to the first annotation mode to obtain a corresponding first target type text sequence; selecting part of the images to be detected from the first image sequence to be detected according to the first annotation mode, the first target type text sequence and interaction with the user, and performing pre-annotation processing to obtain a corresponding first annotation frame data set; and sending the first target type text sequence and the first annotation frame data set back to the task scheduling module;
the task scheduling module is further used for, when the first target type text sequence and the first annotation frame data set are received, invoking the multi-mode target detection model to perform target detection processing on the first image sequence to be detected according to the first target type text sequence to obtain a corresponding first detection frame data set; invoking the image feature learning model to respectively perform corresponding annotation/detection frame image feature recognition processing on the first annotation frame data set and the first detection frame data set to obtain a corresponding first annotation frame feature set and a corresponding first detection frame feature set; performing low-resolution detection frame filtering processing on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; invoking the image segmentation model to perform detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain a corresponding first detection frame segmentation data set; and sending the first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set to the manual auditing module;
The manual auditing module is used for performing manual auditing processing according to the received first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set, outputting a corresponding first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set, and sending them back to the task scheduling module;
the task scheduling module is further used for generating a corresponding first task output data packet from the received first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set; and outputting the first task output data packet to the user through the task output module.
2. The processing system for batch image annotation of claim 1 wherein,
the first image sequence to be detected comprises a plurality of first images to be detected, and each first image to be detected corresponds to one first image identifier;
the first target type text sequence includes one or more first target type texts; when the first annotation mode is a simple annotation mode, the first target type text sequence consists of a plurality of first target type texts, each first target type text being a target type noun without any modifier; when the first annotation mode is a complex annotation mode, the first target type text sequence comprises only one first target type text, the unique first target type text being a target type noun phrase with one or more modifiers;
The first annotation frame data set comprises a plurality of first annotation frame data; the first annotation frame data comprises a first parent image identifier, a first annotation frame image, first annotation frame center point coordinates, a first annotation frame size, a first annotation frame orientation and a first annotation frame type; the first parent image identifier corresponds to one first image identifier; the first annotation frame type corresponds to one first target type text;
the first detection frame data set comprises a plurality of first detection frame data; the first detection frame data comprises a second parent image identifier, a first detection frame image, first detection frame center point coordinates, a first detection frame size, a first detection frame orientation and a first detection frame type; the second parent image identifier corresponds to one of the first image identifiers; the first detection frame type corresponds to one first target type text;
the first detection frame segmentation data set comprises a plurality of first detection frame segmentation data; the first detection frame segmentation data comprises a second detection frame identifier and a first detection frame semantic segmentation map; the second detection frame identifier corresponds to one of the first detection frame identifiers; the pixel semantics of the first detection frame semantic segmentation map include foreground semantics and background semantics, and the foreground semantics correspond to one of the first detection frame types.
3. The processing system for batch image annotation of claim 2 wherein,
the manual annotation module is specifically configured to identify the first annotation mode when performing target type text confirmation through interaction with the user according to the first annotation mode to obtain the corresponding first target type text sequence;
if the first annotation mode is a simple annotation mode, providing a first simple target type input page for the user; receiving a plurality of target type nouns input by the user through the first simple target type input page, taking each input target type noun as a corresponding first target type text, and forming the corresponding first target type text sequence from all the obtained first target type texts;
if the first annotation mode is a complex annotation mode, providing a first complex target type input page for the user; and receiving a target type noun phrase with one or more modifiers input by the user through the first complex target type input page as the corresponding first target type text, the unique first target type text forming the corresponding first target type text sequence.
4. The processing system for batch image annotation of claim 2 wherein,
the manual annotation module is specifically configured, when pre-annotating part of the images to be detected selected from the first image sequence to be detected according to the first annotation mode and the first target type text sequence to obtain the corresponding first annotation frame data set, to provide a first pre-annotation page for the user, the first pre-annotation page being used for displaying all the first to-be-detected images of the first to-be-detected image sequence in a tiled arrangement;
when any first to-be-detected image is selected by the user, taking the currently selected first to-be-detected image as the corresponding current image; providing an annotation frame drawing function for the user to draw annotation frames on the current image so as to obtain one or more corresponding first annotation frames; taking the first image identifier of the current image as the first parent image identifier of each first annotation frame; extracting the annotation frame image of each first annotation frame on the current image as the corresponding first annotation frame image; and taking the annotation frame center point coordinates, annotation frame size and annotation frame orientation of each first annotation frame on the current image as the corresponding first annotation frame center point coordinates, first annotation frame size and first annotation frame orientation;
when any first annotation frame is selected by the user, taking the currently selected first annotation frame as the corresponding current annotation frame; identifying the first annotation mode; if the first annotation mode is a simple annotation mode, providing an annotation frame type marking function for the user to select one first target type text from the first target type text sequence as the corresponding first annotation frame type to mark the current annotation frame; if the first annotation mode is a complex annotation mode, taking the unique first target type text in the first target type text sequence as the corresponding current target type text, displaying a first prompt message with a confirmation option and a cancel option to the user, prompting the user through the first prompt message whether the current target type text is to be used as the first annotation frame type corresponding to the current annotation frame, and setting the first annotation frame type corresponding to the current annotation frame to the corresponding current target type text when the user selects the confirmation option of the first prompt message;
when a pre-annotation submission option preset on the first pre-annotation page is selected by the user, forming corresponding first annotation frame data from the first parent image identifier, the first annotation frame image, the first annotation frame center point coordinates, the first annotation frame size, the first annotation frame orientation and the first annotation frame type corresponding to each first annotation frame; and forming the corresponding first annotation frame data set from all the obtained first annotation frame data.
5. The processing system for batch image annotation of claim 2 wherein,
the task scheduling module is specifically configured to traverse the first to-be-detected image of the first to-be-detected image sequence when the multi-mode target detection model is invoked to perform target detection processing on the first to-be-detected image sequence according to the first target type text sequence to obtain a corresponding first detection frame data set; the first image to be detected which is traversed currently is used as a corresponding current image to be detected, and the first image identifier corresponding to the current image to be detected is used as a corresponding current image identifier; inputting the first target type text sequence and the current to-be-detected image into the multi-mode target detection model, and carrying out directional target detection on the current to-be-detected image by the multi-mode target detection model according to one or more first target type texts in the first target type text sequence and outputting a corresponding first detection frame-text pair set; if the first detection frame-text pair set is not empty, carrying out detection frame data assembly according to the current image identification, the current image to be detected and the first detection frame-text pair set to obtain a corresponding first detection frame data subset; when the traversing is finished, combining all the obtained first detection frame data subsets to form a corresponding first detection frame data set;
Wherein the first set of detection box-text pairs comprises a plurality of first detection box-text pairs; the first detection box-text pair comprises a first target detection box and a first text; the first target detection frame comprises a first target detection frame center point coordinate, a first target detection frame size and a first target detection frame orientation; the first text corresponds to one of the first target type texts in the sequence when the number of the first target type texts in the first target type text sequence is not unique; and when the number of the first target type texts in the first target type text sequence is unique, the first text corresponds to the unique first target type text in the sequence.
6. The processing system for batch image annotation of claim 5 wherein,
the task scheduling module is specifically configured to traverse the first detection frame-text pairs of the first detection frame-text pair set when performing detection frame data assembly according to the current image identifier, the current to-be-detected image and the first detection frame-text pair set to obtain the corresponding first detection frame data subset; during the traversal, taking the currently traversed first detection frame-text pair as the corresponding current detection frame-text pair; taking the current image identifier as the corresponding second parent image identifier; allocating a unique detection frame identifier to the first target detection frame of the current detection frame-text pair as the corresponding first detection frame identifier; extracting the detection frame image of the first target detection frame of the current detection frame-text pair on the current to-be-detected image as the corresponding first detection frame image; taking the first target detection frame center point coordinates, first target detection frame size and first target detection frame orientation of the first target detection frame of the current detection frame-text pair as the corresponding first detection frame center point coordinates, first detection frame size and first detection frame orientation; taking the first text of the current detection frame-text pair as the corresponding first detection frame type; forming corresponding first detection frame data from the obtained second parent image identifier, first detection frame image, first detection frame center point coordinates, first detection frame size, first detection frame orientation and first detection frame type; and, when the traversal is finished, forming the corresponding first detection frame data subset from all the obtained first detection frame data.
7. The processing system for batch image annotation of claim 2 wherein,
the task scheduling module is specifically configured, when invoking the image feature learning model to perform corresponding annotation/detection frame image feature recognition processing on the first annotation frame data set and the first detection frame data set to obtain the corresponding first annotation frame feature set and first detection frame feature set, to input each first annotation frame image of the first annotation frame data set into the image feature learning model, the image feature learning model performing image feature extraction processing on the first annotation frame image to obtain the corresponding first annotation frame feature; to input each first detection frame image of the first detection frame data set into the image feature learning model, the image feature learning model performing image feature extraction processing on the first detection frame image to obtain the corresponding first detection frame feature; and to form the corresponding first annotation frame feature set from all the obtained first annotation frame features, and the corresponding first detection frame feature set from all the obtained first detection frame features.
8. The processing system for batch image annotation of claim 2 wherein,
the task scheduling module is specifically configured to traverse the first detection frame features of the first detection frame feature set when performing the low-resolution detection frame filtering processing on the first detection frame data set according to the first annotation frame feature set and the first detection frame feature set; during the traversal, to take the currently traversed first detection frame feature as the corresponding current detection frame feature, and the first detection frame type of the first detection frame data corresponding to the current detection frame feature as the corresponding current detection frame type; to take each first annotation frame data in the first annotation frame data set matching the current detection frame type as corresponding matching annotation frame data; to take the first annotation frame features corresponding to the matching annotation frame data in the first annotation frame feature set as corresponding similar annotation frame features; to match and score the current detection frame feature against each similar annotation frame feature based on the Hungarian matching algorithm to obtain corresponding first scores, and to average all the obtained first scores to generate a corresponding first average score; and to delete the first detection frame data corresponding to the current detection frame feature from the first detection frame data set when the first average score is lower than a preset scoring threshold.
9. The processing system for batch image annotation of claim 2 wherein,
the task scheduling module is specifically configured to traverse the first detection frame data of the first detection frame data set when invoking the image segmentation model to perform the detection frame image semantic segmentation processing on the filtered first detection frame data set to obtain the corresponding first detection frame segmentation data set; during the traversal, to take the currently traversed first detection frame data as the corresponding current detection frame data; to input the first detection frame image of the current detection frame data into the image segmentation model, the image segmentation model performing pixel-level foreground/background semantic segmentation processing on the first detection frame image to generate the corresponding first detection frame semantic segmentation map; to mark each pixel point on the first detection frame semantic segmentation map whose pixel semantics are not background semantics as a corresponding first foreground pixel point, and to set the pixel semantics of each first foreground pixel point to the first detection frame type of the current detection frame data; to take the first detection frame identifier of the current detection frame data as the corresponding second detection frame identifier; to form corresponding first detection frame segmentation data from the obtained second detection frame identifier and first detection frame semantic segmentation map; and, when the traversal is finished, to form the corresponding first detection frame segmentation data set from all the obtained first detection frame segmentation data.
10. The processing system for batch image annotation of claim 2 wherein,
the manual auditing module is specifically configured, when the first image sequence to be detected, the first detection frame data set and the first detection frame segmentation data set are received, to merge the first detection frame data set and the first detection frame segmentation data set according to the correspondence of detection frame identifiers to obtain a corresponding second detection frame data set, to perform manual auditing processing, and to output the corresponding first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set to the task scheduling module; wherein the second detection frame data set includes a plurality of second detection frame data; the second detection frame data comprises the second parent image identifier, the first detection frame image, the first detection frame center point coordinates, the first detection frame size, the first detection frame orientation, the first detection frame type and the first detection frame semantic segmentation map;
traversing each first to-be-detected image of the first to-be-detected image sequence; during the traversal, taking the currently traversed first to-be-detected image as the corresponding current to-be-detected image; taking the first image identifier corresponding to the current to-be-detected image as the corresponding current image identifier; recording each second detection frame data in the second detection frame data set whose second parent image identifier matches the current image identifier as corresponding first matching detection frame data; identifying whether the number of first matching detection frame data is zero; if the number of first matching detection frame data is zero, marking the current to-be-detected image as a corresponding first image to be filtered; if the number of first matching detection frame data is not zero, performing corresponding detection frame drawing, foreground semantic pixel coloring and text prompt box drawing processing on the current to-be-detected image according to all the first matching detection frame data to obtain a corresponding first review image; and, when the traversal is finished, providing a first image review page for the user and displaying all the first review images in a tiled arrangement on the first image review page;
when any first review image is selected by the user, displaying a second prompt message with a confirmation option and a cancel option to the user, prompting the user through the second prompt message whether the currently selected first review image is to be marked as an unqualified image, and marking the currently selected first review image as the corresponding first image to be filtered when the user selects the confirmation option of the second prompt message;
when the preset end-of-review option on the first image review page is selected by the user, deleting the first to-be-detected images corresponding to the first images to be filtered from the first to-be-detected image sequence, and taking the deleted image sequence as the corresponding first audit image sequence; taking each first detection frame data in the first detection frame data set whose second parent image identifier corresponds to a first image to be filtered as corresponding first detection frame data to be deleted; deleting from the first detection frame segmentation data set the first detection frame segmentation data whose second detection frame identifier corresponds to each first detection frame data to be deleted, and taking the deleted data set as the corresponding first audit detection frame segmentation data set; deleting all the first detection frame data to be deleted from the first detection frame data set, and taking the deleted data set as the corresponding first audit detection frame data set; and sending the obtained first audit image sequence, first audit detection frame data set and first audit detection frame segmentation data set back to the task scheduling module.
11. The processing system for batch image annotation of claim 10 wherein,
the manual auditing module is specifically configured, when performing the corresponding detection frame drawing, foreground semantic pixel coloring and text prompt box drawing processing on the current to-be-detected image according to all the first matching detection frame data to obtain the corresponding first review image, to traverse each first matching detection frame data; during the traversal, to take the currently traversed first matching detection frame data as the corresponding current matching detection frame data; to draw a detection frame on the current to-be-detected image according to the first detection frame center point coordinates, first detection frame size and first detection frame orientation of the current matching detection frame data to obtain a corresponding first drawing frame; to mark the foreground semantic pixel points of the image inside the first drawing frame according to the first detection frame semantic segmentation map of the current matching detection frame data, and to set the color of the foreground semantic pixel points of the first drawing frame with the preset first color; to draw a text prompt box at a designated position on the first drawing frame as a corresponding first text box, and to set the text content of the first text box to the first detection frame type of the current matching detection frame data; and, when the traversal is finished, to take the current to-be-detected image with the added drawing information as the corresponding first review image.
CN202311438309.4A 2023-11-01 2023-11-01 Processing system for batch image annotation Pending CN117275025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311438309.4A CN117275025A (en) 2023-11-01 2023-11-01 Processing system for batch image annotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311438309.4A CN117275025A (en) 2023-11-01 2023-11-01 Processing system for batch image annotation

Publications (1)

Publication Number Publication Date
CN117275025A true CN117275025A (en) 2023-12-22

Family

ID=89214476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311438309.4A Pending CN117275025A (en) 2023-11-01 2023-11-01 Processing system for batch image annotation

Country Status (1)

Country Link
CN (1) CN117275025A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690031A (en) * 2024-02-04 2024-03-12 中科星图数字地球合肥有限公司 SAM model-based small sample learning remote sensing image detection method
CN117690031B (en) * 2024-02-04 2024-04-26 中科星图数字地球合肥有限公司 SAM model-based small sample learning remote sensing image detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination