CN113806573A - Labeling method, labeling device, electronic equipment, server and storage medium


Info

Publication number
CN113806573A
CN113806573A (application CN202111082491.5A)
Authority
CN
China
Prior art keywords
image
server
information
labeled
binary image
Prior art date
Legal status
Pending
Application number
CN202111082491.5A
Other languages
Chinese (zh)
Inventor
罗泽丰
何聪辉
Current Assignee
Shanghai Sensetime Technology Development Co Ltd
Original Assignee
Shanghai Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sensetime Technology Development Co Ltd filed Critical Shanghai Sensetime Technology Development Co Ltd
Priority to CN202111082491.5A priority Critical patent/CN113806573A/en
Publication of CN113806573A publication Critical patent/CN113806573A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/54 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04842 Selection of displayed objects or displayed text elements

Abstract

Embodiments of the present application provide a labeling method, a labeling apparatus, an electronic device, a server, and a storage medium. The labeling method includes: acquiring position information of an object to be labeled in an image; uploading the position information to a server, where the server inputs the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and converts the binary image into annotation point information; and receiving the annotation point information sent by the server, rendering according to the annotation point information, and displaying the result in the image. Embodiments of the present application can improve labeling efficiency.

Description

Labeling method, labeling device, electronic equipment, server and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a labeling method and apparatus, an electronic device, a server, and a storage medium.
Background
Image semantic segmentation is an important branch of artificial intelligence and a key part of image understanding in machine vision, and it cannot do without a large amount of manual labeling. In current manual labeling, there are scenarios in which every part of a scene must be labeled. For example, a landscape image may simultaneously contain sky, lawn, people, trees, and various small animals, so an image may hold many objects to be labeled, and during manual labeling each object in the image must be outlined with a software tool. When the number of objects to be labeled is large, labeling efficiency is low.
Disclosure of Invention
Embodiments of the present application provide a labeling method, a labeling apparatus, an electronic device, a server, and a storage medium, which can improve labeling efficiency.
A first aspect of the embodiments of the present application provides a labeling method applied to an electronic device, the method comprising:
acquiring position information of an object to be labeled in an image;
uploading the position information to a server, where the server is configured to input the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and to convert the binary image into annotation point information;
and receiving the annotation point information sent by the server, rendering according to the annotation point information, and displaying the result in the image.
Optionally, the obtaining of the position information of the object to be labeled in the image includes:
selecting, in response to a graphic frame selection instruction input by a user, an image area containing the object to be labeled from the image;
selecting, in response to a positioning instruction input by the user, a positioning point for the object to be labeled from the image area;
and generating the position information of the object to be labeled according to the coordinate information of the image area and the coordinate information of the positioning point.
Optionally, the method further includes:
and uploading the identification information of the image to a server, wherein the server is further used for acquiring the image from an image library according to the identification information before inputting the image and the position information into a neural network model.
Optionally, if the image area is a rectangular area, the coordinate information of the image area includes coordinate information of four vertices of the image area.
Optionally, after the rendering according to the annotation point information and displaying in the image, the method further includes:
fine-tuning the annotation point information in response to an adjustment instruction input by a user, to obtain adjusted annotation point information for the object to be labeled.
A second aspect of the embodiments of the present application provides a labeling method applied to a server, the method comprising:
receiving identification information and position information of an object to be labeled sent by an electronic device;
obtaining an image corresponding to the identification information, and inputting the position information and the image into a neural network model to obtain a binary image containing the object to be labeled;
converting the binary image into annotation point information;
and sending the annotation point information to the electronic device.
Optionally, after the binary image containing the object to be labeled is obtained, the method further includes:
determining whether the binary image contains only one object to be labeled;
and if the binary image contains only one object to be labeled, performing the step of converting the binary image into annotation point information.
Optionally, if the binary image contains at least two objects to be labeled, a prompt message is sent to the electronic device;
the prompt message is used to prompt the user to reselect an image area in the image.
Optionally, the determining whether the binary image contains only one object to be labeled includes:
determining the number of black connected components in the binary image;
if the number of black connected components is one, the binary image contains only one object to be labeled;
and if the number of black connected components is at least two, the binary image contains at least two objects to be labeled.
A third aspect of the embodiments of the present application provides a labeling apparatus applied to an electronic device, the labeling apparatus comprising:
a first acquisition unit, configured to acquire position information of an object to be labeled in an image;
an uploading unit, configured to upload the position information to a server, where the server is configured to input the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and to convert the binary image into annotation point information;
a first receiving unit, configured to receive the annotation point information sent by the server;
and a display unit, configured to render according to the annotation point information and display the result in the image.
A fourth aspect of the embodiments of the present application provides a labeling apparatus applied to a server, the labeling apparatus comprising:
a second receiving unit, configured to receive identification information and position information of an object to be labeled sent by an electronic device;
a second obtaining unit, configured to obtain an image corresponding to the identification information and input the position information and the image into a neural network model to obtain a binary image containing the object to be labeled;
a converting unit, configured to convert the binary image into annotation point information;
and a sending unit, configured to send the annotation point information to the electronic device.
A fifth aspect of embodiments of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions in the first aspect of embodiments of the present application.
A sixth aspect of the embodiments of the present application provides a server comprising a processor and a memory, the memory being configured to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the step instructions as in the second aspect of the embodiments of the present application.
A seventh aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to make a computer execute some or all of the steps described in the first aspect of embodiments of the present application.
An eighth aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to make a computer execute some or all of the steps described in the second aspect of embodiments of the present application.
A ninth aspect of an embodiment of the present application provides a computer program product, wherein the computer program product includes a computer program, and when the computer program is executed by a computer, the computer executes some or all of the steps described in the first aspect of the embodiment of the present application. The computer program product may be a software installation package.
A tenth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a computer program, and the computer program, when executed by a computer, causes the computer to perform some or all of the steps as described in the second aspect of embodiments of the present application. The computer program product may be a software installation package.
In the embodiments of the present application, the electronic device acquires the position information of an object to be labeled in an image and uploads the position information to a server; the server inputs the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and converts the binary image into annotation point information; the electronic device then receives the annotation point information sent by the server, renders according to the annotation point information, and displays the result in the image. Compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server; the server can then obtain the annotation point information for that object without manual labeling by the user, which improves labeling efficiency.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a diagram of a communication connection architecture of an electronic device and a server according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an annotation method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an image coordinate system provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of another annotation method provided in the embodiments of the present application;
FIG. 5a is a schematic diagram illustrating an image region being selected according to a frame selection instruction according to an embodiment of the present application;
FIG. 5b is a diagram illustrating another example of selecting an image area according to a frame selection command according to an embodiment of the present application;
FIG. 5c is a schematic diagram illustrating a method for selecting a location point in an image according to a location instruction according to an embodiment of the present application;
FIG. 5d is a schematic diagram illustrating fine tuning according to the annotation point information according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of another annotation method provided in the embodiments of the present application;
FIG. 7 is a schematic structural diagram of a labeling apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another labeling apparatus provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device according to the embodiments of the present application may include a device with computing capability and communication capability, such as a mobile phone, a tablet, a personal computer, and the like. Personal computers, which may also be referred to as user computers, may include desktop computers, notebook computers, and the like. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices.
The server according to the embodiment of the present application may be a server having an image processing function.
Referring to fig. 1, fig. 1 is a diagram illustrating a communication connection architecture between an electronic device and a server according to an embodiment of the present disclosure. As shown in fig. 1, an electronic device 101 may communicate with a server 102. There may be a plurality of electronic devices. The electronic device 101 may obtain position information of an object to be annotated in the image; uploading the position information to a server 102, inputting the image and the position information into a neural network model by the server 102 to obtain a binary image containing an object to be labeled, and converting the binary image into labeling point information; the server 102 sends the annotation point information to the electronic device 101, and the electronic device 101 renders the annotation point information and displays the annotation point information in an image.
The electronic device 101 can be understood as the front end and the server 102 as the back end; a browser, such as a web browser, may be installed on the electronic device 101, and the server 102 may be a web server. The architecture of the electronic device 101 and the server 102 may be a browser-server (B/S) architecture, in which the web browser is the most important client application. This mode unifies the client, centralizes the core of the system's functionality on the server, and simplifies the development, maintenance, and use of the system. The client only needs a browser installed, such as Netscape Navigator or Internet Explorer, while the server can run databases such as SQL Server, Oracle, or MySQL; the browser exchanges data with the database through the web server. The advantage of the B/S architecture is that the client communicates with the server through the browser, no client software needs to be maintained, and the system is very easy to extend.
In the embodiments of the present application, compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server; the server can then obtain the annotation point information for that object without manual labeling by the user, which improves labeling efficiency.
Referring to fig. 2, fig. 2 is a schematic flow chart of a labeling method according to an embodiment of the present application. As shown in fig. 2, the labeling method may include the following steps.
201. The electronic device acquires the position information of the object to be labeled in the image.
In the embodiment of the application, the object to be labeled is an object which is not labeled in the image.
In current manual labeling, there are scenarios in which every part of a scene must be semantically labeled. For example, a landscape image may simultaneously contain sky, lawn, people, trees, and various small animals, so the image may contain many objects to be labeled, and during manual labeling each of them must be labeled with a software tool. If, say, the people, trees, and small animals in the picture all need to be labeled and they are numerous, the number of objects to be labeled is large, and an annotator working with a software tool labels them slowly. The software tool may be canvas2D, a tool that implements 2D graphics drawing through script (usually JavaScript). canvas2D can provide a <canvas> tag on which graphics are drawn through script. Different <canvas> tags can draw graphics of different shapes: a <canvas> tag is a container for graphics, and once a tag is selected, the graphic of the chosen shape is drawn through script. For example, canvas2D can be used to draw paths, boxes, circles, and other graphics.
The position information of the object to be labeled may be the coordinate information of the edge of the object. Each pixel in the image can correspond to a coordinate. For example, an image coordinate system can be established: a rectangular coordinate system x-y in units of pixels with the upper-left corner of the image as the origin. The abscissa x and the ordinate y of a pixel are its column index and row index in the image, respectively. Referring to fig. 3, fig. 3 is a schematic diagram of an image coordinate system provided by an embodiment of the present application. As shown in fig. 3, the white area is the image, the upper-left corner of the image is the origin O with coordinates (0, 0), the abscissa is x, and the ordinate is y. The coordinates of a pixel P in the image are (x1, y1).
Since the shape of an object to be labeled is irregular, embodiments of the present application can select it with a polygon selection box, and the position information of the object may include the coordinates of each vertex of the polygon selection box. For example, if the polygon selection box is rectangular, the position information may include the coordinates of its four vertices; if it is triangular, the position information may include the coordinates of its three vertices.
Optionally, the obtaining of the position information of the object to be labeled in the image in step 201 may include the following steps:
(11) the electronic device selects, in response to a graphic frame selection instruction input by a user, an image area containing the object to be labeled from the image;
(12) the electronic device selects, in response to a positioning instruction input by the user, a positioning point for the object to be labeled from the image area;
(13) the electronic device generates the position information of the object to be labeled according to the coordinate information of the image area and the coordinate information of the positioning point.
In this embodiment, the graphic frame selection instruction may be a polygonal graphic frame selection instruction, such as a rectangular frame selection instruction, a triangular frame selection instruction, a pentagonal frame selection instruction, a hexagonal frame selection instruction, and the like.
The user inputs the graphic frame selection instruction in the image with a mouse and selects an image area from the image with the graphic selection frame; the size of the selection frame can be adjusted with the mouse to adjust the size of the image area, so that the image area contains the object to be labeled.
The user can click a point on the object to be labeled in the image area with the mouse, and that point serves as a positioning point of the object. A positioning point is a point selected by the user from the image area that indicates the center position of the object to be labeled; in theory it can lie anywhere on the object, but to position the object better the user can choose the point considered closest to its center. There can be one or more positioning points: the more there are, the more accurately the object is positioned and the more accurate the subsequently generated binary image.
In the embodiments of the present application, the position information of the object to be labeled can include the coordinate information of the image area and the coordinate information of the positioning point, in preparation for the subsequent input to the neural network model. The binary image generated by the model can also be calibrated with the coordinate information of the positioning point, ensuring that the positioning point falls within the foreground region of the binary image and further improving the accuracy of the generated binary image.
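As a concrete illustration of step (13), the position information might be assembled as follows. This is only a sketch: the patent does not specify a data format, and the function name and the "box" and "points" fields are hypothetical.

```python
# Illustrative sketch only: the patent does not specify a payload format,
# so the field names "box" and "points" are hypothetical.
def build_position_info(box_vertices, anchor_points):
    """Combine the coordinate information of the selected image area and of
    the positioning points into the position information uploaded to the server.

    box_vertices  : four (x, y) vertices of the rectangular selection frame
    anchor_points : (x, y) points the user clicked on the object to be labeled
    """
    return {
        "box": [list(v) for v in box_vertices],
        "points": [list(p) for p in anchor_points],
    }

# Example: a rectangle from (120, 80) to (480, 360) with one click near the center.
position_info = build_position_info(
    [(120, 80), (480, 80), (480, 360), (120, 360)],
    [(300, 220)],
)
```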
Optionally, the coordinate information of the image area includes coordinate information of four vertices of the image area.
In the embodiments of the present application, the image area is described taking a quadrilateral area as an example. Specifically, the quadrilateral area may be a rectangular area.
202. The electronic device uploads the position information to the server, and the server inputs the image and the position information into the neural network model to obtain a binary image containing the object to be labeled and converts the binary image into annotation point information.
In the embodiments of the present application, the neural network model is a model that generates a binary image from the image and the position information; the threshold used for generating the binary image can be set differently for different images.
A binary image is an image in which each pixel has only two possible values or gray levels; binary images are usually shown in black and white. In a binary image there are only two gray levels: the gray value of any pixel is either 0 or 255, representing black and white respectively. All pixels whose gray value exceeds a certain threshold can be set to 255, and all pixels below the threshold to 0. Because the gray values of an object in an image are usually concentrated in one interval while those of the background lie in another, a binary image can effectively separate the background from the object to be labeled (the foreground).
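As a minimal sketch of this thresholding step (using OpenCV, consistent with the Python-CV2 library named later in the description; the file name and the fixed threshold of 127 are illustrative assumptions, since the description says the threshold may be chosen per image):

```python
import cv2

# Read the image in grayscale and binarize it: pixels whose gray value
# exceeds the threshold become 255 (white), the rest become 0 (black).
# The threshold 127 is only an example; per the description it may be
# set differently for different images.
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
```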
The server converts the binary image into annotation point information; specifically, the edge of the foreground in the binary image can be traced and connected with annotation points, yielding the annotation point information of the object to be labeled. The annotation point information may be a point set composed of multiple annotation points, for example the coordinate information of multiple annotation points.
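One plausible way to implement this conversion with the Python-CV2 library named later in the description is to trace the foreground contour; the patent does not give the exact mechanism, so the following is only a sketch. Since the description treats black as the foreground, the mask is inverted before tracing, because cv2.findContours traces nonzero (white) regions.

```python
import cv2

def binary_image_to_points(binary):
    """Extract annotation points along the foreground edge of a binary image.

    Returns a list of (x, y) coordinates that, connected in order, outline
    the object to be labeled.
    """
    inverted = cv2.bitwise_not(binary)  # black foreground -> nonzero for OpenCV
    # findContours (OpenCV 4) traces the boundaries of nonzero regions;
    # CHAIN_APPROX_SIMPLE keeps only the points needed to describe the outline.
    contours, _ = cv2.findContours(inverted, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)  # assume a single object
    return [(int(x), int(y)) for [[x, y]] in largest]
```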
In the embodiments of the present application, the server can obtain the image from the electronic device or from an image library.
Optionally, the following step may also be performed in the embodiments of the present application:
(21) the electronic device uploads the identification information of the image to the server, and the server obtains the image from an image library according to the identification information before inputting the image and the position information into the neural network model.
Step (21) may be performed simultaneously with step 202, before step 202, or after step 202.
When step (21) is performed simultaneously with step 202, the electronic device can upload the position information of the object to be labeled together with the identification information of the image to the server. The position information and the identification information can be placed in the same message, so the server knows which image the position information refers to.
In the embodiments of the present application, the image itself does not need to be uploaded to the server: the server obtains the image directly from the image library according to the image identification, which saves the upload step and can improve labeling efficiency.
203. The electronic device receives the annotation point information sent by the server, renders according to the annotation point information, and displays the result in the image.
In the embodiments of the present application, after receiving the annotation point information sent by the server, the electronic device can locate the corresponding annotation points in the image according to the coordinates of the annotation points included in the information, render them, and display them in the image.
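The patent's front end renders with canvas2D in the browser; purely as an illustration of the same step, an equivalent rendering can be sketched in Python with OpenCV (the color and thickness values are arbitrary choices):

```python
import cv2
import numpy as np

def render_annotation(image, points, color=(255, 255, 255)):
    """Draw the annotation points received from the server on the image
    as a closed polygon, marking each annotation point with a small circle."""
    pts = np.array(points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(image, [pts], isClosed=True, color=color, thickness=2)
    for x, y in points:
        cv2.circle(image, (int(x), int(y)), 4, color, -1)
    return image
```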
In the embodiments of the present application, the annotation point information is produced by the server, and no manual labeling is needed on the electronic device side. Compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server; the server can then obtain the annotation point information for that object without manual labeling by the user, which improves labeling efficiency.
Optionally, in step 202, after obtaining the binary image containing the object to be labeled, the server may further determine whether the binary image contains only one object to be labeled, and convert the binary image into annotation point information only when it contains exactly one; when the binary image contains at least two objects to be labeled, the server sends a prompt message to the electronic device.
Optionally, after performing step 202, the method shown in fig. 2 may further include the following steps:
the electronic device receives the prompt message sent by the server and continues to execute step 201.
Referring to fig. 4, fig. 4 is a schematic flow chart of another labeling method according to an embodiment of the present application. Fig. 4 is further optimized based on fig. 2, and as shown in fig. 4, the labeling method may include the following steps.
401. The electronic device obtains the position information of the object to be labeled in the image.
402. The electronic device uploads the position information to the server, and the server inputs the image and the position information into the neural network model to obtain a binary image containing the object to be labeled and converts the binary image into annotation point information.
403. The electronic device receives the annotation point information sent by the server, renders according to the annotation point information, and displays the result in the image.
For the specific implementation of steps 401 to 403, refer to steps 201 to 203 shown in fig. 2; details are not repeated here.
404. The electronic device fine-tunes the annotation point information in response to an adjustment instruction input by the user, obtaining adjusted annotation point information for the object to be labeled.
In the embodiments of the present application, the adjustment instruction input by the user may be a drag instruction applied to an annotation point in the software tool. The annotation point information sent by the server is not necessarily accurate and may contain errors; in that case the user can fine-tune it with the software tool so that the adjusted annotation point information is more accurate. Because the user's adjustment starts from the existing annotation point information, labeling does not have to start from scratch as in purely manual labeling, so labeling efficiency is higher.
The following describes a specific flow of the labeling method of the present application with reference to fig. 5a to 5d. Referring to fig. 5a, fig. 5a is a schematic diagram of selecting an image region according to a graphic frame selection instruction provided by an embodiment of the present application. Fig. 5b is a schematic diagram of another example of selecting an image region according to a graphic frame selection instruction provided by an embodiment of the present application.
As shown in fig. 5a, the user can use the mouse to set the starting point of a rectangular frame in the image; the starting point is located at the upper-left corner of the object to be labeled (such as the notebook computer shown in fig. 5a). As shown in fig. 5b, the user can then set the end point of the rectangular frame, located at the lower-right corner of the object to be labeled (such as the notebook computer shown in fig. 5b). A rectangular image area (the rectangular dashed frame in fig. 5b) is determined with the start point and end point as a diagonal, and the object to be labeled lies inside this image area. The position information of the object to be labeled may include the four vertex coordinates of the image area.
Referring to fig. 5c, fig. 5c is a schematic diagram of selecting a positioning point in an image according to a positioning instruction provided by an embodiment of the present application. As shown in fig. 5c, the user can click a positioning point of the object to be labeled (such as the notebook computer shown in fig. 5c) with the mouse; the positioning point may be the visually estimated center of the notebook computer or the center of the selected rectangular frame. By clicking the target object, the coordinates of the positioning point are obtained. The electronic device sends the position information of the object to be labeled to the server, including the coordinate values of the four vertices of the rectangular frame and the coordinates of the positioning point. The server feeds the image and the position information into the neural network model, the model outputs a binary image, and the binary image is converted into annotation point coordinates through the Python-CV2 library and returned to the electronic device for rendering.
Fig. 5d is a schematic diagram of fine-tuning according to the annotation point information provided by an embodiment of the present application. As shown in fig. 5d, the white circles are the rendered annotation points, located on the edge of the notebook computer; connecting the annotation points forms a closed polygonal frame, and the object inside it is the object to be labeled (such as the notebook computer shown in fig. 5d). The annotator can fine-tune the positions of the annotation points so that the adjusted annotation point information is more accurate.
Referring to fig. 6, fig. 6 is a schematic flow chart of another labeling method according to an embodiment of the present application. As shown in fig. 6, the method may be applied to a server, and the method shown in fig. 6 may include the following steps.
601. The server receives the identification information and the position information of the object to be labeled sent by the electronic device.
In the embodiments of the present application, the identification information may be identification information of an image, for example the number of the image. The images may be stored in an image library, where each image has a different number.
The position information of the object to be labeled may include vertex coordinates of an image region including the object to be labeled and coordinates of a positioning point of the object to be labeled.
602. The server obtains the image corresponding to the identification information and inputs the position information and the image into the neural network model to obtain a binary image containing the object to be labeled.
In the embodiment of the application, the server can acquire the image corresponding to the identification information from the image library, and the electronic device does not need to transmit the image to the server, so that the annotation efficiency can be improved.
The neural network model may also be called a binary-image neural network model; its purpose is to binarize the region of the image that lies within the vertex coordinates of the image area, obtaining a binary image of that image area.
The neural network model may be a trained model obtained through supervised training: specifically, an original image and its ground-truth label are input, and the model parameters are optimized according to how well the label predicted by the model matches the ground truth. The model may include convolutional layers, deconvolutional layers, pooling layers (including a max pooling layer and a mean pooling layer), and a Dropout layer. The max pooling layer can highlight the edge features of the original image, the mean pooling layer preserves position features in the image, and the Dropout layer adds noise to prevent overfitting; the convolutional and deconvolutional layers can form a symmetric structure, which makes learning the training-set labels more natural.
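The patent does not fix a concrete architecture, so the following PyTorch sketch is only one plausible reading of this paragraph: a small symmetric structure with a max pooling layer, a mean pooling layer, a Dropout layer, and mirrored convolution/deconvolution, producing a single-channel mask. The channel counts and the idea of encoding the position information as extra input channels are assumptions.

```python
import torch
import torch.nn as nn

class BinaryMaskNet(nn.Module):
    """Illustrative sketch only: maps an image (plus position information
    encoded as extra input channels, e.g. a box mask and a click mask)
    to a single-channel foreground mask. Layer sizes are assumptions."""

    def __init__(self, in_channels=5):  # e.g. RGB + box channel + click channel
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),       # max pooling highlights edge features
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d(2),       # mean pooling preserves position features
            nn.Dropout2d(0.2),     # added noise to prevent overfitting
        )
        self.decoder = nn.Sequential(  # deconvolutions mirror the encoder
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),
            nn.Sigmoid(),          # per-pixel foreground probability
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Thresholding the output at 0.5 would yield the binary image.
```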
603. The server converts the binary image into annotation point information.
In the embodiments of the present application, the annotation point information may be the coordinates of annotation points in the image, and the server may convert the binary image into annotation point coordinates through the Python-CV2 library.
604. The server sends the annotation point information to the electronic device.
In the embodiments of the present application, the annotation point information is obtained by the server: the server only needs to input the image corresponding to the identification information and the position information of the object to be labeled into the neural network model to obtain a binary image containing the object, and then convert the binary image into annotation point information. No manual labeling is needed on the electronic device side; compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server, which then obtains the annotation point information for that object, improving labeling efficiency.
Optionally, after step 602 is performed, the following steps may be performed:
(31) the server determines whether the binary image contains only one object to be labeled;
(32) if the binary image contains only one object to be labeled, step 603 is performed.
In the embodiments of the present application, the server can determine whether the binary image contains only one object to be labeled by counting the black connected components in the binary image. The black area of the binary image is the foreground and the white area is the background. If there is exactly one black connected component, the binary image contains only one object to be labeled; if there are at least two, the binary image contains at least two objects to be labeled.
The purpose of determining whether the binary image contains only one object to be labeled is as follows. If the binary image contains only one object to be labeled, the image area selected by the graphic frame selection instruction on the electronic device side contains only that object. If the selected image area contains at least two objects to be labeled, the image area selected by the user's graphic frame selection instruction may be too large. For example, the dashed box in fig. 5b contains not only the notebook computer but possibly other objects with similar gray values (such as the plastic bottle in fig. 5b), which would introduce errors into the generated annotation point information.
Generally speaking, the gray values of the same object lie in the same interval. If the gray values of two objects are close, the two objects cannot be distinguished by binarization, and the binary image may then contain two black connected components, each representing one of the objects. Judging whether the binary image contains only one object to be labeled according to the number of black connected components therefore makes it possible to determine accurately whether other objects are included, avoiding errors in the annotation point information generated from the binary image.
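A sketch of the connected-component check with OpenCV, under the stated convention that black is the foreground: cv2.connectedComponents counts nonzero (white) regions, so the binary image is inverted first.

```python
import cv2

def count_black_components(binary):
    """Count the black connected components (candidate objects) in a binary image."""
    inverted = cv2.bitwise_not(binary)           # make the black foreground nonzero
    num_labels, _ = cv2.connectedComponents(inverted)
    return num_labels - 1                        # label 0 is the background

# Step (31): proceed to step 603 only when exactly one object was segmented;
# otherwise send a prompt message asking the user to reselect the image area.
```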
Optionally, after the step (31) is executed, the following steps may also be executed:
(33) and if the binary image contains at least two objects to be labeled, the server sends a prompt message to the electronic equipment, wherein the prompt message is used for prompting a user to reselect an image area in the image.
In the embodiments of the present application, if the binary image contains at least two objects to be labeled, the server sends a prompt message to the electronic device to prompt the user to reselect an image area in the image.
Optionally, after the step (31) is executed, the following steps may also be executed:
(34) if the binary image contains one object to be labeled and at least one already-labeled object, the server sends a prompt message to the electronic device, where the prompt message is used to prompt the user to reselect an image area in the image.
In the embodiments of the present application, the annotation point information of an already-labeled object in the image is known to the server. It is therefore only necessary to determine whether the annotation point coordinates of the already-labeled object fall within the binary image; if they do, the binary image contains the already-labeled object.
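A minimal sketch of that check, assuming the annotation point coordinates index directly into the binary image and that black (gray value 0) marks the foreground:

```python
def contains_labeled_object(binary, labeled_points):
    """Return True if any annotation point of an already-labeled object
    falls inside the foreground (black) region of the binary image."""
    height, width = binary.shape[:2]
    return any(
        0 <= x < width and 0 <= y < height and binary[y, x] == 0
        for x, y in labeled_points
    )
```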
According to the embodiments of the present application, after the binary image is obtained, it is determined whether the binary image contains only one object to be labeled, and the binary image is converted into annotation point information only when it does. This prevents errors in the annotation point information generated from a binary image that contains at least two objects to be labeled, thereby improving labeling accuracy.
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions in order to realize the above-mentioned functions. Those of skill in the art will readily appreciate that the present application is capable of hardware or a combination of hardware and computer software implementing the various illustrative elements and algorithm steps described in connection with the embodiments provided herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the server and the electronic device may be divided according to the above method examples, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a labeling apparatus 700 provided in an embodiment of the present application, where the labeling apparatus 700 is applied to an electronic device, and the labeling apparatus 700 may include a first obtaining unit 701, an uploading unit 702, a first receiving unit 703 and a display unit 704, where:
a first obtaining unit 701, configured to obtain position information of an object to be labeled in an image;
an uploading unit 702, configured to upload the location information to a server, where the server is configured to input the image and the location information into a neural network model, obtain a binary image including the object to be labeled, and convert the binary image into labeling point information;
a first receiving unit 703, configured to receive the annotation point information sent by the server;
and a display unit 704, configured to render according to the annotation point information and display in the image.
Optionally, the acquiring the position information of the object to be labeled in the image by the first acquiring unit 701 includes: responding to a graphic frame selection instruction input by a user, and selecting an image area containing an object to be annotated from the image; in response to a positioning instruction input by a user, selecting a positioning point for the object to be marked from the image area; and generating the position information of the object to be marked according to the coordinate information of the image area and the coordinate information of the positioning point.
Optionally, the uploading unit 702 is further configured to upload the identification information of the image to a server, and the server is further configured to obtain the image from an image library according to the identification information before inputting the image and the location information into a neural network model.
Optionally, if the image area is a rectangular area, the coordinate information of the image area includes coordinate information of four vertices of the image area.
The labeling apparatus 700 may further include a fine-tuning unit 705:
the fine-tuning unit 705 is configured to, after the display unit 704 renders the annotation point information and displays it in the image, fine-tune the annotation point information in response to an adjustment instruction input by a user, to obtain adjusted annotation point information for the object to be labeled.
In this embodiment, the first obtaining unit 701 may be an input/output device (e.g., a mouse or a display screen) of an electronic device, the uploading unit 702 and the first receiving unit 703 may be communication modules of the electronic device, and the fine-tuning unit 705 may be a processor of the electronic device. The display unit 704 may be a display device of the electronic apparatus.
In the embodiments of the present application, compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server; the server can then obtain the annotation point information for that object without manual labeling by the user, which improves labeling efficiency.
Referring to fig. 8, fig. 8 is a schematic structural diagram of another annotation apparatus provided in this embodiment of the application, where the annotation apparatus 800 is applied to a server, and the annotation apparatus 800 may include a second receiving unit 801, a second obtaining unit 802, a converting unit 803, and a sending unit 804, where:
a second receiving unit 801, configured to receive the identification information and the position information of the object to be labeled, which are sent by the electronic device;
a second obtaining unit 802, configured to obtain an image corresponding to the identification information, and input the position information and the image into a neural network model to obtain a binary image including the object to be labeled;
a converting unit 803, configured to convert the binary image into annotation point information;
a sending unit 804, configured to send the annotation point information to the electronic device.
Optionally, the labeling apparatus 800 may further include a determining unit 805:
the determining unit 805 is configured to determine, after the second obtaining unit 802 obtains the binary image containing the object to be labeled, whether the binary image contains only one object to be labeled;
the converting unit 803 is specifically configured to convert the binary image into annotation point information when the determining unit 805 determines that the binary image contains only one object to be labeled.
The sending unit 804 is further configured to send a prompt message to the electronic device when the determining unit 805 determines that the binary image contains at least two objects to be labeled; the prompt message is used to prompt the user to reselect an image area in the image.
Optionally, the sending unit 804 is further configured to send a prompt message to the electronic device when the determining unit 805 determines that the binary image includes an object to be labeled and at least one labeled object, where the prompt message is used to prompt a user to reselect an image area in the image.
Optionally, the determining unit 805 determining whether the binary image contains only one object to be labeled includes: determining the number of black connected components in the binary image; when the number of black connected components is one, judging that the binary image contains only one object to be labeled; and when the number of black connected components is at least two, judging that the binary image contains at least two objects to be labeled.
In this embodiment of the present application, the second receiving unit 801, the second obtaining unit 802, and the sending unit 804 may be communication modules of a server, and the converting unit 803 and the determining unit 805 may be processors of the server.
In the embodiments of the present application, the server only needs to input the image corresponding to the identification information and the position information of the object to be labeled into the neural network model to obtain a binary image containing the object, and then convert the binary image into annotation point information. No manual labeling is needed on the electronic device side: compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server, which then obtains the annotation point information for that object, improving labeling efficiency.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 9, the electronic device 900 includes a processor 901 and a memory 902, and the processor 901 and the memory 902 may be connected to each other through a communication bus 903. The communication bus 903 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 9, but this does not indicate only one bus or one type of bus. The memory 902 is used to store a computer program comprising program instructions, and the processor 901 is configured to call the program instructions, the program comprising instructions for executing the methods of fig. 2 and fig. 4.
The processor 901 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
The Memory 902 may be, but is not limited to, a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The electronic device 900 may further include a display module 904, and the display module 904 may include a display screen.
The electronic device 900 may further include a communication module 905, and the communication module 905 may include an input and output device such as a radio frequency circuit and an antenna.
In the embodiments of the present application, compared with manual labeling, only the position information of the object to be labeled needs to be uploaded to the server; the server can then obtain the annotation point information for that object without manual labeling by the user, which improves labeling efficiency.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a server according to an embodiment of the present disclosure, as shown in fig. 10, the server 1000 includes a processor 1001 and a memory 1002, and the processor 1001 and the memory 1002 may be connected to each other through a communication bus 1003. The communication bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1003 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus. The memory 1002 is used for storing a computer program comprising program instructions, and the processor 1001 is configured for invoking the program instructions, said program comprising instructions for performing the method of fig. 6.
The processor 1001 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the above solutions.
The memory 1002 may be, but is not limited to, a Read-Only Memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or another type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus, or may be integrated with the processor.
The server 1000 may further include a communication module 1004, and the communication module 1004 may include input and output devices such as a radio frequency circuit and an antenna.
In the embodiment of the present application, the server only needs to input the image corresponding to the identification information and the position information of the object to be labeled into the neural network model to obtain a binary image containing the object to be labeled, and then converts the binary image into annotation point information. No manual annotation by the user is required on the electronic device side: compared with manual annotation, only the position information of the object to be labeled needs to be uploaded to the server, and the server can then obtain the annotation point information for the object, which improves annotation efficiency.
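As an illustration of the conversion step, the following Python sketch extracts the outer contour of the binary image and simplifies it into annotation point information. OpenCV is assumed here as the implementation library; the function name, the epsilon_ratio parameter, and the single-object check (which mirrors the check, described elsewhere in this application, that the binary image contains only one object to be labeled) are assumptions for this sketch, not the disclosed implementation.

    # Hypothetical server-side sketch: convert a binary image containing the
    # object to be labeled into polygon annotation points. OpenCV is an
    # assumed implementation choice; epsilon_ratio is an illustrative parameter.
    import cv2
    import numpy as np

    def mask_to_annotation_points(binary_image: np.ndarray, epsilon_ratio: float = 0.01) -> list:
        """Convert an 8-bit binary mask into a simplified list of annotation points."""
        # Count foreground connected components; if more than one object is
        # present, the user would instead be prompted to reselect the image area.
        num_labels, _ = cv2.connectedComponents(binary_image)
        if num_labels - 1 != 1:  # subtract 1 for the background label
            raise ValueError("binary image must contain exactly one object to be labeled")

        # Extract the outer contour of the object.
        contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        contour = max(contours, key=cv2.contourArea)

        # Simplify the contour into a manageable set of annotation points.
        epsilon = epsilon_ratio * cv2.arcLength(contour, closed=True)
        approx = cv2.approxPolyDP(contour, epsilon, closed=True)
        return approx.reshape(-1, 2).tolist()  # [[x, y], ...]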
Embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the annotation methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the units is only one kind of logical function division, and there may be other divisions in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated unit, if implemented in the form of a software program module and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has illustrated the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method and core concept of the present application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. An annotation method, applied to an electronic device, the method comprising:
acquiring position information of an object to be labeled in an image;
uploading the position information to a server, wherein the server is used for inputting the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and converting the binary image into annotation point information;
and receiving the annotation point information sent by the server, rendering according to the annotation point information, and displaying the result in the image.
2. The method according to claim 1, wherein the acquiring the position information of the object to be labeled in the image comprises:
responding to a graphic frame selection instruction input by a user, and selecting an image area containing the object to be labeled from the image;
responding to a positioning instruction input by the user, and selecting a positioning point for the object to be labeled in the image area;
and generating the position information of the object to be labeled according to the coordinate information of the image area and the coordinate information of the positioning point.
3. The method of claim 2, further comprising:
and uploading identification information of the image to the server, wherein the server is further used for acquiring the image from an image library according to the identification information before inputting the image and the position information into the neural network model.
4. The method of claim 2, wherein the coordinate information of the image area comprises coordinate information of four vertices of the image area if the image area is a rectangular area.
5. The method according to any one of claims 1 to 4, wherein after the rendering according to the annotation point information and the displaying in the image, the method further comprises:
and responding to an adjustment instruction input by a user, and fine-tuning the annotation point information to obtain adjusted annotation point information for the object to be labeled.
6. An annotation method, applied to a server, the method comprising:
receiving identification information and position information of an object to be labeled, which are sent by electronic equipment;
acquiring an image corresponding to the identification information, and inputting the position information and the image into a neural network model to obtain a binary image containing the object to be labeled;
converting the binary image into annotation point information;
and sending the annotation point information to the electronic equipment.
7. The method according to claim 6, wherein after obtaining the binary image containing the object to be labeled, the method further comprises:
judging whether the binary image contains only one object to be labeled;
and if the binary image contains only one object to be labeled, executing the step of converting the binary image into the annotation point information.
8. The method of claim 7, further comprising:
if the binary image contains at least two objects to be labeled, sending a prompt message to the electronic equipment;
the prompt message is used for prompting the user to reselect an image area in the image.
9. A labeling apparatus, applied to an electronic device, the labeling apparatus comprising:
the first acquisition unit is used for acquiring position information of an object to be labeled in an image;
the uploading unit is used for uploading the position information to a server, wherein the server is used for inputting the image and the position information into a neural network model to obtain a binary image containing the object to be labeled and converting the binary image into annotation point information;
a first receiving unit, configured to receive the annotation point information sent by the server;
and the display unit is used for rendering according to the annotation point information and then displaying in the image.
10. A labeling apparatus, applied to a server, the labeling apparatus comprising:
the second receiving unit is used for receiving identification information and position information of an object to be labeled, which are sent by electronic equipment;
a second obtaining unit, configured to obtain an image corresponding to the identification information, and input the position information and the image into a neural network model to obtain a binary image including the object to be labeled;
the conversion unit is used for converting the binary image into annotation point information;
and the sending unit is used for sending the annotation point information to the electronic equipment.
11. An electronic device comprising a processor and a memory, the memory for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 5.
12. A server comprising a processor and a memory, the memory for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 6 to 8.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 8.
CN202111082491.5A 2021-09-15 2021-09-15 Labeling method, labeling device, electronic equipment, server and storage medium Pending CN113806573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111082491.5A CN113806573A (en) 2021-09-15 2021-09-15 Labeling method, labeling device, electronic equipment, server and storage medium

Publications (1)

Publication Number Publication Date
CN113806573A true CN113806573A (en) 2021-12-17

Family

ID=78895438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111082491.5A Pending CN113806573A (en) 2021-09-15 2021-09-15 Labeling method, labeling device, electronic equipment, server and storage medium

Country Status (1)

Country Link
CN (1) CN113806573A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845068A (en) * 2016-12-13 2017-06-13 海纳医信(北京)软件科技有限责任公司 Picture mask method and device
CN108229479A (en) * 2017-08-01 2018-06-29 北京市商汤科技开发有限公司 The training method and device of semantic segmentation model, electronic equipment, storage medium
CN109035370A (en) * 2018-07-23 2018-12-18 郑州云海信息技术有限公司 A kind of picture mask method and system
CN110377777A (en) * 2019-06-29 2019-10-25 苏州浪潮智能科技有限公司 A kind of multiple mask method of picture based on deep learning and device
CN110377389A (en) * 2019-07-12 2019-10-25 北京旷视科技有限公司 Image labeling guidance method, device, computer equipment and storage medium
CN110910401A (en) * 2019-10-31 2020-03-24 五邑大学 Semi-automatic image segmentation data annotation method, electronic device and storage medium
CN110929792A (en) * 2019-11-27 2020-03-27 深圳市商汤科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN110992384A (en) * 2019-11-15 2020-04-10 五邑大学 Semi-automatic image data labeling method, electronic device and storage medium
CN111028261A (en) * 2019-11-15 2020-04-17 五邑大学 High-precision semi-automatic image data annotation method, electronic device and storage medium
CN111324761A (en) * 2020-02-25 2020-06-23 平安科技(深圳)有限公司 Image annotation management method, device, computer system and readable storage medium name
CN111724402A (en) * 2020-06-18 2020-09-29 北京小白世纪网络科技有限公司 Medical image labeling method, system and device
CN111754536A (en) * 2020-06-29 2020-10-09 上海商汤智能科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN111951348A (en) * 2019-05-14 2020-11-17 阿里巴巴集团控股有限公司 Method and device for determining frame selection area and electronic equipment
CN112348122A (en) * 2020-12-03 2021-02-09 苏州挚途科技有限公司 Method and device for marking drivable area and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI QIANQIAN; YANG AIMIN; LI XINGUANG: "Research on Graphical Annotation and Retrieval of Image Semantics", Computer Applications and Software, no. 12 *
QIU CHENG; GE DI; HOU QUN: "Design and Implementation of a Manual Annotation System Based on Remote Sensing Images", Computer Knowledge and Technology, no. 23 *

Similar Documents

Publication Publication Date Title
US9349076B1 (en) Template-based target object detection in an image
US10699166B2 (en) Font attributes for font recognition and similarity
US20200034671A1 (en) Font Recognition using Text Localization
WO2019119966A1 (en) Text image processing method, device, equipment, and storage medium
US9824304B2 (en) Determination of font similarity
US11288845B2 (en) Information processing apparatus for coloring an image, an information processing program for coloring an image, and an information processing method for coloring an image
CN109711508B (en) Image processing method and device
CN111243061B (en) Commodity picture generation method, device and system
US11295495B2 (en) Automatic positioning of textual content within digital images
CN115812221A (en) Image generation and coloring method and device
CN110969641A (en) Image processing method and device
CN114937270A (en) Ancient book word processing method, ancient book word processing device and computer readable storage medium
US11615507B2 (en) Automatic content-aware collage
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN113657396A (en) Training method, translation display method, device, electronic equipment and storage medium
CN113806573A (en) Labeling method, labeling device, electronic equipment, server and storage medium
WO2023272495A1 (en) Badging method and apparatus, badge detection model update method and system, and storage medium
CN115756461A (en) Annotation template generation method, image identification method and device and electronic equipment
CN108776959A (en) Image processing method, device and terminal device
CN114240734A (en) Image data augmentation method, image data augmentation device, electronic apparatus, and storage medium
CN112434698A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113627456B (en) Method and device for generating reference data
US20230154075A1 (en) Vector Object Generation from Raster Objects using Semantic Vectorization
WO2023221292A1 (en) Methods and systems for image generation
US20230385980A1 (en) Generating content adaptive watermarks for digital images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination