CN111353536B - Image labeling method and device, readable medium and electronic equipment - Google Patents

Image labeling method and device, readable medium and electronic equipment

Info

Publication number
CN111353536B
CN111353536B (application CN202010131364.9A)
Authority
CN
China
Prior art keywords
image
color
scene
scene tag
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010131364.9A
Other languages
Chinese (zh)
Other versions
CN111353536A (en)
Inventor
郭冠军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010131364.9A
Publication of CN111353536A
Application granted
Publication of CN111353536B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to an image labeling method and device, a readable medium and an electronic device, and belongs to the technical field of image processing. The method includes: acquiring a first image captured at the current acquisition time; determining a first color class of the first image; determining a first scene tag according to the first color class, a second color class of a second image, and the second image, where the second image is the image captured at the previous acquisition time; and labeling the first image according to the first scene tag. Because the scene tag applied to the later image is determined from the color classes of two temporally consecutive images and from the earlier of the two images, the stability and accuracy of image labeling can be improved.

Description

Image labeling method and device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image labeling method and device, a readable medium, and an electronic device.
Background
With the continuous development of terminal technology and image processing technology, the image processing operations available on terminal devices are becoming increasingly rich. For example, a terminal device may recognize the different scenes contained in a picture (e.g., indoor, scenery, people, mountains, lakes, beaches) and perform corresponding operations for each scene. For scene recognition in a video (i.e., a sequence of consecutive frames), a machine learning method is typically used to recognize the scene in each frame independently, so the recognition result can jump noticeably between frames and cannot accurately and stably reflect the scenes contained in the video.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a method for labeling an image, the method comprising:
acquiring a first image acquired at the current acquisition moment;
determining a first color class of the first image;
determining a first scene tag according to the first color category, a second color category of a second image, and the second image, wherein the second image is an image acquired at the previous acquisition time;
and labeling the first image according to the first scene label.
In a second aspect, the present disclosure provides an image annotation device, the device comprising:
the acquisition module is used for acquiring a first image acquired at the current acquisition moment;
a first determining module configured to determine a first color class of the first image;
the second determining module is used for determining a first scene tag according to the first color category, a second color category of a second image, and the second image, wherein the second image is an image acquired at the previous acquisition time;
And the labeling module is used for labeling the first image according to the first scene label.
In a third aspect, the present disclosure provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to carry out the steps of the method of the first aspect of the disclosure.
Through the above technical solution, a first image acquired at the current acquisition time is first obtained; a first color class of the first image is then determined; a first scene tag is determined according to a second image acquired at the previous acquisition time, a second color class of the second image, and the first color class; and the first image is finally labeled according to the first scene tag. Because the scene tag applied to the later image is determined from the color classes of two temporally consecutive images and from the earlier of the two images, the stability and accuracy of image labeling can be improved.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
In the drawings:
FIG. 1 is a flow chart illustrating a method of labeling an image according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating an image annotation device according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating another image annotation device according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating another image annotation device according to an exemplary embodiment;
FIG. 10 is a block diagram illustrating another image annotation device according to an exemplary embodiment;
FIG. 11 is a schematic diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flow chart illustrating a method of labeling an image according to an exemplary embodiment. As shown in FIG. 1, the method includes:
Step 101, acquiring a first image acquired at the current acquisition time.
For example, the terminal device may continuously capture a plurality of images at a preset acquisition period (for example, 40 ms). The plurality of images may be, for example, the frames of a video shot by the user with the terminal device, or the frames of a video selected by the user on the terminal device (for example, a video played on the display interface of the terminal device). The image captured at the current acquisition time is the first image, and the image captured at the previous acquisition time is the second image; the time interval between the previous acquisition time and the current acquisition time is the acquisition period, and the second image can be understood as the frame immediately preceding the first image in the video.
Step 102, determining a first color class of a first image.
For example, after the first image is acquired, a first color class of the first image may be identified with a preset color recognition algorithm. The first color class can be understood as the dominant color(s) in the first image; it may be a single color or several colors, which is not specifically limited in this disclosure. The preset color recognition algorithm may, for example, input the first image into a pre-trained color class recognition model that outputs the degree of matching between the first image and each color in a color set, and then take a preset number (e.g., 3) of colors with the highest matching degrees as the first color class. Alternatively, the algorithm may determine, from the color coordinates of each pixel in the first image, the number of pixels belonging to each color in the color set, and take the preset number of colors with the most pixels as the first color class. As a further alternative, the algorithm may determine the semantic color of each pixel from its color coordinates and a predetermined mapping between color coordinates and semantic colors, count the number of pixels of each semantic color in the first image, and take the preset number of semantic colors with the most pixels as the first color class. Accordingly, the second color class of the second image may have been identified with the same color recognition algorithm when step 102 was performed at the previous acquisition time.
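As a purely illustrative sketch of the pixel-counting variant described above (the reference color set, the nearest-color rule, and the preset number of 3 are assumptions for the example, not details taken from the disclosure), the following Python snippet assigns each pixel to its closest color in a fixed color set and keeps the most frequent colors as the first color class:

```python
import numpy as np

# Hypothetical reference color set (semantic name -> RGB anchor); the actual
# color set used by the disclosure is not specified.
COLOR_SET = {
    "red": (220, 40, 40), "green": (40, 180, 60), "blue": (50, 80, 220),
    "white": (245, 245, 245), "black": (15, 15, 15), "gray": (128, 128, 128),
    "yellow": (230, 210, 50),
}

def dominant_color_classes(image_rgb: np.ndarray, top_k: int = 3) -> list[str]:
    """Assign every pixel to the nearest color in COLOR_SET and return the
    top_k most frequent colors as the image's color class."""
    pixels = image_rgb.reshape(-1, 3).astype(np.float32)               # (N, 3)
    anchors = np.array(list(COLOR_SET.values()), dtype=np.float32)     # (C, 3)
    # Squared Euclidean distance from every pixel to every reference color.
    dists = ((pixels[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)                                      # (N,)
    counts = np.bincount(nearest, minlength=len(COLOR_SET))
    names = list(COLOR_SET.keys())
    order = counts.argsort()[::-1][:top_k]
    return [names[i] for i in order]
```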
Step 103, determining a first scene tag according to the first color category, a second color category of a second image, and the second image, wherein the second image is an image acquired at the previous acquisition time.
Step 104, labeling the first image according to the first scene tag.
For example, the degree of difference between the first image and the second image may be determined according to the first color class and the second color class, and the first scene tag used to label the first image may then be determined in combination with the second image; the first scene tag may be a single scene tag or multiple scene tags. The scene tags may include, for example: indoor, scenery, person, cat, dog, automobile, mountain, lake, beach, and so on; personalized scene tags can also be added according to the different needs of users.
When the first color class and the second color class are the same, the difference between the first image and the second image is small, and the first scene tag may simply be the scene tag with which the second image was labeled (referred to below as the second scene tag), i.e., the tag applied when step 104 was executed at the previous acquisition time. When the first color category and the second color category are different, indicating that there is a difference between the first image and the second image, the second image may be taken into account to determine the first scene tag. Specifically, the matching degree of the first image with each scene tag and the matching degree of the second image with each scene tag may be weighted and summed to determine the first scene tag.
The matching degree between the first image and each scene tag may be obtained with a pre-trained image classification model before step 103, or obtained with the image classification model only after the first color class and the second color class are found to be different. Correspondingly, the matching degree of the second image with each scene tag may have been obtained with the image classification model after the second image was acquired at the previous acquisition time, or may have been determined from the matching degree between the image acquired at the acquisition time before that and each scene tag. It can be understood that when the terminal device starts to acquire images, the initial image acquired at the first acquisition time is first input into the image classification model to obtain the matching degree between the initial image and each scene tag, so as to label the initial image. If the color class of the image acquired at each subsequent acquisition time is the same as that of the initial image, the scene in the images has not changed, and the image acquired at each of these acquisition times can be labeled directly with the scene tag of the initial image. If the color class of an image acquired at a later acquisition time differs from that of the initial image, the matching degree of that image with each scene tag can be determined in combination with the matching degree of the initial image with each scene tag.
Because the matching degree of the second image with each scene tag is taken into account when determining the first scene tag, labeling the first image according to the first scene tag maintains continuity between the scene tags of the first image and the second image, which improves the stability and accuracy of image labeling.
For example, suppose the second image has been labeled with the scene tags person and lake. If the first color class of the first image is the same as the second color class, the first image may also be labeled with the scene tags person and lake. If the first color class differs from the second color class, the matching degree of the first image with scene tags such as indoor, scenery, person, cat, dog, automobile, mountain, lake and beach can first be determined, then weighted and summed with the matching degrees of the second image with those scene tags determined at the previous acquisition time, and finally the preset number (for example, 2) of scene tags with the highest weighted matching degrees are taken as the first scene tag with which the first image is labeled.
In summary, the disclosure first acquires a first image captured at the current acquisition time, then determines a first color class of the first image, determines a first scene tag according to a second image captured at the previous acquisition time, a second color class of the second image, and the first color class, and finally labels the first image according to the first scene tag. Because the scene tag applied to the later image is determined from the color classes of two temporally consecutive images and from the earlier of the two images, the stability and accuracy of image labeling can be improved.
FIG. 2 is a flowchart illustrating another method of labeling an image according to an exemplary embodiment. As shown in FIG. 2, step 103 may be implemented as follows:
in step 1031, if the first color class is the same as the second color class, the second scene tag marked by the second image is used as the first scene tag.
In step 1032, if the first color class is different from the second color class, the first image is input into the pre-trained image classification model to obtain the matching degree between the first image output by the image classification model and each of the plurality of scene tags.
Step 1033, determining the first scene tag according to the matching degree of the first image and each scene tag, and the matching degree of the second image and each scene tag.
For example, when the first color class and the second color class are the same, the difference between the first image and the second image is small, and the second scene tag may be used directly as the first scene tag, i.e., the first image may be labeled directly with the second scene tag. The second scene tag is the tag applied to the second image when step 104 was executed at the previous acquisition time. When the first color class and the second color class are different, indicating that there is a difference between the first image and the second image, the second image may be taken into account to determine the first scene tag. Specifically, the first image may be input into a pre-trained image classification model that outputs the degree of matching between the first image and each of the plurality of scene tags. Correspondingly, the matching degree of the second image with each scene tag was obtained in the same way at the previous acquisition time. Finally, the matching degree of the first image with each scene tag and the matching degree of the second image with each scene tag are combined to determine the first scene tag, so that the first scene tag remains continuous with the second scene tag.
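Purely as an illustrative sketch of the branch just described (all names are hypothetical, and the classification, fusion, and tag-selection routines are passed in as placeholders rather than being the disclosure's own implementations), the per-frame labeling decision might look like this in Python:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FrameState:
    colors: list[str]            # color class of the previous (second) image
    scores: dict[str, float]     # its matching degree for every scene tag
    tags: list[str]              # the scene tags it was labeled with

def label_frame(image, colors, prev: FrameState,
                classify: Callable, fuse: Callable, pick: Callable):
    """Label one frame given the cached state of the previous frame."""
    if set(colors) == set(prev.colors):
        # Same color class: reuse the second image's scene tags directly.
        return prev.tags, prev.scores
    # Different color class: run the image classification model and fuse its
    # matching degrees with those of the previous frame before picking tags.
    scores = classify(image)                 # dict: scene tag -> matching degree
    fused = fuse(scores, prev.scores, colors, prev.colors)
    return pick(fused), fused
```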
The image classification model may include, for example, an input layer, a convolutional layer, a feedback layer, a fully connected layer, and an output layer. Step 1032 may be implemented as follows: the first image (or the second image) is first fed to the input layer, and convolutional-layer features are extracted from it by the convolutional layer. The feedback layer then extracts the current feedback-layer features from the convolutional-layer features in combination with the previous feedback-layer features; the fully connected layer abstracts the feedback-layer features to generate the matching degree between the first image and each scene tag, and the output layer outputs the resulting matching degrees. The image classification model may be, for example, a convolutional neural network (CNN). A convolutional neural network is only one example of a neural network usable in embodiments of the present disclosure; the disclosure is not limited thereto, and various other neural networks may be used.
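The disclosure does not specify the network's dimensions, so the following PyTorch sketch is only a generic CNN classifier of that general flavor (layer sizes, the number of scene tags, and the sigmoid output are assumptions); it simply maps an image to a matching degree per scene tag:

```python
import torch
import torch.nn as nn

class SceneTagClassifier(nn.Module):
    """Toy CNN mapping an RGB image to a matching degree per scene tag."""
    def __init__(self, num_tags: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_tags)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)
        # Sigmoid so each scene tag gets an independent matching degree in [0, 1].
        return torch.sigmoid(self.head(feats))

# degrees = SceneTagClassifier()(torch.rand(1, 3, 224, 224))  # shape (1, 9)
```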
FIG. 3 is a flow chart illustrating another method of labeling an image, according to an exemplary embodiment, as shown in FIG. 3, a particular implementation of step 1033 may include the steps of:
and step A, carrying out weighted summation on the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag to obtain the comprehensive matching degree of the first image and each scene tag.
And B, determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag.
The implementation manner of the step B may be:
first, the sequence of each scene tag is determined according to the comprehensive matching degree of the first image and each scene tag.
Then, a preset number of scene tags are selected as first scene tags according to the sequence.
In a specific application scenario, the matching degree of the first image with each scene tag and the matching degree of the second image with each scene tag can be weighted and summed to obtain the comprehensive matching degree of the first image with each scene tag. The comprehensive matching degree can be understood as the result of smoothing the matching degrees output by the image classification model for the first image, so that the first image and the second image are considered together. The first scene tag is then determined according to the comprehensive matching degree of the first image with each scene tag. For example, the comprehensive matching degrees may first be ranked, and then a preset number (for example, 4) of scene tags with the highest comprehensive matching degrees may be selected from the plurality of scene tags as the first scene tag. The comprehensive matching degrees may be arranged in descending order and the first 4 scene tags selected as the first scene tag, or arranged in ascending order and the last 4 scene tags selected as the first scene tag.
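A minimal sketch of the ranking-and-selection step, assuming the comprehensive matching degrees are held in a Python dictionary keyed by scene tag (the preset number 4 and the example values are illustrative only):

```python
def top_k_tags(comprehensive: dict[str, float], k: int = 4) -> list[str]:
    """Sort scene tags by comprehensive matching degree (descending) and keep
    the preset number with the highest values."""
    ranked = sorted(comprehensive.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]

# top_k_tags({"indoor": 0.12, "person": 0.81, "lake": 0.66, "dog": 0.05}, k=2)
# -> ["person", "lake"]
```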
FIG. 4 is a flowchart illustrating another method for labeling an image, according to an exemplary embodiment, as shown in FIG. 4, the implementing step of step 1033 may further include:
and C, determining a first weight and a second weight according to the difference degree of the first color category and the second color category, wherein the first weight is positively correlated with the difference degree, and the second weight is negatively correlated with the difference degree.
Further, before step A is executed, a first weight corresponding to the matching degree of the first image with each scene tag and a second weight corresponding to the matching degree of the second image with each scene tag may be determined according to the degree of difference between the first color category and the second color category. The degree of difference between the first color class and the second color class reflects the difference between the first image and the second image, and can be understood as the number of colors in the first color class that differ from the second color class. For example: if the first color class is red, white, yellow and the second color class is red, blue, green, the degree of difference is 2; if the first color class is red, white, black and the second color class is red, white, gray, the degree of difference is 1. The first weight is positively correlated with the degree of difference, and the second weight is negatively correlated with it. It can be understood that the larger the difference between the first color class and the second color class, the greater the influence of the matching degree of the first image with each scene tag on the comprehensive matching degree, and the smaller the influence of the matching degree of the second image with each scene tag. Conversely, the smaller the difference between the first color class and the second color class, the smaller the influence of the matching degree of the first image with each scene tag on the comprehensive matching degree, and the greater the influence of the matching degree of the second image with each scene tag.
Accordingly, step A may be implemented as follows:
first, the matching degree of the first image and the scene tag is multiplied by a first weight to obtain a first product, and the matching degree of the second image and the scene tag is multiplied by a second weight to obtain a second product.
And then taking the sum of the first product and the second product as the comprehensive matching degree of the first image and the scene tag.
For example, the overall match may be determined by the following formula:
OUT_i = a · P_i + (1 − a) · Q_i
where OUT_i is the comprehensive matching degree of the first image with the i-th scene tag, P_i is the matching degree of the first image with the i-th scene tag, Q_i is the matching degree of the second image with the i-th scene tag, a is the first weight, and (1 − a) is the second weight. It can be understood that when the first color category and the second color category are the same, a is 0 and (1 − a) is 1, so the comprehensive matching degree equals Q_i, i.e., the first image is labeled directly with the second scene tag. When the first color class and the second color class are completely different (i.e., share no color), a is 1 and (1 − a) is 0, so the comprehensive matching degree equals P_i; that is, the first image and the second image are considered completely different, and the matching degree output by the image classification model for the first image is used as the comprehensive matching degree. When the first color category and the second color category have some colors in common and some that differ, the first weight and the second weight both lie between 0 and 1.
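The following sketch combines steps A and C: the weight a is derived from the degree of difference and the comprehensive matching degree is computed per scene tag. The linear mapping from the degree of difference to a is an assumption; the disclosure only requires the first weight to grow, and the second weight to shrink, with the degree of difference:

```python
def fuse_scores(p: dict[str, float], q: dict[str, float],
                first_colors: list[str], second_colors: list[str]) -> dict[str, float]:
    """Compute OUT_i = a * P_i + (1 - a) * Q_i for every scene tag.

    The degree of difference is taken as the number of colors of the first
    color class absent from the second, so a = 0 when the classes are
    identical and a = 1 when they share no color (this linear choice is an
    illustrative assumption)."""
    diff = len(set(first_colors) - set(second_colors))
    a = diff / len(first_colors) if first_colors else 0.0
    return {tag: a * p[tag] + (1.0 - a) * q.get(tag, 0.0) for tag in p}
```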
FIG. 5 is a flowchart illustrating another method of labeling an image according to an exemplary embodiment. As shown in FIG. 5, after step 101 the method further includes:
step 105, inputting the first image into a pre-trained image classification model to obtain the matching degree of the first image output by the image classification model and each scene tag in the plurality of scene tags.
Accordingly, step 103 includes:
in step 1034, if the first color class is the same as the second color class, the second scene tag marked by the second image is used as the first scene tag.
Step 1035, if the first color class is different from the second color class, determining the first scene tag according to the matching degree of the first image and each scene tag, and the matching degree of the second image and each scene tag.
For example, the matching degree of the first image with each scene tag may also be obtained with the pre-trained image classification model directly after step 101; that is, regardless of whether the first color class and the second color class are the same, the first image is input into the pre-trained image classification model to obtain its matching degree with each of the plurality of scene tags, and only afterwards is it judged whether the first color category equals the second color category. Steps 1034 and 1035 are implemented in the same way as steps 1032 and 1033 and are not repeated here.
FIG. 6 is a flowchart illustrating another method of labeling an image, according to an exemplary embodiment, as shown in FIG. 6, the implementation of step 102 may include:
in step 1021, the color coordinates of each pixel point of the first image in the preset color space are obtained.
Step 1022, determining the semantic color of each pixel according to the mapping relation between the preset color space and the semantic color and the color coordinates of each pixel.
Step 1023, determining a first color class of the first image according to the semantic color of each pixel point.
For example, the color coordinates of each pixel in the first image may first be determined in a preset color space, which may be, for example, the RGB (Red-Green-Blue) color space, the LUV color space, or the LAB color space. Taking the RGB color space as an example, the color coordinates of each pixel may take the form (x, y, z), where x, y and z each range from 0 to 255.
Then, the semantic color corresponding to the color coordinates of each pixel is looked up according to the mapping relationship between the preset color space and semantic colors, and that semantic color is taken as the semantic color of the pixel. The mapping relationship may be stored in the terminal device in advance, or stored on a server and fetched by the terminal device when needed. The mapping relationship can be understood as a correspondence between every color coordinate in the preset color space and one of a plurality of semantic colors, i.e., each color coordinate corresponds to one semantic color. For example, the mapping relationship may take the form of a table in which each row contains a color coordinate and the semantic color corresponding to it. A color coordinate is merely a set of values with no explicit meaning, whereas a semantic color can be understood as a real color in the real world, i.e., a color carrying a definite meaning, such as red, orange, yellow, green, blue, violet, black, white or gray.
After the semantic color of each pixel is determined, the first color class of the first image may be determined according to the distribution of semantic colors in the first image. For example, the number of pixels of each semantic color in the first image may be counted and the counts sorted, and a preset number (for example, 3) of semantic colors with the most pixels selected as the first color class. Alternatively, the semantic colors whose pixel counts exceed a preset threshold may be taken as the first color class, or the ratio of the pixel count of each semantic color to the total number of pixels in the first image may be computed and the semantic colors whose ratios exceed a preset threshold taken as the first color class. For example, if the first image contains 1000 pixels and the pixel counts of its semantic colors, in descending order, are 420 white pixels, 310 blue pixels, 205 green pixels and 65 pixels of other colors, then white, blue and green can be taken as the first color class.
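Assuming the mapping relationship is available as a lookup table from color coordinates to semantic colors, a simple sketch of steps 1021-1023 (the function name and the preset number of 3 are hypothetical) is:

```python
import numpy as np

def first_color_class(image_rgb: np.ndarray,
                      mapping: dict[tuple[int, int, int], str],
                      top_k: int = 3) -> list[str]:
    """Look up each pixel's semantic color in the precomputed mapping
    (color coordinate -> semantic color) and keep the top_k most frequent
    semantic colors as the first color class."""
    counts: dict[str, int] = {}
    for pixel in image_rgb.reshape(-1, 3):   # pure-Python loop, fine for a sketch
        semantic = mapping[tuple(int(c) for c in pixel)]
        counts[semantic] = counts.get(semantic, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)[:top_k]
```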
Specifically, the mapping relationship may be obtained as follows:
and D, acquiring a sample image set, wherein the sample image set comprises a plurality of sample images.
And E, determining the number of each color coordinate marked as each semantic color in the preset color space according to the color coordinate of each pixel point of each sample image in the preset color space and the semantic color marked by each pixel point of each sample image.
And F, determining a mapping relation according to the number of each semantic color marked by each color coordinate.
In a specific application scenario, the mapping relationship may be established by first collecting a large number of sample images as the sample image set. A sample image may be, for example, a photograph taken in the real world; to further improve the accuracy of the mapping relationship, the sample image set may include photographs of each of a plurality of scenes under different lighting conditions. Then, the color coordinates of each pixel of each sample image in the preset color space and the semantic color labeled for each pixel are obtained, so as to determine how many times each color coordinate in the preset color space is labeled as each semantic color. The color coordinates of each pixel can be identified automatically by a terminal device; the semantic color of each pixel can be determined by manual labeling and recording the labeling result, or the semantic color corresponding to the real-world scene contained in a sample image can be recorded in advance when the sample image is captured, so that the semantic color labeled for each pixel is determined. Finally, the number of times each color coordinate is labeled as each semantic color is counted, and the semantic color with the largest count is taken as the semantic color corresponding to that color coordinate, thereby determining the mapping relationship.
For example, if the color coordinate (128, 128, 50) in the RGB color space is labeled as gray 1500 times, as blue 820 times and as black 170 times, gray can be taken as the semantic color corresponding to (128, 128, 50) in the RGB color space.
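A small sketch of steps D-F, under the assumption that the labeled sample pixels have already been flattened into (color coordinate, semantic color) pairs; the mapping is then the per-coordinate majority vote described above:

```python
from collections import Counter, defaultdict

def build_color_mapping(labeled_pixels):
    """labeled_pixels: iterable of (color_coordinate, semantic_color) pairs
    gathered from every pixel of every sample image.  For each color
    coordinate, the semantic color it is labeled with most often wins."""
    votes: dict[tuple[int, int, int], Counter] = defaultdict(Counter)
    for coord, semantic in labeled_pixels:
        votes[coord][semantic] += 1
    return {coord: counter.most_common(1)[0][0] for coord, counter in votes.items()}

# For instance, if (128, 128, 50) is labeled gray 1500 times, blue 820 times
# and black 170 times, it is mapped to "gray".
```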
In summary, the disclosure first acquires a first image captured at the current acquisition time, then determines a first color class of the first image, determines a first scene tag according to a second image captured at the previous acquisition time, a second color class of the second image, and the first color class, and finally labels the first image according to the first scene tag. Because the scene tag applied to the later image is determined from the color classes of two temporally consecutive images and from the earlier of the two images, the stability and accuracy of image labeling can be improved.
FIG. 7 is a block diagram of an image annotation device, shown in FIG. 7, according to an exemplary embodiment, the device 200 comprising:
the acquiring module 201 is configured to acquire a first image acquired at a current acquisition time.
A first determining module 202 is configured to determine a first color class of the first image.
The second determining module 203 is configured to determine the first scene tag according to the first color category, the second color category of the second image, and the second image, where the second image is an image acquired at the previous acquisition time.
The labeling module 204 is configured to label the first image according to the first scene tag.
Fig. 8 is a block diagram of another image labeling apparatus, according to an exemplary embodiment, and as shown in fig. 8, the second determining module 203 may include:
the first determining submodule 2031 is configured to use the second scene tag marked by the second image as the first scene tag if the first color class is the same as the second color class.
The second determining submodule 2032 is configured to input the first image into the pre-trained image classification model if the first color class is different from the second color class, so as to obtain a matching degree between the first image output by the image classification model and each of the plurality of scene tags.
The second determining submodule 2032 is further configured to determine the first scene tag according to the matching degree of the first image and each scene tag, and the matching degree of the second image and each scene tag.
Optionally, the second determination submodule 2032 may be used for performing the following steps:
Step A, carrying out weighted summation on the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag to obtain the comprehensive matching degree of the first image and each scene tag;
Step B, determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag.
Optionally, the second determination submodule 2032 may also be used for performing the following steps:
and C, determining a first weight and a second weight according to the difference degree of the first color category and the second color category, wherein the first weight is positively correlated with the difference degree, and the second weight is negatively correlated with the difference degree.
Accordingly, step A may include:
first, the matching degree of the first image and the scene tag is multiplied by a first weight to obtain a first product, and the matching degree of the second image and the scene tag is multiplied by a second weight to obtain a second product.
And then taking the sum of the first product and the second product as the comprehensive matching degree of the first image and the scene tag.
Alternatively, step B may include:
first, the sequence of each scene tag is determined according to the comprehensive matching degree of the first image and each scene tag.
Then, a preset number of scene tags are selected as first scene tags according to the sequence.
FIG. 9 is a block diagram of another image annotation device, shown in accordance with an exemplary embodiment, and as shown in FIG. 9, the device 200 further comprises:
the input module 205 is configured to input the first image to the pre-trained image classification model after the first image acquired at the current acquisition time is acquired, so as to acquire a matching degree between the first image output by the image classification model and each of the plurality of scene tags.
Accordingly, the second determining module 203 may include:
and the third determining sub-module is used for taking the second scene label marked by the second image as the first scene label if the first color class is the same as the second color class.
And the fourth determining submodule is used for determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag if the first color category is different from the second color category.
FIG. 10 is a block diagram of another image annotation device, shown in accordance with an exemplary embodiment, as shown in FIG. 10, the first determination module 202 may include:
the coordinate acquiring submodule 2021 is configured to acquire color coordinates of each pixel point of the first image in a preset color space.
The color determination submodule 2022 is configured to determine a semantic color of each pixel according to a mapping relationship between a preset color space and the semantic color and a color coordinate of each pixel.
The color determination submodule 2022 is further configured to determine a first color class of the first image according to the semantic color of each pixel point.
Optionally, the mapping relationship is obtained by:
and D, acquiring a sample image set, wherein the sample image set comprises a plurality of sample images.
And E, determining the number of each color coordinate marked as each semantic color in the preset color space according to the color coordinate of each pixel point of each sample image in the preset color space and the semantic color marked by each pixel point of each sample image.
And F, determining a mapping relation according to the number of each semantic color marked by each color coordinate.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
In summary, the disclosure first acquires a first image captured at the current acquisition time, then determines a first color class of the first image, determines a first scene tag according to a second image captured at the previous acquisition time, a second color class of the second image, and the first color class, and finally labels the first image according to the first scene tag. Because the scene tag applied to the later image is determined from the color classes of two temporally consecutive images and from the earlier of the two images, the stability and accuracy of image labeling can be improved.
Referring now to fig. 11, a schematic diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 11, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 11 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the terminal devices and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a first image acquired at the current acquisition time; determine a first color class of the first image; determine a first scene tag according to the first color category, a second color category of a second image, and the second image, wherein the second image is an image acquired at the previous acquisition time; and label the first image according to the first scene tag.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the first determining module may also be described as "a module that determines the first color class".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, example 1 provides a method for labeling an image, including: acquiring a first image acquired at the current acquisition time; determining a first color class of the first image; determining a first scene tag according to the first color category, a second color category of a second image, and the second image, wherein the second image is an image acquired at the previous acquisition time; and labeling the first image according to the first scene tag.
According to one or more embodiments of the present disclosure, example 2 provides the method of example 1, the determining a first scene tag from the first color category, a second color category of a second image, and the second image, comprising: if the first color class is the same as the second color class, taking a second scene tag marked by the second image as the first scene tag; if the first color class is different from the second color class, inputting the first image into a pre-trained image classification model to obtain the matching degree of the first image output by the image classification model and each of a plurality of scene labels; and determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag.
According to one or more embodiments of the present disclosure, example 3 provides the method of example 2, the determining the first scene tag according to the degree of matching of the first image with each of the scene tags and the degree of matching of the second image with each of the scene tags, comprising: the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag are weighted and summed to obtain the comprehensive matching degree of the first image and each scene tag; and determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag.
According to one or more embodiments of the present disclosure, example 4 provides the method of example 3, the determining the first scene tag according to the degree of matching of the first image with each of the scene tags and the degree of matching of the second image with each of the scene tags, further comprising: determining a first weight and a second weight according to the difference degree of the first color category and the second color category, wherein the first weight is positively correlated with the difference degree, and the second weight is negatively correlated with the difference degree; the step of carrying out weighted summation on the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag to obtain the comprehensive matching degree of the first image and each scene tag comprises the following steps: multiplying the matching degree of the first image and the scene tag by the first weight to obtain a first product, and multiplying the matching degree of the second image and the scene tag by the second weight to obtain a second product; and taking the sum of the first product and the second product as the comprehensive matching degree of the first image and the scene tag.
According to one or more embodiments of the present disclosure, example 5 provides the method of example 3, wherein the determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag includes: determining a ranking of the scene tags according to the comprehensive matching degree of the first image and each scene tag; and selecting a preset number of scene tags as the first scene tags according to the ranking.
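Example 5 then reduces to ranking the comprehensive matching degrees and keeping a preset number of tags, roughly:

    def select_top_tags(comprehensive_scores, preset_number=1):
        ranked = sorted(comprehensive_scores, key=comprehensive_scores.get, reverse=True)
        return ranked[:preset_number]      # the highest-ranked tags become the first scene tag(s)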
According to one or more embodiments of the present disclosure, example 6 provides the method of example 1, wherein, after the acquiring the first image acquired at the current acquisition moment, the method further includes: inputting the first image into a pre-trained image classification model to obtain, from the image classification model, the matching degree of the first image and each of a plurality of scene tags; and the determining a first scene tag according to the first color category, a second color category of a second image and the second image includes: if the first color class is the same as the second color class, taking a second scene tag with which the second image is labeled as the first scene tag; and if the first color class is different from the second color class, determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag.
In accordance with one or more embodiments of the present disclosure, example 7 provides the method of example 1, wherein the determining the first color class of the first image includes: acquiring a color coordinate of each pixel point of the first image in a preset color space; determining a semantic color of each pixel point according to a mapping relationship between the preset color space and semantic colors and the color coordinate of each pixel point; and determining the first color category of the first image according to the semantic color of each pixel point.
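One way to realize example 7, assuming an RGB color space quantized into a coarse grid, a lookup table semantic_lut mapping each quantized coordinate to a semantic color (for example "green" or "blue"), and a simple majority vote to pick the image-level color class; the quantization and the voting rule are assumptions of the sketch.

    from collections import Counter

    def determine_color_class(image_pixels, semantic_lut, bins=8):
        counts = Counter()
        for r, g, b in image_pixels:                              # color coordinate of each pixel point
            key = (r * bins // 256, g * bins // 256, b * bins // 256)
            counts[semantic_lut[key]] += 1                        # semantic color of the pixel point
        return counts.most_common(1)[0][0]                        # most frequent semantic color as the color class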
In accordance with one or more embodiments of the present disclosure, example 8 provides the method of example 7, wherein the mapping relationship is obtained by: acquiring a sample image set, the sample image set including a plurality of sample images; determining, according to the color coordinate of each pixel point of each sample image in the preset color space and the semantic color with which each pixel point of each sample image is labeled, the number of times each color coordinate in the preset color space is labeled as each semantic color; and determining the mapping relationship according to the number of times each color coordinate is labeled as each semantic color.
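The mapping of example 8 can be built by counting, for every quantized color coordinate, how often sample pixels at that coordinate were labeled with each semantic color and keeping the majority label; the data layout of sample_images is an assumption of the sketch.

    from collections import Counter, defaultdict

    def build_semantic_lut(sample_images, bins=8):
        votes = defaultdict(Counter)
        for pixels, pixel_labels in sample_images:                 # per-pixel color coordinates and semantic colors
            for (r, g, b), semantic_color in zip(pixels, pixel_labels):
                key = (r * bins // 256, g * bins // 256, b * bins // 256)
                votes[key][semantic_color] += 1                    # labeling count per quantized color coordinate
        # for each color coordinate, keep the semantic color it was labeled with most often
        return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}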
Example 9 provides, according to one or more embodiments of the present disclosure, an image labeling apparatus, comprising: an acquisition module configured to acquire a first image acquired at the current acquisition moment; a first determining module configured to determine a first color class of the first image; a second determining module configured to determine a first scene tag according to the first color category, a second color category of a second image and the second image, wherein the second image is an image acquired at the previous acquisition moment; and a labeling module configured to label the first image according to the first scene tag.
According to one or more embodiments of the present disclosure, example 10 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method described in any one of examples 1 to 8.
Example 11 provides, according to one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; and a processing device configured to execute the computer program in the storage device to implement the steps of the method described in any one of examples 1 to 8.
The foregoing description is merely an illustration of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method and will not be repeated here.

Claims (9)

1. A method for labeling an image, the method comprising:
acquiring a first image acquired at the current acquisition moment;
determining a first color class of the first image;
determining a first scene tag according to the first color category, a second color category of a second image and the second image, wherein the second image is an image acquired at the previous acquisition moment;
labeling the first image according to the first scene tag;
the determining a first scene tag according to the first color category, a second color category of a second image and the second image comprises: if the first color class is the same as the second color class, taking a second scene tag with which the second image is labeled as the first scene tag; if the first color class is different from the second color class, inputting the first image into a pre-trained image classification model to obtain, from the image classification model, the matching degree of the first image and each of a plurality of scene tags; and determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag.
2. The method of claim 1, wherein the determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag comprises:
performing a weighted summation of the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag to obtain a comprehensive matching degree of the first image and each scene tag;
and determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag.
3. The method of claim 2, wherein the determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag further comprises:
determining a first weight and a second weight according to a degree of difference between the first color category and the second color category, wherein the first weight is positively correlated with the degree of difference, and the second weight is negatively correlated with the degree of difference;
wherein the performing a weighted summation of the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag to obtain the comprehensive matching degree of the first image and each scene tag comprises:
for each scene tag, multiplying the matching degree of the first image and the scene tag by the first weight to obtain a first product, and multiplying the matching degree of the second image and the scene tag by the second weight to obtain a second product;
and taking the sum of the first product and the second product as the comprehensive matching degree of the first image and the scene tag.
4. The method of claim 2, wherein the determining the first scene tag according to the comprehensive matching degree of the first image and each scene tag comprises:
determining a ranking of the scene tags according to the comprehensive matching degree of the first image and each scene tag;
and selecting a preset number of scene tags as the first scene tags according to the ranking.
5. The method of claim 1, wherein the determining the first color class of the first image comprises:
acquiring color coordinates of each pixel point of the first image in a preset color space;
determining a semantic color of each pixel point according to a mapping relationship between the preset color space and semantic colors and the color coordinate of each pixel point;
and determining the first color category of the first image according to the semantic color of each pixel point.
6. The method of claim 5, wherein the mapping is obtained by:
acquiring a sample image set, wherein the sample image set comprises a plurality of sample images;
determining, according to the color coordinate of each pixel point of each sample image in the preset color space and the semantic color with which each pixel point of each sample image is labeled, the number of times each color coordinate in the preset color space is labeled as each semantic color;
and determining the mapping relationship according to the number of times each color coordinate is labeled as each semantic color.
7. An apparatus for labeling an image, the apparatus comprising:
the acquisition module is used for acquiring a first image acquired at the current acquisition moment;
a first determining module configured to determine a first color class of the first image;
the second determining module is used for determining a first scene tag according to the first color category, a second color category of a second image and the second image, wherein the second image is an image acquired at the previous acquisition moment;
the labeling module is used for labeling the first image according to the first scene tag;
the second determining module includes:
a first determining sub-module, configured to take, if the first color class is the same as the second color class, a second scene tag with which the second image is labeled as the first scene tag;
a second determining sub-module, configured to input the first image into a pre-trained image classification model if the first color class is different from the second color class, so as to obtain, from the image classification model, a matching degree of the first image and each of the plurality of scene tags;
the second determining sub-module is further used for determining the first scene tag according to the matching degree of the first image and each scene tag and the matching degree of the second image and each scene tag.
8. A computer readable medium on which a computer program is stored, characterized in that the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1-6.
9. An electronic device, comprising:
a storage device having a computer program stored thereon;
a processing device configured to execute the computer program in the storage device to carry out the steps of the method according to any one of claims 1-6.
CN202010131364.9A 2020-02-28 2020-02-28 Image labeling method and device, readable medium and electronic equipment Active CN111353536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010131364.9A CN111353536B (en) 2020-02-28 2020-02-28 Image labeling method and device, readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010131364.9A CN111353536B (en) 2020-02-28 2020-02-28 Image labeling method and device, readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111353536A CN111353536A (en) 2020-06-30
CN111353536B true CN111353536B (en) 2023-05-30

Family

ID=71197239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010131364.9A Active CN111353536B (en) 2020-02-28 2020-02-28 Image labeling method and device, readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111353536B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111810B (en) * 2021-04-20 2023-12-08 北京嘀嘀无限科技发展有限公司 Target identification method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107393017A (en) * 2017-08-11 2017-11-24 北京铂石空间科技有限公司 Image processing method, device, electronic equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012939B (en) * 2010-12-13 2012-11-14 中国人民解放军国防科学技术大学 Method for automatically tagging animation scenes for matching through comprehensively utilizing overall color feature and local invariant features
US9179043B2 (en) * 2012-10-19 2015-11-03 Xerox Corporation Obtaining user preferences using categories of images
CN104091353A (en) * 2014-06-20 2014-10-08 浙江大学 Method for extracting image color labels
CN107871129B (en) * 2016-09-27 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for handling point cloud data
US10217243B2 (en) * 2016-12-20 2019-02-26 Canon Kabushiki Kaisha Method, system and apparatus for modifying a scene model
US10049297B1 (en) * 2017-03-20 2018-08-14 Beihang University Data driven method for transferring indoor scene layout and color style
CN107194348A (en) * 2017-05-19 2017-09-22 北京云识图信息技术有限公司 The domain color recognition methods of target object in a kind of image
CN108509891A (en) * 2018-03-27 2018-09-07 斑马网络技术有限公司 Image labeling method, device, storage medium and electronic equipment
CN108875619B (en) * 2018-06-08 2021-09-07 Oppo广东移动通信有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107393017A (en) * 2017-08-11 2017-11-24 北京铂石空间科技有限公司 Image processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111353536A (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN111476309B (en) Image processing method, model training method, device, equipment and readable medium
CN111340131B (en) Image labeling method and device, readable medium and electronic equipment
CN112184738B (en) Image segmentation method, device, equipment and storage medium
KR102002024B1 (en) Method for processing labeling of object and object management server
CN111314614B (en) Image processing method and device, readable medium and electronic equipment
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN112766284B (en) Image recognition method and device, storage medium and electronic equipment
CN109816023B (en) Method and device for generating picture label model
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
CN115294501A (en) Video identification method, video identification model training method, medium and electronic device
CN111353536B (en) Image labeling method and device, readable medium and electronic equipment
CN111797266B (en) Image processing method and apparatus, storage medium, and electronic device
CN112949430A (en) Video processing method and device, storage medium and electronic equipment
CN110674813B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device
WO2023078281A1 (en) Picture processing method and apparatus, device, storage medium and program product
CN113222050B (en) Image classification method and device, readable medium and electronic equipment
CN115115836B (en) Image recognition method, device, storage medium and electronic equipment
CN113033682B (en) Video classification method, device, readable medium and electronic equipment
CN111737575B (en) Content distribution method, content distribution device, readable medium and electronic equipment
CN112801997B (en) Image enhancement quality evaluation method, device, electronic equipment and storage medium
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN114612909A (en) Character recognition method and device, readable medium and electronic equipment
CN114187557A (en) Method, device, readable medium and electronic equipment for determining key frame
CN113269730A (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant