CN106469437B - Image processing method and image processing apparatus


Info

Publication number: CN106469437B
Application number: CN201510508531.6A
Authority: CN (China)
Prior art keywords: image, description information, semantic description, image block, block
Other languages: Chinese (zh)
Other versions: CN106469437A
Inventors: 蒋树强, 宋新航, 贺志强
Current and original assignees: Lenovo Beijing Ltd; Institute of Computing Technology of CAS (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Lenovo Beijing Ltd and Institute of Computing Technology of CAS; priority to CN201510508531.6A
Publication of CN106469437A; application granted; publication of CN106469437B
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)

Abstract

An image processing method and an image processing apparatus are provided. The image processing method comprises the following steps: evenly dividing an image to be processed into a plurality of image blocks; obtaining semantic description information of the image blocks; and determining semantic description information of the image based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks. In the technical scheme of the embodiments of the disclosure, the semantic description information of the image is determined by organically relating the relevant image blocks of the whole image, which better matches how images are naturally understood.

Description

Image processing method and image processing apparatus
Technical Field
The present invention relates to the field of information technology, and more particularly, to an image processing method and an image processing apparatus.
Background
Image Understanding (IU) is the semantic understanding of an image: taking the image as the object and knowledge as the core, it studies the objects in the image, the relationships among those objects, the scene the image depicts, and how that scene can be applied.
Semantic description information is the basic carrier of knowledge about an image: it converts the complete image content into a text-like language expression that can be intuitively understood, and it plays an important role in image understanding. Rich semantic description information makes possible more accurate image search engines, intelligent digital photo albums, and visual scene descriptions in virtual worlds.
A common way of generating semantic description information for an image is to segment the image based on its visual features into a plurality of mutually independent image regions, analyze the semantic description information of each region, and then derive the semantic description information of the entire image from the per-region information.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method and an image processing apparatus that offer a new way of determining the semantic description information of an image, one that organically links related image blocks and better matches how images are naturally understood.
In a first aspect, an image processing method is provided. The image processing method may include: evenly dividing an image to be processed into a plurality of image blocks; obtaining semantic description information of the image blocks; and determining semantic description information of the image based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks.
With reference to the first aspect, in an implementation manner of the first aspect, the obtaining semantic description information of the image block may include: obtaining a Gaussian mixture model of the image content corresponding to each piece of semantic description information; and determining the semantic description information of the image block according to the image block and the Gaussian mixture models.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the determining the semantic description information of the image block according to the image block and the Gaussian mixture models may include: determining probability estimates that the image block belongs to each image content according to the similarity between the image block and the Gaussian mixture models; and determining the semantic description information of the image block according to those probability estimates.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block may include: determining the weight of the image block according to the semantic description information of the image block and the spatial position of the image block in the image; and determining the semantic description information of the image according to the semantic description information of the image block and the weight of the image block.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the determining the weight of the image block according to the semantic description information of the image block and the spatial position of the image block in the image may include: calculating the similarity between the image block and the adjacent image block according to the semantic description information of the image block; setting the weight of the image block based on the similarity between the image block and its neighboring image blocks.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block may include: determining adjacent image blocks with similar semantic description information as content aggregation areas based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks, and using the similar semantic description information as the semantic description information of the content aggregation areas; determining the weight of the content gathering area according to the semantic description information of the content gathering area and the spatial position of the content gathering area in the image; and determining semantic description information of the image according to the weight of the content aggregation area and the semantic description information of the content aggregation area.
In a second aspect, an image processing apparatus is provided. The image processing apparatus may include: a memory; and a processor configured to perform the following operations: evenly dividing an image to be processed into a plurality of image blocks; obtaining semantic description information of the image blocks; and determining semantic description information of the image based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks.
In a third aspect, an image processing apparatus is provided. The image processing apparatus may include a dividing unit 910, an image block semantic determining unit 920, and an image semantic determining unit 930. The dividing unit 910 evenly divides an image to be processed into a plurality of image blocks. The image block semantic determining unit 920 obtains semantic description information of the image blocks. The image semantic determining unit 930 determines semantic description information of the image based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks.
In the technical solution of the image processing method and the image processing apparatus according to the embodiments of the present disclosure, the semantic description information of the whole image is determined based on the spatial position of each image block in the whole image and the semantic description information of each image block, so that the related image blocks of the whole image are organically linked, which better matches how images are naturally understood.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below depict only some embodiments of the present disclosure, and other drawings can be derived from them by those skilled in the art.
Fig. 1 is a flow chart schematically illustrating an image processing method according to an embodiment of the present disclosure.
Fig. 2 schematically illustrates an example of an image divided into a plurality of image blocks.
Fig. 3 is a flow chart schematically illustrating obtaining semantic description information for each image block in the image processing method of fig. 1.
Fig. 4 schematically illustrates the semantic description information obtained for an image block.
Fig. 5 is a flowchart illustrating a first example of determining semantic description information of an image in the image processing method of fig. 1.
Fig. 6 schematically illustrates the semantic description information determined for the image.
Fig. 7 is a flowchart illustrating a second example of determining semantic description information of an image in the image processing method of fig. 1.
Fig. 8 is a block diagram schematically illustrating a first image processing apparatus according to an embodiment of the present disclosure.
Fig. 9 is a block diagram schematically illustrating a second image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure.
A machine typically cannot understand the content of an image, which makes image search difficult. In the embodiments of the disclosure, the image to be processed is converted into semantic description information that can be intuitively understood. Based on this information, a more accurate image search engine can be provided, and intelligent digital photo albums and visual scene descriptions in virtual worlds can be generated; image labeling, image recognition, and the like can also be performed. The image to be processed may be an image retrieved from a network or an image captured by a camera or similar device; the manner in which it is obtained does not limit the embodiments of the present disclosure.
Fig. 1 is a flowchart schematically illustrating an image processing method 100 according to an embodiment of the present disclosure. As shown in Fig. 1, the image processing method 100 includes: evenly dividing an image to be processed into a plurality of image blocks (S110); obtaining semantic description information of the image blocks (S120); and determining semantic description information of the image based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks (S130).
In S110, an image to be processed is evenly divided into a plurality of image blocks. The image to be processed is usually represented by the numerical values of its individual pixels; each pixel can be represented by a gray value, three primary color components, or the like, and an image typically comprises a matrix of pixels arranged in rows and columns. As an example of the division, the image may be divided sequentially such that each image block contains 16 × 16 pixels.
Fig. 2 schematically illustrates an example of an image to be processed divided into a plurality of image blocks. The image to be processed is a photograph of a seaside landscape: the upper part is light blue sky, the middle part is blue sea, and the lower part is golden beach. Assuming the photo has 128 × 112 pixels and each image block contains 16 × 16 pixels, the photo can be divided, in the row and column directions starting from one corner of the pixel matrix, into 128/16 = 8 image blocks per row and 112/16 = 7 image blocks per column, i.e. 56 (8 × 7) image blocks in total, as shown by the grid lines in Fig. 2. Each image block is denoted B(i, j), where i is the row index and j is the column index of block B in the whole image.
Fig. 2 illustrates the case where the image dimensions are exact multiples of the block size. When the total number of pixels of the image to be processed is not an integral multiple of the block size, the image is still divided in the row and column directions in units of 16 × 16 pixels starting from one corner of the pixel matrix, and the remaining image areas at the edges smaller than 16 × 16 pixels are also treated as image blocks.
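The following sketch illustrates this division step. It assumes a NumPy image array; the function name, the channel layout, and the 128 × 112 seaside-photo dimensions are illustrative assumptions, not part of the patented method.

```python
# A minimal sketch of the block division in S110, assuming a NumPy
# (rows x cols x channels) image array. Edge tiles may be smaller than
# 16 x 16, matching the handling of images whose size is not an
# integral multiple of the block size.
import numpy as np

def divide_into_blocks(image: np.ndarray, block: int = 16):
    """Divide an image into block x block tiles, indexed B(i, j) from 1."""
    h, w = image.shape[:2]
    blocks = {}
    for i, top in enumerate(range(0, h, block), start=1):
        for j, left in enumerate(range(0, w, block), start=1):
            blocks[(i, j)] = image[top:top + block, left:left + block]
    return blocks

# The seaside photo of Fig. 2: 112 rows x 128 columns of pixels gives
# 8 blocks per row and 7 blocks per column, 56 blocks in total.
photo = np.zeros((112, 128, 3), dtype=np.uint8)
tiles = divide_into_blocks(photo)
assert len(tiles) == 56 and tiles[(1, 8)].shape[:2] == (16, 16)
```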
In the embodiments of the present disclosure, the image to be processed is evenly divided into a plurality of image blocks, which is a simple division. By contrast, other image processing methods for generating semantic description information may need to segment the image into regions based on its visual features, which is considerably more complicated.
In S120, semantic description information is obtained for each of the image blocks produced by the division in S110. Semantics is the meaning of data, and semantic description information is the information used to describe that meaning. Image data by itself is merely a collection of symbols; only when meaning is assigned to the data can it be used, and at that point the data has been converted into semantic description information, for example a text-like language expression that a machine can intuitively work with. Taking Fig. 2 as an example, the semantic description information of each image block of the seaside landscape photo may be a probability distribution over the contents sky, sea, and beach.
Fig. 3 is a flowchart schematically illustrating S120, obtaining semantic description information for each image block, in the image processing method of Fig. 1. As shown in Fig. 3, obtaining the semantic description information of each of the plurality of image blocks includes: obtaining a Gaussian mixture model of the image content corresponding to each piece of semantic description information (S121); determining probability estimates that the image block belongs to each image content according to the similarity between the image block and the Gaussian mixture models (S122); and determining the semantic description information of the image block according to those probability estimates (S123).
In S121, training images of the image contents corresponding to the various pieces of semantic description information may be acquired in advance from a database, and a Gaussian mixture model of each image content may be built from the training images of that content. For example, the database stores training images for image contents such as sky, sea, beach, grassland, trees, and mountains. The image contents are the contents that the semantic description information of the various images may refer to.
Each image content can appear in different ways; the sky, for example, often differs in color and brightness under different weather conditions. There may therefore be multiple training images for each image content, for example 512. For each image content, Gaussian models can be fitted to these training images to characterize the content: a Gaussian model quantizes the image content with a normal distribution, decomposing the gray scale, color, and so on of the content into several models based on normal distribution curves. Taking a weighted average of the Gaussian models for a content yields the Gaussian mixture model of that content. The way the Gaussian mixture model is obtained does not limit the embodiments of the present disclosure. In general, the training images and the Gaussian mixture model of each image content are obtained in advance and a database is built from them.
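A sketch of this training stage is given below. It assumes scikit-learn's GaussianMixture; reducing each training patch to its mean color is a deliberately simple stand-in for whatever features an actual implementation would use, and the component count is likewise an assumption.

```python
# A hedged sketch of S121: fit one Gaussian mixture model per image
# content (sky, sea, beach, ...) from that content's training patches.
import numpy as np
from sklearn.mixture import GaussianMixture

def patch_features(patches):
    """One feature vector per patch: the per-channel mean colour (assumed)."""
    return np.array([p.reshape(-1, p.shape[-1]).mean(axis=0) for p in patches])

def train_content_models(training_sets, n_components=5):
    """training_sets maps a label ('sky', 'sea', ...) to its patches."""
    models = {}
    for label, patches in training_sets.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full", random_state=0)
        gmm.fit(patch_features(patches))
        models[label] = gmm
    return models
```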
In S122, the probability estimate that each image block belongs to each image content is determined according to the similarity between the image block and the Gaussian mixture model of that content. Taking the block in the upper right corner of Fig. 2 as an example, the image data of the 16 × 16 pixels of block B(1,8) is matched against each of the Gaussian mixture models obtained in S121: the models for sky, sea, beach, grassland, trees, and mountains, respectively, yielding probability estimates that B(1,8) belongs to each of these contents. Since the block in the upper right corner of Fig. 2 shows sky, the operation of S122 will find that the probability that B(1,8) belongs to sky is the largest, while the probabilities of it belonging to sea, beach, grassland, trees, or mountains may be very small or even zero. Suppose here that the probabilities that B(1,8) is sky, sea, beach, grassland, tree, and mountain are 90%, 6%, 4%, 0%, 0%, and 0%, respectively.
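Continuing the sketch above (and reusing its `patch_features` and `models`), one plausible reading of S122/S123 scores a block against every content model and normalizes the likelihoods into a distribution; the softmax normalization is an assumption, not something the description prescribes.

```python
# A hedged sketch of S122/S123: score one block against every content
# model and normalise the likelihoods into the block's probability
# estimates, e.g. {'sky': 0.90, 'sea': 0.06, 'beach': 0.04} for B(1,8).
import numpy as np

def block_semantics(block, models):
    feature = patch_features([block])              # shape (1, n_features)
    labels = list(models)
    # score_samples gives the log-likelihood of the feature under a model.
    log_lik = np.array([models[lbl].score_samples(feature)[0]
                        for lbl in labels])
    # Softmax so the estimates form a probability distribution (assumed).
    w = np.exp(log_lik - log_lik.max())
    return dict(zip(labels, w / w.sum()))
```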
In S123, the semantic description information of the image block is determined according to the probability estimates that the block belongs to each image content. For example, the probability that each image block belongs to each image content may be represented in a histogram or in a proportion map. Depending on the specific requirements on the semantic description information, the probability estimates may be suitably processed to generate the required semantic description information.
For each image block in the image to be processed, the similarity between the block and the Gaussian mixture model of each image content is evaluated to determine the probability estimates that the block belongs to each content, and the semantic description information of the block is determined from these estimates; in this way, the semantic description information of every image block of the image to be processed is obtained.
Determining the semantic description information of an image block from the block and the Gaussian mixture models, as in S122 and S123 of Fig. 3, is merely an example; based on the Gaussian mixture models, the semantic description information of each image block may also be determined in other ways.
Fig. 4 schematically illustrates the obtained semantic description information of an image block. As shown in Fig. 4, the processing described in conjunction with Fig. 3 is applied to the block in the upper right corner of the image of Fig. 2, and the semantic description information of that block is obtained in the form of a histogram. As Fig. 4 illustrates, the probability that block B(1,8) belongs to sky is much greater than the probabilities that it belongs to sea or beach, so B(1,8) can essentially be determined to be sky. For readability, Fig. 4 shows the semantic description information of only one image block.
In fact, the process of S120 yields the semantic description information of every image block of Fig. 2, i.e. semantic description information in one-to-one correspondence with the blocks. Continuing the example, the view in Fig. 4 mainly shows sky, sea, and beach. The blocks showing sky obtain semantic description information with a high probability of sky, such as the blocks of rows 1 and 2 of Fig. 4; the blocks showing sea obtain semantic description information with a high probability of sea, such as the blocks of rows 3 and 4; and the blocks showing beach obtain semantic description information with a high probability of beach, such as the blocks of rows 6 and 7.
In S130, the semantic description information of the image is determined based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks. As an example, the weight of each image block may be determined according to its semantic description information and its spatial position in the image, and the semantic description information of the image then determined according to the semantic description information and weights of the blocks. Setting a weight for each block according to its spatial position allows the importance of individual blocks to be differentiated, so that the semantic description information of the whole image is expressed more accurately. In other words, the image blocks produced in S110 are not treated as independent of one another; rather, the semantic description information of the image is determined taking into account the spatial positions of the blocks in the image.
Fig. 5 is a flowchart illustrating a first example of determining the semantic description information of the image (S130) in the image processing method of Fig. 1. As shown in Fig. 5, this example comprises: calculating the similarity between each image block and its neighboring image blocks according to the semantic description information of the blocks (S131); setting the weight of each image block based on the similarity between the block and its neighboring blocks (S132); and determining the semantic description information of the image according to the semantic description information and the weights of the image blocks (S133).
In S131, the closer the semantic description information of two adjacent image blocks, the greater the similarity between them. Taking the image blocks of Fig. 4 as an example, assume the blocks involve only three image contents: sky, sea, and beach.
For block B(1,4) in Fig. 4, the neighboring blocks are B(1,3), B(1,5), and B(2,4). The image content of B(1,4) and of its neighbors is dominated by sky, with sea and beach occupying only a very small share, so the similarity between B(1,4) and its neighbors is high. The similarity calculation considers both the image contents of the blocks and the proportions those contents occupy; as an example, the similarity between two blocks may be computed from the Euclidean distance between their semantic description information. This embodiment uses the 4-neighborhood shown in Fig. 4 as the neighboring blocks; an 8-neighborhood or an even larger neighborhood may also be used, and no limitation is placed on this.
For block B(2,4) in Fig. 4, the neighboring blocks are B(2,3), B(2,5), B(1,4), and B(3,4). In B(2,4), B(2,3), B(2,5), and B(1,4), the image content is dominated by sky, with sea and beach occupying a very small share; in B(3,4), however, the content is dominated by sea, with sky and beach occupying a very small share. B(2,4) is therefore highly similar to its neighbors B(2,3), B(2,5), and B(1,4), but less similar to its neighbor B(3,4), which lowers its overall similarity to its neighborhood. Consequently, the similarity between B(2,4) and its neighboring blocks is lower than the similarity between B(1,4) and its neighboring blocks.
In S132, the weight of each image block is set based on the similarity between the block and its neighboring blocks: a block with high similarity to its neighbors is given a high weight, and a block with low similarity to its neighbors is given a low weight. For the blocks B(1,4) and B(2,4) discussed above, for example, the weight W(1,4) of B(1,4) is greater than the weight W(2,4) of B(2,4). Proceeding in the same way yields the weight W(i, j) of every block B(i, j) in Fig. 4.
In the image shown in Fig. 4, sky dominates the first and second rows, sea dominates the third and fourth rows, sea dominates the first four columns of the fifth row while beach dominates its last four columns, and beach dominates the sixth and seventh rows. The more similar the content of an image block is to that of the blocks around it, the higher its weight, and the more accurately the block represents the semantics of the image in that area.
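The sketch below captures S131 and S132 under the assumptions that each block's semantic description information is a probability vector stored in an array `semantics[i, j]`, that similarity is measured by Euclidean distance over the 4-neighborhood as described above, and that the mapping from average distance to weight (here 1/(1 + d)) is an illustrative choice rather than anything the description prescribes.

```python
# A hedged sketch of S131/S132: per-block weights from the similarity
# to the 4-neighbourhood. Small distance = high similarity = high weight.
import numpy as np

def block_weights(semantics: np.ndarray) -> np.ndarray:
    """semantics: (rows, cols, n_contents) -> weights of shape (rows, cols)."""
    rows, cols, _ = semantics.shape
    weights = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(semantics[i, j]
                                                - semantics[ni, nj]))
            weights[i, j] = 1.0 / (1.0 + np.mean(dists))  # assumed mapping
    return weights
```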
Determining the block weights according to the semantic description information and spatial positions of the blocks, as in S131 and S132, is merely an example; the weight of each block may also be determined in other ways, for example according to the pixel information of each block.
In S133, the semantic description information of the image is determined according to the semantic description information and the weights of the image blocks. The whole image may include multiple image contents, each located in a different part of the image; accordingly, the semantic description information of the image is the distribution of semantic description information over the different location areas of the image.
Continuing the example of Fig. 4: the semantic description information of every block in the first and second rows is dominated by sky, with small shares of sea and beach. Each block in the first row is highly similar to its surrounding blocks and has a large weight; each block in the second row is highly similar to three of its neighbors and less similar to only one, so its weight, although lower than that of the first-row blocks, is still relatively high. Based on the semantic description information and weights of the blocks in the first and second rows, it can therefore be determined that the semantic description information of the image in this area is sky, with sea and beach occupying only a very small share. The semantic description information of the other location areas of the image can be determined with S133 in the same way.
Fig. 6 schematically illustrates the determined semantic description information of the image. As shown at the rightmost side of Fig. 6, with the method of determining semantic description information shown in Fig. 5, it can be judged that the whole image contains three image contents: sky, sea, and beach. The first and second rows of the image may be described by first semantic description information, the right four columns of the third, fourth, and fifth rows by second semantic description information, and the left four columns of the fifth row together with the sixth and seventh rows by third semantic description information. Each is a proportional representation of the image contents: in the first, sky occupies the largest share and sea and beach smaller shares; in the second, sea occupies the largest share and sky and beach smaller shares; in the third, beach occupies the largest share and sky and sea smaller shares.
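One plausible reading of S133 is a weight-normalized average of the block-level semantic vectors within each location area, as sketched below; the dummy data, the hard-coded sky area (rows 1-2 of the 7 × 8 grid), and the averaging rule itself are all assumptions for illustration.

```python
# A hedged sketch of S133: the semantic description of a location area
# as the weighted average of its blocks' semantic vectors.
import numpy as np

def region_semantics(semantics, weights, block_indices):
    """Weight-normalised average over the listed (row, col) blocks."""
    v = np.array([semantics[i, j] for i, j in block_indices])
    w = np.array([weights[i, j] for i, j in block_indices])
    return (w[:, None] * v).sum(axis=0) / w.sum()

# Dummy 7 x 8 grid over three contents (sky, sea, beach) for illustration.
rng = np.random.default_rng(0)
semantics = rng.dirichlet(np.ones(3), size=(7, 8))
weights = np.ones((7, 8))
sky_area = [(i, j) for i in (0, 1) for j in range(8)]  # rows 1-2, 0-based
print(region_semantics(semantics, weights, sky_area))  # (sky, sea, beach)
```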
The way of determining the semantic description information of an image described above in conjunction with Figs. 5 and 6 is merely an example. The semantic description information of the image may also be determined in other ways, for example as shown in Fig. 7 below.
Fig. 7 is a flowchart illustrating a second example of determining the semantic description information of the image in the image processing method of Fig. 1. As shown in Fig. 7, determining the semantic description information of the image based on the spatial positions and semantic description information of the image blocks (S130) may include: determining adjacent image blocks with similar semantic description information to be a content aggregation area based on the spatial positions and semantic description information of the blocks, the shared semantic description information serving as the semantic description information of the area (S131A); determining the weight of each content aggregation area according to its semantic description information and its spatial position in the image (S132A); and determining the semantic description information of the image according to the weights and semantic description information of the content aggregation areas (S133A).
In S131A, adjacent image blocks with similar semantic description information are taken as a content aggregation area. A content aggregation area typically comprises several image blocks and covers a larger area, although when the content of the image is rich an area may consist of a single block. The blocks of an area share similar semantic description information; for example, blocks B(1,1), B(1,2), B(2,1), and B(2,2) in the image of Fig. 2 have similar semantic description information and may be regarded as one content aggregation area. The image is accordingly composed of a number of content aggregation areas.
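S131A amounts to grouping connected blocks whose semantic vectors are close; the flood-fill sketch below, reusing the `semantics` array from earlier, shows one way to do this, with the 4-connectivity, the distance measure, and the threshold all being assumptions rather than requirements of the description.

```python
# A hedged sketch of S131A: label 4-connected blocks whose semantic
# vectors lie within Euclidean distance tau as one content aggregation
# area. Simple flood fill; tau is an assumed threshold.
import numpy as np
from collections import deque

def content_aggregation_areas(semantics: np.ndarray, tau: float = 0.2):
    rows, cols, _ = semantics.shape
    area = -np.ones((rows, cols), dtype=int)   # -1 = not yet assigned
    next_id = 0
    for si in range(rows):
        for sj in range(cols):
            if area[si, sj] != -1:
                continue
            area[si, sj] = next_id
            queue = deque([(si, sj)])
            while queue:                       # grow over similar neighbours
                i, j = queue.popleft()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and area[ni, nj] == -1
                            and np.linalg.norm(semantics[i, j]
                                               - semantics[ni, nj]) < tau):
                        area[ni, nj] = next_id
                        queue.append((ni, nj))
            next_id += 1
    return area  # one area label per block
```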
In S132A, the weight of each content aggregation area in the image is calculated, in a manner analogous to the calculation of the block weights. For example, the similarity between a content aggregation area and its neighboring areas may be calculated according to the semantic description information of the areas, and the weight of the area set based on that similarity.
In S133A, analogously to determining the semantic description information of the image from the semantic description information and weights of the image blocks, the semantic description information of the image is determined from the weights and semantic description information of the content aggregation areas; for details, refer to the description of S133 above.
As the above description of Fig. 7 shows, the content aggregation areas are formed from the image blocks, and the semantic description information of the image is obtained from the semantic description information of the areas in the same manner as it would be obtained from that of the blocks. When the image is large and rich in content, this reduces the amount of information to be processed, so the semantic description information of the image can be determined more quickly.
In the technical scheme of the image processing method according to the embodiments of the disclosure, the semantic description information of the whole image is determined based on the spatial position of each image block in the whole image and the semantic description information of each image block, so that the related image blocks of the whole image are organically linked, which better matches how images are naturally understood.
Fig. 8 is a block diagram schematically illustrating a first image processing apparatus 800 according to an embodiment of the present disclosure. As shown in fig. 8, the first image processing device 800 includes one or more processors 810, a storage device 820, an input device 830, an output device 840, a communication device 850, and a camera 860, which are interconnected via a bus system 870 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the first image processing apparatus 800 shown in fig. 8 are only exemplary and not restrictive, and the first image processing apparatus 800 may have other components and structures as necessary.
The processor 810 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the first image processing apparatus 800 to perform desired functions.
The storage device 820 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 810 to implement the image processing methods described above in conjunction with Figs. 1-7. Various applications and various data, such as image data and data used and/or generated by the applications, may also be stored on the computer-readable storage medium.
The input device 830 may be a device used by a user to input instructions, such as an instruction to capture an image with the camera 860 described below, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 840 may output various information (e.g., images or sounds) to the outside (e.g., to a user) and may include one or more of a display, speakers, and the like. The communication device 850 may communicate with other devices (e.g., personal computers, servers, mobile stations, base stations) via a network such as the Internet, a wireless local area network, or a mobile communication network, or via other technologies such as Bluetooth or infrared communication. The camera 860 may capture images to be processed (e.g., photographs or videos) and store them in the storage device 820 for use by other components.
Fig. 9 is a block diagram schematically illustrating a second image processing apparatus 900 according to an embodiment of the present disclosure. As shown in fig. 9, the second image processing apparatus 900 may include a dividing unit 910, an image block semantic determining unit 920, and an image semantic determining unit 930. The dividing unit 910 equally divides an image to be processed into a plurality of image blocks. The image block semantic determining unit 920 obtains semantic description information of the image block. The image semantic determining unit 930 determines semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block.
The dividing unit 910 evenly divides an image to be processed into a plurality of image blocks. The image to be processed is usually represented by the numerical values of its individual pixels; each pixel can be represented by a gray value, three primary color components, or the like, and an image typically comprises a matrix of pixels arranged in rows and columns. As an example of the division, the image may be divided sequentially such that each image block contains 16 × 16 pixels; specific examples can be seen in Fig. 2 and the description given in conjunction with it. Each image block in Fig. 2 is denoted B(i, j), where i is the row index of block B in the whole image with 1 ≤ i ≤ 7, and j is the column index with 1 ≤ j ≤ 8. In an application, the dividing unit 910 may be implemented by a processor and a memory.
Fig. 2 illustrates the case where the image dimensions are exact multiples of the block size. When the total number of pixels of the image to be processed is not an integral multiple of the block size, the image is still divided in the row and column directions in units of 16 × 16 pixels starting from one corner of the pixel matrix, and the remaining image areas at the edges smaller than 16 × 16 pixels are also treated as image blocks.
The dividing unit 910 evenly divides the image to be processed into a plurality of image blocks, which is a simple division. By contrast, other image processing methods for generating semantic description information may need to segment the image into regions based on its visual features, which is considerably more complicated.
The image block semantic determining unit 920 obtains semantic description information for each of the plurality of image blocks produced by the dividing unit 910. Semantics is the meaning of data, and semantic description information is the information used to describe that meaning. Image data by itself is merely a collection of symbols; only when meaning is assigned to the data can it be used, at which point the data has been converted into semantic description information, for example a text-like language expression that a machine can intuitively work with. For example, the semantic description information of each image block of the seaside landscape photo of Fig. 2 may be a probability distribution over the contents sky, sea, and beach.
The image block semantic determining unit 920 may obtain the semantic description information of an image block as follows: obtaining a Gaussian mixture model of the image content corresponding to each piece of semantic description information; and determining the semantic description information of the image block according to the image block and the Gaussian mixture models.
The image block semantic determining unit 920 may acquire in advance, from the database, training images of the image contents corresponding to the various pieces of semantic description information, and build a Gaussian mixture model of each image content from the training images of that content. Alternatively, the image block semantic determining unit 920 may acquire the Gaussian mixture model of each image content directly from the database.
Training images of image contents such as sky, sea, beach, grassland, trees, and mountains are stored in advance in the database; the image contents are the contents that the semantic description information of the various images may refer to. Each image content can appear in different ways; the sky, for example, often differs in color and brightness under different weather conditions. There may therefore be multiple training images for each image content, for example 512. For each image content, Gaussian models can be fitted to these training images to characterize the content: a Gaussian model quantizes the image content with a normal distribution, decomposing the gray scale, color, and so on of the content into several models based on normal distribution curves. Taking a weighted average of the Gaussian models for a content yields the Gaussian mixture model of that content. The way the Gaussian mixture model is obtained does not limit the embodiments of the present disclosure. In general, the training images and the Gaussian mixture model of each image content are obtained in advance and a database is built from them.
As an example, the image block semantic determining unit 920 may determine the semantic description information of each image block according to the Gaussian mixture models as follows: determining probability estimates that the image block belongs to each image content according to the similarity between the image block and the Gaussian mixture models; and determining the semantic description information of the image block according to those probability estimates.
Taking the block in the upper right corner of Fig. 2 as an example, the image data of the 16 × 16 pixels of block B(1,8) is matched against each of the obtained Gaussian mixture models: the models for sky, sea, beach, grassland, trees, and mountains, respectively, yielding probability estimates that B(1,8) belongs to each of these contents. Since the block in the upper right corner of Fig. 2 shows sky, it can be determined that the probability that B(1,8) belongs to sky is the largest, while the probabilities of it belonging to sea, beach, grassland, trees, or mountains may be very small or even zero.
Thereafter, the image block semantic determining unit 920 determines semantic description information of the image block according to the probability estimation that the image block belongs to the respective image contents. For example, the image block semantics determining unit 920 may represent the probability that each image block belongs to the respective image content in a histogram, and may also represent the probability that each image block belongs to the respective image content in a scale map. Depending on the specific requirements of the semantic description information, the probability estimates may be suitably processed to generate the required semantic description information.
For each image block in the image to be processed, the image block semantic determining unit 920 determines the similarity between the image block and the gaussian mixture model of each image content to determine a probability estimation that each image block belongs to each image content, and determines semantic description information of each image block based on the probability estimation, thereby obtaining the semantic description information of each image block in the image to be processed. The semantic description information of the image block obtained by the image block semantic determining unit 920 can refer to the illustration of fig. 4 and the related description made in conjunction with fig. 4.
The image block semantics determining unit 920 may be implemented using a memory and a processor. When the processor executes the program stored in the memory, the respective operations of the image block semantic determination unit 920 may be completed.
The image semantic determining unit 930 determines semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block determined by the image block semantic determining unit 920. As an example, the image semantic determining unit 930 may determine the weight of the image block according to the semantic description information of the image block and the spatial position of the image block in the image; and determining the semantic description information of the image according to the semantic description information of the image block and the weight of the image block. The importance of each image block can be set distinguishably by setting the weight for the image block according to the spatial position of the image block in the image, so that the semantic description information of the whole image can be more accurately expressed. Therefore, the image blocks divided by the dividing unit 910 are not independent from each other, but semantic description information of the image is determined based on the spatial positions of the image blocks in the image.
The image semantic determination unit 930 may determine the weights of the image blocks as follows: calculating the similarity between the image block and the adjacent image block according to the semantic description information of the image block; setting the weight of the image block based on the similarity between the image block and its neighboring image blocks.
The closer the semantic description information between two adjacent image blocks is, the greater the similarity between the two adjacent image blocks. For example, in fig. 4, the image block B (1,4) and its adjacent image blocks B (1,3), B (1,5), and B (2,4) are mainly sky, and the contents of the ocean and the beach occupy a very small portion, so that the similarity between the image block B (1,4) and its adjacent image blocks is large. In calculating the similarity, the respective image contents of the image blocks and the proportions occupied by the respective image contents are considered. The image semantic determining unit 930 may calculate the similarity between two image blocks by calculating the euclidean distance between semantic description information of the two image blocks, for example. In fig. 4, the image block B (2,4) has high similarity with the adjacent image blocks B (2,3), B (2,5) and B (1,4), and is dominated by the sky, and the contents of the sea and the beach occupy a very small part; however, the image block B (2,4) has low similarity with the adjacent image block B (3,4), because the image block B (3,4) is dominated by the ocean and the contents of the sky and the beach occupy a very small portion. Therefore, the similarity between the image block B (2,4) and its neighboring image block is lower than the similarity between the image block B (1,4) and its neighboring image block.
In the case where the similarity between an image block and its neighboring image block is high, the image semantic determination unit 930 may set a high weight for the image block; and in the case that the similarity between the image block and the adjacent image block is low, setting low weight for the image block. For example, for the above-described image blocks B (1,4) and B (2,4), the weight W (1,4) of the image block B (1,4) determined by the image semantic determining unit 930 is greater than the weight W (2,4) of the image block B (2, 4). And so on, to obtain the weight W (i, j) of each image block B (i, j). In application, the image semantic determination unit 930 may also determine the weight of each image block in other manners, for example, may determine according to the pixel information of each image block.
The image semantic determination unit 930 then determines semantic description information of the image according to the semantic description information of the image block and the weight of the image block. The entire image may include multiple image content, each located at a different position in the image. Accordingly, the semantic description information of the image is the distribution of the semantic description information of different location areas of the image.
Continuing the example of the image in Fig. 2, the image semantic determining unit 930 may find, from the semantic description information and weights of the image blocks, that the semantic description information of every block in the first and second rows is dominated by sky, with small shares of sea and beach; each block in the first row has a large weight, and each block in the second row has a weight that, although lower than that of the first-row blocks, is still relatively high. It may therefore be determined, based on the semantic description information and weights of the blocks in the first and second rows, that the semantic description information of the image in this area is sky, with sea and beach occupying only a very small share. Similarly, the image semantic determining unit 930 may determine the semantic description information of the other location areas of the image.
The semantic description information of the image determined by the image semantic determination unit 930 can refer to the illustration and the related description of fig. 6. In short, the first and second rows of the image may be described by the first semantic description information, the right four columns of the third, fourth, and fifth rows of the image may be described by the second semantic description information, and the left four columns of the fifth row of the image, the sixth, seventh rows may be described by the third semantic description information. The first semantic description information indicates that the sky occupies the largest proportion and the ocean and the beach occupy smaller proportions, respectively. The second semantic description information indicates that the sea occupies the largest proportion and the sky and the beach occupy smaller proportions, respectively. The third semantic description information indicates that the sand beach occupies the largest proportion and the sky and the sea occupy smaller proportions, respectively.
Alternatively, the image semantic determining unit 930 may determine the semantic description information of the image in other ways. For example, the image semantic determining unit 930 may determine, based on the spatial positions and semantic description information of the image blocks, adjacent image blocks with similar semantic description information to be a content aggregation area, with the shared semantic description information serving as the semantic description information of the area; determine the weight of each content aggregation area according to its semantic description information and its spatial position in the image; and determine the semantic description information of the image according to the weights and semantic description information of the content aggregation areas.
Here, the image semantic determining unit 930 takes adjacent image blocks in the image having similar semantic description information as the content aggregation area. The content aggregation area typically comprises a plurality of image blocks, having a larger area. However, in the case where the content of the image is rich, the content aggregation area may include one image block. Then, the image semantic determination unit 930 calculates the weight of the content aggregation area in a similar manner to the weight calculation of the image block, and determines semantic description information of the image from the weight of the content aggregation area and the semantic description information. For example, the image semantic determining unit 930 may calculate the similarity between the content aggregation area and its neighboring content aggregation areas according to the semantic description information of the content aggregation area; setting the weight of the content gathering area based on the similarity between the content gathering area and the adjacent content gathering area, and determining the semantic description information of the image according to the weight of the content gathering area and the semantic description information. That is, the image semantic determination unit 930 divides the entire image into the content aggregation area based on the image blocks, and obtains the semantic description information of the image based on the semantic description information of the content aggregation area in a similar manner to obtaining the semantic description information of the image based on the semantic description information of the image blocks. When the image is large and the content is rich, the information processing amount can be reduced, and therefore the semantic description information of the image can be determined more quickly.
The image semantic determination unit 930 may be implemented using a memory and a processor. When the processor executes the program stored in the memory, the respective operations of the image semantic determination unit 930 may be completed.
In the technical scheme of the image processing apparatus according to the embodiments of the disclosure, the semantic description information of the whole image is determined based on the spatial position of each image block in the whole image and the semantic description information of each image block, so that the related image blocks of the whole image are organically linked, which better matches how images are naturally understood.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in practice; for instance, a plurality of units may be combined or integrated into another system, or some features may be omitted or not executed.
The above description covers only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope of the present disclosure shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method comprising:
evenly dividing an image to be processed into a plurality of image blocks according to a pixel matrix;
obtaining semantic description information of the image block;
determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block,
wherein the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block comprises:
determining the weight of the image block according to the semantic description information of the image block and the spatial position of the image block in the image;
and determining the semantic description information of the image according to the semantic description information of the image block and the weight of the image block.
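Taken together with claim 4 below, claim 1 describes a divide-describe-weight-combine pipeline. The sketch below is one hedged reading of it, assuming the image is a NumPy pixel matrix, that `describe_block` stands in for an arbitrary per-block semantic classifier returning label-to-probability mappings, and that blocks similar to their neighbours receive higher weight; none of these specifics is fixed by the claim itself.

```python
# A hedged sketch of the method of claim 1 with the weighting of claim 4.
# Assumptions: the image is a NumPy array (H, W, C); describe_block() is a
# stand-in for any per-block semantic classifier returning {label: prob};
# the weight direction (similar to neighbours -> higher weight) is illustrative.

import math
import numpy as np

def cosine(p, q):
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def split_into_blocks(image, rows, cols):
    """Evenly divide the pixel matrix into rows x cols image blocks."""
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    return {(r, c): image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)}

def describe_image(image, describe_block, rows=8, cols=8):
    blocks = split_into_blocks(image, rows, cols)
    block_sem = {pos: describe_block(blk) for pos, blk in blocks.items()}
    # Weight each block by its mean similarity to its adjacent blocks.
    weights = {}
    for (r, c), sem in block_sem.items():
        adj = [block_sem[nb] for nb in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
               if nb in block_sem]
        weights[(r, c)] = sum(cosine(sem, q) for q in adj) / len(adj)
    total = sum(weights.values()) or 1.0
    # Combine block semantics into the image's semantic description.
    combined = {}
    for pos, sem in block_sem.items():
        for k, v in sem.items():
            combined[k] = combined.get(k, 0.0) + (weights[pos] / total) * v
    return combined
```

For a 640x480 RGB frame, a call such as `describe_image(frame, my_classifier)` (with `my_classifier` a hypothetical block classifier) would yield a mapping like `{'sky': 0.45, 'sea': 0.35, 'beach': 0.20}`.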
2. The image processing method according to claim 1, wherein the obtaining semantic description information of the image block comprises:
obtaining a Gaussian mixture model of image content corresponding to each semantic description information;
and determining semantic description information of the image block according to the image block and the Gaussian mixture model.
3. The image processing method according to claim 2, wherein the determining semantic description information of the image block according to the image block and the gaussian mixture model comprises:
determining probability estimates that the image block belongs to each image content according to the similarity between the image block and the Gaussian mixture model;
and determining the semantic description information of the image block according to the probability estimates that the image block belongs to each image content.
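Claims 2 and 3 do not fix how the Gaussian mixture models are trained or how the similarity becomes a probability estimate. The sketch below shows one hedged reading using scikit-learn: a separate mixture is fitted per content label, a block's mean colour serves as a toy feature, and a softmax over the per-model log-likelihoods stands in for the probability estimates. The library, feature choice, component count, and softmax normalisation are all assumptions for illustration.

```python
# A hedged sketch of claims 2-3: one Gaussian mixture model per image
# content, scored against a block's features to obtain probability
# estimates. Library, features, and softmax are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_content_models(training_features):
    """training_features: {label: array of shape (n_samples, n_features)}."""
    return {label: GaussianMixture(n_components=3, random_state=0).fit(feats)
            for label, feats in training_features.items()}

def block_features(block):
    # Toy feature vector: the block's mean colour (an assumption).
    return block.reshape(-1, block.shape[-1]).mean(axis=0, keepdims=True)

def block_semantics(block, models):
    feats = block_features(block)
    # score() returns the mean log-likelihood of the features under each model.
    scores = {label: gmm.score(feats) for label, gmm in models.items()}
    # Softmax over log-likelihoods -> a probability estimate per content.
    mx = max(scores.values())
    exps = {k: np.exp(v - mx) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: float(v / z) for k, v in exps.items()}
```

The resulting label-to-probability mapping is exactly the kind of block-level semantic description consumed by the combination steps sketched earlier.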
4. The image processing method according to claim 1, wherein the determining the weights of the image blocks according to the semantic description information of the image blocks and the spatial positions of the image blocks in the image comprises:
calculating the similarity between the image block and the adjacent image block according to the semantic description information of the image block;
setting the weight of the image block based on the similarity between the image block and its neighboring image blocks.
5. The image processing method according to claim 1, wherein the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block further comprises:
determining adjacent image blocks with similar semantic description information as content aggregation areas based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks, and using the similar semantic description information as the semantic description information of the content aggregation areas;
determining the weight of the content gathering area according to the semantic description information of the content gathering area and the spatial position of the content gathering area in the image;
and determining semantic description information of the image according to the weight of the content aggregation area and the semantic description information of the content aggregation area.
6. An image processing apparatus comprising:
a memory; and
a processor to perform the following operations:
evenly dividing an image to be processed into a plurality of image blocks according to a pixel matrix;
obtaining semantic description information of the image block;
determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block,
wherein the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block comprises:
determining the weight of the image block according to the semantic description information of the image block and the spatial position of the image block in the image;
and determining the semantic description information of the image according to the semantic description information of the image block and the weight of the image block.
7. The image processing apparatus according to claim 6, wherein said obtaining semantic description information of the image block comprises:
obtaining a Gaussian mixture model of image content corresponding to each semantic description information;
and determining semantic description information of the image block according to the image block and the Gaussian mixture model.
8. The image processing apparatus according to claim 7, wherein the determining semantic description information of the image block according to the image block and the gaussian mixture model comprises:
determining probability estimates that the image block belongs to each image content according to the similarity between the image block and the Gaussian mixture model;
and determining the semantic description information of the image block according to the probability estimates that the image block belongs to each image content.
9. The image processing apparatus according to claim 7, wherein the determining the weights of the image blocks according to the semantic description information of the image blocks and the spatial positions of the image blocks in the image comprises:
calculating the similarity between the image block and the adjacent image block according to the semantic description information of the image block;
setting the weight of the image block based on the similarity between the image block and its neighboring image blocks.
10. The image processing apparatus according to claim 7, wherein the determining semantic description information of the image based on the spatial position of the image block in the image and the semantic description information of the image block further comprises:
determining adjacent image blocks with similar semantic description information as content aggregation areas based on the spatial positions of the image blocks in the image and the semantic description information of the image blocks, and using the similar semantic description information as the semantic description information of the content aggregation areas;
determining the weight of the content gathering area according to the semantic description information of the content gathering area and the spatial position of the content gathering area in the image;
and determining semantic description information of the image according to the weight of the content aggregation area and the semantic description information of the content aggregation area.
CN201510508531.6A 2015-08-18 2015-08-18 Image processing method and image processing apparatus Active CN106469437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510508531.6A CN106469437B (en) 2015-08-18 2015-08-18 Image processing method and image processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510508531.6A CN106469437B (en) 2015-08-18 2015-08-18 Image processing method and image processing apparatus

Publications (2)

Publication Number Publication Date
CN106469437A CN106469437A (en) 2017-03-01
CN106469437B (en) 2020-08-25

Family

ID=58214749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510508531.6A Active CN106469437B (en) 2015-08-18 2015-08-18 Image processing method and image processing apparatus

Country Status (1)

Country Link
CN (1) CN106469437B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133951B (en) * 2017-05-22 2020-02-28 中国科学院自动化研究所 Image tampering detection method and device
CN107273863B (en) * 2017-06-21 2019-07-23 天津师范大学 A kind of scene character recognition method based on semantic stroke pond

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8705866B2 (en) * 2010-12-07 2014-04-22 Sony Corporation Region description and modeling for image subscene recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222239A (en) * 2011-06-03 2011-10-19 哈尔滨工程大学 Labelling image scene clustering method based on vision and labelling character related information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Key Technologies of Image Annotation Based on Scene Semantics; Fu Jie; China Masters' Theses Full-text Database, Information Science and Technology; 2012-05-15; Sections 4.2-4.5 *
Automatic Image Annotation Method Based on Gaussian Mixture Model; Chen Na; Journal of Computer Applications; 2010-11-30; Vol. 30, No. 11; pp. 2986-2987, 2997 *

Also Published As

Publication number Publication date
CN106469437A (en) 2017-03-01

Similar Documents

Publication Publication Date Title
US10796452B2 (en) Optimizations for structure mapping and up-sampling
US10733431B2 (en) Systems and methods for optimizing pose estimation
US10692243B2 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
US9741137B2 (en) Image-based color palette generation
US9396560B2 (en) Image-based color palette generation
US9245350B1 (en) Image-based color palette generation
CN109189991A Duplicate video identification method, apparatus, terminal, and computer-readable storage medium
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
US11704357B2 (en) Shape-based graphics search
CN111062871A (en) Image processing method and device, computer equipment and readable storage medium
CN111670457A (en) Optimization of dynamic object instance detection, segmentation and structure mapping
CN105354248A Grayscale-based distributed image low-level feature recognition method and system
CN107003834B (en) Pedestrian detection device and method
WO2021164550A1 (en) Image classification method and apparatus
CN112163577B (en) Character recognition method and device in game picture, electronic equipment and storage medium
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
JP2015036939A (en) Feature extraction program and information processing apparatus
US20210166058A1 (en) Image generation method and computing device
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
CN106469437B (en) Image processing method and image processing apparatus
Lei et al. Multi-modal aesthetic assessment for mobile gaming image
CN113780532B (en) Training method, device, equipment and storage medium of semantic segmentation network
CN110942056A (en) Clothing key point positioning method and device, electronic equipment and medium
CN114821140A (en) Image clustering method based on Manhattan distance, terminal device and storage medium
CN103164504A (en) Smartphone refined picture searching system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant