CN110648299A - Image processing method, image processing apparatus, and computer-readable storage medium


Info

Publication number
CN110648299A
CN110648299A (application CN201810670236.4A)
Authority
CN
China
Prior art keywords
image
panoramic
semantic information
local
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810670236.4A
Other languages
Chinese (zh)
Inventor
廖可
张宇鹏
王炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liguang Co
Ricoh Co Ltd
Original Assignee
Liguang Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liguang Co
Priority to CN201810670236.4A
Publication of CN110648299A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/29 Graphical models, e.g. Bayesian networks
    • G06T 5/73
    • G06T 5/80
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The embodiment of the invention provides an image processing method, an image processing device and a computer readable storage medium, wherein the image processing method comprises the following steps: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.

Description

Image processing method, image processing apparatus, and computer-readable storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method, an image processing apparatus, and a computer-readable storage medium.
Background
A multi-sensor imaging system consists of multiple types and/or multiple numbers of sensors located at the same or different positions. After image or video data is acquired by the multi-sensor imaging system, the image or video information from the multiple sensors may be processed to output corresponding image processing results.
In the prior art, when the images acquired by a multi-sensor imaging system include a panoramic image and one or more local images within the range of the panoramic image, the panoramic image and the local images are generally fused to obtain a panoramic fusion image. However, a panoramic fusion image obtained by simply fusing the panoramic image and the local images cannot provide all the application information a user may desire, such as related semantic information and description information.
Disclosure of Invention
To solve the above technical problem, according to an aspect of the present invention, there is provided an image processing method including: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.
According to another aspect of the present invention, there is provided an image processing apparatus comprising: an acquisition unit that acquires a panoramic image and one or more partial images within the panoramic image; the semantic dividing unit acquires panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic dividing areas in the panoramic image; a focus area obtaining unit, which determines one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and obtains detail semantic information according to the determined focus areas; and the description unit is used for obtaining image description information by utilizing the panoramic semantic information and the detail semantic information.
According to another aspect of the present invention, there is provided an image processing apparatus comprising: a processor; and a memory having computer program instructions stored therein, wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.
According to another aspect of the invention, there is provided a computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the steps of: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.
According to the image processing method, the image processing apparatus, and the computer-readable storage medium of the present invention, panoramic semantic information and detail semantic information can be acquired, respectively, for a panoramic image and for one or more local images within the range of the panoramic image, and image description information can be obtained from the panoramic semantic information and the detail semantic information. The image description information thus obtained takes into account both the panoramic semantic information describing the scene of the panoramic image and the detail semantic information describing the focus areas of the local images, which improves the accuracy of the image description and allows effective application in fields such as automatic driving and robot interaction.
Drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
Fig. 1 shows a flow chart of an image processing method according to one embodiment of the present invention;
Fig. 2(a) shows a panoramic image according to one embodiment of the present invention; Fig. 2(b) shows an infrared local image according to one embodiment of the present invention; and Fig. 2(c) shows a panoramic fusion image obtained by fusing the panoramic image of Fig. 2(a) and the infrared local image of Fig. 2(b) according to one embodiment of the present invention;
Fig. 3(a) shows a panoramic fusion image according to one embodiment of the present invention; Fig. 3(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 3(a); and Fig. 3(c) is a schematic diagram showing the positions of the sharp region 1 and the blurred region 2 in the coordinate-transformed local fusion image of Fig. 3(b);
Fig. 4(a) shows a panoramic fusion image according to one embodiment of the present invention; Fig. 4(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 4(a); Fig. 4(c) shows the coordinate-transformed local fusion image of Fig. 4(b) being resampled; and Fig. 4(d) shows the resampled local fusion image of Fig. 4(c) being inverse coordinate-transformed to obtain a panoramic image;
Fig. 5 shows a schematic diagram of a panoramic image according to one embodiment of the present invention;
Fig. 6 shows the schematic positions of a local image and a focus area in a panoramic image according to one embodiment of the present invention;
Fig. 7 shows a block diagram of an image processing apparatus according to an embodiment of the present invention; and
Fig. 8 shows a block diagram of an image processing device according to an embodiment of the present invention.
Detailed Description
An image processing method, an image processing apparatus, and a computer-readable storage medium according to embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, like reference numerals refer to like elements throughout. It should be understood that the embodiments described herein are merely illustrative and should not be construed as limiting the scope of the invention.
An image processing method according to an embodiment of the present invention will be described below with reference to fig. 1. The image processing method of the embodiment of the present invention may be applied to a still image, a video frame in a video that changes with time, and the like, and is not limited herein. Fig. 1 shows a flow chart of the image processing method 100.
As shown in fig. 1, in step S101, a panoramic image and one or more partial images within the panoramic image are acquired.
In this step, the panoramic image and the one or more local images may be acquired using a multi-sensor imaging system. The series of images acquired by the multi-sensor imaging system may include a panoramic image acquired by a panoramic sensor in the system and one or more local images, within the range of the panoramic image, acquired by one or more local sensors in the system. Here, the panoramic image may be acquired by the panoramic sensor capturing scene image information over, for example, 360 degrees using a wide-angle technique, and may further be mapped into a two-dimensional image through conversion into a longitude and latitude coordinate system. Accordingly, within the scene range covered by the panoramic image, one or more local images can be acquired by the one or more local sensors. The local sensor may be, for example, one or more of a high-definition sensor, an infrared sensor, a light-field sensor, a point-cloud sensor, a stereoscopic-vision sensor, and a laser sensor. By means of these local sensors, corresponding local images can be acquired, for example one or more of a high-definition local image, an infrared local image, a light-field local image, a point-cloud local image, a stereoscopic-vision local image, and a laser local image.
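To make the longitude and latitude mapping above concrete, the short sketch below maps a direction on the capture sphere to a pixel in a two-dimensional equirectangular panorama. This is an illustrative assumption about the projection, not a formula prescribed by the patent; the image size and angle conventions are invented for the example.

```python
# A minimal sketch, assuming a standard equirectangular layout: a direction on
# the capture sphere (longitude, latitude) is mapped to a pixel in the 2-D
# panoramic image. Sizes and conventions are assumptions, not from the patent.
import numpy as np

def sphere_to_equirect(lon_rad, lat_rad, width, height):
    """Map longitude in [-pi, pi) and latitude in [-pi/2, pi/2] to pixel (x, y)."""
    x = (lon_rad + np.pi) / (2.0 * np.pi) * (width - 1)   # longitude -> column
    y = (np.pi / 2.0 - lat_rad) / np.pi * (height - 1)    # latitude  -> row
    return x, y

# The forward direction (lon = 0, lat = 0) lands at the panorama centre.
print(sphere_to_equirect(0.0, 0.0, 2048, 1024))  # -> (1023.5, 511.5)
```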
After the panoramic image and the one or more local images within its scene range are acquired through the multi-sensor system, the one or more local images can further be fused according to the positions at which they were acquired, to obtain local fusion images in one-to-one correspondence with the local images. Finally, the panoramic image and the fused local images can be fused to obtain a panoramic fusion image.
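As a hedged illustration of this fusion step, the sketch below pastes a local image into the panoramic image at a known position with a simple alpha blend. The blend weight, positions, and stand-in images are assumptions; a real system would blend in the panorama's longitude and latitude coordinate system after distortion correction.

```python
# A minimal sketch, assuming rectified images of matching scale: a local image
# is alpha-blended into the panorama at its known position to form a panoramic
# fusion image. Alpha and the positions are illustrative only.
import numpy as np

def fuse_local_into_panorama(panorama, local, top_left, alpha=0.6):
    """Blend `local` over `panorama` with its top-left corner at (row, col)."""
    r, c = top_left
    h, w = local.shape[:2]
    region = panorama[r:r + h, c:c + w].astype(np.float32)
    blended = alpha * local.astype(np.float32) + (1.0 - alpha) * region
    panorama[r:r + h, c:c + w] = blended.astype(panorama.dtype)
    return panorama

pano = np.zeros((1024, 2048, 3), dtype=np.uint8)
ir_local = np.full((256, 256, 3), 200, dtype=np.uint8)  # stand-in infrared image
pano = fuse_local_into_panorama(pano, ir_local, (384, 896))  # central region
```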
Fig. 2 shows a schematic diagram of a panoramic image, a local image, and a panoramic fusion image according to an embodiment of the present invention. Specifically, Fig. 2(a) is a panoramic image acquired by a panoramic sensor in an embodiment of the present invention; Fig. 2(b) shows an infrared local image acquired by an infrared sensor; and Fig. 2(c) shows a panoramic fusion image obtained by fusing the panoramic image in Fig. 2(a) and the infrared local image in Fig. 2(b). As shown in Fig. 2(c), the infrared local image in Fig. 2(b) may be subjected to fusion processing, and the processed local fusion image may be fused into the central region of the panoramic image in Fig. 2(a).
In this step, an independent panoramic image and one or more independent local images may be acquired directly; alternatively, a panoramic fusion image such as that shown in Fig. 2(c) may be obtained at an initial stage, and a separated panoramic image and local images may then be derived from the panoramic fusion image for processing in subsequent steps.
In one example, when the image obtained at the initial stage is a panoramic fusion image, one or more local fusion images may be extracted from the panoramic fusion image based on their positions within it; the local fusion images may then be processed to obtain the local images and/or the panoramic image. In practice, the position of a local fusion image in the panoramic fusion image may optionally be obtained from position information carried with the panoramic fusion image; for example, the position information may be read from the metadata of the panoramic fusion image, or from a related description in its picture file. Once the position of the local fusion image (or local image) in the panoramic fusion image is known, the local fusion image can be separated from the panoramic fusion image. The local fusion image thus acquired is generally one in which the local image has been distorted to fit the longitude and latitude coordinate system of the panoramic image. Therefore, optionally, to obtain an undistorted local image, in one example the one or more local fusion images may be coordinate-transformed to remove the distortion. In another example, the local fusion image may first be coordinate-transformed to remove the distortion; one or more image-related features may then be acquired for the coordinate-transformed local fusion image (for example, a search may start from the center of the coordinate-transformed local fusion image to acquire pixel-level features such as image resolution and/or focus information); finally, the blurred region of the coordinate-transformed local fusion image may be removed according to the acquired image features to obtain the local image required in this step.
Fig. 3 shows a schematic diagram of obtaining a local image from a panoramic fusion image. Fig. 3(a) shows a panoramic fusion image according to one embodiment of the invention, with the position of the local fusion image outlined by a dashed box. Fig. 3(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 3(a) to remove distortion. Further, feature extraction may be performed on the image of Fig. 3(b) to acquire features such as image resolution and/or focus information, and the blurred region may be removed from the coordinate-transformed local fusion image to obtain the local image. The blurred region of the local fusion image may be, for example, a transition region of certain lines or colors in the image, or an edge region of the image. Fig. 3(c) shows the positions of the sharp region 1 and the blurred region 2 in the coordinate-transformed local fusion image of Fig. 3(b): region 1, inside the central box, is the sharp region, and region 2, between the two nested boxes, is the blurred region. In one example, the blurred region 2 may be processed and removed using the extracted image features so that the resulting local image (not shown) is sufficiently sharp.
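A minimal sketch of this separation step follows. The metadata layout (a dict with row/col/height/width) is an assumption, the coordinate transformation that removes distortion is omitted, and the variance of a second difference stands in for the image resolution and focus features described above.

```python
# Sketch: cut the local fusion image out of the panoramic fusion image by its
# stored position, then keep only the sharp core (region 1 of Fig. 3(c)),
# discarding the blurred rim (region 2). All parameters are assumptions.
import numpy as np

def crop_local_fusion(pano_fusion, region):
    """Extract the local fusion image using position metadata (layout assumed)."""
    r, c, h, w = region["row"], region["col"], region["height"], region["width"]
    return pano_fusion[r:r + h, c:c + w]

def sharp_core(img_gray, step=16, thresh=50.0):
    """Grow a centred window outward and stop once the rim turns blurry."""
    h, w = img_gray.shape
    half, best = step, None
    while 2 * half <= min(h, w):
        win = img_gray[h // 2 - half:h // 2 + half, w // 2 - half:w // 2 + half]
        lap = np.diff(win.astype(np.float32), n=2, axis=0)  # crude focus measure
        if best is not None and lap.var() < thresh:
            break  # rim went blurry: keep the previous, sharper window
        best, half = win, half + step
    return best

pano_fusion = np.random.randint(0, 255, (512, 1024), dtype=np.uint8)
local_fusion = crop_local_fusion(
    pano_fusion, {"row": 128, "col": 384, "height": 256, "width": 256})
local_image = sharp_core(local_fusion)  # blurred rim removed
```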
Optionally, when the image acquired at the initial stage is a panoramic fusion image, the panoramic image may also be recovered from the panoramic fusion image and the local fusion image extracted from it by position. Specifically, the one or more local fusion images may first be coordinate-transformed to remove distortion; one or more image-related features may then be acquired for the coordinate-transformed local fusion image, for example pixel-level features such as image resolution and/or focus information acquired from the surroundings of the coordinate-transformed local fusion image. Next, since the sensors used to initially acquire the panoramic image and the local image differ, the images may have different characteristics such as image resolution and focus information, and the resulting panoramic fusion image and local fusion image may likewise differ; the coordinate-transformed local fusion image may therefore be resampled using the acquired image-related features, so that it takes on the same characteristics (image resolution, focus information, etc.) as the panoramic fusion image surrounding it. Finally, the resampled local fusion image is subjected to a coordinate inverse transformation (i.e., the reverse of the coordinate transformation above): it is re-distorted and projected back into the coordinate system of the panoramic fusion image for fusion, so that the region of the original local fusion image in the panoramic fusion image is replaced by the processed local fusion image having the same image characteristics as the panoramic fusion image, yielding the panoramic image. In one example of the invention, the resampling operates differently for different local sensors in a multi-sensor imaging system. For example, a high-definition sensor may require pixel resampling of high-definition data, an infrared sensor may require visible-light acquisition processing of infrared data, a light-field sensor may require averaging and adjustment of the resolution and focus information of light-field data, a point-cloud sensor may require projection and pixel supplementation of point-cloud data, and a stereoscopic-vision sensor may require depth-information removal and resolution adjustment of stereoscopic-vision data. The above resampling processes and methods are only examples; in practical applications any relevant resampling method may be adopted, and no limitation is intended here.
Fig. 4 shows a schematic diagram of acquiring a panoramic image from a panoramic fusion image. Fig. 4(a) shows a panoramic fusion image according to an embodiment of the present invention, in which the position of the local fusion image is outlined by a dashed box. Fig. 4(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 4(a) to remove distortion. Further, feature extraction may be performed around the image of Fig. 4(b) to acquire image features such as image resolution and/or focus information. Fig. 4(c) shows the coordinate-transformed local fusion image of Fig. 4(b) being resampled with the extracted image features. Fig. 4(d) shows the resampled local fusion image of Fig. 4(c) being inverse coordinate-transformed and projected back into the panoramic fusion image of Fig. 4(a), resulting in a panoramic image with consistent image characteristics (e.g., image resolution and focus information).
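The resolution-matching resampling in this procedure can be sketched as below. The pixels-per-degree figures are invented, and the redistortion and projection back into the longitude and latitude system (the coordinate inverse transformation) is left out; only the resolution adjustment is shown.

```python
# Sketch: downsample the undistorted local fusion image so its resolution
# matches what was measured around it in the panoramic fusion image. The
# resolution figures are assumptions; nearest-neighbour sampling keeps it short.
import numpy as np

def estimate_scale(local_res_ppd, pano_res_ppd):
    """Ratio of panorama to local resolution, e.g. in pixels per degree."""
    return pano_res_ppd / local_res_ppd

def resample_to_match(local_img, scale):
    """Nearest-neighbour resampling of the local image by `scale` (<1 shrinks)."""
    h, w = local_img.shape[:2]
    rows = np.clip((np.arange(int(h * scale)) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(int(w * scale)) / scale).astype(int), 0, w - 1)
    return local_img[rows][:, cols]

local = np.random.randint(0, 255, (400, 400, 3), dtype=np.uint8)
matched = resample_to_match(local, estimate_scale(local_res_ppd=20.0,
                                                  pano_res_ppd=5.0))
print(matched.shape)  # -> (100, 100, 3): local detail reduced to panorama scale
```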
In step S102, panoramic semantic information is obtained according to the panoramic image, wherein the panoramic semantic information corresponds to a semantic division area in the panoramic image.
In this step, the panoramic image may be processed to obtain one or more pieces of panoramic semantic information of the panoramic image and corresponding semantic division regions thereof. In one example, the panoramic semantic information and the range of the corresponding semantic division area can be obtained by using image recognition and other technologies. Optionally, information related to the panoramic image, such as background information and/or scene description information of the image, may be further acquired according to the acquired panoramic semantic information.
Fig. 5 shows a schematic diagram of a panoramic image acquired according to an embodiment of the present invention. In the example shown in Fig. 5, the panoramic semantic information of the panoramic image may include labels such as sky, ground, and people obtained by image recognition, each corresponding to a different region range identified in the image; the background information and/or scene description information derived from it may include, for example: outdoors, crowded people, face-to-face conversation, and the like.
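The patent does not name a concrete recognition model for step S102. As one hedged possibility, an off-the-shelf semantic segmentation network such as torchvision's DeepLabV3 produces a per-pixel label map: its class labels (limited to the model's training vocabulary) play the role of the panoramic semantic information, and its labelled regions play the role of the semantic division areas.

```python
# Illustrative only: a pretrained DeepLabV3 used as the panoramic semantic
# divider. The model choice is an assumption; the patent prescribes no network.
import torch
from torchvision.models.segmentation import (deeplabv3_resnet50,
                                             DeepLabV3_ResNet50_Weights)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

def panoramic_semantics(pano_tensor):
    """Return a label map at the model's working resolution; each value
    identifies one semantic division area (sky, person, etc.)."""
    with torch.no_grad():
        out = model(preprocess(pano_tensor).unsqueeze(0))["out"]
    return out.argmax(dim=1).squeeze(0)  # per-pixel class ids

labels = panoramic_semantics(torch.rand(3, 512, 1024))  # dummy panorama
print(labels.shape, labels.unique())  # region map and its semantic classes
```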
In step S103, one or more focus areas may be determined in the one or more local images according to the panoramic semantic information and its corresponding semantic division areas, and detail semantic information may be acquired from the determined focus areas. Optionally, one or more pieces of focus semantic information related to the one or more local images may be selected from the panoramic semantic information; according to the selected focus semantic information and its corresponding semantic division areas, corresponding regions of interest may be obtained in the one or more local images, through a neural network, image information processing, or the like, to serve as the one or more focus areas, and the corresponding detail semantic information may be acquired from the determined focus areas. Of course, this specific manner of operation is only an example; in practical applications the determined focus area need not lie entirely within the local image. For example, the focus area may only partially overlap the local image region, or may not overlap it at all; correspondingly, the focus semantic information selected for the focus area and the subsequently acquired detail semantic information may be only partially related to the local image, or not related to it at all. In that case, in this step, one or more focus areas may be determined solely from the panoramic semantic information and its corresponding semantic division areas, and the detail semantic information acquired from the determined focus areas.
Fig. 6 shows the schematic positions of a local image and a focus area acquired in a panoramic image according to an embodiment of the present invention. In the example shown in Fig. 6, the local fusion image obtained by fusing the local image acquired by the local sensor lies within the dashed box in the panoramic image, and the infrared image at the lower part of Fig. 6 is an enlarged view of the focus area determined in the local image converted from the local fusion image. From the enlarged infrared image of the focus area in Fig. 6, the corresponding detail semantic information can be obtained, for example: the person's mood is happy.
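A minimal sketch of one way to carry out this selection, under the assumption that both the semantic division areas and the local image can be summarised by bounding boxes in panorama coordinates: the semantic region overlapping the local image most is taken as the focus area. The boxes below are invented for illustration.

```python
# Sketch: rank the semantic division areas by overlap with the local image's
# box and pick the best as the focus area. Boxes are (row0, col0, row1, col1).
def box_overlap(a, b):
    """Intersection area of two boxes."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0, r1 - r0) * max(0, c1 - c0)

def pick_focus_area(semantic_boxes, local_box):
    """Choose the semantic division area that overlaps the local image most."""
    return max(semantic_boxes, key=lambda box: box_overlap(box, local_box))

person_box = (300, 900, 600, 1100)  # "people" region from the panorama
sky_box = (0, 0, 200, 2048)         # "sky" region: no overlap with the local image
local_box = (350, 850, 650, 1150)   # where the infrared local image sits
print(pick_focus_area([person_box, sky_box], local_box))  # -> the person box
```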
In step S104, image description information is obtained by using the panoramic semantic information and the detail semantic information. In this step, the panoramic semantic information and the detailed semantic information may be fused, and the image description information may be obtained by combining weights. Optionally, the panoramic semantic information E for describing the scene may be fused with the detail semantic information S for describing details, and the final image description information may be obtained based on different model structures (e.g., weight averaging, bayesian estimation, data fusion neural network, reinforcement learning, etc.).
As described above, the panoramic image and the local images according to embodiments of the present invention may both be still images, or may each be a video frame in a video. When they are video frames, the panoramic image and the local images may be, respectively, a panoramic image in a panoramic video and one or more local images in one or more local videos captured at the same time i. When the panoramic image and the local images are each a video frame captured at the same moment, obtaining the image description information by using the panoramic semantic information and the detail semantic information may include: fusing the panoramic semantic information and the detail semantic information at each moment, and processing the results in time order to obtain image description information that changes over time. That is, the image description information for a video may evolve gradually over time rather than being fixed. Here, the panoramic semantic information in the image description information may be denoted E_i and the detail semantic information S_i, where i is the time. Accordingly, the time series of panoramic semantic information over i may be E_(i-2), E_(i-1), E_i, E_(i+1), …, and the time series of detail semantic information over i may be S_(i-2), S_(i-1), S_i, S_(i+1), …. For example, the image description information may change over time as: two people talk outdoors, are happy at time i-1, begin to quarrel at time i, and so on.
In one example, the image description information using weight averaging may be represented as:
R = W_si·S_i + W_ei·E_i + W_s(i-1)·S_(i-1) + W_e(i-1)·E_(i-1) + …

where R is the weighted-average image semantic information (i.e., the weighted image description information), W_si is the weight of the detail semantic information, W_ei is the weight of the panoramic semantic information, and i is the time.
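Written as code, the weighted average above reduces to the sketch below. The scalar values stand in for semantic information that in practice would be feature vectors or embeddings, and the weights are arbitrary assumptions.

```python
# Sketch of R = W_si*S_i + W_ei*E_i + W_s(i-1)*S_(i-1) + ... with toy scalars.
def weighted_description(S, E, Ws, We):
    """Sum W_s*S + W_e*E over the time window (newest entries first)."""
    return sum(ws * s + we * e for s, e, ws, we in zip(S, E, Ws, We))

S = [0.8, 0.6]                    # detail semantics at times i, i-1
E = [0.4, 0.5]                    # panoramic semantics at times i, i-1
Ws, We = [0.5, 0.2], [0.2, 0.1]   # per-time weights (assumed values)
print(weighted_description(S, E, Ws, We))  # 0.4 + 0.08 + 0.12 + 0.05 = 0.65
```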
In another example, the image description information based on bayesian estimation may be expressed as:
P_r = min L(P(P_si, P_ei))

where P_si is the Bayesian estimate of the detail semantic information, P_ei is the Bayesian estimate of the panoramic semantic information, i is the time, P(P_si, P_ei) is the joint distribution function, and P_r is the minimum-likelihood estimate of the joint distribution P, i.e., the fused image description value.
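The patent leaves L and P abstract. One hedged reading, sketched below, models the joint distribution with independent Gaussian estimates around P_si and P_ei and grid-searches for the fused value that minimises the negative log-likelihood; the variances and estimates are invented.

```python
# Sketch under an assumed Gaussian model: the fused value minimising the
# negative log-likelihood is the variance-weighted blend of the two estimates.
import numpy as np

def neg_log_likelihood(r, p_si, p_ei, var_s=0.04, var_e=0.09):
    """L for a candidate fused value r under independent Gaussian estimates."""
    return (r - p_si) ** 2 / (2 * var_s) + (r - p_ei) ** 2 / (2 * var_e)

p_si, p_ei = 0.8, 0.5  # detail / panoramic estimates (toy values)
grid = np.linspace(0.0, 1.0, 1001)
P_r = grid[np.argmin(neg_log_likelihood(grid, p_si, p_ei))]
print(round(float(P_r), 3))  # ~0.708: pulled toward the lower-variance estimate
```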
In another example, the reinforcement learning based image description information may be expressed as:
tuple(S, A, R, P) = tuple((S_si, S_ei), A, (R_si, R_ei), P)

where tuple(·) denotes the reinforcement-learning four-element system; S, A, and R are inputs and P is the output. S is the environment information or state, divided into the detail semantic information S_si and the panoramic semantic information S_ei; A is the behavior or action taken in the state; R is the reward for each action in each state, divided into the reward R_si contributed by the detail semantic information and the reward R_ei contributed by the panoramic semantic information; and P is the image description information in the current state, or the corresponding behavior function.
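As a structural illustration only, the four-element system can be held in a small data class. The field values below are invented, and a real system would learn P with a reinforcement-learning algorithm rather than store it as text.

```python
# Sketch of the tuple(S, A, R, P) container; names mirror the notation above.
from dataclasses import dataclass

@dataclass
class RLFusion:
    S: tuple  # state: (S_si detail semantics, S_ei panoramic semantics)
    A: str    # behavior or action taken in this state
    R: tuple  # reward split: (R_si from detail, R_ei from panoramic)
    P: str    # output: image description (or behavior function) for the state

step = RLFusion(S=("mood: happy", "outdoor crowd"), A="describe",
                R=(0.7, 0.3), P="two people talk happily outdoors")
print(step.P)
```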
According to the image processing method described above, panoramic semantic information and detail semantic information can be acquired, respectively, for a panoramic image and for one or more local images within its range, and image description information can be obtained from them. The image description information thus obtained takes into account both the panoramic semantic information describing the scene of the panoramic image and the detail semantic information describing the details of the focus areas in the local images, which improves the accuracy of the image description and allows effective application in fields such as automatic driving and robot interaction.
For example, in the field of robot interaction, the prior art can generally obtain only the panoramic semantic information corresponding to a panoramic image, and therefore can only describe the scene; it cannot perform targeted detail semantic analysis of, and response to, a focus area. With the method of the embodiment of the present invention, not only can panoramic semantic information for scene description be obtained, but detail semantic information for a focus area can also be obtained, and the focus area can be changed as needed; both the scene and different focuses within it can thus be described, which helps a robot communicate and react more accurately and specifically within the scene.
Next, an image processing apparatus according to an embodiment of the present invention is described with reference to fig. 7. Fig. 7 shows a block diagram of an image processing apparatus 700 according to an embodiment of the present invention. The image processing apparatus of the embodiment of the present invention may be applied to both a still image and a video frame in a video that changes with time, and is not limited herein. As shown in fig. 7, the image processing apparatus 700 includes an acquisition unit 710, a semantic division unit 720, a focus area acquisition unit 730, and a description unit 740. The apparatus 700 may include other components in addition to these units, however, since these components are not related to the contents of the embodiments of the present invention, illustration and description thereof are omitted herein. Further, since the specific details of the following operations performed by the image processing apparatus 700 according to the embodiment of the present invention are the same as those described above with reference to fig. 1 to 6, a repetitive description of the same details is omitted herein to avoid redundancy.
The acquisition unit 710 of the image processing apparatus 700 in fig. 7 is configured to acquire a panoramic image and one or more partial images within the panoramic image.
The acquisition unit 710 may acquire the panoramic image and the one or more local images using a multi-sensor imaging system. The series of images acquired by the multi-sensor imaging system may include a panoramic image acquired by a panoramic sensor in the system and one or more local images, within the range of the panoramic image, acquired by one or more local sensors in the system. Here, the panoramic image may be acquired by the panoramic sensor capturing scene image information over, for example, 360 degrees using a wide-angle technique, and may further be mapped into a two-dimensional image through conversion into a longitude and latitude coordinate system. Accordingly, within the scene range covered by the panoramic image, one or more local images can be acquired by the one or more local sensors. The local sensor may be, for example, one or more of a high-definition sensor, an infrared sensor, a light-field sensor, a point-cloud sensor, a stereoscopic-vision sensor, and a laser sensor. By means of these local sensors, corresponding local images can be acquired, for example one or more of a high-definition local image, an infrared local image, a light-field local image, a point-cloud local image, a stereoscopic-vision local image, and a laser local image.
After the panoramic image and the one or more local images within its scene range are acquired through the multi-sensor system, the one or more local images can further be fused according to the positions at which they were acquired, to obtain local fusion images in one-to-one correspondence with the local images. Finally, the panoramic image and the fused local images can be fused to obtain a panoramic fusion image.
Fig. 2 shows a schematic diagram of a panoramic image, a local image, and a panoramic fusion image according to an embodiment of the present invention. Specifically, Fig. 2(a) is a panoramic image acquired by a panoramic sensor in an embodiment of the present invention; Fig. 2(b) shows an infrared local image acquired by an infrared sensor; and Fig. 2(c) shows a panoramic fusion image obtained by fusing the panoramic image in Fig. 2(a) and the infrared local image in Fig. 2(b). As shown in Fig. 2(c), the infrared local image in Fig. 2(b) may be subjected to fusion processing, and the processed local fusion image may be fused into the central region of the panoramic image in Fig. 2(a).
In a specific operation process, the acquisition unit 710 may acquire an independent panoramic image and one or more independent local images directly; alternatively, it may obtain a panoramic fusion image such as that shown in Fig. 2(c) at an initial stage and derive the separated panoramic image and local images from the panoramic fusion image for use in the subsequent steps.
In one example, when the image obtained at the initial stage is a panoramic fusion image, one or more local fusion images may be extracted from the panoramic fusion image based on their positions within it; the local fusion images may then be processed to obtain the local images and/or the panoramic image. In practice, the position of a local fusion image in the panoramic fusion image may optionally be obtained from position information carried with the panoramic fusion image; for example, the position information may be read from the metadata of the panoramic fusion image, or from a related description in its picture file. Once the position of the local fusion image (or local image) in the panoramic fusion image is known, the local fusion image can be separated from the panoramic fusion image. The local fusion image thus acquired is generally one in which the local image has been distorted to fit the longitude and latitude coordinate system of the panoramic image. Therefore, optionally, to obtain an undistorted local image, in one example the one or more local fusion images may be coordinate-transformed to remove the distortion. In another example, the local fusion image may first be coordinate-transformed to remove the distortion; one or more image-related features may then be acquired for the coordinate-transformed local fusion image (for example, a search may start from the center of the coordinate-transformed local fusion image to acquire pixel-level features such as image resolution and/or focus information); finally, the blurred region of the coordinate-transformed local fusion image may be removed according to the acquired image features to obtain the required local image.
Fig. 3 shows a schematic diagram of obtaining a local image from a panoramic fusion image. Fig. 3(a) shows a panoramic fusion image according to one embodiment of the invention, with the position of the local fusion image outlined by a dashed box. Fig. 3(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 3(a) to remove distortion. Further, feature extraction may be performed on the image of Fig. 3(b) to acquire features such as image resolution and/or focus information, and the blurred region may be removed from the coordinate-transformed local fusion image to obtain the local image. The blurred region of the local fusion image may be, for example, a transition region of certain lines or colors in the image, or an edge region of the image. Fig. 3(c) shows the positions of the sharp region 1 and the blurred region 2 in the coordinate-transformed local fusion image of Fig. 3(b): region 1, inside the central box, is the sharp region, and region 2, between the two nested boxes, is the blurred region. In one example, the blurred region 2 may be processed and removed using the extracted image features so that the resulting local image (not shown) is sufficiently sharp.
Optionally, when the image acquired at the initial stage is a panoramic fusion image, the panoramic image may also be recovered from the panoramic fusion image and the local fusion image extracted from it by position. Specifically, the one or more local fusion images may first be coordinate-transformed to remove distortion; one or more image-related features may then be acquired for the coordinate-transformed local fusion image, for example pixel-level features such as image resolution and/or focus information acquired from the surroundings of the coordinate-transformed local fusion image. Next, since the sensors used to initially acquire the panoramic image and the local image differ, the images may have different characteristics such as image resolution and focus information, and the resulting panoramic fusion image and local fusion image may likewise differ; the coordinate-transformed local fusion image may therefore be resampled using the acquired image-related features, so that it takes on the same characteristics (image resolution, focus information, etc.) as the panoramic fusion image surrounding it. Finally, the resampled local fusion image is subjected to a coordinate inverse transformation (i.e., the reverse of the coordinate transformation above): it is re-distorted and projected back into the coordinate system of the panoramic fusion image for fusion, so that the region of the original local fusion image in the panoramic fusion image is replaced by the processed local fusion image having the same image characteristics as the panoramic fusion image, yielding the panoramic image. In one example of the invention, the resampling operates differently for different local sensors in a multi-sensor imaging system. For example, a high-definition sensor may require pixel resampling of high-definition data, an infrared sensor may require visible-light acquisition processing of infrared data, a light-field sensor may require averaging and adjustment of the resolution and focus information of light-field data, a point-cloud sensor may require projection and pixel supplementation of point-cloud data, and a stereoscopic-vision sensor may require depth-information removal and resolution adjustment of stereoscopic-vision data. The above resampling processes and methods are only examples; in practical applications any relevant resampling method may be adopted, and no limitation is intended here.
Fig. 4 shows a schematic diagram of acquiring a panoramic image from a panoramic fusion image. Fig. 4(a) shows a panoramic fusion image according to an embodiment of the present invention, in which the position of the local fusion image is outlined by a dashed box. Fig. 4(b) shows the coordinate-transformed local fusion image obtained by coordinate-transforming the local fusion image in Fig. 4(a) to remove distortion. Further, feature extraction may be performed around the image of Fig. 4(b) to acquire image features such as image resolution and/or focus information. Fig. 4(c) shows the coordinate-transformed local fusion image of Fig. 4(b) being resampled with the extracted image features. Fig. 4(d) shows the resampled local fusion image of Fig. 4(c) being inverse coordinate-transformed and projected back into the panoramic fusion image of Fig. 4(a), resulting in a panoramic image with consistent image characteristics (e.g., image resolution and focus information).
The semantic dividing unit 720 obtains panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to a semantic division area in the panoramic image.
The semantic division unit 720 may process the panoramic image to obtain one or more panoramic semantic information of the panoramic image and corresponding semantic division regions thereof. In one example, the panoramic semantic information and the range of the corresponding semantic division area can be obtained by using image recognition and other technologies. Optionally, information related to the panoramic image, such as background information and/or scene description information of the image, may be further acquired according to the acquired panoramic semantic information.
Fig. 5 shows a schematic diagram of a panoramic image acquired according to an embodiment of the present invention. In the example shown in Fig. 5, the panoramic semantic information of the panoramic image may include labels such as sky, ground, and people obtained by image recognition, each corresponding to a different region range identified in the image; the background information and/or scene description information derived from it may include, for example: outdoors, crowded people, face-to-face conversation, and the like.
The focus area obtaining unit 730 may determine one or more focus areas in the one or more local images according to the panoramic semantic information and its corresponding semantic division areas, and acquire detail semantic information from the determined focus areas. Optionally, one or more pieces of focus semantic information related to the one or more local images may be selected from the panoramic semantic information; according to the selected focus semantic information and its corresponding semantic division areas, corresponding regions of interest may be obtained in the one or more local images, through a neural network, image information processing, or the like, to serve as the one or more focus areas, and the corresponding detail semantic information may be acquired from the determined focus areas. Of course, this specific manner of operation is only an example; in practical applications the determined focus area need not lie entirely within the local image. For example, the focus area may only partially overlap the local image region, or may not overlap it at all; correspondingly, the focus semantic information selected for the focus area and the subsequently acquired detail semantic information may be only partially related to the local image, or not related to it at all. In that case, the focus area obtaining unit 730 may determine one or more focus areas solely from the panoramic semantic information and its corresponding semantic division areas, and acquire the detail semantic information from the determined focus areas.
Fig. 6 shows the schematic positions of a local image and a focus area acquired in a panoramic image according to an embodiment of the present invention. In the example shown in Fig. 6, the local fusion image obtained by fusing the local image acquired by the local sensor lies within the dashed box in the panoramic image, and the infrared image at the lower part of Fig. 6 is an enlarged view of the focus area determined in the local image converted from the local fusion image. From the enlarged infrared image of the focus area in Fig. 6, the corresponding detail semantic information can be obtained, for example: the person's mood is happy.
The description unit 740 obtains image description information using the panorama semantic information and the detail semantic information. The description unit 740 may fuse the panoramic semantic information and the detail semantic information, and obtain the image description information by combining weights. Optionally, the panoramic semantic information E for describing the scene may be fused with the detail semantic information S for describing details, and the final image description information may be obtained based on different model structures (e.g., weight averaging, bayesian estimation, data fusion neural network, reinforcement learning, etc.).
As described above, the panoramic image and the local images according to embodiments of the present invention may both be still images, or may each be a video frame in a video. When they are video frames, they may be, respectively, a panoramic image in a panoramic video and one or more local images in one or more local videos captured at the same time i. When the panoramic image and the local images are each a video frame captured at the same moment, obtaining the image description information by using the panoramic semantic information and the detail semantic information may include: fusing the panoramic semantic information and the detail semantic information at each moment, and processing the results in time order to obtain image description information that changes over time. That is, the image description information for a video may evolve gradually over time rather than being fixed. Here, the panoramic semantic information in the image description information may be denoted E_i and the detail semantic information S_i, where i is the time. Accordingly, the time series of panoramic semantic information over i may be E_(i-2), E_(i-1), E_i, E_(i+1), …, and the time series of detail semantic information over i may be S_(i-2), S_(i-1), S_i, S_(i+1), …. In one example, the image description information may change over time as: two people talk outdoors, are happy at time i-1, begin to quarrel at time i, and so on.
In one example, the image description information using weight averaging may be represented as:
R = W_si·S_i + W_ei·E_i + W_s(i-1)·S_(i-1) + W_e(i-1)·E_(i-1) + …

where R is the weighted-average image semantic information (i.e., the weighted image description information), W_si is the weight of the detail semantic information, W_ei is the weight of the panoramic semantic information, and i is the time.
In another example, the image description information based on bayesian estimation may be expressed as:
P_r = min L(P(P_si, P_ei))

where P_si is the Bayesian estimate of the detail semantic information, P_ei is the Bayesian estimate of the panoramic semantic information, i is the time, P(P_si, P_ei) is the joint distribution function, and P_r is the minimum-likelihood estimate of the joint distribution P, i.e., the fused image description value.
In another example, the reinforcement learning based image description information may be expressed as:
tuple(S, A, R, P) = tuple((S_si, S_ei), A, (R_si, R_ei), P)

where tuple(·) denotes the reinforcement-learning four-element system; S, A, and R are inputs and P is the output. S is the environment information or state, divided into the detail semantic information S_si and the panoramic semantic information S_ei; A is the behavior or action taken in the state; R is the reward for each action in each state, divided into the reward R_si contributed by the detail semantic information and the reward R_ei contributed by the panoramic semantic information; and P is the image description information in the current state, or the corresponding behavior function.
According to the image processing apparatus described above, panoramic semantic information and detail semantic information can be acquired, respectively, for a panoramic image and for one or more local images within its range, and image description information can be obtained from them. The image description information thus obtained takes into account both the panoramic semantic information describing the scene of the panoramic image and the detail semantic information describing the details of the focus areas in the local images, which improves the accuracy of the image description and allows effective application in fields such as automatic driving and robot interaction.
For example, in the field of robot interaction, the prior art can generally obtain only the panoramic semantic information corresponding to a panoramic image, and therefore can only describe the scene; it cannot perform targeted detail semantic analysis of, and response to, a focus area. With the apparatus of the embodiment of the present invention, not only can panoramic semantic information for scene description be obtained, but detail semantic information for a focus area can also be obtained, and the focus area can be changed as needed; both the scene and different focuses within it can thus be described, which helps a robot communicate and react more accurately and specifically within the scene.
Next, an image processing apparatus according to an embodiment of the present invention is described with reference to fig. 8. Fig. 8 shows a block diagram of an image processing apparatus 800 according to an embodiment of the present invention. As shown in fig. 8, the apparatus 800 may be a computer or a server.
As shown in Fig. 8, the image processing device 800 includes one or more processors 810 and a memory 820; in addition, the image processing device 800 may include a multi-sensor imaging system, an output device (not shown), and the like, which may be interconnected via a bus system and/or another form of connection mechanism. It should be noted that the components and structure of the image processing device 800 shown in Fig. 8 are only exemplary and not limiting, and the image processing device 800 may have other components and structures as necessary.
The processor 810 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may utilize computer program instructions stored in the memory 820 to perform desired functions, which may include: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.
Memory 820 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 810 to implement the functions of the image processing apparatus of the embodiments of the present invention described above and/or other desired functions, and/or may execute an image processing method according to an embodiment of the present invention. Various applications and various data may also be stored in the computer-readable storage medium.
In the following, a computer readable storage medium according to an embodiment of the present invention is described, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the steps of: acquiring a panoramic image and one or more local images within the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detailed semantic information according to the determined focus areas; and obtaining image description information by using the panoramic semantic information and the detail semantic information.
Of course, the above-mentioned embodiments are merely examples and not limitations, and those skilled in the art may combine and recombine steps and apparatuses from the separately described embodiments above to achieve the effects of the present invention according to its concepts; such combined embodiments are also included in the present invention and are not necessarily described here individually.
Note that the advantages and effects mentioned in the present invention are merely examples and not limitations, and must not be considered essential to the various embodiments of the present invention. Furthermore, the foregoing detailed description of the invention is provided for the purpose of illustration and understanding only, and is not intended to limit the invention to the precise forms disclosed.
The block diagrams of devices, apparatuses, and systems in the present invention are given only as illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and may be used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, "such as but not limited to."
The flowcharts of steps in the present invention, and the above descriptions of the methods, are likewise given only as illustrative examples, and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As those skilled in the art will appreciate, the steps in the above embodiments may be performed in any order. Words such as "thereafter," "then," and "next" are not intended to limit the order of the steps; they are used only to guide the reader through the description of the methods. Furthermore, any reference to an element in the singular, for example using the articles "a," "an," or "the," is not to be construed as limiting the element to the singular.
In addition, the steps and devices of the embodiments above are not limited to implementation in any particular embodiment; in fact, some of them may be combined, according to the concepts of the present invention, to conceive new embodiments, and these new embodiments are also included within the scope of the present invention.
The individual operations of the methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software components and/or modules including, but not limited to, a circuit, an Application Specific Integrated Circuit (ASIC), or a processor.
The various illustrative logical blocks, modules, and circuits described may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may reside in any form of tangible storage medium. Some examples of storage media that may be used include Random Access Memory (RAM), Read Only Memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, and the like. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A software module may be a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.
The inventive methods herein comprise one or more acts for implementing the described methods. The methods and/or acts may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.
The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Accordingly, a computer program product may perform the operations presented herein. For example, such a computer program product may be a computer-readable tangible medium having instructions stored (and/or encoded) thereon that are executable by one or more processors to perform the operations described herein. The computer program product may include packaged material.
Software or instructions may also be transmitted over a transmission medium. For example, the software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, or microwave.
Further, modules and/or other suitable means for carrying out the methods and techniques described herein may be downloaded and/or otherwise obtained by a user terminal and/or base station as appropriate. For example, such a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, the various methods described herein may be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a CD or floppy disk) so that a user terminal and/or base station can obtain the various methods upon coupling or providing the storage means to the device. Moreover, any other suitable technique for providing the methods and techniques described herein to a device may be utilized.
Other examples and implementations are within the scope and spirit of the invention and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hard wiring, or any combination of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" as used in a list of items beginning with "at least one of" indicates a disjunctive list, such that a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
Various changes, substitutions, and alterations to the techniques described herein may be made without departing from the technology of the teachings as defined by the appended claims. Moreover, the scope of the claims is not intended to be limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods, and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.
The previous description of the inventive aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the invention to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (13)

1. An image processing method comprising:
acquiring a panoramic image and one or more local images within the panoramic image;
acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;
determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detail semantic information according to the determined focus areas;
and obtaining image description information by using the panoramic semantic information and the detail semantic information.
2. The method of claim 1, wherein the acquiring the panoramic image and the one or more local images within the panoramic image comprises:
acquiring one or more local fusion images from a panoramic fusion image based on the positions of the one or more local fusion images in the panoramic fusion image, wherein the panoramic fusion image is obtained by fusing the panoramic image and the one or more local images, and the local images correspond one-to-one to the local fusion images in the panoramic fusion image;
and processing the local fusion image to obtain the panoramic image and/or the local image.
3. The method of claim 2, wherein the processing the locally fused image to obtain the panoramic image and/or the local image comprises:
performing coordinate transformation on the one or more local fusion images to obtain the local images.
4. The method of claim 2, wherein the processing the locally fused image to obtain the panoramic image and/or the local image comprises:
performing coordinate transformation on the one or more local fusion images;
resampling the local fusion images after the coordinate transformation;
and performing inverse coordinate transformation on the resampled local fusion images to obtain the panoramic image (an illustrative sketch of this transformation chain follows the claims).
5. The method of claim 1, wherein the obtaining panoramic semantic information from the panoramic image further comprises:
acquiring background information and/or scene description information of the panoramic image according to the panoramic semantic information of the panoramic image and the corresponding semantic division areas.
6. The method of claim 1, wherein the determining one or more focal regions in the one or more local images according to the panoramic semantic information and its corresponding semantic zoning regions comprises:
selecting, from the panoramic semantic information, one or more pieces of focus semantic information related to the local images, and determining one or more focus areas in the one or more local images according to the selected focus semantic information and the corresponding semantic division areas.
7. The method of claim 1, wherein the deriving image description information using the panorama semantic information and the detail semantic information comprises:
fusing the panoramic semantic information and the detail semantic information to obtain the image description information.
8. The method of claim 1, wherein,
the panoramic image and the local image are each a video frame captured at the same moment in a video.
9. The method of claim 8, wherein, when the panoramic image and the local image are each a video frame captured at the same moment in a video, the obtaining image description information by using the panoramic semantic information and the detail semantic information comprises:
fusing the panoramic semantic information and the detail semantic information at each moment, and processing the fused results in time order to obtain image description information that changes over time (an illustrative sketch of this time-ordered processing follows the claims).
10. The method of any one of claims 1-9, wherein
the panoramic image is acquired by a panoramic sensor in a multi-sensor imaging system; and
the one or more local images are acquired by one or more local sensors in the multi-sensor imaging system.
11. An image processing apparatus comprising:
an acquisition unit that acquires a panoramic image and one or more local images within the panoramic image;
a semantic division unit that acquires panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;
a focus area acquisition unit that determines one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquires detail semantic information according to the determined focus areas;
and a description unit that obtains image description information by using the panoramic semantic information and the detail semantic information.
12. An image processing apparatus comprising:
a processor;
and a memory having computer program instructions stored therein,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:
acquiring a panoramic image and one or more local images within the panoramic image;
acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;
determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detail semantic information according to the determined focus areas;
and obtaining image description information by using the panoramic semantic information and the detail semantic information.
13. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the steps of:
acquiring a panoramic image and one or more local images within the panoramic image;
acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;
determining one or more focus areas in the one or more local images according to the panoramic semantic information and the corresponding semantic division areas thereof, and acquiring detail semantic information according to the determined focus areas;
and obtaining image description information by using the panoramic semantic information and the detail semantic information.
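The following is a minimal sketch of the processing chain recited in claims 2-4, as referenced above. It is an illustration under stated assumptions, not the claimed implementation: it assumes a position-based crop and nearest-neighbour resampling, and the forward and inverse coordinate mappings are hypothetical injected callables, since the claims do not fix a particular projection.

```python
import numpy as np
from typing import Callable

def crop_local_fusion(pano_fusion: np.ndarray,
                      x: int, y: int, w: int, h: int) -> np.ndarray:
    # Claim 2: extract a local fusion image from the panoramic fusion
    # image based on its known position within that image.
    return pano_fusion[y:y + h, x:x + w]

def to_local_image(local_fusion: np.ndarray,
                   forward_map: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    # Claim 3: a single coordinate transformation recovers the local image.
    return forward_map(local_fusion)

def to_panoramic_image(local_fusion: np.ndarray,
                       forward_map: Callable[[np.ndarray], np.ndarray],
                       inverse_map: Callable[[np.ndarray], np.ndarray],
                       scale: float = 0.5) -> np.ndarray:
    # Claim 4: transform, resample onto the panorama's coarser sampling
    # grid (nearest neighbour here, purely for illustration), then invert.
    transformed = forward_map(local_fusion)
    h, w = transformed.shape[:2]
    rows = np.minimum((np.arange(int(h * scale)) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(int(w * scale)) / scale).astype(int), w - 1)
    resampled = transformed[np.ix_(rows, cols)]
    return inverse_map(resampled)
```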
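Likewise, a sketch of the video case referenced in claim 9: the panoramic and detail semantics are fused at each moment, and the fused results are then processed in time order. The temporal aggregator is an assumption here (the claim leaves it open), so it too is injected as a hypothetical callable.

```python
from typing import Callable, Iterable, List, Tuple

def describe_video(
    moments: Iterable[Tuple[float, object, object]],  # (timestamp, panoramic semantics, detail semantics)
    fuse: Callable[[object, object], object],         # per-moment fusion, as in claim 7
    temporal: Callable[[List[object]], List[str]],    # time-ordered processing step
) -> List[str]:
    # Fuse panoramic and detail semantics separately at each moment...
    fused = [(t, fuse(pano, detail)) for t, pano, detail in moments]
    # ...then process the fused results in time order to obtain
    # image description information that changes over time.
    fused.sort(key=lambda item: item[0])
    return temporal([f for _, f in fused])
```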
CN201810670236.4A 2018-06-26 2018-06-26 Image processing method, image processing apparatus, and computer-readable storage medium Pending CN110648299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810670236.4A CN110648299A (en) 2018-06-26 2018-06-26 Image processing method, image processing apparatus, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810670236.4A CN110648299A (en) 2018-06-26 2018-06-26 Image processing method, image processing apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN110648299A true CN110648299A (en) 2020-01-03

Family

ID=68988373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810670236.4A Pending CN110648299A (en) 2018-06-26 2018-06-26 Image processing method, image processing apparatus, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN110648299A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160055237A1 (en) * 2014-08-20 2016-02-25 Mitsubishi Electric Research Laboratories, Inc. Method for Semantically Labeling an Image of a Scene using Recursive Context Propagation
CN106204522A (en) * 2015-05-28 2016-12-07 奥多比公司 The combined depth of single image is estimated and semantic tagger
CN105740402A (en) * 2016-01-28 2016-07-06 百度在线网络技术(北京)有限公司 Method and device for acquiring semantic labels of digital images

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340515A (en) * 2020-03-02 2020-06-26 北京京东振世信息技术有限公司 Characteristic information generation and article tracing method and device
CN111340515B (en) * 2020-03-02 2023-09-26 北京京东振世信息技术有限公司 Feature information generation and article tracing method and device
CN111913343A (en) * 2020-07-27 2020-11-10 微幻科技(北京)有限公司 Panoramic image display method and device
CN111913343B (en) * 2020-07-27 2022-05-20 微幻科技(北京)有限公司 Panoramic image display method and device
WO2022105027A1 (en) * 2020-11-19 2022-05-27 安徽鸿程光电有限公司 Image recognition method and system, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
JP7262659B2 (en) Target object matching method and device, electronic device and storage medium
KR102480245B1 (en) Automated generation of panning shots
US10015469B2 (en) Image blur based on 3D depth information
CN109074632B (en) Image distortion transformation method and apparatus
EP3704508B1 (en) Aperture supervision for single-view depth prediction
JP2015522959A (en) Systems, methods, and media for providing interactive refocusing in images
CN110648299A (en) Image processing method, image processing apparatus, and computer-readable storage medium
US11508038B2 (en) Image processing method, storage medium, image processing apparatus, learned model manufacturing method, and image processing system
WO2019037038A1 (en) Image processing method and device, and server
US20170171456A1 (en) Stereo Autofocus
CN109005334A (en) A kind of imaging method, device, terminal and storage medium
CN110503619B (en) Image processing method, device and readable storage medium
CN112351196B (en) Image definition determining method, image focusing method and device
CN112333379A (en) Image focusing method and device and image acquisition equipment
KR20190120106A (en) Method for determining representative image of video, and electronic apparatus for processing the method
GB2537886A (en) An image acquisition technique
JP6395429B2 (en) Image processing apparatus, control method thereof, and storage medium
CN115314635A (en) Model training method and device for determining defocus amount
CN112203023B (en) Billion pixel video generation method and device, equipment and medium
CN110581977A (en) video image output method and device and three-eye camera
US9232132B1 (en) Light field image processing
CN113163112A (en) Fusion focus control method and system
CN112950698A (en) Depth estimation method, device, medium, and apparatus based on binocular defocused image
CN113055584B (en) Focusing method based on fuzzy degree, lens controller and camera module
JP2004257934A (en) Three-dimensional shape measuring method, three-dimensional shape measuring instrument, processing program, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200103