WO2019184275A1

WO2019184275A1 - Image processing method, device and system

Info

Publication number: WO2019184275A1
Application number: PCT/CN2018/106752
Authority: WO
Inventors: 金海善; 林圣拿; 何溯; 杨俊�
Original assignee: 杭州海康威视系统技术有限公司
Priority date: 2018-03-29
Filing date: 2018-09-20
Publication date: 2019-10-03

Abstract

Disclosed in an embodiment of the present invention are an image processing method, device and system. The method comprises: adding a tag to a target position in a video frame image, and displaying the video frame image added with the tag. The tag can help a user understand a specific content contained in the video frame image. The video frame image added with the tag can more intuitively present a content therein, thereby providing better presentation effect.

Description

Image processing method, device and system

The present application claims priority to Chinese Patent Application No. 201810272370.9, entitled "Image Processing Method, Apparatus and System", filed on March 29, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of video surveillance technologies, and in particular, to an image processing method, device, and system.

Background technique

At present, image acquisition devices are installed in many scenes, and the relevant personnel can monitor a scene through the video frame images collected by such a device. In general, when a video frame image is displayed, the displayed content includes only the image itself and its acquisition time. A user viewing the video frame image must therefore already be familiar with the real environment corresponding to the image, and can understand the specific content of the image only on the basis of that familiarity. This way of displaying images is not intuitive, and the display effect is poor.

Summary of the invention

An object of the embodiments of the present application is to provide an image processing method, device, and system, which improve the display effect of a video frame image.

To achieve the above objective, an embodiment of the present application discloses an image processing method, including:

For a video frame image acquired by a first collection device, determining at least one target location in the video frame image;

Adding a label at each determined target location, the label being generated according to an input instruction or an image acquired by a second collection device;

Displaying the video frame image after the label is added according to a preset display rule.

Optionally, the video frame image is a panoramic image, the first collecting device is configured to correspond to at least one second collecting device, and the second collecting device performs image capturing on the sub-scene corresponding to the panoramic image.

Before determining at least one target location in the video frame image, the method further includes:

Obtaining a sub-scene image collected by the second collection device;

Generating a label according to the sub-scene image;

The step of determining at least one target location in the video frame image includes:

Determining, according to pre-acquired calibration information of the first collection device and the second collection device, the target location in the panoramic image of the label corresponding to the second collection device.

Optionally, the first collection device is an augmented reality AR panoramic camera.

Optionally, the step of generating a label according to the image of the sub-scene includes:

Adding the sub-scene image and/or target information in the sub-scene image to the content of the tag.

Optionally, the step of adding the target information in the sub-scene image to the content of the label includes:

Identifying the sub-scene image, determining target information in the sub-scene image according to the recognition result; adding the target information to the content of the label;

Or receiving the target information sent by the second collection device; adding the target information to the content of the label;

Or receiving the target information sent by a server communicatively coupled to the second collection device; adding the target information to the content of the tag.

Optionally, the step of displaying the tagged video frame image according to the preset display rule includes:

In the first area, displaying the video frame image after the tag is added;

In the second area, displaying the content of the added tag;

Or displaying, in picture-in-picture form, the video frame image after the tag is added and the content of the added tag.

Optionally, the step of displaying the content of the added tag includes:

In the added tag, determine the current display tag;

Show the content of the current display tag.

Optionally, the method further includes:

After detecting that the user clicks on the label in the video frame image, the clicked label is determined as the target label;

The content of the target tag is displayed in the video frame image.

Optionally, before the step of determining the at least one target location in the video frame image, the method further includes:

Receiving a tag addition instruction;

Generating a label according to the label adding instruction;

A target location of the added tag is determined according to the tag addition instruction.

Optionally, the step of displaying the tagged video frame image according to the preset display rule includes:

Determining a layer corresponding to each label according to a preset layer classification policy;

Determining a layer display strategy, and determining, according to the layer display strategy, a current display layer and a display manner of the current display layer;

In the display manner, the label corresponding to the current display layer is displayed.

Optionally, the step of acquiring the sub-scene image collected by the second collection device includes:

Detecting whether an abnormal event occurs in the panoramic image;

If yes, determining a target second collection device corresponding to the abnormal event;

Obtaining a sub-scene image collected by the target second collection device;

The step of generating a label according to the sub-scene image includes:

Generating, according to the sub-scene image, a label corresponding to the abnormal event.

Optionally, the step of detecting whether an abnormal event occurs in the panoramic image comprises:

Matching the panoramic image with a preset anomaly model;

If the match is successful, it means that an abnormal event occurs in the panoramic image.

Or determining whether abnormal event alarm information for the panoramic image is received;

If received, an abnormal event occurs in the panoramic image.

Optionally, the step of determining the target second collection device corresponding to the abnormal event includes:

Determining a location of the abnormal event in the panoramic image;

Determining, according to the pre-acquired calibration information of the first collection device and each second collection device, the target second collection device corresponding to the location.

Optionally, in the case where it is detected that an abnormal event occurs in the panoramic image, the method further includes:

Determining whether the location of the abnormal event in the panoramic image is located in a preset focus area;

If yes, the step of displaying the tagged video frame image according to the preset display rule includes:

The label is displayed in the video frame image in a preset alarm mode.
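
The focus-area check can be illustrated with a minimal sketch. It assumes, purely for illustration, that each preset focus area is an axis-aligned rectangle in panoramic-image pixel coordinates and that the display manner is a simple string; the application does not specify either representation, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]            # (x, y) in panoramic-image pixels
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), assumed rectangle


def in_focus_area(location: Point, area: Rect) -> bool:
    """Return True if the event location lies inside the rectangle."""
    x, y = location
    left, top, right, bottom = area
    return left <= x <= right and top <= y <= bottom


def display_manner_for_event(location: Point, focus_areas: Iterable[Rect]) -> str:
    """Pick the preset alarm mode when the event falls in any focus area."""
    if any(in_focus_area(location, area) for area in focus_areas):
        return "alarm"
    return "normal"


# Hypothetical focus areas and event locations.
focus_areas = [(0, 0, 400, 300), (1000, 500, 1400, 900)]
manner_inside = display_manner_for_event((1200, 700), focus_areas)
manner_outside = display_manner_for_event((600, 400), focus_areas)
```

A polygonal focus area would need a point-in-polygon test instead of the rectangle comparison used here.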

In order to achieve the above objective, an embodiment of the present application further discloses an image processing apparatus, including: a processor and a memory;

a memory for storing a computer program;

The processor, when used to execute the program stored on the memory, implements the following steps:

Adding a label at each determined target location, the label being generated according to the user input content or an image acquired by the second collection device;

The processor is further configured to implement the following steps:

Obtaining a sub-scene image collected by the second collection device;

Generating a label according to the sub-scene image;

Optionally, the processor is further configured to implement the following steps:

In the first area, displaying the video frame image after the tag is added;

In the second area, the content of the added tag is displayed.

In the added tag, determine the current display tag;

Show the content of the current display tag.

The content of the target tag is displayed in the video frame image.

Receiving a tag addition instruction;

Generating a label according to the label adding instruction;

Detecting whether an abnormal event occurs in the panoramic image;

Obtaining a sub-scene image collected by the target second collection device;

Matching the panoramic image with a preset anomaly model;

If received, an abnormal event occurs in the panoramic image.

Determining a location of the abnormal event in the panoramic image;

When it is detected that an abnormal event occurs in the panoramic image, determining whether a location of the abnormal event in the panoramic image is located in a preset focus area;

If so, the label is displayed in the video frame image in a preset alarm mode.

In order to achieve the above object, an embodiment of the present application further discloses an image processing system, including: a first collection device and an image processing device, where

The first collecting device is configured to collect a video frame image, and send the collected video frame image to the image processing device;

The image processing device is configured to determine, for the video frame image acquired by the first collection device, at least one target location in the video frame image; add a label at each determined target location, the label being generated according to user input content or an image acquired by a second collection device; and display the video frame image after the label is added according to a preset display rule.

Optionally, the system further includes: at least one second collection device,

The second collection device is configured to perform image collection on a sub-scene corresponding to the panoramic image, where the panoramic image is a video frame image collected by the first collection device;

The image processing device is further configured to acquire a sub-scene image collected by the second collection device; generate a label according to the sub-scene image; and determine, according to pre-acquired calibration information of the first collection device and the second collection device, the target location in the panoramic image of the label corresponding to the second collection device.

In order to achieve the above object, an embodiment of the present application further discloses a computer readable storage medium, where the computer readable storage medium stores a computer program, and the computer program, when executed by a processor, implements any one of the foregoing image processing methods.

In order to achieve the above object, an embodiment of the present application further discloses executable program code which, when run, performs any one of the image processing methods described above.

By applying the embodiment of the present application, a label is added at a target position in the video frame image, and the labeled video frame image is then displayed. The label can help the user understand the specific content included in the video frame image; therefore, the video frame image after the label is added displays the image content more intuitively, and the display effect is better.

Description of the drawings

In order to illustrate the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments and of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a first schematic flowchart of an image processing method according to an embodiment of the present application;

FIG. 1a is a schematic diagram of a display interface according to an embodiment of the present application;

FIG. 1b is a schematic diagram of another display interface provided by an embodiment of the present application;

FIG. 2 is a second schematic flowchart of an image processing method according to an embodiment of the present application;

FIG. 2a is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 3 is a third schematic flowchart of an image processing method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 4b is a schematic structural diagram of another image processing device according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an image processing system according to an embodiment of the present application.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

In order to solve the above technical problem, an embodiment of the present application provides an image processing method, device, and system. The method can be applied to various image processing devices, and is not specifically limited.

An image processing method provided by an embodiment of the present application is described in detail below.

FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure, including:

S101: Determine, for the video frame image acquired by the first collection device, at least one target location in the video frame image.

The processing object of the embodiment of the present application is a video frame image; the image processing method provided by the embodiment may be applied to each frame of a video.

There are several ways to determine the target location. For example, one or more fixed positions may be set in advance in the video frame image as target locations; for instance, the middle of the video frame image may be set as a target location. Alternatively, a user-specified location may be determined as the target location according to a user instruction. It should be noted that, for the same video, or for multiple videos of the same scene, the user may send the instruction only once, and the target location may be determined in each frame of that video or of those videos according to the instruction.

It can be understood that the installation location of the first collection device is generally fixed, so the scene corresponding to the captured video frame images is also substantially unchanged. Therefore, across video frame images, the picture content at a preset position usually differs little, and likewise the picture content at the position specified in the user instruction usually differs little. For this reason, the target location can be determined in a plurality of video frame images according to a single instruction sent by the user.

Other ways of determining the target location may also be used; this is not limited.
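
As a rough illustration of S101, the following sketch selects target locations for one frame. The function name, the parameter names, and the order of preference (user-specified position first, then preset positions, then the middle of the frame) are assumptions made for the example; the application leaves the choice of method open.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates in the video frame image


def determine_target_positions(
    frame_size: Tuple[int, int],
    preset_positions: Optional[List[Point]] = None,
    user_position: Optional[Point] = None,
) -> List[Point]:
    """Return the target positions for one frame (hypothetical helper)."""
    width, height = frame_size
    if user_position is not None:
        candidates = [user_position]
    elif preset_positions:
        candidates = list(preset_positions)
    else:
        # Fall back to the middle of the frame.
        candidates = [(width // 2, height // 2)]
    # Keep only positions that actually fall inside the frame.
    return [(x, y) for x, y in candidates if 0 <= x < width and 0 <= y < height]


# One user instruction is reused for every frame of a fixed camera.
frame_sizes = [(1920, 1080)] * 3
positions_per_frame = [
    determine_target_positions(size, user_position=(400, 300))
    for size in frame_sizes
]
```

Because the camera is fixed, the same position can be reused frame after frame without re-querying the user, which is what the list comprehension at the end shows.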

S102: Add a label at each determined target position, and the label is generated according to an input instruction or an image acquired by the second collection device.

The input instruction may be a label addition instruction entered by the user. The second collection device may be a collection device deployed in the same scene as the first collection device. For example, the first collection device performs image collection for scene A, and the second collection device performs image collection for sub-scene A1 within scene A.

For example, the label may include a "tag symbol" and a "tag content". For example, the "tag symbol" may be an arrow, a triangle, etc., and the "tag symbol" is for marking a position in the video frame image. The specific format of the label is not limited; the content of the label may be an image collected by other collection devices, or may be some image analysis data, or may be associated data of the scene at the label, and the like, and is not limited.

The image analysis data may be a face recognition result, a vehicle recognition result, or the like. The associated data of the scene may be an introduction to the scene, or, if the scene is a traffic checkpoint, the associated data may be traffic flow data or the like. In addition, the tag may also include a "tag name", which may be some simple text information, such as "some building" or "some park".

For example, if the input instruction is the text information “some building” input by the user and the specific introduction of the building, a label may be generated, and the label symbol may be an arrow, and the label name may be the text information “some building”, the label The content can be a specific introduction to the building.

For another example, if the target location is a traffic checkpoint, the label content added at the checkpoint may be video data collected at the checkpoint, a captured image at the checkpoint, traffic flow data at the checkpoint, and the like.

As an implementation manner, the user can design labels according to his or her own needs. Specifically, the user may click on a certain position in the video frame image and input some text or image content; the device performing the scheme may generate a corresponding label according to the content input by the user. In that video frame image and in subsequent video frame images, the location clicked by the user is determined as the target location, and the generated label is added at the target location.

Or, as another implementation manner, the label may be generated according to the image collected by the other collection device. For example, the first collection device performs image collection for the scene A, and the second collection device performs image collection for the sub-scenario A1 in the scene A. The tag may be generated according to the image collected by the second collecting device, and the position corresponding to the sub-scene A1 is determined as the target position in S101, and the tag corresponding to the sub-scene A1 is added at the target position.

Alternatively, the label added in the video frame image includes both the label generated according to the user's needs and the label generated according to the image collected by other collection devices, so that the label type is more abundant.
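
The label structure described for S102 can be sketched as a small data class. The class name, field names, and the two constructor helpers below are hypothetical and chosen only to mirror the "tag symbol", "tag name", and "tag content" described in the text; the sub-scene image is represented by an opaque value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Label:
    position: Tuple[int, int]            # target location in the video frame image
    symbol: str = "arrow"                # "tag symbol", e.g. arrow or triangle
    name: Optional[str] = None           # optional "tag name"
    content: Dict[str, Any] = field(default_factory=dict)  # "tag content"


def label_from_user_input(position: Tuple[int, int], name: str, text: str) -> Label:
    """Build a label from text entered by the user at a clicked position."""
    return Label(position=position, name=name, content={"text": text})


def label_from_sub_scene(
    position: Tuple[int, int],
    sub_scene_image: Any,
    target_info: Optional[Dict[str, Any]] = None,
) -> Label:
    """Build a label whose content is an image from a second collection device."""
    content: Dict[str, Any] = {"image": sub_scene_image}
    if target_info:
        content["target_info"] = target_info
    return Label(position=position, symbol="triangle", content=content)


building = label_from_user_input((400, 300), "some building", "Introduction text")
sub_scene = label_from_sub_scene((900, 500), b"<jpeg bytes>", {"traffic_flow": 42})
```

Both kinds of label share one type, so a frame can carry user-defined labels and device-generated labels side by side.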

S103: Display the video frame image after the labeling according to the preset display rule.

As described above, the tag may include a "tag symbol" and a "tag content". As an embodiment, the "tag symbol" and the "tag content" may be displayed separately. For example, the "tag symbol" may be added to the video frame image, while the "tag content" is displayed in an area outside the video frame image, so that the content of the tag does not cover the video frame image and the display effect is better. If the tag further includes a "tag name", the "tag name" may be displayed in the video frame image, or may be displayed in an area outside the video frame image, which is not limited.

For example, in the first area, the video frame image after the tag is added may be displayed, and in the second area, the content of the added tag may be displayed. The first area and the second area may be different areas of the same display device, or may be a display area in an adjacent display device, which is not limited.

Or, as shown in FIG. 1a, in the form of picture-in-picture, the video frame image after adding the label and the content of the added label are displayed. Specifically, the video frame image after the tag is added may be displayed in the main screen area, and the content of the added tag is displayed in the small screen area. The small screen area may be located at any position on the right side, the left side, the upper side, and the lower side of the main screen area, and is not limited.

As described above, the "content of the tag" can be of various types, such as video data, captured images, image analysis data, and the like, and different types of data can be displayed in different areas. For example, the video data and the captured image may be displayed in the picture-in-picture small screen area or in the second area described above, and the image analysis data may be displayed in the video frame image; the specific display manner is not limited.
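
The division of a display into a main screen area and a small screen area can be sketched as follows. The function name, the rectangle format (x, y, width, height), and the default 25% share for the small area are assumptions for illustration only.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) on the display


def split_display(
    width: int, height: int, side: str = "right", fraction: float = 0.25
) -> Tuple[Rect, Rect]:
    """Return (main_area, small_area) with the small area on the given side."""
    if side in ("left", "right"):
        small_w = int(width * fraction)
        main_w = width - small_w
        if side == "right":
            return (0, 0, main_w, height), (main_w, 0, small_w, height)
        return (small_w, 0, main_w, height), (0, 0, small_w, height)
    if side in ("top", "bottom"):
        small_h = int(height * fraction)
        main_h = height - small_h
        if side == "bottom":
            return (0, 0, width, main_h), (0, main_h, width, small_h)
        return (0, small_h, width, main_h), (0, 0, width, small_h)
    raise ValueError(f"unknown side: {side}")


# Labeled video frame in the main area, label content in the small area.
main_area, small_area = split_display(1920, 1080, side="right")
```

The two rectangles never overlap, so the label content does not cover the video frame image.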

In addition, the shape, color, and transparency of the "tag symbol", and the specific type of "tag content", may be set in advance or may be changed according to user selection.

If a large number of labels are added, the labels may be overlaid, or the content of only some of the labels may be displayed in the second area or in the picture-in-picture small screen area. Specifically, among the added labels, the current display label may be determined, and the content of the current display label is displayed.

There are various ways to determine the current display label. For example, the display order can be set, and the current display label is determined according to the order. The display order can be determined randomly, or can be set according to the importance degree of each label. Alternatively, after receiving the display instruction of the user for a certain label, the label corresponding to the display instruction is determined as the current display label, and the like, and is not limited.
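
A minimal sketch of choosing the current display label is given below. It assumes each label carries a hypothetical `id` and a numeric `importance`, and that a user display instruction, when present, takes precedence over the importance order; the application permits other orderings, including a random one.

```python
from typing import Dict, List, Optional

LabelInfo = Dict[str, object]  # hypothetical record with "id" and "importance"


def current_display_labels(
    labels: List[LabelInfo], max_shown: int, requested_id: Optional[str] = None
) -> List[LabelInfo]:
    """Pick which labels have their content shown right now."""
    if requested_id is not None:
        requested = [label for label in labels if label["id"] == requested_id]
        if requested:
            return requested
    # No (valid) user request: show the most important labels first.
    ranked = sorted(labels, key=lambda label: label["importance"], reverse=True)
    return ranked[:max_shown]


labels = [
    {"id": "park", "importance": 1},
    {"id": "intersection", "importance": 5},
    {"id": "building", "importance": 3},
]
by_importance = current_display_labels(labels, max_shown=2)
by_request = current_display_labels(labels, max_shown=2, requested_id="park")
```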

As an embodiment, if it is detected that the user clicks on the tag in the video frame image, the clicked tag may be determined as the target tag; the content of the target tag is displayed in the video frame image.

It can be understood that if the user clicks on the label in the video frame image, in order to better respond to the user's needs, the content of the label can be directly displayed in the video frame image.
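
Detecting which label was clicked can be sketched as a nearest-label hit test. The 20-pixel hit radius and the dictionary layout of a label are assumptions for the example; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def find_clicked_label(
    labels: List[Dict[str, object]], click: Point, radius: float = 20.0
) -> Optional[Dict[str, object]]:
    """Return the label closest to the click, if any lies within the radius."""
    best = None
    best_distance = radius
    for label in labels:
        distance = math.dist(label["position"], click)
        if distance <= best_distance:
            best, best_distance = label, distance
    return best


labels = [
    {"name": "some building", "position": (100, 100)},
    {"name": "some park", "position": (300, 300)},
]
target = find_clicked_label(labels, (105, 98))   # close to the first label
missed = find_clicked_label(labels, (200, 200))  # far from both labels
```

When a target label is found, its content would then be drawn in the video frame image itself rather than in the separate content area.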

As an implementation manner, a layer classification policy may be preset, and the layer category corresponding to each label is determined according to the policy. In other words, each label is assigned to a layer category. For example, labels may be divided into an intersection label layer, a checkpoint label layer, an area label layer, a building label layer, and so on.

In such an embodiment, the layer display strategy can be determined based on user instructions. The layer display strategy can include the current display layer and how the current display layer is displayed.

In the first case, the user instruction includes only the current display layer information, and the device determines the current display layer according to the user instruction. In addition, the device stores the display manner corresponding to each layer, so that the device can further determine the display manner of the current display layer. In the second case, the user instruction includes both the current display layer information and the display manner information, and the device can determine the current display layer and its display manner according to the user instruction. Both cases are reasonable.

The display mode may include: flashing display, jitter display, static display, etc., and is not limited.

It should be noted that, in this embodiment, the label is displayed separately from the content of the label, and the display manner may cover the manner in which the label is displayed or the manner in which the label content is displayed. For example, the display manner corresponding to the building label layer may be: the label is displayed in the video frame image, and the corresponding label content is flashed in another area (the second area or the picture-in-picture area).
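
The layer mechanism can be sketched with two lookup tables. The classification policy, the stored display manners, and all names below are hypothetical; the sketch only shows the two cases described above, where the user instruction either omits or supplies the display manner.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layer classification policy: label kind -> layer.
LAYER_OF_KIND: Dict[str, str] = {
    "intersection": "intersection label layer",
    "area": "area label layer",
    "building": "building label layer",
}

# Display manner stored on the device for each layer (assumed values).
STORED_MANNER: Dict[str, str] = {
    "intersection label layer": "static",
    "area label layer": "static",
    "building label layer": "flashing",
}


def resolve_strategy(layer: str, manner: Optional[str] = None) -> Tuple[str, str]:
    """Case 1: only the layer is given; case 2: layer and manner are given."""
    if manner is None:
        manner = STORED_MANNER[layer]
    return layer, manner


def labels_to_display(
    labels: List[Dict[str, str]], layer: str, manner: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Return (label name, display manner) for labels in the current layer."""
    layer, manner = resolve_strategy(layer, manner)
    return [
        (label["name"], manner)
        for label in labels
        if LAYER_OF_KIND[label["kind"]] == layer
    ]


labels = [
    {"name": "some building", "kind": "building"},
    {"name": "north crossing", "kind": "intersection"},
]
case_one = labels_to_display(labels, "building label layer")
case_two = labels_to_display(labels, "building label layer", manner="jitter")
```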

As an implementation manner, a detail image corresponding to the video frame image collected by the first collection device may also be acquired. After S101, according to a pre-acquired pixel correspondence between the detail image and the video frame image, the position in the detail image that corresponds to each target location is determined as a to-be-processed position, and the label added at the target location is also added at the corresponding to-be-processed position. In this embodiment, S103 may include: displaying, according to the preset display rule, the video frame image after the label is added and the detail image after the label is added.

For example, the video frame image acquired in S101 may be a panoramic image. A detail image corresponding to the panoramic image may also be acquired, and, according to the pixel correspondence between the panoramic image and the detail image, each label added to the panoramic image is mapped to the detail image, so that the label is also added in the detail image.

Specifically, a third collection device may be provided in addition to the first collection device. The first collection device and the third collection device perform image collection for the same scene: the first collection device collects the panoramic image, and the third collection device collects the detail image. The third collection device may be a dome camera, which can rotate and collect detail images from different viewing angles. The pixel correspondence between the panoramic image and the detail image may be obtained according to calibration information between the first collection device and the third collection device.

For example, assuming that the panoramic image A includes four regions: region 1, region 2, region 3, and region 4, the dome camera can collect detailed images corresponding to the four regions, which are the detail image B1 and the detail image B2, respectively. Detail image B3 and detail image B4. The four detail images can be displayed in turn in a preset order.

Assume that the currently displayed detail image is B1, that 10 target positions are determined in area 1, and that labels are added at these 10 target positions. Correspondingly, there are 10 to-be-processed positions in the detail image B1, and the same 10 labels are added at these positions. In one case, since the number of labels is large, only some of the labels may be displayed in area 1 of the panoramic image A, while all 10 labels are displayed in the detail image B1.
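
Mapping a target position in the panoramic image to a position in the detail image can be sketched as below. The application only states that a pixel correspondence is obtained from calibration information; representing that correspondence as a 3x3 planar homography is an assumption of this example, and the matrix values are made up.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # 3x3 homography, row-major (assumed model)


def map_point(h: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a panoramic-image pixel to a detail-image pixel using h."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def map_labels(
    h: Matrix, target_positions: List[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Compute the to-be-processed position for every target position."""
    return [map_point(h, p) for p in target_positions]


# Made-up calibration: the detail image magnifies area 1 by 2x with an offset.
H_AREA_1 = [[2.0, 0.0, -100.0], [0.0, 2.0, -50.0], [0.0, 0.0, 1.0]]
pending_positions = map_labels(H_AREA_1, [(100.0, 60.0), (150.0, 80.0)])
```

Since a dome camera can rotate, a separate correspondence would be needed for each viewing angle (for example, one per detail image B1 to B4).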

As an implementation manner, the video frame image after the label is added may be displayed in the first area, and the detail image after the label is added may be displayed in the third area; or, the video frame image after the label is added and the detail image after the label is added may be displayed in picture-in-picture form.

In one case, displaying a label here means displaying only the "tag symbol", with the "tag content" displayed in another area. For example, the video frame image after the label is added may be displayed in the first area, the content of the added label in the second area, and the detail image after the label is added in the third area. The first area, the second area, and the third area mentioned herein may be different areas of the same display device, or may be display areas in different display devices.

For another example, as shown in FIG. 1b, the video frame image after the tag is added, the detailed image after the tag is added, and the content of the added tag can be displayed in the form of picture-in-picture. In FIG. 1b, the video frame image after adding the label is displayed in the main screen area, the detailed image after adding the label is displayed in the small screen area in the lower left corner, and the content of the added label is displayed in the small screen area on the right side. There are many ways to display, and the specifics are not limited.

If the label further includes a "tag name", the "tag name" may be displayed in the video frame image, or may be displayed in an area outside the video frame image, which is not limited.

By applying the embodiment shown in FIG. 1 of the present application, a label is added at a target position in the video frame image, and the labeled video frame image is then displayed. The label can help the user understand the specific content included in the video frame image; therefore, the video frame image after the label is added displays the image content more intuitively, and the display effect is better.

FIG. 2 is a second schematic flowchart of an image processing method according to an embodiment of the present disclosure. The embodiment shown in FIG. 2 is based on the embodiment shown in FIG.

S201: Acquire a sub-scene image collected by the second collection device.

In the embodiment shown in FIG. 2, the video frame image collected by the first collection device is a panoramic image, the first collection device corresponds to at least one second collection device, and the second collection device performs image collection on the sub-scene corresponding to the panoramic image. The image collected by the second collection device is a sub-scene image.

As an implementation manner, the first collection device may be an augmented reality AR panoramic camera, so that the collected panoramic image is better.

Alternatively, the first collection device may also be a plurality of bullet cameras, and the images collected by the bullet cameras are stitched to obtain the panoramic image.

The second collection device can be an ordinary camera, such as a dome camera, a capture camera, and the like. If the second collection device is a dome camera, the sub-scene image may be a surveillance video image. If the second collection device is a capture camera, the sub-scene image may be a snapshot image, and the like, which is not limited.

For example, as shown in FIG. 2a, a large scene A includes four sub-scenes: A1, A2, A3, and A4. The first collection device performs image collection on scene A, the second collection device 1 performs image collection on A1, the second collection device 2 performs image collection on A2, the second collection device 3 performs image collection on A3, and the second collection device 4 performs image collection on A4.
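
The correspondence between second collection devices and sub-scenes can be held in a small lookup table. The sketch below assumes, for illustration only, that the calibration information reduces to one rectangle per sub-scene in a 1920x1080 panoramic image and that A1 to A4 are the four quadrants; the actual layout in FIG. 2a and the form of the calibration data are not specified here.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in the panorama

# Hypothetical calibration result: device name -> region of its sub-scene.
SUB_SCENE_REGIONS: Dict[str, Rect] = {
    "second collection device 1": (0, 0, 960, 540),        # sub-scene A1
    "second collection device 2": (960, 0, 1920, 540),     # sub-scene A2
    "second collection device 3": (0, 540, 960, 1080),     # sub-scene A3
    "second collection device 4": (960, 540, 1920, 1080),  # sub-scene A4
}


def device_for_location(
    location: Tuple[int, int], regions: Dict[str, Rect] = SUB_SCENE_REGIONS
) -> Optional[str]:
    """Find the second collection device whose sub-scene contains a location."""
    x, y = location
    for device, (left, top, right, bottom) in regions.items():
        if left <= x < right and top <= y < bottom:
            return device
    return None


def label_position_for_device(
    device: str, regions: Dict[str, Rect] = SUB_SCENE_REGIONS
) -> Tuple[int, int]:
    """Use the centre of the sub-scene region as the label's target location."""
    left, top, right, bottom = regions[device]
    return ((left + right) // 2, (top + bottom) // 2)


device = device_for_location((1200, 700))
position = label_position_for_device("second collection device 1")
```

The same table serves both directions: from a device to the target location of its label, and from a location in the panoramic image (for example, that of an abnormal event) to the device covering it.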

As another example, the first collection device and the second collection device may be the same device, such as an AR eagle eye device, which has an augmented reality function. The AR eagle eye device may integrate a plurality of camera lenses and one dome camera lens: the image obtained by stitching the images from the plurality of camera lenses can be used as the panoramic image, and the image captured by the dome camera lens is used as the sub-scene image. The AR eagle eye device may also be provided with a platform for scheduling and managing the plurality of camera lenses and the dome camera lens.

In the first solution, the second collection device sends the collected sub-scene image to the device that executes the solution in real time.

In the second solution, the device that executes the solution acquires the sub-scene image from the second collection device after receiving the user instruction.

In the third solution, the device that executes the solution, after detecting an abnormal event in the video frame image (panoramic image) of S101, acquires the sub-scene image from the second collection device corresponding to the abnormal event. The abnormal event may be a traffic accident, a robbery, or the like, and is not specifically limited.

The embodiment of the present application does not limit the timing of acquiring a sub-scene image.
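The three acquisition timings described above can be sketched as a small dispatch. This is only an illustrative sketch: the names `AcquireTiming` and `should_acquire` are hypothetical and do not come from the application.

```python
from enum import Enum


class AcquireTiming(Enum):
    """Hypothetical names for the three solutions described in the text."""
    REAL_TIME = 1          # first solution: the device sends every sub-scene image
    ON_USER_REQUEST = 2    # second solution: fetch after a user instruction
    ON_ABNORMAL_EVENT = 3  # third solution: fetch after an abnormal event is detected


def should_acquire(timing, user_requested=False, abnormal_detected=False):
    """Return True if the sub-scene image should be acquired now."""
    if timing is AcquireTiming.REAL_TIME:
        return True
    if timing is AcquireTiming.ON_USER_REQUEST:
        return user_requested
    return abnormal_detected
```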

S202: Generate a label according to the sub-scene image.

For example, the label may include a "tag symbol" and "tag content". The "tag symbol" may be an arrow, a triangle, or the like, and serves to mark the position of the label in the video frame image; its specific form is not limited. The "tag content" may include the sub-scene image. In addition, the label may also include a "tag name", which may be simple text information such as "some building" or "some park".

As an embodiment, the sub-scene image and/or the target information in the sub-scene image may be added to the content of the tag.

That is to say, in the first case, the tag contains only the sub-scene image: the sub-scene image acquired in S201 is added to the content of the tag.

In the second case, the tag contains the target information in the sub-scene image.

For example, if the scene covered by the panoramic image in S101 is a traffic intersection, the target information may include vehicle information in the image, such as a license plate number or a vehicle body color, and may also include road information, such as the traffic flow on the road; alternatively, in the third solution above, the target information may be abnormal event information, such as a traffic accident.

If the scene covered by the panoramic image in S101 is a corridor, the target information may be person information in the image, such as height or gender; alternatively, in the third solution, the target information may be abnormal event information, such as a robbery or a fire.

The target information can be obtained in various ways. For example: (1) the device that executes the solution may perform recognition on the sub-scene image acquired in S201 and determine the target information in the sub-scene image according to the recognition result; (2) the second collection device may have an image recognition function and send the identified target information to the device; (3) a server connected to the second collection device may perform recognition on the sub-scene image and send the identified target information to the device. Any of these approaches is acceptable.

In the third case, the tag contains both the sub-scene image and the target information in the sub-scene image.

The target information can be understood as an introduction or description of the sub-scene image, and the target information can be set around the sub-scene image so that the user can better understand what is happening in the sub-scene image.
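The label structure and the three content cases above (image only, target information only, or both) might be modelled as follows. This is a minimal sketch; the class `Label`, the helper `make_label`, and all field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Label:
    symbol: str = "arrow"                    # "tag symbol": marks a position in the frame
    name: Optional[str] = None               # optional "tag name", e.g. "some building"
    sub_scene_image: Optional[bytes] = None  # encoded sub-scene image, if included
    target_info: dict = field(default_factory=dict)  # e.g. {"body_color": "red"}


def make_label(sub_scene_image=None, target_info=None, name=None):
    """Build a label whose content is the image, the target information, or both."""
    if sub_scene_image is None and not target_info:
        raise ValueError("label content needs an image, target information, or both")
    return Label(name=name,
                 sub_scene_image=sub_scene_image,
                 target_info=dict(target_info or {}))
```

In the third case, both arguments are supplied, for example `make_label(sub_scene_image=b"...", target_info={"body_color": "red"})`.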

In the embodiment shown in FIG. 2, S101 may be S101A: determining, according to the calibration information of the first collection device and the second collection device that are acquired in advance, a target position of the label corresponding to the second collection device in the panoramic image.

A person skilled in the art can understand that, in the scenario shown in FIG. 2a, there is a calibration relationship between the first collection device and each of the four second collection devices. The calibration relationship can be understood as the conversion relationship between the panoramic image coordinate system and the sub-scene image coordinate system. For example, for a position X in sub-scene A1, if the pixel coordinates of position X in the panoramic image are (x1, y1) and its pixel coordinates in the sub-scene image collected by second collection device 1 are (x2, y2), the calibration relationship is the conversion relationship between (x1, y1) and (x2, y2).

In this embodiment, related information (calibration information) of the calibration relationship may be acquired in advance, and the calibration information may be used to determine a position of the label of the second collection device in the panoramic image.
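The application does not specify the form of the calibration information. As one possible illustration, assume it is a 3x3 planar homography H that maps sub-scene pixel coordinates (x2, y2) to panoramic pixel coordinates (x1, y1). The function name and the matrix values below are hypothetical.

```python
def sub_scene_to_panorama(H, x2, y2):
    """Map pixel (x2, y2) in the sub-scene image to (x1, y1) in the panoramic image.

    H is a 3x3 homography given as nested lists (an assumed calibration model).
    """
    u = H[0][0] * x2 + H[0][1] * y2 + H[0][2]
    v = H[1][0] * x2 + H[1][1] * y2 + H[1][2]
    w = H[2][0] * x2 + H[2][1] * y2 + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


# Hypothetical calibration for second collection device 1: its image appears
# in the panorama at half scale, offset by (100, 50) pixels.
H_A1 = [[0.5, 0.0, 100.0],
        [0.0, 0.5, 50.0],
        [0.0, 0.0, 1.0]]

# Use the centre of a 640x480 sub-scene image as the label's target position.
target_position = sub_scene_to_panorama(H_A1, 320, 240)
```

Under this assumed model, the target position of the label for a second collection device is simply the image of a chosen sub-scene pixel (here the centre) in the panoramic coordinate system.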

In an embodiment, a third collection device is provided in addition to the first collection device and the second collection device. For example, the first collection device is a plurality of bullet cameras that collect the panoramic image, the second collection device is a capture camera whose snapshot images serve as sub-scene images, and the third collection device collects a detail image.

This embodiment may proceed as follows:

1. Acquire the panoramic image collected by the first collection device, the sub-scene image collected by the second collection device, and the detail image collected by the third collection device.

2. Determine at least one target position in the panoramic image, and determine, according to the calibration information between the first collection device and the third collection device, the position in the detail image that corresponds to the target position as the to-be-processed position.

3. Generate a label according to the sub-scene image collected by the second collection device, or use the sub-scene image as the content of the label.

4. Determine, according to the calibration information between the first collection device and the second collection device, the target position in the panoramic image of the label corresponding to the second collection device, and add the label at the determined target position; also add the label at the to-be-processed position corresponding to the target position.

5. Display, according to the preset display rule, the panoramic image with the label added, the detail image with the label added, and the content of the added label.

In existing solutions, the images collected by different devices can only be displayed separately, with no association between them. A user who needs to follow the images collected by multiple devices has to switch back and forth between those images, which is cumbersome.

Applying the embodiment shown in FIG. 2, the first collection device collects the panoramic image, and the second collection device collects an image of a sub-scene within the panoramic image, yielding a sub-scene image. A label is generated according to the sub-scene image and added to the panoramic image, and the panoramic image with the label added is displayed. The solution thus displays the image collected by the first collection device (the panoramic image) together with the image collected by the second collection device (the label), so the user can follow the images collected by multiple devices without switching between them, and the operation is simple.

The third scheme mentioned in the embodiment shown in Fig. 2 will be described below.

Specifically, it may be detected whether an abnormal event occurs in the panoramic image collected by the first collection device; if so, the target second collection device corresponding to the abnormal event is determined, and the sub-scene image collected by the target second collection device is acquired.

As an implementation manner, an anomaly model may be preset. As described above, abnormal events may include traffic accidents, robberies, fires, and the like, and these abnormal events may be simulated in advance to generate corresponding anomaly models. The panoramic image is then matched against the preset anomaly models. If the matching succeeds, an abnormal event has occurred in the panoramic image, and the position of the successful match is the position of the abnormal event in the panoramic image.
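The application does not say how the matching is performed. Purely as an illustration, the sketch below treats an anomaly model as a small greyscale template and slides it over the panoramic image, using the mean absolute difference as the match score. This simple template matching is a stand-in chosen for the example, not the method of the application; the function name and threshold are hypothetical.

```python
def match_anomaly_model(image, model, max_mean_abs_diff=10.0):
    """Slide `model` over `image` (both 2-D lists of grey levels).

    Returns the (x, y) of the top-left corner of the best match if its mean
    absolute difference is within the threshold, otherwise None.
    """
    ih, iw = len(image), len(image[0])
    mh, mw = len(model), len(model[0])
    best_pos, best_score = None, None
    for y in range(ih - mh + 1):
        for x in range(iw - mw + 1):
            diff = sum(abs(image[y + j][x + i] - model[j][i])
                       for j in range(mh) for i in range(mw))
            score = diff / (mh * mw)
            if best_score is None or score < best_score:
                best_pos, best_score = (x, y), score
    if best_score is not None and best_score <= max_mean_abs_diff:
        return best_pos
    return None


# Toy data: a bright 2x2 patch stands in for the simulated abnormal event.
panorama = [[0, 0, 0, 0],
            [0, 0, 200, 200],
            [0, 0, 200, 200],
            [0, 0, 0, 0]]
fire_model = [[200, 200],
              [200, 200]]
event_position = match_anomaly_model(panorama, fire_model)
```

A successful match returns the position of the abnormal event in the panoramic image; a return value of `None` means no model matched.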

Alternatively, as another implementation manner, it may be determined whether abnormal event alarm information for the panoramic image, sent by another device or by the user, has been received; if it has, an abnormal event has occurred in the panoramic image.

It can be understood that the device that implements the solution can communicate with other devices, and other devices can send abnormal event alarm information to the device after determining that an abnormal event occurs in the panoramic image. Alternatively, the user can also send an abnormal event alarm message to the device, which is also reasonable. The abnormal event alarm information may carry the position of the abnormal event in the panoramic image.

According to the above description, in the scenario shown in FIG. 2a, there is a calibration relationship between the first collection device and the four second collection devices. In this embodiment, related information (calibration information) of the calibration relationship may be acquired in advance. The calibration information can determine the target second collection device corresponding to the above-mentioned "position of the abnormal event in the panoramic image", that is, the second collection device that performs image acquisition for the sub-scene where the abnormal event is located.
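One simple way to picture this lookup is to assume the calibration information has been reduced to a rectangle in the panoramic image for each second collection device. The region table, the device identifiers, and the 1920x1080 panorama size below are all hypothetical.

```python
# Hypothetical layout for FIG. 2a: each sub-scene covers one quadrant of a
# 1920x1080 panoramic image. Rectangles are (left, top, right, bottom).
SUB_SCENE_REGIONS = {
    "second_collection_device_1": (0, 0, 960, 540),        # A1
    "second_collection_device_2": (960, 0, 1920, 540),     # A2
    "second_collection_device_3": (0, 540, 960, 1080),     # A3
    "second_collection_device_4": (960, 540, 1920, 1080),  # A4
}


def find_target_device(event_position, regions=SUB_SCENE_REGIONS):
    """Return the device whose sub-scene contains the abnormal event, or None."""
    x, y = event_position
    for device_id, (left, top, right, bottom) in regions.items():
        if left <= x < right and top <= y < bottom:
            return device_id
    return None
```

For instance, an event at pixel (1000, 600) falls in the lower-right quadrant, so the sub-scene image would be requested from the device covering A4.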

In this solution, S202 is: generating a label corresponding to the abnormal event according to the sub-scene image.

In addition, in this solution, a focus area may be defined in the panoramic image in advance. When an abnormal event is detected in the panoramic image, it may be determined whether the position of the abnormal event in the panoramic image lies within the preset focus area; if so, the label is displayed in the video frame image in a preset alarm mode.

For example, if intersection A in the panoramic image is an area that requires particular attention, intersection A is set as the focus area in the panoramic image in advance. If an abnormal event occurs in the panoramic image and it occurs at intersection A, the label is displayed in the video frame image in the preset alarm mode.

There are various preset alarm modes, such as flickering, shaking, or directly outputting prompt information. It should be noted that, if an embodiment in which the label and the content of the label are displayed separately is used, the content of the label may be displayed in the second area or the picture-in-picture area in an alarm mode, for example by changing the color of the pop-up window or shaking the pop-up window. This is not specifically limited.
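The focus-area check can be sketched as a point-in-rectangle test that selects the display mode for the label. The rectangular focus areas and the mode names `"flicker"` and `"normal"` are assumptions made for illustration.

```python
def in_focus_area(position, focus_areas):
    """True if the position lies inside any (left, top, right, bottom) rectangle."""
    x, y = position
    return any(left <= x < right and top <= y < bottom
               for (left, top, right, bottom) in focus_areas)


def display_mode(event_position, focus_areas, alarm_mode="flicker"):
    """Pick the preset alarm mode for events in a focus area, else normal display."""
    return alarm_mode if in_focus_area(event_position, focus_areas) else "normal"


# Hypothetical focus area covering intersection A in the panoramic image.
intersection_a = [(100, 100, 300, 300)]
mode = display_mode((150, 200), intersection_a)
```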

Applying the above scheme, it is possible to monitor the occurrence of an abnormal event in the panoramic image, and generate a label for the abnormal event to improve the monitoring effect.

FIG. 3 is a third schematic flowchart of an image processing method according to an embodiment of the present disclosure. The embodiment shown in FIG. 3 is based on the embodiment shown in FIG.

S301: Receive a label adding instruction sent by a user.

For example, in the interface shown in FIG. 1a, the user can click on a target such as a building or an intersection in a video frame image and then input content related to the target (target content). The target content may include text information (such as a building name or other relevant description), and may also contain images.

When the device that executes the solution detects the user's click and receives the target content sent by the user, it considers that the tag addition instruction sent by the user is received. That is to say, the tag addition instruction can carry the target location (the location clicked by the user) and the target content (the content, text or image input by the user).

It should be noted that the user may also obtain the sub-scene image collected by the second collection device and use it as the target content, or the user may use both the sub-scene image and the target information in the sub-scene image as the target content. The target information has the same meaning as in the embodiment shown in FIG. 2 and is not described again.

S302: Generate a label according to the label adding instruction.

For example, the label may include a "tag symbol" and "tag content". The "tag symbol" may be an arrow, a triangle, or the like, and serves to mark the position of the label in the video frame image; its specific form is not limited. In this embodiment, the target content input by the user may be used as the content of the label.

In addition, the tag may also include a "tag name", for example, may be some simple text information, such as "some building", "some park" and the like. It is also possible to use part of the content input by the above user as the name of the tag.

In this case, S101 is S101B: determining the target position of the added tag according to the tag adding instruction. The target location is the location that the above user clicks.
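S301, S302, and S101B can be pictured as turning a tag addition instruction into a positioned label. The sketch below is illustrative only: the classes `TagAddInstruction` and `PlacedLabel`, the function `label_from_instruction`, and the 20-character name length are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class TagAddInstruction:
    target_position: Tuple[int, int]   # pixel the user clicked
    target_content: Union[str, bytes]  # text, or an encoded image
    name: Optional[str] = None         # optional explicit tag name


@dataclass
class PlacedLabel:
    position: Tuple[int, int]
    content: Union[str, bytes]
    name: Optional[str]
    symbol: str = "arrow"


def label_from_instruction(instr, max_name_len=20):
    """Generate a label (S302) and take its target position from the click (S101B)."""
    name = instr.name
    if name is None and isinstance(instr.target_content, str):
        # Use part of the text the user entered as the tag name.
        name = instr.target_content[:max_name_len]
    return PlacedLabel(position=instr.target_position,
                       content=instr.target_content,
                       name=name)
```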

Applying the embodiment shown in FIG. 3 of the present application, the location and content of the label are determined by the user, that is, the user can design his own label according to his own needs, and the user experience is better.

Corresponding to the above method embodiment, the embodiment of the present application further provides an image processing device.

As shown in FIG. 4a, the image processing device comprises a processor 401 and a memory 402;

a memory 402 for storing a computer program;

The processor 401 is configured to implement any of the above image processing methods when executing a program stored on the memory 402.

FIG. 4b is a schematic structural diagram of another image processing device according to an embodiment of the present disclosure. The device includes a housing 501, a processor 502, a memory 503, a circuit board 504, and a power supply circuit 505. The circuit board 504 is disposed inside the space enclosed by the housing 501, and the processor 502 and the memory 503 are disposed on the circuit board 504. The power supply circuit 505 is configured to supply power to the circuits or components of the image processing device, and the memory 503 is configured to store executable program code. The processor 502 reads the executable program code stored in the memory 503 and runs the corresponding program in order to perform the following steps:

As an embodiment, the video frame image is a panoramic image, the first collection device corresponds to at least one second collection device, and the second collection device performs image collection on the sub-scene corresponding to the panoramic image.

The processor is further configured to implement the following steps:

Obtaining a sub-scene image collected by the second collection device;

Generating a label according to the sub-scene image;

And determining, according to the calibration information of the first collection device and the second collection device that are acquired in advance, a target position of the label corresponding to the second collection device in the panoramic image.

As an implementation manner, the processor is further configured to implement the following steps:

In the first area, displaying the video frame image after the tag is added;

In the second area, the content of the added tag is displayed.

In the added tag, determine the current display tag;

Show the content of the current display tag.

The content of the target tag is displayed in the video frame image.

Receiving a tag addition instruction;

Generating a label according to the label adding instruction;

Detecting whether an abnormal event occurs in the panoramic image;

Obtaining a sub-scene image collected by the target second collection device;

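Matching the panoramic image with a preset anomaly model, and if the matching is successful, determining that an abnormal event occurs in the panoramic image;

Or determining whether abnormal event alarm information for the panoramic image is received, and if it is received, determining that an abnormal event occurs in the panoramic image.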

Determining a location of the abnormal event in the panoramic image;

If the location of the abnormal event in the panoramic image is in a preset focus area, displaying the label in the video frame image in a preset alarm mode.

Obtaining a detail image corresponding to the video frame image collected by the first collection device;

Determining, according to a pixel point correspondence relationship between the detail image and the video frame image acquired in advance, that the target position corresponds to a position in the detail image as a to-be-processed position;

Adding a label added at the target location to a to-be-processed location corresponding to the target location;

Displaying, according to the preset display rule, the video frame image with the label added and the detail image with the label added.

In the first area, displaying the video frame image with the label added; in the third area, displaying the detail image with the label added;

Or, in the form of picture-in-picture, displaying the video frame image with the label added and the detail image with the label added.

Applying the embodiments of the present application, a label is added at a target position in the video frame image, and the video frame image with the label added is then displayed. The label helps the user understand the specific content contained in the video frame image, so the labeled video frame image presents the image content more intuitively and the display effect is better.

The embodiment of the present application further provides an image processing system, which may include a first collection device and an image processing device, where

The first collection device is configured to collect a video frame image and send the collected video frame image to the image processing device;

The image processing device is configured to: determine, for the video frame image collected by the first collection device, at least one target location in the video frame image; add a label at each determined target location, the label being generated according to an input instruction or an image collected by a second collection device; and display the video frame image with the label added according to a preset display rule.

As shown in FIG. 5, the system further includes at least one second collection device (second collection device 1, second collection device 2, second collection device 3, and second collection device 4).

The image processing device in this embodiment may be a platform device, which may acquire resources from multiple collection devices, display images, and interact with users.

As an implementation manner, the first collection device is an augmented reality AR panoramic camera.

As an implementation manner, the image processing device can also be used to:

In the first area, displaying the video frame image after the tag is added;

In the second area, the content of the added tag is displayed.

As an implementation manner, the image processing device can also be used to:

In the added tag, determine the current display tag;

Show the content of the current display tag.

As an implementation manner, the image processing device can also be used to:

The content of the target tag is displayed in the video frame image.

As an implementation manner, the image processing device can also be used to:

Receiving a tag addition instruction;

Generating a label according to the label adding instruction;

As an implementation manner, the image processing device can also be used to:

Detecting whether an abnormal event occurs in the panoramic image;

Obtaining a sub-scene image collected by the target second collection device;

As an implementation manner, the image processing device can also be used to:

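Matching the panoramic image with a preset anomaly model, and if the matching is successful, determining that an abnormal event occurs in the panoramic image;

Or determining whether abnormal event alarm information for the panoramic image is received, and if it is received, determining that an abnormal event occurs in the panoramic image.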

As an implementation manner, the image processing device can also be used to:

Determining a location of the abnormal event in the panoramic image;

As an implementation manner, the image processing device can also be used to:

If the location of the abnormal event in the panoramic image is in a preset focus area, displaying the label in the video frame image in a preset alarm mode.

As an implementation manner, the system may further include: a third collection device;

The third collection device is configured to collect a detailed image corresponding to the panoramic image, where the panoramic image is a video frame image collected by the first collection device;

The image processing device is further configured to: acquire the detail image collected by the third collection device; determine, according to a pixel correspondence between the detail image and the video frame image, the position in the detail image that corresponds to the target location as the to-be-processed location; add the label added at the target location to the to-be-processed location corresponding to that target location; and display, according to a preset display rule, the video frame image with the label added and the detail image with the label added.

Applying the system provided by the embodiment of the present application, the image processing device acquires the video frame image collected by the first collection device, adds a label at a target position in the video frame image, and then displays the video frame image with the label added. The label helps the user understand the specific content contained in the video frame image, so the labeled video frame image presents the image content more intuitively and the display effect is better.

The embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, and when the computer program is executed by the processor, implements any of the above image processing methods.

The embodiment of the present application also provides an executable program code for being executed to execute any of the image processing methods described above.

It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise" and "include", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements, or elements that are inherent to such a process, method, article, or device. An element defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, method, article, or device that comprises the element.

The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiments shown in FIGS. 4a and 4b, the system embodiment shown in FIG. 5, the computer readable storage medium embodiment, and the executable program code embodiment are substantially similar to the method embodiments, so their description is relatively brief; for the relevant parts, refer to the description of the method embodiments.

One of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by a program instructing related hardware, and the program can be stored in a computer readable storage medium. The storage medium referred to herein is, for example, a ROM/RAM, a magnetic disk, or an optical disc.

The above is only the preferred embodiment of the present application and is not intended to limit the present application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims

An image processing method, comprising:

Determining at least one target location in the video frame image for the video frame image acquired by the first acquisition device;

Adding a label at each determined target location, the label being generated according to an input instruction or an image acquired by the second collection device;

Displaying the video frame image with the label added according to a preset display rule.
The method according to claim 1, wherein the video frame image is a panoramic image, the first collection device corresponds to at least one second collection device, and the second collection device performs image collection on a sub-scene corresponding to the panoramic image;

Before determining at least one target location in the video frame image, the method further includes:

Obtaining a sub-scene image collected by the second collection device;

Generating a label according to the sub-scene image;

The step of determining at least one target location in the video frame image includes:

And determining, according to the calibration information of the first collection device and the second collection device that are acquired in advance, a target location of the label corresponding to the second collection device in the panoramic image.
The method of claim 2 wherein said first acquisition device is an augmented reality AR panoramic camera.
The method according to claim 2, wherein the step of generating a label according to the sub-scene image comprises:

Adding the sub-scene image and/or target information in the sub-scene image to the content of the tag.
The method according to claim 4, wherein the step of adding the target information in the sub-scene image to the content of the tag comprises:

Identifying the sub-scene image, determining target information in the sub-scene image according to the recognition result; adding the target information to the content of the label;

Or receiving the target information sent by the second collection device; adding the target information to the content of the label;

Or receiving the target information sent by a server communicatively coupled to the second collection device; adding the target information to the content of the tag.
The method according to claim 1, wherein the step of displaying the tagged video frame image according to the preset display rule comprises:

In the first area, displaying the video frame image after the tag is added;

In the second area, the content of the added tag is displayed.
The method according to claim 1, wherein the step of displaying the tagged video frame image according to the preset display rule comprises:

Displaying, in the form of picture-in-picture, the video frame image with the tag added and the content of the added tag.
The method according to claim 6 or 7, wherein displaying the content of the added tag comprises:

In the added tag, determine the current display tag;

Show the content of the current display tag.
The method according to claim 6 or 7, wherein the method further comprises:

After detecting that the user clicks on the label in the video frame image, the clicked label is determined as the target label;

The content of the target tag is displayed in the video frame image.
The method of claim 1 wherein prior to the step of determining at least one target location in said video frame image, said method further comprising:

Receiving a tag addition instruction;

Generating a label according to the label adding instruction;

The step of determining at least one target location in the video frame image includes:

A target location of the added tag is determined according to the tag addition instruction.
The method according to claim 1, wherein the step of displaying the tagged video frame image according to the preset display rule comprises:

Determining a layer corresponding to each label according to a preset layer classification policy;

Determining a layer display strategy, and determining, according to the layer display strategy, a current display layer and a display manner of the current display layer;

In the display manner, the label corresponding to the current display layer is displayed.
The method according to claim 2, wherein the step of acquiring the sub-scene image collected by the second collection device comprises:

Detecting whether an abnormal event occurs in the panoramic image;

If yes, determining a target second collection device corresponding to the abnormal event;

Obtaining a sub-scene image collected by the target second collection device;

The step of generating a label according to the sub-scene image includes:

And generating, according to the sub-scene image, a label corresponding to the abnormal event.
The method according to claim 12, wherein the step of detecting whether an abnormal event occurs in the panoramic image comprises:

Matching the panoramic image with a preset anomaly model;

If the matching is successful, it indicates that an abnormal event occurs in the panoramic image;

Or determining whether abnormal event alarm information for the panoramic image is received;

If it is received, this indicates that an abnormal event occurs in the panoramic image.
The method according to claim 12, wherein the step of determining the target second collection device corresponding to the abnormal event comprises:

Determining a location of the abnormal event in the panoramic image;

Determining a target second collection device corresponding to the location according to the pre-acquired calibration information of the first collection device and each second collection device.
The method according to claim 12, wherein, in the case that an abnormal event is detected in the panoramic image, the method further comprises:

Determining whether the location of the abnormal event in the panoramic image is located in a preset focus area;

If yes, the step of displaying the tagged video frame image according to the preset display rule includes:

The label is displayed in the video frame image in a preset alarm mode.
The method of claim 1 further comprising:

Obtaining a detail image corresponding to the video frame image collected by the first collection device;

After determining the at least one target location in the video frame image, the method further includes:

Determining, according to a pixel point correspondence relationship between the detail image and the video frame image acquired in advance, that the target position corresponds to a position in the detail image as a to-be-processed position;

Adding a label added at the target location to a to-be-processed location corresponding to the target location;

The step of displaying the video frame image with the label added according to the preset display rule includes:

Displaying, according to the preset display rule, the video frame image with the label added and the detail image with the label added.
The method according to claim 16, wherein the displaying the tagged video frame image and the tagged detail image according to the preset display rule comprises:

In the first area, displaying the video frame image with the label added; in the third area, displaying the detail image with the label added;

Or, in the form of picture-in-picture, displaying the video frame image with the label added and the detail image with the label added.
An image processing device, comprising: a processor and a memory;

a memory for storing a computer program;

The processor is configured to implement the image processing method according to any one of claims 1-17 when executing a program stored on the memory.
An image processing system, comprising: a first collection device and an image processing device, wherein

The first collecting device is configured to collect a video frame image, and send the collected video frame image to the image processing device;

The image processing device is configured to: determine, for the video frame image collected by the first collection device, at least one target location in the video frame image; add a label at each determined target location, the label being generated according to an input instruction or an image collected by a second collection device; and display the video frame image with the label added according to a preset display rule.
The system of claim 19, wherein the system further comprises: at least one second collection device,

The second collection device is configured to collect an image of a sub-scene corresponding to a panoramic image, where the panoramic image is the video frame image collected by the first collection device;

The image processing device is further configured to: acquire the sub-scene image collected by the second collection device; generate a tag according to the sub-scene image; and determine, according to pre-acquired calibration information of the first collection device and the second collection device, the target position in the panoramic image of the tag corresponding to the second collection device.
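The steps above can be sketched as follows. The claim does not define the form of the calibration information or the content of the tag; the sketch assumes the calibration has already been reduced to one panoramic pixel position per second collection device, and that a caller-supplied function turns a sub-scene image into the tag content. All names (`PlacedTag`, `place_sub_scene_tag`, `calibration`, `describe`, `"cam-2"`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class PlacedTag:
    device_id: str
    content: str
    position: Tuple[int, int]  # target position in the panoramic image


def place_sub_scene_tag(device_id: str, sub_scene_image: Any,
                        calibration: Dict[str, Tuple[int, int]],
                        describe: Callable[[Any], str]) -> PlacedTag:
    """Generate a tag from a sub-scene image and locate it in the panorama."""
    if device_id not in calibration:
        raise KeyError(f"no calibration information for device {device_id!r}")
    return PlacedTag(device_id, describe(sub_scene_image), calibration[device_id])


# Assumed calibration result: device "cam-2" looks at panorama pixel (640, 360).
calibration = {"cam-2": (640, 360)}
image = [[0, 0, 0, 0] for _ in range(3)]  # stand-in for a 4x3 sub-scene image
tag = place_sub_scene_tag("cam-2", image, calibration,
                          describe=lambda img: f"{len(img[0])}x{len(img)} sub-scene")
```

Keeping the calibration lookup separate from tag generation means the tag content can be refreshed for each new sub-scene image without recomputing its target position.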
The system of claim 19, wherein the first collection device is an augmented reality (AR) panoramic camera.
The system of claim 19, wherein the system further comprises: a third collection device;

The third collection device is configured to collect a detail image corresponding to a panoramic image, where the panoramic image is the video frame image collected by the first collection device;

The image processing device is further configured to: acquire the detail image collected by the third collection device; determine, according to a pixel correspondence between the detail image and the video frame image, the position in the detail image that corresponds to the target position, as a to-be-processed position; add the tag that was added at the target position to the to-be-processed position corresponding to that target position; and display, according to the preset display rule, the video frame image after the tag is added and the detail image after the tag is added.
A computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method steps of any of claims 1-17.
Executable program code, wherein the executable program code is configured to be run to perform the method steps of any of claims 1-17.