WO2019242207A1

WO2019242207A1 - Image display method, apparatus, system and device, and readable storage medium

Info

Publication number: WO2019242207A1
Application number: PCT/CN2018/114074
Authority: WO
Inventors: 田楠; 李伟
Original assignee: 广州视源电子科技股份有限公司; 广州视臻信息科技有限公司
Priority date: 2018-06-20
Filing date: 2018-11-06
Publication date: 2019-12-26
Also published as: CN108900787A; CN108900787B

Abstract

Embodiments of the present invention disclose an image display method, apparatus, system and device, and a readable storage medium. The method comprises: obtaining an image photographed by a panoramic camera as a first image; determining the position of a speaker; selecting, from two or more close-up cameras, the close-up camera corresponding to the position to photograph an image of the speaker as a second image; and displaying the first image and the second image. With the embodiments of the present invention, the close-up camera corresponding to the position of the speaker can be quickly selected from a plurality of close-up cameras to photograph the speaker, and the image photographed by the panoramic camera and the image photographed by the selected close-up camera are then displayed. A close-up of the speaker can thus be obtained without a human director performing additional operations on the close-up camera, which effectively improves the efficiency of close-up shooting.

Description

Image display method, device, system and equipment, and readable storage medium

Technical field

The present invention relates to the field of image processing technologies, and in particular to an image display method, apparatus, system and device, and a readable storage medium.

Background technique

In a conference scenario, to make it easier for participants to see the speaker, the related art sets up multiple camera devices at the conference venue, at least some of which can be adjusted by a human director to take a close-up of the speaker, for example by shooting a frontal image of the speaker.

However, when a human director adjusts a camera device, a series of operations such as panning, tilting, and sliding must be performed on it, which takes a long time and reduces the efficiency of close-up shooting.

Summary of the Invention

The invention provides an image display method, apparatus, system and device, and a readable storage medium, which address the problem in the related art that a human director needs a long time to adjust the camera equipment, which reduces the efficiency of close-up shooting.

According to a first aspect of the embodiments of the present invention, an image display method is provided, including the following steps:

Acquiring an image captured by a panoramic camera as a first image;

Determining the position of the speaker;

Selecting a close-up camera corresponding to the position from two or more close-up cameras to capture an image of the speaker as a second image;

Displaying the first image and the second image.

In one embodiment, the determining the position of the speaker includes:

Acquiring a positioning result of the associated acoustic positioning device on the speaker;

Determining the position of the speaker according to the positioning result.

In one embodiment, the position includes a relative position parameter between the speaker and a local display screen, and selecting, from the two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker includes:

Retrieving the relative position parameters between each close-up camera and the local display screen;

Calculating the relative position parameters between the speaker and each close-up camera based on the position and the retrieved relative position parameters;

Selecting, from the two or more close-up cameras, the close-up camera with the smallest relative position parameter with respect to the speaker according to the calculated relative position parameters;

Acquiring the image of the speaker captured by the selected close-up camera.

In one embodiment, selecting a close-up camera corresponding to the position from two or more close-up cameras to capture an image of the speaker includes:

Obtaining a predetermined correspondence between scene positions in the real scene and the close-up cameras, where the close-up camera corresponding to a scene position is dedicated to taking close-ups of a speaker at that scene position;

Selecting the close-up camera corresponding to the position as the target camera from the two or more close-up cameras according to the predetermined correspondence;

Acquiring the image of the speaker captured by the target camera.
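As a hedged illustration of the predetermined-correspondence embodiment, the sketch below stores a lookup table from scene areas to close-up cameras and returns the camera assigned to the area that contains the speaker. The angular sectors, the camera names `cam_113` and `cam_114`, and the function name are assumptions made for illustration; the disclosure does not specify how the correspondence is stored.

```python
from typing import List, Optional, Tuple

# Hypothetical correspondence table: each entry maps a scene area, expressed
# here as a range of relative angles to the display (degrees), to the
# close-up camera dedicated to that area. All values are illustrative.
CORRESPONDENCE: List[Tuple[float, float, str]] = [
    (0.0, 90.0, "cam_113"),
    (90.0, 180.0, "cam_114"),
]


def select_by_correspondence(
    speaker_angle_deg: float,
    table: List[Tuple[float, float, str]],
) -> Optional[str]:
    """Return the camera whose scene area contains the speaker, if any."""
    for low, high, camera in table:
        if low <= speaker_angle_deg < high:
            return camera
    return None  # the speaker is outside every predetermined scene area
```

A table lookup of this kind avoids any per-frame geometry, at the cost of requiring the correspondence to be prepared in advance for the room.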

In one embodiment, the relative position parameter includes a relative angle and / or a relative distance.

In one embodiment, the position is a relative angle between the speaker and the local display screen, and the relative position parameter is a relative angle.

In one embodiment, the relative angle is a relative angle in a horizontal direction.

In one embodiment, the position includes the coordinates of the speaker in a predetermined coordinate system; selecting a close-up camera corresponding to the position to capture an image of the speaker from two or more close-up cameras includes:

Obtaining the predetermined coordinates of the center of the local display screen and the predetermined coordinates of each close-up camera;

Calculating, according to the obtained predetermined coordinates and the position, the included angle at the speaker between the center of the local display screen and each close-up camera, this included angle being the relative angle between that close-up camera and the speaker;

Selecting, according to the calculated relative angles, the close-up camera with the smallest relative angle to the speaker as the target camera.
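The coordinate-based selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the included angle is taken at the speaker between the direction to the display center and the direction to each camera, and the function names, camera labels, and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def included_angle_deg(speaker: Point, center: Point, camera: Point) -> float:
    """Angle (degrees) at the speaker between the display center and a camera."""
    to_center = [c - s for c, s in zip(center, speaker)]
    to_camera = [c - s for c, s in zip(camera, speaker)]
    norm_center = math.sqrt(sum(v * v for v in to_center))
    norm_camera = math.sqrt(sum(v * v for v in to_camera))
    if norm_center == 0.0 or norm_camera == 0.0:
        raise ValueError("speaker coincides with the display center or a camera")
    cosine = sum(a * b for a, b in zip(to_center, to_camera))
    cosine /= norm_center * norm_camera
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def select_target_camera(
    speaker: Point, center: Point, cameras: Dict[str, Point]
) -> str:
    """Pick the camera with the smallest included angle to the speaker."""
    return min(
        cameras,
        key=lambda name: included_angle_deg(speaker, center, cameras[name]),
    )
```

If the speaker is assumed to look at the display center, the camera with the smallest included angle is the one closest to the speaker's line of sight, which is why it tends to give the most frontal view.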

In one embodiment, if two or more close-up cameras are selected, the method further includes:

Calculating the relative distance between each selected close-up camera and the speaker according to the predetermined coordinates of the selected close-up cameras and the position;

Selecting, from the selected close-up cameras and according to the calculated relative distances, the close-up camera with the smallest relative distance from the speaker as the target camera.

Calculating the relative angle between each selected close-up camera and the speaker in the horizontal direction;

From the selected close-up cameras, a close-up camera having the smallest relative angle with the speaker in the horizontal direction is selected as the target camera.

In one embodiment, the method further includes the following steps:

Re-determining the position of the speaker;

Judging, according to the re-determined position and the previously determined position, whether the position change of the speaker is less than a predetermined change amount;

If it is less, performing the step of displaying the first image and the second image;

If it is not less, selecting, from the two or more close-up cameras, the close-up camera corresponding to the re-determined position to capture an image of the speaker as a third image;

Displaying the first image and the third image.
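The re-determination steps amount to a threshold check before any camera switch. The sketch below illustrates this, assuming the position is a single relative angle in degrees; the function name, the `select_camera` callback, and the default threshold of 10 degrees are hypothetical choices, not values from the disclosure.

```python
from typing import Callable


def update_close_up(
    last_angle_deg: float,
    new_angle_deg: float,
    current_camera: str,
    select_camera: Callable[[float], str],
    min_change_deg: float = 10.0,  # assumed "predetermined change amount"
) -> str:
    """Return the camera whose image should be displayed next."""
    if abs(new_angle_deg - last_angle_deg) < min_change_deg:
        # Small movement: keep displaying the image from the current camera.
        return current_camera
    # Large movement: reselect a camera for the re-determined position.
    return select_camera(new_angle_deg)
```

Keeping the current camera for small movements avoids switching the close-up view back and forth when the speaker only shifts slightly.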

In one embodiment, displaying the first image and the second image includes:

Mapping the image position of the speaker in the second image according to the position;

Extracting image data at the mapped image position in the second image to obtain a close-up image of the speaker;

Displaying the close-up image and the first image.

In one embodiment, after extracting image data at a position mapped in the second image, the method includes:

Identifying and cropping image data of a target part of the speaker from the extracted image data;

The cropped image data is determined as the close-up image.

In an embodiment, mapping an image position of the speaker in the second image according to the position includes:

Obtaining the correspondence between each image area of the panoramic image and each scene area in the real scene;

Locating, based on the correspondence, the image region in the first image to which the position belongs;

Finding, through feature matching, the image region in the second image whose features match the located image region;

Using the coordinates of the matched image region as the mapped image position.

Retrieving the position information, in the real scene, of each image area of each close-up camera;

Matching the retrieved position information with the position;

Obtaining the image area to which the speaker belongs in the second image according to the matching result;

Using the coordinates of the obtained image area as the mapped image position.

In one embodiment, the position is a relative angle between the speaker and a local display screen;

The position information is a scene area corresponding to each image area in a real scene, and a relative angle with the local display screen.

In one embodiment, the step of pre-generating the location information includes:

Calibrating the relative angle between the scene area of the close-up camera and the close-up camera according to the lens angle of each close-up camera;

Calculating the relative angle between the scene area of the close-up camera and the local display screen according to the calibrated relative angle and the relative position parameters between the close-up camera and the local display screen.
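One possible way to pre-generate such position information is sketched below. It is only an illustration under strong assumptions: the horizontal lens angle is divided into equal angular slices (ignoring lens projection and distortion), the camera's relative angle to the display is taken as the direction of its optical axis, and the image area index increases with the angle. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def area_angle_ranges(
    axis_angle_deg: float, lens_angle_deg: float, n_areas: int
) -> List[Tuple[float, float]]:
    """Angle range, relative to the display, covered by each image area."""
    step = lens_angle_deg / n_areas
    start = axis_angle_deg - lens_angle_deg / 2.0
    return [(start + i * step, start + (i + 1) * step) for i in range(n_areas)]


def locate_area(
    speaker_angle_deg: float, ranges: List[Tuple[float, float]]
) -> Optional[int]:
    """Index of the image area whose angle range contains the speaker."""
    for index, (low, high) in enumerate(ranges):
        if low <= speaker_angle_deg < high:
            return index
    return None  # the speaker is outside this camera's field of view
```

With a camera axis at 15 degrees, a 60-degree lens angle, and three image areas, a speaker at 10 degrees falls into the middle area.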

According to a second aspect of the embodiments of the present invention, an image display system is provided, including an image display device, a panoramic camera, and two or more close-up cameras. The image display device includes:

A display screen;

A processor;

A memory storing processor-executable instructions;

The processor is coupled to the memory, and is configured to read program instructions stored in the memory and, in response, perform operations in the method described above.

In one embodiment, the panoramic camera and each close-up camera are installed on the image display device.

In one embodiment, the relative angle between each close-up camera and the display screen is different.

In one embodiment, the panoramic camera is installed on the frame on the upper side of the image display device, a first close-up camera is installed on the frame on the left side of the image display device, and a second close-up camera is installed on the frame on the right side of the image display device.

In an embodiment, the relative angle between the first close-up camera and the display screen is between 10 degrees and 50 degrees;

The relative angle between the second close-up camera and the display screen is between 130 degrees and 170 degrees.

In one embodiment, the image display device further includes an acoustic positioning device for positioning the speaker.

In one embodiment, the acoustic positioning device includes a microphone array.

In one embodiment, the image display device is a conference interaction device.

In one embodiment, the conference interactive device is a smart interactive tablet.

According to a third aspect of the embodiments of the present invention, an image display device is provided, including:

A display screen;

A processor;

A memory storing processor-executable instructions;

In one embodiment, the image display device of the embodiment of the present invention is associated with a panoramic camera and at least two close-up cameras.

In one embodiment, the relative angle between a close-up camera and the display screen is between 10 degrees and 50 degrees; the relative angle between the other close-up camera and the display screen is between 130 degrees and 170 degrees.

In one embodiment, the acoustic positioning device includes a microphone array.

In one embodiment, the image display device is a conference interaction device.

According to a fourth aspect of the embodiments of the present invention, one or more machine-readable storage media are provided, having instructions stored thereon which, when executed by one or more processors, cause the operations in the method described above to be performed.

According to a fifth aspect of the embodiments of the present invention, an image display device is provided, including:

A first image acquisition module, configured to acquire an image captured by a panoramic camera as a first image;

A speech position determining module, configured to determine the position of a speaker;

A second image acquisition module, configured to select, from two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker as a second image;

An image display module is configured to display the first image and the second image.

In one embodiment, the position includes a relative position parameter of the speaker and a local display screen; the second image acquisition module includes:

A position parameter acquisition module, configured to retrieve the relative position parameters between each close-up camera and the local display screen;

A relative position calculation module, configured to calculate the relative position parameters between the speaker and each close-up camera based on the position and the retrieved relative position parameters;

A camera selection module, configured to select, from the two or more close-up cameras, the close-up camera with the smallest relative position parameter with respect to the speaker according to the calculated relative position parameters;

A first acquisition submodule, configured to acquire the image of the speaker captured by the selected close-up camera.

In one embodiment, the second image acquisition module includes:

A predetermined relationship acquisition module, configured to obtain a predetermined correspondence between scene positions in the real scene and the close-up cameras, where the close-up camera corresponding to a scene position is dedicated to taking close-ups of a speaker at that scene position;

A target camera selection module, configured to select the close-up camera corresponding to the position as the target camera from the two or more close-up cameras according to the predetermined correspondence;

A second acquisition submodule, configured to acquire the image of the speaker captured by the target camera.

In one embodiment, the image display module includes:

An image position mapping module, configured to map an image position of the speaker in the second image according to the position;

A close-up image extraction module, configured to extract image data at a mapped image position in the second image to obtain a close-up image of the speaker;

An image display submodule, configured to display the close-up image and the first image.

In one embodiment, the apparatus further includes a target extraction module, configured to:

The cropped image data is determined as the close-up image.

In one embodiment, the image position mapping module is configured to:

Use the coordinates of the matched image area as the mapped image position.

In one embodiment, the image position mapping module is configured to:

Matching the retrieved location information with the location;

In one embodiment, the module for pre-generating the location information is configured to:

According to a sixth aspect of the embodiments of the present invention, a smart interactive tablet is provided, including a panoramic camera, a first close-up camera, and a second close-up camera. The panoramic camera, the first close-up camera, and the second close-up camera are provided on the frame of the smart interactive tablet, and the optical axes of the first close-up camera and the second close-up camera are inclined to the display plane of the smart interactive tablet.

In one embodiment, the smart interactive tablet is further configured to:

Acquiring an image captured by a panoramic camera as a first image;

Determining the position of the speaker;

Displaying the first image and the second image.

In one embodiment, the relative angle between the optical axis of the first close-up camera and the display screen is between 10 degrees and 50 degrees;

The relative angle between the optical axis of the second close-up camera and the display screen is between 130 degrees and 170 degrees.

In the embodiments of the present invention, by determining the position of the speaker, the close-up camera corresponding to that position can be quickly selected from a plurality of close-up cameras to shoot the speaker, and the image taken by the panoramic camera and the image taken by the selected close-up camera are then displayed. A close-up of the speaker is obtained without a human director performing a series of additional operations such as panning, tilting, pushing, and pulling on the close-up camera. Compared with related technologies in which a human director must perform such additional operations on the camera, this can effectively improve the efficiency of close-up shooting.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a schematic diagram of an image display system according to an exemplary embodiment of the present invention;

FIG. 1B is a schematic diagram of an image display system according to another exemplary embodiment of the present invention;

FIG. 1C is a schematic diagram of an image display system according to another exemplary embodiment of the present invention;

FIG. 2A is a processing logic diagram of an image display system according to another exemplary embodiment of the present invention;

FIG. 2B is an interaction schematic diagram of an image display system according to another exemplary embodiment of the present invention;

FIG. 2C shows a display picture of a display screen according to an exemplary embodiment of the present invention;

FIG. 3 is a schematic diagram of an image display method according to an exemplary embodiment of the present invention;

FIG. 4A is a schematic diagram of an image display method according to another exemplary embodiment of the present invention;

FIG. 4B is a schematic diagram of dividing a panoramic image according to an exemplary embodiment of the present invention;

FIG. 5 is a schematic diagram of an image display method according to another exemplary embodiment of the present invention;

FIG. 6 is a block diagram of an image display device according to an exemplary embodiment of the present invention;

FIG. 7 is a hardware structural diagram of an image display device according to an exemplary embodiment of the present invention.

Detailed description

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. The singular forms "a," "said," and "the" as used in the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and / or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the present invention to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present invention, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein can be interpreted as "upon", "when", or "in response to determining".

In order to improve the close-up realization efficiency, the embodiments of the present invention provide an image display method and an image display system for implementing the image display method. The provided image display system may include an image display device, a panoramic camera, and at least two close-up cameras.

The image display device may include a memory, a processor, and a display screen for displaying an image. The memory stores program instructions executable by the processor. The processor is coupled to the memory and is configured to read the program instructions stored in the memory and, in response, perform operations for implementing the image display method according to an embodiment of the present invention.

The display screen mentioned here can be an independent display screen, such as an LED display screen; it can also be the screen of an image display device with interactive capabilities, such as a touch-screen display, a smart interactive tablet, or other computer equipment with interactive capabilities.

The panoramic camera is used to shoot the real scene of the target location. Compared with a close-up camera, it has a wider angle of view and a larger shooting range. In some examples, the panoramic camera may be a wide-angle camera.

The close-up cameras are used to take close-ups of speakers in the target location. According to the close-up requirements of the actual application scenario, the designer can preset different close-up cameras to shoot different predetermined scene areas of the target location, and / or preset different close-up cameras to shoot the same predetermined scene area with different relative position parameters. The relative position parameter mentioned here may include a relative angle and / or a relative distance between the close-up camera and the subject in the predetermined scene area. In the actual close-up process, it is usually not necessary for a human director to adjust the close-up camera as the speaker changes.

It should be noted that when the actual application scenario differs, the target location and the predetermined scene areas of the close-up cameras may differ. For example, the application scenario can be a conference scenario, a smart education scenario, a live broadcast scenario, or another scenario that requires close-ups of speakers; correspondingly, the target location can be a conference room, a lecture room, a live broadcast room, or another place where close-ups of a speaker are needed. The predetermined scene area may be an area where speakers may appear in a meeting, an area where speakers may appear in a course, an area where anchors may appear, or an area where speakers may appear in other scenarios.

In addition, when the close-up requirements of the application scenario differ, the predetermined scene area of a close-up camera and / or the relative position parameters between the close-up camera and the predetermined scene area may also differ. For example, in a conference scenario, if close-ups of the conference presenter are required, the presenter is usually at the end of the conference table near the conference interactive device, so the predetermined scene area may be the conference table area near the conference interactive device; if close-ups of any participant who may speak are required, the predetermined scene area can be the area where the participants are located. As another example, for a frontal close-up of the speaker, the relative angle between the close-up camera and the predetermined scene area is a frontal angle; for an oblique-side close-up of the speaker, that relative angle is an oblique angle. The frontal angle mentioned here refers to the angle perpendicular to the front of the subject. The oblique angle refers to an angle that deviates from the frontal angle, that is, any angle passed through when moving left or right around the subject in the predetermined scene area from the frontal angle to the side angle. The side angle refers to the angle perpendicular to the side of the subject.

Furthermore, depending on the application scenario and / or close-up needs, the panoramic camera and each close-up camera can be accessory devices of the image display device and installed on it, or can exist independently of the image display device and be installed in the space outside it. The image display system according to the embodiment of the present invention is described in detail below, taking a conference scenario as the application scenario and a conference room as the target location.

Please refer to FIG. 1A, which is a schematic diagram of an image display system according to an exemplary embodiment of the present invention.

The image display system shown in FIG. 1A is disposed in the conference room 100 and may include an image display device 110, a display screen 111, a panoramic camera 112, at least two close-up cameras 113 and 114, and the like.

The image display device 110 may be a conference display device with a display function, or a conference interactive device with interactive capabilities, such as a smart interactive tablet. The panoramic camera 112, the close-up camera 113, and the close-up camera 114 are disposed on the frame of the smart interactive tablet, and the optical axes of the close-up camera 113 and the close-up camera 114 are inclined to the display plane of the smart interactive tablet.

The conference room 100 may also include a conference table 120. In an actual conference, the speakers and participants A, B, C, D, E, and F are seated on the two sides of the conference table 120.

In one example, in order to capture as much of the real scene around the conference table 120 as possible, the panoramic camera 112 is installed on the frame on the upper side of the image display device 110.

In an example, while a panoramic shot of the real scene of the conference room 100 is taken, in order to take close-ups of speakers in different scene areas, different close-up cameras can be set at different positions of the image display device 110, such as on different borders of the image display device 110, so that different close-up cameras shoot different scene areas.

In order to distinguish the close-up cameras that shoot different scene areas, the embodiment of the present invention may identify each close-up camera by its relative angle to the display screen 111; close-up cameras that shoot different scene areas have different relative angles to the display screen 111. The relative angle here can refer to the angle between the optical axis of the close-up camera and the display plane of the display screen 111, and its specific value can be determined by the scene area and the installation position of the close-up camera.

For example, taking the screen center line of the display screen 111 distributed along the y direction as a reference, the angle θ2 between the optical axis of the close-up camera 113 and that center line is the relative angle between the close-up camera 113 and the display screen 111; the angle θ3 between the optical axis of the close-up camera 114 and that center line is the relative angle between the close-up camera 114 and the display screen 111. In other embodiments, the relative angle between each close-up camera and the display screen 111 may also be calculated with respect to other reference objects, and details are not described herein again.

Furthermore, in order to perform close-up of the front or side of the speaker as much as possible, at least one close-up camera 113 is installed at a frame on the left side of the image display device 110, and at least one close-up camera 114 is installed at a frame on the right side of the image display device 110.

Further, in order to take a close-up of the front of the speaker as far as possible, the embodiment of the present invention installs the close-up cameras 113 and 114 with reference to the three-dimensional xyz coordinate system shown in FIG. 1A, in which the display screen 111 is located on the yz plane and the center point of the display screen 111 is the origin of the coordinates, which can also be regarded as the point the speaker looks at.

After the close-up cameras are installed, the relative angle between the close-up camera 113 installed on the left border of the image display device 110 and the display screen 111 is between 10 degrees and 50 degrees (an example value of θ1); the relative angle between the close-up camera installed on the right border of the image display device 110 and the display screen is between 130 degrees and 170 degrees (an example value of θ2). In conference rooms of some sizes, the relative angle between the close-up camera 113 and the display screen 111 is 15 degrees, and the relative angle between the close-up camera 114 and the display screen 111 is 165 degrees. Installing the close-up cameras at such relative angles gives a relatively good close-up of the speaker.

The speaker in the embodiment of the present invention may refer to an object that emits audio, such as a participant who speaks in the conference scenario of this embodiment. In order to take an accurate close-up of the speaker, it is necessary to first determine the position of the speaker, then select the close-up camera corresponding to the determined position, and obtain the image captured by the selected camera. The close-up camera mentioned here may be the camera that shoots the scene area where the speaker is located, or the camera that shoots the speaker in that scene area from the front.

In the embodiment of the present invention, in order to distinguish the image captured by the panoramic camera 112 from the image captured by the selected close-up camera 113 or 114, the image captured by the panoramic camera 112 is referred to as the first image, and the image captured by the selected close-up camera 113 or 114 is referred to as the second image. After the first image and the second image are acquired, they are displayed on the display screen 111.

In addition, the audio emitted by the speaker in the embodiment of the present invention differs from the ambient audio. In order to locate the speaker accurately, the embodiment of the present invention may determine the position of the speaker according to the positioning result of an acoustic positioning device.

The acoustic positioning device 115 mentioned here may be mounted on the image display device 110, for example as a microphone array included in the image display device 110. In other embodiments, the acoustic positioning device 115 may also exist independently of the image display device 110. The present invention can also determine the position of the speaker by means other than acoustic positioning, which are not described here.

The specific way of determining the position of the speaker through the acoustic positioning device 115 can be determined by the positioning principle of the acoustic positioning device 115. In an example, the acoustic positioning device 115 may include a vertically arranged microphone and a horizontally arranged microphone. The time difference between the audio signals collected by the two microphones is combined with the spatial position of the microphone to locate the relative position parameters of the speaker and the acoustic positioning device 115. The relative position parameters mentioned here include relative angle and / or relative distance.
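For intuition about how a time difference between two microphones yields a relative angle, the sketch below applies the standard far-field relation sin(θ) = c·Δt / d, where c is the speed of sound, Δt the arrival time difference, and d the microphone spacing. This is a textbook approximation swapped in for illustration; the disclosure does not state that the acoustic positioning device 115 uses this exact formula, and the function name and constant are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air near 20 °C


def arrival_angle_deg(
    delay_s: float,
    mic_spacing_m: float,
    speed_m_s: float = SPEED_OF_SOUND_M_S,
) -> float:
    """Far-field arrival angle, measured from the broadside of a mic pair."""
    if mic_spacing_m <= 0.0:
        raise ValueError("microphone spacing must be positive")
    ratio = speed_m_s * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp measurement noise into range
    return math.degrees(math.asin(ratio))
```

A single microphone pair only resolves the angle in one plane, which is consistent with the example above using one vertically arranged and one horizontally arranged microphone together with their spatial positions.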

In another example, referring to a predetermined three-dimensional coordinate system, the processor of the acoustic positioning device 115 or of the image display device 110 may calculate the coordinates of the speaker in the three-dimensional coordinate system from the relative position parameters of the speaker with respect to the acoustic positioning device 115 and the three-dimensional coordinates of the acoustic positioning device 115.

In other examples, the processor of the acoustic positioning device 115 or of the image display device 110 may pre-store the relative position parameters of the display screen 111 with respect to the acoustic positioning device 115, and then calculate the relative position parameters of the speaker with respect to the display screen 111 from the relative position parameters of the speaker with respect to the acoustic positioning device 115 and the pre-stored relative position parameters. The relative position parameters mentioned here include relative angle and relative distance.
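The conversion described above can be sketched as a vector addition in the horizontal plane. This is only an illustration under stated assumptions: each relative position parameter is an (angle in degrees, distance) pair, both angles are measured against the same reference axis, and the pre-stored parameter is expressed as the acoustic positioning device seen from the center of the display screen. All function names are hypothetical.

```python
import math


def polar_to_xy(angle_deg, distance):
    """Convert an (angle, distance) pair to planar x, y offsets."""
    a = math.radians(angle_deg)
    return distance * math.cos(a), distance * math.sin(a)


def xy_to_polar(x, y):
    """Convert planar x, y offsets back to an (angle, distance) pair."""
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)


def speaker_relative_to_display(speaker_rel_device, device_rel_display):
    """Combine two relative position parameters.

    speaker_rel_device: speaker as seen from the acoustic positioning device.
    device_rel_display: acoustic positioning device as seen from the display
                        center (the pre-stored parameter).
    Returns the speaker as seen from the display center.
    """
    sx, sy = polar_to_xy(*speaker_rel_device)
    dx, dy = polar_to_xy(*device_rel_display)
    return xy_to_polar(sx + dx, sy + dy)
```

If the acoustic positioning device sits at the center of the display screen, the pre-stored parameter is `(0.0, 0.0)` and the result equals the input.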

For example, in FIG. 1A, taking as a reference the center line of the screen of the display 111 that runs along the y direction, the length of the vector pointing from the speaker to the center of the display 111 is the relative distance between the speaker and the display 111, and the included angle between that vector and the center line running along the y direction is the relative angle between the speaker and the display screen 111, which is equal to θ1 because the two are vertical (opposite) angles.

With reference to the speaker position determined in the foregoing embodiment, a close-up camera corresponding to the determined position may be selected, and an image captured by the close-up camera is used as a second image.

As can be seen from the above, in the image display system shown in FIG. 1A, the panoramic camera 112 and the at least two close-up cameras 113 and 114 are installed on the image display device 110. In other embodiments, the panoramic camera 112 and the at least two close-up cameras 113 and 114 may also exist independently of the image display device 110. For details, refer to FIG. 1B.

Please refer to FIG. 1B, which is a schematic diagram of an image display system according to another exemplary embodiment of the present invention.

The image display system shown in FIG. 1B is disposed in the conference room 100 and may include an image display device 110, a memory (not shown), a processor (not shown), a display screen 111 provided on the image display device 110, and a panoramic camera 131 and at least two close-up cameras 132 and 133 arranged in the three-dimensional space outside the image display device 110.

For the technical content of the embodiments of the present invention, reference may be made to the foregoing embodiments, and details are not described herein again. The difference is that the panoramic camera 131 and at least two close-up cameras 132 and 133 are provided in a three-dimensional space outside the image display device 110.

Accordingly, the relative distance between the image display device 110 and the panoramic camera 131 and the at least two close-up cameras 132 and 133, in the direction of at least one coordinate axis, may be increased.

In addition, referring to the foregoing embodiments, if the speaker's line of sight falls on the center of the display screen 111 or on the center line of the screen running along the y direction, and the lenses of the close-up cameras are parallel to the center line of the display screen 111 in the y direction, then the relative angle between each close-up camera and the display screen 111 is the relative angle in the horizontal direction (in the xy plane), and θ1, θ2, θ3 shown in FIG. 1A and FIG. 1B are equal to θ1p, θ2p, θ3p shown in FIG. 1C.

If the speaker's line of sight falls above or below the center of the display 111, θ1p is the projection of θ1 on the horizontal plane. Likewise, depending on the orientation of the optical axis of each close-up camera relative to the vertical plane of the display screen 111, θ2p and θ3p are the projections of θ2 and θ3 on the horizontal plane. FIG. 1C is the projection of the display system shown in FIG. 1A on the horizontal plane. For the technical content involved, refer to the embodiment related to FIG. 1A; details are not described herein again.

The following describes the processing logic by which the image display system according to the embodiment of the present invention implements a close-up image display process, with reference to FIG. 2A and FIG. 2B.

The image display system shown in FIG. 2A may include an image display device 210, a panoramic camera 221, at least two close-up cameras 222, 223, and the like. The image display device 210 may include a processor 211, a display screen 212, a memory 213, a non-volatile memory 214, and a device interface 215 connected through an internal bus. The panoramic camera 221 and the at least two close-up cameras 222 and 223 are connected to the processor 211 through the device interface 215. The specific form of the device interface 215 may match the interfaces of the panoramic camera 221 and the at least two close-up cameras 222 and 223, such as a USB interface.

In addition, the image display system may further include a positioning device, such as an acoustic positioning device, for determining the position of the speaker. The acoustic positioning device may be associated with the image display device 210 while existing independently of it; it may also be a microphone array installed on the image display device 210 and connected to the processor 211 through an internal bus.

The designer of the present invention can store the program instructions that realize the close-up (the program instructions corresponding to the processing logic 213a) in the non-volatile memory 214. During the actual image display process, the processor 211 reads the program instructions into the memory 213 and runs them, thereby performing the operations shown in the processing logic 213a: acquiring the image captured by the panoramic camera 221 as the first image; determining the position of the speaker; selecting, from the two or more close-up cameras 222, 223, etc., a close-up camera corresponding to the position to capture an image of the speaker as the second image; and displaying the first image and the second image on the display screen 212.

The processing logic 213a may be implemented through interactions between the devices in the image display system. For a specific interaction process, refer to FIG. 2B.

Please refer to FIG. 2B. In an actual application scenario, the panoramic camera 221 and the two or more close-up cameras 222, 223, etc. execute step S201 to shoot their respective shooting areas at a set frequency, and send the captured images to the processor 211 through the device interface 215 (step S202). To help the processor 211 distinguish the different images, the panoramic camera 221 and the two or more close-up cameras 222, 223, etc. may send their respective identifiers, or their positions relative to the display screen, to the processor 211 along with the images.

The processor 211 may select the panoramic image captured by the panoramic camera 221 as the first image (step S203). If a speaker in the target scene speaks before the first image is sent to the display screen, the positioning device 230 may determine the position of the speaker (step S204) and send it to the processor 211 (step S205). According to the position, the processor 211 then selects, from the images sent by the two or more close-up cameras 222, 223, etc., the image sent by the close-up camera corresponding to the position as the second image (step S206), and sends the first image and the second image to the display screen 212 (step S207). The display screen 212 displays the first image and the second image (step S208). In an example, in the conference scene shown in FIG. 1A, the first image and the second image displayed on the display screen 212 are as shown in FIG. 2C. The second image may be superimposed on the first image, or may float over the first image, and the specific display manner may be set by the relevant personnel according to actual needs.

If no speaker speaks in the target scene before the first image is sent to the display screen, the processor 211 sends the first image to the display screen 212, and the display screen 212 displays the first image.

In addition, in some application scenarios, the image display device involved in FIG. 1A to FIG. 2B may be a smart interactive tablet. The smart interactive tablet may be an integrated device that combines one or more functions of a projector, an electronic whiteboard, a projection screen, an audio system, a television, and a video conference terminal.

The smart interactive tablet may also establish a data connection with at least one external device. The external devices include, but are not limited to, smart phones, USB flash drives, laptop computers, desktop computers, tablet computers, personal digital assistants (PDAs), and the like.

The communication methods of the data connection between the external device and the smart interactive tablet include, but are not limited to, communication methods such as USB connection, Internet, local area network, Bluetooth, Wi-Fi, or ZigBee, which are not limited in the embodiments of the present invention.

Further, when data interaction takes place between the smart interactive tablet and at least one external device, the external device sends screen projection data to the smart interactive tablet, so that the smart interactive tablet displays the content of the screen projection data. The external device serves as the screen projection client. In general, there may be one or more screen projection clients, set according to the specific application scenario, which is not limited in the embodiment of the present invention.

The image display method according to the embodiment of the present invention is described in detail below with reference to the accompanying drawings:

Please refer to FIG. 3. FIG. 3 is a flowchart of an image display method according to an exemplary embodiment of the present invention. This embodiment can be applied to an image display system for close-up of a speaker, and includes the following steps S301-S304:

Step S301: Acquire an image captured by a panoramic camera as a first image.

Step S302: Determine the position of the speaker.

Step S303: Select a close-up camera corresponding to the position from the two or more close-up cameras to capture an image of the speaker as a second image.

Step S304: Display the first image and the second image.
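Steps S301 to S304 can be sketched as a single pass of a display routine. This is a minimal illustration rather than the implementation of the embodiment: the cameras, the positioning device, the selection rule, and the display are all passed in as hypothetical callables, and the handling of the case in which no speaker is located follows the behavior described earlier for FIG. 2B (only the first image is displayed).

```python
def display_once(capture_panorama, locate_speaker, select_camera,
                 closeup_cameras, show):
    """Run steps S301 to S304 once.

    capture_panorama: callable returning the panoramic image.
    locate_speaker:   callable returning a position, or None if nobody speaks.
    select_camera:    callable (position, camera_ids) -> chosen camera id.
    closeup_cameras:  dict mapping camera id -> callable returning an image.
    show:             callable (first_image, second_image_or_None).
    """
    first_image = capture_panorama()                      # S301
    position = locate_speaker()                           # S302
    if position is None:
        show(first_image, None)                           # nobody is speaking
        return
    camera_id = select_camera(position, list(closeup_cameras))  # S303
    second_image = closeup_cameras[camera_id]()
    show(first_image, second_image)                       # S304
```

Passing the selection rule as a parameter keeps the routine independent of which of the selection cases described in this document is used.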

For technical content related to the embodiments of the present invention, reference may be made to the foregoing embodiments, and details are not described herein again. The method of the embodiment of the present invention may be implemented by an image display system.

If the image display system includes a single image display device as described in FIGS. 1A to 2B, the method of the embodiment of the present invention may be applied to and executed by a single image display device.

If the image display system includes multiple image display devices (for example, where image display devices are provided along three walls of the conference room shown in FIG. 1B), each image display device is provided with a panoramic camera and at least two close-up cameras associated with it. The method of the embodiment of the present invention may be executed by each image display device or by a general control device for the image display devices. The positioning device determines which image display device the speaker is facing, and then uses that image display device as a reference to determine the speaker position. Then, from the close-up cameras associated with each image display device, a corresponding close-up camera is selected.

In practical applications, after the position of the speaker is determined, the specific manner of selecting the image captured by a close-up camera in the embodiment of the present invention depends on the specific form of the determined position and on any preparatory work, and falls into the following cases:

Case 1: The speaker speaks to the image display device, and the acoustic positioning device is used to locate the speaker. The acoustic positioning device may include a vertically arranged microphone and a horizontally arranged microphone. The time difference between the audio signals collected by the two microphones is combined with the spatial position of the microphone to locate the relative position parameters of the speaker and the acoustic positioning device. The position parameters are as described above, and are not repeated here.

The position determined in the embodiment of the present invention is a relative position parameter between the speaker and the acoustic positioning device. When selecting a close-up camera according to the position of the speaker, the relative position parameters of each close-up camera with respect to the acoustic positioning device can be obtained from the position of each close-up camera and the position of the acoustic positioning device. The relative position parameters of each close-up camera with respect to the speaker are then calculated from the determined position of the speaker and the relative position parameters of the close-up cameras with respect to the acoustic positioning device, and the close-up camera is selected with reference to the calculated relative position parameters. For a frontal close-up, the close-up camera with the smallest relative angle is chosen; for a clearer and larger close-up, a close-up camera with a smaller relative distance is chosen.
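The distance-based variant of Case 1 can be sketched as follows. The sketch assumes that the speaker and every close-up camera are described by an (angle in degrees, distance) pair relative to the acoustic positioning device, measured in one horizontal plane against the same reference axis, and it uses the law of cosines to obtain the speaker-to-camera distance. The camera names and function names are hypothetical.

```python
import math


def distance_between(rel_a, rel_b):
    """Distance between two points given as (angle_deg, distance) pairs
    relative to the same origin (law of cosines)."""
    (angle_a, dist_a), (angle_b, dist_b) = rel_a, rel_b
    delta = math.radians(angle_a - angle_b)
    squared = dist_a ** 2 + dist_b ** 2 - 2 * dist_a * dist_b * math.cos(delta)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding


def nearest_camera(speaker_rel, cameras_rel):
    """Return the id of the camera with the smallest distance to the speaker.

    cameras_rel: dict mapping camera id -> (angle_deg, distance).
    """
    return min(cameras_rel,
               key=lambda cid: distance_between(speaker_rel, cameras_rel[cid]))
```

For example, with made-up values `speaker_rel = (60.0, 2.0)` and `cameras_rel = {"A": (150.0, 1.0), "B": (30.0, 1.0)}`, camera `"B"` is returned. Selecting by relative angle instead would additionally require the direction the speaker is facing.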

It should be noted that the relative distance in the relative position parameters may refer to the straight-line distance between the position of the speaker and the position of the close-up camera (for example, the dashed line between the speaker and the close-up camera 114 in FIG. 1A), or to the projection of that distance in a certain direction. The relative angle in the relative position parameters is the angle between the vector pointing from the speaker to the close-up camera and the direction the speaker is facing, or the projection of that angle in a certain direction or onto a certain plane.

For example, when the point of sight of the speaker is the center of the display screen of the image display device, the relative angle is the angle between the vector from the speaker to the close-up camera and the vector from the speaker to the center of the display, as shown by θ4 in FIG. 1A, or the projection of that included angle in a certain direction or onto a certain plane. The plane mentioned here may be a horizontal plane.

Case 2: The acoustic positioning device refers to the predetermined coordinate system shown in the figure, pre-stores its own position coordinates, and determines the relative position of the speaker with respect to the acoustic positioning device; the position coordinates of the speaker in the predetermined coordinate system can then be obtained.

When selecting a close-up camera according to the position of the speaker, if the speaker's line of sight falls at the origin of the coordinates, the embodiment of the present invention can directly calculate the relative position parameters of each close-up camera with respect to the speaker from the position coordinates of each close-up camera in the predetermined coordinate system and the position coordinates of the speaker, and then select the close-up camera with reference to the calculated relative position parameters.

In an example, in order to take a close-up of the front of the speaker as far as possible, when the position includes the coordinates of the speaker in a predetermined coordinate system, the embodiment of the present invention may select, from the two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker through the following operations:

Obtain the predetermined coordinates of the center of the local display screen and the predetermined coordinates of each close-up camera; here, the local display screen is relative to the remote display screen.

According to the obtained predetermined coordinates and the position, the angle subtended at the speaker by the center of the local display screen and each close-up camera is calculated; this angle is the relative angle between the close-up camera and the speaker.
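The relative angle described above can be computed from coordinates with a dot product. The following sketch assumes three-dimensional coordinates in the same predetermined coordinate system and assumes that the speaker does not coincide with the display center or with a camera; the function names and the tolerance used to detect ties are hypothetical.

```python
import math


def relative_angle_deg(speaker, display_center, camera):
    """Angle at the speaker between the direction to the display center
    and the direction to the camera, in degrees."""
    to_display = [d - s for d, s in zip(display_center, speaker)]
    to_camera = [c - s for c, s in zip(camera, speaker)]
    dot = sum(a * b for a, b in zip(to_display, to_camera))
    norm = (math.sqrt(sum(a * a for a in to_display))
            * math.sqrt(sum(b * b for b in to_camera)))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


def cameras_with_smallest_angle(speaker, display_center, cameras, tol=1e-6):
    """Return the ids of all cameras whose relative angle is (almost) minimal.

    cameras: dict mapping camera id -> (x, y, z) coordinates.
    """
    angles = {cid: relative_angle_deg(speaker, display_center, pos)
              for cid, pos in cameras.items()}
    best = min(angles.values())
    return [cid for cid, angle in angles.items() if angle - best <= tol]
```

The function returns a list because more than one camera can share the smallest relative angle.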

The predetermined coordinate system may be a three-dimensional coordinate system shown in FIG. 1A, and details are not described herein again.

In addition, in the scene shown in FIG. 1A, when the speaker is in the position shown in FIG. 1A, the relative angle between the speaker and the close-up camera 114 is smaller than the angle between the speaker and the close-up camera 113. Therefore, the close-up camera 114 is selected as the target camera.

Referring to the foregoing image display system, it can be seen that in some scenes, when many close-up cameras are arranged in the three-dimensional space, there may be two or more close-up cameras with the smallest relative angle to the speaker. In that case, the embodiment of the present invention may display the images taken by all of the selected close-up cameras as second images. However, in order to reduce the obstruction of the first image by too many second images, in an example, a close-up camera can be further selected according to the relative distance, which can be implemented by the following operations:

The relative distance between the close-up camera and the speaker is calculated according to the predetermined coordinates of each selected close-up camera and the position.
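A tie between several close-up cameras can then be broken by relative distance, as in the following sketch. It assumes, consistently with the earlier remark that a smaller distance gives a clearer and larger close-up, that the nearest candidate is preferred; that preference and the function name are assumptions of this illustration. `math.dist` requires Python 3.8 or later.

```python
import math


def closest_of(candidates, speaker, camera_coords):
    """Among the candidate camera ids, return the one nearest to the speaker.

    candidates:    iterable of camera ids already selected by relative angle.
    speaker:       coordinates of the speaker.
    camera_coords: dict mapping camera id -> coordinates.
    """
    return min(candidates,
               key=lambda cid: math.dist(speaker, camera_coords[cid]))
```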

In another example, the relative angle can be projected on a two-dimensional horizontal plane to obtain the relative angle in the horizontal direction, and then a close-up camera can be selected. For the projection, see FIG. 1A and FIG.

Calculate the relative angle of each selected close-up camera with the speaker in the horizontal direction.

In other embodiments, the coordinate origin of the predetermined three-dimensional coordinate system is not the center of the display screen 111 as shown in FIG. 1A, while the point of sight of the speaker is the center of the display screen 111. In the embodiment of the present invention, the predetermined coordinates (coordinates in the three-dimensional coordinate system) of the center of the display screen 111 can be obtained in advance. The relative position of each close-up camera with respect to the speaker is then calculated from the predetermined coordinates of the center of the display screen 111, the position coordinates of the speaker, and the position coordinates of the close-up cameras, and the close-up camera is selected with reference to the calculated relative position.

Case 3: In the embodiment of the present invention, no human director is needed to adjust the installation positions, angles, etc. of the close-up cameras while close-ups are being taken. Therefore, to further improve close-up efficiency, the relative position parameters of each close-up camera with respect to the display screen of the image display device can be determined in advance (for example, in FIG. 1A, the angle between the close-up camera and the center line of the display screen running along the y direction). After the position of the speaker is determined, the predetermined relative position parameters are retrieved, the relative position of each close-up camera with respect to the speaker is calculated, and the close-up camera is selected with reference to the calculated relative position.

In an example, the position includes the relative position parameters of the speaker with respect to the local display screen. In the embodiment of the present invention, a close-up camera corresponding to the position can be selected from the two or more close-up cameras to capture an image of the speaker through the following operations:

Retrieve the relative position parameters of each close-up camera with respect to the local display.

Based on the position and the retrieved relative position parameters, the relative position parameters of the speaker and each close-up camera are calculated.

According to the calculated relative position parameters, from the two or more close-up cameras, the close-up camera whose relative position parameter with respect to the speaker is the smallest is selected.

In other embodiments, based on the relative angle between each close-up camera and the local display, and the relative angles between speakers in different scene areas and the local display, the close-up effect that each close-up camera achieves for speakers in different scene areas can be estimated in advance, and different close-up cameras are then associated with different ranges of relative angle between the speaker and the local display. Later, once the speaker position, or the relative angle between the speaker and the local display, has been determined, the close-up camera is selected according to this correspondence.

For example, in the scenario shown in FIG. 1A, the relative angle between the close-up camera 114 and the local display 111 is 165 degrees, and the relative angle between the close-up camera 113 and the local display 111 is 15 degrees. A relative angle between the speaker and the local display 111 of 0 to 90 degrees corresponds to the close-up camera 114, and a relative angle of 90 degrees to 180 degrees corresponds to the close-up camera 113.
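The correspondence in this example can be held in a small lookup table, as sketched below. The two angle ranges and the camera numbers come from the example above; how the shared boundary at 90 degrees is assigned is not stated, so the sketch assumes that the first matching range wins (90 degrees goes to the close-up camera 114), and it returns `None` for angles outside 0 to 180 degrees. The names are hypothetical.

```python
# (low, high) range of the speaker's relative angle in degrees -> camera id.
ANGLE_RANGES = [
    ((0.0, 90.0), "114"),
    ((90.0, 180.0), "113"),
]


def camera_for_angle(angle_deg):
    """Return the camera id whose angle range contains angle_deg.

    Ranges are checked in order and are inclusive at both ends, so a value
    on a shared boundary is assigned to the earlier range (an assumption).
    """
    for (low, high), camera_id in ANGLE_RANGES:
        if low <= angle_deg <= high:
            return camera_id
    return None
```

With this table, a speaker at a relative angle of 65 degrees is assigned to the close-up camera 114.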

Case 4: Different close-up cameras can be dedicated to close-ups of speakers in different predetermined scene areas. Therefore, to further improve close-up efficiency, the correspondence between each scene position in the real scene and each close-up camera can be determined in advance. The close-up camera corresponding to a scene position is dedicated to taking close-ups of a speaker at that scene position. For example, the relative position parameter of the close-up camera corresponding to a scene position is smaller, at that scene position, than that of the other close-up cameras.

After the position of the speaker is determined, a close-up camera corresponding to the position is selected according to a predetermined correspondence relationship. In an example, from the two or more close-up cameras, a close-up camera corresponding to the position may be selected to take an image of the speaker by performing the following operations:

Obtain a predetermined correspondence between each scene position in a real scene and each close-up camera.

According to the predetermined correspondence relationship, from the two or more close-up cameras, a close-up camera corresponding to the position is selected as a target camera.

Acquire an image obtained by the target camera taking the speaker.
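One way to prepare such a correspondence in advance is sketched below. It assumes that scene positions are named points with known coordinates and that the relative position parameter being minimized is the straight-line distance; both are assumptions of this illustration, as are the names and the coordinates in the usage note. `math.dist` requires Python 3.8 or later.

```python
import math


def build_correspondence(scene_positions, camera_coords):
    """Precompute, for every scene position, the nearest close-up camera.

    scene_positions: dict mapping position name -> coordinates.
    camera_coords:   dict mapping camera id -> coordinates.
    """
    return {
        name: min(camera_coords,
                  key=lambda cid: math.dist(pos, camera_coords[cid]))
        for name, pos in scene_positions.items()
    }


def target_camera(correspondence, scene_position_name):
    """Look up the camera dedicated to the given scene position."""
    return correspondence[scene_position_name]
```

For instance, with made-up seats `{"left": (-2, 3), "right": (2, 3)}` and cameras `{"A": (-1, 0), "B": (1, 0)}`, the table maps `"left"` to `"A"` and `"right"` to `"B"`. At run time only the dictionary lookup is needed.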

According to the above embodiment, after the first image and the second image are obtained, the first image and the second image are displayed directly, so that a close-up of the speaker is shown while the panoramic image is displayed. No human director has to perform a series of additional operations on the close-up camera, such as panning, tilting, and pushing and pulling, so the close-up of the speaker can be achieved quickly. Compared with related technologies that require a human director to perform additional operations on the camera, the efficiency of close-ups is effectively improved.

For example, in the scene shown in FIG. 1A, the close-up camera 114 may be dedicated to close-ups of speakers whose relative angle with the local display 111 is in the range of 0 to 90 degrees, and the close-up camera 113 may be dedicated to close-ups of speakers whose relative angle with the local display 111 is in the range of 90 degrees to 180 degrees.

In view of the above, in some embodiments, the scene area (shooting area) that each close-up camera can capture can also be determined in advance. The close-up camera whose shooting area contains the speaker is then the close-up camera corresponding to the position of the speaker.

In other application scenarios, after the first image and the second image are obtained, the two images are not displayed directly; instead, only the speaker or a target part of the speaker is shown in close-up. In that case, the image of the speaker or of the target part needs to be extracted from the second image, and the first image and the extracted image are then displayed. For a specific implementation process, refer to FIG. 4A. The method shown in FIG. 4A may include steps S401-S406:

Step S401: Acquire an image captured by a panoramic camera as a first image.

Step S402: Determine the position of the speaker.

Step S403: From the two or more close-up cameras, select a close-up camera corresponding to the position to capture an image of the speaker as the second image.

Step S404: Map the image position of the speaker in the second image according to the position.

Step S405: Extract image data at a mapped image position in the second image to obtain a close-up image of the speaker.

Step S406: Display the close-up image and the first image.
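The extraction in step S405 amounts to cropping the second image at the mapped image position. A minimal sketch follows, assuming that the image is a list of pixel rows and that the mapped image position is a rectangle `(left, top, right, bottom)` in pixel coordinates with exclusive right and bottom edges; this representation and the function name are assumptions for illustration only.

```python
def extract_region(image, box):
    """Crop a rectangular region out of an image stored as a list of rows.

    box is (left, top, right, bottom); right and bottom are exclusive.
    The box is clipped to the image bounds before cropping.
    """
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if image else 0
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]
```

Clipping the box means that a mapped position partly outside the frame still yields the visible part of the region instead of raising an error.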

Steps S401, S402, and S403 in this embodiment correspond to the foregoing embodiments, and details are not described herein again.

For step S404, the image position of the speaker in the second image is mapped in order to roughly locate the image data to be extracted before the extraction is performed. This reduces the time that direct extraction would spend on matching image features, and improves the efficiency of extracting the image data of the speaker or of the speaker's target part. The target part mentioned here can be the face or the upper body.

When the position is actually mapped into the second image, the image position of the speaker in the second image can be computed in real time according to the four coordinate systems involved in the imaging process of the camera: the world coordinate system, the camera coordinate system, the image coordinate system, and the pixel coordinate system.

If the second image uses the image coordinate system to describe the coordinates of points in the image, the position can be mapped to the image position in the second image according to the mapping from the world coordinate system to the camera coordinate system and the mapping from the camera coordinate system to the image coordinate system.

If the second image uses the pixel coordinate system to describe the coordinates of pixel points in the image, the position can be mapped to the image position in the second image according to the mapping from the world coordinate system to the camera coordinate system, the mapping from the camera coordinate system to the image coordinate system, and the mapping from the image coordinate system to the pixel coordinate system.
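The chain of mappings between the four coordinate systems can be sketched with the standard pinhole camera model. The embodiment does not specify a camera model, so the model itself, the absence of lens distortion, and all parameter names (rotation matrix, translation vector, focal length, pixel size, principal point) are assumptions of this illustration.

```python
def world_to_camera(point, rotation, translation):
    """World coordinates -> camera coordinates: Xc = R * Xw + t.

    rotation is a 3x3 matrix given as nested lists; translation has 3 entries.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def camera_to_image(point_cam, focal_length):
    """Camera coordinates -> image-plane coordinates (perspective division)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return focal_length * x / z, focal_length * y / z


def image_to_pixel(point_img, pixel_size, principal_point):
    """Image-plane coordinates -> pixel coordinates.

    pixel_size is (dx, dy), the physical size of one pixel;
    principal_point is (u0, v0), the pixel hit by the optical axis.
    """
    (x, y), (dx, dy), (u0, v0) = point_img, pixel_size, principal_point
    return x / dx + u0, y / dy + v0
```

Applying the first two functions gives a position in the image coordinate system; applying all three gives a position in the pixel coordinate system.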

In addition, considering that the panoramic image includes images of the subjects at most scene positions in the target scene, in the embodiment of the present invention, the correspondence between each image region of the panoramic image and each scene region in the real scene can be determined in advance. Referring to the conference scene shown in FIG. 1A, the embodiment of the present invention can obtain the correspondence between each area in the conference room and each image area in the panoramic image, as shown in FIG. 4B.

In FIG. 4B, the included angles between subjects in different scene areas of the conference room 100 and the center line of the screen of the display 111 running along the y direction correspond to different image areas in the panoramic image. The panoramic image is divided into 6 image areas which, from left to right, correspond to scene areas of 0 degrees to 30 degrees, 30 degrees to 60 degrees, 60 degrees to 90 degrees, 90 degrees to 120 degrees, 120 degrees to 150 degrees, and 150 degrees to 180 degrees. When the angle θ1 between the speaker and the center line of the screen of the display screen 111 running along the y direction is 65 degrees, the mapped image area is the image area corresponding to 60 degrees to 90 degrees (the shaded area in FIG. 4B).
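The mapping from an angle to one of the six image areas can be sketched as an integer division. The sketch assumes equal 30-degree sectors as in the example above and, as a further assumption not stated in the text, that the six image areas are equal-width column bands of the panoramic image; the function names are hypothetical.

```python
def region_index(angle_deg, region_count=6, total_deg=180.0):
    """Index (0 = leftmost) of the image area that covers angle_deg."""
    if not 0.0 <= angle_deg <= total_deg:
        raise ValueError("angle outside the covered range")
    width = total_deg / region_count
    # The upper end of the range (180 degrees) belongs to the last area.
    return min(int(angle_deg // width), region_count - 1)


def region_columns(index, image_width, region_count=6):
    """Pixel column range (left inclusive, right exclusive) of an image area,
    assuming equal-width areas across the panoramic image."""
    left = index * image_width // region_count
    right = (index + 1) * image_width // region_count
    return left, right
```

For an angle of 65 degrees, `region_index` returns 2, which is the third area from the left, covering 60 degrees to 90 degrees.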

After the image area in the panoramic image is mapped, the corresponding area in the second image can be obtained through feature matching, which gives the image position in the second image. Specifically, the image position of the speaker in the second image can be mapped through the following operations:

A correspondence relationship between each image region of the panoramic image and each scene region in a real scene is obtained.

Based on the correspondence, an image region to which the position belongs in the first image is located.

Through feature matching, an image region in the second image whose features match those of the located image region is found.

Use the coordinates of the matched image area as the mapped image position.

During feature matching, the features of the speaker in the located image region of the first image are matched against the image features of the second image.

In other embodiments, the designer of the present invention may also pre-generate, for each image area of the image captured by each close-up camera, position information in the real scene. The position information may be the relative angle between the local display and the scene area in the real scene that corresponds to the image area. The determined position of the speaker is then the relative angle between the speaker and the local display screen. The relative angle mentioned here is the included angle between the speaker, or the subject in the scene area, and the center line of the screen running along the y direction in the display screen 111 shown in FIG. 1A.

Furthermore, the image position of the speaker in the second image can be obtained. In an example, the following operations can be used to map the image position of the speaker in the second image:

Retrieve the position information of each image area of each close-up camera in the real scene.

Match the retrieved location information with the location.

According to the matching result, an image area to which the speaker belongs in the second image is obtained.

In this example, the position is the relative angle between the speaker and the local display, and the position information is the relative angle between the local display and the scene area in the real scene that corresponds to each image area. The step of pre-generating the position information may include:

According to the lens angle of each close-up camera, the relative angle between the scene area of the close-up camera and the close-up camera is calibrated.

For example, the horizontal lens angle parameter of the close-up camera (such as 160 degrees) is used to calibrate the image taken by the close-up camera, so that the leftmost, middle, and rightmost image areas of a photo are at 10 degrees, 90 degrees, and 160 degrees relative to the camera, respectively. Based on this, referring to the distance of the camera from the center of the display screen shown in FIG. 1A and its angle to the screen center line running along the y direction of the display screen, the relative angle between a position in the image and the screen center line running along the y direction of the display screen can be calculated.

After the image position is mapped, if a close-up of the speaker's target part is required, then after the image data at the mapped position in the second image is extracted, the image data of the speaker's target part can be identified in and cropped from the extracted image data, and the cropped image data is determined as the close-up image.

According to the foregoing embodiment, a close-up camera corresponding to the position of the speaker can be selected from a plurality of close-up cameras to shoot the speaker, and the image captured by the panoramic camera and the extracted image data are then displayed, so that a close-up of the speaker or of the speaker's target part is shown while the panoramic image is displayed. No human director has to perform a series of additional operations on the close-up camera, such as panning, tilting, and pushing and pulling. Compared with related technologies that require a human director to perform additional operations on the camera, the efficiency of close-ups is effectively improved.

In some application scenarios, the speaker may change position over time. In other application scenarios, participants at different positions may take turns speaking. In these scenarios, to keep the close-up of the speaker accurate, the position of the speaker needs to be re-determined after the close-up is implemented, and a close-up camera is then selected again to achieve a close-up of the speaker at the new position. For details, refer to FIG. 5. The method shown in FIG. 5 may include the following steps.

Step S501: Acquire an image captured by a panoramic camera as a first image.

Step S502: Determine the position of the speaker.

Step S503: Select a close-up camera corresponding to the position from the two or more close-up cameras to capture the image of the speaker as the second image.

Step S504: Display the first image and the second image.

Step S505: Re-determine the position of the speaker.

Step S506: Determine, according to the re-determined position and the last determined position, whether the position change amount of the speaker is less than a predetermined change amount; if it is less, perform step S504; if it is not less, perform step S507.

Step S507: From the two or more close-up cameras, select the close-up camera corresponding to the re-determined position to capture an image of the speaker as a third image.

Step S508: Display the first image and the third image.
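The decision made in steps S506 and S507 can be sketched as follows. This is a minimal illustration under stated assumptions: positions are taken to be angles in degrees, and `update_close_up`, `select_camera`, the camera labels, and the numeric values are hypothetical names and values introduced here, not part of the original.

```python
def select_camera(angle_deg):
    """Hypothetical selection rule standing in for step S503/S507:
    use the left camera below 90 degrees, the right camera otherwise."""
    return "left" if angle_deg < 90 else "right"


def update_close_up(last_position, new_position, current_camera,
                    selector, change_threshold):
    """Return (camera, reselected) after the position is re-determined.

    Mirrors step S506: when the change is below the predetermined change
    amount, keep the current camera (back to S504); otherwise select the
    camera for the re-determined position (S507).
    """
    change = abs(new_position - last_position)
    if change < change_threshold:
        return current_camera, False
    return selector(new_position), True
```

For instance, with a threshold of 10 degrees, a move from 40 to 45 degrees keeps the current camera, while a move from 40 to 120 degrees triggers a new selection.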

The technical content of this embodiment corresponds to the foregoing embodiment, and details are not described herein again.

The designer of the present invention may predetermine the conditions that trigger re-determination of the speaker's position, such as periodic, timed, or user-triggered re-determination.

To avoid repeatedly performing the operation of selecting a close-up camera when the position of the speaker changes only slightly and the selected close-up camera would be the same as before, the embodiment of the present invention may predetermine a change amount, which is determined by the actual application scenario and the target place of that scenario. For example, in a conference scenario, it can be set to the spacing between two participants.

In addition, in some scene areas, the selected close-up camera is the same regardless of whether the speaker changes or the position of the speaker changes. In this case, it is not necessary to frequently perform the operation of selecting the close-up camera; the image taken by the previously selected close-up camera can be used directly as the second image, or image data can be extracted from that second image.

In an example, before the close-up camera corresponding to the re-determined position is selected from the two or more close-up cameras to capture an image of the speaker, it may be determined whether the re-determined position and the last determined position are both within a predetermined position range; if yes, the operation of displaying the first image and the second image is performed; if not, the operation of selecting, from the two or more close-up cameras, the close-up camera corresponding to the re-determined position to shoot the speaker and obtain a third image is performed.

In the meeting scene shown in FIG. 1A, the position is the relative angle between the speaker and the screen centerline distributed along the y direction of the display screen 111. It can be determined whether the re-determined position and the last determined position are both between 75 degrees and 105 degrees; if so, the operation of displaying the first image and the second image is performed.
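The range check in this example can be sketched as below. The function name `both_in_range` and the choice of inclusive bounds are assumptions made for illustration; the 75-degree and 105-degree limits follow the example above.

```python
def both_in_range(last_position, new_position, low=75.0, high=105.0):
    """Return True when both positions (angles in degrees) fall inside the
    predetermined position range, so the same close-up camera can be kept.

    Inclusive bounds are an assumption; the original does not state
    whether the end points belong to the range.
    """
    return low <= last_position <= high and low <= new_position <= high
```

When the function returns True, the first and second images continue to be displayed; otherwise the close-up camera is selected again for the re-determined position.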

Corresponding to the embodiment of the foregoing method, the present invention also provides an embodiment of the device.

Referring to FIG. 6, FIG. 6 is a block diagram of an image display device according to an exemplary embodiment of the present invention. The device may be applied to the image display system in the foregoing embodiment, and may include a first image acquisition module 610, a speaking position determining module 620, a second image acquisition module 630, and an image display module 640.

The first image acquisition module 610 is configured to acquire an image captured by a panoramic camera as a first image.

The speaking position determining module 620 is configured to determine the position of the speaker.

The second image acquisition module 630 is configured to select, from the two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker as a second image.

The image display module 640 is configured to display the first image and the second image.
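The cooperation of modules 610 to 640 can be sketched as a small class whose fields are injected callables. This is a hypothetical wiring for illustration only: the class name `ImageDisplayDevice`, its field names, and the `run_once` method are assumptions and do not describe the actual implementation of the device.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class ImageDisplayDevice:
    """Hypothetical wiring of modules 610-640 as injected callables."""
    acquire_panorama: Callable[[], object]            # module 610
    determine_position: Callable[[], float]           # module 620
    close_up_cameras: Sequence[Callable[[], object]]  # cameras used by module 630
    choose_camera_index: Callable[[float], int]       # selection rule of module 630
    display: Callable[[object, object], None]         # module 640

    def run_once(self) -> Tuple[object, object]:
        first_image = self.acquire_panorama()
        position = self.determine_position()
        # Select the close-up camera corresponding to the position.
        camera = self.close_up_cameras[self.choose_camera_index(position)]
        second_image = camera()
        self.display(first_image, second_image)
        return first_image, second_image
```

Passing the selection rule in as a callable keeps the sketch neutral about which of the selection strategies described in the method embodiments is used.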

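In an example, the position includes a relative position parameter of the speaker and a local display screen; the second image acquisition module 630 may include:

A position parameter acquisition module, configured to retrieve the relative position parameters of each close-up camera and the local display screen;

A relative position calculation module, configured to calculate the relative position parameters of the speaker and each close-up camera based on the position and the retrieved relative position parameters;

A camera selection module, configured to select, from the two or more close-up cameras, the close-up camera with the smallest relative position parameter to the speaker according to the calculated relative position parameters;

A first acquisition submodule, configured to acquire an image obtained by the selected close-up camera shooting the speaker.

In another example, the second image acquisition module 630 may include:

A predetermined relationship acquisition module, configured to obtain a predetermined correspondence between each scene position in a real scene and each close-up camera, where the close-up camera corresponding to each scene position is dedicated to taking close-ups of the speaker at that scene position;

A target camera selection module, configured to select a close-up camera corresponding to the position as a target camera from the two or more close-up cameras according to the predetermined correspondence;

A second acquisition submodule, configured to acquire an image obtained by the target camera shooting the speaker.

As an example, the position is a relative angle between the speaker and the local display screen, and the relative position parameter is a relative angle.

In another example, the image display module 640 may include:

An image position mapping module, configured to map the image position of the speaker in the second image according to the position;

A close-up image extraction module, configured to extract image data at the mapped image position in the second image to obtain a close-up image of the speaker;

An image display submodule, configured to display the close-up image and the first image.

As an example, the image display device according to the embodiment of the present invention may further include a target extraction module, configured to:

Identify and crop image data of a target part of the speaker from the extracted image data;

Determine the cropped image data as the close-up image.

As an example, the image position mapping module is configured to:

Obtain the correspondence between each image area of the panoramic image and each scene area in the real scene;

Locate, based on the correspondence, the image area in the first image to which the position belongs;

Find, through feature matching, the image area in the second image whose features match those of the located image area;

Use the coordinates of the matched image area as the mapped image position.

As an example, the image position mapping module is configured to:

Retrieve the position information, in the real scene, of each image area of each close-up camera;

Match the retrieved position information with the position;

Obtain, according to the matching result, the image area to which the speaker belongs in the second image;

Use the coordinates of the obtained image area as the mapped image position.

As an example, the position is the relative angle between the speaker and the local display screen; the position information is the relative angle between the local display screen and the scene area, in the real scene, that corresponds to each image area.

As an example, a module that pre-generates the position information is configured to:

Calibrate the relative angle between the scene area of the close-up camera and the close-up camera according to the lens angle of each close-up camera;

Calculate the relative angle between the scene area of the close-up camera and the local display screen according to the calibrated relative angle and the relative position parameters of the close-up camera and the local display screen.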

In other embodiments, in terms of hardware, FIG. 7 shows a hardware structure diagram of the image display device of the present invention. In addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 7, the image display device in which the apparatus of the embodiment is located may generally include other hardware according to the actual function of the device, and details are not described herein again. The memory and the non-volatile memory are machine-readable memories, and the memory of the image display device may store program instructions executable by the processor; the processor may be coupled to the memory and used to read the program instructions stored in the memory and, in response, perform the operations in the image display method described above.

In other embodiments, for operations performed by the processor, reference may be made to related descriptions in the foregoing method embodiments, and details are not described herein.

In addition, an embodiment of the present invention also provides a machine-readable storage medium (storage device / peripheral device / receiver device memory), where the readable storage medium stores program instructions, and the program instructions include instructions for each step of the foregoing method. When executed by one or more processors, the instructions cause the image display device to perform the operations in the corresponding method above.

Embodiments of the present invention may take the form of a computer program product implemented on one or more readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing program code. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be accomplished by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included within the scope of protection of the present invention.

Claims

An image display method includes the following steps:

Acquiring an image captured by a panoramic camera as a first image;

Determine the position of the speaker;

Selecting a close-up camera corresponding to the position from two or more close-up cameras to capture an image of the speaker as a second image;

The first image and the second image are displayed.
The method according to claim 1, wherein the determining the position of the speaker comprises:

Acquiring a positioning result of the associated acoustic positioning device on the speaker;

Determining the position of the speaker according to the positioning result.
The method according to claim 1, wherein the position includes a relative position parameter of the speaker and a local display screen; and selecting, from two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker comprises:

Retrieving the relative position parameters of each close-up camera and the local display screen;

Calculating the relative position parameters of the speaker and each close-up camera based on the position and the retrieved relative position parameters;

Selecting a close-up camera with a minimum relative position parameter to the speaker from the two or more close-up cameras according to the calculated relative position parameters;

An image obtained by the selected close-up camera to capture the speaker is acquired.
The method according to claim 1, wherein selecting a close-up camera corresponding to the position from two or more close-up cameras to capture an image of the speaker comprises:

Obtaining a predetermined correspondence between each scene position in a real scene and each close-up camera, wherein the close-up camera corresponding to each scene position is dedicated to taking close-ups of the speaker at that scene position;

Selecting the close-up camera corresponding to the position as the target camera from the two or more close-up cameras according to the predetermined correspondence relationship;

Acquiring an image obtained by the target camera shooting the speaker.
The method according to claim 3 or 4, wherein the relative position parameter comprises a relative angle and / or a relative distance.
The method according to claim 5, wherein the position is a relative angle between the speaker and a local display screen, and the relative position parameter is a relative angle.
The method according to claim 6, wherein the relative angle is a relative angle in a horizontal direction.
The method according to claim 1, wherein the position includes coordinates of the speaker in a predetermined coordinate system; and selecting, from two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker comprises:

Obtaining the predetermined coordinates of the center of the local display screen and the predetermined coordinates of each close-up camera;

Calculating, according to the obtained predetermined coordinates and the position, the included angle at the speaker between the center of the local display screen and each close-up camera, as the relative angle between that close-up camera and the speaker;

According to the calculated relative angle, a close-up camera with the smallest relative angle to the speaker is selected as the target camera.
The method according to claim 8, wherein if the number of close-up cameras selected is more than two, the method further comprises:

Calculating the relative distance between the close-up camera and the speaker according to the predetermined coordinates of the selected close-up camera and the position;

According to the calculated relative distance, from the selected target cameras, a close-up camera with the smallest relative distance from the speaker is selected as the target camera.
The method according to claim 8, wherein if the number of close-up cameras selected is more than two, the method further comprises:

Calculating the relative angle between each selected close-up camera and the speaker in the horizontal direction;

From the selected close-up cameras, a close-up camera having the smallest relative angle with the speaker in the horizontal direction is selected as the target camera.
The method according to claim 1, further comprising the following steps:

Re-determining the position of the speaker;

Judging, according to the re-determined position and the last determined position, whether the position change amount of the speaker is less than a predetermined change amount;

If it is less than the predetermined change amount, performing the step of displaying the first image and the second image;

If it is not less than the predetermined change amount, selecting, from the two or more close-up cameras, the close-up camera corresponding to the re-determined position to capture an image of the speaker as a third image;

The first image and the third image are displayed.
The method according to claim 1, wherein displaying the first image and the second image comprises:

Map the image position of the speaker in the second image according to the position;

Extracting image data at a mapped image position in the second image to obtain a close-up image of the speaker;

The close-up image and the first image are displayed.
The method according to claim 12, wherein after extracting image data at a position mapped in the second image, the method further comprises:

Identifying and cropping image data of a target part of the speaker from the extracted image data;

The cropped image data is determined as the close-up image.
The method according to claim 12, wherein mapping the image position of the speaker in the second image according to the position comprises:

Obtaining the correspondence between each image area of the panoramic image and each scene area in the real scene;

Locating, based on the correspondence, the image area in the first image to which the position belongs;

Finding, through feature matching, the image area in the second image whose features match those of the located image area;

Use the coordinates of the matched image area as the mapped image position.
The method according to claim 12, wherein mapping the image position of the speaker in the second image according to the position comprises:

Retrieve the position information of each image area of each close-up camera in the real scene;

Matching the retrieved location information with the location;

Obtaining an image area to which the speaker belongs in the second image according to the matching result;

The coordinates of the obtained image area are used as the mapped image position.
The method according to claim 15, wherein the position is a relative angle between the speaker and a local display screen;

The position information is the relative angle between the local display screen and the scene area, in a real scene, that corresponds to each image area.
The method according to claim 16, wherein the step of pre-generating the location information comprises:

Calibrating the relative angle between the scene area of the close-up camera and the close-up camera according to the lens angle of each close-up camera;

According to the calibrated relative angle and the relative position parameters of the close-up camera and the local display, calculate the relative angle between the scene area of the close-up camera and the local display.
An image display system includes an image display device, a panoramic camera, and two or more close-up cameras. The image display device includes:

A display;

A processor;

A memory storing processor-executable instructions;

The processor is coupled to the memory, and is configured to read program instructions stored in the memory and, in response, perform the operations in the method according to any one of claims 1 to 17.
The system according to claim 18, wherein the panoramic camera and each close-up camera are installed on the image display device.
The system according to claim 18, wherein the relative angle between each close-up camera and the display screen is different.
The system according to claim 20, wherein the panoramic camera is installed at a frame on the upper side of the image display device, a first close-up camera is installed at a frame on the left side of the image display device, and a second close-up camera is installed at a frame on the right side of the image display device.
The system according to claim 21, wherein the relative angle between the first close-up camera and the display screen is between 10 degrees and 50 degrees;

The relative angle between the second close-up camera and the display screen is between 130 degrees and 170 degrees.
The system according to claim 18, wherein the image display device further comprises an acoustic positioning device for positioning the speaker.
The system of claim 23, wherein the acoustic positioning device comprises a microphone array.
The system according to any one of claims 18 to 23, wherein the image display device is a conference interactive device.
The system according to claim 25, wherein the conference interactive device is a smart interactive tablet.
An image display device, comprising:

A display;

A processor;

A memory storing processor-executable instructions;

The processor is coupled to the memory, and is configured to read the program instructions stored in the memory and, in response, perform the operations in the method according to any one of claims 1 to 17.
The device according to claim 27, wherein the device is associated with a panoramic camera and at least two close-up cameras.
The device according to claim 28, wherein a relative angle between each close-up camera and the display screen is different.
The device according to claim 29, wherein the relative angle between one close-up camera and the display screen is between 10 degrees and 50 degrees, and the relative angle between the other close-up camera and the display screen is between 130 degrees and 170 degrees.
The device according to claim 27, further comprising an acoustic positioning device for positioning the speaker.
The device of claim 31, wherein the acoustic positioning device comprises a microphone array.
The device according to any one of claims 27 to 32, wherein the device is a conference interaction device.
The device according to claim 33, wherein the conference interactive device is a smart interactive tablet.
One or more machine-readable storage media, characterized in that instructions are stored thereon, and when executed by one or more processors, perform the operations in the method according to any one of claims 1 to 17.
An image display device, comprising:

A first image acquisition module, configured to acquire an image captured by a panoramic camera as a first image;

A speech position determining module, configured to determine the position of a speaker;

A second image acquisition module, configured to select, from two or more close-up cameras, a close-up camera corresponding to the position to capture an image of the speaker as a second image;

An image display module is configured to display the first image and the second image.
The device according to claim 36, wherein the position includes a relative position parameter of the speaker and a local display screen; and the second image acquisition module includes:

A position parameter acquisition module, configured to retrieve the relative position parameters of each close-up camera and the local display screen;

A relative position calculation module, configured to calculate the relative position parameters of the speaker and each close-up camera based on the position and the retrieved relative position parameters;

A camera selection module, configured to select, from the two or more close-up cameras, the close-up camera with the smallest relative position parameter to the speaker according to the calculated relative position parameters;

A first acquisition submodule, configured to acquire an image obtained by the selected close-up camera shooting the speaker.
The apparatus according to claim 36, wherein the second image acquisition module comprises:

A predetermined relationship acquisition module, configured to obtain a predetermined correspondence between each scene position in a real scene and each close-up camera, wherein the close-up camera corresponding to each scene position is dedicated to taking close-ups of the speaker at that scene position;

A target camera selection module, configured to select a close-up camera corresponding to the position as a target camera from the two or more close-up cameras according to the predetermined correspondence relationship;

A second acquisition submodule is configured to acquire an image obtained by the target camera shooting the speaker.
The device according to claim 37 or 38, wherein the position is a relative angle between the speaker and a local display screen, and the relative position parameter is a relative angle.
The apparatus according to claim 36, wherein the image display module comprises:

An image position mapping module, configured to map an image position of the speaker in the second image according to the position;

A close-up image extraction module, configured to extract image data at a mapped image position in the second image to obtain a close-up image of the speaker;

An image display submodule, configured to display the close-up image and the first image.
The apparatus according to claim 40, wherein the apparatus further comprises a target extraction module, configured to:

Identifying and cropping image data of a target part of the speaker from the extracted image data;

The cropped image data is determined as the close-up image.
The apparatus according to claim 40, wherein the image position mapping module is configured to:

Obtaining the correspondence between each image area of the panoramic image and each scene area in the real scene;

Locating, based on the correspondence, the image area in the first image to which the position belongs;

Finding, through feature matching, the image area in the second image whose features match those of the located image area;

Use the coordinates of the matched image area as the mapped image position.
The apparatus according to claim 40, wherein the image position mapping module is configured to:

Retrieve the position information of each image area of each close-up camera in the real scene;

Matching the retrieved location information with the location;

Obtaining an image area to which the speaker belongs in the second image according to the matching result;

The coordinates of the obtained image area are used as the mapped image position.
The device according to claim 43, wherein the position is a relative angle between the speaker and a local display screen;

The position information is the relative angle between the local display screen and the scene area, in a real scene, that corresponds to each image area.
The apparatus according to claim 44, wherein the module for pre-generating the position information is configured to:

Calibrating the relative angle between the scene area of the close-up camera and the close-up camera according to the lens angle of each close-up camera;

According to the calibrated relative angle and the relative position parameters of the close-up camera and the local display, calculate the relative angle between the scene area of the close-up camera and the local display.
An intelligent interactive tablet is characterized in that it includes a panoramic camera, a first close-up camera, and a second close-up camera. The panoramic camera, the first close-up camera, and the second close-up camera are disposed on a frame of the intelligent interactive tablet. The optical axes of the first close-up camera and the second close-up camera are inclined to the display plane of the smart interactive tablet.
The intelligent interactive tablet according to claim 46, wherein the intelligent interactive tablet is further configured to:

Acquiring an image captured by a panoramic camera as a first image;

Determine the position of the speaker;

Selecting a close-up camera corresponding to the position from two or more close-up cameras to capture an image of the speaker as a second image;

The first image and the second image are displayed.
The intelligent interactive tablet according to claim 46 or 47, wherein the relative angle between the optical axis of the first close-up camera and the display screen is between 10 degrees and 50 degrees, and the relative angle between the optical axis of the second close-up camera and the display screen is between 130 degrees and 170 degrees.