CN116248826A - Method, apparatus, device and computer program product for displaying shared content


Info

Publication number
CN116248826A
Authority
CN
China
Prior art keywords
image
determining
terminal
shared content
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210116843.2A
Other languages
Chinese (zh)
Inventor
赵宏
杨杰
马尚华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2022/130154 (published as WO2023103672A1)
Publication of CN116248826A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/401 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/401 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H04L65/4015 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference where at least one of the additional parallel sessions is real time or time sensitive, e.g. white board sharing, collaboration or spawning of a subconference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W76/00 Connection management
    • H04W76/10 Connection setup
    • H04W76/14 Direct-mode setup

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application disclose a method, apparatus, device and computer program product for displaying shared content, belonging to the field of image processing. The method is applied to a conference system comprising a first terminal and a second terminal, and includes the following steps: determining the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal; determining an effective viewing size; and enlarging the shared content based on the expected display size and the effective viewing size, and displaying the enlarged shared content on the second terminal. By adopting the method, presentation efficiency can be improved.

Description

Method, apparatus, device and computer program product for displaying shared content
The present application claims priority from Chinese patent application No. 202111494221.5, entitled "a method, apparatus and system for adjusting auxiliary stream content", filed on December 8, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular, to a method, apparatus, device and computer program product for displaying shared content.
Background
Remote presentation technology refers to technology in which a local device captures presentation images and shares them with remote devices. It is widely applied in remote communication scenarios such as teleconferencing and remote teaching. A presentation image may be an image of the content displayed on the screen of the local device, an image of content written by a presenter captured by a camera connected to the local device, and so on. The local device may send the presentation image to at least one remote device in real time, and the remote device may receive and display it. In a common remote presentation scenario, the presentation image includes text.
Currently, when the actual size of text displayed on a remote device screen is small, or a viewer is far from the remote device screen, the viewer cannot clearly see the text content in the presentation image. In this case, the viewer needs to actively prompt the presenter to zoom in on the relevant text content.
However, this approach interrupts the normal presentation process, resulting in low presentation efficiency.
Disclosure of Invention
The embodiments of the present application provide a method for displaying shared content, which can solve the problem of low presentation efficiency in the prior art. The technical solutions are as follows:
In a first aspect, a method for displaying shared content is provided, which may be applied to a conference system, where the conference system includes a first terminal and a second terminal. The method includes: determining the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal; determining an effective viewing size; and enlarging the shared content based on the expected display size and the effective viewing size, and displaying the enlarged shared content on the second terminal.
The first terminal may be a presentation terminal, and the second terminal may be a playing terminal. Communication between the presentation terminal and the playing terminal may be established directly or through a server. The shared content may be a presentation image. The expected display size is the actual size of a character when displayed on the screen of the terminal. The effective viewing size is the actual size required for a viewer to see a character clearly when it is displayed on the screen of the terminal.
In this solution, the expected display size, on the second terminal, of characters in the shared content shared by the first terminal is determined first, then the effective viewing size is determined, and the shared content is enlarged based on the expected display size and the effective viewing size and displayed on the second terminal. In this way, the shared content can be enlarged automatically when the viewer cannot see the text content clearly, so that the viewer can see the relevant text content. The viewer therefore does not need to prompt the presenter to zoom in on text during the presentation, which avoids interrupting the presentation process and improves presentation efficiency.
In one possible implementation, a target text region of the shared content is determined based on the region identification model, wherein the target text region is a text region of the target application.
The region identification model may be pre-trained so that it can identify the windows of several designated applications in the shared content. During remote presentation, the shared content may be input into the region identification model. If the shared content includes an image of the text region of one of the designated applications (i.e., the target application), the model may output the position range information of that text region within the shared content. The position range information may take various forms, for example the four vertex coordinates of the text region, or the lower-left vertex coordinates of the text region together with its width and height. If no image of a target application is included in the shared content, the model may output corresponding indication information. The region identification model may be a machine learning model, such as a convolutional recurrent neural network.
Alternatively, the target text region may be a window region of the target application, and the region identification model may identify window regions of a number of specified applications in the shared content.
Alternatively, in the case where the image of the target application is not included in the shared content, the entire area of the shared content may be taken as the target text area.
In this solution, the target text region of the shared content is determined based on the region identification model, and subsequent processing is performed based on that region. The text region of an application in the shared content is often the focus of attention during a conference. Identifying the application's text region in the shared content accurately locates the area viewers care about, so performing subsequent processing based on the target text region improves the accuracy of that processing.
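For illustration only, the input/output contract of such a region identification model might look like the following Python sketch; the Box format, the model object, and its predict method are assumptions, not part of the patent.
```python
from dataclasses import dataclass

@dataclass
class Box:
    """Position range information in one of the forms mentioned above:
    lower-left vertex coordinates plus width and height, in pixels."""
    x: int
    y: int
    w: int
    h: int

def target_text_region(model, frame, frame_w: int, frame_h: int) -> Box:
    """Run the (hypothetical) region identification model on one frame of
    shared content. If no designated application is visible, fall back to
    the entire shared content, per the alternative described above."""
    result = model.predict(frame)  # assumed API; returns (x, y, w, h) or None
    if result is None:             # the "corresponding indication information"
        return Box(0, 0, frame_w, frame_h)
    return Box(*result)
```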
In one possible implementation, location range information of characters included in the shared content is determined; based on the location range information, a target text region including characters is determined in the shared content.
The position range information of the text content unit included in the shared content may be determined, and the text content unit may be a portion including one character in the target text region, a portion including one line of characters, or a portion including one column of characters. Taking the case where the text content unit is a portion including one line of characters in the target text region as an example, the position range information of the text content unit included in the shared content is determined, that is, the position range information of the characters in the shared content is determined.
Alternatively, a target text region including all characters may be determined in the shared content.
A text content unit identification model may be pre-trained so that it can identify text content units in the shared content. During remote presentation, the shared content may be input into the text content unit identification model. If the shared content includes text content units, the model may output the position range information of each text content unit in the shared content; this information may take various forms, for example the four vertex coordinates of the image area of a text content unit, or its lower-left vertex coordinates together with its width and height. If no text content unit is included in the shared content, the model may output corresponding indication information. The text content unit identification model may be a machine learning model, such as a convolutional recurrent neural network.
Then, based on the position range information of each text content unit, the position range information of a circumscribed rectangular area including all text content units (that is, all characters) may be determined in the shared content. The rectangular area may be the smallest circumscribed rectangular area including all text content units; alternatively, it may be a circumscribed rectangular area that includes all text content units, whose distance from each text content unit is greater than a first spacing threshold, and whose aspect ratio is the same as that of the shared content. This circumscribed rectangular area is the target text region.
In this solution, the position range information of the characters included in the shared content is determined, and a target text region including the characters is then determined in the shared content based on that information. Text content in the shared content is often the focus of attention during a conference. Identifying the image area that includes the characters accurately locates the area viewers care about, so performing subsequent processing based on the target text region improves the accuracy of that processing.
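As a minimal sketch (not from the patent; the (x, y, w, h) box convention with a lower-left vertex is an assumption), the smallest circumscribed rectangle described above can be computed as follows.
```python
from typing import List, Tuple

# A text content unit's position range: (x, y, w, h), lower-left vertex plus size.
Unit = Tuple[int, int, int, int]

def min_bounding_rect(units: List[Unit]) -> Unit:
    """Smallest circumscribed rectangle containing every text content unit."""
    x0 = min(u[0] for u in units)
    y0 = min(u[1] for u in units)
    x1 = max(u[0] + u[2] for u in units)
    y1 = max(u[1] + u[3] for u in units)
    return (x0, y0, x1 - x0, y1 - y0)

# Example: three one-line text content units merged into one target text region.
units = [(100, 400, 300, 40), (100, 340, 420, 40), (100, 280, 360, 40)]
print(min_bounding_rect(units))  # -> (100, 280, 420, 160)
```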
In one possible implementation, a plurality of text regions are determined based on the position range information, where a first condition is satisfied between characters within the same text region and is not satisfied between characters in different text regions, the first condition being that the distance between the characters is less than a distance threshold; one or more of the plurality of text regions are determined as target text regions.
A text content unit identification model may be pre-trained so that it can identify text content units in the shared content. During remote presentation, the shared content may be input into the text content unit identification model. If the shared content includes text content units, the model may output the position range information of each text content unit in the shared content; this information may take various forms, for example the four vertex coordinates of the image area of a text content unit, or its lower-left vertex coordinates together with its width and height. If no text content unit is included in the shared content, the model may output corresponding indication information. The text content unit identification model may be a machine learning model, such as a convolutional recurrent neural network.
After the position range information of the text content units included in the shared content is determined, a set of at least one text content unit satisfying the first condition may be determined in the shared content based on the position range information of each text content unit. The first condition may take various forms, such as the distance between adjacent text content units being less than 5, etc. The position range information of a circumscribed rectangular area including all text content units in the set may then be determined based on the position range information of those units. The circumscribed rectangular area may be the smallest circumscribed rectangular area including all text content units in the set; alternatively, it may include all text content units in the set, have a distance from each unit greater than a second spacing threshold, and have the same aspect ratio as the shared content. This circumscribed rectangular area is a text region.
After the text regions are determined, a target text region may be determined among them. If there is only one text region, it may be determined as the target text region. If there are a plurality of text regions, one or more of them may be determined as target text regions; for example, the text region with the largest area may be determined as the target text region, or all text regions may be determined as target text regions.
In this solution, a plurality of text regions are determined based on the position range information of the text, and one or more of them are determined as target text regions. Text content in the shared content is often the focus of attention during a conference. Identifying the image areas that include the characters accurately locates the areas viewers care about, so performing subsequent processing based on the target text regions improves the accuracy of that processing. When text content units are far apart, dividing them into multiple text regions means that the blank areas between regions are not enlarged during the enlargement process, which improves the utilization of screen display resources.
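The grouping into text regions can be sketched as below, reusing the Unit type and min_bounding_rect from the previous sketch; applying the first condition transitively via union-find, and measuring the gap between bounding boxes, are illustrative choices rather than anything mandated by the patent.
```python
def box_gap(a: Unit, b: Unit) -> int:
    """Gap between two boxes; 0 if they touch or overlap."""
    dx = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]), 0)
    dy = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
    return max(dx, dy)

def cluster_units(units: List[Unit], dist_threshold: int) -> List[List[Unit]]:
    """Union-find grouping: units closer than dist_threshold share a region."""
    parent = list(range(len(units)))
    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if box_gap(units[i], units[j]) < dist_threshold:
                parent[find(i)] = find(j)
    groups: dict = {}
    for i, u in enumerate(units):
        groups.setdefault(find(i), []).append(u)
    return list(groups.values())

# One text region per group; pick the largest-area region as the target.
regions = [min_bounding_rect(g) for g in cluster_units(units, dist_threshold=5)]
target = max(regions, key=lambda r: r[2] * r[3])
```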
In one possible implementation, a plurality of text regions are determined based on the position range information, where a first condition is satisfied between characters within the same text region and is not satisfied between characters in different text regions, the first condition being that the distance between the characters is less than a distance threshold; the text region in which the cursor stays most frequently or for the longest time is determined as the target text region.
Here, the first condition may be that the distance between the characters is less than a distance threshold. The cursor may be of various kinds, such as a mouse cursor or a laser pointer cursor.
A text content unit identification model may be pre-trained so that it can identify text content units in the shared content. During remote presentation, the shared content may be input into the text content unit identification model. If the shared content includes text content units, the model may output the position range information of each text content unit in the shared content; this information may take various forms, for example the four vertex coordinates of the image area of a text content unit, or its lower-left vertex coordinates together with its width and height. If no text content unit is included in the shared content, the model may output corresponding indication information. The text content unit identification model may be a machine learning model, such as a convolutional recurrent neural network.
After the position range information of the text content units included in the shared content is determined, a set of at least one text content unit satisfying the first condition may be determined in the shared content based on the position range information of each text content unit. The first condition may take various forms, such as the distance between adjacent text content units being less than 5, etc. The position range information of a circumscribed rectangular area including all text content units in the set may then be determined based on the position range information of those units. The circumscribed rectangular area may be the smallest circumscribed rectangular area including all text content units in the set; alternatively, it may include all text content units in the set, have a distance from each unit greater than a second spacing threshold, and have the same aspect ratio as the shared content. This circumscribed rectangular area is a text region.
When the first terminal transmits the shared content, it may simultaneously transmit the position information of the cursor in the shared content, which may be the cursor's position coordinates. The position information of the cursor in the historical shared content within a target history period may be stored. The number of times the cursor stayed in each text region during that period can be determined from the stored cursor positions and the position range information of the text regions. The dwell frequency of the cursor in each text region can then be determined as the ratio of the number of cursor stays in that region to the total number of cursor stays, and the text region with the highest dwell frequency may be determined as the target text region. Alternatively, the dwell time of the cursor in each text region can be determined, and the text region with the longest dwell time may be determined as the target text region.
In this solution, a plurality of text regions are determined based on the position range information, and the text region with the highest cursor dwell frequency or the longest cursor dwell time is determined as the target text region. During a presentation, the cursor usually moves and dwells along with the focus of attention, so identifying the text region where the cursor dwells most frequently or longest accurately locates the area viewers care about, and performing subsequent processing based on the target text region improves the accuracy of that processing. Moreover, only the content the presenter focuses on is enlarged, which improves the utilization of screen display resources.
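A sketch of the dwell statistics follows, reusing the Unit type from the earlier sketch and assuming cursor positions are sampled at a uniform rate, so that a hit count can stand in for both dwell frequency and dwell time; the sampling assumption and all names are illustrative.
```python
def contains(region: Unit, x: int, y: int) -> bool:
    rx, ry, rw, rh = region
    return rx <= x <= rx + rw and ry <= y <= ry + rh

def region_by_dwell(regions: List[Unit],
                    cursor_trace: List[Tuple[int, int]]) -> Unit:
    """cursor_trace: cursor coordinates stored over the target history period.
    Returns the text region with the most cursor samples inside it."""
    counts = [0] * len(regions)
    for (x, y) in cursor_trace:
        for i, r in enumerate(regions):
            if contains(r, x, y):
                counts[i] += 1
                break
    return regions[counts.index(max(counts))]
```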
In one possible implementation, the resolution of the image of each character in the target text region of the shared content is determined, and the expected display size of the character on the second terminal is determined based on the resolution of the image of the character, the resolution of the shared content, and the screen size of the second terminal.
An image of the target text region may be obtained from the shared content based on the position range information of the target text region. The text content unit identification model may be pre-trained so that it can identify text content units in an image. The image of the target text region may be input into the model; if the image includes text content units, the model may output the position range information of the image area of each text content unit (i.e., the position range information of the text content unit), which may take various forms, for example the four vertex coordinates of the image area, or its lower-left vertex coordinates together with its width and height. If the image of the target text region does not include text content units, the model may output corresponding indication information. The text content unit identification model may be a machine learning model, such as a convolutional recurrent neural network. Taking the case where the position range information is the lower-left vertex coordinates, width and height of the image area of a text content unit, the resolution of the image of the text content unit can be expressed by the width and height of the text content unit.
The image of a text content unit can be binarized from its grayscale to obtain a binarized pixel matrix corresponding to the image (each pixel value in the matrix is 255 or 0). Then, a characterization value corresponding to each pixel column in the binarized pixel matrix may be calculated; the characterization value indicates whether the corresponding pixel column contains a pixel with value 255. The characterization value may be calculated as follows: sum the pixel values of all pixels in a pixel column; if the sum is 0, the characterization value of the column is 0, indicating that no pixel with value 255 exists in the column; if the sum is not 0, the characterization value is 1, indicating that a pixel with value 255 exists in the column. A pixel column with characterization value 1 is referred to as a "1" pixel column, and a pixel column with characterization value 0 as a "0" pixel column.
The position information of each pixel column, which may be the column's position coordinate in the horizontal direction, may be determined. One or more pixel column sets may then be determined in the binarized pixel matrix based on each column's characterization value. For any pixel column set, a specified condition is satisfied between any two "1" pixel columns within the set, and is not satisfied between any "1" pixel column within the set and any "1" pixel column outside it. The specified condition may be: there is no run of consecutive "0" pixel columns between the two "1" pixel columns, or there is such a run but the number of "0" pixel columns in it is smaller than a preset threshold, which may be set based on common font sizes and image resolutions. Referring to fig. 17, each pixel column set can be considered to correspond to one character, so the number of pixel column sets in a text content unit is the number of characters it contains.
The number of pixels the image of a text content unit includes in the horizontal direction (i.e., the width of the image) and in the vertical direction (i.e., the height of the image) may be determined from the resolution of the image. The width of the image may then be divided by the number of characters in the unit to obtain the width of each character (i.e., the number of pixels the image of each character includes in the horizontal direction). Since each text content unit is a portion of the target text region including one line of characters, the height of the image of the unit can be taken as the height of each character (i.e., the number of pixels the image of each character includes in the vertical direction).
In this solution, the resolution of the image of each character in the target text region is determined first, and the expected display size of the characters on the second terminal is then determined based on the resolution of the image of the characters, the resolution of the shared content, and the screen size of the second terminal. The calculated expected display size is therefore more accurate, and subsequent processing can be performed based on the expected display size of each character, improving the accuracy of the enlargement processing.
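The column-run segmentation and the size conversion can be sketched as follows, assuming NumPy, a fixed binarization threshold, that character strokes binarize to the foreground, and that the shared content is displayed full screen on the second terminal so that pixel sizes map linearly to centimeters; all of these are assumptions.
```python
import numpy as np

def count_characters(unit_img: np.ndarray, zero_run: int = 3) -> int:
    """unit_img: grayscale (H x W) image of one text content unit (one line).
    Binarize, mark each pixel column '1' if it contains a foreground pixel,
    then count groups of '1' columns separated by at least zero_run '0'
    columns (zero_run plays the role of the preset threshold above).
    Invert the comparison if text is dark on a light background."""
    col_has_ink = ((unit_img > 127).sum(axis=0) > 0).astype(int)
    chars, zeros, started = 0, 0, False
    for v in col_has_ink:
        if v:
            if not started or zeros >= zero_run:
                chars += 1
            started, zeros = True, 0
        elif started:
            zeros += 1
    return chars

def expected_display_size_cm(unit_w_px: int, unit_h_px: int, n_chars: int,
                             content_res, screen_cm):
    """Expected on-screen character size: per-character pixel size scaled by
    screen size (cm) over shared-content resolution (px)."""
    return (unit_w_px / n_chars / content_res[0] * screen_cm[0],
            unit_h_px / content_res[1] * screen_cm[1])
```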
In one possible implementation, a distance between the second terminal and the at least one viewer is determined; an effective viewing size is determined based on a distance between the second terminal and the at least one viewer.
A binocular camera may be integrated into the second terminal, so its location can be considered identical to that of the second terminal. The binocular camera can be controlled to photograph the conference site, obtaining a first site image captured by the left camera and a second site image captured by the right camera, both of which include the face images of at least one viewer. The first site image may then be input into a face recognition model to determine the position information of the face images it includes. The face recognition model may be a pre-trained machine learning model that recognizes the face images included in an image.
The first and second site images can be input into a binocular vision ranging algorithm matched to the binocular camera to determine the depth information of the spatial point corresponding to each pixel in the two images; the depth of a spatial point is its distance from the binocular camera (and thus from the second terminal). The binocular vision ranging algorithm may include an image preprocessing algorithm, an image feature point extraction (or detection) algorithm, a feature point matching algorithm, a distance measurement algorithm, and so on. The distance between the viewer corresponding to a face image and the second terminal can then be determined based on the depth information and the position information of that face image in the first site image.
The distances between the second terminal and all viewers may be determined and averaged, and the average taken as the target viewing distance. A pre-stored correspondence table recording the correspondence between viewing distance and effective viewing size may then be obtained and queried to determine the effective viewing size corresponding to the target viewing distance.
Optionally, the target viewing distance may be preset, either as a user-defined value or as a system default. The preset target viewing distance can be obtained directly, and the correspondence table queried to determine the corresponding effective viewing size.
Alternatively, the effective viewing size may be preset by the viewer, and the preset effective viewing size may be directly acquired.
In this solution, the distance between the second terminal and at least one viewer is determined, and the effective viewing size is then determined based on that distance. The calculated effective viewing size therefore matches actual requirements, and subsequent processing can be performed based on it, improving the accuracy of the enlargement processing.
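A sketch of the averaging and table lookup follows; the table values below are invented placeholders for illustration, not figures from the patent.
```python
import bisect

# Hypothetical correspondence table: viewing distance (m) -> effective viewing
# size (cm). The numbers are made up for illustration only.
DISTANCE_M = [1.0, 2.0, 3.0, 5.0, 8.0]
EFFECTIVE_CM = [0.4, 0.8, 1.2, 2.0, 3.2]

def effective_viewing_size_cm(viewer_distances_m):
    """Average the per-viewer distances from binocular ranging into the target
    viewing distance, then query the correspondence table."""
    target = sum(viewer_distances_m) / len(viewer_distances_m)
    i = min(bisect.bisect_left(DISTANCE_M, target), len(DISTANCE_M) - 1)
    return EFFECTIVE_CM[i]

print(effective_viewing_size_cm([2.4, 3.1, 4.0]))  # ~3.17 m -> 2.0
```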
In one possible implementation, it is determined that the proportion, among all characters in the target text region, of characters whose expected display size is smaller than the effective viewing size is greater than a proportion threshold; alternatively, it is determined that the expected display size of the characters in the target text region on the second terminal is smaller than the effective viewing size.
A pre-stored proportion threshold can be obtained; it may be a system default or a user-defined value, and may take various values.
In this solution, it is first determined that the proportion of characters in the target text region whose expected display size is smaller than the effective viewing size is greater than the proportion threshold, or that the expected display size of the characters in the target text region on the second terminal is smaller than the effective viewing size, and the enlargement processing is then performed. In this way, shared content that the viewer cannot see clearly is enlarged, while shared content that the viewer can already see clearly is not, which saves processing and storage resources and improves their efficiency of use.
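The two trigger variants can be sketched as below; comparing both the width and the height against an effective viewing size given as a (width, height) pair, and the default threshold value, are assumptions.
```python
def proportion_trigger(char_sizes_cm, effective_cm, ratio_threshold=0.3):
    """First variant: the proportion of characters whose expected display size
    is below the effective viewing size exceeds the proportion threshold."""
    small = sum(1 for (w, h) in char_sizes_cm
                if w < effective_cm[0] or h < effective_cm[1])
    return small / len(char_sizes_cm) > ratio_threshold

def smallest_char_trigger(char_sizes_cm, effective_cm):
    """Second variant: the smallest character's expected display size is
    below the effective viewing size."""
    w, h = min(char_sizes_cm, key=lambda s: s[0] * s[1])
    return w < effective_cm[0] or h < effective_cm[1]
```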
In one possible implementation, a first magnification ratio is determined based on the expected display size and the effective viewing size, and the image of the target text region in the shared content is enlarged based on the first magnification ratio or on the resolution of the shared content.
The effective viewing size and the expected display size on the second terminal of the smallest character in the target text region may be obtained. The width of the effective viewing size is divided by the width of the expected display size to obtain a first quotient, and the height of the effective viewing size is divided by the height of the expected display size to obtain a second quotient. The two quotients are then compared, and the larger one is determined as the first magnification ratio. Alternatively, the first magnification ratio may be determined based on the expected display size and the effective viewing size of the second-smallest character (or the third-smallest character, etc.) in the target text region on the second terminal.
The width and height of the resolution of the image of the target text region may each be multiplied by the first magnification ratio to obtain a first amplified resolution corresponding to the resolution of the image of the target text region.
In the shared content, the image of the target text region is enlarged based on the first magnification ratio.
The width and height of the image of the target text region may each be multiplied by the first magnification ratio to obtain the width and height of the enlarged image of the target text region. During image enlargement, pixel interpolation may be performed using a pixel interpolation algorithm, a bilinear interpolation algorithm, a bicubic interpolation algorithm, a fractal algorithm, or other algorithms. The center of the enlarged image of the target text region (i.e., the target placement center) may coincide with the center of the image of the target text region or with the center of the shared content.
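A sketch of the ratio computation for the smallest character follows; the OpenCV call in the trailing comment is one possible way to perform the interpolation, assuming OpenCV is available.
```python
def first_magnification_ratio(expected_cm, effective_cm) -> float:
    """Larger of the width and height quotients, so that the smallest
    character reaches the effective viewing size in both dimensions."""
    return max(effective_cm[0] / expected_cm[0],
               effective_cm[1] / expected_cm[1])

# Enlarging the region image with bicubic interpolation (assuming OpenCV):
#   import cv2
#   enlarged = cv2.resize(region_img, None, fx=ratio, fy=ratio,
#                         interpolation=cv2.INTER_CUBIC)
```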
Optionally, when there are images of multiple target text regions in the shared content, it may be determined whether the spacing condition is satisfied between the enlarged images of those regions. If the enlarged images of the target text regions in the shared content satisfy the spacing condition, each image of a target text region is enlarged in the shared content based on the first magnification ratio corresponding to that region. If any pair of enlarged images does not satisfy the spacing condition, a manual enlargement prompt may be sent to the first terminal.
The position coordinates (x, y) of the center of the image of each target text region in the shared content can be determined. Denote the resolutions of the enlarged images of any two target text regions as W_3 × H_3 and W_4 × H_4, and the position coordinates of their centers as (x_1, y_1) and (x_2, y_2). If either of the following formulas is satisfied:
|x_1 - x_2| ≥ (W_3 + W_4)/2 + Z_1
|y_1 - y_2| ≥ (H_3 + H_4)/2 + Z_2
it may be determined that the images of the two target text regions satisfy the spacing condition; if neither formula is satisfied, it may be determined that they do not. Here Z_1 and Z_2 are the spacing thresholds in the horizontal and vertical directions, respectively.
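A check matching the inequalities above can be sketched as follows (the original formulas are embedded images in the published text, so the form used here is reconstructed from the surrounding description).
```python
def spacing_ok(center1, size1, center2, size2, z1, z2) -> bool:
    """centers (x, y) and enlarged resolutions (W, H) of two target text
    regions; z1 and z2 are the horizontal and vertical spacing thresholds."""
    (x1, y1), (w3, h3) = center1, size1
    (x2, y2), (w4, h4) = center2, size2
    return (abs(x1 - x2) >= (w3 + w4) / 2 + z1 or
            abs(y1 - y2) >= (h3 + h4) / 2 + z2)
```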
Alternatively, in the shared content, the image of the target text region is enlarged based on the resolution of the shared content.
A second magnification ratio required to enlarge the width of the image of the target text region to the width of the shared content may be determined, as well as a third magnification ratio required to enlarge its height to the height of the shared content. The values of the second and third magnification ratios may then be compared, and the smaller one determined as the fourth magnification ratio.
Alternatively, when the aspect ratio of the target text region is the same as that of the shared content, the second magnification ratio (or, equivalently, the third) may be directly determined as the fourth magnification ratio.
The width and height of the image of the target text region may each be multiplied by the fourth magnification ratio to obtain the width and height of the enlarged image. During image enlargement, pixel interpolation may be performed using a pixel interpolation algorithm, a bilinear interpolation algorithm, a bicubic interpolation algorithm, a fractal algorithm, or other algorithms. The center of the enlarged image of the target text region (i.e., the target placement center) may coincide with the center of the shared content.
Alternatively, the image of the target text region may be displayed on its own. Its display mode may be the adapted terminal display mode: after the image of the target text region is adapted, it can be displayed full screen on the second terminal.
In this solution, the first magnification ratio is determined based on the expected display size and the effective viewing size, and the image of the target text region in the shared content is then enlarged based on the first magnification ratio or the resolution of the shared content. In this way, the text content in the shared content can be enlarged to a size the viewer can recognize without the viewer actively prompting the presenter, which avoids interrupting the presentation process and improves presentation efficiency.
In one possible implementation, a first magnification ratio is determined based on the expected display size and the effective viewing size; a first amplified resolution corresponding to the resolution of the image of the target text region is determined based on the first magnification ratio; a second amplified resolution is determined based on the first amplified resolution, the resolution of the shared content, and the resolution of the second terminal's screen; and the image of the target text region in the shared content is enlarged based on the second amplified resolution, with the display mode of the enlarged image set to a display mode that maintains the image resolution.
The effective viewing size and the expected display size on the second terminal of the smallest character in the target text region may be obtained. The width of the effective viewing size is divided by the width of the expected display size to obtain a first quotient, and the height of the effective viewing size is divided by the height of the expected display size to obtain a second quotient. The two quotients are then compared, and the larger one is determined as the first magnification ratio. Alternatively, the first magnification ratio may be determined based on the expected display size and the effective viewing size of the second-smallest character (or the third-smallest character, etc.) in the target text region.
The width and height of the resolution of the image of the target text region may each be multiplied by the first magnification ratio to obtain a first amplified resolution corresponding to the resolution of the image of the target text region.
The width of the first amplified resolution may be divided by the width of the resolution of the shared content to obtain a first quotient, and its height by the height of the resolution of the shared content to obtain a second quotient. The two quotients may be compared, and the larger one determined as the fifth magnification ratio. The width and height of the resolution of the second terminal's screen may then each be multiplied by the fifth magnification ratio to obtain a second amplified resolution corresponding to the screen resolution.
Then, a sixth magnification ratio required to enlarge the width of the image of the target text region to the width of the second amplified resolution may be determined, as well as a seventh magnification ratio required to enlarge its height to the height of the second amplified resolution. The values of the sixth and seventh magnification ratios may be compared, and the smaller one determined as the eighth magnification ratio.
In the shared content, the width and height of the image of the target text region may each be multiplied by the eighth magnification ratio to obtain the width and height of the enlarged image. During image enlargement, pixel interpolation may be performed using a pixel interpolation algorithm, a bilinear interpolation algorithm, a bicubic interpolation algorithm, a fractal algorithm, or other algorithms. The center of the enlarged image may coincide with the center of the shared content.
The display mode of the enlarged image may be the original resolution display mode, in which the second terminal displays the enlarged image on its screen at the image's own resolution.
In this solution, a first magnification ratio is determined based on the expected display size and the effective viewing size; a first amplified resolution corresponding to the resolution of the image of the target text region is determined based on the first magnification ratio; a second amplified resolution is determined based on the first amplified resolution, the resolution of the shared content, and the resolution of the second terminal's screen; and the image of the target text region in the shared content is then enlarged based on the second amplified resolution. In this way, the text content in the shared content can be enlarged to a size the viewer can recognize without the viewer actively prompting the presenter, which avoids interrupting the presentation process and improves presentation efficiency.
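The chain of ratios in this implementation can be condensed into the following sketch, with resolutions as (width, height) tuples; the function names are illustrative.
```python
def second_amplified_resolution(region_res, first_ratio, content_res, screen_res):
    """First amplified resolution -> fifth ratio (larger quotient against the
    shared content's resolution) -> second amplified resolution (screen
    resolution scaled by the fifth ratio)."""
    r1 = (region_res[0] * first_ratio, region_res[1] * first_ratio)
    fifth = max(r1[0] / content_res[0], r1[1] / content_res[1])
    return (screen_res[0] * fifth, screen_res[1] * fifth)

def eighth_ratio(region_res, second_amp_res) -> float:
    """Smaller of the sixth (width) and seventh (height) ratios."""
    return min(second_amp_res[0] / region_res[0],
               second_amp_res[1] / region_res[1])
```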
In one possible implementation, a manual enlargement prompt is issued.
In this solution, a manual enlargement prompt is sent. The second terminal can display a popup window with the manual enlargement prompt, and the presenter, reminded by it, can adjust the size of the text content of the shared content. In this way, the presenter can actively zoom in on the text content in the shared content without the viewer actively prompting the presenter, which avoids interrupting the presentation process and improves presentation efficiency.
In one possible implementation, the shared content and the image obtained by the image enlargement processing are output.
In this solution, the shared content and the image obtained by the enlargement processing are output. The second terminal may have two or more screens, and the shared content and the enlarged image can be displayed on different screens of the second terminal. This improves the flexibility of the display mode, allowing a viewer to view the shared content and the enlarged image at the same time, and improves user experience.
In a second aspect, there is provided an apparatus for displaying shared content, the apparatus comprising one or more modules for implementing the method of the first aspect and possible implementations thereof.
In a third aspect, a computer device is provided, the computer device comprising a memory for storing computer instructions and a processor executing the computer instructions stored by the memory to cause the computer device to perform the method of the first aspect and possible implementations thereof.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing computer program code which, when executed by a computer device, performs the method of the first aspect and possible implementations thereof.
In a fifth aspect, a computer program product is provided, the computer program product comprising computer program code for, when executed by a computer device, performing the method of the first aspect and possible implementations thereof.
The beneficial effects brought by the technical solutions provided in the embodiments of the present application are as follows:
In these solutions, the expected display size, on the second terminal, of characters in the shared content shared by the first terminal is determined first, then the effective viewing size is determined, and the shared content is enlarged and displayed based on the expected display size and the effective viewing size. In this way, the shared content can be enlarged automatically when the viewer cannot see the text content clearly, so that the viewer can see the relevant text content. The viewer therefore does not need to prompt the presenter to zoom in on text during the presentation, which avoids interrupting the presentation process and improves presentation efficiency.
Drawings
FIG. 1 is a schematic diagram of image content provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a display effect provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a display effect provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a display effect provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a conference system provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a presentation terminal provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a playing terminal provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 9 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 10 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of a presentation image provided in an embodiment of the present application;
FIG. 12 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 13 is a schematic diagram of a presentation image provided in an embodiment of the present application;
FIG. 14 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 15 is a schematic diagram of a presentation image provided in an embodiment of the present application;
FIG. 16 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 17 is a schematic diagram of character pixels provided in an embodiment of the present application;
FIG. 18 is a schematic flowchart of displaying shared content provided in an embodiment of the present application;
FIG. 19 is a schematic diagram of an enlarged target text region extending beyond the boundary of a presentation image provided in an embodiment of the present application;
FIG. 20 is a schematic diagram of an image enlargement process provided in an embodiment of the present application;
FIG. 21 is a schematic diagram of an image enlargement process provided in an embodiment of the present application;
FIG. 22 is a schematic diagram of an image enlargement process provided in an embodiment of the present application;
FIG. 23 is a schematic diagram of an image enlargement process provided in an embodiment of the present application;
FIG. 24 is a schematic diagram of an apparatus for displaying shared content provided in an embodiment of the present application.
Detailed Description
Some terms used in the embodiments of the present application are explained below.
Expected display size: refers to the actual size of a character when displayed on the screen of the terminal. It may be represented as the product of two values, e.g. 2 x 2 (unit: cm), where the former value is the actual width of the character in the horizontal direction of the screen and the latter is its actual height in the vertical direction of the screen.
Effective viewing size: refers to the actual size required for a viewer to see a character clearly when it is displayed on the screen of the terminal. It may be represented in the same way as the expected display size. The effective viewing size is generally determined by taking the distance of the viewer from the terminal into account.
Text content unit: refers to an area of an image that contains character content. A text content unit may be a portion of the image containing one character, one line of characters, one column of characters, and so on. An image may contain multiple text content units, which do not overlap one another. The extent of a text content unit is determined by the model that identifies it. In the embodiments of the present application, a text content unit is a portion of the image including one line of characters; other cases are similar and are not described herein.
Text region: the main area for editing content in the window area of an application with an editing function. For the relationship between the image, the application window area, and the text region, refer to fig. 1.
Resolution: the resolution of an image, also referred to as image resolution, refers to the number of pixels the image contains. It may be represented as the product of two values, such as 200 x 100 (unit: pixels). The former value represents the number of pixels the image contains in the horizontal direction of the screen (also referred to as the width of the image resolution), and the latter value represents the number of pixels in the vertical direction (also referred to as the height of the image resolution). An image resolution of 200 x 100, for example, means the image contains 200 pixels in the horizontal direction of the screen and 100 pixels in the vertical direction.
Screen resolution: refers to the number of pixels contained in the screen. The representation may be a multiplication of two values, such as 800 x 480. The former value represents the number of pixels contained in the horizontal direction of the screen (also referred to as the width of the screen resolution), and the latter value represents the number of pixels contained in the vertical direction of the screen (also referred to as the height of the screen resolution).
Magnification ratio: the ratio of the resolution of the image obtained by the enlargement processing to the resolution of the original image. Taking an original image resolution of 200×100 as an example, if the resolution of the image obtained after the enlargement processing is 400×200, the magnification ratio is 2; if the resolution of the image obtained after the enlargement processing is the same as that of the original image, the magnification ratio is 1.
Original resolution display mode: refers to a mode in which an image is displayed without changing the resolution of the image, that is, a display mode in which the resolution of the image is maintained. For the case that the resolution of the image is smaller than that of the screen, for example, the resolution of the screen of the terminal is 400×200, and the resolution of the image is 200×100, as shown in fig. 2, the terminal displays the image by using 200×100 pixels in the screen. For the case that the resolution of the image is greater than that of the screen, for example, the resolution of the screen of the terminal is 400×200 and the resolution of the image is 800×400, as shown in fig. 3, the screen of the terminal may display a local area of the image, and a scroll bar is added in the display area, and a viewer may adjust the local area of the image displayed in the screen by sliding the scroll bar so as to view other local areas of the image.
Adapted terminal display mode: refers to adapting the resolution of the image based on the resolution of the terminal screen, so that the adjusted image can be displayed full screen on the terminal screen. For the case that the resolution of the image is smaller than that of the screen, for example a screen resolution of 400×200 and an image resolution of 200×100, the terminal enlarges the image resolution to 400×200 by interpolation and then displays the image full screen at the enlarged resolution, as shown in fig. 4. For the case that the resolution of the image is greater than that of the screen, for example a screen resolution of 400×200 and an image resolution of 800×400, the terminal may compress the image resolution to 400×200 and then display it full screen at the compressed resolution.
The embodiment of the application provides a method for displaying shared content, which can be applied to remote presentation technology. Remote presentation technology is widely applied to remote communication scenes such as teleconferencing and remote teaching. Taking a conference scene as an example, in a common teleconference scene, the presenter and the viewers are located in conference rooms in different places, and the shared content of the presenter is a presentation image. The presenter can collect the presentation image through the presentation terminal and then either send it directly to the playing terminals in the other places or forward it to them through the server, and the playing terminals can receive and display the presentation image. The presentation image may be an image of the content displayed on the screen of the presentation terminal (such as a shared desktop image), or an image of content written by the presenter, photographed by a camera connected to the presentation terminal. In addition to the presentation image, the presentation terminal may also acquire and transmit a scene image. The scene image may generally be a live image of the conference room taken by a camera connected to the presentation terminal. In such a scenario, the scene image may be referred to as the primary stream and the presentation image as the secondary stream.
It should be noted that the remote demonstration technology can also be used for on-site communication scenes such as on-site conferences and on-site teaching. Taking a scene of a live conference as an example, a presenter and a viewer can be located in the same conference room, a presentation terminal of the presenter can be a notebook computer or the like, a play terminal in the conference room can be an intelligent screen or the like, the presentation terminal can send acquired presentation images to the play terminal, and the play terminal can receive and display the presentation images.
In the embodiments of the present application, a teleconference scene is taken as an example, with the presentation image being a shared desktop image of the presentation terminal; other cases are similar and are not described here again.
Based on the application scenario, the embodiment of the application provides a method for displaying shared content, which can be applied to a conference system, and referring to fig. 5, the conference system can include a presentation terminal, a playing terminal, a server and the like. The demonstration terminal can establish communication with the playing terminal through the server, and can also directly establish communication with the playing terminal. The execution subject of the method can be a presentation terminal, a play terminal, a server, etc. in the conference system. The presentation terminal and the play terminal can be smart screens, desktop computers, notebook computers, mobile phones, tablet computers, smart watches, etc. The server may be a separate server or a group of servers.
Fig. 6 is a schematic structural diagram of a presentation terminal according to an embodiment of the present application, and from a hardware composition point of view, a presentation terminal 600 may be configured as shown in fig. 6, and includes a processor 601, a memory 602, a communication unit 603, and a display unit 604.
The processor 601 may be a central processing unit (central processing unit, CPU), a system on chip (SoC), or the like. The processor 601 may be configured to determine the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal, to determine the effective viewing size, to enlarge the shared content based on the expected display size and the effective viewing size, and so on.
The memory 602 may include various volatile or non-volatile memories, such as a solid state disk (solid state disk, SSD), a dynamic random access memory (dynamic random access memory, DRAM), and the like. The memory 602 may be used to store the shared content, the effective viewing size, and the like.
The communication component 603 may be a wired network connector, a wireless fidelity (wireless fidelity, WiFi) module, a Bluetooth module, a cellular network communication module, or the like. The communication component 603 may be used for data transmission with other devices, which may be servers or terminals. For example, the presentation terminal 600 may receive shared content, and may also transmit the shared content and the enlarged shared content.
The display unit 604 may be a separate screen, a screen integrated with the terminal body, a projector, or the like, the screen may be a touch screen, or a non-touch screen, and the display unit 604 may be used to display a system interface, an application interface, or the like, for example, the display unit 604 may display shared content, or may display the enlarged shared content.
Fig. 7 is a schematic structural diagram of a playback terminal according to an embodiment of the present application, and from a hardware perspective, a playback terminal 700 may be configured as shown in fig. 7, and includes a processor 701, a memory 702, a communication unit 703, and a display unit 704.
The processor 701 may be a central processing unit (central processing unit, CPU), a system on chip (SoC), or the like. The processor 701 may be configured to determine the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal, to determine the effective viewing size, to enlarge the shared content based on the expected display size and the effective viewing size, and so on.
The memory 702 may include various volatile or non-volatile memories, such as a solid state disk (solid state disk, SSD), a dynamic random access memory (dynamic random access memory, DRAM), and the like. The memory 702 may be used to store the shared content, the effective viewing size, and the like.
The communication component 703 may be a wired network connector, a wireless fidelity (wireless fidelity, WiFi) module, a Bluetooth module, a cellular network communication module, or the like. The communication component 703 may be used for data transmission with other devices, which may be servers or terminals. For example, the playback terminal 700 may receive shared content, and may also transmit the shared content and the enlarged shared content.
The display unit 704 may be a separate screen, a screen integrated with the terminal body, a projector, or the like, and the screen may be a touch screen or a non-touch screen, and the display unit 704 may be used to display a system interface, an application interface, or the like, for example, the display unit 704 may display shared content or display the shared content after the enlargement processing.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application, and from a hardware composition point of view, a structure of a server 800 may be shown in fig. 8, including a processor 801, a memory 802, and a communication unit 803.
The processor 801 may be a central processing unit (central processing unit, CPU), a system on chip (SoC), or the like. The processor 801 may be configured to determine the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal, to determine the effective viewing size, to enlarge the shared content based on the expected display size and the effective viewing size, and so on.
The memory 802 may include various volatile or non-volatile memories, such as a solid state disk (solid state disk, SSD), a dynamic random access memory (dynamic random access memory, DRAM), and the like. The memory 802 may be used to store the shared content, the effective viewing size, and the like.
The communication component 803 may be a wired network connector, a wireless fidelity (wireless fidelity, WiFi) module, a Bluetooth module, a cellular network communication module, or the like. The communication component 803 may be used for data transmission with other devices, which may be servers or terminals. For example, the server 800 may receive shared content, and may also transmit the shared content and the enlarged shared content.
Based on the above application scenario and execution devices, the embodiments of the present application take the playing terminal as the execution subject for illustration; other cases are similar and are not described again. The processing flow shown in fig. 9 is described in detail below with reference to specific embodiments, and may include the following:
901, the playing terminal determines a target text region in the demonstration image based on the region identification model.
Wherein the target text region is a text region of a target application. The target application may be of various types, such as a document application, a communication application, etc. There may be one or more target applications. The presentation image is one frame of image in the presentation video.
In practice, the region identification model may be pre-trained so that the model can identify the windows of a number of specified applications in the presentation image. During remote presentation, the presentation image may be input into the region identification model. If the presentation image includes an image of the text region of one of the specified applications (i.e., the target application), the model may output position range information of the text region of the target application contained in the presentation image. The position range information may take various possible forms, for example, the four vertex coordinates of the text region, or the lower-left vertex coordinates of the text region together with its width and height. If the presentation image does not include an image of the target application, the model may output corresponding indication information. The region identification model may be a machine learning model, such as a convolutional recurrent neural network (convolutional recurrent neural network, CRNN), or the like.
Alternatively, the target text region may be a window region of the target application, and the region identification model may identify window regions of a number of specified applications in the presentation image.
Alternatively, for the case where the image of the target application is not included in the presentation image, the entire area of the presentation image may be taken as the target text area.
Alternatively, before the presentation image is input into the region identification model, the presentation image may be input into the application window judgment model to determine whether the window region of the target application program is included in the presentation image. The application window judgment model may be a machine learning model that is pre-trained to judge whether one of several specified applications exists in the image. If the judging result is that the window area of the target application program is contained in the demonstration image, inputting the demonstration image into an area identification model to obtain the position range information of the text area of the target application program contained in the demonstration image, and then taking the text area as a target text area. And if the judgment result is that the window area of the target application program is not contained in the demonstration image, taking the whole area of the demonstration image as a target text area. The application window judgment model and the region identification model may be independent machine learning models, or may be two modules in the same machine learning model.
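The patent does not specify a model interface; as a minimal sketch, the following assumes hypothetical window_model and region_model objects with a predict method, and illustrates the optional two-stage flow (window judgment first, then region identification):

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a region

def locate_target_text_region(image, window_model, region_model) -> Box:
    """Optional two-stage flow: first judge whether a target application's
    window is present, then locate its text region. `window_model` and
    `region_model` are hypothetical pre-trained models; `image` is assumed
    to be a NumPy array of shape (height, width, channels)."""
    h, w = image.shape[:2]
    if not window_model.predict(image):      # no target application window
        return (0, 0, w, h)                  # whole image becomes the target text region
    box: Optional[Box] = region_model.predict(image)
    return box if box is not None else (0, 0, w, h)
```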
The text area of an application in a presentation image is often the focus of attention in the course of a meeting. The text region of the application program is identified in the demonstration image, so that the image region which is concerned by people in the demonstration image can be accurately positioned, and the image region is subjected to subsequent processing. Thus, the accuracy of the subsequent image processing can be improved.
902, the playback terminal determines the expected display size, on the playback terminal, of each character in the target text region of the presentation image.
In an implementation, a playback terminal may determine the resolution of an image of each character in a target text region of a presentation image, and then determine an expected display size of each character on the playback terminal based on the resolution of the image of each character, the resolution of the presentation image, and the screen size of the playback terminal.
The playback terminal may determine the expected display size of the presentation image based on the screen size of the playback terminal. The ratio of the width of the resolution of the image of each character to the width of the resolution of the presentation image may be determined, and the width of the expected display size of each character on the playback terminal may be determined based on this ratio and the width of the expected display size of the presentation image on the playback terminal. Likewise, the ratio of the height of the resolution of the image of each character to the height of the resolution of the presentation image may be determined, and the height of the expected display size of each character on the playback terminal may be determined based on this ratio and the height of the expected display size of the presentation image on the playback terminal.
For example, the resolution of the image of a certain character is 20×10, the resolution of the presentation image is 200×100, and the screen size of the playing terminal is 40×20 cm. The aspect ratio of the resolution of the presentation image is the same as that of the screen size, and the playing terminal displays the presentation image in the adapting-terminal display mode, so the expected display size of the presentation image is 40×20 cm. The width ratio of the resolution of the character image to that of the presentation image is 0.1, so the width of the expected display size of the character is 4 cm; the height ratio is 0.1, so the height of the expected display size of the character is 2 cm.
It should be noted that, in the adapting-terminal display mode, for the case that the aspect ratio of the resolution of the presentation image differs from the aspect ratio of the screen size of the playing terminal, the playing terminal may enlarge or compress the resolution of the presentation image to a resolution at which the presentation image fits the screen of the playing terminal. Then, the playback terminal may determine the ratio of the width of the resolution of the image of each character to the width of the resolution of the presentation image, and determine the width of the expected display size of each character on the playback terminal based on this ratio and the width of the expected display size of the presentation image on the playback terminal. Likewise, the ratio of the height of the resolution of the image of each character to the height of the resolution of the presentation image may be determined, and the height of the expected display size of each character on the playback terminal may be determined based on this ratio and the height of the expected display size of the presentation image on the playback terminal.
For example, the resolution of the image of a certain character is 10×20, the resolution of the presentation image is 100×200, and the screen size of the playing terminal is 40 cm×20 cm. After the presentation image is adapted, the expected display size of the presentation image is 10 cm×20 cm. The width ratio of the resolution of the character image to that of the presentation image is 0.1, so the width of the expected display size of the character is 1 cm; the height ratio is 0.1, so the height of the expected display size of the character is 2 cm.
Alternatively, the playback terminal may determine the actual size of each pixel in the screen based on the screen size and the screen resolution. The playback terminal may determine the ratio of the resolution of each character to the resolution of the presentation image, calculate the adaptation-adjusted resolution of the presentation image, and then calculate the adaptation-adjusted resolution of each character from that ratio and the adaptation-adjusted resolution of the presentation image. Then, based on the adaptation-adjusted resolution of each character and the actual size of each pixel in the screen, the expected display size of each character on the playing terminal can be determined.
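The computation in this alternative can be sketched as follows (all names are ours; sizes in centimetres, resolutions in pixels, and the fit-to-screen adaptation of the adapting-terminal display mode is assumed):

```python
def expected_char_size_cm(char_res, image_res, screen_res, screen_size_cm):
    """Expected on-screen size of one character under the adapting-terminal
    display mode, via the per-pixel computation described above."""
    img_w, img_h = image_res
    scr_w_px, scr_h_px = screen_res
    scr_w_cm, scr_h_cm = screen_size_cm
    # physical size of one screen pixel
    px_w_cm, px_h_cm = scr_w_cm / scr_w_px, scr_h_cm / scr_h_px
    # adaptation-adjusted resolution of the presentation image, then of the character
    scale = min(scr_w_px / img_w, scr_h_px / img_h)
    char_w_px, char_h_px = char_res[0] * scale, char_res[1] * scale
    return char_w_px * px_w_cm, char_h_px * px_h_cm

# Second example from the text: 10x20 character, 100x200 image,
# 400x200 screen measuring 40x20 cm -> (1.0, 2.0) cm
print(expected_char_size_cm((10, 20), (100, 200), (400, 200), (40.0, 20.0)))
```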
903, the playback terminal determines the effective viewing size.
In implementations, the playback terminal may determine a distance between the playback terminal and the viewer, and then determine the effective viewing size based on the distance between the playback terminal and the viewer.
The playing terminal can be integrated with a binocular camera, and the positions of the binocular camera and the playing terminal can be considered to be the same. The playing terminal can control the binocular camera to shoot the meeting place, and a first meeting place image shot by the left-eye camera and a second meeting place image shot by the right-eye camera are respectively obtained. Wherein, the first meeting place image and the second meeting place image both comprise face images of at least one viewer. Then, the playing terminal can input the first meeting place image into a face recognition model, and determine the position information of the face image included in the first meeting place image. The face recognition model may be a pre-trained machine learning model that may be used to recognize images of several faces included in an image.
The playing terminal can input the first meeting place image and the second meeting place image into a binocular vision ranging algorithm matched with the binocular camera, and depth information of a space point corresponding to each pixel point in the first meeting place image and the second meeting place image is determined. The depth information of the spatial point is the distance between the spatial point and the binocular camera (and the playing terminal). The binocular vision ranging algorithm matched with the binocular camera may include an image preprocessing algorithm, an extraction (or detection) algorithm of image feature points, a matching algorithm of image feature points, a distance measurement algorithm, and the like.
Then, the playing terminal can determine the distance between the viewer corresponding to the face image and the playing terminal based on the depth information of the space point corresponding to each pixel point in the first conference place image and the second conference place image and the position information of the face image in the first conference place image.
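As one possible realization of the binocular ranging step above, the following OpenCV sketch assumes a calibrated and rectified stereo pair with known focal length and baseline, and takes the median disparity inside a detected face box; the specific matcher and parameters are illustrative, not mandated by the patent:

```python
import cv2
import numpy as np

def viewer_distance_m(left_img, right_img, face_box,
                      focal_px: float, baseline_m: float) -> float:
    """Distance to one viewer from a rectified stereo pair: take the median
    disparity inside the detected face box (x, y, w, h) in the left image,
    then convert it to depth with depth = focal * baseline / disparity."""
    gray_l = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)
    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
    disparity = stereo.compute(gray_l, gray_r).astype(np.float32) / 16.0
    x, y, w, h = face_box
    patch = disparity[y:y + h, x:x + w]
    valid = patch[patch > 0]            # discard unmatched pixels
    return focal_px * baseline_m / float(np.median(valid))
```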
Optionally, the playing terminal may also determine the distance between the playing terminal and the viewer using a monocular camera, a face recognition model, and a monocular vision ranging algorithm matched to the monocular camera.
Through the above processing, the playback terminal can determine the distances between the playback terminal and all viewers. The playback terminal may average the distances between the playback terminal and all viewers, with the average being the target viewing distance. Then, the playing terminal may obtain a pre-stored correspondence table, in which the correspondence between the viewing distance and the effective viewing size is recorded, and query the correspondence table to determine the effective viewing size corresponding to the target viewing distance.
Optionally, the playing terminal may calculate a first average value of the distances between the playing terminal and all viewers, then calculate a second average value of those distances that are greater than the first average value, and use the second average value as the target viewing distance when determining the corresponding effective viewing size.
Optionally, the playing terminal may determine that the farthest distance between the playing terminal and the viewer is the target viewing distance, and then query the correspondence table to determine the effective viewing size corresponding to the target viewing distance.
Optionally, the target viewing distance may be preset, may be a user-defined value preset by the user, or may be a default value preset by the system. The playing terminal can directly acquire a preset target viewing distance, then inquire a corresponding relation table and determine an effective viewing size corresponding to the target viewing distance.
Alternatively, the effective viewing size may be preset by the viewer, and the playing terminal may directly acquire the preset effective viewing size.
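All of the above alternatives reduce to choosing a target viewing distance and looking it up in the stored correspondence table; a minimal sketch follows (the table values are invented for illustration and are not from the patent):

```python
# Illustrative correspondence table: viewing distance (m) -> effective viewing
# size (cm), i.e. the smallest character size still legible at that distance.
VIEWING_TABLE = [(1.0, 0.5), (2.0, 0.9), (3.0, 1.4), (5.0, 2.3), (8.0, 3.6)]

def effective_viewing_size_cm(distances_m):
    """Average the per-viewer distances into the target viewing distance,
    then look up the effective viewing size in the stored table."""
    target = sum(distances_m) / len(distances_m)
    for dist, size in VIEWING_TABLE:
        if target <= dist:
            return size
    return VIEWING_TABLE[-1][1]   # farther than the table covers: last entry

print(effective_viewing_size_cm([1.8, 2.5, 3.1]))  # target ~2.47 m -> 1.4 cm
```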
904, when the presentation image satisfies the enlargement condition, the playback terminal performs image enlargement processing based on the expected display size, on the playback terminal, of each character in the target text region and the effective viewing size.
The specific processing procedures of the enlargement condition and the image enlargement processing will be described in detail later.
905, the playback terminal displays the presentation image after the image enlargement processing.
The screen on which the enlarged presentation image is displayed has the same screen parameters (screen size, screen resolution, etc.) as the screen used when calculating the expected display size of each character in step 902.
Alternatively, the playback terminal may display both the enlarged presentation image and the non-enlarged presentation image. The playback terminal can be connected to two or more screens and can display the enlarged presentation image and the non-enlarged presentation image on two of the screens respectively.
It should be noted that the playing terminal may perform the processing of steps 901 to 903 (which may be referred to as image preprocessing) on every frame of presentation image in the presentation video, or may perform image preprocessing on one frame every several frames, or on one frame every target time period. For a presentation image that has not undergone image preprocessing, the judgment of the enlargement condition and the image enlargement processing can be performed on it using the most recent preprocessing data. In this way, processing resources and memory resources for displaying the shared content can be saved.
In addition to the processing flow shown in fig. 9, the flow of displaying shared content may also be as shown in fig. 10.
1001, the playback terminal determines position range information of the characters included in the presentation image, and determines a target text region including all characters in the presentation image based on the position range information of the characters.
In implementation, the playing terminal may determine the position range information of the text content units included in the presentation image, where a text content unit may be a portion of the target text area containing one character, one line of characters, or one column of characters. This process takes a text content unit containing one line of characters as an example. The position range information of a text content unit can be regarded as the position range information of the characters it contains; once the position range information of all text content units in the presentation image is determined, the position range information of all characters in the presentation image is determined.
A text content unit recognition model may be pre-trained so that the model can recognize text content units in the presentation image. In the process of remote presentation, the presentation image can be input into the text content unit recognition model. If the presentation image includes text content units, the model may output the position range information of each text content unit in the presentation image, which may take various possible forms, for example, the four vertex coordinates of the image area of the text content unit, or the lower-left vertex coordinates of the text content unit together with its width and height. If the presentation image does not include a text content unit, the model may output corresponding indication information. The text content unit recognition model may be a machine learning model, such as a convolutional recurrent neural network or the like.
Then, the playback terminal may determine, in the presentation image, the position range information of a circumscribed rectangular area that includes all the text content units (and therefore all the characters), based on the position range information of each text content unit. The rectangular area may be the smallest circumscribed rectangular area including all the text content units; alternatively, it may be a circumscribed rectangular area that includes all the text content units, has a spacing from each text content unit greater than the first spacing threshold, and has the same aspect ratio as the presentation image. This circumscribed rectangular area is the target text area. Illustratively, the target text region may be as shown in fig. 11.
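Both variants of the circumscribed rectangle can be computed directly from the unit boxes, as in the following sketch (the (x, y, w, h) box format is an assumption; the aspect-ratio-matching variant is omitted for brevity):

```python
def min_bounding_rect(boxes):
    """Smallest rectangle (x, y, w, h) enclosing all text content units."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)

def padded_rect(boxes, margin, image_w, image_h):
    """Variant with a margin above the first spacing threshold, clipped to
    the presentation image."""
    x, y, w, h = min_bounding_rect(boxes)
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(image_w, x + w + margin), min(image_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)
```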
Alternatively, the playback terminal may determine the text region in the presentation image based on the region identification model, then determine the position range information of the text content units included in the text region, and then determine the target text region including the text content units in the image of the text region based on the position range information of the text content units.
1002, a playback terminal determines an expected display size of each character in a target text region of a presentation image on the playback terminal.
The processing of this step is similar to the corresponding processing of step 902, and reference may be made to the description of step 902, which is not repeated here.
1003, the playback terminal determines the effective viewing size.
The processing of this step is similar to the corresponding processing of step 903, and reference may be made to the description of step 903, which is not repeated here.
1004, when the presentation image satisfies the enlargement condition, the playback terminal performs image enlargement processing based on the expected display size, on the playback terminal, of each character in the target text region and the effective viewing size.
The processing of this step is similar to the corresponding processing of step 904, and reference may be made to the description of step 904, which is not repeated here.
1005, the playback terminal displays the presentation image after the image enlargement processing.
The process of this step is similar to the corresponding process of step 905, and reference may be made to the description of step 905, which is not repeated here.
It should be noted that the playing terminal may perform the processing of steps 1001 to 1003 (which may be referred to as image preprocessing) on every frame of presentation image in the presentation video, or may perform image preprocessing on one frame every several frames, or on one frame every target time period. For a presentation image that has not undergone image preprocessing, the judgment of the enlargement condition and the image enlargement processing can be performed on it using the most recent preprocessing data. In this way, processing resources and memory resources for displaying the shared content can be saved.
In addition to the above-described processing flow, the processing flow of displaying the shared content may also be as shown in fig. 12.
1201, the playing terminal determines position range information of characters included in the presentation image, determines at least one text region in the presentation image based on the position range information of the characters, and determines a target text region in the text region.
Each text area comprises one or more text content units, the text content units in the same text area meet a first condition, and the text content units in different text areas do not meet the first condition. The first condition may be that a distance between the text content units is less than a distance threshold.
The processing of determining, by the playing terminal, the position range information of the text content unit included in the presentation image is similar to the corresponding processing in step 1001, and the relationship between the position range information of the text content unit and the position range information of the character is already described in the corresponding processing in step 1001, and reference may be made to the related description in step 1001, which is not repeated here.
After determining the position range information of the text content units included in the presentation image, the playback terminal may determine, in the presentation image, a set of at least one text content unit satisfying the first condition based on the position range information of each text content unit. The first condition may take various forms, such as the spacing between adjacent text content units being less than 5, etc. The playback terminal may determine the position range information of a circumscribed rectangular area including all the text content units in the set, based on the position range information of those text content units. The circumscribed rectangular area may be the smallest circumscribed rectangular area that includes all the text content units in the set; alternatively, it may be a circumscribed rectangular area that includes all the text content units in the set, has a spacing from each text content unit greater than the second spacing threshold, and has the same aspect ratio as the presentation image. This circumscribed rectangular area is a text region. Illustratively, the text region may be as shown in fig. 13.
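Grouping text content units whose mutual spacing is below the threshold is essentially a connected-components computation; a minimal sketch, assuming boxes in (x, y, w, h) form and the gap between boxes as the distance measure:

```python
def gap(a, b):
    """Gap between two boxes (x, y, w, h); 0 if they touch or overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return max(dx, dy)

def cluster_text_units(boxes, threshold):
    """Group text content units into text regions: units whose gap is below
    `threshold` (the first condition) end up in the same region."""
    regions = []
    for box in boxes:
        touched = [r for r in regions if any(gap(box, b) < threshold for b in r)]
        for r in touched:
            regions.remove(r)
        regions.append(sum(touched, [box]))   # merge all touched regions
    return regions
```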
Alternatively, the playback terminal may determine the text region in the presentation image based on the region identification model, then determine the position range information of the text content units included in the text region, and then determine at least one text region in the image of the text region based on the position range information of the text content units. Wherein, for each text region, the text region comprises one or more text content units, the text content units in the same text region satisfy a first condition, and the text content units in different text regions do not satisfy the first condition.
After determining the text regions, a target text region may be determined among them. For the case where there is only one text region, that text region may be determined as the target text region. For the case where there are multiple text regions, one or more of them may be determined as target text regions. For example, the text region with the largest area may be determined as the target text region, or, as shown in fig. 13, all text regions may be determined as target text regions.
1202, a playback terminal determines an expected display size of each character in a target text region of a presentation image on the playback terminal.
The processing of this step is similar to the corresponding processing of step 902, and reference may be made to the description of step 902, which is not repeated here.
1203, the playing terminal determines an effective viewing size.
The processing of this step is similar to the corresponding processing of step 903, and reference may be made to the description of step 903, which is not repeated here.
1204, when the presentation image satisfies the enlargement condition, the playing terminal performs image enlargement processing based on the expected display size, on the playing terminal, of each character in the target text region and the effective viewing size.
The processing of this step is similar to the corresponding processing of step 904, and reference may be made to the description of step 904, which is not repeated here.
1205, the playing terminal displays the demonstration image after the image amplifying process.
The process of this step is similar to the corresponding process of step 905, and reference may be made to the description of step 905, which is not repeated here.
It should be noted that the playing terminal may perform the processing of steps 1201 to 1203 (which may be referred to as image preprocessing) on every frame of presentation image in the presentation video, or may perform image preprocessing on one frame every several frames, or on one frame every target time period. For a presentation image that has not undergone image preprocessing, the judgment of the enlargement condition and the image enlargement processing can be performed on it using the most recent preprocessing data. In this way, processing resources and memory resources for displaying the shared content can be saved.
In addition to the above-described processing flow, the processing flow of displaying the shared content may also be as shown in fig. 14.
1401, the playing terminal determines position range information of characters included in the presentation image, determines a plurality of text areas in the presentation image based on the position range information of the characters, and determines a text area with the highest cursor stay frequency or the longest cursor stay time among the plurality of text areas as a target text area.
Each text area comprises one or more text content units, the text content units in the same text area meet a first condition, and the text content units in different text areas do not meet the first condition. The cursor may be various, such as a mouse cursor, a laser pen cursor, etc.
The processing of determining the position range information of the text content unit included in the presentation image by the playing terminal is similar to the corresponding processing of step 1001, and the relationship between the position range information of the text content unit and the position range information of the character is also described in the corresponding processing of step 1001, and the description related to step 1001 may be referred to, and will not be repeated here.
After determining the position range information of the text units included in the presentation image, the playback terminal may determine a plurality of text areas in the presentation image based on the position range information of each text unit. The process of determining the text region is similar to the corresponding process of step 1201, and reference may be made to the description of step 1201, which is not repeated here.
When the presentation terminal transmits the presentation image, it can simultaneously transmit the position information of the cursor in the presentation image. The position information may be the position coordinates of the cursor in the presentation image. The playing terminal may store the position information of the cursor in the historical presentation images within the target history duration. From this position information and the position range information of the text areas, the playing terminal can determine the number of stays of the cursor in each text area during the target history duration. Then, the playing terminal can determine the stay frequency of the cursor in each text area, where the stay frequency of the cursor in a text area is the ratio of the number of stays of the cursor in that text area to the total number of stays of the cursor. Further, the text area with the highest stay frequency of the cursor may be determined as the target text area.
For example, as shown in fig. 15, the presentation image includes text area A, text area B and text area C. The number of stays of the cursor in text area A is 6, in text area B is 4, in text area C is 3, and in the areas outside the text areas is 7. The stay frequency of the cursor in text area A (6/20 = 0.3) is the highest, so text area A can be determined as the target text area.
Alternatively, the text area in which the cursor stays the largest number of times may be determined as the target text area.
Alternatively, a text area in which the number of stays of the cursor is greater than a count threshold may be determined as the target text area. The count threshold may be preset, either as a user-defined value set by the user or as a default value set by the system.
Alternatively, among the text areas, a text area in which the stay frequency of the cursor is greater than a frequency threshold may be determined as the target text area. The stay frequency of the cursor in a text area is the ratio of the number of stays of the cursor in that text area to the total number of stays of the cursor. The frequency threshold may be preset, either as a user-defined value set by the user or as a default value set by the system.
Alternatively, among the text areas, the text area with the highest stay frequency of the cursor may be determined as the target text area.
Alternatively, the playback terminal may determine the text area in the presentation image based on the application window recognition model and the text area recognition model, then determine the position range information of the text content units included in the text area, and then determine text regions in the image of the text area based on the position range information of the text content units, determining the target text region among these text regions. For each text region, the text region includes one or more text content units; a second proximity condition is satisfied between the text content units within the text region, and is not satisfied between text content units inside and outside the text region. Within the target history duration, in the terminal that acquires the presentation image, the stay frequency of the cursor in the target text region is greater than the frequency threshold. The cursor may be of various kinds, such as a mouse cursor, a laser pointer cursor, etc.
Alternatively, the playing terminal can determine the stay time of the cursor in each text region and, further, determine the text region with the longest stay time of the cursor as the target text region.
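The dwell statistics used by the main flow above can be computed from the stored cursor history, for example as follows (the region format and tie-breaking are assumptions):

```python
from collections import Counter

def contains(region, point):
    x, y, w, h = region
    px, py = point
    return x <= px < x + w and y <= py < y + h

def target_region_by_dwell(cursor_history, regions):
    """Count cursor positions per text region over the target history
    duration and return the region index with the highest stay frequency."""
    counts = Counter()
    for point in cursor_history:
        for i, region in enumerate(regions):
            if contains(region, point):
                counts[i] += 1
                break
    total = len(cursor_history)   # total stays, including those outside all regions
    freqs = {i: c / total for i, c in counts.items()}
    return max(freqs, key=freqs.get) if freqs else None
```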
1402, the playback terminal determines an expected display size of each character in the target text region of the presentation image on the playback terminal.
The processing of this step is similar to the corresponding processing of step 902, and reference may be made to the description of step 902, which is not repeated here.
1403, the playback terminal determines the effective viewing size.
The processing of this step is similar to the corresponding processing of step 903, and reference may be made to the description of step 903, which is not repeated here.
1404, when the presentation image satisfies the enlargement condition, the playback terminal performs image enlargement processing based on the expected display size, on the playback terminal, of each character in the target text region and the effective viewing size.
The processing of this step is similar to the corresponding processing of step 904, and reference may be made to the description of step 904, which is not repeated here.
1405, the playing terminal displays the presentation image after the image enlargement processing.
The process of this step is similar to the corresponding process of step 905, and reference may be made to the description of step 905, which is not repeated here.
It should be noted that the playing terminal may perform the processing of steps 1401 to 1403 (which may be referred to as image preprocessing) on every frame of presentation image in the presentation video, or may perform image preprocessing on one frame every several frames, or on one frame every target time period. For a presentation image that has not undergone image preprocessing, the judgment of the enlargement condition and the image enlargement processing can be performed on it using the most recent preprocessing data. In this way, processing resources and memory resources for displaying the shared content can be saved.
The following describes how the resolution of the image of each character in the target text region of the presentation image is determined. The specific process may be as shown in fig. 16, and includes the following steps.
1601, the playback terminal determines location range information of the text content unit in the target text area.
The text content unit may be a portion containing one character in the target text region, a portion containing one line of characters, or a portion containing one column of characters. The present process takes as an example that the text content unit is a portion of the target text region that includes a line of characters.
In implementation, the image of the target text region may be acquired from the presentation image based on the position range information of the target text region. This may be done in various ways. Taking the case where the position range information takes the form of four vertex coordinates as an example, rectangular range information surrounding the target text region can be calculated from the four vertex coordinates, and the pixel values of the pixels within that rectangle can then be acquired to obtain the image of the target text region.
The text content unit recognition model may be pre-trained so that the model can recognize text content units in the image. The playing terminal may input the image of the target text area into the text content unit recognition model. If the image of the target text area includes text content units, the model may output the position range information of the image area of each text content unit in the image of the target text area (i.e., the position range information of the text content units), which may take various possible forms, for example, the four vertex coordinates of the image area of the text content unit, or the lower-left vertex coordinates of the image area of the text content unit together with its width and height. If the image of the target text area does not include text content units, the model may output corresponding indication information. The text content unit recognition model may be a machine learning model, such as a convolutional recurrent neural network or the like.
1602, the playback terminal determines the resolution of the image of the text unit in the target text area based on the position range information of the text unit in the target text area.
Taking as an example that the position range information is the lower-left vertex coordinates, width and height of the image area of the text content unit, the resolution of the image of the text content unit can be expressed as the width of the image area multiplied by its height.
1603, the playback terminal determines an image of the text content unit in the target text region based on the position range information of the text content unit in the target text region.
In implementations, an image of a text unit may be acquired from an image of a target text region based on location range information of the text unit. The manner of acquisition is similar to the process of acquiring the image of the target text region in the presentation image based on the position range information of the target text region in step 1601, and the description thereof in step 1601 may be referred to.
1604, the playback terminal determines the number of characters included in the text content unit in the target text region based on the image of the text content unit in the target text region.
In implementation, gray level binarization processing may be performed on the image of the text unit, so as to obtain a binarized pixel matrix corresponding to the image of the text unit (the pixel values of all pixels in the matrix are 255 or 0). Then, a characterization value corresponding to each pixel column in the binarized pixel matrix may be calculated, where the characterization value is used to indicate whether a pixel point with a pixel value of 255 exists in the corresponding pixel column. The calculation process of the characterization value can be as follows: summing pixel values of all pixels in one pixel column, and if the value obtained by summation is 0, the characterization value corresponding to the pixel column is 0, which indicates that no pixel point with the pixel value of 255 exists in the pixel column; if the sum is not 0, the corresponding characterization value of the pixel column is 1, which indicates that the pixel point with the pixel value of 255 exists in the pixel column. The pixel column with the characterization value of 1 is simply referred to as a "1" pixel column, and the pixel column with the characterization value of 0 is simply referred to as a "0" pixel column.
The playback terminal may determine positional information of each pixel column, which may be positional coordinates of the pixel column in the horizontal direction. The playback terminal may then determine one or more sets of pixel columns in the binarized pixel matrix based on the corresponding characterization value for each pixel column in the binarized pixel matrix. For any pixel column set, the specified condition is satisfied between any two 1 pixel columns in the pixel column set, and the specified condition is not satisfied between any 1 pixel column in the pixel column set and any 1 pixel column outside the pixel column set. The specified conditions may be: there are no consecutive multiple "0" pixel columns between two "1" pixel columns, or there are consecutive multiple "0" pixel columns between two "1" pixel columns, and the number of "0" pixel columns in the consecutive multiple "0" pixel columns is smaller than a preset threshold, which may be set based on a general font size and image resolution. Referring to fig. 17, it can be considered that each pixel column set corresponds to one character, and the number of pixel column sets included in the text content unit is the number of characters included in the text content unit. The above description may be applied to a case where the font color is darker relative to the background color, and for a case where the font color is lighter relative to the background color, a similar process to the above description may be used to determine the number of characters in the text content unit.
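The column-projection counting described above can be sketched with NumPy as follows; the gray threshold of 128 and the default gap threshold are assumptions standing in for the preset values mentioned in the text:

```python
import numpy as np

def count_characters(unit_img_gray: np.ndarray, gap_px: int = 3) -> int:
    """Count characters in a single-line text content unit by column
    projection. Dark text on a light background is assumed; `gap_px`
    stands in for the preset threshold on runs of empty columns."""
    binary = (unit_img_gray < 128).astype(np.uint8) * 255   # text pixels -> 255
    col_has_ink = binary.sum(axis=0) > 0                    # "1"/"0" pixel columns
    chars, zeros_since_ink, seen_ink = 0, 0, False
    for has_ink in col_has_ink:
        if has_ink:
            if not seen_ink or zeros_since_ink >= gap_px:
                chars += 1                # a new pixel column set begins
            seen_ink, zeros_since_ink = True, 0
        elif seen_ink:
            zeros_since_ink += 1
    return chars
```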
1605, the playback terminal determines the resolution of the image of the character included in the text content unit in the target text area based on the resolution of the image of the text content unit in the target text area and the number of characters included in the text content unit in the target text area.
In implementation, the number of pixels the image of the text content unit includes in the horizontal direction (i.e., the width of the image of the text content unit) and the number of pixels it includes in the vertical direction (i.e., the height of the image of the text content unit) can be determined from the resolution of the image of the text content unit. The width of the image of the text content unit can then be divided by the number of characters in the text content unit to obtain the width of each character (i.e., the number of pixels the image of each character includes in the horizontal direction). Since each text content unit here is a portion of the target text area containing one line of characters, the height of the image of the text content unit can be regarded as the height of each character (i.e., the number of pixels the image of each character includes in the vertical direction).
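Combining steps 1604 and 1605, a per-character resolution can be derived as in this small sketch (integer division is our choice):

```python
def char_resolution(unit_width: int, unit_height: int, n_chars: int):
    """Per-character resolution within a single-line text content unit:
    the unit width is shared equally among the characters, and the unit
    height is taken as the character height."""
    return unit_width // max(n_chars, 1), unit_height

# e.g. a 200x24 text content unit holding 10 characters -> each about 20x24
print(char_resolution(200, 24, 10))  # (20, 24)
```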
The following explains the magnification conditions.
Several possible magnification conditions are listed below:
Amplification condition one
It is determined that, among all the characters in the target text area, the proportion of characters whose expected display size is smaller than the effective viewing size is greater than a proportion threshold.
The playback terminal may determine the expected display size of all characters in the target text region in a manner similar to the corresponding process in step 902, and reference may be made to the description associated with step 902. Then, the playing terminal may compare the predicted display size of each character in the target text area with the effective viewing size, determine the number of characters whose predicted display size is smaller than the effective viewing size, and further determine, according to the number of characters in the target text area, the proportion of the characters whose predicted display size is smaller than the effective viewing size in all the characters in the target text area.
The playing terminal can acquire a pre-stored proportion threshold. The proportion threshold can be set manually, and its value can take various possibilities.
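Amplification condition one thus reduces to a ratio test over per-character comparisons; a sketch, assuming sizes are (width, height) pairs in centimetres and an illustrative threshold value:

```python
def too_small(expected, effective):
    """A character counts as too small if either dimension of its expected
    display size falls below the effective viewing size."""
    return expected[0] < effective[0] or expected[1] < effective[1]

def condition_one(expected_sizes, effective, ratio_threshold=0.5):
    """True when the proportion of too-small characters in the target text
    area exceeds the proportion threshold."""
    n_small = sum(1 for s in expected_sizes if too_small(s, effective))
    return n_small / len(expected_sizes) > ratio_threshold
```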
Amplification condition two
It is determined that, among all the characters in the target text area, the proportion of characters whose expected display size is smaller than the effective viewing size is greater than the proportion threshold, and that, for each image-preprocessed historical presentation image within the history duration, the proportion of characters in its historical target text area whose expected display size is smaller than the effective viewing size, among all the characters in that historical target text area, is also greater than the proportion threshold.
The playback terminal may record intermediate data and result data generated in the image preprocessing process for each image-preprocessed history presentation image within the history period, for example, positional range information of a target text region of the image-preprocessed history presentation image, an estimated display size of characters in the target text region of the image-preprocessed history presentation image, the number of characters in the target text region of the image-preprocessed history presentation image, an effective viewing size of characters of the image-preprocessed history presentation image, and the like.
The play terminal may compare the predicted display size of each character in the history target text region of the history demonstration image subjected to the image preprocessing with the effective viewing size, determine the number of characters whose predicted display size is smaller than the effective viewing size, and further determine the proportion of the characters whose predicted display size is smaller than the effective viewing size in all the characters in the history target text region of the history demonstration image subjected to the image preprocessing according to the number of characters in the history target text region of the history demonstration image subjected to the image preprocessing.
Amplification condition three
It is determined that the expected display size, on the playing terminal, of the smallest character in the target text region is smaller than the effective viewing size.
The playback terminal may determine the expected display size of all characters in the target text region in a manner similar to the corresponding process in step 902, and reference may be made to the description associated with step 902. The playback terminal may compare the predicted display sizes of the characters and determine the predicted display size of the character having the smallest predicted display size (i.e., the smallest character). Then, the playback terminal may compare the predicted display size of the minimum character with the effective viewing size, and determine that the predicted display size of the minimum character is smaller than the effective viewing size.
Amplification condition four
It is determined that the expected display size, on the playing terminal, of the smallest character in the target text region is smaller than the effective viewing size, and that the expected display size of the smallest character in the historical target text region of each image-preprocessed historical presentation image within the history duration is also smaller than the effective viewing size.
The following describes the image enlargement processing.
The process of enlarging the image may include the following steps as shown in fig. 18.
1801, the playback terminal determines a first magnification ratio based on the expected display size, on the playback terminal, of each character in the target text region and the effective viewing size.
The playing terminal can obtain the effective viewing size and the expected display size, on the playing terminal, of the smallest character in the target text area, divide the width of the effective viewing size by the width of the expected display size to obtain a first quotient, and divide the height of the effective viewing size by the height of the expected display size to obtain a second quotient. The first quotient and the second quotient can then be compared, and the larger of the two determined as the first magnification ratio.
Alternatively, the playback terminal may also determine the first magnification ratio based on the predicted display size and the effective viewing size of the next smallest character (or the third smallest character, etc.) in the target text region on the playback terminal.
1802, the playback terminal determines a first magnification resolution corresponding to the resolution of the image of the target text region based on the first magnification ratio.
In implementation, the playing terminal may multiply the width and height of the resolution of the image of the target text region by the first magnification ratio, to obtain a first magnification resolution corresponding to the resolution of the image of the target text region.
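Steps 1801 and 1802 can be written out as follows (a sketch under the stated definitions; the example values are invented):

```python
def first_magnification(min_char_size, effective_size, region_res):
    """Step 1801: the larger of the width and height quotients for the
    smallest character; step 1802: the first magnification resolution."""
    ratio = max(effective_size[0] / min_char_size[0],
                effective_size[1] / min_char_size[1])
    mag_res = (round(region_res[0] * ratio), round(region_res[1] * ratio))
    return ratio, mag_res

# Smallest character 0.5x1.0 cm, effective viewing size 0.9x1.4 cm,
# target text region resolution 200x100 -> ratio 1.8, resolution (360, 180)
print(first_magnification((0.5, 1.0), (0.9, 1.4), (200, 100)))
```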
1803, the playing terminal determines, based on the first magnification resolution and the resolution of the presentation image, whether the image of the target text area would exceed the boundary of the presentation image after being enlarged by the first magnification ratio.
The following description will be made in different cases:
in the first case, the center of the image of the target text region may be calculated from the position range information of the target text region. The playback terminal may use the center of the image of the target text area as the center of the enlarged image of the target text area (hereinafter referred to as the target placement center), as shown in fig. 19. The playing terminal can calculate the distances between the target placement center and the two boundaries of the presentation image in the width direction and take the smaller value as the wide boundary distance D1. Similarly, the playing terminal can calculate the distances between the target placement center and the two boundaries of the presentation image in the height direction and take the smaller value as the high boundary distance D2. Denoting the width of the first magnification resolution as L1 and its height as L2, if either of the following formulas is satisfied:

L1/2 > D1

L2/2 > D2
the first magnification ratio of the image of the target text region can be determined to exceed the boundary of the presentation image, and if the two formulas are not satisfied, the first magnification ratio of the image of the target text region can be determined to not exceed the boundary of the presentation image.
In the second case, the center of the presentation image may be used as the target placement center. Denoting the width of the enlarged image of the target text region as W1 and its height as H1, and the width and height of the presentation image as W2 and H2, if either of the following formulas is satisfied:

W1 - W2 > 0

H1 - H2 > 0

it may be determined that the image of the target text region, enlarged by the first magnification ratio, exceeds the boundary of the presentation image; if neither formula is satisfied, it may be determined that the enlarged image does not exceed the boundary of the presentation image.
1804, the playback terminal performs the corresponding image enlargement processing according to the determination result.
(1) If the image of the target text region, enlarged by the first magnification ratio, does not exceed the boundary of the presentation image, the corresponding enlargement processing may be performed in any of the following ways:
Processing mode 1
In the presentation image, the image of the target text region is enlarged based on the first magnification ratio.
The playback terminal may multiply the width and the height of the image of the target text region by the first magnification ratio to obtain the width and height of the enlarged image. During enlargement, pixel interpolation may be performed using an interpolation algorithm such as bilinear interpolation, bicubic interpolation, or a fractal algorithm. The center of the enlarged image of the target text region (i.e., the target placement center) may coincide with the center of the original image of the target text region or with the center of the presentation image.

For example, as shown in fig. 20, the resolution of the presentation image is 400×200, the resolution of the image of the target text region is 200×100, and the first magnification ratio is 1.5; the image of the target text region is enlarged by the first magnification ratio to obtain an enlarged image with a resolution of 300×150, whose target placement center coincides with the center of the original image of the target text region in the presentation image.
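As an illustrative sketch of processing mode 1 using OpenCV (cv2.resize with cv2.INTER_CUBIC corresponds to one of the interpolation choices the text mentions); the helper names and the compositing logic are assumptions, and the sketch assumes the enlarged image stays within the presentation image's bounds.

```python
import cv2
import numpy as np


def enlarge_region(region: np.ndarray, ratio: float) -> np.ndarray:
    """Enlarge the target text region's image by `ratio` with bicubic interpolation."""
    h, w = region.shape[:2]
    return cv2.resize(region, (round(w * ratio), round(h * ratio)),
                      interpolation=cv2.INTER_CUBIC)


def paste_centered(presentation: np.ndarray, enlarged: np.ndarray,
                   center: tuple) -> np.ndarray:
    """Place `enlarged` so that its center lies at the target placement center."""
    eh, ew = enlarged.shape[:2]
    x0, y0 = round(center[0] - ew / 2), round(center[1] - eh / 2)
    out = presentation.copy()
    out[y0:y0 + eh, x0:x0 + ew] = enlarged  # assumes the pasted region fits in bounds
    return out
```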
Optionally, when images of multiple target text regions exist in the presentation image, the playback terminal may determine whether the enlarged images of these target text regions satisfy a spacing condition with respect to one another. If the enlarged images all satisfy the spacing condition, each image is enlarged in the presentation image based on the first magnification ratio corresponding to its target text region. If any enlarged images fail to satisfy the spacing condition, the playback terminal may send a manual enlargement prompt to the presentation terminal.
The playback terminal may determine the position coordinates of the center of the image of each target text region in the presentation image. Denote the resolutions of the enlarged images of any two target text regions as W3×H3 and W4×H4, and the position coordinates of their centers as (x1, y1) and (x2, y2). If either of the following formulas is satisfied:

|x1 - x2| - (W3 + W4)/2 > Z1

|y1 - y2| - (H3 + H4)/2 > Z2

it may be determined that the images of the two target text regions satisfy the spacing condition; if neither formula is satisfied, it may be determined that they do not satisfy the spacing condition. Here, Z1 and Z2 are the spacing thresholds in the horizontal and vertical directions, respectively.
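Under the formulas as reconstructed above, the spacing check for a pair of enlarged region images could be sketched as follows (function name hypothetical):

```python
def spacing_condition_met(center1, res1, center2, res2, Z1, Z2):
    """True if two enlarged target-text-region images are sufficiently far
    apart horizontally or vertically (Z1, Z2 are the spacing thresholds)."""
    (x1, y1), (x2, y2) = center1, center2
    (W3, H3), (W4, H4) = res1, res2
    horizontal_gap = abs(x1 - x2) - (W3 + W4) / 2
    vertical_gap = abs(y1 - y2) - (H3 + H4) / 2
    return horizontal_gap > Z1 or vertical_gap > Z2
```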
In the presentation image, the process of enlarging the image of each target text region based on its corresponding first magnification ratio is similar to that of processing mode 1, and reference may be made to the description of processing mode 1. The difference is that each target placement center coincides with the center of the original image of the corresponding target text region in the presentation image.
Processing mode 2
In the presentation image, the image of the target text region is enlarged based on the resolution of the presentation image.
The playback terminal may determine a second magnification ratio required to enlarge the width of the image of the target text region to the width of the presentation image, and a third magnification ratio required to enlarge its height to the height of the presentation image. The values of the second and third magnification ratios are then compared, and the smaller one is determined as a fourth magnification ratio.

Alternatively, when the aspect ratio of the target text region is the same as that of the presentation image, the playback terminal may directly take the second magnification ratio (or, equivalently, the third magnification ratio) as the fourth magnification ratio.

The playback terminal may multiply the width and the height of the image of the target text region by the fourth magnification ratio to obtain the width and height of the enlarged image. During enlargement, pixel interpolation may be performed using an interpolation algorithm such as bilinear interpolation, bicubic interpolation, or a fractal algorithm. The center of the enlarged image of the target text region (i.e., the target placement center) may coincide with the center of the presentation image.
For example, as shown in fig. 21, the resolution of the presentation image is 400×200, the resolution of the image of the target text region is 150×100, and the fourth magnification ratio is 2; the image of the target text region is enlarged by the fourth magnification ratio to obtain an enlarged image with a resolution of 300×200, whose target placement center coincides with the center of the presentation image.
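Processing mode 2 thus reduces to taking the smaller per-axis ratio, as in this sketch (names hypothetical):

```python
def fourth_magnification_ratio(region_resolution, presentation_resolution):
    """Largest ratio that keeps the enlarged region inside the presentation image."""
    second_ratio = presentation_resolution[0] / region_resolution[0]  # width axis
    third_ratio = presentation_resolution[1] / region_resolution[1]   # height axis
    return min(second_ratio, third_ratio)


# Fig. 21 example: min(400/150, 200/100) = 2, so 150x100 is enlarged to 300x200.
assert fourth_magnification_ratio((150, 100), (400, 200)) == 2
```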
Processing mode 3
The playback terminal displays the image of the target text region itself.
The playback terminal may display the image of the target text region in a screen-adaptive display mode: after the image of the target text region is adapted to the terminal, it can be displayed full screen on the playback terminal.
For example, if the resolution of the playback terminal's screen is 800×400 and the resolution of the image of the target text region is 200×100, the image of the target text region can be displayed full screen on the playback terminal after adaptation.
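Processing mode 3's screen adaptation might look like the following; preserving the aspect ratio during adaptation is an assumption, as the text only states that the adapted image is shown full screen.

```python
def fit_to_screen(region_resolution, screen_resolution):
    """Scale the region image to fill the screen while preserving aspect ratio."""
    scale = min(screen_resolution[0] / region_resolution[0],
                screen_resolution[1] / region_resolution[1])
    return (round(region_resolution[0] * scale),
            round(region_resolution[1] * scale))


# Text example: a 200x100 region on an 800x400 screen scales by 4 to 800x400.
assert fit_to_screen((200, 100), (800, 400)) == (800, 400)
```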
(2) If the image of the target text region, enlarged by the first magnification ratio, exceeds the boundary of the presentation image, the corresponding enlargement processing may be performed in either of the following ways:
Processing mode 4
A second magnification resolution corresponding to the resolution of the playback terminal's screen is determined based on the first magnification resolution, the resolution of the presentation image, and the resolution of the screen, and the image of the target text region is enlarged based on the second magnification resolution.
In an implementation, the playback terminal may determine a fifth magnification ratio based on the first magnification resolution and the resolution of the presentation image, determine the second magnification resolution based on the resolution of the playback terminal's screen and the fifth magnification ratio, enlarge the image of the target text region based on the second magnification resolution, and set the display mode of the enlarged image to an original-resolution display mode.

Specifically, the playback terminal may divide the width of the first magnification resolution by the width of the presentation image to obtain a first quotient, and divide the height of the first magnification resolution by the height of the presentation image to obtain a second quotient; the larger of the two quotients is determined as the fifth magnification ratio. The playback terminal may then multiply the width and the height of the screen's resolution by the fifth magnification ratio to obtain the second magnification resolution.

Next, the playback terminal may determine a sixth magnification ratio required to enlarge the width of the image of the target text region to the width of the second magnification resolution, and a seventh magnification ratio required to enlarge its height to the height of the second magnification resolution; the smaller of the two is determined as an eighth magnification ratio.

The playback terminal may multiply the width and the height of the image of the target text region by the eighth magnification ratio to obtain the width and height of the enlarged image. During enlargement, pixel interpolation may be performed using an interpolation algorithm such as bilinear interpolation, bicubic interpolation, or a fractal algorithm. The center of the enlarged image of the target text region (i.e., the target placement center) may coincide with the center of the presentation image.
For example, as shown in fig. 23, the first magnification resolution is 600×300, the resolution of the presentation image is 400×200, the resolution of the image of the target text region is 200×100, and the resolution of the playback terminal's screen is 800×400. The fifth magnification ratio is then 1.5, the second magnification resolution is 1200×600 (i.e., 800×1.5 by 400×1.5), the eighth magnification ratio is 6, and the resolution of the enlarged image of the target text region is 1200×600.

In the original-resolution display mode, the playback terminal displays the enlarged image on its screen at the image's own resolution. Continuing the example of fig. 23, the screen resolution is 800×400 and the enlarged image is 1200×600, so the screen shows a local area of the enlarged image, and scroll bars are added to the display area so that a viewer can view the other areas by sliding them.
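Processing mode 4 chains the ratios defined above; this sketch reproduces the fig. 23 numbers (function name hypothetical):

```python
def mode4_enlargement(first_mag_res, presentation_res, screen_res, region_res):
    """Derive the fifth/eighth magnification ratios, the second magnification
    resolution, and the final enlarged resolution of the region image."""
    # Fifth ratio: larger per-axis quotient of the first magnification
    # resolution over the presentation image's resolution.
    fifth = max(first_mag_res[0] / presentation_res[0],
                first_mag_res[1] / presentation_res[1])
    # Second magnification resolution: screen resolution scaled by the fifth ratio.
    second_mag_res = (screen_res[0] * fifth, screen_res[1] * fifth)
    # Eighth ratio: smaller of the sixth (width) and seventh (height) ratios.
    eighth = min(second_mag_res[0] / region_res[0],
                 second_mag_res[1] / region_res[1])
    enlarged = (region_res[0] * eighth, region_res[1] * eighth)
    return fifth, second_mag_res, eighth, enlarged


# Fig. 23: first magnification resolution 600x300, presentation 400x200,
# screen 800x400, region 200x100 -> fifth 1.5, second 1200x600, eighth 6,
# enlarged image 1200x600.
print(mode4_enlargement((600, 300), (400, 200), (800, 400), (200, 100)))
```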
Alternatively, the playback terminal may display only the enlarged image of the target text region, without the presentation image.
Processing mode 5
A manual enlargement prompt is sent.
The playback terminal may send a manual enlargement prompt to the presentation terminal. The presentation terminal may display the prompt in a popup window, and the presenter, reminded by the prompt, can operate the presentation terminal to increase the font size of the text.
The above processing flow is described with the playback terminal as the executing subject; besides the playback terminal, the executing subject may also be the presentation terminal or a server. The processing flow in those cases is similar, except that, before determining the predicted display size of each character in the target text region on the playback terminal, the presentation terminal or the server needs to obtain the playback terminal's screen size parameters. After performing the enlargement, the presentation terminal may send the enlarged presentation image and/or the non-enlarged presentation image to the server, which then forwards them to the target playback terminal; after performing the enlargement, the server may send the enlarged presentation image and/or the non-enlarged presentation image to the target playback terminal directly.
Based on the same technical concept, an embodiment of the present application further provides an apparatus for displaying shared content. The apparatus may be applied to a conference system and, as shown in fig. 24, includes: a determining module 2401, configured to determine the expected display size, on the second terminal, of characters in a target text region of the shared content shared by the first terminal, and to determine an effective viewing size. The determining module may specifically implement the determination functions in steps 901-905, 1001-1004, 1201-1204, 1401-1404, 1601-1605, and 1801-1803 described above, as well as other implicit steps.

A processing module 2402, configured to magnify the shared content based on the predicted display size and the effective viewing size, and to display the magnified shared content on the second terminal. The processing module may specifically implement the processing functions in steps 905-906, 1004-1005, 1204-1205, 1404-1405, and 1804 described above, as well as other implicit steps.
In one possible implementation, the determining module 2401 is further configured to determine a target text region of the shared content based on a region identification model, where the target text region is a text region of a target application program.
In one possible implementation, the determining module 2401 is further configured to determine location range information of characters included in the shared content; and determining a target text area comprising the character in the shared content based on the position range information.
In a possible implementation, the determining module 2401 is configured to determine a plurality of text regions based on the location range information, where a first condition is satisfied between characters within a same text region and is not satisfied between characters in different text regions, the first condition being that the distance between characters is less than a distance threshold; and to determine one or more of the plurality of text regions as the target text region.

In another possible implementation, the determining module 2401 is configured to determine the plurality of text regions in the same manner, and to determine the text region where the cursor stays most frequently or for the longest time as the target text region.
In one possible implementation, the determining module 2401 is configured to determine a resolution of an image of each character in a target text region of the shared content; and determining the expected display size of the character on the second terminal based on the resolution of the image of the character, the resolution of the shared content and the screen size of the second terminal.
In one possible implementation, the determining module 2401 is configured to determine a distance between the second terminal and at least one viewer; an effective viewing size is determined based on a distance between the second terminal and at least one viewer.
In one possible implementation, the determining module 2401 is further configured to determine that the proportion, among all characters in the target text area, of characters whose expected display size is smaller than the effective viewing size is greater than a proportion threshold; or to determine that the projected display size of the characters in the target text area on the second terminal is less than the effective viewing size.
In one possible implementation, the processing module 2402 is configured to determine a first magnification based on the projected display size and the effective viewing size; and amplifying the image of the target text area in the shared content based on the first amplification ratio or the resolution of the shared content.
In one possible implementation, the processing module 2402 is configured to determine a first magnification based on the projected display size and the effective viewing size; determining a first amplification resolution corresponding to the resolution of the image of the target text region based on the first amplification proportion; determining a second amplified resolution based on the first amplified resolution, the resolution of the shared content, and the resolution of the second terminal screen; and amplifying the image of the target text area in the shared content based on the second amplification resolution, and setting a display mode of the amplified image to be a display mode for maintaining the image resolution.
It should be noted that the determining module 2401 and the processing module 2402 may be implemented by a processor, or by a processor in combination with a memory and a transceiver.
It should be noted that the apparatus for displaying shared content provided in the above embodiment is illustrated only by way of the division into the functional modules described above; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. In addition, the apparatus for displaying shared content provided in the above embodiment belongs to the same concept as the method embodiment for displaying shared content; its specific implementation process is detailed in the method embodiment and is not repeated here.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions that, when loaded and executed on a device, produce, in whole or in part, the processes or functions according to the embodiments of the present application. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a device, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a digital video disk (DVD)), or a semiconductor medium (e.g., a solid state disk).
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description is provided for the purpose of illustrating exemplary embodiments of the present application and is not intended to limit the scope of the present application, which is intended to cover any adaptations, alternatives, modifications, and variations that fall within the scope of the present application.

Claims (23)

1. A method for displaying shared content, applied to a conference system, the conference system including a first terminal and a second terminal, the method comprising:
determining the expected display size of characters in a target text region of the shared content shared by the first terminal on the second terminal;
determining an effective viewing size;
and enlarging the shared content based on the predicted display size and the effective viewing size, and displaying the enlarged shared content on the second terminal.
2. The method of claim 1, wherein, prior to determining the projected display size, the method further comprises:
determining a target text region of the shared content based on a region identification model, wherein the target text region is a text region of a target application program.
3. The method of claim 1, wherein, prior to determining the projected display size, the method further comprises:
determining position range information of characters included in the shared content;
and determining a target text area comprising the character in the shared content based on the position range information.
4. The method of claim 3, wherein the determining a target text region in the shared content that includes the character based on the location range information comprises:
determining a plurality of character areas based on the position range information, wherein the characters in the character areas meet a first condition, the characters in the different character areas do not meet the first condition, and the first condition is that the distance between the characters is smaller than a distance threshold value;
one or more of the plurality of text regions is determined as a target text region.
5. The method according to claim 3 or 4, wherein the determining a target text area including the character in the shared content based on the location range information includes:
Determining a plurality of character areas based on the position range information, wherein the characters in the character areas meet a first condition, the characters in the different character areas do not meet the first condition, and the first condition is that the distance between the characters is smaller than a distance threshold value;
and determining the text area with the highest cursor stay frequency or the longest cursor stay time as a target text area.
6. The method of any of claims 1-5, wherein determining the projected display size of the character in the target text region of the shared content on the second terminal comprises:
determining the resolution of the image of each character in the target text region of the shared content;
and determining the expected display size of the character on the second terminal based on the resolution of the image of the character, the resolution of the shared content and the screen size of the second terminal.
7. The method of any of claims 1-6, wherein determining the effective viewing size comprises:
determining a distance between the second terminal and at least one viewer;
an effective viewing size is determined based on a distance between the second terminal and at least one viewer.
8. The method of any of claims 1-7, wherein, prior to the amplifying of the shared content, the method further comprises:
determining that the proportion, among all the characters in the target text area, of characters whose expected display size is smaller than the effective viewing size is greater than a proportion threshold; or,
determining that the predicted display size of the characters in the target text area on the second terminal is smaller than the effective viewing size.
9. The method of any of claims 1-8, wherein the magnifying the shared content based on the projected display size and the effective viewing size comprises:
determining a first magnification scale based on the projected display size and the effective viewing size;
and amplifying the image of the target text area in the shared content based on the first amplification ratio or the resolution of the shared content.
10. The method of any of claims 1-9, wherein the magnifying the shared content based on the projected display size and the effective viewing size comprises:
determining a first magnification scale based on the projected display size and the effective viewing size;
Determining a first amplification resolution corresponding to the resolution of the image of the target text region based on the first amplification proportion;
determining a second amplified resolution based on the first amplified resolution, the resolution of the shared content, and the resolution of the second terminal screen;
and amplifying the image of the target text area in the shared content based on the second amplification resolution, and setting a display mode of the amplified image to be a display mode for maintaining the image resolution.
11. An apparatus for displaying shared content, applied to a conference system, the conference system including a first terminal and a second terminal, the apparatus comprising:
a determining module, configured to determine an expected display size of characters in a target text region of the shared content shared by the first terminal on the second terminal; determining an effective viewing size;
and a processing module for magnifying the shared content based on the predicted display size and the effective viewing size, and displaying the magnified shared content on the second terminal.
12. The apparatus of claim 11, wherein the determining module is further configured to:
determining a target text region of the shared content based on a region identification model, wherein the target text region is a text region of a target application program.
13. The apparatus of claim 11, wherein the determining module is further configured to:
determining position range information of characters included in the shared content;
and determining a target text area comprising the character in the shared content based on the position range information.
14. The apparatus of claim 13, wherein the determining module is configured to:
determining a plurality of character areas based on the position range information, wherein the characters in the character areas meet a first condition, the characters in the different character areas do not meet the first condition, and the first condition is that the distance between the characters is smaller than a distance threshold value;
one or more of the plurality of text regions is determined as a target text region.
15. The apparatus of claim 13, wherein the determining module is configured to:
determining a plurality of character areas based on the position range information, wherein the characters in the character areas meet a first condition, the characters in the different character areas do not meet the first condition, and the first condition is that the distance between the characters is smaller than a distance threshold value;
and determining the text area with the highest cursor stay frequency or the longest cursor stay time as a target text area.
16. The apparatus according to any one of claims 11-15, wherein the determining module is configured to:
determining the resolution of the image of each character in the target text region of the shared content;
and determining the expected display size of the character on the second terminal based on the resolution of the image of the character, the resolution of the shared content and the screen size of the second terminal.
17. The apparatus according to any one of claims 11-16, wherein the determining module is configured to:
determining a distance between the second terminal and at least one viewer;
an effective viewing size is determined based on a distance between the second terminal and at least one viewer.
18. The apparatus of any one of claims 11-17, wherein the determining module is further configured to:
determining that the proportion, among all the characters in the target text area, of characters whose expected display size is smaller than the effective viewing size is greater than a proportion threshold; or,
determining that the predicted display size of the characters in the target text area on the second terminal is smaller than the effective viewing size.
19. The apparatus of any one of claims 11-18, wherein the processing module is configured to:
determining a first magnification scale based on the projected display size and the effective viewing size;
and amplifying the image of the target text area in the shared content based on the first amplification ratio or the resolution of the shared content.
20. The apparatus of any one of claims 11-19, wherein the processing module is configured to:
determining a first magnification scale based on the projected display size and the effective viewing size;
determining a first amplification resolution corresponding to the resolution of the image of the target text region based on the first amplification proportion;
determining a second amplified resolution based on the first amplified resolution, the resolution of the shared content, and the resolution of the second terminal screen;
and amplifying the image of the target text area in the shared content based on the second amplification resolution, and setting a display mode of the amplified image to be a display mode for maintaining the image resolution.
21. A computer device comprising a memory and a processor, the memory for storing computer instructions; the processor is configured to execute the computer instructions stored in the memory to cause the computer device to perform the method of any one of the preceding claims 1 to 10.
22. A computer readable storage medium, characterized in that the computer readable storage medium stores computer program code which, when executed by a computer device, performs the method of any of the preceding claims 1 to 10.
23. A computer program product, characterized in that the computer program product comprises computer program code which, when executed by a computer device, performs the method of any of the preceding claims 1 to 10.
CN202210116843.2A 2021-12-08 2022-02-07 Method, apparatus, device and computer program product for displaying shared content Pending CN116248826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130154 WO2023103672A1 (en) 2021-12-08 2022-11-05 Method, apparatus and device for displaying shared content, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111494221 2021-12-08
CN2021114942215 2021-12-08

Publications (1)

Publication Number Publication Date
CN116248826A (en)

Family

ID=86633724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210116843.2A Pending CN116248826A (en) 2021-12-08 2022-02-07 Method, apparatus, device and computer program product for displaying shared content

Country Status (2)

Country Link
CN (1) CN116248826A (en)
WO (1) WO2023103672A1 (en)


Also Published As

Publication number Publication date
WO2023103672A1 (en) 2023-06-15


Legal Events

Date Code Title Description
PB01 Publication