CN116582440A - Visual image transmission processing method and device, electronic equipment and storage medium

Info

Publication number: CN116582440A
Application number: CN202310286884.0A
Authority: CN
Inventors: 江昌林; 段锴; 袁玉洁; 段秋梅
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-03-22
Filing date: 2023-03-22
Publication date: 2023-08-11

Abstract

The invention discloses a processing method and device for visual image transmission, an electronic device, and a storage medium, and relates to the field of financial technology or other related fields. The processing method comprises the following steps: receiving an image information file and an attention information file; determining a current throughput rate; when the current throughput rate is smaller than the rectangular block code rate, performing degradation processing, based on the play segment vector matrix, on the rectangular blocks indicated by vector values smaller than a preset threshold to obtain a processing result; and downloading all the rectangular blocks of the play segment based on the processing result and the storage path. The invention solves the technical problem in the related art that the image quality of the user's attention area cannot be effectively controlled according to the network condition.

Description

Visual image transmission processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of financial science and technology, and in particular, to a processing method and apparatus for visual image transmission, an electronic device, and a storage medium.

Background

Currently, Dynamic Adaptive Streaming over HTTP (DASH), a dynamic adaptive streaming media transmission protocol based on HTTP (Hyper Text Transfer Protocol), is one of the most effective solutions for ensuring high-quality and reliable transmission of streaming media. It can address problems such as limited bandwidth encountered during streaming media transmission. When the network bandwidth is insufficient, the bit rate of the video is reduced, which reduces the number of stalling events during video playback; when the network bandwidth is better, the bit rate of the transmitted video is increased. This ensures both reliable data transmission and a high-quality user experience.

However, although DASH can push a suitable image quality according to the network condition, when the network condition is poor the quality of the whole video is reduced, so the quality of the area the user wants to watch is reduced as well. This not only wastes network bandwidth but also degrades the user's viewing experience.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides a processing method and a device for visual image transmission, electronic equipment and a storage medium, which at least solve the technical problem that the image quality of a user concerned area cannot be effectively controlled according to the network condition in the related technology.

According to an aspect of an embodiment of the present invention, there is provided a processing method for visual image transmission, applied to a client, including: receiving an image information file and an attention information file, wherein the image information file at least comprises a storage path of the rectangular block and a code rate of the rectangular block, the attention information file at least comprises a play segment vector matrix, and the rectangular block is obtained by a server encoding an image; determining a current throughput rate; under the condition that the current throughput rate is smaller than the rectangular block code rate, performing degradation processing, based on the play segment vector matrix, on the rectangular block indicated by a vector value smaller than a preset threshold to obtain a processing result; and downloading all rectangular blocks of the playing fragment based on the processing result and the storage path.

Optionally, before receiving the image information file and the attention information file, the method further includes: the server receives an image transmission request sent by the client, wherein the image transmission request at least comprises: an image identifier to be transmitted; the server determines a target image corresponding to the image identifier to be transmitted; the server encodes the target image to obtain an encoded image and the image information file; and the server obtains the attention information file based on all the rectangular blocks of each playing fragment.

Optionally, the step of encoding the target image by the server includes: the server divides each image frame in the target image into a plurality of rectangular blocks; the server divides the target image into a plurality of playing fragments, wherein each playing fragment corresponds to a plurality of rectangular blocks; the server encodes each rectangular block in the playing fragment into a plurality of image blocks, wherein the hierarchy type of the image blocks comprises: a base layer for decoding an image with a minimum quality, and an enhancement layer for enhancing the quality of the image.

Optionally, the step of obtaining the attention information file by the server based on all the rectangular blocks of each playing segment includes: the server inputs all the rectangular blocks of the playing fragment to a preset attention model to obtain attention values of each rectangular block; the server generates the play segment vector matrix based on all the attention values, wherein each vector value in the play segment vector matrix corresponds to the attention value one by one; and the server generates the attention information file based on the play segment vector matrix.
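The generation of the play segment vector matrix described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the invention: the function name `build_attention_matrix`, the row-major grid layout of the matrix, and the stand-in `model` callable are all hypothetical, since the text does not specify the interface of the preset attention model or the layout of the matrix.

```python
from typing import Callable, List, Sequence, TypeVar

Tile = TypeVar("Tile")


def build_attention_matrix(
    tiles: Sequence[Tile],
    rows: int,
    cols: int,
    model: Callable[[Tile], float],
) -> List[List[float]]:
    """Score every tile of one play segment and arrange the scores in a grid.

    `tiles` is assumed to be ordered left to right, top to bottom.
    `model` is a stand-in for the preset attention model (hypothetical API).
    """
    if len(tiles) != rows * cols:
        raise ValueError("expected rows * cols tiles for one play segment")
    # One attention value per tile, so vector values map one-to-one to tiles.
    values = [model(tile) for tile in tiles]
    # Reshape the flat list into a rows x cols matrix (row-major, assumed).
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]
```

Here any callable that returns a number per tile can play the role of the model, which keeps the sketch independent of how the attention model is actually trained.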

Optionally, before the server inputs all the rectangular blocks of the play segment into a preset attention model, the method further includes: the server acquires a plurality of marked rectangular block data to obtain training data; the server side adopts the training data to train an initial attention model, and obtains the preset attention model under the condition that training is completed.

Optionally, before determining the current throughput rate, the method further includes: downloading a base layer image block and an enhancement layer image block of a preset rectangular block in a preset playing fragment; downloading the base layer image blocks and the enhancement layer image blocks of the residual rectangular blocks in the preset playing segment under the condition that the length of a preset storage area at the current downloading time is greater than or equal to the length of a rectangular block buffer area, and updating the current throughput rate, wherein the length of the rectangular block buffer area is the sum of the length of the base layer buffer area and the length of the enhancement layer buffer area of all the residual rectangular blocks in the preset playing segment, and the residual rectangular blocks are rectangular blocks except the preset rectangular blocks in the preset playing segment; and downloading the base layer image blocks of the residual rectangular blocks in the preset playing fragments and updating the current throughput rate under the condition that the length of the preset storage area at the current downloading time is smaller than the length of the rectangular block buffer area and is larger than or equal to the length of the base layer buffer area of all the residual rectangular blocks.
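The buffer comparison in this optional step can be illustrated with a short Python sketch. The function name `choose_initial_layers` and the string labels are hypothetical. The case in which the preset storage area is shorter than the base layer buffer length is not described in the passage above, so the sketch reports it as unspecified instead of guessing a behavior.

```python
def choose_initial_layers(
    storage_len: float,
    base_buf_len: float,
    enh_buf_len: float,
) -> str:
    """Decide which layers of the remaining tiles to download.

    storage_len:  length of the preset storage area at the current download time
    base_buf_len: base layer buffer length of all remaining tiles
    enh_buf_len:  enhancement layer buffer length of all remaining tiles
    """
    # The rectangular block buffer length is the sum of both layer buffers.
    tile_buf_len = base_buf_len + enh_buf_len
    if storage_len >= tile_buf_len:
        return "base+enhancement"
    if storage_len >= base_buf_len:
        return "base"
    # Not covered by the passage above; left unspecified on purpose.
    return "unspecified"
```

In either of the two described branches, the client would also update its current throughput rate after the download completes.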

Optionally, based on the processing result and the storage path, the step of downloading all the rectangular blocks of the play segment includes: determining a downloading vector value of each rectangular block based on a target vector matrix in the processing result; downloading the base layer image block of the rectangular block according to the storage path under the condition that the download vector value is a first preset value; and downloading the base layer image block and the enhancement layer image block of the rectangular block according to the storage path under the condition that the download vector value is a second preset value.
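The mapping from download vector values to the image blocks that are fetched can be sketched in Python as below. The concrete values 0 and 1 for the first and second preset values, the function name `chunks_to_download`, and the shape of the `storage_paths` dictionary are assumptions made for illustration; the text only states that the two preset values exist.

```python
from typing import Dict, List, Tuple

BASE_ONLY = 0             # assumed "first preset value"
BASE_AND_ENHANCEMENT = 1  # assumed "second preset value"


def chunks_to_download(
    target_matrix: List[List[int]],
    storage_paths: Dict[Tuple[int, int], Dict[str, str]],
) -> List[str]:
    """Return the storage paths to fetch for one play segment.

    storage_paths maps (row, col) of a tile to its per-layer paths,
    e.g. {"base": "...", "enhancement": "..."} (hypothetical layout).
    """
    paths: List[str] = []
    for r, row in enumerate(target_matrix):
        for c, value in enumerate(row):
            tile_paths = storage_paths[(r, c)]
            if value == BASE_ONLY:
                paths.append(tile_paths["base"])
            elif value == BASE_AND_ENHANCEMENT:
                # The base layer is always fetched before the enhancement layer.
                paths.append(tile_paths["base"])
                paths.append(tile_paths["enhancement"])
            else:
                raise ValueError(f"unexpected download vector value: {value}")
    return paths
```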

Optionally, after downloading all the rectangular blocks of the playing fragment based on the processing result and the storage path, the method further includes: before playing a target playing fragment, checking whether a base layer image block of the target playing fragment exists or not to obtain a first checking result; checking whether the enhancement layer image block of the target playing fragment exists or not under the condition that the first checking result indicates that the base layer image block of the target playing fragment exists, and obtaining a second checking result; and decoding each rectangular block in the target playing fragment when the second checking result indicates that the enhancement layer image block of the target playing fragment exists, and playing the target playing fragment when decoding is completed.

Optionally, after checking whether the base layer image block of the target play segment exists, the method further includes: suspending playing the target playing fragment under the condition that the first checking result indicates that the base layer image block of the target playing fragment does not exist; and downloading the base layer image block of the target playing fragment.
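The two checks performed before playback can be summarized in a small Python sketch. The function name `playback_action` and the returned labels are hypothetical. The passages above describe what happens when the base layer is missing and when both layers are present; playing at base quality when only the base layer is present is an assumption, based on the statement that the base layer suffices to decode the image at minimum quality.

```python
def playback_action(has_base: bool, has_enhancement: bool) -> str:
    """Decide what the client does before playing a target play segment."""
    if not has_base:
        # First check failed: pause playback and fetch the base layer.
        return "pause_and_download_base"
    if has_enhancement:
        # Both checks passed: decode every tile, then play the segment.
        return "decode_and_play"
    # ASSUMPTION (not stated above): fall back to base-layer-only playback.
    return "decode_base_only_and_play"
```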

According to another aspect of the embodiment of the present invention, there is also provided a processing apparatus for visual image transmission, applied to a client, including: a receiving unit, used for receiving the image information file and the attention information file, wherein the image information file at least comprises a storage path of the rectangular block and a code rate of the rectangular block, the attention information file at least comprises a play segment vector matrix, and the rectangular block is obtained by a server encoding an image; a determining unit, used for determining a current throughput rate; a processing unit, used for performing degradation processing, based on the play segment vector matrix, on the rectangular block indicated by a vector value smaller than a preset threshold under the condition that the current throughput rate is smaller than the rectangular block code rate, to obtain a processing result; and a downloading unit, used for downloading all rectangular blocks of the playing fragment based on the processing result and the storage path.

Optionally, the processing device further includes: the first receiving module is configured to, before receiving the image information file and the attention information file, receive, by the server, an image transmission request sent by the client, where the image transmission request at least includes: an image identifier to be transmitted; the first determining module is used for determining a target image corresponding to the image identifier to be transmitted by the server; the first coding module is used for coding the target image by the server to obtain a coded image and the image information file; and the first output module is used for the server to obtain the attention information file based on all the rectangular blocks of each playing fragment.

Optionally, the first determining module includes: the first dividing sub-module is used for controlling the server to divide each image frame in the target image into a plurality of rectangular blocks; the second dividing sub-module is used for controlling the server to divide the target image into a plurality of playing fragments, wherein each playing fragment corresponds to a plurality of rectangular blocks; the first encoding submodule is used for controlling the server to encode each rectangular block in the playing fragment into a plurality of image blocks, wherein the hierarchy type of the image blocks comprises: a base layer for decoding an image with a minimum quality, and an enhancement layer for enhancing the quality of the image.

Optionally, the first output module includes: the first input sub-module is used for controlling the server to input all the rectangular blocks of the playing fragment into a preset attention model to obtain attention values of each rectangular block; the first generation sub-module is used for controlling the server to generate the play segment vector matrix based on all the attention values, wherein each vector value in the play segment vector matrix corresponds to the attention value one by one; and the second generation sub-module is used for controlling the server to generate the attention information file based on the play segment vector matrix.

Optionally, the processing device further includes: the first acquisition module is used for controlling the server to acquire a plurality of marked rectangular block data before the server inputs all the rectangular blocks of the playing fragment to a preset attention model to obtain training data; the first training module is used for controlling the server to train the initial attention model by adopting the training data, and obtaining the preset attention model under the condition that training is completed.

Optionally, the processing device further includes: the first downloading module is used for downloading the base layer image block and the enhancement layer image block of the preset rectangular block in the preset playing segment before determining the current throughput rate; the second downloading module is used for downloading the base layer image block and the enhancement layer image block of the residual rectangular blocks in the preset playing segment under the condition that the length of a preset storage area at the current downloading time is greater than or equal to the length of a rectangular block buffer area, and updating the current throughput rate, wherein the length of the rectangular block buffer area is the sum of the length of the base layer buffer area and the length of the enhancement layer buffer area of all the residual rectangular blocks in the preset playing segment, and the residual rectangular blocks are rectangular blocks except the preset rectangular blocks in the preset playing segment; and the third downloading module is used for downloading the base layer image blocks of the residual rectangular blocks in the preset playing segment and updating the current throughput rate under the condition that the length of the preset storage area at the current downloading time is smaller than the length of the rectangular block buffer area and is larger than or equal to the length of the base layer buffer area of all the residual rectangular blocks.

Optionally, the downloading unit includes: the second determining module is used for determining a downloading vector value of each rectangular block based on a target vector matrix in the processing result; a fourth downloading module, configured to download, according to the storage path, the base layer image block of the rectangular block when the download vector value is a first preset value; and a fifth downloading module, configured to download the base layer image block and the enhancement layer image block of the rectangular block according to the storage path when the download vector value is a second preset value.

Optionally, the processing device further includes: the first checking module is used for checking whether the base layer image block of the target playing fragment exists or not before the target playing fragment is played after downloading all rectangular blocks of the playing fragment based on the processing result and the storage path, and obtaining a first checking result; the second checking module is used for checking whether the enhancement layer image block of the target playing fragment exists or not to obtain a second checking result when the first checking result indicates that the base layer image block of the target playing fragment exists; and the first decoding module is used for decoding each rectangular block in the target playing fragment when the second checking result indicates that the enhancement layer image block of the target playing fragment exists, and playing the target playing fragment when the decoding is completed.

Optionally, the processing device further includes: the first pause module is used for pausing playing the target playing fragment under the condition that the first checking result indicates that the base layer image block of the target playing fragment does not exist after checking whether the base layer image block of the target playing fragment exists or not to obtain a first checking result; and a sixth downloading module, configured to download the base layer image block of the target playing segment.

According to another aspect of the embodiment of the present invention, there is further provided a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the device where the computer-readable storage medium is located is controlled to execute the above processing method for visual image transmission.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the processing method for visual image transmission described above.

In the present disclosure, an image information file and an attention information file are received; the current throughput rate is determined; when the current throughput rate is smaller than the rectangular block code rate, the rectangular blocks indicated by vector values smaller than a preset threshold are degraded based on the play segment vector matrix to obtain a processing result; and all rectangular blocks of the play segment are downloaded based on the processing result and the storage path. That is, the client may first determine the current throughput rate and compare it with the rectangular block code rate in the image information file sent by the server. If the current throughput rate is smaller than the rectangular block code rate, the client degrades the areas the user does not pay attention to (i.e., the rectangular blocks indicated by vector values smaller than the preset threshold) according to the play segment vector matrix in the attention information file sent by the server, and then downloads all the rectangular blocks of the play segment for playing according to the processing result and the storage path of the rectangular blocks in the image information file. In this way, the image quality of the user's attention area can be controlled according to the network condition and the user's viewing experience is improved, which solves the technical problem in the related art that the image quality of the user's attention area cannot be effectively controlled according to the network condition.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a flow chart of an alternative visual image transmission processing method according to an embodiment of the application;

FIG. 2 is a schematic diagram of an alternative visual image transmission optimization architecture according to an embodiment of the application;

FIG. 3 is a schematic diagram of an alternative Tile encoding result according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an alternative SVC encoding result according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an alternative buffer length according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an alternative visual image transmission processing device according to an embodiment of the application;

FIG. 7 is a block diagram of a hardware structure of an electronic device (or mobile device) for a processing method for visual image transmission according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

To facilitate an understanding of the invention by those skilled in the art, some terms or nouns involved in the various embodiments of the invention are explained below:

Tile coding: also called block coding; a planar video is divided by a grid into sub-rectangular blocks of equal size, each sub-rectangular block is called a tile, and each tile can be decoded and played independently.

SVC coding: i.e., scalable video coding, encodes video into multiple quality levels with dependencies.

It should be noted that, the processing method and the device for transmitting the visual image in the present disclosure may be used in the field of financial technology for processing the visual image transmission, and may also be used in any field other than the field of financial technology for processing the visual image transmission.

It should be noted that, related information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions, and be provided with corresponding operation entries for the user to select authorization or rejection. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.

The following embodiments of the present invention are applicable to a variety of systems/applications/devices for processing transmission of visual images. The invention provides a method for increasing the image quality of a user attention area according to the network condition, which can overcome the defect of lower viewing experience of a user due to lower resolution of transmitted video under the condition of insufficient network transmission capacity, and improves the viewing experience of the user.

The present invention will be described in detail with reference to the following examples.

Example 1

According to an embodiment of the present invention, an embodiment of a processing method for visual image transmission is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown herein.

FIG. 1 is a flowchart of an alternative processing method for visual image transmission according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S101, receiving an image information file and an attention information file, where the image information file at least includes the storage path of the rectangular block and the code rate of the rectangular block, the attention information file at least includes a play segment vector matrix, and the rectangular block is obtained by the server encoding the image.

Step S102, determining the current throughput rate.

Step S103, under the condition that the current throughput rate is smaller than the code rate of the rectangular block, based on the play segment vector matrix, performing degradation processing on the rectangular block indicated by the vector value smaller than the preset threshold value to obtain a processing result.

Step S104, based on the processing result and the storage path, downloading all rectangular blocks of the playing fragment.

Through the above steps, the image information file and the attention information file are received; the current throughput rate is determined; when the current throughput rate is smaller than the rectangular block code rate, the rectangular blocks indicated by vector values smaller than the preset threshold are degraded based on the play segment vector matrix to obtain a processing result; and all the rectangular blocks of the play segment are downloaded based on the processing result and the storage path. In the embodiment of the invention, the client may first determine the current throughput rate and compare it with the rectangular block code rate in the image information file sent by the server. If the current throughput rate is smaller than the rectangular block code rate, the client degrades the areas the user does not pay attention to (i.e., the rectangular blocks indicated by vector values smaller than the preset threshold) according to the play segment vector matrix in the attention information file sent by the server, and then downloads all the rectangular blocks of the play segment for playing according to the processing result and the storage path of the rectangular blocks in the image information file. In this way, the image quality of the user's attention area can be controlled according to the network condition and the user's viewing experience is improved, which solves the technical problem in the related art that the image quality of the user's attention area cannot be effectively controlled according to the network condition.
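The decision made in steps S102 to S104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `plan_downloads` is hypothetical, the rectangular block code rate is treated as a single number, "degradation processing" is assumed to mean downloading only the base layer of a tile, and the values 0 and 1 are assumed encodings of the resulting per-tile decision.

```python
from typing import List

BASE_ONLY = 0             # assumed encoding: degraded tile, base layer only
BASE_AND_ENHANCEMENT = 1  # assumed encoding: full quality tile


def plan_downloads(
    attention: List[List[float]],
    throughput: float,
    tile_bitrate: float,
    threshold: float,
) -> List[List[int]]:
    """Build a per-tile download plan for one play segment.

    attention:    play segment vector matrix (one attention value per tile)
    throughput:   current throughput rate measured by the client
    tile_bitrate: rectangular block code rate from the image information file
    threshold:    preset attention threshold
    """
    # Degradation is only triggered when the network cannot sustain the rate.
    degrade = throughput < tile_bitrate
    plan: List[List[int]] = []
    for row in attention:
        plan.append([
            BASE_ONLY if degrade and value < threshold else BASE_AND_ENHANCEMENT
            for value in row
        ])
    return plan
```

With an attention matrix of `[[0.9, 0.2], [0.4, 0.7]]` and a threshold of 0.5, only the two low-attention tiles are degraded when the throughput falls below the code rate, and no tile is degraded otherwise.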

Embodiments of the present invention will be described in detail with reference to the following steps. The steps described below are applicable to the client.

This embodiment is based on a client-server (C/S) model: one system is designed for the client and one for the server, so that suitable video is pushed to the client according to network conditions and the user obtains a better viewing experience.

FIG. 2 is a schematic diagram of an alternative visual image transmission optimization architecture according to an embodiment of the present invention. As shown in FIG. 2, a video server (i.e., the server) is connected to a client via a network. The server completes video encoding: it first performs tile encoding on the video to obtain a plurality of rectangular blocks, then divides the video into a plurality of playing fragments (seg_0, ..., seg_(n-1)), and then performs SVC encoding on each rectangular block in each playing fragment, encoding each rectangular block into video blocks at several layers, such as the three layers L0, L1, and L2. The server then stores the encoded video file (i.e., Video), the corresponding MPD file (which describes information about the streaming media video slices on the DASH server side), and a video attention area file. The main task of the client is to initialize a video request, adjust the quality of the user attention area according to the video attention area file, and adjust the quality of the downloaded video file and the video playing quality according to changes in video download throughput. Specifically, the client receives the MPD file and the video attention area file sent by the server through a network interface; the bandwidth prediction module, the attention area selection module, and the buffer state of the buffer control the adaptive logic module to perform adaptive adjustment; the resulting adjustment decision invokes the network interface to download the corresponding video blocks (chunks) to the buffer; and finally the playing module plays the video.

In an alternative embodiment, before receiving the image information file and the attention information file, the method further includes: the server receives an image transmission request sent by the client, wherein the image transmission request at least comprises an image identifier to be transmitted; the server determines a target image corresponding to the image identifier to be transmitted; the server encodes the target image to obtain an encoded image and an image information file; and the server obtains the attention information file based on all the rectangular blocks of each playing segment.

In the embodiment of the present invention, the server may first receive an image transmission request sent by the client, where the request includes an image identifier to be transmitted (i.e., a unique identifier of an image). The server then determines the target image corresponding to that identifier and encodes it to obtain an encoded image and a corresponding image information file (i.e., an MPD file, which records image information generated during encoding, for example image attribute information and the image storage path). The server may further obtain an attention information file from all the rectangular blocks of each play segment. The attention information file includes the play segment vector matrix corresponding to each play segment: an attention value is determined for each rectangular block according to how much content the block contains, and a higher attention value indicates that the block contains more content and is more worthy of the user's attention.

Optionally, the step of encoding the target image by the server includes: the server divides each image frame in the target image into a plurality of rectangular blocks; the server divides the target image into a plurality of playing segments, wherein each playing segment corresponds to a plurality of rectangular blocks; and the server encodes each rectangular block in the playing segment into a plurality of image blocks, wherein the hierarchy types of the image blocks include a base layer and an enhancement layer, the base layer being used for decoding the image at the lowest quality and the enhancement layer being used for enhancing the quality of the image.

In the embodiment of the present invention, the server may first encode the target image, specifically as follows. In the spatial dimension, each video frame (i.e., image frame) may be divided into a plurality of equal-sized tiles (i.e., the server divides each image frame in the target image into a plurality of rectangular blocks), and the tiles are numbered in left-to-right, top-to-bottom order. In the time dimension, the video may be divided into a plurality of playing segments of equal duration (i.e., the server divides the target image into a plurality of playing segments, and each playing segment corresponds to a plurality of rectangular blocks). In the hierarchical dimension, each tile in each segment is encoded into multiple video blocks (chunks) (i.e., the server encodes each rectangular block in the play segment into multiple image blocks). The hierarchy types of the video blocks (i.e., image blocks) include a base layer and an enhancement layer: the base layer is necessary for decoding the picture at the lowest quality, the enhancement layer is used for enhancing the quality of the picture, and the enhancement layer can be used only if the base layer is present.

In this embodiment, the chunk is the minimum unit for the client to request video playback.

In this embodiment, a two-dimensional planar video may be divided into M parts by longitude and N parts by latitude, so that there are M×N tiles in total. FIG. 3 is a schematic diagram of an alternative tile encoding result according to an embodiment of the present invention. As shown in FIG. 3, a two-dimensional planar video is divided into 6×4 tiles, which are numbered in left-to-right, top-to-bottom order, i.e., tile_1 to tile_24.
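The numbering rule above can be illustrated with a minimal sketch. It assumes that the 6×4 example means 6 columns (divisions by longitude) and 4 rows (divisions by latitude); the function names `tile_number` and `tile_position` are hypothetical and are not taken from the embodiment.

```python
def tile_number(row: int, col: int, cols: int) -> int:
    """Return the 1-based tile number for a 0-based (row, col) position,
    counting left to right and then top to bottom."""
    return row * cols + col + 1


def tile_position(number: int, cols: int) -> tuple[int, int]:
    """Inverse mapping: 1-based tile number back to a 0-based (row, col)."""
    return divmod(number - 1, cols)


# Assumed layout for the FIG. 3 example: 6 columns and 4 rows, 24 tiles.
COLS, ROWS = 6, 4
grid = [[tile_number(r, c, COLS) for c in range(COLS)] for r in range(ROWS)]

for row in grid:
    print(" ".join(f"tile_{n}" for n in row))
```

Under this assumption, the top-left block is tile_1 and the bottom-right block is tile_24.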

In this embodiment, the encoded stream contains one mandatory base layer (Base Layer, BL) and a plurality of optional enhancement layers (Enhancement Layer, EL). Fig. 4 is a schematic diagram of an alternative SVC encoding result according to an embodiment of the present invention. As shown in fig. 4, the video is encoded into 3 layers (i.e., base layer BL0, enhancement layer EL1 and enhancement layer EL2). BL is an independent unit from which the video can be decoded at the lowest quality, and EL is used to enhance the quality of the video. The video layers are highly interdependent: BL is necessary to decode any video; EL1 can be used only when the corresponding BL already exists; likewise, EL2 can be used only when both BL and EL1 already exist, and so on.
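The layer dependency described above can be sketched as follows. This is only an illustration: the layer labels are taken from fig. 4, while the function name `usable_layers` and the representation of downloaded layers as a set of strings are assumptions.

```python
# Decoding order taken from the description: BL0, then EL1, then EL2.
LAYER_ORDER = ["BL0", "EL1", "EL2"]


def usable_layers(downloaded: set[str]) -> list[str]:
    """Return the layers that can actually be decoded.

    A layer is usable only if every lower layer is also present, so the
    scan stops at the first missing layer.
    """
    usable = []
    for layer in LAYER_ORDER:
        if layer not in downloaded:
            break
        usable.append(layer)
    return usable


print(usable_layers({"BL0", "EL1", "EL2"}))  # all three layers are usable
print(usable_layers({"BL0", "EL2"}))         # EL2 is unusable without EL1
print(usable_layers({"EL1", "EL2"}))         # nothing is usable without BL0
```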

Optionally, the step of obtaining the attention information file by the server based on all the rectangular blocks of each playing segment includes: the server inputs all rectangular blocks of the playing segment into a preset attention model to obtain an attention value of each rectangular block; the server generates a play segment vector matrix based on all the attention values, wherein the vector values in the play segment vector matrix correspond to the attention values one to one; and the server generates the attention information file based on the play segment vector matrix.

In the embodiment of the present invention, the server may generate an attention information file (i.e., a video attention area file), specifically as follows. The rectangular blocks of each divided segment can be analyzed. If a rectangular block contains more content, the client should tend to download its highest-layer video block, and the attention value can be set to the highest value (e.g., 2). If a rectangular block contains less content, or is near rectangular blocks with more content, the video block of a lower enhancement layer can be downloaded; to prevent the video quality from dropping too sharply, the attention value can be set to a relatively high value (e.g., 1). If a rectangular block has no content, the attention value can be set to the lowest value (e.g., 0), so that only the base layer video block needs to be downloaded (i.e., the server inputs all the rectangular blocks of the playing segment into the preset attention model to obtain the attention value of each rectangular block). Then, a vector set is generated from the attention values (i.e., the server generates a play segment vector matrix based on all the attention values, where the vector values in the play segment vector matrix correspond to the attention values one to one); for example, the play segment vector matrix is:

Then, the attention information file can be generated according to the play segment vector matrix.
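As a rough sketch of the one-to-one correspondence between attention values and vector values, the per-tile attention values can be arranged in tile order into a matrix. The two-dimensional layout, the function name `build_segment_matrix` and the sample values are all assumptions made for illustration; the sample values are not the matrix of the embodiment.

```python
def build_segment_matrix(attention_values: list[int], cols: int) -> list[list[int]]:
    """Arrange per-tile attention values (in tile order) into rows of `cols` entries."""
    if cols <= 0 or len(attention_values) % cols != 0:
        raise ValueError("number of attention values must be a multiple of cols")
    return [attention_values[i:i + cols] for i in range(0, len(attention_values), cols)]


# Hypothetical attention values for a 3-column, 2-row tiling (illustrative only).
sample_values = [2, 1, 0, 0, 1, 2]
matrix = build_segment_matrix(sample_values, cols=3)
print(matrix)
```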

Optionally, before the server inputs all the rectangular blocks of the play segment into the preset attention model, the method further includes: the server obtains a plurality of marked rectangular block data to obtain training data; and the server trains an initial attention model with the training data, and obtains the preset attention model when training is completed.

In the embodiment of the invention, the preset attention model is used for identifying the content contained in a rectangular block and outputting the corresponding attention value. Therefore, in this embodiment the preset attention model may be obtained by training with training data, specifically: a plurality of marked rectangular block data are obtained as training data (i.e., rectangular blocks with pre-marked information, where the marked information is an attention value and a higher attention value indicates that the rectangular block contains more content), and an initial attention model is trained with the training data until the model converges, yielding the final preset attention model (i.e., the preset attention model is obtained when training is completed).

In the embodiment of the present invention, the client may first receive the image information file and the attention information file sent by the server. The image information file includes information such as the storage path of each rectangular block (i.e., the storage position of the rectangular block) and the rectangular block code rate (i.e., the rate at which the rectangular block is downloaded), where the rectangular blocks are obtained by the server encoding the image. The attention information file includes information such as the play segment vector matrix (i.e., the attention information file indicates whether a rectangular block belongs to the user attention area).

Optionally, before determining the current throughput rate, the method further includes: downloading a base layer image block and an enhancement layer image block of a preset rectangular block in a preset playing fragment; downloading the base layer image blocks and the enhancement layer image blocks of the residual rectangular blocks in the preset playing segment under the condition that the length of a preset storage area at the current downloading time is greater than or equal to the length of a rectangular block buffer area, and updating the current throughput rate, wherein the length of the rectangular block buffer area is the sum of the length of the base layer buffer area and the length of the enhancement layer buffer area of all the residual rectangular blocks in the preset playing segment, and the residual rectangular blocks are rectangular blocks except the preset rectangular blocks in the preset playing segment; and downloading the base layer image blocks of the residual rectangular blocks in the preset playing segment and updating the current throughput rate under the condition that the length of the preset storage area at the current downloading time is smaller than the length of the rectangular block buffer area and is larger than or equal to the length of the base layer buffer areas of all the residual rectangular blocks.

In the embodiment of the invention, the client needs to perform an initialization download. In order to estimate the download bandwidth, the client first downloads the base layer and enhancement layer video blocks of the first tile (i.e., the preset rectangular block, which is the first rectangular block of the play segment) of the first play segment (i.e., the preset play segment, which is the first segment of the image to be played) (i.e., downloads the base layer image block and the enhancement layer image block of the preset rectangular block in the preset play segment), so that high-quality video can be played from the start of playback. When the buffer length at the time of the initialization download (i.e., the preset storage area length at the current download time, where the preset storage area stores the image blocks downloaded by the client) is greater than or equal to the rectangular block buffer length (the sum of the base layer buffer length and the enhancement layer buffer length of all the remaining rectangular blocks in the preset play segment, where the remaining rectangular blocks are the rectangular blocks other than the preset rectangular block in the preset play segment), the client downloads the video blocks of all layers of all the remaining tiles, places them into the buffer and updates the average throughput (i.e., when the preset storage area length at the current download time is greater than or equal to the rectangular block buffer length, the base layer image blocks and the enhancement layer image blocks of the remaining rectangular blocks in the preset play segment are downloaded, and the current throughput rate is updated).
When the buffer length at the time of the initialization download is smaller than the rectangular block buffer length and greater than or equal to the base layer buffer length of all the remaining rectangular blocks, the client downloads only the base layer video blocks of all the remaining tiles, places them into the buffer and updates the average throughput (i.e., when the preset storage area length at the current download time is smaller than the rectangular block buffer length and greater than or equal to the base layer buffer length of all the remaining rectangular blocks, the base layer image blocks of the remaining rectangular blocks in the preset play segment are downloaded, and the current throughput rate is updated).
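The two conditions of the initialization download can be sketched as a small decision function. The function name, the parameter names and the returned labels are hypothetical. The embodiment does not state what happens when the storage area is shorter than the base layer buffer length, so the `"none"` branch below is an assumption.

```python
def initial_download_plan(storage_len: float,
                          base_len_total: float,
                          enh_len_total: float) -> str:
    """Decide what to download for the remaining tiles of the first segment.

    storage_len:     preset storage area length at the current download time
    base_len_total:  base layer buffer length of all remaining tiles
    enh_len_total:   enhancement layer buffer length of all remaining tiles
    """
    # Rectangular block buffer length = base layer part + enhancement layer part.
    tile_buffer_len = base_len_total + enh_len_total
    if storage_len >= tile_buffer_len:
        return "base+enhancement"
    if storage_len >= base_len_total:
        return "base"
    return "none"  # assumption: this case is not specified in the description


print(initial_download_plan(storage_len=10, base_len_total=5, enh_len_total=4))
print(initial_download_plan(storage_len=6, base_len_total=5, enh_len_total=4))
```

In either of the first two cases, the client would also update its throughput estimate once the download finishes.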

In the embodiment of the present invention, the buffer length of the base layer is greater than the buffer length of the enhancement layer for the buffer, because the base layer is necessary for playing video, and the enhancement layer is to enhance the quality of video. Fig. 5 is a schematic diagram of an alternative buffer length according to an embodiment of the present invention, and as shown in fig. 5, the base layer (BL 0) buffer length may be set to 5, and the enhancement layer (i.e., EL1 and EL 2) buffer length may be set to 2.

Step S102, determining the current throughput rate.

In the embodiment of the invention, the current throughput rate can be calculated according to the file size of the downloaded rectangular block and the downloading time.
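A minimal sketch of this calculation is given below. It assumes that the throughput is expressed in bits per second and averaged over all blocks downloaded so far; the embodiment only states that file size and download time are used, so the class name `ThroughputEstimator` and the cumulative averaging are illustrative assumptions.

```python
class ThroughputEstimator:
    """Tracks the average download throughput over all blocks seen so far."""

    def __init__(self) -> None:
        self.total_bytes = 0
        self.total_seconds = 0.0

    def update(self, block_bytes: int, seconds: float) -> float:
        """Record one finished download and return the updated rate in bit/s."""
        if block_bytes < 0 or seconds <= 0:
            raise ValueError("block_bytes must be >= 0 and seconds must be > 0")
        self.total_bytes += block_bytes
        self.total_seconds += seconds
        return self.rate_bps()

    def rate_bps(self) -> float:
        """Average throughput in bits per second (0.0 before any download)."""
        if self.total_seconds == 0:
            return 0.0
        return self.total_bytes * 8 / self.total_seconds


estimator = ThroughputEstimator()
first_rate = estimator.update(block_bytes=1_000_000, seconds=2.0)
second_rate = estimator.update(block_bytes=500_000, seconds=1.0)
print(first_rate, second_rate)
```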

In the embodiment of the invention, the client can download adaptively, specifically: the client may take the information in the MPD file (i.e., the rectangular block code rate in the image information file) and compare it with the currently calculated download throughput (i.e., the current throughput rate). If the current download throughput is smaller than the code rate of the video, the client degrades the unimportant video areas (i.e., the rectangular blocks whose vector values in the play segment vector matrix are smaller than the preset threshold) by lowering the enhancement layer level or downloading only the base layer, so that the download throughput is no longer smaller than the code rate of the video being downloaded (i.e., in the case that the current throughput rate is smaller than the rectangular block code rate, the rectangular blocks indicated by vector values smaller than the preset threshold are degraded based on the play segment vector matrix to obtain a processing result, where the processing result records the layers of image blocks that can be downloaded for each rectangular block). During this period, to mitigate the problem of the predicted bandwidth being too low, the client also attempts to download higher-layer video blocks at intervals.
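The degradation step can be sketched as follows. The embodiment allows either lowering the enhancement layer level or downloading only the base layer; this sketch assumes the former, lowering each affected value by one level and never below 0. The function name `degrade` and the parameter names are hypothetical.

```python
def degrade(matrix: list[list[int]],
            threshold: int,
            current_throughput: float,
            tile_bitrate: float) -> list[list[int]]:
    """Return the target vector matrix after optional degradation.

    If the throughput already covers the code rate, the matrix is copied
    unchanged. Otherwise every value below `threshold` (a tile outside the
    user attention area) is lowered by one level, with 0 as the floor.
    """
    if current_throughput >= tile_bitrate:
        return [row[:] for row in matrix]
    return [[max(v - 1, 0) if v < threshold else v for v in row]
            for row in matrix]


# Hypothetical play segment vector matrix, used for illustration only.
segment_matrix = [[2, 1, 0], [1, 2, 2]]
print(degrade(segment_matrix, threshold=2, current_throughput=3e6, tile_bitrate=5e6))
print(degrade(segment_matrix, threshold=2, current_throughput=6e6, tile_bitrate=5e6))
```

Tiles at or above the threshold keep their value, so the quality of the user attention area is preserved while the total code rate drops.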

Optionally, based on the processing result and the storage path, the step of downloading all the rectangular blocks of the play section includes: determining a download vector value of each rectangular block based on a target vector matrix in the processing result; under the condition that the download vector value is a first preset value, downloading the base layer image block of the rectangular block according to the storage path; and under the condition that the download vector value is a second preset value, downloading the base layer image block and the enhancement layer image block of the rectangular block according to the storage path.

In the embodiment of the present invention, the client may download all the rectangular blocks of the play segment according to the processing result and the storage paths of the rectangular blocks in the image information file, specifically: the download vector value of each rectangular block (i.e., an attention value representing the layers of image blocks that can be downloaded for the rectangular block) is determined according to the target vector matrix in the processing result (i.e., the vector matrix formed by the attention value of each rectangular block after degradation processing). If the download vector value is a first preset value (i.e., a smaller attention value, for example, 0 or 1), the base layer image block of the rectangular block is downloaded according to the storage path; if the download vector value is a second preset value (i.e., a larger attention value, for example, 2), the base layer image block and the enhancement layer image block of the rectangular block are downloaded according to the storage path.
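The mapping from download vector values to layers can be sketched as below, using the example values from the text (0 or 1 for the first preset value, 2 for the second). The function name `plan_downloads`, the layer labels and the row-major tile numbering are assumptions; resolving each entry against the storage path is omitted.

```python
FIRST_PRESET = {0, 1}   # example values from the text: base layer only
SECOND_PRESET = {2}     # example value from the text: base + enhancement


def plan_downloads(target_matrix: list[list[int]]) -> dict[int, list[str]]:
    """Map each 1-based tile number to the layers that should be downloaded."""
    plan = {}
    number = 1
    for row in target_matrix:
        for value in row:
            if value in SECOND_PRESET:
                plan[number] = ["base", "enhancement"]
            elif value in FIRST_PRESET:
                plan[number] = ["base"]
            else:
                raise ValueError(f"unexpected download vector value: {value}")
            number += 1
    return plan


# Hypothetical target vector matrix, used for illustration only.
download_plan = plan_downloads([[2, 0], [1, 2]])
for tile, layers in download_plan.items():
    print(f"tile_{tile}: {', '.join(layers)}")
```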

Optionally, after downloading all the rectangular blocks of the play section based on the processing result and the storage path, the method further includes: before playing the target playing fragment, checking whether a base layer image block of the target playing fragment exists or not to obtain a first checking result; checking whether the enhancement layer image block of the target playing fragment exists or not under the condition that the first checking result indicates that the base layer image block of the target playing fragment exists, and obtaining a second checking result; and decoding each rectangular block in the target playing fragment when the second checking result indicates that the enhancement layer image block of the target playing fragment exists, and playing the target playing fragment when the decoding is completed.

In the embodiment of the invention, the client can play adaptively, specifically: for video blocks that have been downloaded, when playback is about to reach the t-th segment (i.e., the target playing segment), the client checks whether the base layer video block of the t-th segment exists (i.e., before the target playing segment is played, whether the base layer image block of the target playing segment exists is checked to obtain a first check result). If it exists, the client continues to check whether the L1 enhancement layer exists; if so, it continues to check whether the L2 enhancement layer exists (if the L1 enhancement layer does not exist, the L2 enhancement layer is not checked). Each tile is then parsed and played (i.e., when the first check result indicates that the base layer image block of the target playing segment exists, whether the enhancement layer image block of the target playing segment exists is checked to obtain a second check result; when the second check result indicates that the enhancement layer image block of the target playing segment exists, each rectangular block in the target playing segment is decoded, and the target playing segment is played when decoding is completed).

Optionally, after checking whether the base layer image block of the target play segment exists, the method further includes: suspending playing the target play fragment under the condition that the first check result indicates that the base layer image block of the target play fragment does not exist; and downloading the base layer image block of the target playing fragment.

In the embodiment of the present invention, if the base layer video block of the t-th segment does not exist, playback is paused (i.e., when the first check result indicates that the base layer image block of the target playing segment does not exist, playing of the target playing segment is paused). Meanwhile, the client adaptive download module requests the base layer video block of that region, and playback continues after the download is completed (i.e., the base layer image block of the target playing segment is downloaded).
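The playback check order described above (base layer first, then L1, then L2) can be sketched as a small function. The function name `playback_action`, the layer labels and the returned action strings are hypothetical. The sketch also assumes that a tile with only its base layer is still decoded at the lowest quality, which is consistent with the base layer being independently decodable.

```python
def playback_action(available: set[str]) -> tuple[str, list[str]]:
    """Decide what to do for one tile of the target playing segment.

    `available` holds the layers already in the buffer, e.g. {"BL0", "EL1"}.
    """
    if "BL0" not in available:
        # First check failed: pause and request the base layer block.
        return ("pause_and_download_base", [])
    layers = ["BL0"]
    if "EL1" in available:
        layers.append("EL1")
        # EL2 is checked only when EL1 exists.
        if "EL2" in available:
            layers.append("EL2")
    return ("decode_and_play", layers)


print(playback_action({"BL0", "EL1", "EL2"}))
print(playback_action({"BL0", "EL2"}))
print(playback_action({"EL1"}))
```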

In the embodiment of the invention, the SVC coding and the block coding technology are used for dividing the whole video into a plurality of video areas for transmission, and the video of each area is divided into a plurality of layers, so that the video quality of the user concerned area can be controlled according to the network condition under the condition of insufficient network transmission capability, thereby improving the video watching experience of the user.

The following describes in detail another embodiment.

Example two

The processing device for transmitting visual images provided in the present embodiment includes a plurality of implementation units, each implementation unit corresponding to each implementation step in the first embodiment.

Fig. 6 is a schematic diagram of an alternative processing device for visual image transmission according to an embodiment of the present invention, as shown in fig. 6, the processing device may include: a receiving unit 60, a determining unit 61, a processing unit 62, a downloading unit 63, wherein,

the receiving unit 60 is configured to receive an image information file and an attention information file, where the image information file includes at least: the storage path of the rectangular block and the code rate of the rectangular block, and the attention information file includes at least: a play segment vector matrix, wherein the rectangular block is obtained by the server encoding the image;

a determining unit 61 for determining a current throughput rate;

the processing unit 62 is configured to, in the case that the current throughput rate is smaller than the rectangular block code rate, perform degradation processing on the rectangular blocks indicated by vector values smaller than the preset threshold based on the play segment vector matrix, to obtain a processing result;

and a downloading unit 63 for downloading all rectangular blocks of the play section based on the processing result and the storage path.

In the above processing apparatus, the receiving unit 60 may receive the image information file and the attention information file, the determining unit 61 may determine the current throughput rate, the processing unit 62 may perform degradation processing on the rectangular blocks indicated by the vector values smaller than the preset threshold based on the play segment vector matrix when the current throughput rate is smaller than the rectangular block code rate, to obtain a processing result, and the downloading unit 63 may download all the rectangular blocks of the play segment based on the processing result and the storage path. In the embodiment of the invention, the client can firstly determine the current throughput rate, then compare the current throughput rate with the rectangular block code rate in the image information file sent by the server, if the current throughput rate is smaller than the rectangular block code rate, the degradation processing is required to be carried out on the non-attention area (namely, the rectangular block indicated by the vector value smaller than the preset threshold) of the user according to the play fragment vector matrix in the attention information file sent by the server, and then all the rectangular blocks of the play fragment are downloaded for playing according to the processing result and the storage path of the rectangular block in the image information file, so that the image quality of the attention area of the user can be controlled according to the network condition, the image watching experience of the user is improved, and the technical problem that the image quality of the attention area of the user cannot be effectively controlled according to the network condition in the related technology is solved.

Optionally, the processing device further comprises: the first receiving module is configured to, before receiving the image information file and the attention information file, receive, by the server, an image transmission request sent by the client, where the image transmission request at least includes: an image identifier to be transmitted; the first determining module is used for determining a target image corresponding to the image identifier to be transmitted by the server; the first coding module is used for coding the target image by the server to obtain a coded image and an image information file; and the first output module is used for the server to obtain the attention information file based on all the rectangular blocks of each playing fragment.

Optionally, the first determining module includes: the first dividing sub-module is used for controlling the server to divide each image frame in the target image into a plurality of rectangular blocks; the second dividing sub-module is used for controlling the server to divide the target image into a plurality of playing fragments, wherein each playing fragment corresponds to a plurality of rectangular blocks; the first coding submodule is used for controlling the server to code each rectangular block in the playing fragment into a plurality of image blocks, wherein the hierarchy type of the image blocks comprises: the base layer is used for decoding the image with the lowest quality, and the enhancement layer is used for enhancing the quality of the image.

Optionally, the first output module includes: the first input sub-module is used for controlling the server to input all rectangular blocks of the playing fragment into a preset attention model to obtain an attention value of each rectangular block; the first generation submodule is used for controlling the service end to generate a play segment vector matrix based on all the attention values, wherein each vector value in the play segment vector matrix corresponds to the attention value one by one; and the second generation submodule is used for controlling the service end to generate the attention information file based on the play segment vector matrix.

Optionally, the processing device further comprises: the first acquisition module is used for controlling the server to acquire a plurality of marked rectangular block data before the server inputs all rectangular blocks of the playing fragment to a preset attention model to obtain training data; the first training module is used for controlling the server to train the initial attention model by adopting training data and obtaining a preset attention model under the condition that training is completed.

Optionally, the processing device further comprises: the first downloading module is used for downloading the base layer image block and the enhancement layer image block of the preset rectangular block in the preset playing segment before determining the current throughput rate; the second downloading module is used for downloading the base layer image blocks and the enhancement layer image blocks of the residual rectangular blocks in the preset playing fragments and updating the current throughput rate under the condition that the length of a preset storage area at the current downloading time is greater than or equal to the length of a rectangular block buffer zone, wherein the length of the rectangular block buffer zone is the sum of the length of the base layer buffer zone and the length of the enhancement layer buffer zone of all the residual rectangular blocks in the preset playing fragments, and the residual rectangular blocks are rectangular blocks except the preset rectangular blocks in the preset playing fragments; and the third downloading module is used for downloading the base layer image blocks of the residual rectangular blocks in the preset playing segment and updating the current throughput rate under the condition that the length of the preset storage area at the current downloading time is smaller than the length of the rectangular block buffer area and is larger than or equal to the length of the base layer buffer areas of all the residual rectangular blocks.

Optionally, the downloading unit includes: the second determining module is used for determining the downloading vector value of each rectangular block based on the target vector matrix in the processing result; the fourth downloading module is used for downloading the base layer image block of the rectangular block according to the storage path under the condition that the downloading vector value is a first preset value; and the fifth downloading module is used for downloading the base layer image block and the enhancement layer image block of the rectangular block according to the storage path under the condition that the downloading vector value is a second preset value.

Optionally, the processing device further comprises: the first checking module is used for checking whether the base layer image block of the target playing fragment exists or not before the target playing fragment is played after all the rectangular blocks of the playing fragment are downloaded based on the processing result and the storage path, so as to obtain a first checking result; the second checking module is used for checking whether the enhancement layer image block of the target playing fragment exists or not to obtain a second checking result when the first checking result indicates that the base layer image block of the target playing fragment exists; the first decoding module is used for decoding each rectangular block in the target playing fragment when the second checking result indicates that the enhancement layer image block of the target playing fragment exists, and playing the target playing fragment when the decoding is completed.

Optionally, the processing device further comprises: the first pause module is used for pausing the playing of the target playing fragment under the condition that the first checking result indicates that the base layer image block of the target playing fragment does not exist after checking whether the base layer image block of the target playing fragment exists or not to obtain the first checking result; and the sixth downloading module is used for downloading the base layer image block of the target playing fragment.

The processing device may further include a processor and a memory, where the receiving unit 60, the determining unit 61, the processing unit 62, the downloading unit 63, and the like are stored as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and all rectangular blocks of the play segment are downloaded based on the processing result and the storage path by adjusting kernel parameters.

The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

The application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: receiving the image information file and the attention information file; determining the current throughput rate; in the case that the current throughput rate is smaller than the rectangular block code rate, performing degradation processing on the rectangular blocks indicated by vector values smaller than the preset threshold based on the play segment vector matrix, to obtain a processing result; and downloading all the rectangular blocks of the play segment based on the processing result and the storage path.

According to another aspect of the embodiment of the present application, there is also provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, it controls the device in which the computer readable storage medium is located to execute the above-mentioned processing method for visual image transmission.

According to another aspect of the embodiments of the present application, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the above-described processing method for visual image transmission.

Fig. 7 is a block diagram of a hardware structure of an electronic device (or mobile device) for a processing method for visual image transmission according to an embodiment of the present invention. As shown in fig. 7, the electronic device may include one or more processors 702 (shown in fig. 7 as 702a, 702b, ..., 702n) (the processor 702 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 704 for storing data. In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 7 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the electronic device may also include more or fewer components than shown in fig. 7, or have a different configuration than shown in fig. 7.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present invention, and such modifications and refinements are also intended to fall within the scope of protection of the present invention.

Claims

1. A processing method for visual image transmission, characterized in that the method is applied to a client and comprises:

receiving an image information file and an attention information file, wherein the image information file at least comprises: a storage path of a rectangular block and a code rate of the rectangular block, and the attention information file at least comprises: a play segment vector matrix, wherein the rectangular block is obtained by a server encoding an image;

determining a current throughput rate;

in a case where the current throughput rate is smaller than the code rate of the rectangular block, performing, based on the play segment vector matrix, degradation processing on each rectangular block indicated by a vector value smaller than a preset threshold, to obtain a processing result;

and downloading all rectangular blocks of a play segment based on the processing result and the storage path.
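By way of illustration only, and without limiting the claim, the degradation step of claim 1 might be sketched in Python as follows. The function name `degrade_tiles`, the encoding of the processing result as a matrix of 0 (base layer only) and 1 (base layer plus enhancement layer), and the treatment of the rectangular block code rate as a single number are assumptions made for this sketch; the claim does not prescribe them.

```python
from typing import List

BASE_ONLY = 0      # hypothetical marker: fetch the base layer only
BASE_AND_ENH = 1   # hypothetical marker: fetch base and enhancement layers


def degrade_tiles(attention: List[List[float]],
                  throughput: float,
                  tile_bitrate: float,
                  threshold: float) -> List[List[int]]:
    """Return a target matrix saying which layers to fetch per rectangular block.

    `attention` stands in for the play segment vector matrix; all parameter
    names are illustrative assumptions, not terms defined by the claim.
    """
    if throughput >= tile_bitrate:
        # Throughput is not below the code rate: no degradation is applied.
        return [[BASE_AND_ENH for _ in row] for row in attention]
    # Throughput is below the code rate: degrade blocks whose vector value
    # is smaller than the preset threshold.
    return [[BASE_ONLY if value < threshold else BASE_AND_ENH for value in row]
            for row in attention]
```

Under these assumptions, a 2x2 matrix `[[0.9, 0.1], [0.4, 0.6]]` with a threshold of 0.5 and insufficient throughput keeps full quality only for the two blocks with values 0.9 and 0.6.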

2. The processing method according to claim 1, further comprising, before receiving the image information file and the attention information file:

the server receives an image transmission request sent by the client, wherein the image transmission request at least comprises: an identifier of an image to be transmitted;

the server determines a target image corresponding to the identifier of the image to be transmitted;

the server encodes the target image to obtain an encoded image and the image information file;

and the server obtains the attention information file based on all the rectangular blocks of each play segment.

3. The processing method according to claim 2, wherein the step of the server encoding the target image comprises:

the server divides each image frame in the target image into a plurality of rectangular blocks;

the server divides the target image into a plurality of play segments, wherein each play segment corresponds to a plurality of rectangular blocks;

the server encodes each rectangular block in the play segment into a plurality of image blocks, wherein the layer types of the image blocks comprise: a base layer used to decode the image at a minimum quality, and an enhancement layer used to enhance the quality of the image.

4. The processing method according to claim 2, wherein the step of the server obtaining the attention information file based on all the rectangular blocks of each of the play segments includes:

the server inputs all the rectangular blocks of the play segment into a preset attention model to obtain an attention value of each rectangular block;

the server generates the play segment vector matrix based on all the attention values, wherein the vector values in the play segment vector matrix correspond one-to-one to the attention values;

and the server generates the attention information file based on the play segment vector matrix.
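By way of illustration only, the server-side steps of claim 4 might be sketched as follows. The attention model is represented by an arbitrary callable that returns one number per rectangular block, and the JSON layout of the attention information file (keys `segment` and `matrix`) is a hypothetical format chosen for this sketch; neither is specified by the claim.

```python
import json
from typing import Callable, List, Sequence


def build_vector_matrix(tiles: Sequence[object],
                        attention_model: Callable[[object], float],
                        rows: int,
                        cols: int) -> List[List[float]]:
    """Score every rectangular block and arrange the scores row by row."""
    if len(tiles) != rows * cols:
        raise ValueError("expected rows * cols rectangular blocks")
    # One attention value per block, so vector values map one-to-one to blocks.
    scores = [float(attention_model(tile)) for tile in tiles]
    return [scores[r * cols:(r + 1) * cols] for r in range(rows)]


def attention_file_text(segment_index: int, matrix: List[List[float]]) -> str:
    """Serialize the matrix; the JSON keys are assumptions for this sketch."""
    return json.dumps({"segment": segment_index, "matrix": matrix})
```

A trained model would be passed in as `attention_model`; any function mapping a block to a score fits the same interface.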

5. The processing method according to claim 4, further comprising, before the server inputs all the rectangular blocks of the play segment into the preset attention model:

the server acquires a plurality of labeled rectangular block data items to obtain training data;

the server trains an initial attention model with the training data and obtains the preset attention model when training is complete.

6. The processing method of claim 1, further comprising, prior to determining the current throughput rate:

downloading a base layer image block and an enhancement layer image block of a preset rectangular block in a preset play segment;

in a case where the length of a preset storage area at the current download time is greater than or equal to the length of a rectangular block buffer area, downloading the base layer image blocks and the enhancement layer image blocks of the remaining rectangular blocks in the preset play segment and updating the current throughput rate, wherein the length of the rectangular block buffer area is the sum of the base layer buffer area length and the enhancement layer buffer area length of all the remaining rectangular blocks in the preset play segment, and the remaining rectangular blocks are the rectangular blocks in the preset play segment other than the preset rectangular block;

and in a case where the length of the preset storage area at the current download time is smaller than the length of the rectangular block buffer area and greater than or equal to the base layer buffer area length of all the remaining rectangular blocks, downloading the base layer image blocks of the remaining rectangular blocks in the preset play segment and updating the current throughput rate.
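By way of illustration only, the two buffer comparisons of claim 6 might be sketched as follows. The function name, the string labels it returns, and the representation of each length as a plain number are assumptions of this sketch. The claim does not state what happens when the preset storage area is shorter than the base layer buffer area length, so the sketch returns `None` in that case.

```python
from typing import Optional


def remaining_tiles_plan(storage_len: float,
                         base_buffer_len: float,
                         enh_buffer_len: float) -> Optional[str]:
    """Decide which layers of the remaining rectangular blocks to download.

    `base_buffer_len` and `enh_buffer_len` are the base layer and enhancement
    layer buffer area lengths for all remaining blocks (hypothetical names).
    """
    # Rectangular block buffer length = base layer part + enhancement layer part.
    tile_buffer_len = base_buffer_len + enh_buffer_len
    if storage_len >= tile_buffer_len:
        return "base+enhancement"
    if storage_len >= base_buffer_len:
        return "base"
    return None  # case not covered by the claim
```

After either download branch, the client would update its throughput estimate; that measurement is outside the scope of this sketch.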

7. The processing method according to claim 1, wherein the step of downloading all rectangular blocks of the play segment based on the processing result and the storage path comprises:

determining a downloading vector value of each rectangular block based on a target vector matrix in the processing result;

downloading the base layer image block of the rectangular block according to the storage path under the condition that the download vector value is a first preset value;

and downloading the base layer image block and the enhancement layer image block of the rectangular block according to the storage path under the condition that the download vector value is a second preset value.
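By way of illustration only, the mapping from download vector values to layer downloads in claim 7 might be sketched as follows. The choice of 0 and 1 as the first and second preset values, the dictionary keys `base` and `enhancement`, and the example paths are all hypothetical; the claim does not fix them.

```python
from typing import Dict, List, Tuple

FIRST_PRESET = 0    # assumption: base layer only
SECOND_PRESET = 1   # assumption: base layer and enhancement layer


def plan_downloads(target: List[List[int]],
                   paths: Dict[Tuple[int, int], Dict[str, str]]) -> List[str]:
    """List the storage paths to fetch, in row-major block order."""
    to_fetch: List[str] = []
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            tile_paths = paths[(r, c)]
            if value == FIRST_PRESET:
                to_fetch.append(tile_paths["base"])
            elif value == SECOND_PRESET:
                to_fetch.extend([tile_paths["base"], tile_paths["enhancement"]])
            else:
                raise ValueError(f"unexpected download vector value: {value}")
    return to_fetch


# Usage with made-up paths for a 1x2 grid of rectangular blocks.
example_paths = {
    (0, 0): {"base": "seg1/t00/base", "enhancement": "seg1/t00/enh"},
    (0, 1): {"base": "seg1/t01/base", "enhancement": "seg1/t01/enh"},
}
example_plan = plan_downloads([[1, 0]], example_paths)
```

Every block contributes its base layer path, so the whole play segment remains decodable at minimum quality even when some blocks are degraded.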

8. The processing method according to claim 1, further comprising, after downloading all rectangular blocks of the play segment based on the processing result and the storage path:

before playing a target play segment, checking whether a base layer image block of the target play segment exists, to obtain a first check result;

in a case where the first check result indicates that the base layer image block of the target play segment exists, checking whether an enhancement layer image block of the target play segment exists, to obtain a second check result;

and in a case where the second check result indicates that the enhancement layer image block of the target play segment exists, decoding each rectangular block in the target play segment, and playing the target play segment when decoding is complete.

9. The processing method according to claim 8, further comprising, after checking whether the base layer image block of the target play segment exists to obtain the first check result:

in a case where the first check result indicates that the base layer image block of the target play segment does not exist, suspending playback of the target play segment;

and downloading the base layer image block of the target play segment.
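By way of illustration only, the pre-playback checks of claims 8 and 9 might be sketched as follows. The action labels are hypothetical. Claims 8 and 9 do not state what happens when the base layer exists but the enhancement layer does not; the sketch assumes the segment is then decoded from the base layer alone, which is an assumption and not part of the claims.

```python
from typing import List


def playback_actions(has_base: bool, has_enhancement: bool) -> List[str]:
    """Return the ordered actions for a target play segment."""
    if not has_base:
        # First check failed: pause and fetch the base layer (claim 9).
        return ["pause", "download_base"]
    if has_enhancement:
        # Both checks passed: decode every block, then play (claim 8).
        return ["decode_base_and_enhancement", "play"]
    # Assumption of this sketch: play at minimum quality from the base layer.
    return ["decode_base_only", "play"]
```

The second check is only reached when the first check succeeds, matching the order in which claim 8 recites the two check results.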

10. A processing device for visual image transmission, characterized in that the device is applied to a client and comprises:

a receiving unit, configured to receive an image information file and an attention information file, wherein the image information file at least comprises: a storage path of a rectangular block and a code rate of the rectangular block, and the attention information file at least comprises: a play segment vector matrix, wherein the rectangular block is obtained by a server encoding an image;

a determining unit, configured to determine a current throughput rate;

a processing unit, configured to perform, in a case where the current throughput rate is smaller than the code rate of the rectangular block and based on the play segment vector matrix, degradation processing on each rectangular block indicated by a vector value smaller than a preset threshold, to obtain a processing result;

and a downloading unit, configured to download all rectangular blocks of a play segment based on the processing result and the storage path.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer-readable storage medium is located to perform the processing method for visual image transmission according to any one of claims 1 to 9.

12. An electronic device, comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the processing method for visual image transmission according to any one of claims 1 to 9.