CN113824996A - Information processing method and device, electronic equipment and storage medium - Google Patents

Information processing method and device, electronic equipment and storage medium

Info

Publication number
CN113824996A
CN113824996A (application CN202111129368.4A)
Authority
CN
China
Prior art keywords
region
video
detection
image frame
partial region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111129368.4A
Other languages
Chinese (zh)
Inventor
王勇望
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202111129368.4A
Publication of CN113824996A
Legal status: Withdrawn

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to an information processing method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a detection result, comprising at least two salient regions, produced by performing saliency detection on at least one image frame in a video; performing information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region; and coding the at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video. The embodiments of the disclosure can effectively control how the coding bitrate is allocated across the regions of the video, assigning more bitrate and coding complexity to the visually perceived regions of user interest identified by saliency detection, which helps reduce the storage and transmission cost of the video while satisfying the video quality requirement.

Description

Information processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.
Background
With the popularization of the fifth generation mobile communication technology (5G), video has become the mainstream medium for conveying information, and virtually every industry is now closely tied to it. However, as the volume of video data grows while communication resources remain limited, the available transmission bandwidth is often insufficient, which poses a major challenge to the development of communication networks. More efficient video coding solutions are therefore needed to transmit large amounts of image or video data quickly and stably over limited bandwidth.
Disclosure of Invention
The present disclosure proposes an information processing technical solution.
According to an aspect of the present disclosure, there is provided an information processing method including: acquiring a detection result obtained by performing significance detection on at least one image frame in a video, wherein the detection result comprises at least two significant regions; performing information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain coding quantization parameters corresponding to each partial region, wherein the partial region comprises a first partial region which is not overlapped between the at least two salient regions and a second partial region which is overlapped between the at least two salient regions; and coding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video.
By this method, the allocation of coding bitrate across the regions of the video can be effectively controlled, and more bitrate and coding complexity are assigned to the visually perceived regions of user interest identified by saliency detection, which helps reduce the storage and transmission cost of the video while satisfying the video quality requirement.
In a possible implementation manner, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the obtaining of the detection result obtained by saliency detection on at least one image frame in the video includes at least two of the following: acquiring a general salient region obtained by performing general perception region detection on at least one image frame in a video; acquiring a personal salient region obtained by performing personal perception region detection on at least one image frame in a video; acquiring a target region obtained by image segmentation of at least one image frame in a video; wherein the at least two salient regions comprise at least two of the general salient region, the personal salient region, and the target region.
By the method, the advantages of various significance detection methods can be integrated, so that the obtained detection result is more accurate, and the subsequent coding efficiency is improved.
In a possible implementation manner, the performing information integration processing on the at least two significant regions according to preset weights of the at least two significant regions to obtain a coding quantization parameter corresponding to each partial region includes: determining a first partial region where the at least two significant regions are not overlapped and a second partial region where the at least two significant regions are overlapped; determining a preset weight value of the salient region to which each first partial region belongs as a first weight value of the first partial region; determining a second weight value of each second partial region according to preset weight values of various salient regions to which the second partial region belongs; and determining the coding quantization parameter of the first partial region according to each first weight value, and determining the coding quantization parameter of the second partial region according to each second weight value.
By the method, the multiple salient regions can be integrated according to their preset weights into partial regions comprising non-overlapped first partial regions and overlapped second partial regions, each partial region corresponding to an integrated coding quantization parameter. This combines the advantages of multiple saliency detection methods, and the preset weights allow the relative influence of each saliency detection method on image-frame coding to be adjusted, making the coding process more flexible, allowing the coding scheme to be customized to the requirements of the service scenario, and further improving the coding effect.
In a possible implementation manner, the determining of the second weight of each second partial region according to the preset weights of the multiple salient regions to which the second partial region belongs includes: multiplying the preset weights of the salient regions to which the second partial region belongs to obtain the second weight of the second partial region.
By the method, the second weight of the second partial region can be obtained quickly, and the influence of the preset weights of the plurality of salient regions to which the second partial region belongs can be comprehensively considered.
In a possible implementation manner, the encoding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video includes: coding at least one image frame according to the coding quantization parameters of each first partial region and each second partial region in the at least one image frame to obtain at least one coded image frame; combining at least one encoded image frame to obtain a target video; the coding quantization parameter is used for adjusting the coding rate of the image frame in the coding process.
In this way, the coding bitrate of each partial region in the image frames of the video can be adjusted through that region's coding quantization parameter, which helps control the video coding scheme according to the service scenario so that the regions the service scenario requires are not distorted, or are less distorted, in the encoded target video.
In a possible implementation manner, before the obtaining a detection result obtained by performing saliency detection on at least one image frame in a video, the method further includes: receiving configuration information, wherein the configuration information comprises one or more of information used for indicating a region of interest detected by the personal perception region, information used for indicating a segmentation object of the image segmentation, and the preset weight; in a case where the configuration information includes information indicating a region of interest for the personal perception region detection, the acquiring a personal saliency region resulting from the personal perception region detection of at least one image frame in a video includes: according to the information of the attention area in the configuration information, carrying out personal perception area detection on at least one image frame in a video to obtain a personal salient area; in a case where the configuration information includes information indicating a segmentation object of the image segmentation, the acquiring a target region obtained by image segmentation of at least one image frame in a video includes: acquiring a target area obtained by image segmentation of at least one image frame in a video according to the information of the segmentation object in the configuration information; when the configuration information is used to indicate the preset weight, before performing information integration processing on the at least two salient regions according to the preset weights of the at least two salient regions, the method further includes: and determining a preset weight of the salient region according to the configuration information.
By this method, different image segmentation methods and a personal perception region detection method bearing personal characteristics can be introduced through the configuration information and combined with other saliency detection methods, giving the method of the embodiment of the disclosure a customizable character: users can perform personalized coding on their personal computing devices, which facilitates the development of personalized software. Furthermore, the method of the embodiment of the disclosure can flexibly configure multiple saliency detection methods and their corresponding weights, improving the applicability of perceptual video coding and improving coding quality when the video must occupy a limited storage space.
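As a concrete illustration only, the configuration information might be modeled as a small structure such as the following; all field names are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EncodingConfig:
    """Hypothetical container for the configuration information."""
    # Region of interest for personal perception region detection, if any.
    region_of_interest: Optional[dict] = None
    # Object class that image segmentation should extract, if any.
    segmentation_object: Optional[str] = None
    # Preset weight for each kind of salient region, if provided.
    preset_weights: dict = field(default_factory=dict)
```

A caller might then pass, for example, EncodingConfig(segmentation_object="turtle", preset_weights={"target": 3.0}) to customize encoding for the pet-turtle example given later.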
In a possible implementation manner, before the obtaining of a detection result obtained by performing saliency detection on at least one image frame in a video, the method further includes: identifying a target scene to obtain an identification result of the target scene; and the obtaining of the detection result obtained by performing saliency detection on at least one image frame in the video comprises: determining, according to the identification result of the target scene, a detection model for performing saliency detection on at least one image frame in the video; and performing, according to the detection model, saliency detection on at least one image frame in the video to obtain the detection result.
By the method, at least one image frame in the video can be subjected to significance detection according to the recognition result of the target scene recognition to obtain the detection result, so that the accuracy of the detection result and the adaptability to the scene can be improved.
In one possible implementation manner, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the detection model includes at least two of a general perception region detection model, a personal perception region detection model, and an image segmentation model. The general perception region detection model is used for performing general perception region detection on at least one image frame in a video to obtain a general salient region; the personal perception region detection model is used for performing personal perception region detection on at least one image frame in a video to obtain a personal salient region; and the image segmentation model is used for performing image segmentation on at least one image frame in the video to obtain a target region.
By this method, adaptation to different scenes is facilitated, more accurate general salient regions, personal salient regions, and target regions are obtained, and the accuracy of the detection result is improved.
According to an aspect of the present disclosure, there is provided an information processing apparatus including: the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a detection result obtained by performing significance detection on at least one image frame in a video, and the detection result comprises at least two significant areas; an integration module, configured to perform information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, where the partial region includes a first partial region that is not overlapped between the at least two salient regions and a second partial region that is overlapped between the at least two salient regions; and the coding module is used for coding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video.
In a possible implementation manner, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the obtaining module is configured to obtain at least two of the following: acquiring a general salient region obtained by performing general perception region detection on at least one image frame in a video; acquiring a personal salient region obtained by performing personal perception region detection on at least one image frame in a video; acquiring a target region obtained by image segmentation of at least one image frame in a video; wherein the at least two salient regions comprise at least two of the general salient region, the personal salient region, and the target region.
In one possible implementation, the integration module is configured to: determining a first partial region where the at least two significant regions are not overlapped and a second partial region where the at least two significant regions are overlapped; determining a preset weight value of the salient region to which each first partial region belongs as a first weight value of the first partial region; determining a second weight value of each second partial region according to preset weight values of various salient regions to which the second partial region belongs; and determining the coding quantization parameter of the first partial region according to each first weight value, and determining the coding quantization parameter of the second partial region according to each second weight value.
In a possible implementation manner, the determining of the second weight of each second partial region according to the preset weights of the multiple salient regions to which the second partial region belongs includes: multiplying the preset weights of the salient regions to which the second partial region belongs to obtain the second weight of the second partial region.
In one possible implementation, the encoding module is configured to: coding at least one image frame according to the coding quantization parameters of each first partial region and each second partial region in the at least one image frame to obtain at least one coded image frame; combining at least one encoded image frame to obtain a target video; the coding quantization parameter is used for adjusting the coding rate of the image frame in the coding process.
In one possible implementation, the apparatus further includes: a receiving module, configured to receive configuration information before the obtaining of a detection result obtained by performing saliency detection on at least one image frame in a video, where the configuration information includes one or more of information indicating a region of interest detected by the personal sensing region, information indicating a segmentation object of the image segmentation, and the preset weight; in a case where the configuration information includes information indicating a region of interest for the personal perception region detection, the acquiring a personal saliency region resulting from the personal perception region detection of at least one image frame in a video includes: according to the information of the attention area in the configuration information, carrying out personal perception area detection on at least one image frame in a video to obtain a personal salient area; in a case where the configuration information includes information indicating a segmentation object of the image segmentation, the acquiring a target region obtained by image segmentation of at least one image frame in a video includes: acquiring a target area obtained by image segmentation of at least one image frame in a video according to the information of the segmentation object in the configuration information; and when the configuration information is used for indicating the preset weight, the device further comprises a preset weight determining module, configured to determine the preset weight of the significant region according to the configuration information before performing information integration processing on the at least two significant regions according to the preset weights of the at least two significant regions.
In one possible implementation, the apparatus further includes: the identification module is used for identifying a target scene before the detection result obtained by performing significance detection on at least one image frame in the video is obtained, so as to obtain an identification result of the target scene; the obtaining module is configured to: determining a detection model for detecting the significance of at least one image frame in the video according to the identification result of the target scene; and according to the detection model, performing significance detection on at least one image frame in the video to obtain a detection result.
In one possible implementation manner, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the detection model includes at least two of a general perception region detection model, a personal perception region detection model, and an image segmentation model. The general perception region detection model is used for performing general perception region detection on at least one image frame in a video to obtain a general salient region; the personal perception region detection model is used for performing personal perception region detection on at least one image frame in a video to obtain a personal salient region; and the image segmentation model is used for performing image segmentation on at least one image frame in the video to obtain a target region.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, a detection result including at least two salient regions obtained by performing saliency detection on at least one image frame in a video can be obtained, information integration processing is performed on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, and at least one image frame in the video is coded according to the coding quantization parameter corresponding to each partial region to obtain a target video. By this method, the allocation of coding bitrate across the regions of the video can be effectively controlled, and more bitrate and coding complexity are assigned to the visually perceived regions of user interest identified by saliency detection, which helps reduce the storage and transmission cost of the video while satisfying the video quality requirement.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of an information processing method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of an information processing method according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of an integration process according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an information processing apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, given that human vision focuses on target regions of interest, video compression efficiency can be improved by Perceptual Video Coding (PVC). In perceptual video coding, a model of the Human Visual System (HVS) is applied during encoding to mine subjective perceptual redundancy in the video, and compression efficiency is further improved by reducing that perceptual redundancy.
With the development of deep learning, perceptual video coding techniques have advanced rapidly. Salient object detection methods based on visual attention models, for example the Pyramid Constrained Self-Attention (PCSA) method, can detect salient objects appearing in a video fairly accurately and can be used to reduce the perceptual redundancy of the video during perceptual coding.
However, no single salient object detection method suits all scenes or supports individual customization, and different users have different scene requirements. Some scenarios carry special needs; for example, in a pet monitoring scenario, the activity of the pet needs to be highlighted.
In view of this, the present disclosure provides an information processing method, which can obtain a detection result including at least two salient regions obtained by performing saliency detection on at least one image frame in a video, perform information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, and code the at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video. By this method, the allocation of coding bitrate across the regions of the video can be effectively controlled, and more bitrate and coding complexity are assigned to the visually perceived regions of user interest identified by saliency detection, which helps reduce the storage and transmission cost of the video while satisfying the video quality requirement.
The information processing method provided by the embodiments of the present disclosure may be applied to the playing of multimedia information, automatic rendering, extended generation of multimedia files, and the like; the embodiments of the present disclosure do not limit this.
Fig. 1 shows a flowchart of an information processing method according to an embodiment of the present disclosure. The information processing method may be executed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer-readable instruction stored in a memory. Alternatively, the method may be performed by a server.
The information processing method according to the embodiment of the present disclosure is described below by taking an electronic device as an execution subject.
As shown in fig. 1, the information processing method includes:
S11: obtaining a detection result produced by performing saliency detection on at least one image frame in a video, wherein the detection result comprises at least two salient regions.
S12: and performing information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain coding quantization parameters corresponding to each partial region, wherein the partial region comprises a first partial region which is not overlapped between the at least two salient regions and a second partial region which is overlapped between the at least two salient regions.
S13: and coding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video.
In one possible implementation, in S11, a detection result obtained by performing saliency detection on at least one image frame in the video is obtained. In this process, a plurality of saliency detection models targeting different scenes or requirements can be integrated via software interfaces; at least one image frame in the video is detected by each saliency detection model, the resulting detection result can include at least two salient regions, and each salient region can correspond to the region detected by one saliency detection model.
For example, for a pet detection scene, a saliency detection model M1 based on a general perception region detection method, a saliency detection model M2 based on a personal perception region detection method, and a saliency detection model M3 based on an image segmentation method may be connected by way of software interfaces.
For the saliency detection model M1 based on the general perception region detection method, the visual characteristics of human beings can be simulated through an intelligent algorithm, and salient regions in the image, such as a central position region of an image frame, a region where a popular pet (including a cat and a dog) is located, and the like, are extracted;
for the significance detection model M2 based on the personal perception region detection method, the visual characteristics of a person (such as a certain user) can be simulated through an intelligent algorithm, and significant regions in an image are extracted, such as regions in an image frame where personal visual habits are focused, regions in which personal preferred pets (such as snakes, spiders and the like) are located, and the like;
For the saliency detection model M3 based on the image segmentation method, a target region in an image can be extracted through an intelligent image segmentation algorithm or through manual segmentation; for example, an industry user (such as a product manager of a pet-turtle product) performs image segmentation on an image frame to obtain the target region where the turtle is located.
At least one image frame in the video is detected by the saliency detection models M1-M3, so that a salient region X1 output by the saliency detection model M1, a salient region X2 output by the saliency detection model M2, and a salient region X3 output by the saliency detection model M3 can be obtained. The detection result may include the salient regions X1-X3. It should be understood that the salient regions X1-X3 are only an example; provided that there are at least two salient regions, the embodiments of the present disclosure do not limit the number of salient regions included in the detection result.
The detection result obtained by saliency detection may include salient regions in multiple data formats, for example the json, xml, and yaml formats. The embodiments of the present disclosure limit neither the category of saliency detection model nor the output data format of each model. The electronic device can recognize salient regions in the various data formats output by the saliency detection models, which avoids the problem of a detection result going unrecognized because only a single data format is supported.
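For illustration only, a sketch of such format-agnostic parsing in Python; the schema (a top-level "regions" list whose items carry "bbox" and "label" fields) and the PyYAML dependency are assumptions, not part of the disclosure.

```python
import json
import xml.etree.ElementTree as ET

import yaml  # PyYAML, assumed to be installed


def parse_detection_result(payload: str, fmt: str) -> list:
    """Normalize one detector's output into a list of region dicts.

    The schema is hypothetical; real detectors may emit masks or
    polygons rather than bounding boxes.
    """
    if fmt == "json":
        data = json.loads(payload)
    elif fmt == "yaml":
        data = yaml.safe_load(payload)
    elif fmt == "xml":
        root = ET.fromstring(payload)
        data = {"regions": [
            {"bbox": [int(r.get(k)) for k in ("x", "y", "w", "h")],
             "label": r.get("label")}
            for r in root.iter("region")
        ]}
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return data["regions"]
```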
In S12, each of the salient regions included in the detection result may respectively correspond to a preset weight, for example, the detection result may include salient regions X1 to X3, the preset weight of the salient region X1 is W1, the preset weight of the salient region X2 is W2, and the preset weight of the salient region X3 is W3.
According to the preset weights W1-W3 of the salient regions X1-X3 included in the detection result, information integration processing can be performed on the salient regions X1-X3: the preset weights of the non-overlapped first partial regions within the salient regions X1-X3 are retained, the weights of the overlapped second partial regions between the salient regions X1-X3 are recomputed, and each partial region thus corresponds to one weight.
The first partial region A that is not overlapped in the salient region X1, the first partial region B that is not overlapped in the salient region X2, the first partial region C that is not overlapped in the salient region X3, the second partial region D where the salient regions X1 and X2 overlap, the second partial region E where the salient regions X2 and X3 overlap, the second partial region F where the salient regions X1 and X3 overlap, and the second partial region G where all of the salient regions X1-X3 overlap may each correspond to a weight. The weights of the partial regions A to G may be the same or different, and their specific values may be determined according to the preset weights, which is not limited by the embodiments of the present disclosure.
The encoding quantization parameter for each partial region may then be determined based on the weight of each partial region. The weight of each partial region may be directly taken as the coding quantization parameter of that partial region, or the weight may be multiplied by a configuration strength and the product taken as the coding quantization parameter. Here, the configuration strength is a coefficient combined with the weight of each partial region through some operation (for example, multiplication).
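A minimal sketch of this mapping, under the stated assumption that the coding quantization parameter is simply the region weight, optionally scaled by a configuration strength:

```python
def encoding_quantization_params(region_weights: dict,
                                 strength: float = 1.0) -> dict:
    """Map each partial region's weight to its coding quantization
    parameter: the weight itself, or the weight multiplied by a
    configuration strength. Note that in common codecs (H.264/HEVC)
    a lower QP yields higher fidelity, so in practice such a value
    is typically applied as an offset that lowers QP for highly
    weighted (salient) regions."""
    return {region: w * strength for region, w in region_weights.items()}


# Example: first weights for regions A and B, second weight for D.
qps = encoding_quantization_params({"A": 2.0, "B": 3.0, "D": 6.0},
                                   strength=0.5)
```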
After the encoding quantization parameter corresponding to each partial region is obtained in S12, at least one image frame in the video may be encoded in S13 according to the encoding quantization parameter corresponding to each partial region, and the video containing the at least one image frame is converted into the encoding format, so as to obtain the encoded target video. During the encoding process, the encoding quantization parameter may be used to adjust the coding bitrate of each partial region.
Compared with the video before encoding (the original video), the target video can occupy as little storage space as possible while meeting the video quality requirement, or offer the highest possible quality for the same storage space. The format of the encoded target video may include Moving Picture Experts Group (MPEG), Audio Video Interleaved (AVI), Advanced Streaming Format (ASF), Windows Media Video (WMV), and the like.
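As a real-world analogue (not the disclosure's own encoder), FFmpeg's addroi filter applies this kind of per-region quantization offset during encoding; a sketch invoking it from Python, with illustrative paths and coordinates:

```python
import subprocess

# qoffset lies in [-1, 1]; negative values lower QP, i.e. raise
# quality, inside the region of interest. File names and the region
# geometry below are illustrative only.
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-vf", "addroi=x=100:y=50:w=320:h=240:qoffset=-0.4",
    "-c:v", "libx264", "output.mp4",
], check=True)
```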
According to the embodiment of the disclosure, a detection result including at least two salient regions obtained by performing saliency detection on at least one image frame in a video can be obtained, information integration processing is performed on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, and at least one image frame in the video is coded according to the coding quantization parameter corresponding to each partial region to obtain a target video. By this method, the allocation of coding bitrate across the regions of the video can be effectively controlled, and more bitrate and coding complexity are assigned to the visually perceived regions of user interest identified by saliency detection, which helps reduce the storage and transmission cost of the video while satisfying the video quality requirement.
The following explains an information processing method according to an embodiment of the present disclosure.
Fig. 2 illustrates a schematic diagram of an information processing method according to an embodiment of the present disclosure, which may include S10 to S13, as illustrated in fig. 2.
In one possible implementation manner, in S10, an original video to be played is obtained, and the original video is decoded to obtain at least one image frame in the video.
For example, the original video to be played may be a video file encapsulated in a certain encapsulation format, such as a video file in the Flash Video (FLV) format, the Advanced Streaming Format (ASF), the Audio Video Interleaved (AVI) format, or the MPEG-4 Part 14 (MP4) format; the format of the original video is not limited in the embodiment of the present disclosure.
According to the packaging format of each original video, decoding processing can be carried out on the original video in each packaging format by using a decoding mode matched with the packaging format, and a plurality of image frames included in each video are obtained.
By the method, the original videos in various packaging formats can be quickly decoded, the image frames in the videos are obtained, and the efficiency of subsequently obtaining the detection result of the saliency detection of the image frames is improved.
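A minimal decoding sketch using OpenCV, which selects a decoder matching the container's encapsulation format automatically; the library choice is an assumption, as the disclosure does not prescribe one.

```python
import cv2  # OpenCV, assumed to be installed


def decode_frames(path: str):
    """Decode a container (FLV/ASF/AVI/MP4, ...) into raw image frames."""
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {path}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream
                break
            yield frame  # one decoded image frame (BGR ndarray)
    finally:
        cap.release()
```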
It should be understood that if the image frames of the video can be directly acquired, or the acquired original video is not encoded, the decoding process of S10 can be skipped, and S11 is directly performed to acquire the detection result obtained by performing the saliency detection on at least one image frame in the video.
In one possible implementation, as shown in fig. 2, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and S11 accordingly includes at least two of the following:
acquiring a general salient region obtained by performing general perception region detection on at least one image frame in a video;
acquiring a personal salient region obtained by detecting a personal perception region of at least one image frame in a video;
acquiring a target area obtained by image segmentation of at least one image frame in a video;
wherein the at least two salient regions comprise at least two of the general salient region, the personal salient region, and the target region.
For example, saliency detection may include methods of general perception area detection, personal perception area detection, image segmentation, and so forth.
The general perception region detection simulates, through an intelligent algorithm, the visual characteristics of humans in general (for example, ordinary viewers): when facing a scene, a human automatically attends to regions of interest and selectively ignores regions of no interest. It extracts the regions of an image frame that humans in general find interesting (namely, the general salient regions).
At least one image frame in the video can be input into the general perception region detection model, which performs general perception detection on the image frame and outputs the general salient region. The general perception region detection model can be a trained neural network model and can comprise a plurality of network layers such as convolutional layers, pooling layers, and fully connected layers.
In order to give the trained neural network model the function of general perception region detection, the training set used in training may carry general annotation information (for example, the regions where generally salient objects are located).
The personal perception region detection simulates, through an intelligent algorithm, the visual characteristics of an individual: when facing a scene, the individual automatically attends to regions of interest and selectively ignores regions of no interest. It extracts the regions of an image frame that the individual finds interesting (namely, the personal salient regions).
At least one image frame in the video can be input into the personal perception region detection model, which performs personal perception detection on the image frame and outputs the personal salient region. The personal perception region detection model can be a trained neural network model and can comprise a plurality of network layers such as convolutional layers, pooling layers, and fully connected layers.
In order to give the trained neural network model the function of personal perception region detection, the training set used in training may carry personal annotation information (for example, the regions where objects that a particular person attends to are located).
Image segmentation extracts the target region where a target object in the image is located, either through an intelligent image segmentation algorithm or through manual segmentation.
At least one image frame in the video can be input into an image segmentation model with prior information, the image frame is processed through the image segmentation model, and a target area where a target object is located is output; or, the target area where the target object is located may also be manually segmented in an interactive manner, and the image segmentation algorithm is not limited in the embodiment of the present disclosure.
Therefore, a detection result obtained by performing saliency detection on the image frame, that is, an output region obtained by performing methods such as general perception region detection, personal perception region detection, image segmentation on the image frame is obtained.
In order to obtain a more accurate detection result, multiple general perception region detection methods, multiple personal perception region detection methods, and multiple image segmentation methods can be run on the image frame in parallel.
For example, the general perceptual region detection may include general perceptual region detection based on a cognitive attention model, general perceptual region detection based on a decision theory attention model, general perceptual region detection based on a frequency domain analysis attention model, general perceptual region detection based on a graph theory attention model, and the like.
The personal perception area detection may include personal perception area detection based on a cognitive attention model, personal perception area detection based on a decision theory attention model, personal perception area detection based on a frequency domain analysis attention model, personal perception area detection based on a graph theory attention model, and the like.
Image segmentation may include morphology-based image segmentation, pixel-clustering-based image segmentation, deep-learning-based image segmentation, and the like.
The embodiments of the present disclosure do not limit the specific types of general perception region detection, personal perception region detection, or image segmentation.
The output region obtained by each individual type of general perception region detection, personal perception region detection, or image segmentation may be used as one salient region; alternatively, the combined output regions of several types of general perception region detection, of several types of personal perception region detection, or of several types of image segmentation may each be used as one salient region. The disclosed embodiments are not limited in this regard.
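Assuming each detector emits a boolean mask over the frame (an illustrative convention, not specified by the disclosure), treating the combined outputs of several detectors of one type as a single salient region reduces to a mask union:

```python
import numpy as np


def merge_outputs(masks):
    """Union of several detectors' boolean masks, treated as one
    salient region (e.g. the combined outputs of all general
    perception region detectors). Each mask is a height x width
    boolean array over the image frame."""
    merged = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        merged |= m.astype(bool)
    return merged
```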
By the method, the advantages of various significance detection methods can be integrated, so that the obtained detection result is more accurate, and the subsequent coding efficiency is improved.
In one possible implementation manner, before the obtaining of a detection result obtained by performing saliency detection on at least one image frame in a video, the method further includes: identifying a target scene to obtain an identification result of the target scene;
the acquiring a detection result obtained by performing significance detection on at least one image frame in a video comprises: determining a detection model for detecting the significance of at least one image frame in the video according to the identification result of the target scene; and according to the detection model, performing significance detection on at least one image frame in the video to obtain a detection result.
The detection model comprises at least two of a general perception region detection model, a personal perception region detection model, and an image segmentation model. The general perception region detection model is used for performing general perception region detection on at least one image frame in a video to obtain a general salient region; the personal perception region detection model is used for performing personal perception region detection on at least one image frame in a video to obtain a personal salient region; and the image segmentation model is used for performing image segmentation on at least one image frame in the video to obtain a target region.
For example, according to actual service requirements, for a plurality of common service scenes, video sets in different scenes can be utilized in advance to train detection models which can be used for significance detection, such as a general perception region detection model, a personal perception region detection model, an image segmentation model and the like, so as to obtain a trained detection model for each scene.
When detection models for multiple scenes are available, before S11, an image or a video of the target scene may be captured by a camera of a terminal device (for example, a mobile phone), and the captured image, or at least one image frame of the video representing the target scene, is identified to obtain an identification result of the target scene.
For example, in the case where a pet is present in at least one image frame representing a target scene, a recognition result for a pet monitoring scene may be obtained; in the case where a soccer ball is present in at least one image frame representing a current scene, recognition results for a soccer game scene may be obtained.
In S11, a detection model for saliency detection of at least one image frame in a video may be determined from detection models for saliency detection in a plurality of scenes according to a recognition result of a target scene. The detection model may include at least two of a general perceptual region detection model for the target scene, a personal perceptual region detection model for the target scene, and an image segmentation model for the target scene.
Then, a detection result obtained by performing significance detection on at least one image frame in the video can be obtained according to at least two of a general perception region detection model, a personal perception region detection model and an image segmentation model aiming at a target scene. The detection result comprises at least two of a general salient region for the target scene, a personal salient region for the target scene, and a target region (namely a target segmentation region) for the target scene.
For example, according to the recognition result for the pet monitoring scene, a detection model for performing saliency detection on at least one image frame in the video can be determined; the detection model can comprise a general perception region detection model for pet monitoring, a personal perception region detection model for pet monitoring, and a pet image segmentation model.
Then, at least one image frame in the video can be input into the personal perception region detection model for pet monitoring to obtain the personal perception region in the pet monitoring scene; at least one image frame in the video can be input into the general perception region detection model for pet monitoring to obtain the general perception region in the pet monitoring scene; and at least one image frame in the video can be input into the pet image segmentation model to obtain the segmentation region whose target is the pet in the pet monitoring scene. The set of the personal perception region, the general perception region, and the pet segmentation region in the pet monitoring scene constitutes the detection result.
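A hypothetical sketch of such scene-driven model selection; the registry keys and model file paths are invented for illustration:

```python
# Hypothetical per-scene registry of pre-trained detection models.
MODEL_REGISTRY = {
    "pet_monitoring": {
        "general": "models/pet_general.onnx",    # general perception region model
        "personal": "models/pet_personal.onnx",  # personal perception region model
        "segmentation": "models/pet_seg.onnx",   # pet image segmentation model
    },
    # ... one entry per supported scene, e.g. "soccer_game"
}


def select_models(scene: str) -> dict:
    """Pick the detection models matching the recognized target scene."""
    return MODEL_REGISTRY[scene]
```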
By the method, saliency detection can be performed on at least one image frame in the video according to the recognition result of the target scene to obtain the detection result, which improves the accuracy of the detection result and its adaptability to the scene.
After the detection result is obtained in S11, in S12, information integration processing (such as integration information shown in fig. 2) may be performed on the at least two salient regions according to preset weights of the at least two salient regions, so as to obtain a coding quantization parameter corresponding to each partial region.
In a possible implementation manner, each significant region included in the detection result may respectively correspond to a preset weight, and S12 includes:
S121: determining a first partial region where the at least two salient regions do not overlap and a second partial region where the at least two salient regions overlap;
S122: determining the preset weight of the salient region to which each first partial region belongs as the first weight of that first partial region;
S123: determining a second weight of each second partial region according to the preset weights of the multiple salient regions to which the second partial region belongs;
S124: determining the coding quantization parameter of each first partial region according to its first weight, and determining the coding quantization parameter of each second partial region according to its second weight.
For example, fig. 3 shows a schematic diagram of the integration process according to the embodiment of the disclosure, as shown in fig. 3, the detection result includes 3 kinds of salient regions, i.e., a salient region X1 (circular region of black solid line), a salient region X2 (circular region of gray), and a salient region X3 (circular region of black dotted line). It should be understood that a plurality of salient regions may be included in the detection result, fig. 3 only takes 3 salient regions as an example, and in the case that the salient regions satisfy at least two, the embodiment of the present disclosure does not limit the number of salient regions included in the detection result.
Wherein, each kind of salient region can respectively correspond to a preset weight, namely: the preset weight corresponding to the salient region X1 is W1, the preset weight corresponding to the salient region X2 is W2, and the preset weight corresponding to the salient region X3 is W3. The preset weight of each significant region is a real number greater than 0, and the specific value of the preset weight is not limited in the embodiment of the disclosure.
In S121, the first partial regions A to C where there is no overlap between the respective salient regions X1 to X3, and the second partial regions D to G where at least two of the salient regions overlap, may be determined.
Wherein the first partial region a is an un-overlapped region in the significant region X1, the first partial region B is an un-overlapped region in the significant region X2, and the first partial region C is an un-overlapped region in the significant region X3; the second partial region D is an overlapping region between the salient region X1 and the salient region X2, the second partial region E is an overlapping region between the salient region X2 and the salient region X3, the second partial region F is an overlapping region between the salient region X1 and the salient region X3, and the second partial region G is an overlapping region between the salient regions X1 to X3.
It should be appreciated that the overlapping second partial regions in the image frame may be determined serially, one after another: the overlapping region between the salient region X1 and the salient region X2 may be processed first, and the overlaps between that result and the salient region X3 may be processed next. Alternatively, multiple salient regions may be processed in parallel to determine each overlapping second partial region in the image frame. The embodiments of the present disclosure do not limit whether the second partial regions are determined serially, one salient region at a time, or in parallel over multiple salient regions.
In S122, the preset weights W1 to W3 of the salient regions X1 to X3 to which the first partial regions A to C respectively belong are determined as the first weights of the first partial regions A to C; that is, the preset weight W1 of the salient region X1 is determined as the first weight of the first partial region A, the preset weight W2 of the salient region X2 is determined as the first weight of the first partial region B, and the preset weight W3 of the salient region X3 is determined as the first weight of the first partial region C.
In S123, the second weight of each of the second partial regions D to G is determined according to the preset weights W1 to W3 of the multiple salient regions X1 to X3 to which that second partial region belongs.
In a possible implementation manner, the preset weights of the salient regions to which each second partial region belongs are multiplied together to obtain the second weight of that second partial region.
For example, the second partial region D represents the overlapping region of the salient regions X1 and X2, so the preset weight W1 of the salient region X1 may be multiplied by the preset weight W2 of the salient region X2 to obtain its second weight, that is, W1 × W2.
The second partial region E represents the overlapping region of the salient regions X2 and X3, so the preset weight W2 may be multiplied by the preset weight W3 to obtain the second weight of the second partial region E, that is, W2 × W3.
The second partial region F represents the overlapping region of the salient regions X1 and X3, so the preset weight W1 may be multiplied by the preset weight W3 to obtain the second weight of the second partial region F, that is, W1 × W3.
The second partial region G represents the overlapping region of all three salient regions X1, X2, and X3, so the preset weights W1, W2, and W3 may be multiplied together to obtain the second weight of the second partial region G, that is, W1 × W2 × W3.
By this method, the second weight of each second partial region can be obtained quickly while comprehensively considering the influence of the preset weights of all the salient regions to which it belongs.
Those skilled in the art should understand that the manner of determining the second weight of an overlapping second partial region from the preset weights of the salient regions is not limited to multiplication; the second weight may also be obtained through other operations, such as adding the preset weights of the salient regions to which the second partial region belongs, which is not limited in the embodiments of the present disclosure.
It can be seen that, through S122 to S123, the weights of the partial regions A to G can be determined: the first weight of the first partial region A is W1, the first weight of the first partial region B is W2, the first weight of the first partial region C is W3, the second weight of the second partial region D is W1 × W2, the second weight of the second partial region E is W2 × W3, the second weight of the second partial region F is W1 × W3, and the second weight of the second partial region G is W1 × W2 × W3.
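To make S121 to S123 concrete, the following minimal Python sketch integrates per-region weights into a per-pixel weight map. It assumes each salient region is supplied as a boolean pixel mask; the function name and the neutral background weight of 1.0 are illustrative choices, not part of the disclosure:

```python
import numpy as np

def integrate_weights(masks, weights):
    """Integrate per-region boolean masks into a per-pixel weight map.

    A pixel covered by exactly one salient region keeps that region's
    preset weight (its first weight); a pixel covered by several regions
    gets the product of their preset weights (its second weight),
    matching the multiplication scheme described above.
    """
    weight_map = np.ones(masks[0].shape, dtype=np.float64)
    covered = np.zeros(masks[0].shape, dtype=bool)
    for mask, w in zip(masks, weights):
        weight_map[mask] *= w   # overlapping regions accumulate products
        covered |= mask
    return weight_map, covered  # uncovered pixels keep the neutral weight 1.0

# With masks for X1 to X3 and weights [W1, W2, W3], a pixel in region A
# ends up with W1, one in region D with W1*W2, and one in region G with
# W1*W2*W3, exactly as tabulated above.
```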
After the weight corresponding to each of the partial regions A to G is determined, in S124, the coding quantization parameters of the first partial regions A to C may be determined according to the first weights of the first partial regions A to C, and the coding quantization parameters of the second partial regions D to G may be determined according to the second weights of the second partial regions D to G.
For example, the weight corresponding to each of the partial regions A to G may be converted into a coding quantization parameter according to a preset configuration strength H, where the configuration strength H is a manually set coefficient used in an operation with the weights.
The configuration strength H can be multiplied by the weight of each of the partial regions A to G to obtain the coding quantization parameter of that partial region, that is: the coding quantization parameter of the first partial region A may be H × W1, that of the first partial region B may be H × W2, that of the first partial region C may be H × W3, that of the second partial region D may be H × W1 × W2, that of the second partial region E may be H × W2 × W3, that of the second partial region F may be H × W1 × W3, and that of the second partial region G may be H × W1 × W2 × W3. The embodiments of the present disclosure do not limit the specific value of the configuration strength H.
It should be understood that the above process obtains the coding quantization parameter of each partial region merely by multiplying the configuration strength H by the weight of that partial region; the configuration strength and the weight may instead be combined through addition, exponential, or logarithmic operations, and the embodiments of the present disclosure do not limit the specific mathematical operation used.
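Continuing the sketch above, the conversion of S124 under the multiplicative scheme is a one-liner; the numeric values in the comment are purely illustrative:

```python
def weights_to_qp(weight_map, strength_h):
    """Convert the integrated weight map into coding quantization
    parameters by multiplying with the configuration strength H,
    one of the operations the embodiments allow."""
    return strength_h * weight_map

# For instance, with W1=2, W2=3, W3=4 and H=1.5, region A gets
# QP = 1.5*2 = 3.0 and region G gets QP = 1.5*2*3*4 = 36.0.
```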
Through S121 to S124, the multiple salient regions can be integrated according to their preset weights to obtain multiple partial regions, including the non-overlapping first partial regions and the overlapping second partial regions, with each partial region corresponding to one integrated coding quantization parameter. By this method, the advantages of various saliency detection methods can be combined, and the proportion of influence each saliency detection method exerts on image frame coding can be adjusted through the preset weights of the corresponding salient regions, which makes the coding process more flexible, allows the coding scheme to be customized according to the requirements of the service scene, and further improves the coding effect.
After the integration result is obtained in S12, in S13, at least one image frame in the video may be encoded according to the coding quantization parameter corresponding to each partial region, so as to obtain the target video (i.e., the video encoding shown in fig. 2).
In one possible implementation, S13 includes:
S131: coding at least one image frame according to the coding quantization parameters of each first partial region and each second partial region in the at least one image frame to obtain at least one coded image frame, wherein the coding quantization parameter is used for adjusting the coding rate of the image frame in the coding process.
S132: and combining the at least one encoded image frame to obtain the target video.
For example, a video is composed of consecutive image frames; owing to the persistence of vision of human eyes, a sequence of image frames played at a certain rate is perceived as continuous motion. Because consecutive image frames are highly similar, an uncoded video including at least one image frame needs to be compressed by encoding, so that redundancy in the spatial or temporal dimension is removed for storage and transmission.
In S131, at least one image frame may be encoded according to the coding quantization parameters of the first partial regions and the second partial regions in the image frame, with the coding rate of each partial region adjusted according to its coding quantization parameter, so as to obtain at least one encoded image frame.
The larger the value of the coding quantization parameter of a partial region, the lower the coding rate corresponding to that partial region, the smaller the storage space it occupies after encoding, and the poorer its image quality in the image frame; conversely, the smaller the value of the coding quantization parameter of a partial region, the higher the coding rate corresponding to that partial region, the larger the storage space it occupies after encoding, and the better its image quality in the image frame.
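As a hedged aside, practical codecs only accept quantization parameters within a fixed range; the helper below assumes the 0-51 convention of H.264/H.265, which the disclosure itself does not mandate:

```python
import numpy as np

def clamp_qp(qp_map, qp_min=0, qp_max=51):
    """Clamp quantization parameters to a codec's legal range.

    A larger QP quantizes more coarsely: lower coding rate, less storage,
    poorer image quality.  A smaller QP does the opposite.
    """
    return np.clip(np.rint(qp_map), qp_min, qp_max).astype(np.int32)
```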
In S132, the at least one encoded image frame may be combined according to the order of the image frames in the original video to obtain the target video.
During the encoding process, redundant information in image frame data included in the video may be removed according to an encoding standard, and an uncoded video file including a plurality of image frames may be converted into a target video file in a certain compression format.
The encoding standard may include the H.261, H.263, and H.264 standards from the International Telecommunication Union, the Motion JPEG (M-JPEG) standard, the MPEG series of standards from the Moving Picture Experts Group of the International Organization for Standardization, and the like, and the embodiments of the present disclosure do not limit the encoding standard. Based on different encoding standards, target videos in different formats may be obtained, including Moving Picture Experts Group (MPEG), Audio Video Interleaved (AVI), Advanced Streaming Format (ASF), Windows Media Video (WMV), and the like.
In this way, the coding rate of each partial region in the image frames of the video can be adjusted through the coding quantization parameter of that partial region, which helps control the video coding scheme according to the service scene, so that the regions required by the service scene in the encoded target video are not distorted, or their distortion is reduced.
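To sketch the frame-combining step S132 (not the per-region rate control itself), one could use OpenCV's VideoWriter as a stand-in; the codec tag and frame rate below are assumptions, and encoder-level hooks would be needed for true per-region quantization:

```python
import cv2

def combine_frames(encoded_frames, out_path, fps=25.0):
    """Combine encoded image frames into the target video in their
    original order (S132).  Frames are assumed to be BGR uint8 arrays."""
    height, width = encoded_frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for frame in encoded_frames:
        writer.write(frame)
    writer.release()
```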
In the above processes S10 to S13, the weights of the partial regions (i.e., the first weights and the second weights) are used to adjust the coding distribution over different regions in the image frame. The coding distribution is the distribution of the coding rates (the number of bits occupied by the pixels in the image frame) with which the parts of the image frame are compressed; different regions of the image frame can have different coding rates.
The coding rates of the parts of the image frame can be influenced according to their weights, so that parts corresponding to different weights have different coding rates, thereby adjusting the coding distribution over the image frame. For example, when the configuration strength is multiplied by the weight of each partial region: if the configuration strength is positive, a larger weight yields a larger coding quantization parameter after multiplication, hence a smaller coding rate for that partial region and lower image quality for the corresponding part of the encoded video; if the configuration strength is negative, a larger weight yields a smaller coding quantization parameter, hence a larger coding rate and higher image quality for the corresponding part of the encoded video.
The configuration strength is used to adjust the video encoding strength, i.e., the overall level of the coding rate (the number of bits occupied by each pixel in the image frame) with which the image frames are compressed, that is, the overall compression degree of the video.
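The sign effect just described can be illustrated with a tiny numeric sketch; the additive-offset form, the base QP of 30, and the strength of -3.0 are illustrative assumptions rather than the disclosure's scheme:

```python
base_qp = 30                   # hypothetical codec baseline
h = -3.0                       # negative configuration strength
qp_face = base_qp + h * 5.0    # weight 5 -> QP 15: finer quantization, clearer
qp_back = base_qp + h * 1.0    # weight 1 -> QP 27: coarser, saves bits
```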
In practical applications, for a live broadcast scene, for example, proper weights for the partial regions in the video make the coding rate of the face or person region in the image frame relatively high and the coding rates of the other regions relatively low. Thus, with the same storage space occupied by the encoded video file, the face or person region can be clearer; or, with the same definition of the face or person region, the storage space occupied by the encoded video file can be smaller.
Similarly, for a pet monitoring scene, proper weights for the partial regions in the video make the coding rate of the pet region in the image frame relatively high and the coding rates of the other regions relatively low. Thus, with the same storage space occupied by the encoded video file, the pet region can be clearer; or, with the same definition of the pet region, the storage space occupied by the encoded video file can be smaller.
By this method, more code rate and complexity can be allocated to the visually perceived regions that users are interested in and the regions that product managers care about, and the storage and transmission cost of the video can be reduced while the video quality requirements are met.
In addition, the overall coding strength of the video can be adjusted through different configuration strengths. For scenes where the video does not need high-definition image quality, the configuration strength can be adjusted so that the overall coding rate of the video is low; the encoded video file is then smaller and occupies less storage space and transmission bandwidth. For scenes requiring high-definition video, the configuration strength can be adjusted to increase the overall coding rate of the video; the encoded video file may be larger but has higher definition.
Therefore, through S10 to S13, the advantages of saliency detection methods such as general perception region detection, personal perception region detection, and image segmentation can be exploited together during video encoding; personalized encoding can be performed according to specific service scenes or user requirements, and the saliency detection methods and the weights they exert on encoding can be flexibly configured, so that a suitable coding rate is determined and an encoding quality that meets the service scene or user requirements is achieved.
The information processing method according to the embodiment of the present disclosure is further described below with reference to examples.
In one possible implementation, before S11, the method includes: receiving configuration information, wherein the configuration information comprises one or more of information indicating a region of interest for the personal perception region detection, information indicating a segmentation object of the image segmentation, and the preset weight;
in a case where the configuration information includes information indicating a region of interest for the personal perception region detection, the acquiring of a personal salient region obtained by performing personal perception region detection on at least one image frame in a video includes:
performing personal perception region detection on at least one image frame in the video according to the information of the region of interest in the configuration information to obtain the personal salient region;
in a case where the configuration information includes information indicating a segmentation object of the image segmentation, the acquiring a target region obtained by image segmentation of at least one image frame in a video includes:
acquiring a target area obtained by image segmentation of at least one image frame in a video according to the information of the segmentation object in the configuration information;
and when the configuration information is used to indicate the preset weight, before the information integration processing is performed on the at least two salient regions according to the preset weights of the at least two salient regions, the method further includes: determining the preset weight of the corresponding salient region according to the configuration information.
It should be understood that, in the embodiments of the present disclosure, the preset weight corresponding to one kind of salient region (for example, a target region obtained by image segmentation) may be obtained through the configuration information, while the preset weights corresponding to the other salient regions use default values; when no configuration information is needed, the default preset weights corresponding to the salient regions are used directly. The preset weights corresponding to all kinds of salient regions may also be obtained through the configuration information, which is not limited in the embodiments of the present disclosure.
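As a minimal sketch of what such configuration information could look like (all field names are hypothetical; the disclosure prescribes no particular data structure):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class EncodingConfig:
    """One possible shape for the configuration information; any field
    may be absent, in which case default preset weights apply."""
    region_of_interest: Optional[str] = None   # for personal perception region detection
    segmentation_object: Optional[str] = None  # e.g. "face", "dog", "football"
    preset_weights: Dict[str, float] = field(default_factory=dict)

# The live-broadcast example that follows would correspond to:
# EncodingConfig(segmentation_object="face", preset_weights={"face": 5.0})
```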
For example, for a live broadcast scene, the anchor may set up personalized coding on a personal computing device (e.g., a computer or a mobile phone). The anchor may select a face icon (or another identifier representing a face) on the device and input a corresponding definition (i.e., a preset weight, for example, set to 5), so that the device receives configuration information indicating that the segmentation object of the image segmentation is the face object and that the corresponding preset weight is 5. The anchor may also set preset weights for the other kinds of salient regions (e.g., the general salient region and the personal salient region), or default values may be used for them.
In this case, a face region obtained by image segmentation of at least one image frame in the video may be obtained by using a face-based segmentation algorithm according to the face object information, and the preset weight corresponding to the face region is 5.
In the process of encoding the video, the methods of S12 to S13 may be applied to the face region obtained from the configuration information and its corresponding preset weight. The coding rate of each image frame in the video can thereby be allocated more reasonably, improving the coding quality without increasing the coding rate of the video, so that the face region in the video is clearer; alternatively, the storage space or transmission bandwidth of the encoded video can be reduced without losing definition of the target object (the face).
For example, for a pet monitoring scene, a user may set up personalized coding on a personal computing device (e.g., a computer or a mobile phone). The user may select a dog or cat icon on the device and input a corresponding definition (i.e., a preset weight, for example, set to 5), so that the device receives configuration information indicating that the segmentation object of the image segmentation is a dog or a cat and that the corresponding preset weight is 5.
In this case, a personal salient region (i.e., a region in which the pet of interest appears more frequently) obtained by performing personal perception region detection on at least one image frame in the video may be acquired by using a personal perception region detection algorithm based on the information indicating the user's region of interest (e.g., a region where the pet appears frequently), and the preset weight of the personal salient region may be 5.
In the process of encoding the video, the methods of S12 to S13 may be applied to the personal salient region obtained from the configuration information and its corresponding preset weight, so that the coding rate of each image frame in the video can be allocated more reasonably, improving the coding quality without increasing the video coding rate and making the region where the dog or cat is located clearer; alternatively, the storage space or transmission bandwidth of the encoded video can be reduced without losing definition of the target object (the dog or cat).
For example, for a video of a football match, a viewer may set up personalized coding on a personal computing device (e.g., a computer or a mobile phone). The viewer may select information about the region of interest for personal perception region detection on the device; for instance, a viewer who follows the referee may, according to personal viewing habits, select the referee-related region among several pictures preset on the device as the region of interest, and input a corresponding definition (i.e., a preset weight, for example, set to 4), so that the device receives configuration information indicating the region of interest (the referee region) with a corresponding preset weight of 4. In addition, to watch the match better, the viewer may also select a football icon (or another identifier representing a football) on the device and input a corresponding definition (i.e., a preset weight, for example, set to 5), so that the device also receives configuration information indicating that the segmentation object of the image segmentation is the football object and that the corresponding preset weight is 5.
In this case, a personal salient region (i.e., a region in which the referee of interest appears more frequently) obtained by performing personal perception region detection on at least one image frame in the video may be acquired by using a personal perception region detection algorithm based on the information indicating the region of interest (the referee region), and the preset weight of the personal salient region is 4. A football region obtained by image segmentation of at least one image frame in the video may be acquired by using a ball-based segmentation algorithm according to the football object information, and the preset weight corresponding to the football region is 5.
In the process of encoding the video, the methods of S12 to S13 may be applied based on the personal salient region and its corresponding preset weight, as well as the football region and its corresponding preset weight, so that the coding rate of each image frame in the video can be allocated more reasonably, improving the coding quality without increasing the video coding rate: the region where the football is located is the clearest, and the region where the referee is located is also clearer. Alternatively, the storage space or transmission bandwidth of the encoded video can be reduced with no or only slight loss of definition in the region of interest (where the referee is) and the target region (where the football is).
It should be understood that the configuration information may include one or more of information indicating a region of interest for personal perception region detection, information indicating a segmentation object of the image segmentation, and preset weights; the embodiments of the present disclosure do not limit the specific content of the configuration information.
Therefore, industry users can adjust the saliency detection methods and the preset weight of each method: for a live broadcast scene, segmentation methods for faces, persons, and the like can be added and the preset weight of the face object increased; for a pet monitoring scene, a pet segmentation method can be added and the preset weight of the pet increased. Individual users can create their own coding profiles, so that the details of their regions of interest are retained when the video is compressed, thereby improving the coding quality.
By this method, different image segmentation methods, and personal perception region detection methods with personal characteristics, can be introduced through the configuration information and combined with other saliency detection methods, giving the method of the embodiments of the present disclosure a customized effect: users can perform personalized coding on their personal computing devices, which facilitates the development of personalized software. Furthermore, the method of the embodiments of the present disclosure can flexibly configure various saliency detection methods and their corresponding weights, improving the applicability of perceptual video coding and improving the coding quality when the video may occupy only limited storage space.
Therefore, according to the information processing method of the embodiments of the present disclosure, a detection result obtained by performing saliency detection on at least one image frame in a video can be acquired, the detection result including at least two salient regions; information integration processing is performed on the at least two salient regions according to their preset weights to obtain a coding quantization parameter corresponding to each partial region; and at least one image frame in the video is encoded according to the coding quantization parameter corresponding to each partial region to obtain a target video. In this way, the distribution of the coding rate over the partial regions of the video can be controlled effectively, more code rate and complexity can be allocated to the visually perceived regions of interest obtained by saliency detection, and the storage and transmission cost of the video can be reduced while the video quality requirements are met.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principles and logic; limited by space, these combinations are not described in detail in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an information processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the information processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here for brevity.
Fig. 4 shows a block diagram of an information processing apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes:
an obtaining module 71, configured to obtain a detection result obtained by performing saliency detection on at least one image frame in a video, where the detection result includes at least two salient regions;
an integrating module 72, configured to perform information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, where the partial region includes a first partial region that is not overlapped between the at least two salient regions and a second partial region that is overlapped between the at least two salient regions;
and the encoding module 73 is configured to encode at least one image frame in the video according to the encoding quantization parameter corresponding to each partial region, so as to obtain a target video.
It should be understood that the obtaining module 71, the integrating module 72, and the encoding module 73 may run on any processor, which is not limited in the embodiments of the present disclosure.
In this way, the distribution of the coding rate over the partial regions of the video can be controlled effectively, more code rate and complexity can be allocated to the visually perceived regions of interest obtained by saliency detection, and the storage and transmission cost of the video can be reduced while the video quality requirements are met.
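A hypothetical skeleton of how the three modules might be wired together is sketched below; the class, method names, and interfaces are illustrative assumptions, not the apparatus itself:

```python
class InformationProcessor:
    """Mirrors fig. 4: obtain the saliency detection result, integrate it
    into per-region coding quantization parameters, then encode."""

    def __init__(self, detector, integrator, encoder):
        self.detector = detector      # obtaining module 71
        self.integrator = integrator  # integrating module 72
        self.encoder = encoder        # encoding module 73

    def process(self, frames, preset_weights):
        salient_regions = self.detector(frames)        # at least two salient regions
        qp_per_region = self.integrator(salient_regions, preset_weights)
        return self.encoder(frames, qp_per_region)     # the target video
```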
In a possible implementation manner, the significance detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the obtaining module 71 is configured to obtain at least two of the following three: acquiring a universal salient region obtained by carrying out universal perception region detection on at least one image frame in a video; acquiring a personal salient region obtained by detecting a personal perception region of at least one image frame in a video; acquiring a target area obtained by image segmentation of at least one image frame in a video; wherein the at least two salient regions comprise at least two of the universal salient region, the personal salient region, and the target region.
By the method, the advantages of various significance detection methods can be integrated, so that the obtained detection result is more accurate, and the subsequent coding efficiency is improved.
In one possible implementation, the integrating module 72 is configured to: determining a first partial region where the at least two significant regions are not overlapped and a second partial region where the at least two significant regions are overlapped; determining a preset weight value of the salient region to which each first partial region belongs as a first weight value of the first partial region; determining a second weight value of each second partial region according to preset weight values of various salient regions to which the second partial region belongs; and determining the coding quantization parameter of the first partial region according to each first weight value, and determining the coding quantization parameter of the second partial region according to each second weight value.
By this method, the multiple salient regions can be integrated according to their preset weights to obtain multiple partial regions, including non-overlapping first partial regions and overlapping second partial regions, with each partial region corresponding to an integrated coding quantization parameter. The advantages of multiple saliency detection methods can thus be combined, and the proportion of influence each saliency detection method exerts on image frame coding can be adjusted through the preset weights of the salient regions, making the coding process more flexible, allowing the coding scheme to be customized according to the service scene requirements, and further improving the coding effect.
In a possible implementation manner, the determining a second weight of each second partial region according to preset weights of multiple salient regions to which the second partial region belongs includes: and multiplying the preset weight of each salient region to which each second partial region belongs to obtain a second weight of the second partial region.
By the method, the second weight of the second partial region can be obtained quickly, and the influence of the preset weights of the plurality of salient regions to which the second partial region belongs can be comprehensively considered.
In a possible implementation manner, the encoding module 73 is configured to: coding at least one image frame according to the coding quantization parameters of each first partial region and each second partial region in the at least one image frame to obtain at least one coded image frame; combining at least one encoded image frame to obtain a target video; the coding quantization parameter is used for adjusting the coding rate of the image frame in the coding process.
In this way, the coding rate of each part of area in the image frame of the video can be adjusted through the coding quantization parameter of each part of area, which is beneficial to controlling the coding scheme of the video according to the service scene, so that the area required by the service scene in the coded target video is not distorted or the distortion is reduced.
In one possible implementation, the apparatus further includes: a receiving module, configured to receive configuration information before the acquiring of the detection result obtained by performing saliency detection on at least one image frame in a video, where the configuration information includes one or more of information indicating a region of interest for the personal perception region detection, information indicating a segmentation object of the image segmentation, and the preset weight. In a case where the configuration information includes information indicating a region of interest for the personal perception region detection, the acquiring of a personal salient region obtained by performing personal perception region detection on at least one image frame in a video includes: performing personal perception region detection on at least one image frame in the video according to the information of the region of interest in the configuration information to obtain the personal salient region. In a case where the configuration information includes information indicating a segmentation object of the image segmentation, the acquiring of a target region obtained by image segmentation of at least one image frame in a video includes: acquiring the target region obtained by image segmentation of at least one image frame in the video according to the information of the segmentation object in the configuration information. When the configuration information is used to indicate the preset weight, the apparatus further includes a preset weight determining module, configured to determine the preset weight of the corresponding salient region according to the configuration information before the information integration processing is performed on the at least two salient regions according to the preset weights of the at least two salient regions.
Different image segmentation methods, and personal perception region detection methods with personal characteristics, can be introduced through the configuration information and combined with other saliency detection methods, giving the apparatus of the embodiments of the present disclosure a customized effect: users can perform personalized coding, which facilitates the development of personalized software. Furthermore, the apparatus of the embodiments of the present disclosure can flexibly configure various saliency detection methods and their corresponding weights, improving the applicability of perceptual video coding and improving the coding quality when the video may occupy only limited storage space.
In one possible implementation, the apparatus further includes: the identification module is used for identifying the target scene to obtain an identification result of the target scene; the obtaining module 71 is configured to: determining a detection model for detecting the significance of at least one image frame in the video according to the identification result of the target scene; and according to the detection model, performing significance detection on at least one image frame in the video to obtain a detection result.
In this way, saliency detection can be performed on at least one image frame in the video according to the recognition result of the target scene recognition to obtain the detection result, which improves both the accuracy of the detection result and the adaptability to the scene.
In one possible implementation manner, the saliency detection includes at least two of general perception region detection, personal perception region detection, and image segmentation, and the detection model includes at least two of a general perception region detection model, a personal perception region detection model, and an image segmentation model; the general perception region detection model is used for performing general perception region detection on at least one image frame in a video to obtain a general salient region; the personal perception region detection model is used for performing personal perception region detection on at least one image frame in a video to obtain a personal salient region; and the image segmentation model is used for performing image segmentation on at least one image frame in a video to obtain a target region.
This facilitates adaptation to different scenes, yields more accurate general salient regions, personal salient regions, and target regions, and improves the accuracy of the detection result.
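One way the identification module's output might drive model selection is a simple lookup; the scene labels and model names below are invented for illustration only:

```python
# Hypothetical registry mapping recognized scenes to detection models.
SCENE_MODELS = {
    "live_broadcast": ["general_perception", "face_segmentation"],
    "pet_monitoring": ["personal_perception", "pet_segmentation"],
    "sports":         ["personal_perception", "ball_segmentation"],
}

def select_detection_models(scene_label, registry=SCENE_MODELS):
    """Pick at least two saliency detection models according to the
    recognition result of the target scene, falling back to general
    perception region detection plus image segmentation."""
    return registry.get(scene_label, ["general_perception", "image_segmentation"])
```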
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, including computer-readable code or a non-transitory computer-readable storage medium carrying computer-readable code; when the code runs in a processor of an electronic device, the processor executes the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 5 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 5, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as wireless network (WiFi), second generation mobile communication technology (2G), third generation mobile communication technology (3G), fourth generation mobile communication technology (4G), long term evolution of universal mobile communication technology (LTE), fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 6 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 6, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA) can execute the computer-readable program instructions by utilizing state information of the instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. An information processing method characterized by comprising:
acquiring a detection result obtained by performing saliency detection on at least one image frame in a video, wherein the detection result comprises at least two salient regions;
performing information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, wherein the partial regions comprise a first partial region where the at least two salient regions do not overlap and a second partial region where the at least two salient regions overlap;
and encoding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video.
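For illustration only, the following minimal Python sketch shows how a region weight might map to the coding quantization parameter (QP) named in the last step; the base QP, the scaling factor, and the 0-51 clamp (typical of H.264/HEVC) are assumptions, not taken from the claims.

```python
# Hypothetical weight-to-QP mapping: a more salient region (higher weight)
# gets a lower QP, hence finer quantization and more bits. The base QP of 32,
# the span of 8, and the 0-51 clamp are illustrative assumptions.
def weight_to_qp(weight: float, base_qp: int = 32, span: float = 8.0) -> int:
    qp = round(base_qp - span * (weight - 1.0))  # weight 1.0 = non-salient background
    return max(0, min(51, qp))                   # 0-51 is the H.264/HEVC QP range

# Example: background at weight 1.0 keeps QP 32; an overlap region whose
# combined weight is 1.8 drops to QP 26 and is encoded at higher quality.
assert weight_to_qp(1.0) == 32
assert weight_to_qp(1.8) == 26
```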
2. The method of claim 1, wherein the saliency detection comprises at least two of general perception region detection, personal perception region detection, and image segmentation, and
the acquiring of a detection result obtained by performing saliency detection on at least one image frame in a video comprises at least two of the following three operations:
acquiring a general salient region obtained by performing general perception region detection on at least one image frame in the video;
acquiring a personal salient region obtained by performing personal perception region detection on at least one image frame in the video;
acquiring a target region obtained by performing image segmentation on at least one image frame in the video;
wherein the at least two salient regions comprise at least two of the general salient region, the personal salient region, and the target region.
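A small sketch of claim 2's "at least two of three" structure follows; the detector stubs and their names are invented stand-ins for the real detection models.

```python
# Stub detectors standing in for general perception region detection,
# personal perception region detection, and image segmentation. Each returns
# a list of (x, y, w, h) bounding boxes; real implementations would run
# trained models on the frame.
def detect_general_regions(frame):  return [(40, 30, 120, 90)]
def detect_personal_regions(frame): return [(100, 60, 80, 80)]
def segment_target_regions(frame):  return [(10, 10, 50, 50)]

DETECTORS = {
    "general":  detect_general_regions,
    "personal": detect_personal_regions,
    "segment":  segment_target_regions,
}

def run_saliency_detection(frame, enabled=("general", "personal")):
    """Claim 2 requires at least two of the three detection types to run."""
    assert len(enabled) >= 2, "saliency detection needs at least two detector types"
    return {name: DETECTORS[name](frame) for name in enabled}
```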
3. The method according to claim 1 or 2, wherein the performing information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain the coding quantization parameter corresponding to each partial region comprises:
determining a first partial region where the at least two salient regions do not overlap and a second partial region where the at least two salient regions overlap;
determining the preset weight of the salient region to which each first partial region belongs as a first weight of that first partial region;
determining a second weight of each second partial region according to the preset weights of the plurality of salient regions to which that second partial region belongs;
and determining the coding quantization parameter of each first partial region according to its first weight, and determining the coding quantization parameter of each second partial region according to its second weight.
4. The method according to claim 3, wherein the determining a second weight of each second partial region according to the preset weights of the plurality of salient regions to which that second partial region belongs comprises:
multiplying the preset weights of the salient regions to which each second partial region belongs to obtain the second weight of that second partial region.
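Claims 3 and 4 reduce to a simple rule over a per-pixel weight map: pixels covered by exactly one salient region (first partial regions) keep that region's preset weight, while pixels covered by several (second partial regions) take the product of the weights. A minimal numpy sketch under those assumptions, with weights greater than 1 assumed to mark higher importance:

```python
import numpy as np

def integrate_weights(region_masks_and_weights, shape):
    """Build a per-pixel weight map from (boolean mask, preset weight) pairs.
    Starting from 1.0 everywhere and multiplying each region's weight into its
    mask yields the preset weight on first partial regions (single coverage)
    and the product of weights on second partial regions (overlap), which is
    exactly the claim-4 rule."""
    weights = np.ones(shape, dtype=np.float32)  # 1.0 = outside all salient regions
    for mask, w in region_masks_and_weights:
        weights[mask] *= w
    return weights

# Two overlapping rectangles with preset weights 1.5 and 1.2:
h, w = 8, 8
m1 = np.zeros((h, w), dtype=bool); m1[1:5, 1:5] = True
m2 = np.zeros((h, w), dtype=bool); m2[3:7, 3:7] = True
wm = integrate_weights([(m1, 1.5), (m2, 1.2)], (h, w))
assert wm[2, 2] == 1.5                    # first partial region of region 1
assert np.isclose(wm[4, 4], 1.5 * 1.2)    # second (overlapping) partial region
```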
5. The method according to any one of claims 1 to 4, wherein the encoding at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video comprises:
encoding at least one image frame according to the coding quantization parameters of each first partial region and each second partial region in the at least one image frame to obtain at least one encoded image frame;
and combining the at least one encoded image frame to obtain the target video;
wherein the coding quantization parameter is used to adjust the coding bit rate of the image frame during encoding.
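Practical encoders typically accept QP at block granularity rather than per pixel, so one plausible (assumed, not claimed) realization of claim 5 is to average the weight-derived QP map over macroblocks and hand the per-block values to an ROI-capable encoder; the 16x16 block size and the encoder hand-off below are assumptions.

```python
import numpy as np

def qp_map_to_blocks(qp_map, block=16):
    """Reduce a per-pixel QP map to per-macroblock QPs (16x16 assumed, as in
    H.264) by averaging each block; a hypothetical ROI-capable encoder would
    consume this as a per-block QP / delta-QP map when encoding the frame."""
    h, w = qp_map.shape
    bh, bw = -(-h // block), -(-w // block)  # ceiling division
    blocks = np.empty((bh, bw), dtype=np.int32)
    for i in range(bh):
        for j in range(bw):
            tile = qp_map[i * block:(i + 1) * block, j * block:(j + 1) * block]
            blocks[i, j] = int(round(tile.mean()))
    return blocks

def encode_video(frames, per_frame_block_qps):
    """Placeholder for claim 5's final steps: encode each frame with its
    region QPs, then combine the encoded frames into the target video.
    A real implementation would invoke an encoder here."""
    return [(f, q) for f, q in zip(frames, per_frame_block_qps)]

qp = np.full((32, 48), 30, dtype=np.int32); qp[:16, :16] = 24
print(qp_map_to_blocks(qp))  # first 16x16 block averages to 24, the rest to 30
```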
6. The method of claim 2, wherein prior to obtaining the detection result of the saliency detection of at least one image frame in the video, the method further comprises:
receiving configuration information, wherein the configuration information comprises one or more of: information indicating a region of interest for the personal perception region detection, information indicating a segmentation object for the image segmentation, and the preset weights;
in a case where the configuration information includes the information indicating the region of interest for the personal perception region detection, the acquiring a personal salient region obtained by performing personal perception region detection on at least one image frame in a video comprises:
performing personal perception region detection on at least one image frame in the video according to the region-of-interest information in the configuration information to obtain the personal salient region;
in a case where the configuration information includes the information indicating the segmentation object for the image segmentation, the acquiring a target region obtained by performing image segmentation on at least one image frame in a video comprises:
acquiring the target region obtained by performing image segmentation on at least one image frame in the video according to the segmentation-object information in the configuration information;
and in a case where the configuration information indicates the preset weights, before the performing information integration processing on the at least two salient regions according to the preset weights of the at least two salient regions, the method further comprises: determining the preset weights of the salient regions according to the configuration information.
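A sketch of the configuration object claim 6 implies follows; every field name is an assumption, and each field is optional, mirroring the "one or more of" wording.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SaliencyConfig:
    """Hypothetical container for claim 6's configuration information."""
    attention_region: Optional[tuple] = None    # ROI for personal perception detection, e.g. (x, y, w, h)
    segmentation_object: Optional[str] = None   # object class for image segmentation, e.g. "person"
    preset_weights: dict = field(default_factory=dict)  # e.g. {"general": 1.5, "personal": 1.2}

def apply_config(cfg: SaliencyConfig, default_weights: dict) -> dict:
    """Claim 6's last branch: when the configuration carries preset weights,
    they override the defaults before information integration runs."""
    return {**default_weights, **cfg.preset_weights}

cfg = SaliencyConfig(attention_region=(100, 60, 80, 80),
                     preset_weights={"personal": 2.0})
print(apply_config(cfg, {"general": 1.5, "personal": 1.2}))  # personal -> 2.0
```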
7. The method according to any one of claims 1 to 6, wherein the method further comprises:
identifying a target scene to obtain an identification result of the target scene;
the acquiring a detection result obtained by performing saliency detection on at least one image frame in a video comprises:
determining, according to the identification result of the target scene, a detection model for performing saliency detection on at least one image frame in the video;
and performing saliency detection on at least one image frame in the video according to the detection model to obtain the detection result.
8. The method according to claim 7, wherein the saliency detection comprises at least two of general perception region detection, personal perception region detection, and image segmentation, and the detection model comprises at least two of a general perception region detection model, a personal perception region detection model, and an image segmentation model;
the general perception region detection model is used for performing general perception region detection on at least one image frame in the video to obtain the general salient region;
the personal perception region detection model is used for performing personal perception region detection on at least one image frame in the video to obtain the personal salient region;
and the image segmentation model is used for performing image segmentation on at least one image frame in the video to obtain the target region.
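Claims 7 and 8 amount to a scene-keyed registry of detection models; the scene labels and the lambda "models" below are invented for illustration only.

```python
# Hypothetical scene -> detection-model registry for claims 7 and 8. The scene
# labels and the lambdas are stand-ins for trained detection networks; each
# scene maps to at least two model types, per claim 8.
MODEL_REGISTRY = {
    "video_call": {"general": lambda f: [], "personal": lambda f: []},
    "sports":     {"general": lambda f: [], "segment":  lambda f: []},
}

def detect_for_scene(scene_label, frame):
    """Claim 7's two steps: pick the detection models matching the recognized
    target scene, then run saliency detection with them."""
    models = MODEL_REGISTRY[scene_label]
    return {name: model(frame) for name, model in models.items()}

print(detect_for_scene("sports", None))  # {'general': [], 'segment': []}
```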
9. An information processing apparatus characterized by comprising:
an acquisition module, configured to acquire a detection result obtained by performing saliency detection on at least one image frame in a video, wherein the detection result comprises at least two salient regions;
an integration module, configured to perform information integration processing on the at least two salient regions according to preset weights of the at least two salient regions to obtain a coding quantization parameter corresponding to each partial region, wherein the partial regions comprise a first partial region where the at least two salient regions do not overlap and a second partial region where the at least two salient regions overlap;
and an encoding module, configured to encode at least one image frame in the video according to the coding quantization parameter corresponding to each partial region to obtain a target video.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 8.
CN202111129368.4A 2021-09-26 2021-09-26 Information processing method and device, electronic equipment and storage medium Withdrawn CN113824996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111129368.4A CN113824996A (en) 2021-09-26 2021-09-26 Information processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113824996A 2021-12-21

Family

ID=78921206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111129368.4A Withdrawn CN113824996A (en) 2021-09-26 2021-09-26 Information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113824996A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006094000A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Quality metric-biased region-of-interest coding for video telephony
CN101547365A (en) * 2009-05-08 2009-09-30 北京北纬通信科技股份有限公司 Method and device of coding video for guaranteeing display quality of specific regions
CN105379269A (en) * 2013-07-10 2016-03-02 微软技术许可有限责任公司 Region-of-interest aware video coding
WO2016193949A1 (en) * 2015-06-04 2016-12-08 New Cinema, LLC Advanced video coding method, system, apparatus and storage medium
CN108810538A (en) * 2018-06-08 2018-11-13 腾讯科技(深圳)有限公司 Method for video coding, device, terminal and storage medium
CN111479112A (en) * 2020-06-23 2020-07-31 腾讯科技(深圳)有限公司 Video coding method, device, equipment and storage medium

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20211221)