CN116563771A - Image recognition method, device, electronic equipment and readable storage medium

Info

Publication number
CN116563771A
Authority
CN
China
Prior art keywords
macroblock, information, image, sub, consumed
Prior art date
Legal status
Pending
Application number
CN202210108219.8A
Other languages
Chinese (zh)
Inventor
梁剑
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210108219.8A
Publication of CN116563771A

Abstract

The application discloses an image recognition method, an image recognition device, an electronic device and a readable storage medium, and belongs to the technical field of image processing. The method comprises the following steps: acquiring encoded data of a target image, wherein the encoded data of the target image comprises encoded data of a plurality of macroblocks in the target image, the amount of information consumed when any macroblock of the plurality of macroblocks is encoded is proportional to the texture complexity of that macroblock, and the texture complexity of any macroblock is related to the pixel value of each pixel in that macroblock; acquiring a contour image corresponding to the target image based on the encoded data of each macroblock, wherein the contour image is used for reflecting the contour of an object in the target image; and performing image recognition processing on the contour image to obtain an image recognition result. Because the encoded data of the target image does not need to be decoded to obtain the target image, a large amount of computing resources can be saved and image recognition efficiency improved.

Description

Image recognition method, device, electronic equipment and readable storage medium
Technical Field
The embodiments of the application relate to the technical field of image processing, and in particular to an image recognition method and apparatus, an electronic device, and a readable storage medium.
Background
In the field of image processing technology, image recognition technology for recognizing a target object in an image is important, and its range of application is becoming ever wider. For example, by performing image recognition on a landscape image, animals in the landscape image can be recognized.
In the related art, the image data acquired by an electronic device may be encoded data obtained by encoding an image. In this case, the electronic device needs to decode the encoded data to obtain the image, and then performs image recognition on the image. Since decoding the encoded data of an image consumes a large amount of computing resources, image recognition efficiency is reduced.
Disclosure of Invention
The embodiments of the application provide an image recognition method and apparatus, an electronic device, and a readable storage medium, which can be used to solve the problem of low image recognition efficiency caused by the large amount of computing resources consumed by decoding processing.
In one aspect, an embodiment of the present application provides an image recognition method, where the method includes:
acquiring encoded data of a target image, wherein the encoded data of the target image comprises encoded data of a plurality of macroblocks in the target image, the amount of information consumed when any macroblock of the plurality of macroblocks is encoded is proportional to the texture complexity of that macroblock, and the texture complexity of any macroblock is related to the pixel value of each pixel in that macroblock;
acquiring a contour image corresponding to the target image based on the encoded data of each macroblock, wherein the contour image is used for reflecting the contour of an object in the target image;
and performing image recognition processing on the contour image to obtain an image recognition result.
In another aspect, an embodiment of the present application provides an image recognition apparatus, including:
an acquisition module, configured to acquire encoded data of a target image, wherein the encoded data of the target image comprises encoded data of a plurality of macroblocks in the target image, the amount of information consumed when any macroblock of the plurality of macroblocks is encoded is proportional to the texture complexity of that macroblock, and the texture complexity of any macroblock is related to the pixel value of each pixel in that macroblock;
the acquisition module is further configured to acquire a contour image corresponding to the target image based on the encoded data of each macroblock, wherein the contour image is used for reflecting the contour of an object in the target image;
and an image recognition module, configured to perform image recognition processing on the contour image to obtain an image recognition result.
In one possible implementation, the acquisition module is configured to perform statistical processing on the encoded data of each macroblock to obtain the amount of information consumed by each macroblock; determine a maximum first information amount from the amounts of information consumed by the respective macroblocks; and determine a contour image corresponding to the target image based on the amount of information consumed by each macroblock and the first information amount.
In one possible implementation, the acquisition module is configured to, for any macroblock, perform statistical processing on at least one coding component contained in the encoded data of that macroblock to obtain the amount of information consumed by each coding component corresponding to that macroblock, where any coding component is any one of a macroblock type, a macroblock prediction, a coded block pattern, a quantization parameter offset, and a residual; and determine the sum of the amounts of information consumed by the coding components corresponding to that macroblock as the amount of information consumed by that macroblock.
In one possible implementation, the acquisition module is configured to determine, for any macroblock, the ratio between the amount of information consumed by that macroblock and the first information amount; and map the ratio corresponding to each macroblock to a gray value range to obtain the contour image corresponding to the target image.
In one possible implementation, the acquisition module is configured to divide any macroblock into a plurality of sub-macroblocks; perform statistical processing on the encoded data of that macroblock to obtain the amount of information consumed by each sub-macroblock of that macroblock; determine a maximum second information amount from the amounts of information consumed by the sub-macroblocks of the plurality of macroblocks; and determine a contour image corresponding to the target image based on the amounts of information consumed by the sub-macroblocks of the plurality of macroblocks and the second information amount.
In one possible implementation, the acquisition module is configured to perform statistical processing on at least one coding component contained in the encoded data of any macroblock to obtain the amount of information consumed by each coding component corresponding to each sub-macroblock of that macroblock, where any coding component is any one of a macroblock type, a macroblock prediction, a coded block pattern, a quantization parameter offset, and a residual; and determine, for any sub-macroblock, the sum of the amounts of information consumed by the coding components corresponding to that sub-macroblock as the amount of information consumed by that sub-macroblock.
In one possible implementation, the acquisition module is configured to, for any coding component contained in the encoded data of any macroblock, in response to that coding component being any one of the macroblock type, the macroblock prediction, the coded block pattern, and the quantization parameter offset, count the amount of information consumed by that coding component to obtain the amount of information consumed by that coding component corresponding to each sub-macroblock of the macroblock.
In one possible implementation, the acquisition module is configured to, for any coding component contained in the encoded data of any macroblock: in response to that coding component being the residual and the number of residual coefficient matrices contained in the residual being not less than the number of sub-macroblocks of the macroblock, determine the residual coefficient matrix corresponding to any sub-macroblock of the macroblock from the residual coefficient matrices contained in the residual, and count the amount of information consumed by the residual coefficient matrix corresponding to that sub-macroblock; and in response to that coding component being the residual and the number of residual coefficient matrices contained in the residual being less than the number of sub-macroblocks of the macroblock, determine the amount of information consumed by each sub-macroblock corresponding to any residual coefficient matrix based on the number of sub-macroblocks corresponding to that residual coefficient matrix and the amount of information consumed by that residual coefficient matrix.
In one possible implementation, the acquisition module is configured to determine, for any sub-macroblock, the ratio between the amount of information consumed by that sub-macroblock and the second information amount; and map the ratio corresponding to each sub-macroblock of the plurality of macroblocks to a gray value range to obtain the contour image corresponding to the target image.
In one possible implementation, the apparatus further includes:
and a processing module, configured to, in response to the image recognition result indicating that the contour image contains sensitive content, filter out the encoded data of the target image, or decode the encoded data of the target image to obtain the target image and mask the sensitive content of the target image.
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor, so that the electronic device implements any one of the image recognition methods described above.
In another aspect, there is provided a computer readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause a computer to implement any of the above-described image recognition methods.
In another aspect, there is also provided a computer program or computer program product having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to cause the computer to implement any of the image recognition methods described above.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
the technical scheme provided by the embodiment of the application is that the contour image corresponding to the target image is obtained based on the coded data of each macro block included in the coded data of the target image, so that the contour image is identified. Because the target image is obtained without decoding the coded data of the target image, a large amount of computing resources can be saved, and the image recognition efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an implementation environment of an image recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 3 is a schematic illustration of a contour image provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of encoded data of a macroblock according to an embodiment of the present application;
FIG. 5 is a schematic diagram of video processing according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image recognition process provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image recognition device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings. First, terms related to the embodiments of the present application will be explained and explained.
Pixel domain: the state of the video pixel data.
Compressed domain: the state of the video compressed data obtained after the video pixel data is processed by coding (such as H.264/AVC coding). The encoding process is a process of converting video from a pixel domain to a compression domain, whereas the decoding process is a process of restoring video from the compression domain to the pixel domain.
Contour image: also referred to as an edge map; a grayscale or binary image formed by outlining the edges of objects in an image.
Fig. 1 is a schematic diagram of an implementation environment of an image recognition method according to an embodiment of the present application, and as shown in fig. 1, the implementation environment includes a terminal device 101 and a server 102. The image recognition method in the embodiment of the present application may be performed by the terminal device 101, by the server 102, or by both the terminal device 101 and the server 102.
The terminal device 101 may be a smart phone, a game console, a desktop computer, a tablet computer, a laptop computer, a smart television, a smart car device, a smart voice interaction device, a smart home appliance, etc. The server 102 may be one server, or a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application. The server 102 may be in communication connection with the terminal device 101 via a wired network or a wireless network. The server 102 may have functions of data processing, data storage, data transceiving, and the like, which are not limited in the embodiments of the present application. The number of terminal devices 101 and servers 102 is not limited, and may be one or more.
The image recognition method of the embodiment of the application can be realized based on cloud technology. Cloud Technology (Cloud Technology) refers to a hosting Technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology (Cloud Technology) is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites and other portal websites. With the rapid development and application of the internet industry, each article may have its own identification mark in the future, and these marks need to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong backing by systems, which can only be realized through cloud computing.
Based on the above implementation environment, the embodiment of the present application provides an image recognition method, taking the flowchart of the image recognition method provided in the embodiment of the present application as shown in fig. 2 as an example, where the method may be performed by the terminal device 101 or the server 102 in fig. 1, or may be performed by the terminal device 101 and the server 102 together. For convenience of description, the terminal device 101 or the server 102 that performs the image recognition method in the embodiment of the present application will be referred to as an electronic device, and the method may be performed by the electronic device. As shown in fig. 2, the method includes steps 201 to 203.
In step 201, encoded data of a target image is obtained, where the encoded data of the target image includes encoded data of a plurality of macroblocks in the target image, the amount of information consumed when any macroblock of the plurality of macroblocks is encoded is proportional to the texture complexity of that macroblock, and the texture complexity of any macroblock is related to the pixel value of each pixel in that macroblock.
The embodiment of the application does not limit the acquisition mode of the coded data of the target image. Illustratively, the target image is encoded based on h.264/advanced video coding (Advanced Video Coding, AVC), where h.264 is a digital video compression format, to obtain encoded data of the target image. The target image is converted from the pixel domain to the compression domain by encoding the target image, so that the data redundancy of the target image is reduced.
Optionally, the target image may be any frame image in a video, and the video may be a live video recorded in real time or a pre-recorded video. For live video, a camera of a first terminal device (such as the terminal device of an anchor) captures video in real time; the video captured in real time is encoded to obtain a video code stream, and the video code stream is sent to the electronic device. The code stream may also be referred to as a bitstream. The electronic device extracts the code stream of any frame image from the video code stream, and the code stream of that frame image is the encoded data of the target image.
The target image includes a plurality of macro blocks (macroblocks), one macro block being composed of one luminance pixel block and two chrominance pixel blocks. Thus, the encoded data of the target image includes encoded data of a plurality of macro blocks. For live video, the encoded data for any macroblock may also be referred to as the bit stream for any macroblock, etc.
In the embodiment of the application, the target image is subjected to coding processing, for example, the target image is subjected to coding processing based on H.264/AVC, so that the coded data of the target image can be obtained. In general, when encoding a target image, if the texture of a macroblock is relatively complex, the amount of information consumed in encoding the macroblock is relatively large, and if the texture of the macroblock is relatively simple, the amount of information consumed in encoding the macroblock is relatively small. That is, the amount of information consumed in encoding a macroblock is proportional to the texture complexity of the macroblock, which is related to the pixel value of each pixel in the macroblock. That is, the amount of information consumed in encoding a macroblock can be obtained based on the encoded data of the macroblock.
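As a minimal sketch of this observation (Python; `parse_macroblock_records` is a hypothetical helper standing in for a real H.264 bitstream parser, which the source does not name), the per-macroblock bit cost can be read directly off the encoded data, with no pixel-domain decoding:

```python
# Hedged sketch: `parse_macroblock_records` is assumed to walk the entropy-coded
# bitstream of one frame and yield (row, col, bits_consumed) per macroblock
# without reconstructing any pixels.

def macroblock_bit_costs(frame_bitstream, parse_macroblock_records):
    """Map (row, col) -> bits consumed when that macroblock was encoded."""
    costs = {}
    for row, col, bits_consumed in parse_macroblock_records(frame_bitstream):
        # Texture-rich macroblocks consume more bits; flat ones consume fewer.
        costs[(row, col)] = bits_consumed
    return costs
```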
Step 202, acquiring a contour image corresponding to the target image based on the coding data of each macro block, wherein the contour image is used for reflecting the contour of an object in the target image.
As mentioned above, the amount of information consumed in encoding a macroblock can be obtained based on the encoded data of the macroblock, and the texture complexity of the macroblock is related to the pixel value of each pixel in the macroblock. Therefore, the pixel information of each macroblock can be derived from the encoded data of each macroblock, and thus a contour image corresponding to the target image can be obtained; this contour image can reflect the appearance of the object in the target image. In the embodiments of the application, the contour of an object refers to the outline of the object, and "contour" and "outline" have the same meaning.
Referring to fig. 3, fig. 3 is a schematic diagram of contour images according to an embodiment of the present application. In fig. 3, (a) is the contour image corresponding to an I frame (I Frame). An I frame, which may also be referred to as an intra picture (Intra Picture), is usually the first frame; it is moderately compressed and serves as a reference point for random access. (b) is the contour image corresponding to a P frame (P Frame), which is predicted from a previous I frame or P frame. (c) is the contour image corresponding to a B frame (B Frame), which is predicted from adjacent previous and subsequent frames. The content body can be seen in each of (a), (b) and (c), so the contour image can reflect the appearance of objects in the image.
The method and the device can acquire the outline image corresponding to the target image based on the coded data of each macro block in two modes. See implementation A1 and implementation A2 below for details.
The implementation A1, which obtains a contour image corresponding to a target image based on encoded data of each macroblock, includes: carrying out statistical processing on the coded data of each macro block to obtain the information quantity consumed by each macro block; determining a maximum first information amount from the information amounts consumed by the respective macro blocks; a contour image corresponding to the target image is determined based on the amount of information consumed by each macro block and the first amount of information.
In this embodiment, the encoded data of any one macroblock is statistically processed to obtain the amount of information consumed by the macroblock, where the amount of information consumed by the macroblock includes, but is not limited to, the number of bits consumed by the macroblock. The number of bits consumed by a macroblock may also be referred to as the bit overhead of the macroblock. The information amount consumed by the macro block refers to the information amount consumed when the macro block is coded, and the information amount consumed by the macro block can be used to represent the information entropy of the macro block. The coded data of any one macroblock includes at least one coding component, and the coded information of the macroblock is recorded by the at least one coding component.
In one possible implementation, the statistical processing is performed on the encoded data of each macroblock to obtain the amount of information consumed by each macroblock, including: for any one of the macro blocks, performing statistical processing on at least one coding component contained in the coded data of any one of the macro blocks to obtain the information amount consumed by each coding component corresponding to any one of the macro blocks, wherein any one of the coding components is any one of the macro block type, the macro block prediction, the coded block mode, the quantization parameter offset and the residual error; the sum of the information amounts consumed by the respective coding components corresponding to any one of the macro blocks is determined as the information amount consumed by any one of the macro blocks.
In the embodiment of the present application, any macroblock corresponds to one macroblock type, which may be a Skip type, a Direct type, a pulse code modulation (Pulse Code Modulation, PCM) type, or another type; the other types may be collectively referred to as non-Skip, non-Direct, non-PCM types. When the macroblock types of macroblocks differ, the coding components included in their encoded data may also differ.
Referring to fig. 4, fig. 4 is a schematic diagram of encoded data of a macroblock according to an embodiment of the present application. Fig. 4 illustrates respective coding components included in coded data of a macroblock when the macroblock type of the macroblock is a non-Skip type, a non-Direct type, and a non-PCM type. The coded data of a Macroblock includes, but is not limited to, coded components such as Macroblock Type (Macroblock Type, mb_type), macroblock prediction (Macroblock Prediction, mb_pred), coded block pattern (Coded Block Pattern, CBP), quantization parameter offset (Quantization Parameter Offset, qp_off), and Residual (Residual).
The macroblock type is used to record the coding type of the macroblock, which includes prediction mode, partition size, inter reference direction, etc. The prediction mode comprises intra prediction and inter prediction. The predicted value and the actual value of the intra-frame prediction are both positioned in the current frame image, and the intra-frame prediction mainly eliminates the spatial redundancy of the image. The compression rate of the intra-frame prediction is relatively low, the intra-frame prediction can be independently decoded, the data of other frame images except the current frame image are not relied on, and the intra-frame prediction can be adopted for the key frame image in the video. The actual value of the inter-frame prediction is located in the current frame image, the predicted value is located in the reference frame image, and the inter-frame prediction mainly eliminates the time redundancy of the image. The compression rate of inter-frame prediction is higher than that of intra-frame prediction, but it cannot be independently decoded, and it is necessary to reconstruct the current frame image after the data of the reference frame image is acquired. The partition size is used to describe size information of a macroblock, and the partition size of a macroblock includes 16×16, 16×8, 8×16, 8×8, 4×4, and the like. The inter-frame reference direction is a direction of a reference frame image with respect to a current frame image, and includes forward, backward, and bi-directional directions. The forward direction is the play order relative to the video, with the reference frame picture preceding the current frame picture. The backward direction is the play order relative to the video, with the reference frame image being located after the current frame image. Bi-directional is the order of play relative to video, with reference frame pictures before and after the current frame picture.
Macroblock prediction contains information related to macroblock prediction. The macro blocks are divided into I macro blocks, P macro blocks and B macro blocks. The I macroblock contains intra prediction modes (Intra Prediction Mode, IPM), that is, the I macroblock is intra predicted and the P macroblock and the B macroblock are inter predicted. The partition size of the P macroblock and the partition size of the B macroblock each include 16×16, 16×8, 8×16, and 8×8. Among them, for P macro blocks or B macro blocks of partition sizes 16×16, 16×8, 8×16, reference frame Index (ref_idx) and motion vector difference value (Motion Vector Difference, MVD) are included. For a P Macroblock or B Macroblock of partition size 8×8, it includes a Sub-Macroblock Type (sub_mb_type), a reference frame index, and a motion vector difference value.
The coded block pattern represents a residual coding scheme of a macroblock, and records whether a luminance residual coefficient matrix contains non-zero values or not by using 4 bits, and records whether a chrominance residual coefficient matrix contains non-zero values or not by using 2 bits.
The quantization parameter offset represents an offset of a quantization parameter value of a macroblock with respect to a quantization parameter value of a current frame image. Any macroblock may have a quantization parameter value that is used to quantize the residual coefficient of the macroblock to obtain a quantized residual coefficient, where the residual coefficient of the macroblock can reflect the accuracy of the macroblock prediction value. The current frame image also has a quantization parameter value, which is used to quantize the residual coefficient of the current frame image, and the residual coefficient of the current frame image can reflect the accuracy of the predicted value of the current frame image.
The residual comprises the quantized residual coefficients of the macroblock luminance and the quantized residual coefficients of the macroblock chrominance, where the macroblock luminance includes one luminance component Y and the macroblock chrominance includes two chrominance components U and V.
When the macroblock type of the macroblock is Skip type, direct type or PCM type, the encoded data of the macroblock includes at least one encoding component, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the macroblock type included in the encoded data of any macroblock is statistically processed, so that the amount of information consumed by the macroblock type corresponding to the macroblock can be obtained. The amount of information consumed for the macroblock prediction corresponding to any one macroblock can be obtained by performing statistical processing on the macroblock predictions included in the coded data of that macroblock. The amount of information consumed by the coded block pattern corresponding to any macroblock can be obtained by performing statistical processing on the coded block pattern included in the coded data of that macroblock. The quantization parameter offset included in the encoded data of any one macroblock is statistically processed to obtain the amount of information consumed by the quantization parameter offset corresponding to the macroblock. The residual error included in the encoded data of any one macroblock is statistically processed, and the amount of information consumed by the residual error corresponding to the macroblock can be obtained.
Next, the sum of the information amount consumed by at least one of the macroblock type corresponding to any one of the macroblocks, the information amount consumed by the macroblock prediction corresponding to the macroblock, the information amount consumed by the coded block pattern corresponding to the macroblock, the information amount consumed by the quantization parameter offset corresponding to the macroblock, and the information amount consumed by the residual corresponding to the macroblock is calculated, thereby obtaining the information amount consumed by any one of the macroblocks. In this way, the amount of information each consumed by all the macroblocks in the target image can be determined.
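The summation in this step can be sketched as follows (a minimal Python illustration; the per-component bit counts are assumed to have already been tallied by the statistical processing described above, and the dict keys follow the patent's component names):

```python
# Hedged sketch: component bit counts are assumed inputs produced by the
# statistical processing of the encoded data described above.

def macroblock_info_amount(component_bits: dict) -> int:
    """Sum the bits consumed by each coding component of one macroblock.

    Example input: {"mb_type": 5, "mb_pred": 42, "cbp": 4,
                    "qp_off": 1, "residual": 310}.
    Components absent from the encoded data (e.g. for Skip-type
    macroblocks) are simply missing from the dict.
    """
    return sum(component_bits.values())

# The first information amount is then the maximum over all macroblocks:
# first_info = max(macroblock_info_amount(c) for c in all_component_bits)
```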
Next, the maximum information amount is determined from the information amounts consumed by all the macro blocks in the target image, and the maximum information amount is referred to as the first information amount. And determining a contour image corresponding to the target image based on the information amount and the first information amount consumed by all the macro blocks respectively.
In one possible implementation, determining a contour image corresponding to the target image based on the amount of information consumed by each macroblock and the first amount of information includes: for any one of the macro blocks, determining a ratio between an amount of information consumed by any one of the macro blocks and the first amount of information; and mapping the ratio corresponding to each macro block to a gray value range to obtain a contour image corresponding to the target image.
In the embodiment of the present application, the amount of information consumed by any macroblock is normalized based on that amount of information and the first information amount. The normalization may be performed by calculating the ratio between the amount of information consumed by the macroblock and the first information amount; this ratio is the ratio corresponding to that macroblock, and its value lies in the range [0, 1].
Then, the ratio corresponding to any one of the macro blocks is multiplied by 255 to map the ratio corresponding to any one of the macro blocks to the gradation value range. The mapped ratio corresponding to any one macro block is recorded as a mapped value corresponding to the macro block, and the value range of the mapped value is [0, 255]. After the ratio corresponding to each macro block in the target image is mapped to the gray value range, the contour image corresponding to the target image can be obtained, and the contour image is a gray image.
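Putting implementation A1 together, a minimal NumPy sketch is shown below; it assumes the per-macroblock bit counts have been arranged in an M×N array and that each macroblock contributes one pixel of the contour image (the source does not fix the output resolution explicitly):

```python
import numpy as np

def contour_image_a1(mb_bits: np.ndarray) -> np.ndarray:
    """Map per-macroblock bit counts (shape M x N) to a gray contour image.

    Each count is divided by the maximum count (the "first information
    amount") to get a ratio in [0, 1], then multiplied by 255.
    """
    first_info = mb_bits.max()                       # first information amount
    ratios = mb_bits / first_info                    # normalized to [0, 1]
    return np.round(ratios * 255).astype(np.uint8)   # grayscale contour image
```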
Through the implementation mode A1, the information amount consumed by each macro block is obtained through statistics, and the contour image corresponding to the target image is determined based on the information amount consumed by each macro block, wherein the contour image can accurately reflect the content main body of the target image. In order to better reflect the content body of the target image and improve the definition of the contour image, the implementation mode A2 may be adopted to determine the contour image corresponding to the target image.
The implementation A2, which obtains a contour image corresponding to a target image based on encoded data of each macroblock, includes: for any one macroblock, dividing any one macroblock into a plurality of sub-macroblocks; carrying out statistical processing on the coded data of any macro block to obtain the information amount consumed by each sub macro block of any macro block; determining a maximum second information amount from the information amounts consumed by the respective sub-macro blocks of the plurality of macro blocks; a contour image corresponding to the target image is determined based on the amount of information consumed by each sub-macroblock of the plurality of macroblocks and the second amount of information.
For any macroblock in the target image, suppose the macroblock is located in the m-th row and n-th column of the target image. The macroblock can then be denoted $MB_{(m,n)}$, where $0 \le m \le M-1$ and $0 \le n \le N-1$, M is the number of macroblocks contained in the width direction of the target image, and N is the number of macroblocks contained in the length direction of the target image. That is, the target image includes M rows and N columns of macroblocks, denoted row 0 to row M-1 and column 0 to column N-1, respectively.
Any macroblock may be divided into a plurality of sub-macroblocks, and the embodiment of the application does not limit the number of sub-macroblocks. For example, any macroblock may be divided into 16 sub-macroblocks of partition size 4×4; the macroblock then includes 4 rows and 4 columns of sub-macroblocks, denoted row 0 to row 3 and column 0 to column 3, respectively, and the sub-macroblock in the a-th row and b-th column of macroblock $MB_{(m,n)}$ can be denoted $SubMB_{(m,n)}^{(a,b)}$.
Next, the encoded data of any one macroblock is statistically processed to obtain the amount of information consumed by each sub-macroblock of the macroblock, where the amount of information consumed by any one sub-macroblock includes, but is not limited to, the number of bits consumed by the sub-macroblock, which may also be referred to as the bit overhead of the sub-macroblock. The information amount consumed by the sub-macro block refers to the information amount consumed when the sub-macro block is coded, and the information amount consumed by the sub-macro block can be used to represent the information entropy of the sub-macro block.
In one possible implementation, the statistical processing is performed on the encoded data of any one macroblock to obtain the information amount consumed by each sub-macroblock of any one macroblock, including: for any one of the macro blocks, performing statistical processing on at least one coding component contained in the coded data of any one of the macro blocks to obtain the information amount consumed by each coding component corresponding to each sub-macro block of any one of the macro blocks, wherein any one of the coding components is any one of the macro block type, the macro block prediction, the coded block mode, the quantization parameter offset and the residual error; for any one of the sub-macro blocks, the sum of the amounts of information consumed by the respective encoding components corresponding to any one of the sub-macro blocks is determined as the amount of information consumed by any one of the sub-macro blocks.
It should be noted that, the foregoing has already described the coding component included in the coded data of any macroblock, and therefore, in the embodiment of the present application, the coding component is not described in detail.
In this embodiment of the present application, for any one of the coding components in the macroblock type, the macroblock prediction, the coded block pattern, the quantization parameter offset, and the residual included in the coded data of any one macroblock, statistical processing is performed on the coding component to obtain the information amount consumed by the coding component corresponding to each sub-macroblock of the macroblock, and a detailed description of the statistical processing method will be given below.
In one possible implementation manner, performing statistical processing on at least one coding component included in the coded data of any one macroblock to obtain an information amount consumed by each coding component corresponding to each sub-macroblock of any one macroblock, including: for any one of the coding components included in the coded data of any one of the macro blocks, in response to any one of the macro block type, the macro block prediction, the coded block pattern, and the quantization parameter offset, the amount of information consumed by any one of the coding components is counted, and the amount of information consumed by any one of the coding components corresponding to each of the sub-macro blocks of any one of the macro blocks is obtained.
The four coding components (macroblock type, macroblock prediction, coded block pattern, and quantization parameter offset) belong to the macroblock rather than to its sub-macroblocks. That is, these four coding components are information shared by all sub-macroblocks of the macroblock. Therefore, any one of these four coding components contained in the encoded data of a macroblock corresponds to every sub-macroblock of that macroblock.
By performing statistical processing on the macroblock type contained in the encoded data of any macroblock, the amount of information $bit\{mb\_type_{(m,n)}\}$ consumed by the macroblock type corresponding to that macroblock can be obtained. For any sub-macroblock of the macroblock, $bit\{mb\_type_{(m,n)}\}$ is taken as the amount of information consumed by the macroblock type corresponding to that sub-macroblock.
By performing statistical processing on the macroblock prediction contained in the encoded data of any macroblock, the amount of information $bit\{mb\_pred_{(m,n)}\}$ consumed by the macroblock prediction corresponding to that macroblock can be obtained. For any sub-macroblock of the macroblock, $bit\{mb\_pred_{(m,n)}\}$ is taken as the amount of information consumed by the macroblock prediction corresponding to that sub-macroblock.
By performing statistical processing on the coded block pattern contained in the encoded data of any macroblock, the amount of information $bit\{CBP_{(m,n)}\}$ consumed by the coded block pattern corresponding to that macroblock can be obtained. For any sub-macroblock of the macroblock, $bit\{CBP_{(m,n)}\}$ is taken as the amount of information consumed by the coded block pattern corresponding to that sub-macroblock.
By performing statistical processing on the quantization parameter offset contained in the encoded data of any macroblock, the amount of information $bit\{QP\_off_{(m,n)}\}$ consumed by the quantization parameter offset corresponding to that macroblock can be obtained. For any sub-macroblock of the macroblock, $bit\{QP\_off_{(m,n)}\}$ is taken as the amount of information consumed by the quantization parameter offset corresponding to that sub-macroblock.
In one possible implementation manner, performing statistical processing on at least one coding component included in the coded data of any one macroblock to obtain an information amount consumed by each coding component corresponding to each sub-macroblock of any one macroblock, including: for any one of the encoded components included in the encoded data of any one of the macroblocks, in response to any one of the encoded components being a residual, and the number of residual coefficient matrices included in the residual being not less than the number of sub-macroblocks of any one of the macroblocks, determining a residual coefficient matrix corresponding to any one of the sub-macroblocks of any one of the macroblocks from among the residual coefficient matrices included in the residual, and counting an amount of information consumed by the residual coefficient matrix corresponding to any one of the sub-macroblocks; in response to any one of the encoding components being a residual and the residual including a number of residual coefficient matrices that is less than a number of sub-macroblocks of any one of the macroblocks, an amount of information consumed by each of the sub-macroblocks corresponding to any one of the residual coefficient matrices is determined based on the number of sub-macroblocks corresponding to any one of the residual coefficient matrices and the amount of information consumed by any one of the residual coefficient matrices.
The residual includes the quantized residual coefficients of the macroblock luminance and the quantized residual coefficients of the macroblock chrominance. Since the quantized residual coefficients of luminance may differ between different sub-macroblocks of one macroblock, and the quantized residual coefficients of chrominance may also differ, the residual contained in the encoded data of any sub-macroblock of a macroblock needs to be determined based on the residual contained in the encoded data of that macroblock. For convenience of description, the residual contained in the encoded data of any sub-macroblock is introduced from the viewpoint of one macroblock.
The residual included in the encoded data of one macroblock includes a plurality of residual coefficient matrices, and at the same time, the macroblock is divided into a plurality of sub-macroblocks, and therefore, the residual included in the encoded data of any one sub-macroblock can be determined based on the relationship between the number of residual coefficient matrices and the number of sub-macroblocks.
When the number of residual coefficient matrices is not less than the number of sub-macroblocks, one sub-macroblock corresponds to at least one residual coefficient matrix. In this case, the residual coefficient matrix corresponding to any sub-macroblock is determined from the plurality of residual coefficient matrices; that residual coefficient matrix is the residual contained in the encoded data of the sub-macroblock. Then, statistical processing is performed on the residual coefficient matrix corresponding to the sub-macroblock to obtain the amount of information consumed by that residual coefficient matrix.
For example, for macroblock $MB_{(m,n)}$, 16 residual coefficient matrices are obtained using a discrete cosine transform of size 4×4; any one of these residual coefficient matrices is of size 4×4 and can be denoted $Res^{4\times4}_{(m,n)}$. Macroblock $MB_{(m,n)}$ is divided into 16 sub-macroblocks of partition size 4×4, where each sub-macroblock is aligned with one residual coefficient matrix of size 4×4, i.e., one sub-macroblock corresponds to one residual coefficient matrix. Therefore, any residual coefficient matrix is the residual coefficient matrix corresponding to one sub-macroblock, and performing statistical processing on it yields the amount of information $bit\{Res^{4\times4}_{(m,n)}\}$ consumed by the residual coefficient matrix corresponding to that sub-macroblock.
When the number of residual coefficient matrices is less than the number of sub-macroblocks, one residual coefficient matrix corresponds to at least one sub-macroblock; that is, one residual coefficient matrix is the residual contained in the encoded data of at least one sub-macroblock. For any sub-macroblock corresponding to a residual coefficient matrix, statistical processing can be performed on that residual coefficient matrix to obtain the amount of information it consumes, and the amount of information consumed by the residual coefficient matrix corresponding to the sub-macroblock is then determined in combination with the number of sub-macroblocks corresponding to that residual coefficient matrix.
For example, for macroblock $MB_{(m,n)}$, 4 residual coefficient matrices are obtained using a discrete cosine transform of size 8×8; any one of these residual coefficient matrices is of size 8×8 and can be denoted $Res^{8\times8}_{(m,n)}$. Macroblock $MB_{(m,n)}$ is divided into 16 sub-macroblocks of partition size 4×4, so one residual coefficient matrix of size 8×8 covers four sub-macroblocks. Therefore, any residual coefficient matrix is the residual coefficient matrix corresponding to four sub-macroblocks; performing statistical processing on it yields the amount of information consumed by the residual coefficient matrix corresponding to those four sub-macroblocks, and the amount of information consumed by the residual coefficient matrix corresponding to any one of the sub-macroblocks can be expressed as $\frac{1}{4}\,bit\{Res^{8\times8}_{(m,n)}\}$.
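The two residual cases can be sketched together as follows (Python; the assumptions, namely 16 sub-macroblocks per macroblock and even sharing of an 8×8 matrix's bits among the four sub-macroblocks it covers, follow the two examples above, and the spatial ordering of sub-macroblocks is glossed over for brevity):

```python
def sub_macroblock_residual_bits(matrix_bits: list) -> list:
    """Distribute one macroblock's residual bits over its 16 sub-macroblocks.

    matrix_bits holds the bits consumed by each residual coefficient matrix:
    16 entries for a 4x4 transform (one matrix per sub-macroblock) or
    4 entries for an 8x8 transform (one matrix per four sub-macroblocks).
    """
    n = len(matrix_bits)
    if n >= 16:
        # Not fewer matrices than sub-macroblocks: one matrix each.
        return list(matrix_bits[:16])
    # Fewer matrices than sub-macroblocks: share each matrix's bits evenly
    # among the sub-macroblocks it covers (four each in the 8x8 example).
    covered = 16 // n
    return [bits / covered for bits in matrix_bits for _ in range(covered)]
```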
And then, calculating the sum of the information consumption of the macro block type corresponding to any sub macro block, the information consumption of the macro block prediction corresponding to the sub macro block, the information consumption of the coding block mode corresponding to the sub macro block, the information consumption of the quantization parameter offset corresponding to the sub macro block and the information consumption of the residual coefficient matrix corresponding to the sub macro block to obtain the information consumption of the sub macro block.
Optionally, the amount of information consumed by any sub-macroblock is determined according to the following formula (1) together with formula (2), or according to formula (1) together with formula (3):

$$bit\{SubMB_{(m,n)}^{(a,b)}\} \mathrel{+}= bit\{mb\_type_{(m,n)}\} + bit\{mb\_pred_{(m,n)}\} + bit\{CBP_{(m,n)}\} + bit\{QP\_off_{(m,n)}\} \quad (1)$$

$$bit\{SubMB_{(m,n)}^{(a,b)}\} \mathrel{+}= bit\{Res^{4\times4}_{(m,n)}\} \quad (2)$$

$$bit\{SubMB_{(m,n)}^{(a,b)}\} \mathrel{+}= \tfrac{1}{4}\,bit\{Res^{8\times8}_{(m,n)}\} \quad (3)$$

where $bit\{SubMB_{(m,n)}^{(a,b)}\}$ represents the amount of information consumed by any sub-macroblock, += represents accumulation, $bit\{mb\_type_{(m,n)}\}$ represents the amount of information consumed by the macroblock type corresponding to the sub-macroblock, $bit\{mb\_pred_{(m,n)}\}$ represents the amount of information consumed by the macroblock prediction corresponding to the sub-macroblock, $bit\{CBP_{(m,n)}\}$ represents the amount of information consumed by the coded block pattern corresponding to the sub-macroblock, $bit\{QP\_off_{(m,n)}\}$ represents the amount of information consumed by the quantization parameter offset corresponding to the sub-macroblock, and $bit\{Res^{4\times4}_{(m,n)}\}$ and $\tfrac{1}{4}\,bit\{Res^{8\times8}_{(m,n)}\}$ both represent the amount of information consumed by the residual coefficient matrix corresponding to the sub-macroblock.
In this way, the amount of information consumed by each of the respective sub-macro blocks of all the macro blocks in the target image, that is, the amount of information consumed by each of the sub-macro blocks in the target image can be determined. Next, the maximum information amount is determined from the information amounts consumed by all the sub-macro blocks in the target image, and the maximum information amount is noted as the second information amount. And determining the contour image corresponding to the target image based on the information amount and the second information amount respectively consumed by all the sub-macro blocks in the target image.
In one possible implementation, determining the contour image corresponding to the target image based on the amount of information consumed by each sub-macroblock of the plurality of macroblocks and the second amount of information includes: for any one of the sub-macro blocks, determining a ratio between an amount of information consumed by any one of the sub-macro blocks and a second amount of information; mapping the ratio corresponding to each sub-macro block of the macro blocks to a gray value range to obtain a contour image corresponding to the target image.
In this embodiment, for any sub-macroblock in the target image, the amount of information consumed by the sub-macroblock is normalized based on that amount and the second information amount. The normalization may be performed by calculating the ratio between the amount of information consumed by the sub-macroblock and the second information amount; this ratio is the ratio corresponding to the sub-macroblock, and its value lies in the range [0, 1]. The normalization is performed as shown in formula (4):

$$ratio_{(m,n)}^{(a,b)} = \frac{bit\{SubMB_{(m,n)}^{(a,b)}\}}{\max\limits_{m,n,a,b} bit\{SubMB_{(m,n)}^{(a,b)}\}} \quad (4)$$

where $ratio_{(m,n)}^{(a,b)}$ represents the ratio corresponding to any sub-macroblock in the target image, $bit\{SubMB_{(m,n)}^{(a,b)}\}$ represents the amount of information consumed by that sub-macroblock, and the denominator represents the second information amount.
Then, for any sub-macroblock in the target image, the ratio corresponding to that sub-macroblock is multiplied by 255, i.e., $ratio_{(m,n)}^{(a,b)} \times 255$ is calculated, to map the ratio corresponding to the sub-macroblock to the gray value range. The mapped ratio corresponding to any sub-macroblock is recorded as the mapped value corresponding to that sub-macroblock, and the mapped value lies in the range [0, 255]. After the ratios corresponding to all sub-macroblocks in the target image are mapped to the gray value range, the contour image corresponding to the target image is obtained; the contour image is a grayscale image.
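Formulas (1)-(4) and the ×255 mapping can be sketched together (NumPy; `shared_bits` is the per-macroblock sum from formula (1), and the residual shares come from the previous sketch):

```python
import numpy as np

def sub_mb_info_amounts(shared_bits: float, residual_shares: list) -> np.ndarray:
    """Formulas (1)-(3): total bits per sub-macroblock of one macroblock.

    shared_bits = bit{mb_type} + bit{mb_pred} + bit{CBP} + bit{QP_off};
    it is attributed unchanged to every sub-macroblock, then each
    sub-macroblock's residual share is added on top.
    """
    return shared_bits + np.asarray(residual_shares, dtype=float)

def contour_image_a2(sub_mb_bits: np.ndarray) -> np.ndarray:
    """Formula (4) plus the x255 mapping over all sub-macroblocks (4M x 4N)."""
    second_info = sub_mb_bits.max()          # second information amount
    ratios = sub_mb_bits / second_info       # in [0, 1]
    return np.round(ratios * 255).astype(np.uint8)
```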
Through implementation A2, each macroblock in the target image is divided into a plurality of sub-macroblocks, and the amount of information consumed by each sub-macroblock is counted, so the statistical granularity is finer. Therefore, the clarity and accuracy of the contour image determined based on the amounts of information consumed by all sub-macroblocks in the target image are improved, which in turn can improve the accuracy of the image recognition result in subsequent image recognition processing.
And 203, performing image recognition processing on the contour image to obtain an image recognition result.
In the embodiment of the application, any image recognition algorithm for the pixel domain or any image recognition model for the pixel domain can be used to perform image recognition on the contour image to obtain an image recognition result. The image recognition result is either that the target image contains sensitive content or that it does not. The embodiment of the application does not limit what counts as sensitive content; for example, the sensitive content may be vulgar content.
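Because the contour image is an ordinary grayscale image, any existing pixel-domain recognizer can consume it directly. A hedged sketch follows; the `classifier` object and its `predict` method are assumptions, standing in for whatever pixel-domain algorithm or model is actually used:

```python
import numpy as np

def recognize_contour(contour_image: np.ndarray, classifier) -> str:
    """Run a pixel-domain classifier on the compressed-domain contour image.

    `classifier.predict(image) -> bool` is an assumed interface; True means
    sensitive content was detected in the contour image.
    """
    assert contour_image.dtype == np.uint8   # gray values in [0, 255]
    has_sensitive = classifier.predict(contour_image)
    return "contains sensitive content" if has_sensitive else "no sensitive content"
```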
If the target image does not include sensitive content, the electronic device may decode the encoded data of the target image to obtain the target image, and render the target image to display the target image on the electronic device. Or, for live video, the electronic device may send the encoded data of the target image to a second terminal device (such as a terminal device of a viewer), and the second terminal device decodes the encoded data of the target image to obtain the target image, and renders the target image to display the target image on the second terminal device.
It should be noted that, for live video, the electronic device transmits a video code stream to the second terminal device, where the video code stream includes a code stream of the target image, that is, the video code stream includes encoded data of the target image.
In one possible implementation, after the image recognition processing is performed on the contour image and the image recognition result is obtained, the method further includes: in response to the image recognition result indicating that the contour image contains sensitive content, filtering out the encoded data of the target image, or decoding the encoded data of the target image to obtain the target image and masking the sensitive content of the target image.
Optionally, if the target image includes sensitive content, the electronic device filters out the encoded data of the target image, so as to avoid displaying and propagating the target image. For live video, the electronic device can stop sending the video code stream to the second terminal device, so that the display and propagation of the target image are avoided.
Optionally, if the target image contains sensitive content, the electronic device decodes the encoded data of the target image to obtain the target image, and performs masking processing on the sensitive content of the target image to obscure it, obtaining the masked target image. The electronic device may then render the masked target image to display it on the electronic device. For live video, the electronic device may re-encode the masked target image to obtain its encoded data and send that encoded data to the second terminal device; the second terminal device decodes it to obtain the masked target image and renders the masked target image for display on the second terminal device.
For live video, the video code stream the electronic device receives from the first terminal device (the first video code stream) may be the same as or different from the video code stream the electronic device transmits to the second terminal device (the second video code stream). After receiving a first video code stream containing sensitive content, the electronic device can filter out the encoded data of the target image containing the sensitive content, or mask the sensitive content in the target image, and send the second terminal device a second video code stream free of sensitive content, thereby preventing the display and propagation of the sensitive content and improving the quality of the live video.
In this method, the contour image corresponding to the target image is obtained from the encoded data of the macroblocks included in the encoded data of the target image, and image recognition is performed on the contour image. Because the encoded data of the target image does not need to be decoded to obtain the target image, a large amount of computing resources is saved and image recognition efficiency is improved.
The image recognition method has been described above in terms of method steps; below, it is explained and illustrated in detail in connection with a live streaming scenario. In a live streaming scenario, the electronic device performing steps 201 to 203 is a server.
Referring to fig. 5, fig. 5 is a schematic diagram of video processing according to an embodiment of the present application. In this embodiment, the anchor's terminal device captures video in real time (the video is a live video) and applies preprocessing such as beautification and adding special effects. The video relates to the anchor, such as a game the anchor is playing or a dance video of the anchor. The video captured in real time may be called a video pixel stream; the anchor's terminal device encodes the video pixel stream to obtain a video bitstream, which is the encoded data in the bitstream (also called a code stream) corresponding to the video. The anchor's terminal device sends the video bitstream to the server. The video bitstream comprises the encoded data of multiple frame images, and the encoded data of any frame image may be the encoded data of the target image mentioned above.
The server receives the video bitstream and extracts the bitstream of a frame image from it. It then constructs the contour image corresponding to the frame image from that bitstream and performs image recognition processing on the contour image to obtain an image recognition result. When the image recognition result is that the frame image contains no sensitive content, the server transmits the video bitstream to the viewers' terminal devices. When the image recognition result is that the frame image contains sensitive content, the server stops receiving the video bitstream sent by the anchor's terminal device and stops sending the video bitstream to the viewers' terminal devices. Optionally, the server may perform image recognition processing on every frame image in the video bitstream and, once the recognition result of a frame image is that it contains sensitive content, stop receiving the video bitstream from the anchor's terminal device and stop sending it to the viewers' terminal devices. This gating logic is sketched after this paragraph.
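A sketch of the server-side gating just described; all callables are injected, so nothing here commits to a concrete codec, model, or transport (they are assumptions of the sketch, not part of the embodiment):

def moderate_stream(frame_bitstreams, build_contour, recognize, forward, stop):
    # Per-frame gating: construct the contour image from each frame's
    # bitstream, classify it, and either relay the frame to viewers or
    # halt the stream once sensitive content is detected.
    for bitstream in frame_bitstreams:
        contour = build_contour(bitstream)
        if recognize(contour) == "sensitive":
            stop()              # stop receiving from the anchor and sending to viewers
            return
        forward(bitstream)      # clean frame: relay it to the audience

Injecting the callables keeps the sketch transport-agnostic; in the terms of fig. 5, forward corresponds to relaying the video bitstream to the viewers' terminal devices.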
After receiving the video bitstream sent by the server, the viewer's terminal device decodes it to obtain a video pixel stream and then renders the video pixel stream so that the video is displayed on the viewer's terminal device for the viewer to watch.
Next, referring to fig. 6, fig. 6 is a schematic diagram of an image recognition process according to an embodiment of the present application. The image recognition process shown in fig. 6 expands the part of fig. 5 in which the server extracts the bitstream of a frame image from the video bitstream, constructs the contour image corresponding to the frame image from that bitstream, and performs image recognition processing on the contour image.
In this embodiment, the server may receive the video bitstream, extract the bitstream of a frame image from it, and extract the bitstream of each macroblock from the bitstream of the frame image. The bitstream of the frame image corresponds to the encoded data of the target image mentioned above, and the bitstream of a macroblock corresponds to the encoded data of a macroblock. Each macroblock is divided into sub-macroblocks; since the bitstream of a macroblock comprises at least one coding component, each coding component is statistically processed to obtain the number of bits consumed by each sub-macroblock, where the number of bits consumed by a sub-macroblock corresponds to the amount of information consumed by that sub-macroblock. The number of bits consumed by each sub-macroblock is then normalized to obtain a ratio for each sub-macroblock, and the ratios are mapped to the gray value range to obtain the contour image corresponding to the frame image. Image recognition processing is then performed on the contour image.
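Given the per-sub-macroblock bit counts arranged in a grid, the normalization and gray mapping steps reduce to a few lines. The toy grid below is an assumed example; the real counts come from statistically processing the coding components of the bitstream.

import numpy as np

def contour_from_bits(bits: np.ndarray) -> np.ndarray:
    # bits[i, j]: number of bits consumed by the sub-macroblock at grid
    # position (i, j), summed over its coding components. Normalize by
    # the maximum, then map the ratio in [0, 1] to the gray range [0, 255].
    peak = bits.max()
    if peak == 0:                                   # degenerate all-zero frame
        return np.zeros(bits.shape, dtype=np.uint8)
    ratio = bits.astype(np.float64) / peak
    return np.round(ratio * 255.0).astype(np.uint8)

bits = np.array([[12, 80, 75],                      # assumed toy bit counts
                 [9, 300, 40]])
print(contour_from_bits(bits))                      # brightest pixel marks the richest texture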
It should be noted that the anchor's terminal device encodes the live video with H.264/AVC to obtain the video bitstream, and the viewer's terminal device likewise decodes the video bitstream with H.264/AVC to obtain the live video. H.264/AVC is a widely and uniformly adopted video codec protocol. Therefore, on the anchor side, the method applies seamlessly to the streaming scenarios of all live broadcast architectures without extra adaptation work. Meanwhile, on the cloud server, the contour image constructed from the video bitstream is a pixel-domain image, so all pixel-domain image recognition algorithms and image recognition models can be applied to it.
When H.264/AVC is used to decode the video bitstream, the decoding process can be divided into three operations: prediction and compensation, inverse transform and inverse quantization, and entropy decoding. In this embodiment, for frame images of different frame types, the time required by each operation was measured separately, and the time proportion of each operation was calculated from these measurements, yielding Table 1 below.
TABLE 1: time proportion of each decoding operation (prediction and compensation; inverse transform and inverse quantization; entropy decoding) for frame images of different frame types (table body not reproduced here)
As can be seen from Table 1, when H.264/AVC is used to decode the video bitstream, the time proportion of prediction and compensation is much larger than that of inverse transform and inverse quantization, and also much larger than that of entropy decoding. Entropy decoding corresponds to the server in this embodiment constructing the contour image corresponding to the frame image from the bitstream of the frame image.
That is, in this embodiment the server only needs to perform entropy decoding to obtain the contour image, skipping prediction and compensation as well as inverse transform and inverse quantization. For I frames the time overhead is reduced by 72.07%, for P frames by 80.29%, and for B frames by 85.81%. The image recognition method therefore saves a large amount of computing resources and improves image recognition efficiency.
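Assuming the three operations account for all of the measured decoding time, the entropy-decoding share that remains per frame type is simply the complement of the stated reduction, as the short computation below shows:

savings = {"I": 72.07, "P": 80.29, "B": 85.81}  # stated time-overhead reductions (%)
for frame_type, saved in savings.items():
    # Remaining share = time spent on entropy decoding alone.
    print(f"{frame_type}-frame: entropy decoding ~ {100.0 - saved:.2f}% of decoding time")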
Fig. 7 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus includes:
an obtaining module 701, configured to obtain encoded data of a target image, where the encoded data of the target image includes encoded data of a plurality of macroblocks in the target image, the amount of information consumed when any one of the plurality of macroblocks is encoded is proportional to the texture complexity of that macroblock, and the texture complexity of any macroblock is related to the pixel value of each pixel in that macroblock;
the obtaining module 701 is further configured to obtain a contour image corresponding to the target image based on the encoded data of each macroblock, where the contour image reflects the contour of an object in the target image;
the image recognition module 702 is configured to perform image recognition processing on the contour image, so as to obtain an image recognition result.
In a possible implementation manner, the obtaining module 701 is configured to perform statistical processing on the encoded data of each macroblock, so as to obtain an information amount consumed by each macroblock; determining a maximum first information amount from the information amounts consumed by the respective macro blocks; a contour image corresponding to the target image is determined based on the amount of information consumed by each macro block and the first amount of information.
In a possible implementation manner, the obtaining module 701 is configured to perform statistical processing on at least one coding component included in the encoded data of any macroblock to obtain the amount of information consumed by each coding component corresponding to that macroblock, where each coding component is any one of a macroblock type, a macroblock prediction, a coded block pattern, a quantization parameter offset, and a residual; and to determine the sum of the amounts of information consumed by the coding components corresponding to a macroblock as the amount of information consumed by that macroblock.
In one possible implementation, the obtaining module 701 is configured to determine, for any macroblock, a ratio between an amount of information consumed by any macroblock and the first amount of information; and mapping the ratio corresponding to each macro block to a gray value range to obtain a contour image corresponding to the target image.
In one possible implementation, the obtaining module 701 is configured to divide, for any macroblock, any macroblock into a plurality of sub-macroblocks; carrying out statistical processing on the coded data of any macro block to obtain the information amount consumed by each sub macro block of any macro block; determining a maximum second information amount from the information amounts consumed by the respective sub-macro blocks of the plurality of macro blocks; a contour image corresponding to the target image is determined based on the amount of information consumed by each sub-macroblock of the plurality of macroblocks and the second amount of information.
In a possible implementation manner, the obtaining module 701 is configured to perform statistical processing on at least one coding component included in the encoded data of any macroblock to obtain the amount of information consumed by each coding component corresponding to each sub-macroblock of that macroblock, where each coding component is any one of a macroblock type, a macroblock prediction, a coded block pattern, a quantization parameter offset, and a residual; and, for any sub-macroblock, to determine the sum of the amounts of information consumed by the coding components corresponding to that sub-macroblock as the amount of information consumed by that sub-macroblock.
In one possible implementation manner, the obtaining module 701 is configured to, for any coding component included in the coded data of any macroblock, in response to any coding component being any one of a macroblock type, a macroblock prediction, a coded block mode, and a quantization parameter offset, calculate an amount of information consumed by any coding component, and obtain an amount of information consumed by any coding component corresponding to each sub-macroblock of any macroblock.
In one possible implementation manner, the obtaining module 701 is configured to, for any one coding component included in the coded data of any one macroblock, respond to the any one coding component being a residual, and the number of residual coefficient matrices included in the residual is not less than the number of sub-macroblocks of any one macroblock, determine, from the residual coefficient matrices included in the residual, a residual coefficient matrix corresponding to any one sub-macroblock of any one macroblock, and count an information amount consumed by the residual coefficient matrix corresponding to any one sub-macroblock; in response to any one of the encoding components being a residual and the residual including a number of residual coefficient matrices that is less than a number of sub-macroblocks of any one of the macroblocks, an amount of information consumed by each of the sub-macroblocks corresponding to any one of the residual coefficient matrices is determined based on the number of sub-macroblocks corresponding to any one of the residual coefficient matrices and the amount of information consumed by any one of the residual coefficient matrices.
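A sketch of the residual case described above, assuming for simplicity that when there are fewer residual coefficient matrices than sub-macroblocks, each matrix covers an equal number of sub-macroblocks and its bits are split evenly among them:

def residual_bits_per_sub_macroblock(matrix_bits, num_sub):
    # matrix_bits: bits consumed by each residual coefficient matrix;
    # num_sub: number of sub-macroblocks in the macroblock.
    if len(matrix_bits) >= num_sub:
        # At least one matrix per sub-macroblock: take the matching count.
        return [matrix_bits[i] for i in range(num_sub)]
    # Fewer matrices than sub-macroblocks: split each matrix's bits evenly
    # among the sub-macroblocks it covers (assumes an even partition).
    assert num_sub % len(matrix_bits) == 0
    covered = num_sub // len(matrix_bits)
    return [b / covered for b in matrix_bits for _ in range(covered)]

print(residual_bits_per_sub_macroblock([96], 4))              # -> [24.0, 24.0, 24.0, 24.0]
print(residual_bits_per_sub_macroblock([30, 50, 20, 10], 4))  # one matrix per sub-macroblock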
In a possible implementation, the obtaining module 701 is configured to determine, for any one sub-macroblock, a ratio between an amount of information consumed by the any one sub-macroblock and the second amount of information; mapping the ratio corresponding to each sub-macro block of the macro blocks to a gray value range to obtain a contour image corresponding to the target image.
In one possible implementation, the apparatus further includes:
a processing module, configured to, in response to the image recognition result indicating that the contour image contains sensitive content, filter out the encoded data of the target image, or decode the encoded data of the target image to obtain the target image and perform masking processing on its sensitive content.
The device acquires the contour image corresponding to the target image based on the encoding data of each macro block included in the encoding data of the target image so as to perform image recognition on the contour image. Because the target image is obtained without decoding the coded data of the target image, a large amount of computing resources can be saved, and the image recognition efficiency is improved.
It should be understood that the apparatus provided in fig. 7 is illustrated with the above division of functional modules; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiments above belong to the same concept; for its specific implementation, refer to the method embodiments, which are not repeated here.
Fig. 8 shows a block diagram of a terminal device 800 according to an exemplary embodiment of the present application. The terminal device 800 includes: a processor 801 and a memory 802.
Processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one computer program for execution by processor 801 to implement the image recognition methods provided by the method embodiments herein.
In some embodiments, the terminal device 800 may further optionally include: a peripheral interface 803, and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Individual peripheral devices may be connected to the peripheral device interface 803 by buses, signal lines, or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 804, a display 805, a camera assembly 806, audio circuitry 807, and a power supply 808.
Peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 804 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuitry 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, generation mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 804 may also include NFC (Near Field Communication ) related circuitry, which is not limited in this application.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, it can also collect touch signals on or above its surface. The touch signal may be input to the processor 801 as a control signal for processing. In this case, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805, disposed on the front panel of the terminal device 800; in other embodiments, there may be at least two displays 805, disposed on different surfaces of the terminal device 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved or folded surface of the terminal device 800. The display 805 may even be arranged in an irregular, non-rectangular pattern, that is, a shaped screen. The display 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize background blurring, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash, which can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
Audio circuitry 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 801 for processing, or inputting the electric signals to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction purposes, a plurality of microphones may be respectively disposed at different portions of the terminal device 800. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 807 may also include a headphone jack.
The power supply 808 is used to power the various components in the terminal device 800. The power supply 808 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 808 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal device 800 also includes one or more sensors 809. The one or more sensors 809 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, optical sensor 814, and proximity sensor 815.
The acceleration sensor 811 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal apparatus 800. For example, the acceleration sensor 811 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 801 may control the display screen 805 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 811. Acceleration sensor 811 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 812 may detect a body direction and a rotation angle of the terminal device 800, and the gyro sensor 812 may collect a 3D motion of the user to the terminal device 800 in cooperation with the acceleration sensor 811. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed at a side frame of the terminal device 800 and/or at a lower layer of the display 805. When the pressure sensor 813 is provided at a side frame of the terminal device 800, a grip signal of the terminal device 800 by a user can be detected, and the processor 801 performs left-right hand recognition or quick operation according to the grip signal acquired by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 814 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display screen 805 based on the ambient light intensity collected by the optical sensor 814. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 814.
A proximity sensor 815, also known as a distance sensor, is typically provided on the front panel of the terminal device 800. The proximity sensor 815 is used to collect the distance between the user and the front face of the terminal device 800. In one embodiment, when the proximity sensor 815 detects a gradual decrease in the distance between the user and the front face of the terminal device 800, the processor 801 controls the display 805 to switch from the bright screen state to the off screen state; when the proximity sensor 815 detects that the distance between the user and the front surface of the terminal device 800 gradually increases, the processor 801 controls the display screen 805 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.
Fig. 9 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 900 may vary greatly depending on configuration or performance, and may include one or more processors 901 (for example, CPUs) and one or more memories 902, where the one or more memories 902 store at least one computer program that is loaded and executed by the one or more processors 901 to implement the image recognition method provided by each of the method embodiments above. Of course, the server 900 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for implementing the functions of the device, which are not described here.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause an electronic device to implement any of the image recognition methods described above.
Alternatively, the above-mentioned computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Read-Only optical disk (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program or a computer program product is also provided, in which at least one computer program is stored, which is loaded and executed by a processor, to cause a computer to implement any of the above-mentioned image recognition methods.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The above embodiment numbers of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
The foregoing descriptions are merely exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the protection scope of the present application.

Claims (14)

1. An image recognition method, the method comprising:
acquiring encoded data of a target image, wherein the encoded data of the target image comprises encoded data of a plurality of macro blocks in the target image, the information amount consumed by any macro block in the plurality of macro blocks in encoding processing is in direct proportion to the texture complexity of any macro block, and the texture complexity of any macro block is related to the pixel value of each pixel point in the any macro block;
acquiring a contour image corresponding to the target image based on the coding data of each macro block, wherein the contour image is used for reflecting the contour of an object in the target image;
and carrying out image recognition processing on the contour image to obtain an image recognition result.
2. The method according to claim 1, wherein the acquiring the contour image corresponding to the target image based on the encoded data of each macro block includes:
Carrying out statistical processing on the coded data of each macro block to obtain the information quantity consumed by each macro block;
determining a maximum first information amount from the information amounts consumed by the respective macro blocks;
and determining a contour image corresponding to the target image based on the information amount consumed by each macro block and the first information amount.
3. The method of claim 2, wherein said statistically processing the encoded data of each of said macroblocks to obtain an amount of information consumed by each of said macroblocks, comprising:
for any one macro block, performing statistical processing on at least one coding component contained in the coded data of the any one macro block to obtain the information amount consumed by each coding component corresponding to the any one macro block, wherein any one coding component is any one of macro block type, macro block prediction, coded block pattern, quantization parameter offset and residual;
and determining the sum of the information amounts consumed by the coding components corresponding to any one of the macro blocks as the information amount consumed by any one of the macro blocks.
4. The method of claim 2, wherein the determining a contour image corresponding to the target image based on the amount of information consumed by the respective macro block and the first amount of information comprises:
For any one macroblock, determining a ratio between an amount of information consumed by the any one macroblock and the first amount of information;
and mapping the ratio corresponding to each macro block to a gray value range to obtain the contour image corresponding to the target image.
5. The method according to claim 1, wherein the acquiring the contour image corresponding to the target image based on the encoded data of each macro block includes:
for any one macroblock, dividing the any one macroblock into a plurality of sub-macroblocks;
carrying out statistical processing on the coded data of any macro block to obtain the information amount consumed by each sub macro block of any macro block;
determining a maximum second information amount from the information amounts consumed by the respective sub-macro blocks of the plurality of macro blocks;
and determining a contour image corresponding to the target image based on the information amount consumed by each sub-macro block of the plurality of macro blocks and the second information amount.
6. The method of claim 5, wherein said statistically processing the encoded data of said any one of the macro blocks to obtain the amount of information consumed by each sub-macro block of said any one of the macro blocks, comprises:
for any one of the macro blocks, performing statistical processing on at least one coding component contained in the coded data of the any one of the macro blocks to obtain the information amount consumed by each coding component corresponding to each sub-macro block of the any one of the macro blocks, wherein any one of the coding components is any one of macro block type, macro block prediction, coded block pattern, quantization parameter offset and residual;
for any one sub-macroblock, determining the sum of the information amounts consumed by the respective coding components corresponding to the any one sub-macroblock as the information amount consumed by the any one sub-macroblock.
7. The method according to claim 6, wherein said statistically processing at least one coding component included in the coded data of said any one macroblock to obtain an information amount consumed by each coding component corresponding to each sub-macroblock of said any one macroblock, comprises:
for any one of the coding components included in the coded data of any one of the macroblocks, in response to any one of the macroblock type, the macroblock prediction, the coded block pattern, and the quantization parameter offset, the amount of information consumed by the any one of the coding components is counted, and the amount of information consumed by the any one of the coding components corresponding to each sub-macroblock of the any one of the macroblocks is obtained.
8. The method according to claim 6, wherein said statistically processing at least one coding component included in the coded data of said any one macroblock to obtain an information amount consumed by each coding component corresponding to each sub-macroblock of said any one macroblock, comprises:
for any one of the encoded components contained in the encoded data of any one of the macroblocks, in response to the any one of the encoded components being the residual, and the number of residual coefficient matrices included in the residual being not less than the number of sub-macroblocks of the any one of the macroblocks, determining a residual coefficient matrix corresponding to any one of the sub-macroblocks of the any one of the macroblocks from among the residual coefficient matrices included in the residual, and counting an amount of information consumed by the residual coefficient matrix corresponding to the any one of the sub-macroblocks;
and in response to the any one of the encoded components being the residual and the residual including a number of residual coefficient matrices that is less than a number of sub-macroblocks of the any one of the macroblocks, determining an amount of information consumed by each sub-macroblock corresponding to the any one of the residual coefficient matrices based on the number of sub-macroblocks corresponding to the any one of the residual coefficient matrices and the amount of information consumed by the any one of the residual coefficient matrices.
9. The method of claim 5, wherein the determining a contour image corresponding to the target image based on the amount of information consumed by each sub-macroblock of the plurality of macroblocks and the second amount of information comprises:
for any one sub-macroblock, determining a ratio between an amount of information consumed by the any one sub-macroblock and the second amount of information;
mapping the ratio corresponding to each sub-macro block of the macro blocks to a gray value range to obtain the contour image corresponding to the target image.
10. The method according to any one of claims 1 to 9, wherein after performing image recognition processing on the contour image to obtain an image recognition result, the method further comprises:
and in response to the image recognition result indicating that sensitive content is included in the contour image, filtering the coded data of the target image, or performing decoding processing on the coded data of the target image to obtain the target image and performing masking processing on the sensitive content of the target image.
11. An image recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring the coded data of a target image, wherein the coded data of the target image comprises coded data of a plurality of macro blocks in the target image, the information amount consumed by any macro block in the plurality of macro blocks in coding processing is in direct proportion to the texture complexity of any macro block, and the texture complexity of any macro block is related to the pixel value of each pixel point in the any macro block;
The acquisition module is further used for acquiring a contour image corresponding to the target image based on the coding data of each macro block, wherein the contour image is used for reflecting the contour of an object in the target image;
and the image recognition module is used for carrying out image recognition processing on the outline image to obtain an image recognition result.
12. An electronic device comprising a processor and a memory, wherein the memory stores at least one computer program, the at least one computer program being loaded and executed by the processor to cause the electronic device to implement the image recognition method of any one of claims 1 to 10.
13. A computer readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause a computer to implement the image recognition method of any one of claims 1 to 10.
14. A computer program product, characterized in that at least one computer program is stored in the computer program product, which is loaded and executed by a processor to cause the computer to implement the image recognition method according to any one of claims 1 to 10.
CN202210108219.8A 2022-01-28 2022-01-28 Image recognition method, device, electronic equipment and readable storage medium Pending CN116563771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210108219.8A CN116563771A (en) 2022-01-28 2022-01-28 Image recognition method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210108219.8A CN116563771A (en) 2022-01-28 2022-01-28 Image recognition method, device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116563771A true CN116563771A (en) 2023-08-08

Family

ID=87488542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210108219.8A Pending CN116563771A (en) 2022-01-28 2022-01-28 Image recognition method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116563771A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117354529A (en) * 2023-11-28 2024-01-05 广东匠芯创科技有限公司 Image processing method based on video coding system, electronic equipment and medium
CN117354529B (en) * 2023-11-28 2024-03-12 广东匠芯创科技有限公司 Image processing method based on video coding system, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40091020
Country of ref document: HK