WO2021237464A1

WO2021237464A1 - Video image processing method and device

Info

Publication number: WO2021237464A1
Application number: PCT/CN2020/092377
Authority: WO
Inventors: 吴更石; 郭栋; 张开明
Original assignee: 华为技术有限公司
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2021-12-02
Also published as: CN115699725A

Abstract

The present application provides a video image processing method and device. Upon recognizing a target area of an object in a first database comprised in a first video image, a first device performs compression on an area in the first video image, other than the target area, to obtain second compression data and sends same to a second device, such that upon receiving the second compression data, the second device can finally obtain a first image according to a combination of a third video image obtained by performing decompression according to the second compression data and an image of the target area stored in a second database of the second device. Therefore, according to the present application, the first device does not need to repeatedly perform compression on a target area of an object frequently appearing in a video image and the second device does not need to repeatedly perform decompression on the target area, thus reducing a data volume of a compressed packet transmitted between the first device and the second device and improving video image processing efficiency.

Description

Video image processing method and device

Technical field

This application relates to data processing technology, and in particular to a video image processing method and device.

Background technique

Video compression is a technology that recompresses video files. It can compress larger video files into smaller compressed files for transmission or storage without affecting the video content. It is common in network video playback and surveillance video. Transmission and other application scenarios that need to transmit or store video files.

In the prior art, H.264 (also known as advanced video codec (AVC)), H.265 (also known as high efficiency video coding (HEVC)), and H.266 ( Video compression protocols, also known as the next-generation video coding standard (VVC), can be used to compress video files. In these protocols, all video images in a video file are divided into different image packages. For example, each consecutive 64 frames of images is an image packet. Then when compressing each frame of image in each image package, each frame of image is divided into image blocks of different sizes. If a certain image block in the current image is similar to the image blocks in other compressed images When it is higher, it can be considered that the contents of the two image blocks in the two frames of images are the same. The compressed image block can be used to represent the image block in the current image. When compressing the current image, you only need to The area of the image except for a certain image block in the current image can be compressed, thereby reducing the amount of calculation when compressing the video file and improving the compression efficiency.

With the prior art, when compressing a video file, each frame of video image in the video file needs to be divided into blocks to obtain multiple image blocks, and different image blocks are compared to obtain similar image blocks, while for objects in the video image Areas with denser distribution and more boundaries need to set denser image blocks for identification and comparison, so that when each frame of video image is compressed, the number of image blocks processed is larger, which reduces the compression of video files. The time efficiency ultimately leads to a lower efficiency of video image processing.

Summary of the invention

The present application provides a video image processing method and device, which are applied to compress video images, so as to solve the technical problem of low compression efficiency when compressing video images in the prior art, resulting in low efficiency of video image processing.

The first aspect of the present application provides a video image processing method, the execution subject is a first device that compresses video images, wherein the first device recognizes the target area of the object in the first database included in the first video image, and then The area other than the target area in a video image is compressed to obtain second compressed data and sent to the second device.

In this process, the first device does not need to compress the target area, but only needs to compress the area of the video image except the target area of the object existing in the database, and for the first device stored in the first device The image of the object in the database that meets the preset conditions is also stored in the second database of the second device, so that after receiving the second compressed data, the second device can decompress the third video image obtained according to the second compressed data , Combined with the image of the target area stored in the second compressed data to finally obtain the first image. Therefore, in this embodiment, the first device does not need to repeatedly compress the target area of the object that frequently appears in the video image. Based on the image of the target area already stored in the second database of the second device, the first device only needs to The area outside the target area can be compressed, so that there is no need to repeatedly perform the compression operation on the recurring target area, thereby reducing the data volume of the final compressed packet and improving the efficiency of the first device for processing the video image.

In an embodiment of the first aspect of the present application, based on the video image method provided in the first aspect, the image of the object that meets the preset condition stored in the second database of the second device may be sent by the first device. Specifically, for the first device, the first compressed data obtained by compressing the image of the object in the first database may be separately sent to the second device. The first compressed data includes the compression result of the target area, so that the second device receives After the first compressed data is reached, the image set obtained by decompressing the first compressed data is stored in the second database.

Wherein, optionally, the first device may simultaneously send the first compressed data and the second compressed data to the second device, and the second device may decompress the first compressed data first, so that after determining the second database, decompress the first compressed data. 2. Compressed data; or, optionally, before the method described in the first aspect of this application, the first device sends the first compressed data to the second device, and the second device decompresses the first compressed data and determines the second After the database, the method described in the first aspect of the present application is executed to send the second compressed data to the second device. In an optional solution, after the first device sends the first compressed data to the second device, it can compress the area in the first video image except the target area to obtain the second compressed data and send it to the second device. Sending the second compressed data, there is no need to wait until the second device decompresses the first compressed data to determine the second database, that is, the process of obtaining the second compressed data by the first device and the second device decompressing the first compressed data to obtain the second database The process can be parallel. The embodiment of the present application does not limit the sequence of the two processes. Therefore, in this embodiment, after the first device can compress the image of the object included in the first database only once, the obtained first compressed data is sent to the second device, so that the second device decompresses the first compressed data to determine The second database enables the subsequent video images processed by the first device to only need to compress the area outside the target area, thereby reducing the image size and number of times the first device compresses the video image, thereby improving the video Image processing efficiency.

In an embodiment of the first aspect of the present application, since the first device only compresses the area outside the target area in the first video image, the third video image obtained by decompressing the second device according to the second compressed data is combined with The image of the target area stored in the second compressed data obtains the first video image, and in order to allow the second device to more quickly and accurately determine the positional relationship between the decompressed third video image and the target area in the first video image, When sending the second compressed data, the first device as the compression terminal can synchronously send the tag information of the target area in the first video image to the first device, where the tag information includes the location information of the target area in the first video image. , The identification information of the image of the object included in the target area in the first database acquires at least one item of the transformation information. Therefore, in the video image processing method provided by this embodiment, the first device can also determine the marking information of the target area when determining the target area, and subsequently send the second compressed data and the marking information of the target area to the second at the same time. The device enables the second device to more quickly and accurately determine the target area in the first video image, and then can determine the first video image faster after receiving the second compressed data, further improving the processing efficiency of the video image .

In an embodiment of the first aspect of the present application, when the provided method is applied in a scene of real-time video image transmission, the preset condition includes: among the N video images before the first video image, the video image of the object is included The number of is greater than or equal to M, where M and N are both positive integers, N>1, M<N. Specifically, this embodiment can be applied to the scene that the first device acquires the first video image in real time, that is, compressed and sent to the second device, then the first device removes the target area from the first video image through the foregoing embodiment. When the other part is compressed, the first database based on the first video image is obtained from the N+1 video image to the first video image, and the object meets the preset The condition is that the number of occurrences of the object in the N video images, or the number of video images including the object, is greater than or equal to M. Therefore, the first database on which the first device determines the target area in this embodiment is also obtained based on the N images before the first video image, which satisfies the real-time nature of the first database and can be applied to the first device to In a scene where the first video image acquired in real time is compressed, it is used to improve the processing efficiency of the real-time video image by the first device.

In an embodiment of the first aspect of the present application, after the first device obtains the target area of the first video image, compresses the second compressed data and sends it to the second device through the foregoing embodiment, the first device may also be based on the latest Update the first database with the acquired first video image. After the first video image is added, the judgment of the preset condition is based on the first video image and the previous N-1 video images, a total of N video images are obtained. Then, when the first device recognizes from the N video images before the first video image that meet the preset condition, that is, the number of occurrences of the object in these N video images or the number of video images including the object is greater than or equal to After M's new target object, the image of the new target object is added to the first database to update the first database, thereby satisfying the real-time nature of the target area determined by the first device when processing the video image.

In an embodiment of the first aspect of the present application, when the first device adds the image of the new target object to the first database, it can send the image of the newly added target object to the second device, so that the second device can The side updates its stored second database to keep the consistency between the first database and the second database.

In one implementation manner, the first device may compress the image of the new target object, and then send the third compressed data obtained to the second device; or, in another implementation manner, the first device may add After the first database after the new target object is compressed as a whole, the obtained fourth compressed data is sent to the second device.

For the second device, upon receiving the third compressed data or the fourth compressed data, the second database may be updated, so that the updated second database stores the new target object. After that, for the video image processed by the first device, the area that includes the new target object in the video image can be used as the target area without compression, and the second device receives the compressed data that does not include the target area. After decompression, combined with obtaining the image of the new target object from the second database, the video image is finally obtained.

In an embodiment of the first aspect of the present application, in addition to facing the image of the newly added object in the first database, the first device may also delete the object stored in the first database after the object stored in the first database does not meet a preset condition. Image to save the storage space of the first database and improve the utilization efficiency of the storage space of the first device.

In an embodiment of the first aspect of the present application, the first device may also replace the image of the object stored in the first database when the definition of the object in the target area in the first video image is relatively high. Wherein, when the first device detects the target area in the first video image, if it is determined that the definition of the first object included in the target area is better than the definition of the first object stored in the database, it uses the The image of the first object replaces the image of the first object stored in the first database. Similarly, updates made to the first database can also be compressed by the first device and sent to the second device, so that the second device updates the second database. Therefore, when the subsequent second device restores the first video image through the object in the second database, it can obtain better definition, and prevent the image definition of the object stored in the second database from being inferior to the actual object in the first video image. The sharpness of the image, causing the problem that the target area in the first video image is not clear.

In an embodiment of the first aspect of the present application, when the provided method is applied in a scene of real-time video image transmission, or when no image is stored in the first database of the first device, for example, for a real-time transmitted video file, When the first device acquires the first video image, it cannot determine the target area based on the first database. Therefore, in order to ensure the completeness of the first device in processing the video image, when the first device acquires the video file with a frame number smaller than the preset When a video image with a frame number is set, the video image can be encoded as a whole, and an object that meets the preset condition is determined based on the part of the video image that is less than the preset frame number, and stored in the first database. Subsequently, after the first device establishes the first database with video images smaller than the preset frame number, after receiving the first video image larger than the preset frame number, the first device may execute the video image in the foregoing embodiment of the present application Approach.

In an embodiment of the first aspect of the present application, when the provided method is applied in a scene of non-real-time video image transmission, since the first device can completely obtain the entire video image, the preset condition may be that the object is in the entire In the video file, the number of video images including the object is greater than or equal to the preset number. Therefore, the first database on which the first device determines the target area in this embodiment is also obtained based on all the images in the video file, thereby satisfying the completeness of the first database and ensuring that all objects added to the first database meet the expected requirements. If conditions are set, it can be applied to a scene where the first device compresses a non-real-time video file and the first video image is used to improve the processing efficiency of the first device on the real-time video image.

In an embodiment of the first aspect of the present application, the first database in the scene of non-real-time video image transmission may also be obtained by the first device according to the video image. Wherein, the first device may first identify the image of the object that meets the preset conditions among all the video images of the video file and store it in the first database before transmitting the video file, and then treat each video image in the video image as the above-mentioned second image. A video image is processed.

In an embodiment of the first aspect of the present application, in addition to directly storing the image of the object that meets the preset condition in the second device, the first device does not recognize the object as the object before determining that the object meets the preset condition. The target area will cause the second compressed data sent by the first device to the second device to include the image of the object. Therefore, in order to save the amount of data transmitted between the first device and the second device, the first device can replace the sent object only by at least one of the boundary pixel position or the frame number of the target area, so that the second device is After receiving this information, the image of the object can be obtained from the boundary pixel position of the corresponding frame number by itself. Therefore, the video image processing method provided by this embodiment can send only the boundary pixel position and frame number of the object image when the first device sends the image of the object in the first database to the second device, so that the second device can Obtain the target area from the received video image, thereby reducing the amount of data actually sent when the first device sends the first video image to the second device, making the first device compress faster and the second device faster Decompression further improves the processing efficiency of video images.

A second aspect of the present application provides a video image processing method. The execution subject is a second device that receives compressed video files, wherein the second device decompresses the received second compressed data to obtain a first video image processing method that does not include the target area. A video image is combined with the second database to determine the image of the object in the target area. After the two are stitched together, the first video image can be obtained. Therefore, for the case where the target area is included in different video images, the second device only needs to decompress the first compressed data once to obtain the image of the object in the target area, but there is no need to decompress the target area in other video images. Exhale directly from the second database, since the target area included in the video image is missing during decompression, the calculation amount of the device during decompression is reduced, and the efficiency of video image processing by the second device can also be improved.

In an embodiment of the second aspect of the present application, the image of the object meeting the preset condition stored in the second database of the second device may be sent by the first device. Specifically, for the second device, after receiving the first compressed data sent by the first device, the image set obtained by decompressing the first compressed data is stored in the second database. Therefore, in this embodiment, after the first device can compress the image of the object included in the first database only once, the obtained first compressed data is sent to the second device, so that the second device decompresses the first compressed data to determine The second database allows the subsequent second device to only need to decompress the second compressed data when processing the video image, instead of repeatedly compressing objects in the target area, thereby reducing the number of second devices. The file size and times of decompression, thereby improving the processing efficiency of the video image.

In an embodiment of the second aspect of the present application, in order to allow the second device to more quickly and accurately determine the positional relationship between the decompressed third video image and the target area in the first video image, the first device as the compression end may When sending the second compressed data, synchronously send to the first device the marking information of the target area in the first video image, where the marking information includes the position information of the target area in the first video image and the objects included in the target area The identification information of the image in the first database acquires at least one item of the transformation information. Then for the second device, after receiving the first compressed data and the mark information of the target area, it can more quickly and accurately determine the target area in the first video image, and then it can be faster after receiving the second compressed data. The first video image is determined accurately, which further improves the processing efficiency of the video image.

In an embodiment of the second aspect of the application, in addition to directly storing the image of an object that meets the preset conditions in the second device, since the first device does not recognize the object as a target before determining that the object meets the preset conditions The area will cause the second compressed data sent by the first device to the second device to include the image of the object. Therefore, in order to save the amount of data transmitted between the first device and the second device, the first device can replace the sent object only by at least one of the boundary pixel position or the frame number of the target area, so that the second device is After receiving this information, the image of the object can be obtained from the boundary pixel position of the corresponding frame number by itself. Therefore, the video image processing method provided by this embodiment can send only the boundary pixel position and frame number of the object image when the first device sends the image of the object in the first database to the second device, so that the second device can Obtain the target area from the received video image, thereby reducing the amount of data actually sent when the first device sends the first video image to the second device, making the first device faster to compress and the second device to decompress faster. Compression further improves the processing efficiency of video images.

In an embodiment of the second aspect of the present application, when the provided method is applied in a scene of real-time video image transmission, after the first device updates the first database based on the newly acquired first video image, the first device When the image of the new target object is added to the first database, the newly added image of the target object can be compressed to obtain the third compressed data, or the first database after the image of the newly added target object is compressed as a whole The obtained fourth compressed data is sent to the second device. At this time, for the second device, the stored second database can be updated according to the third compressed data or the fourth compressed data to maintain the consistency between the first database and the second database. The area of the video image that includes the new target object can be used as the target area without compression. After the second device receives the compressed data that does not include the target area, it decompresses it, and then obtains the new target from the second database. The image of the object, and finally a video image.

In an embodiment of the second aspect of the present application, when the provided method is applied to a scene of real-time video image transmission, the preset condition includes: the N video images before the first video image include the object The number of video images is greater than or equal to M, where M and N are both positive integers, N>1, M<N; when the provided method is applied to a scene of non-real-time video image transmission, the preset conditions can be The object is in the entire video file, and the number of video images including the object is greater than or equal to the preset number.

The third aspect of the present application provides a video image processing device, which can be used as a first device for executing the video image processing method according to any one of the first aspects of the present application, and the device includes: an acquisition module, a first determination module, Compression module and sending module;

Wherein, the obtaining module is used to obtain the first video image; the first determining module is used to determine the target area in the first video image; the target area includes the image of the object that meets the preset condition stored in the first database of the first device; The compression module is used to compress the area except the target area in the first video image to obtain the second compressed data; the sending module is used to send the second compressed data to the second device; the second database of the second device has been stored An image of an object that meets the preset conditions.

In an embodiment of the third aspect of the present application, the compression module is further configured to compress the image of the object stored in the first database to obtain the first compressed data; the sending module is further configured to send the first compressed data to the second device; The first compressed data is used by the second device to determine the second database.

In an embodiment of the third aspect of the present application, the sending module is specifically configured to send the second compressed data and the marking information of the target area to the second device; wherein the marking information includes: position information of the target area in the first video image, At least one of the identification information or transformation information of the image of the object included in the target area in the first database; the transformation information is used to indicate the difference between the image of the object in the target area and the first video image in the first database. the difference.

In an embodiment of the third aspect of the present application, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N>1, M<N.

In an embodiment of the third aspect of the present application, the device further includes: a second determining module and a storage management module;

Wherein, the second determining module is used to identify the target object in the first video image that meets the preset conditions; the storage management module is used to add the image corresponding to the new target object in the target object into the first database, and the new target object is not in the first database. For objects stored in a database, the first database is stored in the storage module.

In an embodiment of the third aspect of the present application, the compression module is further configured to compress the image corresponding to the new target object to obtain third compressed data; the sending module is also configured to send the third compressed data to the second device.

In an embodiment of the third aspect of the present application, after the storage management module adds the image corresponding to the new target object in the target object to the first database, the compression module is also used to compress the image of the object stored in the first database To obtain the fourth compressed data; the sending module is also used to send the fourth compressed data to the second device.

In an embodiment of the third aspect of the present application, the storage management module is further configured to delete the image of the object that does not meet the preset condition stored in the first database.

In an embodiment of the third aspect of the present application, the storage management module is further configured to: when the sharpness of the first object in the target area in the first video image is better than that of the first object stored in the first database For the definition of the image, the image of the first object stored in the first database is replaced with the image of the first object in the first video image.

In an embodiment of the third aspect of the present application, the first video image is a video image with a frame number greater than a preset frame number in a video file that is being compressed and transmitted by the first device in real time; The second video image, the frame number of the second video image in the video to be processed is less than the preset frame number; the second determining module is also used to identify objects in the second video image that meet the preset conditions; the storage management module also uses Therefore, the objects in the second video image that meet the preset conditions are stored in the first database.

In an embodiment of the third aspect of the present application, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.

In an embodiment of the third aspect of the present application, the device further includes: a third determining module; wherein, the third determining module is configured to identify objects that meet preset conditions in all video images in the video file; and the storage management module is configured to The image of the object that meets the preset condition is stored in the first database.

In an embodiment of the third aspect of the present application, the image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.

The fourth aspect of the present application provides a video image processing device, which can be used as a second device to execute any video image processing method as in the second aspect of the present application. The device includes: a receiving module, a decompression module, an acquisition module, and Determining module; wherein the receiving module is used to receive the second compressed data sent by the first device; wherein the second compressed data is obtained by compressing the area other than the target area in the first video image; the decompressing module, Used to decompress the second compressed data to obtain a third video image. The third video image includes an image corresponding to an area other than the target area in the first video image; the acquisition module is used to obtain a third video image from the second device. Second, the image corresponding to the target area is acquired from the database; the determining module is used to determine the first video image according to the third video image and the image corresponding to the target area.

In an embodiment of the fourth aspect of the present application, the device further includes: a storage management module; wherein, the receiving module is further configured to receive the first compressed data sent by the first device; the decompression module is further configured to: Decompression is performed to obtain an image set corresponding to the object that meets the preset conditions, the image set includes the image corresponding to the target area; the storage management module is used to store the image set in the second database.

In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the marking information of the target area sent by the first device; wherein the marking information includes: position information of the target area in the first video image, and the target area includes At least one of the identification information or transformation information of the object in the first database of the first device; the transformation information is used to indicate the difference between the image of the object in the target area in the first database and the first video image.

In an embodiment of the fourth aspect of the present application, the determining module is specifically configured to stitch the image corresponding to the target area and the third video image to obtain the first video image according to the marking information of the target area.

In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the third compressed data sent by the first device; the decompression module is also configured to decompress the third compressed data to obtain the new target object Image; The storage management module is also used to add the image of the new target object to the second database.

In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the fourth compressed data sent by the first device; the decompression module is also configured to decompress the fourth compressed data to obtain the updated conforming data Set the image set corresponding to the conditional object; the storage management module is also used to update the second database based on the updated image set corresponding to the object that meets the preset condition.

In an embodiment of the fourth aspect of the present application, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N>1, M<N; or, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.

A fifth aspect of the present application provides a video image processing device, including: a processor and a transmission interface; the device communicates with other devices through the transmission interface; the processor is configured to read software stored in a memory Instructions to implement the method described in any one of the first aspect of the present application.

A sixth aspect of the present application provides a video image processing device, including: a processor and a transmission interface; the device communicates with other devices through the transmission interface; the processor is configured to read software stored in a memory Instructions to implement the method described in any one of the second aspect of the present application.

A seventh aspect of the present application provides a computer-readable storage medium that stores instructions in the computer-readable storage medium. When the instructions are executed by a computer or a processor, the computer or the processor realizes The method of any one of the aspects.

The eighth aspect of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium. When the instructions are executed by a computer or a processor, the computer or the processor can realize The method of any one of the two aspects.

The ninth aspect of the present application provides a computer program product, characterized in that the computer program product contains instructions, when the instructions run on a computer or a processor, the computer or the processor is Apply for the method described in any one of the first aspect.

The tenth aspect of the present application provides a computer program product, characterized in that the computer program product contains instructions, when the instructions run on a computer or a processor, the computer or the processor is Apply for the method described in any one of the second aspect.

Description of the drawings

Figure 1 is a schematic diagram of the application scenario of this application;

Figure 2 is a schematic diagram of a video compression technology;

Figure 3 is a schematic diagram of a video image divided into image blocks;

4 is a schematic flowchart of an embodiment of a video image processing method provided by this application;

Figure 5 is a schematic diagram of the database setting method provided by this application;

FIG. 6 is a schematic diagram of an object in a video image provided by this application;

FIG. 7 is a schematic flowchart of another embodiment of a video image processing method provided by this application;

FIG. 8 is an exemplary flowchart of an embodiment of a video image processing method provided by this application;

FIG. 9 is an exemplary flowchart of an embodiment of a video image processing method provided by this application;

FIG. 10 is an exemplary flowchart of an embodiment of a video image processing method provided by this application;

FIG. 11 is an exemplary flowchart of an embodiment of a video image processing method provided by this application;

FIG. 12 is an exemplary flowchart of an embodiment of a video image processing method provided by this application;

FIG. 13 is a schematic structural diagram of an embodiment of a video image processing device provided by this application;

FIG. 14 is a schematic structural diagram of an embodiment of a video image processing device provided by this application;

15 is a schematic structural diagram of an embodiment of a video image processing device provided by this application;

FIG. 16 is a schematic structural diagram of an embodiment of a video image processing device provided by this application;

FIG. 17 is a schematic structural diagram of an embodiment of a video image processing device provided by this application;

FIG. 18 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.

Detailed ways

Before introducing the embodiments of the present application, the following describes the application scenarios and existing problems of the present application with reference to the accompanying drawings.

Figure 1 is a schematic diagram of the application scenario of this application. This application is applied to the scenario of video file transmission between different devices. The device described in Figure 1 can be a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a server. For devices with video file processing functions, the embodiments of the present application can be executed by the device as described in Figure 1, or by the processor of the device as described in Figure 1 (for example: central processing unit, CPU ) Or a graphics processing unit (GPU), etc. In the embodiment of the present application, the execution of the device in Fig. 1 is taken as an example. As shown in Fig. 1, the first device 10 and the second device 20 are With a communication connection relationship, the first device 10 can send the video file 30 to the second device 20 through the communication connection relationship, and the transmitted video file 30 can be divided into real-time video files and non-real-time video files according to the requirement of timeliness. Real-time video file. For example, when applied to a non-real-time transmission scene, the video file can be a file of a movie, and the first device 10 can send the file at time T1 before sending the file of this movie to the second device 20 After the overall compression and other processing, the data packet 40 is obtained, and the compressed data packet 40 is sent to the second device 20; after the second device 20 receives the data packet 40 at time T2 and decompresses it, the entire movie can be obtained File, it is understandable that because the first device needs to compress all video images in the video file, the time interval between T1-T2 is relatively large; when applied to real-time transmission scenarios, the video file can be a monitoring screen, For time-sensitive files such as TV screens, the first device cannot obtain each frame of the video image in the complete video file, but needs to transmit the latest currently obtained video image. The first device 10 can transmit the video image at time T1. The data packet 40 is obtained by compression and sent to the second device 20. The second device 20 receives and decompresses the data packet 40 at time T2. Since the first device only needs to compress one frame of video image, between time T1 and time T2 The time interval is small, and then the first device 10 and the second device 20 continue to repeat this process, so that the first device 10 can send the latest video images in video files such as monitoring screens, TV screens, etc., to the first device in real time.二装置20。 Two devices 20.

Since the video file is composed of continuous video images, the number of video images included in the video file is increasing, and each video image itself has a higher resolution, which greatly improves the overall data of the video file quantity. Therefore, when video files are transmitted between the devices as shown in FIG. 1 using limited communication resources, the video files can be compressed with higher quality, and larger video files can be compressed into smaller compressed files for transmission and storage. In the process of compressing video files, it is not only necessary to reduce the amount of data of the transmitted video files, but also to ensure that the opposite end can completely restore the video files based on the compressed files.

In order to realize the compression of video files, the video compression protocols such as H.264, H.265 and H.266 proposed in some technologies can all be compressed and encoded, and the video files can be compressed by re-encoding the video files. Processing, for example, FIG. 2 is a schematic diagram of a video compression technology, where the execution subject may be the first device 10 as shown in FIG. Different image packages. For example, in Figure 2, each 64 video images in the video file are taken as an image package to obtain image packages 1-64, image packages 65-128, image packages 129-256... Then, for each image package, select some of the frames as key frames. For example, for the image package 1-64, you can use the first and 64th frames as key frames to perform overall compression encoding. For the second frame -Each video image in the 63rd frame can be used as a non-key frame. Before these video images are compressed and encoded, the video image is divided into different image blocks according to the distribution of objects in the video image and the degree of boundary density. The image blocks in the key frame are compared. For example, in Figure 2, the first device 10 will first compress and encode the key frames in the video file. During the subsequent processing of the non-key frames, if an image block in the currently compressed non-key frames includes object A, and When the similarity of the image blocks including the object A in the compressed and encoded key frame is relatively high, it can be considered that the two image blocks in the two frames of video images are similar. Therefore, when compressing the current non-key frames, the image blocks in the key frames that have been compressed and encoded can be used to represent the similar image blocks in the current non-key frames, and only the current compressed non-key frames can pass the key The area outside the image block represented by the frame image block can be compressed and encoded. Similarly, for the video image in the image pack 65-128, the image block including the object B and the video image in the image pack 129-256 can be compared. Compare the image block including the object C, thereby reducing the amount of calculation when compressing the video file and improving the compression efficiency.

More specifically, when compressing and encoding the aforementioned non-key frames in a video file through a video compression protocol, the video image needs to be divided into different image blocks. Figure 3 is a schematic diagram of a video image divided into image blocks, as shown in Figure 3. The image on the left in the center has a large number of objects distributed at the four corners, especially the upper left corner, so the boundary information of the objects to be processed is more. The four corners with more objects have a larger number of image blocks, and the area with fewer objects in the middle has a smaller number of image blocks, so that the subsequent comparison between the non-key frames in Figure 3 and the key frames that have been compressed and encoded , It is possible to divide the more distributed objects in the non-key frame into smaller image blocks and compare them with the key frames. Since the smaller the image block, the more accurate the boundary information included, and the more accurate the corresponding position of the image block The image is compared with the key frame, so that a higher accuracy can be achieved when comparing with the image block in the key frame.

However, in the above process of dividing the video image into different image blocks, it is necessary to set up denser image blocks for identification and comparison of areas with denser object distribution and more boundaries in the video image, so that in the video file When each frame of image is compressed, a larger number of image blocks need to be processed, which reduces the efficiency of each frame of video image during compression, thereby reducing the efficiency of compressing video files. At the same time, in the above technology, because the video file is divided into different image packages, the image block comparison is only performed on the video images in the image package, and there is a lack of overall recognition and comparison. If an object is in each of the entire video files If a frame of video image exists, the object will still be repeatedly identified and compared in each image package, which also reduces the efficiency of video file compression.

Therefore, this application provides a video image processing method and device, which are used in the video file compression process to separately extract, compare, and compress objects included in the video image in the video file that meet certain preset conditions, so that the When a video file is compressed, after compressing only part of the object once, when each frame of the video image is compressed, only the area except this part of the object needs to be processed, thereby improving the compression of each frame of video image. Efficiency, thereby improving the efficiency of compressing video files.

The technical solution of the present application will be described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.

FIG. 4 is a schematic flowchart of an embodiment of a video image processing method provided by this application. The method shown in FIG. 4 can be applied to the real-time transmission scene or the non-real-time transmission scene in the scene shown in FIG. The device and the second device execute. Specifically, the video image processing method provided in this embodiment includes:

S101: The first device acquires a first video image.

First, as the first device for sending a video file, a first video image to be sent is acquired in S101. The first video image may be a frame of video image in a non-real-time video file sent by the first device to the second device, or the first video image may also be a real-time video image sent by the first device to the second device. Video image.

S102: The first device determines a target area in the first video image; the target area includes an image of an object that meets a preset condition stored in a first database of the first device;

Subsequently, the first device recognizes a target area in the first video image, where the target area is an image including objects stored in the first database, and the objects stored in the first database all satisfy a preset condition. For example, FIG. 5 is a schematic diagram of a database setting method provided by this application, in which a first database is provided in the first device, and a second database is provided in the second device. The first database in the first device may be used to store images of objects meeting preset conditions.

Optionally, when applied in a non-real-time transmission scenario, the preset condition may be that among all the video images of the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number; In a real-time transmission scenario, the preset condition may be that among the N video images before the first video image, the number of video images including the object is greater than or equal to M, M and N are both positive integers, and N>1, M ＜N.

Fig. 6 is a schematic diagram of an object in a video image provided by this application. Taking the first video image shown on the left side of Fig. 6 as an example, the first video image includes at least electric vehicles a, Cyclist b, passer-by c, vehicle d, and passer-by e, these objects can be objects that may move in the video image, except for the above objects, other parts are static objects, which can be regarded as the background of the video image. Assuming that the first database of the first device stores images of two objects, electric vehicle a and cyclist b, the first device can determine in this step that the target area in the first video image is electric vehicle a in the figure. The area where the images of the two objects and the cyclist b are located.

S103: Compress an area other than the target area in the first video image to obtain second compressed data.

After determining the target area in the first video image in S102, the first device compresses only the area in the first video image excluding the target area, and the obtained compressed data is recorded as the second compressed data. Wherein, the method for compressing the first video image is not limited, and it may be performed by compression coding.

S104: The first device sends the second compressed data to the second device.

Specifically, the first device sends the second compressed data obtained in S103 to the second device, and for the second device, receives the second compressed data sent by the first device.

Optionally, in order for the second device to determine that the second compressed data is obtained by the first device after compressing the area outside the target area, the first device may also send the first video image to the second device in S104 The mark information of the target area in the middle, so that the second device can determine the target area in the first video image according to the mark information of the target area.

S105: The second device decompresses the second compressed data to obtain a third video image.

Specifically, after receiving the second compressed data sent by the first device, the second device decompresses the second compressed data, and the obtained video image is recorded as the third video image. The third video image is an image corresponding to an area other than the target area sent by the first device.

S106: The second device obtains an image corresponding to the target area from the second database.

Specifically, in the setting method shown in FIG. 5, a second database is set in the second device, and the second database can be used to store images of objects that meet preset conditions, and the second database can store the same objects in the first database. Image. The image of the object stored in the second database may be preset, may be stored in advance, or may be sent by the first device to the second device in real time according to the first database and stored in the second database.

Optionally, the second device may obtain the image corresponding to the target area from the second database according to the marking information of the target area in the second compressed image.

S107: The second device determines the first video image according to the third video image determined in S105 and the image of the target area acquired in S106, and finally completes the process of sending the first video image from the first device to the second device.

Exemplarily, when the target area in the first video image is the image of the two objects of electric vehicle a and cyclist b in the figure shown in FIG. 6, in S105, the third video image obtained by decompression by the second device Figure 6 occupies the area that does not include the two objects of the electric vehicle a and the cyclist b. The image of the target area obtained in S106 is the image of the two objects of the electric vehicle a and the cyclist b, then in S107 In the third video image, the second device can add the images of the two objects of the electric vehicle a and the cyclist b to the corresponding positions in the third video image to obtain a complete first video image.

It should be noted that, in the embodiment shown in FIG. 4, when applied in a real-time transmission scenario, the first video image may be a real-time video image acquired by the first device. When applied in a non-real-time transmission scenario, the first video image can be understood as any one of the video files. S101-S103 in Figure 4 show the processing methods for any one of the video files, and In S104, the first device will process each frame of video image in the video file through S101-S103, and then send the second compressed data of all the video images to the second device. The second device uses S105-S107 to pair Each frame of video image in the video file is processed, and finally all the video images in the video file are obtained.

To sum up, in the video image processing method provided in this embodiment, before sending the first video image to the father’s device, the first device first recognizes the target area of the object in the first database included in the first video image, and compares the first video image. The area other than the target area in a video image is compressed to obtain the second compressed data and sent to the second device. For the image of the object that meets the preset conditions stored in the first database of the first device, the second device’s It has been stored in the second database, so that after receiving the second compressed data, the second device can decompress the third video image obtained according to the second compressed data and combine it with the image of the target area stored in the second compressed data to finally obtain The first image. Therefore, with the video image processing method provided by this embodiment, when the first device sends the first video image to the second device, the first device does not need to repeatedly compress the target area of the object that frequently appears in the video image, reducing This reduces the amount of compressed data; for the second device, there is no need to repeatedly decompress the target area, reducing the amount of decompressed data. Thereby, the data volume of the compressed packet transmitted between the first device and the second device is reduced, thereby improving the efficiency of video image processing.

Optionally, in the embodiment shown in FIGS. 4-6, the image of the object stored in the second database in the second device may be sent by the first device to the second device for storage. Specifically, FIG. 7 is A schematic flowchart of another embodiment of the video image processing method provided by the application. The method shown in FIG. 7 can be applied to the real-time transmission scene or the non-real-time transmission scene in the scene shown in FIG. The two devices execute it, and the method shown in FIG. 7 is executed before the method S101 shown in FIG. 4. Specifically, the video image processing method provided in this embodiment includes:

S201: The first device compresses the image of the object stored in the first database to obtain first compressed data.

Specifically, the first device may compress the entire first database after determining the image of the object that meets the preset condition and storing it in the first database, and the obtained compressed data is recorded as the first compressed data. For example, assuming that the images of the two objects of electric vehicle a and cyclist b as shown in FIG. 6 are stored in the first database, in S201, the first device compares the images in the first database

S202: The first device sends the first compressed data to the second device.

Specifically, the first device sends the first compressed data obtained in S201 to the second device, and for the second device, receives the first compressed data sent by the first device.

S203: The second device decompresses the first compressed data to obtain a corresponding image set and store it in the second database.

Specifically, after receiving the first compressed data sent by the first device, the second device decompresses the first compressed data, and the obtained image including multiple objects is recorded as an image set, and the image set is stored in the second In the database, it will be used when the embodiment shown in FIG. 4 is subsequently executed.

In summary, this embodiment is based on the fact that the first device can send the image of the object in the first database to the second device, so that the first device can only compress the image of the object included in the first database once to obtain the first The compressed data is sent to the second device, so that the second device decompresses the first compressed data to determine the second database, so that subsequent video images processed by the first device only need to compress the area outside the target area, thereby reducing The size and number of times the video image is compressed by the first device is reduced, and the size and number of times the video image is decompressed by the first device is also reduced, and the processing efficiency of the video image is further improved.

In the following, in conjunction with specific embodiments, specific implementation manners of applying the above-mentioned video image processing method of the present application to real-time transmission scenarios and non-real-time transmission scenarios are respectively described. The following specific embodiments can be performed by the first device and the second device independently of the embodiments shown in Figs. 4-7; Executed on the basis of the executed embodiment.

1. Non-real-time transmission scenarios.

When the video image processing method provided by the embodiments of the present application is applied to the scene shown in FIG. 1 to transmit non-real-time video files, the first device 10 may send a complete video file as a whole to the second device 20 The first device 10 may first compress the video file, and after obtaining a compressed package with a smaller amount of data, send the compressed package with a smaller amount of data to the second device 20 to save communication resources. Among them, FIG. 8 is a schematic flow chart of an exemplary embodiment of the video image processing method provided by this application, and shows the processing flow when the entire video file is compressed.

In the embodiment shown in FIG. 8, the first device 10 as the execution subject first obtains the to-be-processed video file 101, where the to-be-processed video file 101 specifically includes N continuous video images, and N is greater than 1. According to the sequence of the N video images in the video file 101, mark the video images in the video file 101 as 1, 2...N. The label can also be called the frame number. In the video file The number of video images included can also be referred to as the number of frames. Optionally, the to-be-processed video file 101 may be specified by the user of the first device 10, or captured by the first device 10, or acquired by the first device 10 through the Internet. When the first device 10 acquires The video file 101 to be processed can then be compressed through the embodiment shown in FIG. 4, or, before the first device 10 determines that the video file 101 is to be sent to the second device 20, through the implementation shown in FIG. Examples are compressed.

After acquiring the video file 101, the first device 10 first uses the first machine learning model 102 to identify the objects included in all N video images in the video file 101, and determines that the N video images included in the N video images conform to the preset At least one object of the condition. Wherein, the object includes objects other than the background in the video image. Taking the video image as shown in FIG. 6 as an example, in this embodiment, the first machine learning model 102 set in the first device 10 is opposite to the left After the side video image is recognized, the recognition result on the right side of Fig. 6 can be obtained, and the object ae in the video image can be recognized. After the first machine learning model 102 recognizes all the N video images in the entire video file, it can identify the objects included in each of the N video images. Subsequently, the first device 10 can screen all objects in all N video images together according to the recognition result of the first machine learning 102 model, and store the object images that meet the preset conditions among the N video images in the first device 10 set in the first database 103. Optionally, there may be multiple video images in N video images that all include images of objects that meet preset conditions, and when multiple video images include the same object and have the same resolution, any one of the video images can be The image of the object in the image is stored in the first database. When multiple video images include the same object and have different resolutions, the higher the resolution, the higher the definition, so the highest resolution of the multiple video images can be The image of the object is stored in the first database.

Optionally, the first machine learning model 102 provided in this embodiment may be a convolutional neural network (convolutional neural networks, CNN) neural network model, for example: AlexNet, ResNet, or Inception v3 can be applied to images Object recognition model.

Optionally, the preset condition described in this embodiment may be that the number of times the object appears in the N video images of the entire video file is greater than the preset number of times. For example, the preset number of times may be 10 (N>10), then When the first machine learning model 102 recognizes that among the N video images of the video file, there are more than 10 video images including the target passerby c, that is, the passerby c appears more than 10 times in the N video images of the entire video file, Then the image of passer-by c can be stored in the first database 103. By the same principle, the first device 10 can store all the images of the objects that meet the preset conditions included in the N video images of the video file in the first database 103 according to the recognition result of the first machine learning model.

Optionally, in an implementation manner, the images of different objects stored in the first database 103 may be stored in the first database 103. The boundaries of these images are the boundaries of the objects, and there is no other information such as the background other than the objects. For example, for The object passer-by c in FIG. 5 is stored in the first database 103 as the image divided by the border of passer-by c in the video image on the left side of FIG. 5, and does not include other objects or backgrounds except passer-by c. Or, in another implementation manner, since the first device 10 compresses the entire video file, the first database 103 may only store the frame number of a certain video image and the target number of the object in the video file where the image of the object is located. The position information of the boundary pixels, the image of the object can be obtained by the frame number and the position of the boundary pixels later. For example, after identifying an object in a video file that meets preset conditions, it is determined that the image of the object is included in the upper left corner of the 10th frame of the video image in the video file, and the first database 103 may store, for example, "10, (a, b , C...)" format data, used to represent the 10th frame of the object image in the video file, and the pixel position where the boundary of the 10th frame of the video image is (a, b, c...).

Subsequently, the first device 10 stores all the objects that meet the preset conditions in the N video images of the video file in the first database 103, and then further processes the N video images of the video file in turn through the second machine learning model 105 , Where the video image being processed by the second machine learning model is recorded as the first video image, and the second machine learning model 105 can compare the image of the object already stored in the first database 103 with the first video image to determine At least one object included in the first video image being processed and the area where the image of each object is located, the area where the at least one object is located is recorded as the target area. Illustratively, assuming that in the example shown in FIG. 4, the first database 103 stores images of a total of 26 objects labeled AZ, then after the second machine learning model obtains the first video image in the video file, The first video image is compared with the image in the first database, and it is determined that the first video image currently processed includes the objects labeled A and B in the first database, then the first video image is labeled A The area where the object of B and B is located is marked as the A target area and the B target area.

Optionally, the second machine learning model 105 and the first machine learning model 102 described in the embodiments of the present application may be the same machine learning model, for example, may also be a CNN-type neural network model; or may also be different The difference between the second machine learning model 105 and the first machine learning model 102 is that the second machine learning model 105 has the image of the identified object in the first database as the prior information. Therefore, the second machine learning model 105 The recognition performed by the machine learning model 105 can be understood as the comparison of images. The recognition performed by the first machine learning model 102 is to extract objects from a new video image. Since the second machine learning model 105 requires less calculation, it can It is set to a more lightweight model, so that after two machine learning models are set in the first device 10 for identification and comparison, the overall processing efficiency can be improved.

It is understandable that, in the example shown in FIG. 4, the second machine learning model 105 will sequentially use each video image in the video file 101 as the above-mentioned first video image, and each of them will be compared with the objects in the first database 103. The comparison is performed to determine the target area in each video image. Then, the first device 10 compresses the area other than the target area in the N video images, and obtains the compressed data of the N video images in the video file 101, which is recorded This is the second compressed data 106. It is understandable that although the second compressed data 106 includes compressed data of N video images, for each video image that includes the target area in the N video images, the included target area is "cropped". Only the other parts except the target area are kept. As a result, the compressed video image in the second compressed data 106 is not complete, and each video image lacks a target area including the image of the object in the first database.

In particular, in this embodiment, because the generated second compressed data 106 lacks the target area in the partial video image, it is equivalent to "cropping" the image of the object in the first database included in the video image, and in order to identify The “cutting” specifically refers to which image of the object in the first database and the position of the object in the video image. When generating the second compressed data 106, the first device 10 will also determine at least one target area included in each image. The video image is marked, and these marks can be carried in the corresponding video image, so that when the video image is subsequently decompressed, information such as the location of the target area in the video image is determined.

Optionally, in this embodiment, the marked content at least includes at least one of the location information of the target area in the first video image, transformation information, and identification information of the object included in the target area in the first database. The transformation information is used to identify the difference between the image in the first database and the first video image of the object in the target area. For example, in the example shown in FIG. 4, the object stored in the first database 103 can be identified by the identification information letter AZ. Assuming that the object corresponding to the letter A is a pedestrian, the image of the pedestrian is stored in the first database. The resolution is 128*128. The first device 10 currently recognizes the target area in the upper left corner of the video image by the second machine learning model 105 including the object corresponding to the letter A stored in the first database 103, then the first device 10 is compressing the video image. When the identification information of the object included in the target area in the figure in the first database 103 is "A", the position information of the target area in the first video image includes the pixel position of the target area in the peripheral boundary of the video image. , If the resolution of the target area in the video image is 64*64, it is equivalent to reducing the image with the resolution of 128*128 stored in the first database 103, so the target area can be marked "reduced by one time" In addition, the transformation information may also include transformations such as rotation and stretching.

Correspondingly, since the first database 103 stores the images of the objects that meet the preset conditions among the N video images, that is to say, the "cropped" parts of the incomplete video images in the second compressed data 106 are all stored in The first database 103. Therefore, when the first device 10 compresses the video file, it can also compress the image of the object stored in the first database 103 to obtain the first compressed data 104.

In this embodiment, the step of obtaining the first compressed data 104 by the first device 10 can be performed at any time after the first database 103 is determined, and is independent of the step of obtaining the second compressed data 106 by the first device 10, and may not Limit the order.

Finally, in an implementation manner, the first compressed data 104 and the second compressed data 106 can be used as compressed files after the video file 101 is compressed. The first device 10 generates the first compressed data 104 and the second compressed data 106. , Send the first compressed data and the second compressed data to the second device 20; or, in another implementation manner, after the first device 10 obtains the first compressed data 104 and the second compressed data 106, the two The compressed data is combined into the final third compressed data 107, as a compressed file after the entire video file 101 is compressed. After the first device 10 generates the third compressed data 107, the third compressed data 107 can be sent to the second device 20. This embodiment does not limit the sending mode.

Optionally, the specific data compression processing methods of the first device 10 to obtain the first compressed data 104 and the second compressed data 105 in this embodiment can be the same or different, for example, both can use video compression such as H.264 or H.265. The protocol is compressed.

Subsequently, after the second device 20 receives the compressed data sent by the first device 10 in the manner shown in the embodiment shown in FIG. 8, it can perform decompression processing on the compressed data to obtain a complete video file. Specifically, FIG. 9 is an exemplary flow chart of an embodiment of a video image processing method provided by this application, and specifically shows a processing flow of decompressing a compressed package to obtain a video file.

In the embodiment shown in FIG. 9, the second device 20 as the execution subject first receives the third compressed data 107 sent by the first device 10, and obtains the first compressed data 104 and the second compressed data according to the third compressed data 107 106. Alternatively, the second device 20 may directly receive the first compressed data 104 and the second compressed data 106 sent by the first device 10.

Subsequently, the second device 20 can decompress the first compressed data 104 and the second compressed data 106 respectively, where the decompressed first compressed data 104 can obtain an object set including images of multiple objects, for example, as shown in FIG. 6 The image of the object labeled AZ is used as an image set, and the image of the object in the image set AZ can be stored in the second database 108 of the second device 20. That is, after the second device 20 decompresses the first compressed data, the first database in the first device 10 can be restored, and the images of the objects included in the first database can be stored in the second data 108.

The second device 20 decompresses the second compressed data 106 to obtain N video images. None of the N video images includes the target area of the object stored in the second database 108, and each image includes marking information for the target area. The marked information includes at least one of the position information of the target area, the rotation transformation information of the object, or the identification information of the object included in the target area in the first database.

Optionally, if the images of multiple objects in the first compressed data 104 are represented by the frame number and position information of a certain video image in the video file where the images are located, for example, the identified objects in the first database 103 Record through the 10th frame in the video file and the pixel position in the 10th frame of the video image. Then, after obtaining the first compressed data 104, the second device 20 obtains the image of the object from the pixel position of the tenth frame in the second compressed data 106.

Finally, the second device 20 combines N video images based on the tag information of the target area in each video image, determines the image of the object in the target area from the second database, and performs image splicing to restore the video image. Exemplarily, if in the first video image currently being processed by the second device 20, the mark of the target area includes: "A", boundary pixels, and "reduced by one time", then the object A can be obtained from the second database 108 After the corresponding image, the image is reduced by one time, and the image is placed at the position of the boundary pixel in the first video image to realize the splicing of the video image. Then, after the second device 20 completes the stitching of all N video images, a complete video file is finally obtained.

In summary, in the video image processing method provided in this embodiment, when the first device compresses the video file, it first stores the images of the objects that meet the preset conditions among all the video images of the video file into the database through the first machine learning model. Then, the second machine learning model is used to identify the target area in each video image that includes the object in the first database, and then the object in the first database and the area other than the target area in each video image are respectively compressed and encoded , And finally get the compressed file of the entire video file. In this process, since the first database is compressed and encoded uniformly, when the first video image in the video file is subsequently compressed and encoded, the first device does not need to compress the target area in the first video image. It is only necessary to compress and encode the area of the video image except the target area of the object existing in the database, and for the image of the object that meets the preset conditions stored in the first database of the first device, the second device’s The second database has also been stored, so that after receiving the second compressed data, the second device can decompress the third video image obtained according to the second compressed data and combine it with the image of the target area stored in the second compressed data, and finally Get the first image. Therefore, in this embodiment, when the first device transmits the video file to the second device in non-real-time, the first device does not need to repeatedly compress the target area of the object that frequently appears in the video image, based on the second database of the second device. The image of the target area is already stored in the target area. The first device only needs to compress the area outside the target area, which reduces the amount of compressed data. For the second device, there is no need to repeatedly decompress the target area, thereby reducing The data volume of the compressed packet transmitted between the first device and the second device improves the efficiency of video image processing. In particular, the first database in this embodiment stores only images of objects, and the provided machine learning model is also based on the object’s image itself to identify and compare objects, so there is no need to use the technology shown in Figure 3 The image is divided into image blocks of different sizes and a larger number, and these image blocks are processed one by one, thereby reducing the amount of calculation during video image processing, thereby improving the efficiency of the entire video file during compression and encoding. In addition, this application judges whether objects in all video files meet preset conditions based on the entire video file as a whole, prevents an object from being repeatedly identified and compared, and further improves the efficiency of video file compression and encoding.

2. Real-time transmission scenarios.

When the video image processing method provided by the embodiment of the present application is applied to the scene shown in FIG. 1 to transmit the first video image acquired in real time, the acquired first video image needs to be compressed and sent to the first video image as soon as possible. Two devices at 20 o'clock. Specifically, FIG. 10 is an exemplary flowchart of an embodiment of a video image processing method provided by this application, showing the processing of the first device 10 as shown in FIG. 1 when compressing a real-time first video image Process. Among them, since this embodiment is applied in a scene that needs to ensure the real-time nature of video files, such as monitoring video backhaul, the video file acquired by the first device 10 is currently generated in real time and needs to be sent to the second device 20 immediately. Therefore, after receiving a frame of video image, the first device 10 needs to compress and encode the video image in time and send it to the second device 20 in time, and continuously receive new video images and repeat this process.

In the embodiment shown in FIG. 10, the first device 10 as the execution subject first obtains the first video image 201 that needs to be transmitted to the second device 20 in real time, where the first video image 201 may be a continuous video file. One of the video images in the first video file 201 can be specified for transmission by the user, or taken by the first device 10, or acquired by the first device 10 through the Internet, and needs to be sent to the second device 20 in real time. Video.

After acquiring the video image 201, the first device 10 first compares the video image 201 with the image of the object already stored in the first database 204 through the second machine learning model 207, and determines that the video image 201 includes the first database 204 The area where at least one object is stored in is recorded as the target area. For example, in the example shown in FIG. 7, the second machine learning model 207 determines the target area including the object A in the video image 201 according to the image of the object A stored in the database 204. Subsequently, the first device 10 "cuts" the target area in the video image 201, compresses the video image excluding the target area in the video image 201 to obtain the second compressed data 208, and sends the second compressed data 208 To the second device on the receiving end. In addition, in order to facilitate the marking of the target area of the video image, the second compressed data 208 may also include marking information of the target area. For example, the marking information includes: position information, transformation information, or target area of the target area in the first video image. At least one item such as identification information of the objects included in the area in the database.

Since the second machine learning model 207 has compressed and encoded the area outside the target area in the first video image 201 and sent it to the second device at the receiving end, for the object A included in the target area in the first video image 201, it should be Prior to this, the first device 10 performs compression encoding from the first database 204 to generate the first compressed data 205, and then sends it to the second device, so that the second device can combine the first compressed data after receiving the second compressed data 208 The video image 201 is obtained by splicing.

In a specific implementation manner, images of multiple objects may be pre-stored in the first database 204, so that the first device 10 can perform comparisons through the database 204 after obtaining the first video image. In another specific In the implementation manner, when the first device 10 transmits a video file including N video images to the second device 20 in real time, the first M video images among the N video images transmitted by the first device 10 are recorded as the second video In the case of an image (M<N), the first device 10 does not directly recognize the second machine learning model 207, but first recognizes the object in the second video image that meets the preset conditions. The image of is stored in the first database 204, and subsequently, when the video images after the M of the N video images are transmitted, based on the image of the object already stored in the first database 204, the first device 10 passes through the second machine The learning model 207 is compared with the first database 204. For these M video images, this application does not limit the method used by the first device 10 to compress and encode. For example, the video image can be roughly divided into several regions, and then compressed with different parameters according to the characteristics of the several regions. For coding, for example, a larger residual value and fewer high-frequency components can be used for areas that include background, and a smaller residual value can be used for areas that may include objects, etc., and more high-frequency components can be taken. ; Or, you can also use the existing H.264, H.265, or H.266 video compression protocol for compression encoding.

Further, since the video image processed in this embodiment is real-time, the first device 10 cannot directly determine the objects that meet the preset conditions included in the video file where the video image is located when transmitting a single video image, where: The preset condition may include: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N>1, M<N. That is to say, the objects in the first database 204 can only be determined to be added to the first database 204 after the N video images before the first video image are combined, so that the first database 204 can be added to the first database 204 at every moment. The objects stored in may not cover all objects in the current video file that meet the preset conditions. For example, in the example shown in FIG. 10, the moment when the first device 10 processes the video image 201 is the first moment. The video image 201 includes object A and object B, but because only the image of object A is stored in the first database 204, the second machine learning model 207 can only identify the target area that includes object A in the video image, even if the video The object B in the image 201 includes the first video image 201 at the first moment. The N video images of the first video image 201 have already met the preset condition. +1 image to 1 image before the first video image, the object B in these N video images does not meet the preset conditions), the second machine learning model will not recognize the area where the object B is as the target area, After the first moment, the object B can be added to the first database 204 to reduce the size of the area that needs to be compressed in each subsequent frame of video image. Therefore, in the process of processing the video image, the first device 10 will also recognize the objects included in the processed video image 201 through the first machine learning model 202, and determine objects other than the background, as shown in FIG. 10 Object A and Object B shown in. Optionally, as shown in FIG. 10, the sequence of the two processing steps of the first device 10 processing the video image 201 through the second machine learning model 207 and processing the video image 201 through the first machine learning model 202 is not limited, or two Steps can also be executed at the same time. Subsequently, the management module 203 in the first device 10 manages the identified objects. The management at least includes: adding, deleting, and replacing the image of the object stored in the first database 204, which will be described below with examples.

1. Adding objects. According to the number of objects included in the previous N video images (including the current video image 201) of the currently processed first video image 201, the management module 203 classifies objects that appear more than M times (M<N) and are currently processed. Images of objects that are not stored in the database are added to the database 204. For example, for the first machine learning model 202 in FIG. 10 to identify the object A and the object B in the first video image 201, and determine that the object that is not stored in the first database 204 is the object B, then the management module 203 according to the previous video image 201 Object B is included in 5 of the 10 video images, that is, the object B has appeared 5 times in total, and the recognized image of the object B needs to be added to the first database 204.

Optionally, the management module 203 can cache all the images of the object B in the previous first video image. When it is determined subsequently that the image of the object B is to be added to the database 204, the image of the object B with the highest resolution can be added from the cache. The image is stored in the first database 204 to improve the clarity of the image of the object B processed in the subsequent compression process.

Further, when the management module 203 adds the new object B to the first database 204, the first device 10 can immediately compress and encode the image of the newly added object B, and the obtained compressed data is recorded as the third compressed data, and The third compressed data is sent to the second device, so that after the second device decompresses the third compressed data, the image of the object B is stored in the second database of the second device. Alternatively, when the management module 203 adds the new object B to the first database 204, the first device 10 may also compress and encode the entire first database 204 after the image of the newly added object B, and the obtained compressed data is recorded as the first database 204. Fourth, compress the data, and send the fourth compressed data to the second device, so that the second device updates the second database of the second device after adding the compressed fourth compressed data.

Therefore, even in the example shown in FIG. 10, at this time, the second machine learning model 207 of the first device 10 cannot recognize the target area of the object B in the first video image 201 currently processed, but the first device 10 When performing processing such as recognition on subsequent video images, the second machine learning model 207 can use more objects in the first database 204 including object B for recognition. For example, FIG. 11 is an exemplary flow chart of an embodiment of a video image processing method provided by this application, and shows a processing flow of the first device 10 on the video image 301 following the first video image 201 as shown in FIG. 10 , Where the time when the first device 10 processes the first video image 201 is the first time, and the time when the first device 10 sends the third compressed data or the fourth compressed data to the second device 20 is the second time , Then at the first time and at the third time after the second time, when the first device 10 receives the video image 301, since the image of object B has been stored in the database 204, and the image of object B has passed the third compression The data or the fourth compressed data is sent to the second device 20. At this time, the first device 10 can use the second machine learning model 207 to determine the video image based on the images of the object A and the object B stored in the first database 204 301 includes the target areas of object A and object B. Subsequently, after the first device 10 "cuts" the target area in the video image 301, it compresses and encodes the video images in the video image 301 except the target area to obtain the second compressed data 208, and the second compressed data 208 Sent to the second device on the receiving end.

2. Deletion of the object. According to the number of X video images (including the current first video image 201) before the first video image 201 currently being processed, the management module 203 classifies the objects that appear less than Y times (Y<X) and are in the first video image 201. The image of the object stored in a database 204 is deleted from the first database 204. For example, for the first machine learning model 202 in FIG. 7 to recognize the object A and the object B in the first video image 201, it is assumed that the object A only appears in the 10 video images before the first video image 201 that is currently processed. Times, less than Y=2 times. At this time, the management module 203 can delete the image of the object A stored in the first database 204, so that the second machine learning model 207 can reduce the ratio in the database when processing subsequent video images. The number of images of the object to further improve efficiency.

3. Replacement of objects. The management module 203 specifically compares the image of the object identified in the first video image 201 currently being processed with the image of the same object stored in the first database 204, if the image resolution of the object A in the first video image 201 is 128*128 is greater than the resolution 64*64 of the image of object A in the first database 204, then the image of object A in the first video image 201 is stored in the first database 204, and the original first database 204 is stored The image of object A is deleted.

Subsequently, after the second device 20 receives the compressed data sent by the first device 10 in the manner shown in the embodiment shown in FIG. 10, the compressed data is decompressed to obtain the first video image. Specifically, FIG. 12 is an exemplary flow chart of an embodiment of a video image processing method provided by this application, in which it shows a processing flow in which the second device 20 decompresses the compressed data to obtain the first video image.

In the embodiment shown in FIG. 12, the second device 20 as the execution subject receives the first compressed data 205 and the second compressed data 208 sent by the first device 10. These two compressed data may be different from the second device 20. When the time is received, the second device 20 first receives the first compressed data 205, and then receives the second compressed data 208.

Wherein, after receiving the first compressed data 205, the second device 20 can decompress the first compressed data 205 to obtain images of multiple objects, for example, the objects labeled A, B... shown in FIG. 9 The image of is used as an image collection, and the image collection can be stored in the database 210 of the second device 20.

After the second device 20 receives the second compressed data 208, it can decompress the second compressed data 208 to obtain the third video image 211, the third video image 211 does not include the target of the object that meets the preset conditions in the second database 210 Area, and when the second compressed data 208 is received, the marking information of the target area included in the first video image sent by the first device 10 may also be received, and the marked information includes: position information of the target area in the first video image , Transform at least one item of information or identification information of the object included in the target area in the first database.

Finally, the second device 20 determines the image of the object in the target area from the second database 210 according to the mark information of the target area in the first video image and performs image splicing to restore the video image 201. Exemplarily, if in the first video image currently being processed by the second device 20, the mark of the target area includes: "A", boundary pixels, and "reduced by one time", then the image corresponding to object A can be obtained from the database After that, the image is reduced by one time, and the image is placed in the position of the boundary pixel in the decompressed third video image to realize the splicing of the current video image 201, and finally the first video image 201 is obtained.

In summary, the video image processing method provided in this embodiment is applied to the first device to recognize the objects stored in the first database through the second machine learning model when the first device compresses the first video image acquired in real time. Target area, and then compress and send the areas other than the target area in the video image. In this process, the object in the first video image can also be identified through the first machine learning model, and operations such as adding, deleting, and modifying the image of the object stored in the first database can be performed. In the process of compressing and encoding the first video image by the first device, since the image of the object in the first database is uniformly compressed, there is no need to compress the target area in the first video image, only the video The image includes the area outside the target area of the object stored in the database, and the area outside the target area of the object stored in the database can be compressed, and for the image of the object that meets the preset conditions stored in the first database of the first device, the second database of the second device is also stored in the second database. It has been stored so that after receiving the second compressed data, the second device can decompress the third video image obtained according to the second compressed data and combine the image of the target area stored in the second compressed data to finally obtain the first image. Therefore, in this embodiment, when the first device transmits the first video image to the second device in real time, the first device does not need to repeatedly compress the target area of the object that frequently appears in the video image. The image of the target area has been stored in the database. The first device only needs to compress the area outside the target area, which reduces the amount of compressed data. For the second device, there is no need to repeatedly decompress the target area, thereby reducing The data volume of the compressed packet transmitted between the first device and the second device is improved, and the efficiency of video image processing is improved. In particular, the first database in this embodiment stores only the image of the object, and the provided machine learning model is also based on the object’s image itself to identify and compare the object, so there is no need to use the image shown in Figure 3. In the technology, the image is divided into image blocks of different sizes and a large number, and multiple image blocks are processed one by one, thereby reducing the amount of calculation during video image processing, thereby improving the efficiency of the entire video file during compression and encoding . Moreover, this application can continue to update the objects stored in the database in real time while encoding and recognizing the continuous video images, so that the images of the objects stored in the database are up to date, ensuring that the database can be compared in subsequent comparisons. Use, further improve the efficiency of the video file when compressing and encoding.

In the foregoing embodiment, the parameter determination method provided by the embodiment of the present application is introduced, and in order to implement the functions in the parameter determination method provided by the foregoing embodiment of the present application, the network device and terminal device as the execution subject may include a hardware structure And/or software modules, in the form of a hardware structure, a software module, or a hardware structure plus a software module to realize the above-mentioned functions. Whether a certain function among the above-mentioned functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraint conditions of the technical solution.

FIG. 13 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device shown in FIG. 13 can be used as the first device 10 in the scene shown in FIG. The function performed by the first device, specifically, the device includes: an acquisition module 1301, a first determination module 1302, a compression module 1303, and a sending module 1304. Wherein, the acquiring module 1301 is used to acquire the first video image; the first determining module 1302 is used to determine the target area in the first video image; the target area includes the objects that meet the preset conditions stored in the first database of the first device Image; the compression module 1303 is used to compress the area other than the target area in the first video image to obtain the second compressed data; the sending module 1304 is used to send the second compressed data to the second device; the second device of the second device Images of objects that meet preset conditions have been stored in the database.

Optionally, the compression module 1303 is further configured to compress the image of the object stored in the first database to obtain first compressed data; the sending module 1304 is also configured to send the first compressed data to the second device; The data is used by the second device to determine the second database.

Optionally, the sending module 1304 is specifically configured to send the second compressed data and the marking information of the target area to the second device; where the marking information includes: position information of the target area in the first video image, and objects included in the target area At least one of the identification information or transformation information of the image in the first database; the transformation information is used to indicate the difference between the image in the first database and the first video image of the object in the target area.

Optionally, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N>1, M<N .

FIG. 14 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device shown in FIG. 14 further includes a second determination module 1305 and a storage management module 1306 on the basis of FIG. 13. The device shown in FIG. 14 can be used to execute the video image processing method shown in FIG. 10, for example, the second determining module 1305 is used to identify target objects in the first video image that meet the preset conditions; the storage management module 1306 It is used to add the image corresponding to the new target object in the target object into the first database. The new target object is an object that is not stored in the first database, and the first database is stored in the storage module.

Optionally, the compression module 1303 is further configured to compress the image corresponding to the new target object to obtain third compressed data; the sending module 1304 is also configured to send the third compressed data to the second device.

Optionally, after the storage management module 1306 adds the image corresponding to the new target object in the target object to the first database, the compression module 1303 is further configured to compress the image of the object stored in the first database to obtain the fourth compression Data; the sending module 1304 is also used to send the fourth compressed data to the second device.

Optionally, the storage management module 1306 is further configured to delete the image of the object that does not meet the preset condition stored in the first database.

Optionally, the storage management module 1306 is also configured to: when the sharpness of the first object in the target area in the first video image is better than the sharpness of the first object stored in the first database, use The image of the first object in the first video image replaces the image of the first object stored in the first database.

Optionally, the first video image is a video image with a frame number greater than a preset frame number in the video file being compressed and transmitted by the first device in real time; the acquiring module 1301 is also used to acquire the second video image in the video to be processed, The frame number of the second video image in the to-be-processed video is less than the preset frame number; the second determining module 1305 is also used to identify objects in the second video image that meet the preset conditions; the storage management module 1306 is also used to Second, the objects in the video image that meet the preset conditions are stored in the first database.

Optionally, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.

FIG. 15 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device shown in FIG. 15 further includes a third determining module 1307 and a storage management module 1306 on the basis of that shown in FIG. 13. The device shown in FIG. 15 can be used to execute the video image processing method shown in FIG. 8. Illustratively, the third determining module 1307 is used to identify objects that meet preset conditions in all video images in a video file; storage management module 1306 is used to store the image of the object that meets the preset conditions in the first database.

Optionally, the image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.

FIG. 16 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device shown in FIG. 16 can be used as the second device 20 in the scene shown in FIG. The function performed by the second device, specifically, the device includes: a receiving module 1601, a decompression module 1602, an acquiring module 1603, and a determining module 1604. Exemplarily, the receiving module 1601 is used to receive the second compressed data sent by the first device; where the second compressed data is obtained by compressing the area in the first video image except the target area; the decompression module 1602 uses To decompress the second compressed data to obtain a third video image, the third video image includes an image corresponding to an area other than the target area in the first video image; the acquisition module 1603 is used to obtain a second video image from the second device The image corresponding to the target area is acquired in the database; the determining module 1604 is configured to determine the first video image according to the third video image and the image corresponding to the target area.

FIG. 17 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device shown in FIG. 17 further includes a storage management module 1605 on the basis of that shown in FIG. 16. Then, in this embodiment, the receiving module 1601 is also used to receive the first compressed data sent by the first device; the decompression module 1602 is also used to decompress the first compressed data to obtain the object corresponding to the preset conditions The image set includes the image corresponding to the target area; the storage management module 1605 is used to store the image set in the second database.

Optionally, the receiving module 1601 is further configured to receive the marking information of the target area sent by the first device; wherein the marking information includes: the position information of the target area in the first video image, and the object included in the target area is in the first device. At least one of the identification information or transformation information in the first database; the transformation information is used to indicate the difference between the image of the object in the target area in the first database and the first video image.

Optionally, the determining module 1604 is specifically configured to stitch the image corresponding to the target area and the third video image to obtain the first video image according to the marking information of the target area.

Optionally, the receiving module 1601 is further configured to receive the third compressed data sent by the first device; the decompression module 1602 is also configured to decompress the third compressed data to obtain the image of the new target object; the storage management module 1605 is also used to add the image of the new target object to the second database.

Optionally, the receiving module 1601 is further configured to receive the fourth compressed data sent by the first device; the decompression module 1602 is also configured to decompress the fourth compressed data to obtain the updated object corresponding to the preset conditions The storage management module 1605 is also used to update the second database based on the updated image set corresponding to the object that meets the preset conditions.

Optionally, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N>1, M<N Or, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.

For the method executed by each module in each embodiment of the video image processing device provided in this application, please refer to the description in the video image processing method described in this application, and the implementation methods and principles are the same and will not be repeated.

It should be noted that it should be understood that the division of the various modules of the above device is only a division of logical functions, and may be fully or partially integrated into a physical entity during actual implementation, or may be physically separated. And these modules can all be implemented in the form of software called by processing elements; they can also be implemented in the form of hardware; some modules can be implemented in the form of calling software by processing elements, and some of the modules can be implemented in the form of hardware. For example, the determination module may be a separately established processing element, or it may be integrated into a certain chip of the above-mentioned device for implementation. In addition, it may also be stored in the memory of the above-mentioned device in the form of program code, which is determined by a certain processing element of the above-mentioned device. Call and execute the functions of the above-identified module. The implementation of other modules is similar. In addition, all or part of these modules can be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In the implementation process, each step of the above method or each of the above modules can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASIC), or one or more microprocessors (digital signal processor, DSP), or, one or more field programmable gate arrays (FPGA), etc. For another example, when one of the above modules is implemented in the form of processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or other processors that can call program codes. For another example, these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).

In the above-mentioned embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, it can be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media. The computer-readable storage medium may be a magnetic medium, (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, (digital video disc, DVD)), or a semiconductor medium (for example, a solid state disk (solid state disk, SSD)) )Wait.

FIG. 18 is a schematic structural diagram of an embodiment of a video image processing device provided by this application. The device can be used as the first device or the second device described in any of the foregoing embodiments of this application, and execute the video executed by the corresponding device. Image processing method. As shown in FIG. 18, the communication device 1100 may include: a processor 111 (such as a CPU) and a transmission interface. The transmission interface may be a transceiver 113; wherein the transceiver 113 is coupled to the processor 111, and the processor 111 controls the transceiver. The transceiver 113's sending and receiving actions. Optionally, the communication device 1100 further includes a memory 112, and software instructions can be stored in the memory 112, and the processor 111 is configured to read the software instructions stored in the memory 112, so as to complete various processing functions and implement the present invention. The method steps executed by the first device or the second device in the application embodiment.

Optionally, the video image processing device involved in the embodiment of the present application may further include: a power supply 114, a system bus 115, and a communication interface 116. The transceiver 113 may be integrated in the transceiver of the video image processing device, or may be an independent transceiver antenna on the communication device. The system bus 115 is used to implement communication connections between components. The aforementioned communication interface 116 is used to implement connection and communication between the communication device and other peripherals.

In the embodiment of the present application, the above-mentioned processor 111 is configured to couple with the memory 112 to read and execute instructions in the memory 112 to implement the method steps executed by the first device or the second device in the above method embodiment. The transceiver 113 is coupled with the processor 111, and the processor 111 controls the transceiver 113 to send and receive messages. The implementation principles and technical effects are similar, and will not be repeated here.

The system bus mentioned in FIG. 18 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The system bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus. The communication interface is used to realize the communication between the database access device and other devices (such as client, read-write library and read-only library). The memory may be a non-volatile memory, such as a hard disk drive (HDD) or SSD, etc., or may be a volatile memory (volatile memory), such as a random-access memory (RAM). The memory is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this. The memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.

The processor mentioned in Figure 18 can be a general-purpose processor, including a CPU, a network processor (NP), etc.; it can also be a DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device , Discrete hardware components.

Optionally, an embodiment of the present application further provides a computer-readable storage medium, which stores instructions in the storage medium, and when the instructions are executed by a computer or a processor, the computer or the processor implements the foregoing implementations of the present application In the example, a video image processing method executed by the first device or the second device.

Optionally, an embodiment of the present application further provides a chip for executing instructions, where the chip is used to execute the video image processing method executed by the first device or the second device in any one of the foregoing embodiments of the present application.

The embodiments of the present application also provide a computer program product. The computer program product contains instructions. When the instructions run on a computer or a processor, the computer or the processor realizes one of the first devices in the foregoing embodiments of the present application. Or a video image processing method executed by the second device.

It can be understood that the various numerical numbers involved in the embodiments of the present application are only for easy distinction for description, and are not used to limit the scope of the embodiments of the present application.

It can be understood that, in the embodiments of the present invention, the size of the sequence numbers of the foregoing processes does not mean the order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not be used in the embodiments of the present application. The implementation process constitutes any limitation.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: The technical solutions recorded in the foregoing embodiments can still be modified, or some or all of the technical features can be equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present invention. Scope.

Claims

A video image processing method, applied to a first device, characterized in that it includes;

Obtain the first video image;

Determine a target area in the first video image; the target area includes an image of an object that meets a preset condition stored in the first database of the first device;

Compressing an area other than the target area in the first video image to obtain second compressed data;

The second compressed data is sent to the second device; the image of the object that meets the preset condition has been stored in the second database of the second device.
The method according to claim 1, wherein before the acquiring the first video image, the method further comprises:

Compressing the image of the object stored in the first database to obtain first compressed data;

The first compressed data is sent to the second device; the first compressed data is used by the second device to determine the second database.
The method according to claim 1 or 2, wherein the sending the second compressed data to a second device comprises:

Send the second compressed data and the marking information of the target area to the second device; wherein the marking information includes: the position information of the target area in the first video image, and the target area includes At least one item of identification information or transformation information of the image of the object in the first database; the transformation information is used to indicate the image of the object in the target area in the first database and the first video image difference between.
The method according to any one of claims 1-3, characterized in that,

The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, and N>1, M ＜N.
The method according to claim 4, wherein after the obtaining the first video image, the method further comprises:

Identifying a target object in the first video image that meets the preset condition;

An image corresponding to a new target object in the target object is added to the first database, where the new target object is an object that is not stored in the first database.
The method according to claim 5, wherein after adding the image corresponding to the new target object in the target object into the first database, the method further comprises:

Compressing the image corresponding to the new target object to obtain third compressed data;

Sending the third compressed data to the second device.
The method according to claim 5, wherein after adding the image corresponding to the new target object in the target object into the first database, the method further comprises:

Compressing the image of the object stored in the first database to obtain fourth compressed data;

Sending the fourth compressed data to the second device.
The method according to claim 7, wherein after the identifying the target object in the first video image that meets the preset condition, the method further comprises:

Deleting the image of the object that does not meet the preset condition stored in the first database.
The method according to claim 7, characterized in that, after the obtaining the first video image, the method further comprises:

When the sharpness of the first object in the target area in the first video image is better than the sharpness of the image of the first object stored in the first database, use the first The image of the object in the first video image replaces the image of the first object stored in the first database.
The method according to any one of claims 4-9, wherein:

The first video image is a video image with a frame number greater than a preset frame number in a video file that is being compressed and transmitted in real time by the first device;

Before acquiring the first video image, the method further includes:

Acquiring a second video image in the video file, where the frame number of the second video image in the video file is less than the preset frame number;

Identify the objects in the second video image that meet the preset conditions and store them in the first database.
The method according to any one of claims 1-3, characterized in that,

The preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to a preset number.
The method according to claim 11, wherein before said acquiring the first video image, the method further comprises:

Identifying objects that meet preset conditions among all video images in the video file;

The image of the object that meets the preset condition is stored in the first database.
The method according to claim 11 or 12, wherein:

The image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.
A video image processing method applied to a second device, characterized in that it comprises:

Receiving second compressed data sent by the first device; wherein the second compressed data is obtained by compressing an area other than the target area in the first video image;

Decompress the second compressed data to obtain a third video image, where the third video image includes an image corresponding to an area other than the target area in the first video image;

Acquiring an image corresponding to the target area from a second database of the second device;

The first video image is determined according to the third video image and the image corresponding to the target area.
The method according to claim 14, further comprising:

Receiving the first compressed data sent by the first device;

Decompress the first compressed data to obtain an image set corresponding to an object that meets a preset condition, and store it in the second database; the image set includes images corresponding to the target area.
The method according to claim 14 or 15, further comprising:

Receiving the marking information of the target area sent by the first device; wherein the marking information includes: the position information of the target area in the first video image, and the object included in the target area is in the first video image; At least one of identification information or transformation information in the first database of the device; the transformation information is used to indicate the difference between the image of the object in the target area in the first database and the first video image .
The method according to claim 16, wherein the determining the first video image according to the third video image and the image corresponding to the target area comprises:

According to the marking information of the target area, the image corresponding to the target area and the third video image are spliced together to obtain the first video image.
The method according to any one of claims 14-17, wherein after the determining the first video image, the method further comprises:

Receiving the third compressed data sent by the first device;

Decompress the third compressed data to obtain an image of a new target object, and store it in the second database.
The method according to any one of claims 14-17, wherein after the determining the first video image, the method further comprises:

Receiving the fourth compressed data sent by the first device;

Decompress the fourth compressed data to obtain an updated image set corresponding to the object that meets the preset condition;

The second database is updated based on the updated image set corresponding to the object that meets the preset condition.
The method of claim 15, wherein:

The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, and N>1, M ＜N;

Alternatively, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to a preset number.
A video image processing device, characterized in that it comprises:

An acquisition module for acquiring the first video image;

A first determining module, configured to determine a target area in the first video image; the target area includes an image of an object that meets a preset condition stored in a first database of the first device;

A compression module, configured to compress an area other than the target area in the first video image to obtain second compressed data;

The sending module is configured to send the second compressed data to the second device; the second database of the second device has stored the image of the object that meets the preset condition.
The device according to claim 21, wherein:

The compression module is further configured to compress the image of the object stored in the first database to obtain first compressed data;

The sending module is further configured to send the first compressed data to the second device; the first compressed data is used by the second device to determine the second database.
The device according to claim 21 or 22, wherein:

The sending module is specifically configured to send the second compressed data and the marking information of the target area to the second device; wherein the marking information includes: location information of the target area in the first video image , At least one of the identification information or transformation information of the image of the object included in the target area in the first database; the transformation information is used to indicate that the object in the target area is in the first database The difference between the image and the first video image.
The device according to any one of claims 21-23, wherein:

The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, and N>1, M ＜N.
The device according to claim 24, further comprising:

A second determining module, configured to identify a target object in the first video image that meets the preset condition;

A storage management module, configured to add an image corresponding to a new target object in the target object to the first database, where the new target object is an object that is not stored in the first database, and the first database stores In the storage module.
The device of claim 25, wherein:

The compression module is further configured to compress the image corresponding to the new target object to obtain third compressed data;

The sending module is further configured to send the third compressed data to the second device.
The device according to claim 25, wherein after the storage management module adds an image corresponding to a new target object in the target object into the first database,

The compression module is further configured to compress the image of the object stored in the first database to obtain fourth compressed data;

The sending module is further configured to send the fourth compressed data to the second device.
The device of claim 27, wherein:

The storage management module is further configured to delete the image of the object that does not meet the preset condition and is stored in the first database.
The device of claim 27, wherein:

The storage management module is further configured to: when the image definition of the first object in the target area in the first video image is better than the image of the first object stored in the first database The image of the first object stored in the first database is replaced with the image of the first object in the first video image.
The device according to any one of claims 24-29, wherein:

The first video image is a video image with a frame number greater than a preset frame number in a video file that is being compressed and transmitted in real time by the first device;

The acquiring module is further configured to acquire a second video image in the video file, and the frame number of the second video image in the video file is less than the preset frame number;

The second determining module is further configured to identify an object in the second video image that meets the preset condition;

The storage management module is further configured to store the objects in the second video image that meet the preset conditions in the first database.
The device according to any one of claims 21-23, wherein:

The preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to a preset number.
The device according to claim 31, further comprising:

The third determining module is used to identify objects that meet preset conditions among all the video images in the video file;

The storage management module is configured to store the image of the object that meets the preset condition in the first database.
The device according to claim 31 or 32, wherein:

The image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.
A video image processing device, characterized in that it comprises:

A receiving module, configured to receive second compressed data sent by the first device; wherein the second compressed data is obtained by compressing an area other than the target area in the first video image;

A decompression module for decompressing the second compressed data to obtain a third video image, the third video image including an image corresponding to an area other than the target area in the first video image ；

An acquiring module, configured to acquire an image corresponding to the target area from the second database of the second device;

The determining module is configured to determine the first video image according to the third video image and the image corresponding to the target area.
The device according to claim 34, further comprising: a storage management module;

The receiving module is further configured to receive the first compressed data sent by the first device;

The decompression module is further configured to decompress the first compressed data to obtain an image set corresponding to an object that meets a preset condition, and the image set includes an image corresponding to the target area;

The storage management module is used for storing the image collection in the second database.
The device according to claim 34 or 35, wherein:

The receiving module is further configured to receive the marking information of the target area sent by the first device; wherein the marking information includes: position information of the target area in the first video image, and At least one of the identification information or transformation information of the included object in the first database of the first device; the transformation information is used to indicate that the image of the object in the target area in the first database and the The difference between the first video image.
The device of claim 36, wherein:

The determining module is specifically configured to stitch the image corresponding to the target area and the third video image to obtain the first video image according to the marking information of the target area.
The device according to any one of claims 34-37, wherein:

The receiving module is further configured to receive third compressed data sent by the first device;

The decompression module is further configured to decompress the third compressed data to obtain an image of a new target object;

The storage management module is further configured to add the image of the new target object to the second database.
The device according to any one of claims 34-37, wherein:

The receiving module is further configured to receive fourth compressed data sent by the first device;

The decompression module is further configured to decompress the fourth compressed data to obtain an updated image set corresponding to an object that meets a preset condition;

The storage management module is further configured to update the second database based on the updated image set corresponding to the object that meets the preset condition.
The device of claim 35, wherein:

The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, and N>1, M ＜N;

Alternatively, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to a preset number.
A video image processing device, characterized by comprising: a processor and a transmission interface;

The device communicates with other devices through the transmission interface;

The processor is configured to read software instructions stored in the memory to implement the method according to any one of claims 1-13 or 14-20.
A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are executed by a computer or a processor, the computer or the processor can realize Any one of the methods described in any one of 14-20.
A computer program product, characterized in that, the computer program product contains instructions, and when the instructions run on a computer or a processor, the computer or the processor realizes any one of claims 1-13. Item or the method of any one of 14-20.