CN115699725A - Video image processing method and device

Info

Publication number: CN115699725A
Application number: CN202080101403.9A
Authority: CN (China)
Prior art keywords: image, video image, video, database, compressed data
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 吴更石, 郭栋, 张开明
Assignee (original and current): Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a video image processing method and device. After a first device identifies, in a first video image, a target area containing an object stored in a first database, the first device compresses the area of the first video image other than the target area to obtain second compressed data, and sends the second compressed data to a second device. After receiving the second compressed data, the second device combines a third video image, obtained by decompressing the second compressed data, with the image of the target area stored in its second database, finally recovering the first video image. The first device therefore does not need to repeatedly compress the target areas of objects that frequently appear in video images, and the second device does not need to repeatedly decompress them, which reduces the data volume of the compressed packets transmitted between the first device and the second device and improves the efficiency of video image processing.

Description

Video image processing method and device

Technical Field
The present application relates to data processing technologies, and in particular, to a method and an apparatus for processing a video image.
Background
Video compression is a technology for re-encoding video files: without affecting the video content, it compresses larger video files into smaller files for transmission or storage. It is commonly used in application scenarios that need to transmit or store video files, such as network video playback and surveillance video transmission.
In the prior art, video compression protocols such as H.264 (also called Advanced Video Coding, AVC), H.265 (also called High Efficiency Video Coding, HEVC) and H.266 (also called Versatile Video Coding, VVC) can be used to compress video files. Under these protocols, all video images in a video file are divided into different image packets, for example one image packet for every 64 consecutive frames. When each frame in an image packet is compressed, the frame is divided into image blocks of different sizes. If the similarity between an image block in the current image and an image block in another, already compressed image is high, the contents of the two image blocks can be considered the same, and the already compressed image block can be used to represent the image block in the current image. When the current image is compressed, only the area outside that image block needs to be compressed, which reduces the amount of calculation when compressing the video file and improves compression efficiency.
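As an illustration of this block-reuse idea, the following Python sketch compares each block of the current frame against the co-located block of an already-compressed reference frame and encodes the block as a reference when the similarity is high enough. The block size, the mean-absolute-difference measure, and the threshold are illustrative assumptions, not details taken from the patent or from any specific codec.

```python
import numpy as np

BLOCK = 16          # block edge length in pixels (assumption)
THRESHOLD = 4.0     # max mean absolute difference to treat blocks as "the same" (assumption)

def block_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pixel difference; lower means more similar."""
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

def encode_frame(current: np.ndarray, reference: np.ndarray):
    """Encode each block either as a reference to the co-located block of an
    already-compressed frame, or as raw data still to be compressed."""
    h, w = current.shape
    encoded = []
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            cur = current[y:y + BLOCK, x:x + BLOCK]
            ref = reference[y:y + BLOCK, x:x + BLOCK]
            if block_similarity(cur, ref) <= THRESHOLD:
                encoded.append(("ref", y, x))          # reuse the compressed block
            else:
                encoded.append(("raw", y, x, cur))     # compress this block
    return encoded
```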
With the prior art, when a video file is compressed, each frame of video image must be partitioned into a number of image blocks, and different image blocks are compared to find similar ones. For areas of a video image where objects are densely distributed and boundaries are numerous, denser image blocks must be set for identification and comparison, so a large number of image blocks has to be processed when each frame is compressed. This reduces the efficiency of compressing the video file and ultimately results in low video image processing efficiency.
Disclosure of Invention
The application provides a video image processing method and device, applied to the compression of video images, to solve the technical problem in the prior art that video image processing is inefficient because compression of the video images is inefficient.
In a first aspect of the present application, a video image processing method is provided, whose execution subject is a first device that compresses a video image. After identifying a target area of the first video image that contains an object stored in a first database, the first device compresses the area of the first video image other than the target area to obtain second compressed data, and sends the second compressed data to a second device.
In this process, the first device does not need to compress the target area; it only needs to compress the area of the video image outside the target areas of objects present in the database. Since the images of objects meeting the preset condition that are stored in the first database of the first device are also stored in the second database of the second device, the second device, after receiving the second compressed data, can combine the third video image obtained by decompressing the second compressed data with the image of the target area stored in the second database, finally obtaining the first video image. This embodiment therefore spares the first device from repeatedly compressing the target areas of objects that frequently appear in video images: because the image of the target area is already stored in the second database of the second device, the first device only needs to compress the area outside the target area. Repeated compression of recurring target areas is avoided, the data volume of the final compressed packet is reduced, and the efficiency with which the first device processes video images is improved.
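A minimal sketch of the first device's side of this process follows, assuming a zlib byte codec, a fixed frame size, and externally supplied bounding boxes for the detected database objects; every name here is illustrative, not part of the patent.

```python
import numpy as np
import zlib

def compress_outside_target(frame: np.ndarray, target_boxes):
    """Zero out target areas (objects already present in both databases) and
    compress only the remaining area; the boxes double as tag information."""
    residual = frame.copy()
    for (y0, y1, x0, x1) in target_boxes:
        residual[y0:y1, x0:x1] = 0          # skipped: stored in the second database
    second_compressed_data = zlib.compress(residual.tobytes())
    return second_compressed_data, target_boxes

# Hypothetical usage: one detected database object at a known position.
frame = np.random.randint(0, 255, (240, 320), np.uint8)
data, tags = compress_outside_target(frame, [(10, 42, 20, 52)])
```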
In an embodiment of the first aspect of the present application, based on the video image processing method provided by the first aspect, the images of objects meeting the preset condition that are stored in the second database of the second device may be sent by the first device. Specifically, the first device may separately send to the second device first compressed data obtained by compressing the images of objects in the first database, where the first compressed data includes a compression result of the target area, so that after receiving the first compressed data, the second device stores the image set obtained by decompressing it in the second database.
Optionally, the first device may send the first compressed data and the second compressed data to the second device at the same time, in which case the second device decompresses the first compressed data first to establish the second database and then decompresses the second compressed data. Alternatively, the first device may send the first compressed data to the second device before the method of the first aspect is performed, and after the second device has decompressed the first compressed data and established the second database, the method of the first aspect is executed to send the second compressed data. In another alternative, after sending the first compressed data, the first device may immediately compress the area of the first video image other than the target area to obtain the second compressed data and send it, without waiting for the second device to finish decompressing the first compressed data; that is, the first device obtaining the second compressed data and the second device decompressing the first compressed data to obtain the second database may proceed in parallel. The embodiment of the present application does not limit the order of these two processes. In this embodiment, the first device needs to compress the images of the objects in the first database only once and send the resulting first compressed data to the second device, which decompresses it to establish the second database; afterwards, when processing video images, the first device only needs to compress the area outside the target area, which reduces the amount of image data and the number of times the first device compresses video images, thereby improving video image processing efficiency.
In an embodiment of the first aspect of the present application, since the first device compresses only the area of the first video image other than the target area, the second device must combine the third video image decompressed from the second compressed data with the image of the target area stored in the second database to obtain the first video image. To enable the second device to determine the positional relationship between the decompressed third video image and the target area in the first video image more quickly and accurately, the first device, acting as the compressing end, may send the tag information of the target area in the first video image together with the second compressed data, where the tag information includes at least one of the position information of the target area in the first video image, and the identification information or transformation information, in the first database, of the image of the object included in the target area. Thus, in the video image processing method provided by this embodiment, when determining the target area the first device may also determine its tag information and subsequently send the second compressed data and the tag information together, so that the second device can determine the target area in the first video image more quickly and accurately, and in turn determine the first video image more quickly after receiving the second compressed data, further improving video image processing efficiency.
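One possible shape for the tag information of a single target area, following the three fields named in this embodiment, is sketched below; the field names and the form of the transformation are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TargetAreaTag:
    # (top, bottom, left, right) of the target area in the first video image
    position: Tuple[int, int, int, int]
    # identification of the object's image in the first/second database
    object_id: Optional[str] = None
    # e.g. (scale, rotation) of the object relative to the stored image (assumption)
    transform: Optional[Tuple[float, float]] = None
```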
In an embodiment of the first aspect of the present application, when the method is applied to a scenario of real-time video image transmission, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are positive integers, N is greater than 1, and M is less than N. Specifically, this embodiment applies to the scenario where the first device acquires the first video image in real time, compresses it, and transmits it to the second device. When the first device compresses the part of the first video image other than the target area as in the foregoing embodiments, the first database is built from the N video images preceding the first video image, and the preset condition an object must satisfy is that the number of times the object appears in those N video images, or the number of those video images that include the object, is greater than or equal to M. Since the first database on which the first device determines the target area is obtained from the N images preceding the first video image, the database stays current, so the method can be applied to scenarios in which the first device compresses video images acquired in real time, improving the efficiency with which the first device processes real-time video images.
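The sliding-window form of this preset condition might be checked as in the sketch below, which assumes a detector that returns a set of object identifiers per frame; class and method names are illustrative.

```python
from collections import Counter, deque

class SlidingWindowCondition:
    """An object qualifies if it appears in at least M of the N video images
    preceding the current one."""
    def __init__(self, n: int, m: int):
        assert n > 1 and 0 < m < n
        self.window = deque(maxlen=n)   # object-id sets of the last N frames
        self.m = m

    def qualifying_objects(self) -> set:
        counts = Counter(obj for frame_objs in self.window for obj in frame_objs)
        return {obj for obj, c in counts.items() if c >= self.m}

    def push(self, frame_objects: set):
        # Called after evaluating the current frame, so the window always
        # holds the frames *preceding* the next first video image.
        self.window.append(frame_objects)
```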
In an embodiment of the first aspect of the present application, after obtaining the target area from the first video image, compressing the remaining area to obtain the second compressed data, and sending the second compressed data to the second device, the first device may further update the first database based on the newly obtained first video image. Once the first video image is added, the preset condition is evaluated over the N video images consisting of the first video image and the N-1 video images preceding it. When the first device identifies a new target object that satisfies the preset condition, that is, an object whose number of appearances in the N video images, or the number of those video images including it, is greater than or equal to M, the first device adds the image of the new target object to the first database to update it, so that the target areas the first device determines when processing video images remain current.
In an embodiment of the first aspect of the present application, when the first device adds the image of the new target object to the first database, the first device may also send the image of the new target object to the second device, so that the second device updates its stored second database and consistency between the first database and the second database is maintained.
In one implementation, the first device may compress the image of the new target object and send the resulting third compressed data to the second device; in another implementation, the first device may compress the entire first database after the new target object has been added and send the resulting fourth compressed data to the second device.
For the second device, when the third compressed data or the fourth compressed data is received, the second database may be updated so that the new target object is stored in it. Thereafter, for video images processed by the first device, the area containing the new target object can be treated as a target area and left uncompressed; after receiving and decompressing the compressed data that excludes the target area, the second device obtains the image of the new target object from the second database and finally recovers the video image.
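The two update strategies might look like the following sketch, in which serialization via pickle and zlib, the channel object, and all names are assumptions made for illustration.

```python
import pickle
import zlib

def update_and_sync(first_database: dict, new_object_id, new_object_image,
                    channel, whole_db: bool = False):
    """Add the new target object, then send either only its image
    ("third compressed data") or the whole updated database
    ("fourth compressed data") to the second device."""
    first_database[new_object_id] = new_object_image
    if whole_db:
        payload = zlib.compress(pickle.dumps(first_database))
    else:
        payload = zlib.compress(pickle.dumps({new_object_id: new_object_image}))
    channel.send(payload)   # channel is assumed to expose a send() method
```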
In an embodiment of the first aspect of the present application, in addition to adding images of objects to the first database, the first device may delete the stored image of an object once that object no longer satisfies the preset condition, to save storage space in the first database and improve the utilization efficiency of the first device's storage.
In an embodiment of the first aspect of the present application, the first device may also replace the image of an object stored in the first database when the object appears with higher definition in the target area of the first video image. When processing the target area of the first video image, if the first device determines that the definition of the first object in the target area is better than that of the image of the first object stored in the database, the first device replaces the stored image with the image of the first object taken from the first video image. As before, such updates to the first database may be compressed by the first device and sent to the second device so that the second device updates the second database accordingly. When the second device later restores the first video image using objects from the second database, better definition is obtained, avoiding the problem of an unclear target area caused by the stored image of an object being less sharp than the object actually appears in the first video image.
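A sketch of this definition-based replacement follows; using the variance of a Laplacian-like second difference as the sharpness measure is an assumption, since the patent does not specify how definition is compared.

```python
import numpy as np

def sharpness(img: np.ndarray) -> float:
    """Variance of a discrete Laplacian; higher means sharper (assumption)."""
    lap = (img[:-2, 1:-1].astype(np.float64) + img[2:, 1:-1] +
           img[1:-1, :-2] + img[1:-1, 2:] - 4.0 * img[1:-1, 1:-1])
    return float(lap.var())

def maybe_replace(first_database: dict, object_id, candidate: np.ndarray) -> bool:
    """Replace the stored image if the candidate from the new frame is sharper;
    the replacement would then also be propagated to the second device."""
    if sharpness(candidate) > sharpness(first_database[object_id]):
        first_database[object_id] = candidate
        return True
    return False
```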
In an embodiment of the first aspect of the present application, when the method is applied to a scenario of real-time video image transmission, or when no image has yet been stored in the first database of the first device, for example for a video file transmitted in real time, the first device cannot determine the target area from the first database when it acquires the first video image. Therefore, to ensure the completeness of the first device's processing of video images, when the first device acquires a video image whose frame number in the video file is smaller than a preset frame number, the video image may be encoded as a whole, and objects meeting the preset condition are identified in these video images with frame numbers below the preset frame number and stored in the first database. After the first device has thus established the first database from the video images with frame numbers below the preset frame number, it may execute the video image processing method of the foregoing embodiments once it receives a first video image whose frame number exceeds the preset frame number.
In an embodiment of the first aspect of the present application, when the method is applied to a scenario of non-real-time video image transmission, since the first device can acquire the entire video file in advance, the preset condition may be that the number of video images in the whole video file that include the object is greater than or equal to a preset number. In this embodiment, the first database on which the first device determines the target area is obtained from all the images in the video file, so the completeness of the first database is guaranteed and every object added to it satisfies the preset condition. The method can thus be applied to scenarios in which the first device compresses the video images of a non-real-time video file, improving the efficiency with which the first device processes such video images.
In an embodiment of the first aspect of the present application, the first database in the non-real-time transmission scenario may likewise be built by the first device from the video images themselves. Before transmitting the video file, the first device may first identify the images of objects meeting the preset condition across all video images of the video file, store them in the first database, and then process each video image of the file as the first video image.
In an embodiment of the first aspect of the present application, rather than directly storing the image of an object meeting the preset condition in the second device, note that before the first device determines that an object meets the preset condition it does not treat the object as a target area, so the second compressed data sent to the second device already contains the image of the object. Therefore, to save the amount of data transmitted between the first device and the second device, instead of transmitting the object's image the first device may send at least one of the boundary pixel positions of the target area or the frame number, so that the second device, upon receiving this information, can itself extract the image of the object at those boundary pixel positions from the frame with the corresponding frame number. With the video image processing method of this embodiment, when sending the image of an object in the first database to the second device, the first device need only send the boundary pixel positions and the frame number, and the second device can obtain the target area from video images it has already received. This reduces the amount of data actually sent when the first device transmits the first video image, allows the first device to compress and the second device to decompress video images faster, and further improves video image processing efficiency.
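The reference sent in place of the object's image might be structured as below; the field names, the boundary representation, and the bounding-box extraction are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObjectReference:
    frame_number: int                       # frame of the video file containing the object
    boundary_pixels: List[Tuple[int, int]]  # (row, col) positions outlining the object

def extract_object(decoded_frames: dict, ref: ObjectReference):
    """Second-device side: recover the object's image from a frame it has
    already received and decoded, using the bounding box of the boundary."""
    frame = decoded_frames[ref.frame_number]
    rows = [p[0] for p in ref.boundary_pixels]
    cols = [p[1] for p in ref.boundary_pixels]
    return frame[min(rows):max(rows) + 1, min(cols):max(cols) + 1]
```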
A second aspect of the present application provides a video image processing method whose execution subject is a second device that receives the compressed video file. The second device decompresses the received second compressed data to obtain the part of the first video image that does not include the target area, determines the image of the object in the target area from the second database, and obtains the first video image by stitching the two together. For the case in which different video images all include the target area, the second device only needs to decompress the first compressed data once to obtain the image of the object in the target area; it does not need to decompress the target area in the other video images, but can retrieve it directly from the second database.
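Mirroring the compression sketch given for the first aspect, the second device's side might look as follows, reusing the illustrative TargetAreaTag and zlib codec from the earlier sketches; the fixed frame shape and the assumption that each stored object image matches its tagged region are for illustration only.

```python
import numpy as np
import zlib

def recover_first_video_image(second_compressed_data: bytes, tags,
                              second_database: dict,
                              shape=(240, 320)) -> np.ndarray:
    """Decompress into the third video image, then paste the stored object
    images back at the tagged positions to recover the first video image."""
    third = np.frombuffer(zlib.decompress(second_compressed_data),
                          np.uint8).reshape(shape).copy()
    for tag in tags:                       # one tag per target area
        y0, y1, x0, x1 = tag.position
        third[y0:y1, x0:x1] = second_database[tag.object_id]
    return third
```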
In an embodiment of the second aspect of the present application, the images of objects meeting the preset condition that are stored in the second database of the second device may be transmitted by the first device. Specifically, after receiving the first compressed data sent by the first device, the second device decompresses it to obtain an image set and stores the image set in the second database. In this embodiment, the first device need compress the images of the objects in the first database only once and send the resulting first compressed data to the second device, which decompresses it to establish the second database; afterwards, when processing video images, the second device only needs to decompress the second compressed data and does not repeatedly decompress the objects in the target area, which reduces the amount of data the second device must decompress and the number of decompression operations, thereby improving video image processing efficiency.
In an embodiment of the second aspect of the present application, to enable the second device to determine the positional relationship between the decompressed third video image and the target area in the first video image more quickly and accurately, the first device, acting as the compressing end, may send the tag information of the target area in the first video image together with the second compressed data, where the tag information includes at least one of the position information of the target area in the first video image, and the identification information or transformation information, in the first database, of the image of the object included in the target area. After receiving the compressed data and the tag information of the target area, the second device can determine the target area in the first video image more quickly and accurately, and in turn determine the first video image more quickly after receiving the second compressed data, further improving video image processing efficiency.
In an embodiment of the second aspect, rather than directly storing the image of an object meeting the preset condition in the second device, note that before the first device determines that an object meets the preset condition it does not treat the object as a target area, so the second compressed data sent to the second device already contains the image of the object. To save the amount of data transmitted between the two devices, the first device may therefore send, instead of the object's image, at least one of the boundary pixel positions of the target area or the frame number, so that the second device can itself extract the image of the object at those boundary pixel positions from the frame with the corresponding frame number. With the method of this embodiment, the first device need only send the boundary pixel positions and the frame number of the object's image, and the second device can obtain the target area from video images it has already received, reducing the amount of data actually sent, allowing faster compression at the first device and faster decompression at the second device, and further improving video image processing efficiency.
In an embodiment of the second aspect of the present application, when the method is applied to a scenario of real-time video image transmission, after updating the first database based on the newly obtained first video image, the first device may compress the image of the newly added target object to obtain third compressed data, or compress the entire first database including the newly added image to obtain fourth compressed data, and send the result to the second device. The second device then updates its stored second database based on the third or fourth compressed data, maintaining consistency between the first database and the second database. The area of a video image containing the new target object can thereafter be treated as a target area and left uncompressed; the second device receives and decompresses the compressed data that excludes the target area, obtains the image of the new target object from the second database, and finally recovers the video image.
In an embodiment of the second aspect of the present application, when the provided method is applied to a scenario of real-time video image transmission, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are positive integers, N is greater than 1, and M is less than N. When the provided method is applied to a scenario of non-real-time video image transmission, the preset condition may be that the number of video images in the whole video file that include the object is greater than or equal to a preset number.
A third aspect of the present application provides a video image processing apparatus, usable as the first device, for performing the video image processing method according to any one of the first aspects of the present application. The apparatus comprises: an acquisition module, a first determination module, a compression module, and a sending module;
the acquisition module is used for acquiring a first video image; the first determining module is used for determining a target area in the first video image; the target area comprises an image of an object which is stored in a first database of the first device and meets a preset condition; the compression module is used for compressing the area except the target area in the first video image to obtain second compressed data; the sending module is used for sending the second compressed data to the second device; the second database of the second device stores images of objects meeting preset conditions.
In an embodiment of the third aspect of the present application, the compression module is further configured to compress an image of an object stored in the first database to obtain first compressed data; the sending module is further used for sending the first compressed data to the second device; the first compressed data is used by the second device to determine a second database.
In an embodiment of the third aspect of the present application, the sending module is specifically configured to send the second compressed data and the mark information of the target area to the second apparatus; wherein the marking information includes: at least one of position information of a target area in the first video image, identification information or transformation information of an image of an object included in the target area in the first database; the transformation information is used to represent the difference between the image of the object in the target region in the first database and the first video image.
In an embodiment of the third aspect of the present application, the preset condition includes: in N video images before the first video image, the number of the video images including the object is more than or equal to M, wherein M and N are positive integers, N is more than 1, and M is less than N.
In an embodiment of the third aspect of the present application, the apparatus further includes: a second determination module and a storage management module;
the second determining module is used for identifying a target object which meets a preset condition in the first video image; the storage management module is used for adding an image corresponding to a new target object in the target objects into a first database, wherein the new target object is an object which is not stored in the first database, and the first database is stored in the storage module.
In an embodiment of the third aspect of the present application, the compression module is further configured to compress an image corresponding to the new target object to obtain third compressed data; the sending module is further configured to send the third compressed data to the second device.
In an embodiment of the third aspect of the present application, after the storage management module adds an image corresponding to a new target object in the target objects to the first database, the compression module is further configured to compress the image of the object stored in the first database to obtain fourth compressed data; the sending module is further configured to send the fourth compressed data to the second device.
In an embodiment of the third aspect of the present application, the storage management module is further configured to delete an image of an object that does not meet a preset condition and is stored in the first database.
In an embodiment of the third aspect of the present application, the storage management module is further configured to replace the image of the first object stored in the first database with the image of the first object in the first video image when the definition of the image of the first object in the target area in the first video image is better than the definition of the image of the first object stored in the first database.
In an embodiment of the third aspect of the present application, the first video image is a video image with a frame number greater than a preset frame number in a video file being compressed and transmitted by the first device in real time; the acquisition module is also used for acquiring a second video image in the video to be processed, wherein the frame number of the second video image in the video to be processed is smaller than the preset frame number; the second determining module is further used for identifying an object which meets a preset condition in the second video image; the storage management module is further used for storing the objects meeting the preset conditions in the second video image into the first database.
In an embodiment of the third aspect of the present application, the preset conditions include: in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
In an embodiment of the third aspect of the present application, the apparatus further includes: a third determination module; the third determining module is used for identifying objects which accord with preset conditions in all video images in the video file; and the storage management module is used for storing the image of the object meeting the preset condition into the first database.
In an embodiment of the third aspect of the present application, the first database stores images of objects including: the border pixel locations of the object and the frame number of the video image in the video file that includes the object.
A fourth aspect of the present application provides a video image processing apparatus, as a second apparatus, for executing the video image processing method according to any one of the second aspects of the present application, the apparatus comprising: the device comprises a receiving module, a decompressing module, an acquiring module and a determining module; the receiving module is used for receiving second compressed data sent by the first device; the second compressed data is obtained by compressing the area except the target area in the first video image; the decompression module is used for decompressing the second compressed data to obtain a third video image, and the third video image comprises images corresponding to areas except the target area in the first video image; the acquisition module is used for acquiring an image corresponding to the target area from a second database of the second device; and the determining module is used for determining the first video image according to the third video image and the image corresponding to the target area.
In an embodiment of the fourth aspect of the present application, the apparatus further includes: a storage management module; the receiving module is further used for receiving first compressed data sent by the first device; the decompression module is further used for decompressing the first compressed data to obtain an image set corresponding to the object meeting the preset condition, wherein the image set comprises an image corresponding to the target area; the storage management module is used for storing the image set into a second database.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the label information of the target area sent by the first apparatus; wherein the label information includes at least one of: position information of the target area in the first video image, and identification information or transformation information, in the first database of the first device, of the object included in the target area; the transformation information is used to represent the difference between the image of the object stored in the first database and the object as it appears in the first video image.
In an embodiment of the fourth aspect of the present application, the determining module is specifically configured to splice an image corresponding to the target area and the third video image according to the mark information of the target area, so as to obtain the first video image.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive third compressed data sent by the first apparatus; the decompression module is further used for decompressing the third compressed data to obtain a new image of the target object; the storage management module is further configured to add the image of the new target object to the second database.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive fourth compressed data sent by the first apparatus; the decompression module is further configured to decompress the fourth compressed data to obtain an updated image set corresponding to the object meeting the preset condition; the storage management module is further configured to update the second database based on the updated image set corresponding to the object meeting the preset condition.
In an embodiment of the fourth aspect of the present application, the preset condition includes: in N video images before the first video image, the number of the video images including the object is more than or equal to M, wherein M and N are positive integers, N is more than 1, and M is less than N; alternatively, the preset conditions include: in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
A fifth aspect of the present application provides a video image processing apparatus comprising: a processor and a transmission interface; the device communicates with other devices through the transmission interface; the processor is configured to read software instructions stored in the memory to implement the method according to any one of the first aspect of the present application.
A sixth aspect of the present application provides a video image processing apparatus comprising: a processor and a transmission interface; the device communicates with other devices through the transmission interface; the processor is configured to read software instructions stored in the memory to implement the method according to any one of the second aspects of the present application.
A seventh aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed by a computer or processor, cause the computer or processor to carry out a method according to any one of the first aspects of the present application.
An eighth aspect of the present application provides a computer-readable storage medium having stored therein instructions which, when executed by a computer or processor, cause the computer or processor to carry out a method according to any one of the second aspects of the present application.
A ninth aspect of the present application provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method according to any one of the first aspects of the present application.
A tenth aspect of the present application provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method according to any one of the second aspects of the present application.
Drawings
FIG. 1 is a schematic diagram of an application scenario of the present application;
FIG. 2 is a schematic diagram of a video compression technique;
FIG. 3 is a schematic diagram of a video image divided into image blocks;
fig. 4 is a schematic flowchart of an embodiment of a video image processing method provided in the present application;
FIG. 5 is a schematic diagram of a database setup method provided in the present application;
FIG. 6 is a schematic diagram of an object in a video image according to the present application;
fig. 7 is a schematic flowchart of another embodiment of a video image processing method provided in the present application;
FIG. 8 is a schematic flowchart of an exemplary embodiment of a video image processing method provided in the present application;
FIG. 9 is a schematic flowchart of an exemplary embodiment of a video image processing method provided in the present application;
FIG. 10 is a schematic flowchart of an exemplary embodiment of a video image processing method provided in the present application;
fig. 11 is a schematic flowchart of an exemplary embodiment of a video image processing method provided in the present application;
FIG. 12 is a schematic flowchart of an exemplary embodiment of a video image processing method provided in the present application;
fig. 13 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application;
FIG. 14 is a schematic structural diagram of an embodiment of a video image processing apparatus according to the present application;
FIG. 15 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application;
FIG. 16 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application;
fig. 17 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application;
fig. 18 is a schematic structural diagram of an embodiment of a video image processing apparatus according to the present application.
Detailed Description
Before describing the embodiments of the present application, the following description will be made with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of the present application. The present application applies to the transmission of a video file between different devices. Each device shown in fig. 1 may be a device with video file processing capabilities, such as a mobile phone, tablet computer, laptop computer, desktop computer, or server, and the embodiments of the present application may be executed by such a device or by its processor (for example, a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU)); the devices shown in fig. 1 serve as an exemplary illustration in this embodiment. As shown in fig. 1, a first device 10 has a communication connection with a second device 20, over which the first device 10 may transmit a video file 30 to the second device 20. According to its timeliness requirement, the transmitted video file 30 may be a real-time or a non-real-time video file. For example, when the method is applied to the transmission of a non-real-time video file such as a movie, the first device 10 may compress the entire video file 30 at time T1 to obtain a compressed data packet 40 and send it to the second device 20; the second device 20 obtains the file of the whole movie after decompressing the data packet 40 received at time T2. It can be understood that the interval between time T1 and time T2 is large in this case, because the first device must compress all the video images in the video file. When the method is applied to a real-time transmission scenario, the video file may be a timely stream such as a surveillance picture or a television picture; the first device cannot acquire every frame of the complete video file in advance and must transmit the most recently acquired video image. The first device 10 compresses the video image at time T1 to obtain a data packet 40 and sends it to the second device 20, which receives and decompresses the data packet 40 at time T2. Since the first device only needs to compress one frame of video image, the interval between time T1 and time T2 is small, and the process is then repeated continuously between the first device 10 and the second device 20, so that the first device 10 can send the most recently acquired video images of, for example, a surveillance feed or a television picture to the second device 20 in real time.
Because a video file consists of consecutive video images, as the number of video images in the file grows and each video image carries a higher resolution, the data volume of the whole video file increases greatly. Therefore, when video files are transmitted between devices over limited communication resources as shown in fig. 1, it is desirable to compress them with high quality, turning a larger video file into a smaller compressed file for transmission and storage. Compressing the video file reduces the amount of data transmitted while ensuring that the peer can completely recover the video file from the compressed file.
To compress a video file, video compression protocols such as H.264, H.265, and H.266 proposed in some technologies re-encode the video file using compression coding. Fig. 2 is a schematic diagram of such a video compression technology, where the execution subject may be the first device 10 shown in fig. 1. During compression, the consecutive video images of the video file are divided into different image packets; in fig. 2, every 64 video images of the video file form an image packet, yielding image packets 1-64, 65-128, 129-256, and so on. For each image packet, some frames are selected as key frames; for example, for image packet 1-64, the 1st and 64th frames may be taken as key frames and compression-encoded in their entirety, while each video image from the 2nd to the 63rd frame is treated as a non-key frame. Before a non-key frame is compression-encoded, it is divided into different image blocks according to the distribution of objects, the density of boundaries, and so on, and these blocks are compared with the blocks of the key frames. For example, in fig. 2, the first device 10 may first compression-encode a key frame of the video file; when a non-key frame is subsequently processed, if an image block of the current non-key frame includes object A and its similarity to a block including object A in the already-encoded key frame is high, the two image blocks in the two frames of video images can be considered similar. Therefore, when the current non-key frame is compressed, the blocks of the already-encoded key frame can represent the similar blocks of the current non-key frame, and only the areas that cannot be represented by key-frame blocks need to be compression-encoded. Similarly, the video images in image packet 65-128 can be compared against image blocks including object B, and those in image packet 129-256 against blocks including object C, reducing the amount of calculation during compression and improving compression efficiency.
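The packet and key-frame layout described above might be produced as in this small sketch, which assumes uniform 64-frame packets whose first and last frames serve as key frames; the function name and dictionary layout are illustrative.

```python
def split_into_packets(num_frames: int, packet_size: int = 64):
    """Group consecutive frames into image packets; the first and last frame
    of each packet are taken as its key frames."""
    packets = []
    for start in range(0, num_frames, packet_size):
        end = min(start + packet_size, num_frames)
        frames = list(range(start, end))
        packets.append({"frames": frames,
                        "key_frames": [frames[0], frames[-1]]})
    return packets
```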
More specifically, when the non-key frames of a video file are compression-encoded under a video compression protocol, each video image must be divided into different image blocks. Fig. 3 is a schematic diagram of dividing a video image into image blocks. In the left image of fig. 3, many objects are distributed at the four corners, especially the upper left corner, so those areas contain a large amount of object boundary information to be processed. In the division shown on the right, more and smaller image blocks are placed at the four corners where objects are numerous, and fewer, larger blocks in the middle area where objects are sparse. When the non-key frame of fig. 3 is subsequently compared with the already-encoded key frame, the areas dense with objects are divided into smaller image blocks for comparison; because a smaller image block carries more precise boundary information, the block at that position can be compared with the key frame more accurately, achieving higher accuracy when matching against the key frame's image blocks.
However, when the video image is divided into different image blocks, areas with densely distributed objects and numerous boundaries require denser image blocks for identification and comparison, so a larger number of image blocks must be processed when each frame of the video file is compressed, reducing the compression efficiency per frame and hence of the whole video file. Moreover, this technology divides the video file into separate image packets and compares image blocks only within each packet, lacking any overall identification and comparison: if an object is present in every frame of the whole video file, it is repeatedly identified and compared in every image packet, which also reduces compression efficiency.
The present application therefore provides a video image processing method and device, used when compressing a video file, that separately extract, compare, and compress objects in the video images that meet certain preset conditions. Once these objects have been compressed a single time, only the area outside them needs to be processed when each frame is compressed, improving the compression efficiency of each frame and hence of the whole video file.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 4 is a flowchart illustrating a video image processing method according to an embodiment of the present disclosure, where the method shown in fig. 4 can be applied to a real-time transmission scene or a non-real-time transmission scene in the scene shown in fig. 1, and is executed by a first device and a second device. Specifically, the video image processing method provided by this embodiment includes:
s101: the first device acquires a first video image.
First, as the device that transmits the video file, the first device acquires the first video image to be transmitted in S101. The first video image may be one frame of a non-real-time video file transmitted by the first device to the second device, or it may be a real-time video image transmitted by the first device to the second device.
S102: the first device determines a target area in the first video image, where the target area includes an image of an object that is stored in a first database of the first device and meets a preset condition.
Subsequently, the first device identifies the target area in the first video image, where the target area is an area including images of objects stored in the first database, all of which satisfy the preset condition. For example, fig. 5 is a schematic diagram of the database arrangement provided by the present application: a first database is provided in the first device, and a second database is provided in the second device. The first database in the first device is used to store images of objects that meet the preset condition.
Optionally, when the method is applied to a non-real-time transmission scene, the preset condition may be that the number of video images including an object in all video images of a video file in which the first video image is located is greater than or equal to a preset number; when the method is applied to a real-time transmission scene, the preset condition may be that the number of video images including the object in N video images before the first video image is greater than or equal to M, M and N are positive integers, N is greater than 1, and M is less than N.
Fig. 6 is a schematic diagram of objects in a video image provided by the present application. Taking the first video image shown on the left of fig. 6 as an example, besides the background it includes at least an electric vehicle a, a cyclist b, a passerby c, a vehicle d, and a passerby e. These are objects that may move in the video image; the other parts, being static, can be regarded as the background of the video image. Assuming the first database of the first device stores the images of two objects, the electric vehicle a and the cyclist b, the first device may determine in this step that the target area of the first video image is the area in the figure where the images of these two objects are located.
S103: the first device compresses the area of the first video image other than the target area to obtain second compressed data.
After determining the target area in the first video image in S102, the first device compresses only the area of the first video image other than the target area, and the resulting compressed data is recorded as the second compressed data. This embodiment does not limit the method used to compress the first video image; compression encoding may be used, for example.
S104: the first device transmits the second compressed data to the second device.
Specifically, the first device transmits the second compressed data obtained in S103 to the second device, and the second device receives it.
Optionally, to enable the second device to determine that the second compressed data was obtained by compressing the area outside the target area, the first device may also send the tag information of the target area in the first video image to the second device in S104, so that the second device can determine the target area in the first video image from the tag information.
S105: the second device decompresses the second compressed data to obtain a third video image.
Specifically, after receiving the second compressed data sent by the first device, the second device decompresses the second compressed data, and the obtained video image is recorded as a third video image. The third video image is an image corresponding to an area other than the target area transmitted by the first device.
S106: the second device acquires the image corresponding to the target area from the second database.
Specifically, as shown in fig. 5, a second database is provided in the second device. The second database is used to store images of objects satisfying the preset condition, and it can store images of the same objects as the first database. The images of objects stored in the second database may be preset and stored in advance, or may be transmitted to the second device by the first device in real time according to the first database and then stored in the second database.
Optionally, the second device may obtain the image corresponding to the target area from the second database according to the tag information of the target area received with the second compressed data.
S107: the second device determines the first video image from the third video image obtained in S105 and the image of the target area acquired in S106, which completes the transmission of the first video image from the first device to the second device.
Illustratively, when the target area in the first video image consists of the images of the electric vehicle a and the cyclist b shown in fig. 6, the third video image decompressed by the second device in S105 is the image of fig. 6 with those two objects removed, and the image of the target area acquired in S106 contains the two objects. In S107, the second device adds the images of the electric vehicle a and the cyclist b at the corresponding positions in the third video image to obtain the complete first video image.
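A minimal sketch of S106/S107 on the decoder side (a NumPy layout is assumed; `marks` uses the hypothetical TargetAreaMark record from above, and each patch is assumed to already match its marked size):

```python
import numpy as np

def combine_first_video_image(third_image: np.ndarray, second_database: dict, marks):
    """S106: fetch each object image; S107: paste it at its marked position."""
    first_image = third_image.copy()
    for mark in marks:
        x, y, w, h = mark.position
        patch = second_database[mark.object_id]  # lookup in the second database
        first_image[y:y + h, x:x + w] = patch    # patch assumed shaped (h, w, 3)
    return first_image
```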
It should be noted that, in the embodiment shown in fig. 4, when the method is applied in a real-time transmission scene, the first video image may be a real-time video image acquired by the first device. When the method is applied in a non-real-time transmission scene, the first video image may be any video image of a video file: S101 to S103 in fig. 4 show the processing of one such video image, and in S104 the first device may process every frame of the video file through S101 to S103 and then send the second compressed data of all video images to the second device together; the second device then processes each frame through S105 to S107, finally recovering all video images of the video file.
In summary, in the video image processing method provided in this embodiment, before the first device sends the first video image to the second device, the first device first identifies the target area containing an object recorded in its first database, compresses the area of the first video image outside the target area to obtain second compressed data, and sends the second compressed data to the second device. Because every object image that meets the preset condition and is stored in the first database of the first device is also stored in the second database of the second device, the second device can, after receiving the second compressed data, combine the third video image obtained by decompressing the second compressed data with the image of the target area stored in the second database, finally obtaining the first video image. Therefore, with this method the first device does not need to repeatedly compress the target areas of objects that appear frequently in the video images, which reduces the amount of compressed data, and the second device does not need to repeatedly decompress those target areas, which reduces the amount of decompressed data. This reduces the data volume of the compressed packets transmitted between the first device and the second device and thus improves the efficiency of video image processing.
Optionally, in the embodiments shown in fig. 4 to 6, the images of objects stored in the second database of the second device may be sent by the first device to the second device for storage. Specifically, fig. 7 is a flowchart of another embodiment of the video image processing method provided by the present application. The method shown in fig. 7 may be applied to a real-time or non-real-time transmission scene in the scene shown in fig. 1, is executed by the first device and the second device, and is performed before S101 of the method shown in fig. 4. Specifically, the video image processing method provided by this embodiment includes:
S201: the first device compresses the images of objects stored in the first database to obtain first compressed data.
Specifically, after determining the images of objects that satisfy the preset condition and storing them in the first database, the first device may compress the entire first database, and the obtained compressed data is recorded as the first compressed data. For example, assuming that the images of the electric vehicle a and the cyclist b shown in fig. 6 are stored in the first database, the first device compresses these images of the first database in S201 to obtain the first compressed data.
S202: the first device transmits the first compressed data to the second device.
Specifically, the first device sends the first compressed data obtained in S201 to the second device, and for the second device, receives the first compressed data sent by the first device.
S203: the second device decompresses the first compressed data to obtain a corresponding image set and stores the image set in the second database.
Specifically, after receiving the first compressed data sent by the first device, the second device decompresses the first compressed data, records the obtained images of the plurality of objects as an image set, and stores the image set in the second database for use in the subsequent execution of the embodiment shown in fig. 4.
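The S201 to S203 exchange can be pictured with ordinary serialization standing in for the unspecified codec (zlib and pickle here are stand-ins, not the patent's actual compression method):

```python
import pickle
import zlib

def make_first_compressed_data(first_database: dict) -> bytes:
    """S201: compress the whole object-image database at once."""
    return zlib.compress(pickle.dumps(first_database))

def build_second_database(first_compressed_data: bytes) -> dict:
    """S203: decompress the image set and store it as the second database."""
    return pickle.loads(zlib.decompress(first_compressed_data))
```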
In summary, in this embodiment, because the first device sends the images of the objects in the first database to the second device, the first device needs to compress the images in the first database only once and send the resulting first compressed data to the second device, which decompresses it to build the second database. When processing subsequent video images, the first device therefore only needs to compress the areas outside the target areas, which reduces the amount and number of compressions performed by the first device and the amount and number of decompressions performed by the second device, thereby improving the processing efficiency of the video images.
The following describes specific implementations of the video image processing method in a real-time transmission scene and a non-real-time transmission scene, respectively, with reference to specific embodiments. The following embodiments may be performed by the first device and the second device independently of the embodiments shown in fig. 4 to 7, or on the basis of them.
1. Non-real-time transmission scene.
When the video image processing method provided by the embodiments of the present application is used to transmit a non-real-time video file in the scene shown in fig. 1, the first device 10 sends an entire, complete video file to the second device 20. The first device 10 may first compress the video file to obtain a compressed packet with a small data size and send that packet to the second device 20, thereby saving communication resources. Fig. 8 is an exemplary flowchart of an embodiment of the video image processing method provided in the present application, illustrating the processing flow when compressing a whole video file.
In the embodiment shown in fig. 8, the first device 10, as the executing subject, first acquires a to-be-processed video file 101 consisting of N consecutive video images, N > 1. According to their order in the video file 101, the video images are sequentially numbered 1, 2, …, N; this number may also be referred to as a frame number, and the number of video images included in the video file may be referred to as its frame count. Alternatively, the to-be-processed video file 101 may be designated by the user of the first device 10, captured by the first device 10, or acquired by the first device 10 via the internet, and it may be compressed by the embodiment shown in fig. 4 either right after the first device 10 acquires it or once the first device 10 determines that the video file 101 is to be transmitted to the second device 20.
After acquiring the video file 101, the first device 10 first identifies the objects included in all N video images of the video file 101 through the first machine learning model 102, and determines at least one object that meets the preset condition. In this embodiment, after the first machine learning model 102 deployed in the first device 10 identifies the left video image of fig. 6, the identification result on the right side of fig. 6 is obtained: the objects a to e in the video image. When the first machine learning model 102 has identified all N video images of the video file, the objects included in each of the N video images are known. The first device 10 may then filter all objects across the N video images according to the identification results of the first machine learning model 102, and store the images of objects satisfying the preset condition in the first database 103 of the first device 10. Optionally, several of the N video images may include the same object: when those images have the same resolution, the image of the object from any one of them may be stored in the first database; when they have different resolutions, the image of the object with the highest resolution, and therefore the highest definition, may be stored in the first database.
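The filtering and highest-resolution selection described above might look like the following sketch (per-frame detections as {object_id: image} dicts are an assumed input format):

```python
def build_first_database(detections_per_frame, preset_number):
    """Keep, for each object seen in at least preset_number of the N video
    images, its single highest-resolution (sharpest) image."""
    counts, best = {}, {}
    for detections in detections_per_frame:
        for object_id, image in detections.items():
            counts[object_id] = counts.get(object_id, 0) + 1
            if (object_id not in best
                    or image.shape[0] * image.shape[1]
                    > best[object_id].shape[0] * best[object_id].shape[1]):
                best[object_id] = image
    return {oid: img for oid, img in best.items() if counts[oid] >= preset_number}
```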
Optionally, the first machine learning model 102 provided in this embodiment may be a convolutional neural network (CNN) model, for example AlexNet, ResNet, or Inception v3, or any other model applicable to object recognition in images.
Optionally, the preset condition in this embodiment may be that the number of times the object appears in the N video images of the entire video file is greater than a preset number, for example 10 (with N > 10). When the first machine learning model 102 identifies that more than 10 of the N video images include the passerby c, that is, the passerby c appears more than 10 times across the video file, the image of the passerby c may be stored in the first database 103. In the same manner, the first device 10 may store in the first database 103 the images of all objects in the N video images that satisfy the preset condition, according to the identification results of the first machine learning model.
Optionally, in one implementation, the images stored in the first database 103 may be images of different objects whose boundaries are the boundaries of the objects themselves, containing no background or other information beyond the object. For example, for the passerby c in fig. 6, the image cut out along the boundary of the passerby c in the left video image of fig. 6 is stored in the first database 103, including no other object or background. Alternatively, in another implementation, since the first device 10 compresses the whole video file, the first database 103 may store only the frame number of a video image in which the object appears and the boundary pixel positions of the object in that frame; the image of the object can subsequently be retrieved through this frame number and boundary. For example, after an object satisfying the preset condition is identified and its image is located at the upper-left corner of the 10th frame of the video file, a record of the form "10, (a, b, c, …)" may be stored in the first database 103, indicating that the image of the object lies in the 10th frame of the video file within the boundary pixel positions (a, b, c, …).
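The reference-style entry of the second implementation could be stored, for example, as (the field names and coordinates are made up for illustration):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectReference:
    """The first database stores a pointer instead of the pixel data itself."""
    frame_number: int                       # e.g. 10: the 10th frame of the video file
    boundary: Tuple[Tuple[int, int], ...]   # boundary pixel positions (a, b, c, ...)

# The "10, (a, b, c, ...)" record from the text, with made-up coordinates:
example = ObjectReference(frame_number=10, boundary=((3, 5), (3, 40), (28, 40)))
```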
Subsequently, after storing all objects of the N video images that satisfy the preset condition into the first database 103, the first device 10 processes the N video images of the video file in sequence through the second machine learning model 105; the video image currently being processed is denoted as the first video image. The second machine learning model 105 compares the object images already stored in the first database 103 with the first video image, determines at least one object included in the first video image and the area in which the image of each such object is located, and denotes those areas as target areas. Exemplarily, assuming that in the example shown in fig. 8 images of 26 objects labeled A to Z are stored in the first database 103, after the second machine learning model acquires a first video image of the video file, it compares the first video image with the images in the first database; if it determines that the first video image includes the objects labeled A and B, the areas where those objects are located are marked as the A target area and the B target area.
Optionally, the second machine learning model 105 and the first machine learning model 102 described in the embodiments of the present application may be the same machine learning model, for example a CNN-type neural network model, or different machine learning models. The second machine learning model 105 differs from the first machine learning model 102 in that it has the recognized object images in the first database as prior information; its recognition may therefore be understood as image comparison, whereas the first machine learning model 102 must extract objects from a new video image. Since the second machine learning model 105 requires less computation, a lighter model may be used, so that deploying two machine learning models in the first device 10, one for recognition and one for comparison, improves overall processing efficiency.
It is understood that, in the example shown in fig. 8, the second machine learning model 105 compares each video image of the video file 101, as the first video image, with the objects in the first database 103 in sequence to determine the target areas in each video image; the first device 10 then compresses the areas of the N video images outside the target areas, obtaining the compressed data of the N video images of the video file 101, denoted as second compressed data 106. Although the second compressed data 106 contains compressed data of all N video images, for each video image containing a target area, the target area has been "cropped" and only the parts outside the target area are retained. The video images in the second compressed data 106 are therefore incomplete: each lacks the target areas containing images of objects in the first database.
In particular, in this embodiment, since the video images in the generated second compressed data 106 lack their target areas (the images of objects in the first database have effectively been "cropped" out), the first device 10, when generating the second compressed data 106, additionally marks each video image that includes at least one target area, in order to identify which object of the first database was cropped and where it was located in the video image. These marks may be carried in the corresponding video image, so that when the video image is subsequently decompressed, information such as the position of the target area in the video image can be determined.
Optionally, in this embodiment, the content of the mark includes at least one of: position information of the target area in the first video image, transformation information, and identification information of the object included in the target area in the first database. The transformation information is used to identify the difference between the image of the object in the first database and its appearance in the first video image. For example, in the example shown in fig. 8, the objects stored in the first database 103 may be identified by the letters A to Z. Assume the object corresponding to the letter A is a pedestrian, stored in the first database at a resolution of 128 × 128. When the target area at the top-left corner of the video image currently identified by the second machine learning model 105 includes this object, then, when the first device 10 compresses the video image, the identification information of the object is "A", and the position information of the target area comprises the pixel positions of the target area's outer boundary in the video image. If the resolution of the target area in the video image is 64 × 64, the stored 128 × 128 image must be scaled down by half, so the target area can be marked with the transformation information "scale down by half"; the transformation information may further describe transformations such as rotation and stretching.
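Deriving the transformation entry of the mark from the two resolutions is straightforward; a sketch (uniform scaling assumed, rotation and stretching omitted):

```python
def transformation_info(db_image, target_area):
    """128x128 in the database vs. 64x64 in the frame gives scale 0.5,
    i.e. the stored image must be scaled down by half on reconstruction."""
    scale = target_area.shape[0] / db_image.shape[0]
    return {"scale": scale}  # rotation or stretching entries could be added similarly
```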
Accordingly, since the first database 103 stores the images of objects in the N video images that satisfy the preset condition, the "cropped" portions missing from the incomplete video images in the second compressed data 106 are exactly the contents of the first database 103. Therefore, when compressing the video file, the first device 10 also compresses the images of objects stored in the first database 103 to obtain the first compressed data 104.
In this embodiment, the step of obtaining the first compressed data 104 by the first device 10 may be performed at any time after determining the first database 103, and is independent from the step of obtaining the second compressed data 106 by the first device 10, and the order of the steps may not be limited.
Finally, in one implementation, the first compressed data 104 and the second compressed data 106 may together constitute the compressed file of the video file 101, and after generating them the first device 10 sends both to the second device 20. Alternatively, in another implementation, after obtaining the first compressed data 104 and the second compressed data 106, the first device 10 may combine the two into final third compressed data 107, which is the compressed file of the entire video file 101, and send the third compressed data 107 to the second device 20. This embodiment does not limit which implementation is used.
Optionally, in this embodiment, the specific compression methods used by the first device 10 to obtain the first compressed data 104 and the second compressed data 106 may be the same or different; for example, both may be compressed using a video compression protocol such as H.264 or H.265.
Subsequently, when the second device 20 receives the compressed data sent by the first device 10 in the manner shown in the embodiment of fig. 8, the compressed data may be decompressed to obtain a complete video file. Specifically, fig. 9 is an exemplary flowchart of an embodiment of the video image processing method provided in the present application, and specifically shows a processing flow of decompressing a compressed packet to obtain a video file.
In the embodiment shown in fig. 9, the second device 20 as the execution subject first receives the third compressed data 107 transmitted by the first device 10, and obtains the first compressed data 104 and the second compressed data 106 from the third compressed data 107. Alternatively, the second device 20 may directly receive the first compressed data 104 and the second compressed data 106 transmitted by the first device 10.
Subsequently, the second device 20 may decompress the first compressed data 104 and the second compressed data 106 respectively. Decompressing the first compressed data 104 yields an image set comprising images of a plurality of objects, for example the objects labeled A to Z, and the second device 20 may store these images in its second database 108. That is, after the second device 20 decompresses the first compressed data, the first database of the first device 10 is restored, and the images of the objects it contains are stored in the second database 108.
The second device 20 decompresses the second compressed data 106 to obtain N video images, none of which includes the target areas of objects stored in the second database 108, and each of which carries the mark information of its target areas, the mark information including at least one of: position information of the target area, transformation information of the object (such as rotation), or identification information in the first database of the object included in the target area.
Alternatively, if the images of objects in the first compressed data 104 are represented by a frame number and position information of a video image in the video file, for example an identified object recorded in the first database 103 as the 10th frame of the video file together with pixel positions within that frame, the second device 20, after obtaining the first compressed data 104, acquires the image of the object from those pixel positions of the 10th frame recovered from the second compressed data 106.
Finally, after determining the images of the objects in the target areas from the second database according to the mark information in each video image, the second device 20 performs image stitching on the N video images to restore them. Illustratively, if the marks of the target area in the first video image currently being processed by the second device 20 include "A", the boundary pixels, and "scale down by half", the image corresponding to the object A may be obtained from the second database 108, scaled down by half, and placed at the position of the boundary pixels in the first video image, thereby stitching the video image. The second device 20 then completes the stitching of all N video images to finally obtain the complete video file.
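Putting the mark information to use on the decoder side, a sketch of the stitching step (OpenCV's cv2.resize is assumed to be available for the scaling; any resampler would do):

```python
import cv2
import numpy as np

def stitch_video_image(third_image: np.ndarray, second_database: dict, marks):
    """Apply each mark's transformation, then paste the object image back."""
    frame = third_image.copy()
    for mark in marks:
        x, y, w, h = mark.position
        patch = second_database[mark.object_id]
        if patch.shape[:2] != (h, w):
            patch = cv2.resize(patch, (w, h))  # e.g. 128x128 -> 64x64 ("scale down by half")
        frame[y:y + h, x:x + w] = patch
    return frame
```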
In summary, in the video image processing method provided in this embodiment, when compressing a video file, the first device stores, through the first machine learning model, the images of objects satisfying the preset condition in all video images of the video file into the first database; identifies, through the second machine learning model, the target areas in each video image that contain objects of the first database; and then compression-encodes the objects of the first database and the areas of each video image outside the target areas separately, finally obtaining the compressed file of the entire video file. Because the first database is compression-encoded once as a whole, the first device does not need to compress the target areas when subsequently compression-encoding a first video image of the video file; it only needs to encode the areas outside the target areas of objects present in the database. And since every object image that meets the preset condition and is stored in the first database of the first device is also stored in the second database of the second device, the second device can, after receiving the second compressed data, combine the third video image obtained by decompression with the images of the target areas stored in the second database, finally obtaining the first video image. Therefore, when the first device transmits a video file to the second device non-real-time, the first device does not need to repeatedly compress the target areas of objects that appear frequently in the video images; it only compresses the areas outside the target areas, reducing the amount of compressed data, and the second device does not need to repeatedly decompress the target areas. This reduces the data volume of the compressed packets transmitted between the first and second devices and improves the efficiency of video image processing. In particular, since the first database stores images containing only objects, and the machine learning models identify and compare objects based on the object images themselves, there is no need to divide the image into numerous image blocks of different sizes and process them one by one as in the technique shown in fig. 3; this reduces the amount of computation during video image processing and further improves the compression-encoding efficiency of the whole video file. Moreover, because whether an object meets the preset condition is judged over the entire video file, an object is prevented from being repeatedly identified and compared, further improving compression-encoding efficiency.
2. Real-time transmission scene.
When the video image processing method provided in the embodiments of the present application is applied in the scene shown in fig. 1 to transmit a first video image acquired in real time, the acquired first video image must be compressed as soon as possible and sent to the second device 20. Specifically, fig. 10 is an exemplary flowchart of an embodiment of the video image processing method provided in the present application, illustrating the processing flow of the first device 10 shown in fig. 1 when compressing the real-time first video image. When this embodiment is applied in a scene that must guarantee the real-time property of a video file, for example surveillance video backhaul, the video file acquired by the first device 10 is generated in real time and must be sent immediately to the second device 20. After receiving a frame of video image, the first device 10 must therefore promptly compression-encode it, send it to the second device 20 in time, and repeat this process for each newly received video image.
In the embodiment shown in fig. 10, the first device 10, as the executing subject, first acquires the first video image 201 that needs to be transmitted to the second device 20 in real time, where the first video image 201 may be one frame of a continuous video file. The first video image 201 may be designated by the user for transmission, captured by the first device 10, or acquired by the first device 10 through the internet and needing to be transmitted to the second device 20 in real time.
After acquiring the video image 201, the first device 10 first compares, through the second machine learning model 207, the video image 201 with the images of objects already stored in the first database 204, and determines the area of the video image 201 in which at least one object of the first database 204 is located, denoted as the target area. For example, in the example shown in fig. 10, the second machine learning model 207 determines the target area including the object A in the video image 201 from the image of the object A stored in the first database 204. Subsequently, the first device 10 "crops" the target area from the video image 201, compresses the remainder of the video image 201 to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end. In addition, to mark the target area of the video image, the second compressed data 208 may further include the mark information of the target area, for example at least one of: position information of the target area in the first video image, transformation information, or identification information in the database of the object included in the target area.
While the area outside the target area of the first video image 201 is compression-encoded and transmitted to the second device at the receiving end, the object A included in the target area of the first video image 201 is handled separately: the first device 10 compression-encodes the first database 204 to generate the first compressed data 205 and transmits the first compressed data 205 to the second device, so that after receiving the second compressed data 208 the second device can combine it with the first compressed data to obtain the video image 201.
In one specific implementation, the first database 204 may store images of multiple objects in advance, so that the first device 10 can compare the first video image against the database 204 as soon as it is acquired. In another specific implementation, when the first device 10 transmits a video file comprising N video images to the second device 20 in real time, the first M video images (M < N), denoted as second video images, are not directly recognized through the second machine learning model 207; instead, the first device 10 first recognizes them through the first machine learning model and stores the images of objects in the second video images that meet the preset condition into the first database 204. Then, when transmitting the video images after the first M, the first device 10 compares them with the first database 204 through the second machine learning model 207. For the first M video images, the compression method used by the first device 10 is not limited in the present application: for example, after roughly dividing a video image into several regions, compression encoding with different parameters may be applied according to the characteristics of each region, for example a larger residual value and fewer high-frequency components for regions containing background, and a smaller residual value and more high-frequency components for regions that may contain objects; alternatively, an existing video compression protocol such as H.264, H.265, or H.266 may be used.
Further, since the processed video images are real-time in this embodiment, the first device 10 cannot, when transmitting a single video image, directly determine which objects of the whole video file meet the preset condition. Here the preset condition may include: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are positive integers, N > 1, and M < N. That is, an object can only be determined to meet the preset condition, and be added to the first database 204, in combination with the N video images before the first video image; at any given time the objects stored in the first database 204 may therefore not cover all objects of the current video file that meet the preset condition. For example, in the example shown in fig. 10, assume that the first device 10 processes the video image 201 at a first time, and that the video image 201 includes the object A and the object B, but only the image of the object A is stored in the first database 204. The second machine learning model 207 can then only recognize the target area including the object A in the video image: even if, at the first time, the object B meets the preset condition over the N video images before the first video image 201, its image has not yet been stored in the first database 204, so the second machine learning model 207 cannot recognize a target area for the object B. Therefore, while processing the video image, the first device 10 also identifies the objects included in the processed video image 201 through the first machine learning model 202 and determines the objects other than the background, such as the object A and the object B shown in fig. 10. As in fig. 10, the order of the two processing steps, processing the video image 201 by the second machine learning model 207 and by the first machine learning model 202, is not limited, and the two steps may also be executed simultaneously. Subsequently, the identified objects are managed by the management module 203 in the first device 10. This management includes at least the addition, deletion, and replacement of the images of objects stored in the first database 204, which are separately described below with examples.
1. Addition of an object. The management module 203 adds to the first database 204 the image of an object that appears more than M times (M < N) in the N video images up to and including the currently processed first video image 201 and whose image is not yet stored in the database. For example, when the first machine learning model 202 in fig. 10 identifies the object A and the object B in the first video image 201 and determines that the object B is not stored in the first database 204, and the object B has appeared in 5 of the previous 10 video images, that is, has accumulated 5 occurrences, the management module 203 adds the image of the identified object B to the first database 204.
Optionally, the management module 203 may cache all previous images including the object B in the first video image, and when it is determined that the image of the object B is to be added to the database 204 subsequently, may store the image of the object B with the highest resolution in the first database 204 from the cache, so as to improve the definition of the image of the object B processed in the subsequent compression process.
Further, after the management module 203 adds the new object B to the first database 204, the first device 10 may immediately compression-encode the image of the newly added object B, record the obtained compressed data as third compressed data, and send it to the second device, so that the second device decompresses the third compressed data and stores the image of the object B in its second database. Alternatively, after the new object B is added, the first device 10 may compression-encode the entire updated first database 204, record the obtained compressed data as fourth compressed data, and send it to the second device, so that the second device updates its second database after decompressing the fourth compressed data.
Therefore, even though in the example shown in fig. 10 the second machine learning model 207 of the first device 10 cannot recognize a target area for the object B in the currently processed first video image 201, when the first device 10 processes subsequent video images, the second machine learning model 207 can recognize more objects of the first database 204, including the object B. For example, fig. 11 is an exemplary flowchart of an embodiment of the video image processing method provided by the present application, illustrating the processing performed by the first device 10 on a video image 301 that follows the first video image 201 shown in fig. 10. Let the time when the first device 10 processes the first video image 201 be a first time, and the time when the first device 10 transmits the third or fourth compressed data to the second device 20 be a second time. At a third time after the first and second times, when the first device 10 receives the video image 301, the image of the object B is already stored in the database 204 and has already been transmitted to the second device 20 through the third or fourth compressed data; the first device 10 can therefore determine, through the second machine learning model 207 and from the images of the objects A and B stored in the first database 204, the target areas including the object A and the object B in the video image 301. Subsequently, the first device 10 "crops" the target areas from the video image 301, compression-encodes the remainder of the video image 301 to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end.
2. Deletion of an object. The management module 203 deletes from the first database 204 the image of an object that appears fewer than Y times (Y < X) in the X video images up to and including the currently processed first video image 201. For example, when the first machine learning model 202 in fig. 10 identifies the object A and the object B in the first video image 201, and the object A has appeared only once, fewer than Y = 2 times, in the 10 video images before the currently processed first video image 201, the management module 203 may delete the image of the object A stored in the first database 204. The second machine learning model 207 then has fewer object images to compare against when processing subsequent video images, further improving efficiency.
3. Replacement of an object. The management module 203 compares the image of an identified object in the currently processed first video image 201 with the image of the same object stored in the first database 204. If, for example, the resolution of the image of the object A in the first video image 201 is 128 × 128, greater than the 64 × 64 resolution of the image of the object A in the first database 204, the management module 203 stores the image of the object A from the first video image 201 into the first database 204 and deletes the previously stored image.
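The three management operations can be summarized in one sketch of a hypothetical management module (the thresholds and the single occurrence counter are simplifications of the sliding windows described above):

```python
class ManagementModule:
    """Sketch of module 203: add, delete, or replace object images in the first database."""
    def __init__(self, first_database: dict, add_threshold_m: int, delete_threshold_y: int):
        self.db = first_database  # object_id -> image (NumPy array)
        self.m = add_threshold_m
        self.y = delete_threshold_y

    def update(self, object_id, image, occurrences):
        """occurrences: times the object appeared in the relevant window of recent frames."""
        if object_id not in self.db:
            if occurrences >= self.m:
                self.db[object_id] = image  # 1. addition (then send third/fourth compressed data)
                return "added"
        elif occurrences < self.y:
            del self.db[object_id]          # 2. deletion of a rarely seen object
            return "deleted"
        elif image.shape[0] * image.shape[1] > self.db[object_id].shape[0] * self.db[object_id].shape[1]:
            self.db[object_id] = image      # 3. replacement with the higher-resolution image
            return "replaced"
        return "unchanged"
```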
Subsequently, when the second device 20 receives the compressed data sent by the first device 10 in the manner of the embodiment shown in fig. 10, it decompresses the compressed data to obtain the first video image. Specifically, fig. 12 is an exemplary flowchart of an embodiment of the video image processing method provided in the present application, showing the processing flow of the second device 20 when decompressing the compressed data to obtain the first video image.
In the embodiment shown in fig. 12, the second device 20 as the execution subject receives the first compressed data 205 and the second compressed data 208 sent by the first device 10, where the two compressed data may be received by the second device 20 at different times, and the second device 20 receives the first compressed data 205 first and then receives the second compressed data 208.
After receiving the first compressed data 205, the second device 20 may decompress the first compressed data 205 to obtain images of a plurality of objects, for example images of the objects labeled A, B, …, as an image set, and may store the image set in the second database 210 of the second device 20.
When the second device 20 receives the second compressed data 208, it may decompress the second compressed data 208 to obtain a third video image 211, which does not include the target areas of objects stored in the second database 210. Along with the second compressed data 208, it may also receive the mark information of the target areas included in the first video image sent by the first device 10, the mark information including at least one of: position information of the target area in the first video image, transformation information, or identification information in the first database of the object included in the target area.
Finally, the second device 20 determines the images of the objects in the target areas from the second database 210 according to the mark information of the target areas in the first video image, and then performs image stitching to restore the video image 201. Illustratively, if the marks of the target area of the first video image currently being processed by the second device 20 include "A", the boundary pixels, and "scale down by half", the image corresponding to the object A may be obtained from the database, scaled down by half, and placed at the position of the boundary pixels in the decompressed third video image, thereby stitching the current video image 201 and finally obtaining the first video image 201.
In summary, in the video image processing method provided in this embodiment, when compressing a first video image acquired in real time, the first device identifies, through the second machine learning model, the target areas of the video image that include objects stored in the first database, and then compresses and transmits only the areas outside the target areas. During this process, the objects in the first video image can be identified through the first machine learning model, and the images of objects stored in the first database can be added, deleted, and replaced. In the compression-encoding of the first video image, the first device compresses the images of objects in the first database once as a whole and does not compress the target areas of the first video image; it only needs to compress the areas of the video image outside the target areas of objects stored in the database. And since every object image that meets the preset condition and is stored in the first database of the first device is also stored in the second database of the second device, the second device can, after receiving the second compressed data, combine the third video image obtained by decompression with the images of the target areas stored in the second database, finally obtaining the first video image. Therefore, when the first device transmits the first video image to the second device in real time, the first device does not need to repeatedly compress the target areas of objects that appear frequently in the video images; it only compresses the areas outside the target areas, reducing the amount of compressed data, and the second device does not need to repeatedly decompress the target areas. This reduces the data volume of the compressed packets transmitted between the first and second devices and improves the efficiency of video image processing. In particular, since the first database stores images containing only objects, and the machine learning models identify and compare objects based on the object images themselves, there is no need to divide the image into numerous image blocks of different sizes and process them one by one as in the technique shown in fig. 3; this reduces the amount of computation during video image processing and further improves compression-encoding efficiency. Moreover, when encoding and identifying continuous video images, the objects stored in the database are continuously updated in real time, so that the stored object images are always up to date, ensuring the database remains usable for subsequent comparison and further improving the compression-encoding efficiency of the video file.
In the foregoing embodiments, the video image processing method provided in the embodiments of the present application has been described. To implement the functions of this method, the first device and the second device serving as execution subjects may include hardware structures and/or software modules, implementing each function in the form of a hardware structure, a software module, or both. Whether a given function is implemented as a hardware structure, a software module, or a combination of the two depends on the particular application and the design constraints of the technical solution.
Fig. 13 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application, where the apparatus shown in fig. 13 can be used as the first apparatus 10 in the scene shown in fig. 1, and perform the functions performed by the first apparatus in the embodiment shown in fig. 4, and specifically, the apparatus includes: an obtaining module 1301, a first determining module 1302, a compressing module 1303 and a sending module 1304. The obtaining module 1301 is configured to obtain a first video image; the first determining module 1302 is configured to determine a target area in a first video image; the target area comprises an image of an object which is stored in a first database of the first device and meets a preset condition; the compression module 1303 is configured to compress an area in the first video image except the target area to obtain second compressed data; a sending module 1304 is configured to send the second compressed data to the second apparatus; the second database of the second device stores images of objects meeting preset conditions.
Optionally, the compression module 1303 is further configured to compress an image of an object stored in the first database to obtain first compressed data; the sending module 1304 is further configured to send the first compressed data to the second apparatus; the first compressed data is used by the second device to determine a second database.
Optionally, the sending module 1304 is specifically configured to send the second compressed data and the label information of the target area to the second apparatus; wherein the marking information includes: at least one of position information of a target area in the first video image, identification information or transformation information of an image of an object included in the target area in the first database; the transformation information is used to represent the difference between the image of the object in the target region in the first database and the first video image.
Optionally, the preset conditions include: in N video images before the first video image, the number of the video images including the object is larger than or equal to M, wherein M and N are positive integers, N is larger than 1, and M is smaller than N.
Fig. 14 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application, and the apparatus shown in fig. 14 further includes, on the basis of fig. 13: a second determination module 1305, and a storage management module 1306. The apparatus shown in fig. 14 may be configured to execute the video image processing method shown in fig. 10, for example, the second determining module 1305 is configured to identify a target object in the first video image, which meets a preset condition; the storage management module 1306 is configured to add an image corresponding to a new target object in the target objects to the first database, where the new target object is an object that is not stored in the first database, and the first database is stored in the storage module.
Optionally, the compression module 1303 is further configured to compress an image corresponding to the new target object to obtain third compressed data; the sending module 1304 is further configured to send the third compressed data to the second device.
Optionally, after the storage management module 1306 adds the image corresponding to the new target object in the target object into the first database, the compression module 1303 is further configured to compress the image of the object stored in the first database to obtain fourth compressed data; the sending module 1304 is further configured to send the fourth compressed data to the second device.
Optionally, the storage management module 1306 is further configured to delete the image of the object that does not meet the preset condition and is stored in the first database.
Optionally, the storage management module 1306 is further configured to replace the image of the first object stored in the first database with the image of the first object in the first video image when the definition of the image of the first object in the target area in the first video image is better than the definition of the image of the first object stored in the first database.
Optionally, the first video image is a video image with a frame number greater than a preset frame number in a video file being compressed and transmitted by the first device in real time; the obtaining module 1301 is further configured to obtain a second video image in the video to be processed, where a frame number of the second video image in the video to be processed is smaller than a preset frame number; the second determining module 1305 is further configured to identify an object in the second video image, where the object meets a preset condition; the storage management module 1306 is further configured to store an object in the second video image, where the object meets a preset condition, in the first database.
Optionally, the preset conditions include: in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
Fig. 15 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application, where the apparatus shown in fig. 15 further includes, on the basis of fig. 13: a third determination module 1307 and a storage management module 1306. The apparatus shown in fig. 15 can be used to execute the video image processing method shown in fig. 8, and exemplarily, the third determining module 1307 is used to identify objects meeting preset conditions in all video images in a video file; the storage management module 1306 is configured to store the image of the object meeting the preset condition in the first database.
Optionally, the first database stores images of objects comprising: the border pixel locations of the object and the frame number of the video image in the video file that includes the object.
Fig. 16 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application, and the apparatus shown in fig. 16 may be used as the second apparatus 20 in the scene shown in fig. 1, and perform the functions performed by the second apparatus in the embodiment shown in fig. 4, specifically, the apparatus includes: a receiving module 1601, a decompressing module 1602, an obtaining module 1603 and a determining module 1604. Illustratively, the receiving module 1601 is configured to receive second compressed data sent by the first device; the second compressed data is obtained by compressing the area except the target area in the first video image; the decompression module 1602 is configured to decompress the second compressed data to obtain a third video image, where the third video image includes an image corresponding to an area other than the target area in the first video image; the obtaining module 1603 is configured to obtain an image corresponding to the target area from a second database of the second device; the determining module 1604 is configured to determine the first video image according to the third video image and the image corresponding to the target area.
Fig. 17 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in the present application, where the apparatus shown in fig. 17 further includes, on the basis of fig. 16: a storage management module 1605. In this embodiment, the receiving module 1601 is further configured to receive first compressed data sent by the first device; the decompression module 1602 is further configured to decompress the first compressed data to obtain an image set corresponding to an object meeting a preset condition, where the image set includes an image corresponding to a target region; the storage management module 1605 is configured to store the image set in the second database.
Optionally, the receiving module 1601 is further configured to receive tag information of the target area sent by the first device; wherein the marking information includes: location information of a target area in the first video image, at least one of identification information or transformation information of an object included in the target area in a first database of the first device; the transformation information is used to represent the difference between the image of the object in the target region in the first database and the first video image.
Optionally, the determining module 1604 is specifically configured to splice the image corresponding to the target area and the third video image according to the mark information of the target area, so as to obtain the first video image.
Optionally, the receiving module 1601 is further configured to receive third compressed data sent by the first device; the decompression module 1602 is further configured to decompress the third compressed data to obtain a new image of the target object; the storage management module 1605 is also configured to add the image of the new target object to the second database.
Optionally, the receiving module 1601 is further configured to receive fourth compressed data sent by the first device; the decompression module 1602 is further configured to decompress the fourth compressed data to obtain an updated image set corresponding to the object meeting the preset condition; the storage management module 1605 is further configured to update the second database based on the updated image set corresponding to the object meeting the preset condition.
Optionally, the preset conditions include: among the N video images before the first video image, the number of video images that include the object is greater than or equal to M, where M and N are positive integers, N is greater than 1, and M is less than N. Alternatively, the preset conditions include: in the video file where the first video image is located, the number of video images that include the object is greater than or equal to a preset number.
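The first condition is easy to state in code: an object qualifies once it has appeared in at least M of the last N frames. A minimal check, assuming the recognizer keeps a sliding window of per-frame object-id sets:

```python
from collections import deque

def meets_preset_condition(object_id: str, recent_frames: deque, m: int) -> bool:
    # recent_frames is a deque(maxlen=N) holding, for each of the last N video
    # images, the set of object ids recognized in that image
    return sum(object_id in ids for ids in recent_frames) >= m

# Usage: with N = 30 and M = 20,
#   recent_frames = deque(maxlen=30)
#   after each frame: recent_frames.append(detected_ids)
#   meets_preset_condition("face:presenter", recent_frames, 20)
```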
For the methods executed by the modules in the foregoing embodiments of the video image processing apparatus, refer to the descriptions in the video image processing methods of this application; their implementation manners and principles are the same and are not repeated here.
It should be noted that the division of the above apparatus into modules is merely a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity, or physically separated. The modules may all be implemented in the form of software invoked by a processing element, all in the form of hardware, or partly as software invoked by a processing element and partly as hardware. For example, the determining module may be a separately disposed processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code that a processing element of the apparatus invokes to execute the module's function. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of invoking program code. For still another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
Fig. 18 is a schematic structural diagram of an embodiment of a video image processing apparatus provided in this application. The apparatus can serve as the first apparatus or the second apparatus described in any of the foregoing embodiments of this application and execute the video image processing method executed by the corresponding apparatus. As shown in Fig. 18, the communication apparatus 1100 may include a processor 111 (e.g., a CPU) and a transmission interface, which may be a transceiver 113; the transceiver 113 is coupled to the processor 111, and the processor 111 controls the transceiver 113 to transmit and receive. Optionally, the communication apparatus 1100 further includes a memory 112, which may store software instructions; the processor 111 is configured to read the software instructions stored in the memory 112 to perform various processing functions and implement the method steps performed by the first apparatus or the second apparatus in the embodiments of this application.
Optionally, the video image processing apparatus of this embodiment may further include: a power supply 114, a system bus 115, and a communication interface 116. The transceiver 113 may be integrated into the transceiver circuitry of the video image processing apparatus or may be an independent transceiving antenna on the communication apparatus. The system bus 115 implements communication connections among the elements, and the communication interface 116 implements connection and communication between the communication apparatus and other peripheral devices.
In this embodiment, the processor 111 is configured to be coupled to the memory 112, and to read and execute the instructions in the memory 112 to implement the method steps performed by the first apparatus or the second apparatus in the foregoing method embodiments. The transceiver 113 is coupled to the processor 111, which controls it to send and receive messages; the implementation principles and technical effects are similar to those described above and are not repeated here.
The system bus mentioned in Fig. 18 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to implement communication between the above apparatus and other devices (e.g., a client, a read-write library, or a read-only library). The memory may be a non-volatile memory, such as a hard disk drive (HDD) or an SSD, or a volatile memory, such as a random-access memory (RAM); it may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, for storing program instructions and/or data.
The processor mentioned in Fig. 18 may be a general-purpose processor, including a CPU, a Network Processor (NP), and the like; it may also be a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, an embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores instructions that, when executed by a computer or a processor, cause the computer or the processor to implement the video image processing method executed by the first apparatus or the second apparatus in the foregoing embodiments of the present application.
Optionally, an embodiment of the present application further provides a chip for executing instructions, where the chip is configured to execute the video image processing method executed by the first apparatus or the second apparatus in the foregoing embodiments of the present application.
The present application further provides a computer program product, which includes instructions that, when run on a computer or a processor, cause the computer or the processor to implement the video image processing method executed by the first apparatus or the second apparatus in the foregoing embodiments of the present application.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for convenience of description and distinction and are not intended to limit the scope of the embodiments of the present application.
It should be understood that, in the embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present application.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (43)

  1. A video image processing method, applied to a first device, characterized by comprising the following steps:
    acquiring a first video image;
    determining a target region in the first video image; the target area comprises an image of an object which is stored in a first database of the first device and meets a preset condition;
    compressing regions except the target region in the first video image to obtain second compressed data;
    transmitting the second compressed data to a second device; the second database of the second device stores the image of the object meeting the preset condition.
  2. The method of claim 1, wherein prior to acquiring the first video image, further comprising:
    compressing the image of the object stored in the first database to obtain first compressed data;
    transmitting the first compressed data to the second device; the first compressed data is used by the second device to determine the second database.
  3. The method of claim 1 or 2, wherein the sending the second compressed data to a second device comprises:
    transmitting the second compressed data and the marking information of the target area to the second device; wherein the marking information includes: location information of the target area in the first video image, and at least one of identification information or transformation information of the image of the object included in the target area in the first database; the transformation information is used to represent the difference between the image of the object in the target area stored in the first database and the first video image.
  4. The method according to any one of claims 1 to 3,
    the preset conditions include: in N video images before the first video image, the number of the video images including the object is greater than or equal to M, wherein M and N are positive integers, N is greater than 1, and M is less than N.
  5. The method of claim 4, wherein after the obtaining the first video image, further comprising:
    identifying a target object which meets the preset condition in the first video image;
    and adding an image corresponding to a new target object in the target objects into the first database, wherein the new target object is an object which is not stored in the first database.
  6. The method of claim 5, wherein after adding the image corresponding to the new one of the target objects to the first database, further comprising:
    compressing the image corresponding to the new target object to obtain third compressed data;
    and sending the third compressed data to the second device.
  7. The method of claim 5, wherein after adding the image corresponding to the new one of the target objects to the first database, further comprising:
    compressing the image of the object stored in the first database to obtain fourth compressed data;
    and sending the fourth compressed data to the second device.
  8. The method according to claim 7, wherein after identifying the target object in the first video image that meets the preset condition, the method further comprises:
    and deleting the images of the objects which do not accord with the preset conditions and are stored in the first database.
  9. The method of claim 7, wherein after the obtaining the first video image, further comprising:
    replacing the image of the first object stored in the first database with the image of the first object in the first video image when the sharpness of the image of the first object in the target area in the first video image is higher than the sharpness of the image of the first object stored in the first database.
  10. The method according to any one of claims 4 to 9,
    the first video image is a video image with a frame number larger than a preset frame number in a video file which is compressed and transmitted by the first device in real time;
    before the acquiring the first video image, the method further comprises:
    acquiring a second video image in the video file, wherein the frame number of the second video image in the video file is smaller than the preset frame number;
    and identifying the object which meets the preset condition in the second video image, and storing the object in the first database.
  11. The method according to any one of claims 1 to 3,
    the preset conditions include: in the video file in which the first video image is located, the number of the video images comprising the object is greater than or equal to a preset number.
  12. The method of claim 11, wherein prior to acquiring the first video image, further comprising:
    identifying objects which accord with preset conditions in all video images in the video file;
    and storing the image of the object meeting the preset condition into the first database.
  13. The method according to claim 11 or 12,
    the images of the objects stored in the first database include: a boundary pixel location of the object, and a frame number of the video image that includes the object in the video file.
  14. A video image processing method applied to a second device is characterized by comprising the following steps:
    receiving second compressed data sent by the first device; the second compressed data is obtained by compressing the area except the target area in the first video image;
    decompressing the second compressed data to obtain a third video image, wherein the third video image comprises an image corresponding to an area except the target area in the first video image;
    acquiring an image corresponding to the target area from a second database of the second device;
    and determining the first video image according to the third video image and the image corresponding to the target area.
  15. The method of claim 14, further comprising:
    receiving first compressed data sent by the first device;
    decompressing the first compressed data to obtain an image set corresponding to an object meeting a preset condition, and storing the image set in the second database; the image set comprises images corresponding to the target area.
  16. The method of claim 14 or 15, further comprising:
    receiving marking information of the target area sent by the first device; wherein the marking information includes: location information of the target area in the first video image, and at least one of identification information or transformation information of the object included in the target area in a first database of the first device; the transformation information is used to represent the difference between the image of the object in the target area stored in the first database and the first video image.
  17. The method according to claim 16, wherein determining the first video image according to the third video image and the image corresponding to the target area comprises:
    and splicing the image corresponding to the target area and the third video image according to the marking information of the target area to obtain the first video image.
  18. The method of any of claims 14-17, wherein after determining the first video image, further comprising:
    receiving third compressed data sent by the first device;
    and decompressing the third compressed data to obtain the image of a new target object, and storing the image in the second database.
  19. The method according to any of claims 14-17, wherein after determining the first video image, further comprising:
    receiving fourth compressed data sent by the first device;
    decompressing the fourth compressed data to obtain an updated image set corresponding to the object meeting the preset condition;
    and updating the second database based on the updated image set corresponding to the object meeting the preset condition.
  20. The method of claim 15,
    the preset conditions include: in N video images before the first video image, the number of the video images including the object is greater than or equal to M, wherein M and N are positive integers, N is greater than 1, and M is less than N;
    or, the preset conditions include: in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
  21. A video image processing apparatus characterized by comprising:
    the acquisition module is used for acquiring a first video image;
    a first determination module for determining a target region in the first video image; the target area comprises an image of an object which is stored in a first database of the first device and meets a preset condition;
    the compression module is used for compressing the area except the target area in the first video image to obtain second compressed data;
    a sending module, configured to send the second compressed data to a second apparatus; the second database of the second device stores the image of the object meeting the preset condition.
  22. The apparatus of claim 21,
    the compression module is further used for compressing the image of the object stored in the first database to obtain first compressed data;
    the sending module is further configured to send the first compressed data to the second apparatus; the first compressed data is used by the second device to determine the second database.
  23. The apparatus of claim 21 or 22,
    the sending module is specifically configured to send the second compressed data and the marking information of the target area to the second device; wherein the marking information includes: location information of the target area in the first video image, and at least one of identification information or transformation information of the image of the object included in the target area in the first database; the transformation information is used to represent the difference between the image of the object in the target area stored in the first database and the first video image.
  24. The apparatus of any one of claims 21-23,
    the preset conditions include: in N video images before the first video image, the number of the video images including the object is greater than or equal to M, wherein M and N are positive integers, N is greater than 1, and M is less than N.
  25. The apparatus of claim 24, further comprising:
    the second determining module is used for identifying a target object which meets the preset condition in the first video image;
    and the storage management module is used for adding an image corresponding to a new target object in the target objects into the first database, wherein the new target object is an object which is not stored in the first database, and the first database is stored in the storage module.
  26. The apparatus of claim 25,
    the compression module is further used for compressing the image corresponding to the new target object to obtain third compressed data;
    the sending module is further configured to send the third compressed data to the second device.
  27. The apparatus of claim 25, wherein after the storage management module adds the image corresponding to the new one of the target objects to the first database,
    the compression module is further configured to compress the image of the object stored in the first database to obtain fourth compressed data;
    the sending module is further configured to send the fourth compressed data to the second device.
  28. The apparatus of claim 27,
    the storage management module is further configured to delete the image of the object that does not meet the preset condition and is stored in the first database.
  29. The apparatus of claim 27,
    the storage management module is further configured to replace the image of the first object stored in the first database with the image of the first object in the first video image when the sharpness of the image of the first object in the target area in the first video image is higher than the sharpness of the image of the first object stored in the first database.
  30. The apparatus of any one of claims 24-29,
    the first video image is a video image with a frame number larger than a preset frame number in a video file which is compressed and transmitted by the first device in real time;
    the acquisition module is further configured to acquire a second video image in the video file, where a frame number of the second video image in the video file is smaller than the preset frame number;
    the second determining module is further configured to identify an object in the second video image that meets the preset condition;
    the storage management module is further configured to store the object in the second video image, which meets the preset condition, in the first database.
  31. The apparatus of any one of claims 21-23,
    the preset conditions include: in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
  32. The apparatus of claim 31, further comprising:
    the third determining module is used for identifying objects which accord with preset conditions in all video images in the video file;
    and the storage management module is used for storing the image of the object meeting the preset condition into the first database.
  33. The apparatus of claim 31 or 32,
    the images of the objects stored in the first database include: a boundary pixel location of the object, and a frame number of the video image that includes the object in the video file.
  34. A video image processing apparatus characterized by comprising:
    the receiving module is used for receiving second compressed data sent by the first device; the second compressed data is obtained by compressing the area except the target area in the first video image;
    a decompression module, configured to decompress the second compressed data to obtain a third video image, where the third video image includes an image corresponding to an area other than the target area in the first video image;
    the acquisition module is used for acquiring an image corresponding to the target area from a second database of a second device;
    and the determining module is used for determining the first video image according to the third video image and the image corresponding to the target area.
  35. The apparatus of claim 34, further comprising: a storage management module;
    the receiving module is further configured to receive first compressed data sent by the first device;
    the decompression module is further configured to decompress the first compressed data to obtain an image set corresponding to an object meeting a preset condition, where the image set includes an image corresponding to the target region;
    the storage management module is used for storing the image set into the second database.
  36. The apparatus of claim 34 or 35,
    the receiving module is further configured to receive the marking information of the target area sent by the first device; wherein the marking information includes: location information of the target area in the first video image, and at least one of identification information or transformation information of the object included in the target area in a first database of the first device; the transformation information is used to represent the difference between the image of the object in the target area stored in the first database and the first video image.
  37. The apparatus of claim 36,
    the determining module is specifically configured to splice the image corresponding to the target area and the third video image according to the mark information of the target area, so as to obtain the first video image.
  38. The apparatus of any one of claims 34-37,
    the receiving module is further configured to receive third compressed data sent by the first apparatus;
    the decompression module is further configured to decompress the third compressed data to obtain the image of a new target object;
    the storage management module is further configured to add the image of the new target object to the second database.
  39. The apparatus of any one of claims 34-37,
    the receiving module is further configured to receive fourth compressed data sent by the first device;
    the decompression module is further configured to decompress the fourth compressed data to obtain an updated image set corresponding to the object meeting the preset condition;
    the storage management module is further configured to update the second database based on the updated image set corresponding to the object meeting the preset condition.
  40. The apparatus of claim 35,
    the preset conditions include: in N video images before the first video image, the number of the video images including the object is more than or equal to M, wherein M and N are positive integers, N is more than 1, and M is less than N;
    or, the preset conditions include: and in the video file where the first video image is located, the number of the video images including the object is greater than or equal to a preset number.
  41. A video image processing apparatus characterized by comprising: a processor and a transmission interface;
    the device communicates with other devices through the transmission interface;
    the processor is configured to read software instructions stored in the memory to implement the method of any one of claims 1-13 or 14-20.
  42. A computer-readable storage medium having stored therein instructions which, when executed by a computer or processor, cause the computer or processor to carry out the method of any one of claims 1 to 13 or 14 to 20.
  43. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method according to any one of claims 1 to 13 or 14 to 20.
CN202080101403.9A, filed 2020-05-26 (priority date 2020-05-26): Video image processing method and device. Published as CN115699725A; status: pending.

Applications Claiming Priority (1)
PCT/CN2020/092377 (published as WO2021237464A1), priority date 2020-05-26, filing date 2020-05-26: Video image processing method and device

Publications (1)
CN115699725A, published 2023-02-03

Family ID: 78745214

Family Applications (1)
CN202080101403.9A (CN115699725A, pending), filed 2020-05-26: Video image processing method and device

Country Status (2)
CN: CN115699725A
WO: WO2021237464A1


Also Published As
WO2021237464A1, published 2021-12-02


Legal Events
PB01: Publication
SE01: Entry into force of request for substantive examination