WO2021237464A1 - Video image processing method and device


Info

Publication number: WO2021237464A1
Authority: WIPO (PCT)
Prior art keywords: image, video image, video, database, compressed data
Application number: PCT/CN2020/092377
Other languages: French (fr), Chinese (zh)
Inventors: 吴更石, 郭栋, 张开明
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by: 华为技术有限公司
Priority to: PCT/CN2020/092377 (published as WO2021237464A1)
Priority to: CN202080101403.9A (published as CN115699725A)
Publication of: WO2021237464A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Definitions

  • This application relates to data processing technology, and in particular to a video image processing method and device.
  • Video compression is a technology for recompressing video files: a larger video file is compressed into a smaller compressed file for transmission or storage without affecting the video content. It is common in application scenarios that need to transmit or store video files, such as network video playback and surveillance video transmission.
  • Commonly used video compression protocols include H.264, also known as advanced video coding (AVC); H.265, also known as high efficiency video coding (HEVC); and H.266, also known as versatile video coding (VVC), the next-generation video coding standard.
  • In these video compression schemes, all the video images in a video file are divided into different image packages; for example, every 64 consecutive frames form one image package. When each frame of image in an image package is compressed, the frame is divided into image blocks of different sizes. If the similarity between an image block in the current image and an image block in another, already compressed image is high enough, the contents of the two image blocks in the two frames of images can be considered the same.
  • In that case, the already compressed image block can be used to represent the image block in the current image.
  • Only the area of the current image other than that image block then needs to be compressed, which reduces the amount of calculation when compressing the video file and improves compression efficiency.
  • However, each frame of video image in the video file still needs to be divided into blocks to obtain multiple image blocks, and the different image blocks must be compared to find similar ones. For areas of the video image where objects are densely distributed and boundaries are numerous, denser image blocks have to be set for identification and comparison, so the number of image blocks processed when each frame of video image is compressed is large, which lowers the time efficiency of compressing the video file and ultimately leads to lower efficiency of video image processing.
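  • As a non-authoritative illustration of the block comparison described above, the following minimal Python sketch splits a grayscale frame into fixed-size blocks and treats two blocks as matching when their mean absolute difference is small; the function names, the 16-pixel block size, and the threshold are assumptions for illustration, not part of this application.

```python
import numpy as np

def split_into_blocks(frame: np.ndarray, block: int = 16):
    """Split a grayscale frame (H x W) into non-overlapping block x block tiles."""
    h, w = frame.shape
    tiles = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tiles[(y, x)] = frame[y:y + block, x:x + block]
    return tiles

def blocks_match(a: np.ndarray, b: np.ndarray, max_mad: float = 4.0) -> bool:
    """Treat two blocks as having the same content when their mean absolute
    difference per pixel is below a threshold (a stand-in for the similarity test)."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).mean() <= max_mad

# A block of the current image that matches a block of an already compressed image
# can then be represented by a reference instead of being re-encoded.
cur = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
ref = cur.copy()
print(blocks_match(split_into_blocks(cur)[(0, 0)], split_into_blocks(ref)[(0, 0)]))  # True
```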
  • In view of this, the present application provides a video image processing method and device, which are applied to compressing video images, so as to solve the technical problem in the prior art that low compression efficiency when compressing video images results in low efficiency of video image processing.
  • The first aspect of the present application provides a video image processing method.
  • The execution subject is a first device that compresses video images. The first device recognizes, in a first video image, the target area of an object that is stored in a first database, compresses the area of the first video image other than the target area to obtain second compressed data, and sends the second compressed data to a second device.
  • The first device therefore does not need to compress the target area; it only compresses the area of the video image other than the target area of an object that already exists in the database.
  • The image of the object meeting the preset condition that is stored in the first database of the first device is also stored in a second database of the second device, so that after receiving the second compressed data, the second device can decompress it to obtain a third video image and combine that image with the stored image of the target area to finally obtain the first video image. In this embodiment, the first device therefore does not need to repeatedly compress the target area of an object that frequently appears in the video images.
  • Because the image of the target area is already stored in the second database of the second device, the first device only needs to compress the area outside the target area; the compression operation does not have to be repeated for a recurring target area, which reduces the data volume of the final compressed package and improves the efficiency with which the first device processes video images.
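  • A minimal sketch of the sender side described in this aspect, assuming grayscale frames held as NumPy arrays and using zlib as a stand-in for a real video codec; the box format and function name are illustrative assumptions rather than the application's own encoding.

```python
import zlib
import numpy as np

def compress_outside_targets(frame: np.ndarray, target_boxes) -> bytes:
    """Blank out the target areas (objects already held in both databases) and
    compress only the remaining area to produce the 'second compressed data'."""
    residual = frame.copy()
    for (y0, x0, y1, x1) in target_boxes:   # assumed (top, left, bottom, right) boxes
        residual[y0:y1, x0:x1] = 0          # the target area itself is not encoded
    return zlib.compress(residual.tobytes())

frame = np.random.randint(0, 256, (120, 160), dtype=np.uint8)
second_compressed_data = compress_outside_targets(frame, [(10, 10, 50, 60)])
print(len(second_compressed_data), "bytes of second compressed data")
```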
  • The image of the object meeting the preset condition that is stored in the second database of the second device may be sent by the first device.
  • Specifically, first compressed data obtained by compressing the images of the objects in the first database may be sent to the second device separately.
  • The first compressed data includes the compression result of the target area, so that after the second device receives the first compressed data, the image set obtained by decompressing it is stored in the second database.
  • In one implementation, the first device may send the first compressed data and the second compressed data to the second device at the same time, and the second device may decompress the first compressed data first, so that the second compressed data is decompressed only after the second database has been determined.
  • In another implementation, the first device sends the first compressed data to the second device, and after the second device decompresses the first compressed data and determines the second database,
  • the method described in the first aspect of the present application is executed to send the second compressed data to the second device.
  • Alternatively, after the first device sends the first compressed data to the second device, it can compress the area of the first video image other than the target area to obtain the second compressed data and send it to the second device.
  • The first device does not need to wait until the second device has decompressed the first compressed data and determined the second database before sending the second compressed data; that is, the process of the first device obtaining the second compressed data and the process of the second device decompressing the first compressed data to obtain the second database
  • can run in parallel.
  • The embodiment of the present application does not limit the order of the two processes.
  • In this way, the first device needs to compress the images of the objects in the first database only once; the obtained first compressed data is sent to the second device, so that the second device decompresses the first compressed data to determine
  • the second database. Subsequent video images processed by the first device then only require compression of the area outside the target area, which reduces the image size and the number of times the first device compresses the video images and thereby improves video image processing efficiency.
  • Since the first device compresses only the area of the first video image outside the target area, the second device has to combine the third video image obtained by decompressing the second compressed data with the image of the target area stored in the second database to obtain the first video image. To allow the second device to determine the positional relationship between the decompressed third video image and the target area in the first video image more quickly and accurately,
  • the first device, acting as the compression end, can synchronously send to the second device the marking information of the target area in the first video image, where the marking information includes at least one of: the position information of the target area in the first video image,
  • the identification information in the first database of the image of the object included in the target area, or transformation information. Therefore, in the video image processing method provided by this embodiment, the first device can also determine the marking information of the target area when determining the target area, and subsequently send the second compressed data and the marking information of the target area to the second device at the same time. This enables the second device to determine the target area in the first video image more quickly and accurately, and thus to determine the first video image faster after receiving the second compressed data, further improving the processing efficiency of the video image.
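  • The marking information can be pictured as a small record per target area; the following sketch uses illustrative field names, since the application itself does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TargetAreaMark:
    """Marking information for one target area (field names are illustrative)."""
    position: Tuple[int, int, int, int]  # location of the target area in the first video image
    object_id: Optional[str] = None      # identification of the object's image in the database
    transform: Optional[dict] = None     # difference between the stored image and its
                                         # appearance here, e.g. {"scale": 0.9, "dx": 3}

mark = TargetAreaMark(position=(10, 10, 50, 60), object_id="A", transform={"scale": 1.0})
```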
  • When the provided method is applied in a scene of real-time video image transmission, the preset condition includes: among the N video images before the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M ≤ N.
  • This embodiment can be applied to the scene in which the first device acquires the first video image in real time, compresses it, and sends it to the second device; the first device then handles the area of the first video image other than the target area as in the foregoing embodiment.
  • In this scene, the first database used for the first video image is obtained from the N video images preceding the first video image, and an object meets the preset
  • condition when the number of occurrences of the object in those N video images, or the number of those video images that include the object, is greater than or equal to M. The first database on which the first device determines the target area in this embodiment is therefore also obtained from the N images before the first video image, which keeps the first database up to date and allows the method to be applied in a scene where the first device compresses video images acquired in real time, improving the efficiency with which the first device processes real-time video images.
  • The first device may also update the first database based on the newly acquired first video image. After the first video image is added, the preset condition is judged over the first video image and the previous N-1 video images, a total of N video images.
  • If a new target object meets the preset condition, its image is added to the first database to update it, which keeps the target area determined by the first device up to date while it processes the video images.
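  • A hedged sketch of this sliding-window preset condition: the object labels seen in the last N frames are counted, and an object's image is added to the first database once it appears in at least M of them. The class and field names are assumptions made for illustration.

```python
from collections import deque, Counter

class SlidingWindowDatabase:
    """Track which objects were seen in the last N frames and admit an object's
    image into the (first) database once it appears in at least M of them."""
    def __init__(self, n: int, m: int):
        self.n, self.m = n, m
        self.window = deque(maxlen=n)  # one set of object labels per recent frame
        self.database = {}             # label -> stored image of the object

    def update(self, labels_in_frame, images_in_frame):
        self.window.append(set(labels_in_frame))
        counts = Counter(label for labels in self.window for label in labels)
        for label, image in images_in_frame.items():
            if counts[label] >= self.m and label not in self.database:
                self.database[label] = image  # new target object joins the database
```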
  • When the first device adds the image of a new target object to the first database, it can send the newly added image of the target object to the second device, so that the second device can update its stored second database and keep the first database and the second database consistent.
  • In one implementation manner, the first device may compress the image of the new target object and send the obtained third compressed data to the second device; or, in another implementation manner, the first device may compress the first database as a whole after the new target object has been added and send the obtained fourth compressed data to the second device.
  • On the second device side, the second database may then be updated, so that the updated second database stores the new target object.
  • For subsequent video images, the area that includes the new target object can be used as a target area without compression; after the second device receives the compressed data that does not include the target area and decompresses it, it obtains the image of the new target object from the second database and finally obtains the video image.
  • Besides adding the images of new objects to the first database, the first device may also delete the image of an object stored in the first database once that object no longer meets the preset condition, saving storage space in the first database and improving the utilization efficiency of the first device's storage space.
  • The first device may also replace the image of an object stored in the first database when the definition (sharpness) of that object in the target area of the first video image is relatively high.
  • Specifically, when the first device detects the target area in the first video image and determines that the definition of a first object included in the target area is better than the definition of the first object's image stored in the database, it replaces the stored image of the first object in the first database with the image of the first object from the first video image.
  • Updates made to the first database can also be compressed by the first device and sent to the second device, so that the second device updates the second database accordingly.
  • When the second device later restores the first video image using the objects in the second database, it can therefore obtain better definition, and the situation is avoided in which the definition of an object's image stored in the second database is inferior
  • to the definition of the actual object in the first video image, which would cause the target area in the first video image to appear unclear.
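  • The application does not fix a particular definition (sharpness) measure; the sketch below uses the variance of a simple Laplacian response as one possible stand-in and replaces the stored image only when the newly observed image scores higher.

```python
import numpy as np

def sharpness(img: np.ndarray) -> float:
    """Variance of a 5-point Laplacian response; higher means sharper (illustrative metric)."""
    f = img.astype(np.float64)
    lap = -4 * f[1:-1, 1:-1] + f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
    return float(lap.var())

def maybe_replace(database: dict, object_id: str, new_image: np.ndarray) -> None:
    """Replace the stored object image only when the newly observed one is sharper."""
    stored = database.get(object_id)
    if stored is None or sharpness(new_image) > sharpness(stored):
        database[object_id] = new_image
```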
  • When the provided method is applied in a scene of real-time video image transmission, or when no images are stored yet in the first database of the first device (for example, for a video file transmitted in real time), the first device cannot determine the target area based on the first database when it acquires the first video images. Therefore, to ensure the completeness of the first device's processing of the video images, when the first device acquires a video image whose frame number is smaller than a preset frame number, it can encode that video image as a whole, determine the objects that meet the preset condition based on this portion of video images with frame numbers below the preset frame number, and store them in the first database. After the first device has established the first database from the video images below the preset frame number, it can, upon receiving a first video image whose frame number is larger than the preset frame number, execute the video image processing method of the foregoing embodiments of the present application.
  • When the provided method is applied in a scene of non-real-time video image transmission, the preset condition may be that, in the entire video file, the number of video images that include the object is greater than or equal to a preset number. The first database on which the first device determines the target area in this embodiment is therefore obtained from all the images in the video file, which ensures the completeness of the first database and guarantees that every object added to it meets the preset condition; this can be applied in a scene where the first device compresses a non-real-time video file containing the first video image and improves the processing efficiency of the first device for the video images.
  • In the scene of non-real-time video image transmission, the first database may likewise be obtained by the first device from the video images themselves.
  • Before transmitting the video file, the first device may first identify, across all the video images of the video file, the images of the objects that meet the preset condition and store them in the first database, and then process each video image in the video file as the above-mentioned first video image.
  • As an alternative to directly storing the image of an object that meets the preset condition on the second device, note that before the first device determines that an object meets the preset condition, it does not recognize that object's area as a target area,
  • so the second compressed data sent by the first device to the second device already includes the image of the object. Therefore, in order to save the amount of data transmitted between the first device and the second device, the first device can replace the object image it would otherwise send with at least one of the boundary pixel positions of the target area or the frame number, so that after receiving this information the second device can obtain the image of the object by itself from the boundary pixel positions in the video image with the corresponding frame number.
  • In the video image processing method provided by this embodiment, when the first device sends the images of the objects in the first database to the second device, it can thus send only the boundary pixel positions and frame numbers of the object images, allowing the second device to obtain the target areas from video images it has already received. This reduces the amount of data actually sent when the first device transmits the first video image to the second device, makes compression on the first device and decompression on the second device faster, and further improves the processing efficiency of video images.
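  • On the receiving side this reference mechanism might look like the following sketch, which crops the object out of a frame the second device has already decoded, given only the frame number and the boundary pixel positions; a bounding-box crop stands in for an exact boundary mask, and all names are illustrative.

```python
import numpy as np

def object_from_reference(decoded_frames: dict, frame_number: int, boundary_pixels) -> np.ndarray:
    """Rebuild an object's image from an already decoded frame using only the
    frame number and the object's boundary pixel positions (nothing is resent)."""
    frame = decoded_frames[frame_number]
    ys = [y for y, _ in boundary_pixels]
    xs = [x for _, x in boundary_pixels]
    # A plain bounding-box crop; pixels outside the exact boundary could also be masked out.
    return frame[min(ys):max(ys) + 1, min(xs):max(xs) + 1].copy()
```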
  • A second aspect of the present application provides a video image processing method.
  • The execution subject is a second device that receives compressed video files. The second device decompresses the received second compressed data to obtain a video image that does not include the target area,
  • and determines the image of the object in the target area from the second database; after the two are stitched together, the first video image is obtained. Therefore, when the target area appears in different video images, the second device only needs to decompress the first compressed data once to obtain the image of the object in the target area; it does not need to decompress the target area of the other video images, since it can be retrieved directly from the second database. Because the target area included in the video image is absent from the data being decompressed, the amount of computation during decompression is reduced, and the efficiency of video image processing by the second device is also improved.
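  • A minimal receiver-side sketch, continuing the assumptions used above (NumPy frames, zlib as a stand-in codec, and the illustrative TargetAreaMark record): decompress the third video image and paste the stored object images back in at the marked positions.

```python
import zlib
import numpy as np

def restore_first_video_image(second_compressed_data: bytes, shape, marks, second_database):
    """Decompress the area outside the target areas and stitch the stored object
    images back in; any transformation information is ignored in this sketch."""
    third = np.frombuffer(zlib.decompress(second_compressed_data), dtype=np.uint8)
    third = third.reshape(shape).copy()
    for mark in marks:                                   # TargetAreaMark-style records
        y0, x0, y1, x1 = mark.position
        obj = second_database[mark.object_id]
        third[y0:y1, x0:x1] = obj[: y1 - y0, : x1 - x0]  # simple paste at the marked position
    return third
```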
  • The image of the object meeting the preset condition that is stored in the second database of the second device may be sent by the first device.
  • Specifically, the image set obtained by decompressing the first compressed data is stored in the second database. In this embodiment, the first device therefore compresses the images of the objects in the first database only once and sends the obtained first compressed data to the second device, so that the second device decompresses the first compressed data to determine
  • the second database. Subsequently the second device only needs to decompress the second compressed data when processing a video image, rather than repeatedly decompressing the objects in the target area, which reduces the file size and the number of decompression operations on the second device and thereby improves the processing efficiency of the video image.
  • When sending the second compressed data, the first device, acting as the compression end, may synchronously send to the second device the marking information of the target area in the first video image, where the marking information includes at least one of the position information of the target area in the first video image,
  • the identification information in the first database of the image of the object included in the target area, or transformation information. For the second device, after receiving the second compressed data and the marking information of the target area, it can determine the target area in the first video image more quickly and accurately, and thus
  • determine the first video image faster and more accurately, which further improves the processing efficiency of the video image.
  • As in the first aspect, instead of directly storing the image of an object that meets the preset condition on the second device, note that before the first device determines that an object meets the preset condition it does not recognize the object's area as a target area, so the second compressed data sent by the first device to the second device already includes the image of the object. Therefore, to save the amount of data transmitted between the first device and the second device, the first device can replace the object image it would otherwise send with at least one of the boundary pixel positions of the target area or the frame number, so that after receiving this information the second device can obtain the image of the object by itself from the boundary pixel positions in the video image with the corresponding frame number.
  • In the video image processing method provided by this embodiment, when the first device sends the images of the objects in the first database to the second device, it can thus send only the boundary pixel positions and frame numbers of the object images, allowing the second device to obtain the target areas from video images it has already received. This reduces the amount of data actually sent when the first device transmits the first video image to the second device, makes compression on the first device and decompression on the second device faster, and further improves the processing efficiency of video images.
  • When the provided method is applied in a scene of real-time video image transmission and the first device updates the first database based on a newly acquired first video image, the first device, upon adding the image of a new target object to the first database, can compress the newly added image to obtain third compressed data, or compress the first database as a whole after the image of the new target object has been added to obtain fourth compressed data, and send the obtained third compressed data or fourth compressed data to the second device.
  • The second device can then update its stored second database according to the third compressed data or the fourth compressed data to maintain consistency between the first database and the second database.
  • For subsequent video images, the area that includes the new target object can be used as a target area without compression; after the second device receives the compressed data that does not include the target area, it decompresses it, obtains the image of the new target object from the second database, and finally obtains the complete video image.
  • When the provided method is applied to a scene of real-time video image transmission, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M ≤ N; when the provided method is applied to a scene of non-real-time video image transmission, the preset condition may be that, in the entire video file, the number of video images including the object is greater than or equal to a preset number.
  • The third aspect of the present application provides a video image processing device, which can be used as a first device for executing the video image processing method according to any implementation of the first aspect of the present application. The device includes: an acquisition module, a first determining module, a compression module, and a sending module.
  • the obtaining module is used to obtain the first video image; the first determining module is used to determine the target area in the first video image; the target area includes the image of the object that meets the preset condition stored in the first database of the first device;
  • the compression module is used to compress the area except the target area in the first video image to obtain the second compressed data;
  • The sending module is used to send the second compressed data to the second device; the second database of the second device stores an image of an object that meets the preset condition.
  • the compression module is further configured to compress the image of the object stored in the first database to obtain the first compressed data; the sending module is further configured to send the first compressed data to the second device; The first compressed data is used by the second device to determine the second database.
  • The sending module is specifically configured to send the second compressed data and the marking information of the target area to the second device; the marking information includes at least one of: position information of the target area in the first video image, or identification information or transformation information, in the first database, of the image of the object included in the target area; the transformation information is used to indicate the difference between the image of the object in the first database and the object in the target area of the first video image.
  • The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M ≤ N.
  • the device further includes: a second determining module and a storage management module;
  • The second determining module is used to identify the target objects in the first video image that meet the preset condition; the storage management module is used to add, to the first database, the image corresponding to a new target object among the target objects, where the new target object is a target object whose image is not yet stored in the first database.
  • the compression module is further configured to compress the image corresponding to the new target object to obtain third compressed data; the sending module is also configured to send the third compressed data to the second device.
  • The compression module is further configured to compress the images of the objects stored in the first database to obtain fourth compressed data; the sending module is further configured to send the fourth compressed data to the second device.
  • the storage management module is further configured to delete the image of the object that does not meet the preset condition stored in the first database.
  • The storage management module is further configured to: when the definition of the first object in the target area of the first video image is better than the definition of the image of the first object stored in the first database, replace the image of the first object stored in the first database with the image of the first object in the first video image.
  • the first video image is a video image with a frame number greater than a preset frame number in a video file that is being compressed and transmitted by the first device in real time;
  • The acquisition module is further used to acquire a second video image, where the frame number of the second video image in the to-be-processed video file is less than the preset frame number;
  • the second determining module is also used to identify objects in the second video image that meet the preset conditions;
  • The storage management module is further used to store, in the first database, the images of the objects in the second video image that meet the preset condition.
  • the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.
  • the device further includes: a third determining module; wherein, the third determining module is configured to identify objects that meet preset conditions in all video images in the video file; and the storage management module is configured to The image of the object that meets the preset condition is stored in the first database.
  • the image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.
  • the fourth aspect of the present application provides a video image processing device, which can be used as a second device to execute any video image processing method as in the second aspect of the present application.
  • The device includes: a receiving module, a decompression module, an acquisition module, and a determining module. The receiving module is used to receive the second compressed data sent by the first device, where the second compressed data is obtained by compressing the area other than the target area in the first video image; the decompression module is used to decompress the second compressed data to obtain a third video image.
  • The third video image includes an image corresponding to the area other than the target area in the first video image; the acquisition module is used to acquire the image corresponding to the target area from the second database of the second device; the determining module is used to determine the first video image according to the third video image and the image corresponding to the target area.
  • The device further includes: a storage management module. The receiving module is further configured to receive the first compressed data sent by the first device; the decompression module is further configured to decompress the first compressed data to obtain an image set corresponding to the objects that meet the preset condition, the image set including the image corresponding to the target area; the storage management module is used to store the image set in the second database.
  • The receiving module is further configured to receive the marking information of the target area sent by the first device; the marking information includes at least one of: position information of the target area in the first video image, or identification information or transformation information, in the first database of the first device, of the object included in the target area; the transformation information is used to indicate the difference between the image of the object in the first database and the object in the target area of the first video image.
  • the determining module is specifically configured to stitch the image corresponding to the target area and the third video image to obtain the first video image according to the marking information of the target area.
  • The receiving module is further configured to receive the third compressed data sent by the first device; the decompression module is further configured to decompress the third compressed data to obtain the image of the new target object;
  • the storage management module is also used to add the image of the new target object to the second database.
  • The receiving module is further configured to receive the fourth compressed data sent by the first device; the decompression module is further configured to decompress the fourth compressed data to obtain the updated image set corresponding to the objects that meet the preset condition; the storage management module is further configured to update the second database based on the updated image set corresponding to the objects that meet the preset condition.
  • The preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M ≤ N; or, the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to a preset number.
  • A fifth aspect of the present application provides a video image processing device, including a processor and a transmission interface. The device communicates with other devices through the transmission interface, and the processor is configured to read software instructions stored in a memory to implement the method described in any one of the first aspect of the present application.
  • A sixth aspect of the present application provides a video image processing device, including a processor and a transmission interface. The device communicates with other devices through the transmission interface, and the processor is configured to read software instructions stored in a memory to implement the method described in any one of the second aspect of the present application.
  • A seventh aspect of the present application provides a computer-readable storage medium that stores instructions. When the instructions are executed by a computer or a processor, the computer or the processor implements the method of any one of the first aspect.
  • An eighth aspect of the present application provides a computer-readable storage medium that stores instructions. When the instructions are executed by a computer or a processor, the computer or the processor implements the method of any one of the second aspect.
  • A ninth aspect of the present application provides a computer program product containing instructions. When the instructions run on a computer or a processor, the computer or the processor is caused to perform the method described in any one of the first aspect of the present application.
  • A tenth aspect of the present application provides a computer program product containing instructions. When the instructions run on a computer or a processor, the computer or the processor is caused to perform the method described in any one of the second aspect of the present application.
  • Figure 1 is a schematic diagram of the application scenario of this application.
  • Figure 2 is a schematic diagram of a video compression technology
  • Figure 3 is a schematic diagram of a video image divided into image blocks
  • FIG. 4 is a schematic flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 5 is a schematic diagram of the database setting method provided by this application.
  • FIG. 6 is a schematic diagram of an object in a video image provided by this application.
  • FIG. 7 is a schematic flowchart of another embodiment of a video image processing method provided by this application.
  • FIG. 8 is an exemplary flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 9 is an exemplary flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 10 is an exemplary flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 11 is an exemplary flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 12 is an exemplary flowchart of an embodiment of a video image processing method provided by this application.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • FIG. 14 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • FIG. 15 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • FIG. 16 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • FIG. 17 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • FIG. 18 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • Figure 1 is a schematic diagram of the application scenario of this application. This application is applied to the scenario of video file transmission between different devices.
  • the device described in Figure 1 can be a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a server.
  • The embodiments of the present application can be executed by a device as described in Figure 1, or by the processor of such a device (for example, a central processing unit (CPU) or a graphics processing unit (GPU)), etc.
  • Execution by the devices in Figure 1 is taken as an example. As shown in Figure 1, the first device 10 and the second device 20 have a communication connection, through which the first device 10 can send the video file 30 to the second device 20. According to the timeliness requirement, the transmitted video file 30 can be divided into real-time video files and non-real-time video files.
  • When applied to a non-real-time transmission scene, the video file can be, for example, a movie file. Before sending the movie file to the second device 20, the first device 10 compresses it at time T1 to obtain the data packet 40 and sends the compressed data packet 40 to the second device 20; after the second device 20 receives the data packet 40 at time T2 and decompresses it, the entire movie file is obtained.
  • When applied to a real-time transmission scene, the video file can be a time-sensitive file such as a surveillance picture or a TV picture. In this case, the first device cannot obtain every frame of video image of the complete video file in advance; it needs to transmit the latest video image it has currently obtained. The first device 10 compresses the video image at time T1 to obtain the data packet 40 and sends it to the second device 20, and the second device 20 receives and decompresses the data packet 40 at time T2. Since the first device only needs to compress one frame of video image, the time interval between T1 and T2 is small. The first device 10 and the second device 20 then continue to repeat this process, so that the first device 10 can send the latest video images of video files such as surveillance pictures and TV pictures to the second device 20 in real time.
  • A video file is composed of continuous video images. As the number of video images included in a video file increases and each video image itself has a higher resolution, the overall data volume of the video file grows greatly. Therefore, when video files are transmitted between the devices shown in FIG. 1 over limited communication resources, the video files need to be compressed with high quality, so that larger video files can be compressed into smaller compressed files for transmission and storage. In the process of compressing video files, it is necessary not only to reduce the amount of data of the transmitted video files, but also to ensure that the opposite end can completely restore the video files from the compressed files.
  • FIG. 2 is a schematic diagram of a video compression technology, where the execution subject may be the first device 10 shown in FIG. 1. The first device 10 first divides all the video images in the video file into different image packages.
  • For example, every 64 video images in the video file are taken as one image package, yielding image packages 1-64, 65-128, 129-256, and so on.
  • Within each image package, some of the frames are selected as key frames. For example, for image package 1-64, the 1st and 64th frames can be used as key frames and be compression-encoded as a whole, while the video images from the 2nd to the 63rd frame can be used as non-key frames.
  • Each video image is divided into different image blocks according to the distribution of objects in the video image and the density of their boundaries, and the image blocks of the non-key frames are compared with the image blocks in the key frames.
  • Specifically, the first device 10 first compression-encodes the key frames in the video file. If an image block in the non-key frame currently being compressed includes object A, and its similarity to the image block including object A in a key frame that has already been compression-encoded is relatively high, the two image blocks in the two frames of video images can be considered similar. The already compression-encoded image block in the key frame can then be used to represent the similar image block in the current non-key frame, and only the area of the current non-key frame outside the image blocks represented by the key-frame image blocks needs to be compression-encoded.
  • Similarly, the image blocks including object B can be compared within their image package, and the image blocks including object C can be compared among the video images in image package 129-256, thereby reducing the amount of calculation when compressing the video file and improving compression efficiency.
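  • The grouping into image packages with first/last key frames can be sketched as follows; the grouping itself is conventional, and the package size of 64 simply mirrors the example above.

```python
def split_into_image_packages(num_frames: int, package_size: int = 64):
    """Partition frames 1..num_frames into image packages and pick the first and
    last frame of each package as its key frames, as in the Figure 2 example."""
    packages = []
    for start in range(1, num_frames + 1, package_size):
        end = min(start + package_size - 1, num_frames)
        packages.append({"frames": (start, end), "key_frames": (start, end)})
    return packages

print(split_into_image_packages(200)[:2])
# [{'frames': (1, 64), 'key_frames': (1, 64)}, {'frames': (65, 128), 'key_frames': (65, 128)}]
```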
  • Figure 3 is a schematic diagram of a video image divided into image blocks. As shown in Figure 3, in the image on the left, a large number of objects are distributed at the four corners, especially the upper left corner, so there is more object boundary information to be processed there.
  • Correspondingly, the four corners with more objects are divided into a larger number of image blocks, while the middle area with fewer objects is divided into a smaller number of image blocks. When the non-key frame in Figure 3 is subsequently compared with the key frames that have already been compression-encoded, the densely distributed objects in the non-key frame can thus be divided into smaller image blocks and compared with the key frames. The smaller an image block is, the more precise the boundary information it contains and the more accurately the image at the corresponding position can be compared with the key frame, so higher accuracy can be achieved when comparing with the image blocks in the key frame.
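  • One common way to realize such density-dependent block sizes is a quadtree-style subdivision, sketched below under stated assumptions (gradient magnitude as the edge-density measure, illustrative thresholds); the application itself does not prescribe this particular scheme.

```python
import numpy as np

def subdivide(block: np.ndarray, origin=(0, 0), min_size: int = 8, edge_thresh: float = 12.0):
    """Recursively split a region into quadrants while its edge density stays high,
    so busy corners end up with many small blocks and flat areas keep large ones."""
    h, w = block.shape
    gy, gx = np.gradient(block.astype(np.float64))
    density = float(np.hypot(gx, gy).mean())
    if density < edge_thresh or min(h, w) <= min_size:
        return [(origin, (h, w))]                      # keep this region as one block
    y, x = origin
    h2, w2 = h // 2, w // 2
    quads = [((y, x), block[:h2, :w2]), ((y, x + w2), block[:h2, w2:]),
             ((y + h2, x), block[h2:, :w2]), ((y + h2, x + w2), block[h2:, w2:])]
    leaves = []
    for org, sub in quads:
        leaves += subdivide(sub, org, min_size, edge_thresh)
    return leaves

frame = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # noisy, so it splits heavily
print(len(subdivide(frame)), "blocks")
```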
  • In view of this, this application provides a video image processing method and device that are used, during video file compression, to separately extract, compare, and compress the objects included in the video images of the video file that meet certain preset conditions. In this way, when a video file is compressed, this part of the objects is compressed only once, and when each frame of video image is compressed, only the area other than these objects needs to be processed, which improves the compression efficiency for each frame of video image and thereby improves the efficiency of compressing the video file.
  • FIG. 4 is a schematic flowchart of an embodiment of a video image processing method provided by this application.
  • The method shown in FIG. 4 can be applied to the real-time transmission scene or the non-real-time transmission scene shown in FIG. 1 and is executed by the first device and the second device.
  • The video image processing method provided in this embodiment includes:
  • S101 The first device acquires a first video image.
  • In S101, a first video image to be sent is acquired.
  • The first video image may be a frame of video image in a non-real-time video file sent by the first device to the second device, or the first video image may also be a real-time video image sent by the first device to the second device.
  • S102 The first device determines a target area in the first video image; the target area includes an image of an object that meets a preset condition stored in a first database of the first device.
  • FIG. 5 is a schematic diagram of a database setting method provided by this application, in which a first database is provided in the first device, and a second database is provided in the second device.
  • the first database in the first device may be used to store images of objects meeting preset conditions.
  • the preset condition when applied in a non-real-time transmission scenario, may be that among all the video images of the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number;
  • When applied in a real-time transmission scenario, the preset condition may be that among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M ≤ N.
  • Fig. 6 is a schematic diagram of an object in a video image provided by this application.
  • The first video image includes at least electric vehicle a, cyclist b, passer-by c, vehicle d, and passer-by e.
  • These objects are objects that may move in the video image; apart from the above objects, the other parts are static and can be regarded as the background of the video image.
  • the first database of the first device stores images of two objects, electric vehicle a and cyclist b
  • Then the first device can determine in this step that the target area in the first video image is the area where the images of the two objects, electric vehicle a and cyclist b, are located in the figure.
  • S103 Compress an area other than the target area in the first video image to obtain second compressed data.
  • After determining the target area in the first video image in S102, the first device compresses only the area in the first video image excluding the target area, and the obtained compressed data is recorded as the second compressed data.
  • In this embodiment, the method for compressing the first video image is not limited; it may be performed by compression encoding.
  • S104 The first device sends the second compressed data to the second device.
  • The first device sends the second compressed data obtained in S103 to the second device; correspondingly, the second device receives the second compressed data sent by the first device.
  • Optionally, in S104 the first device may also send to the second device the marking information of the target area in the first video image, so that the second device can determine the target area in the first video image according to the marking information of the target area.
  • S105 The second device decompresses the second compressed data to obtain a third video image.
  • the second device decompresses the second compressed data, and the obtained video image is recorded as the third video image.
  • the third video image is an image corresponding to an area other than the target area sent by the first device.
  • S106 The second device obtains an image corresponding to the target area from the second database.
  • A second database is set in the second device. The second database can be used to store images of objects that meet the preset conditions and can store the images of the same objects as the first database. The images of the objects stored in the second database may be preset and stored in advance, or may be sent by the first device to the second device in real time according to the first database and then stored in the second database.
  • Specifically, the second device may obtain the image corresponding to the target area from the second database according to the marking information of the target area received with the second compressed data.
  • S107 The second device determines the first video image according to the third video image determined in S105 and the image of the target area acquired in S106, which finally completes the process of sending the first video image from the first device to the second device.
  • For example, if the target area in the first video image consists of the images of the two objects electric vehicle a and cyclist b in the figure shown in FIG. 6, the third video image obtained by decompression by the second device in S105 covers the area of FIG. 6 that does not include those two objects, and the image of the target area obtained in S106 consists of the images of electric vehicle a and cyclist b. In S107, the second device can then add the images of electric vehicle a and cyclist b at the corresponding positions in the third video image to obtain the complete first video image.
  • When applied in a real-time transmission scenario, the first video image may be a real-time video image acquired by the first device.
  • When applied in a non-real-time transmission scenario, the first video image can be understood as any one video image in the video file.
  • S101-S103 in Figure 4 show the processing of any one video image in the video file: the first device processes each frame of video image in the video file through S101-S103 and then sends the second compressed data of all the video images to the second device, and the second device processes each frame of video image in the video file through S105-S107 and finally obtains all the video images in the video file.
  • In summary, in this embodiment, before sending the first video image to the second device, the first device first recognizes, in the first video image, the target area of an object stored in the first database, compresses the area of the first video image other than the target area to obtain the second compressed data, and sends it to the second device. The image of the object meeting the preset conditions that is stored in the first database of the first device has also been stored in the second database of the second device, so that after receiving the second compressed data, the second device can decompress it to obtain the third video image and combine it with the image of the target area stored in the second database to finally obtain the first video image.
  • Therefore, when the first device sends the first video image to the second device, the first device does not need to repeatedly compress the target area of an object that frequently appears in the video images, which reduces the amount of compressed data; and the second device does not need to repeatedly decompress the target area, which reduces the amount of decompressed data. The data volume of the compressed packets transmitted between the first device and the second device is thereby reduced, improving the efficiency of video image processing.
  • FIG. 7 is a schematic flowchart of another embodiment of the video image processing method provided by this application.
  • The method shown in FIG. 7 can be applied to the real-time transmission scene or the non-real-time transmission scene shown in FIG. 1 and is executed by the first device and the second device, and the method shown in FIG. 7 is executed before S101 of the method shown in FIG. 4.
  • the video image processing method provided in this embodiment includes:
  • S201 The first device compresses the image of the object stored in the first database to obtain first compressed data.
  • The first device may compress the entire first database after determining the images of the objects that meet the preset condition and storing them in the first database, and the obtained compressed data is recorded as the first compressed data. For example, assuming that the images of the two objects electric vehicle a and cyclist b shown in FIG. 6 are stored in the first database, in S201 the first device compresses the images of these objects in the first database to obtain the first compressed data.
  • S202 The first device sends the first compressed data to the second device.
  • the first device sends the first compressed data obtained in S201 to the second device, and for the second device, receives the first compressed data sent by the first device.
  • S203 The second device decompresses the first compressed data to obtain a corresponding image set and store it in the second database.
  • The second device decompresses the first compressed data; the obtained images, which include multiple objects, are recorded as an image set, and the image set is stored in the second database for use when the embodiment shown in FIG. 4 is subsequently executed.
  • In this embodiment, the first device sends the images of the objects in the first database to the second device: the first device compresses the images of the objects included in the first database only once to obtain the first compressed data and sends it to the second device, and the second device decompresses the first compressed data to determine the second database. Subsequent video images processed by the first device then only require compression of the area outside the target area,
  • which reduces the size and number of times the video images are compressed by the first device,
  • and also reduces the size and number of times the video images are decompressed by the second device, further improving the processing efficiency of the video images.
  • the first device 10 may send a complete video file as a whole to the second device 20
  • the first device 10 may first compress the video file, and after obtaining a compressed package with a smaller amount of data, send the compressed package with a smaller amount of data to the second device 20 to save communication resources.
  • FIG. 8 is a schematic flow chart of an exemplary embodiment of the video image processing method provided by this application, and shows the processing flow when the entire video file is compressed.
  • the first device 10 as the execution subject first obtains the to-be-processed video file 101, where the to-be-processed video file 101 specifically includes N continuous video images, and N is greater than 1.
  • According to the sequence of the N video images in the video file 101, the video images in the video file 101 are marked as 1, 2, ..., N.
  • the label can also be called the frame number.
  • the to-be-processed video file 101 may be specified by the user of the first device 10, or captured by the first device 10, or acquired by the first device 10 through the Internet.
  • The to-be-processed video file 101 can then be compressed through the embodiment shown in FIG. 4, or it can be compressed through the embodiment described here before the first device 10 determines that the video file 101 is to be sent to the second device 20.
  • After acquiring the video file 101, the first device 10 first uses the first machine learning model 102 to identify the objects included in all N video images of the video file 101 and determines at least one object included in the N video images that meets the preset condition. Here, the objects are the objects other than the background in the video images.
  • For example, after the first machine learning model 102 set in the first device 10 recognizes the video image on the left side of Fig. 6, the recognition result on the right side of Fig. 6 can be obtained, in which the objects a-e in the video image are recognized. After the first machine learning model 102 has recognized all N video images of the entire video file, the objects included in each of the N video images can be identified.
  • The first device 10 can then screen all the objects in all N video images together according to the recognition result of the first machine learning model 102, and store the images of the objects among the N video images that meet the preset conditions in the first database 103 set in the first device 10.
  • When an object appears in multiple video images, the higher the resolution of its image, the higher the definition, so the image of the object with the highest resolution among the multiple video images can be stored in the first database.
  • The first machine learning model 102 provided in this embodiment may be a convolutional neural network (CNN) model, for example an object recognition model applicable to images such as AlexNet, ResNet, or Inception v3.
  • the preset condition described in this embodiment may be that the number of times the object appears in the N video images of the entire video file is greater than the preset number of times.
  • For example, the preset number of times may be 10 (with N > 10).
  • If the first machine learning model 102 recognizes that, among the N video images of the video file, there are more than 10 video images including the passer-by c, that is, passer-by c appears more than 10 times in the N video images of the entire video file, then the image of passer-by c can be stored in the first database 103.
  • the first device 10 can store all the images of the objects that meet the preset conditions included in the N video images of the video file in the first database 103 according to the recognition result of the first machine learning model.
  • The images of different objects may be stored separately in the first database 103; the boundaries of these images are the boundaries of the objects themselves, and they contain no other information, such as background, beyond the objects.
  • the object passer-by c in FIG. 5 is stored in the first database 103 as the image divided by the border of passer-by c in the video image on the left side of FIG. 5, and does not include other objects or backgrounds except passer-by c.
  • Alternatively, since the first device 10 compresses the entire video file, the first database 103 may store, for each identified object, only the frame number of a video image in the video file that contains the image of the object and the position information of the object's boundary pixels; the image of the object can then be recovered later from that frame number and those boundary pixel positions. For example, after identifying an object in the video file that meets the preset conditions, it is determined that the image of the object appears in the upper left corner of the 10th frame of video image in the video file; the first database 103 may then store data in a format such as "10, (a, b, c...)", indicating that the object image is in the 10th frame of the video file and that the boundary of the object in the 10th frame lies at the pixel positions (a, b, c...). A rough sketch of this bookkeeping is given below.
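  • The following Python sketch illustrates this reference-only storage: each first-database entry keeps only a frame number and a list of boundary pixel positions, and the object image is cropped back out of the referenced frame on demand. The class and helper names, the toy list-of-lists image type, and the bounding-box crop are illustrative assumptions, not part of the claimed method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]          # (row, col) of one boundary pixel
Image = List[List[int]]          # toy grayscale frame: rows of pixel values


@dataclass
class ObjectRecord:
    """One first-database entry, e.g. the '10, (a, b, c...)' format in the text."""
    frame_number: int                 # frame of the video file that contains the object
    boundary_pixels: List[Pixel]      # pixel positions of the object's boundary


class FirstDatabase:
    """Stores only references (frame number + boundary), not the cropped pixels."""

    def __init__(self) -> None:
        self._records: Dict[str, ObjectRecord] = {}

    def add(self, object_id: str, frame_number: int, boundary: List[Pixel]) -> None:
        self._records[object_id] = ObjectRecord(frame_number, boundary)

    def crop_object(self, object_id: str, frames: List[Image]) -> Image:
        """Recover the object image later from the referenced frame."""
        rec = self._records[object_id]
        frame = frames[rec.frame_number - 1]          # frames are numbered 1..N
        rows = [p[0] for p in rec.boundary_pixels]
        cols = [p[1] for p in rec.boundary_pixels]
        # For simplicity, crop the bounding box spanned by the boundary pixels.
        return [row[min(cols):max(cols) + 1] for row in frame[min(rows):max(rows) + 1]]
```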
  • After the first device 10 has stored all the objects among the N video images of the video file that meet the preset conditions into the first database 103, it further processes the N video images of the video file in turn through the second machine learning model 105. The video image currently being processed by the second machine learning model is recorded as the first video image. The second machine learning model 105 compares the images of the objects already stored in the first database 103 with the first video image, to determine at least one object included in the first video image being processed and the area where the image of each such object is located; the area where the at least one object is located is recorded as the target area.
  • For example, if the first database 103 stores images of a total of 26 objects labeled A–Z, then after the second machine learning model obtains the first video image of the video file, it compares the first video image with the images in the first database; if it determines that the currently processed first video image includes the objects labeled A and B in the first database, the areas of the first video image where the objects labeled A and B are located are marked as the A target area and the B target area.
  • The second machine learning model 105 and the first machine learning model 102 described in the embodiments of the present application may be the same kind of machine learning model, for example both may be CNN-type neural network models, or they may be different. A difference between the two is that the second machine learning model 105 has the images of the identified objects in the first database as prior information; the recognition performed by the second machine learning model 105 can therefore be understood as image comparison, whereas the recognition performed by the first machine learning model 102 is to extract objects from a new video image. Since the second machine learning model 105 requires less computation, it can be set as a more lightweight model, so that after the two machine learning models are set in the first device 10 for recognition and comparison respectively, the overall processing efficiency can be improved.
  • In this embodiment, the second machine learning model 105 sequentially takes each video image of the video file 101 as the above-mentioned first video image and compares each of them with the objects in the first database 103 to determine the target areas in each video image. The first device 10 then compresses the areas other than the target areas in the N video images and obtains the compressed data of the N video images of the video file 101, which is recorded as the second compressed data 106. It is understandable that although the second compressed data 106 includes compressed data of N video images, for each of the N video images that includes a target area, the included target area is "cropped" and only the part outside the target area is kept; as a result, the compressed video images in the second compressed data 106 are not complete, and each such video image lacks the target area containing the image of an object in the first database. A sketch of this per-frame flow is given below.
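  • The per-frame processing just described can be summarized by the following hedged Python sketch: the areas matching objects already in the first database are blanked out of the frame, and only the remainder is handed to an encoder. The `match_objects` and `encode` callables stand in for the second machine learning model 105 and for whichever compression protocol is used; their signatures, the rectangular region representation, and the `hole_value` placeholder are assumptions made only to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]   # (top, left, bottom, right) of a target area


def compress_frame(
    frame: Image,
    match_objects: Callable[[Image], Dict[str, Region]],  # stand-in for model 105
    encode: Callable[[Image], bytes],                      # stand-in for the codec
    hole_value: int = 0,
) -> Tuple[bytes, Dict[str, Region]]:
    """Crop the target areas out of the frame and encode only the remainder."""
    target_areas = match_objects(frame)     # first-database objects found in the frame
    cropped = [row[:] for row in frame]     # copy so the original frame is untouched
    for top, left, bottom, right in target_areas.values():
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                cropped[r][c] = hole_value  # "cut" the target area
    # The encoded remainder contributes to the second compressed data; the
    # returned target_areas are the basis for the marks discussed next.
    return encode(cropped), target_areas
```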
  • Since the generated second compressed data 106 lacks the target areas of some video images, which is equivalent to "cropping" out the images of the first-database objects included in those video images, it is necessary to identify which object in the first database each "cropped" area corresponds to and where that object was located in the video image. Therefore, the first device 10 also marks each video image for the at least one target area it includes, and these marks can be carried with the corresponding video image, so that when the video image is subsequently decompressed, information such as the location of the target area in the video image can be determined.
  • The marked content includes at least one of: the position information of the target area in the first video image, transformation information, and the identification information, in the first database, of the object included in the target area. The transformation information is used to identify the difference between the image of the object in the target area as stored in the first database and its appearance in the first video image.
  • For example, the objects stored in the first database 103 can be identified by the identification letters A–Z. Assume that the object corresponding to the letter A is a pedestrian, and that the image of the pedestrian stored in the first database has a resolution of 128*128. If the first device 10, through the second machine learning model 105, recognizes that the target area in the upper left corner of the video image currently being processed includes the object corresponding to the letter A stored in the first database 103, then when compressing the video image the first device 10 marks the identification information of the object included in the target area as "A", and the position information of the target area in the first video image includes the pixel positions of the peripheral boundary of the target area in the video image. If the resolution of the target area in the video image is 64*64, the target area is equivalent to the 128*128 image stored in the first database 103 scaled down by a factor of two, so the target area can be marked as "scaled down by half". Optionally, the transformation information may also include transformations such as rotation and stretching. A minimal sketch of such a mark is given below.
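  • The following minimal Python sketch shows one possible layout for such a mark; the field names, the string encoding of transformations, and the rectangular boundary in the example are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TargetAreaMark:
    """Mark carried with a compressed video image for one target area."""
    object_id: str                            # identification info in the first database, e.g. "A"
    boundary_pixels: List[Tuple[int, int]]    # position info: peripheral boundary in the frame
    transform: List[str] = field(default_factory=list)  # e.g. ["scale 0.5", "rotate 90"]


# Example mark for the case discussed above: object "A" stored at 128*128 but
# appearing at 64*64 in the upper-left corner of the current video image.
mark = TargetAreaMark(
    object_id="A",
    boundary_pixels=[(0, 0), (0, 63), (63, 63), (63, 0)],
    transform=["scale 0.5"],                  # scaled down by half
)
```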
  • As described above, the first database 103 stores the images of the objects among the N video images that meet the preset conditions; that is to say, the parts "cropped" from the incomplete video images in the second compressed data 106 are all stored in the first database 103. Therefore, when the first device 10 compresses the video file, it can also compress the images of the objects stored in the first database 103 to obtain the first compressed data 104.
  • the step of obtaining the first compressed data 104 by the first device 10 can be performed at any time after the first database 103 is determined, and is independent of the step of obtaining the second compressed data 106 by the first device 10, and may not Limit the order.
  • the first compressed data 104 and the second compressed data 106 can be used as compressed files after the video file 101 is compressed.
  • After the first device 10 generates the first compressed data 104 and the second compressed data 106, they can be combined into the third compressed data 107, and the third compressed data 107 can be sent to the second device 20; this embodiment does not limit the sending mode.
  • The specific data compression methods used by the first device 10 to obtain the first compressed data 104 and the second compressed data 106 in this embodiment may be the same or different; for example, both may be compressed using a video compression protocol such as H.264 or H.265.
  • FIG. 9 is an exemplary flow chart of an embodiment of a video image processing method provided by this application, and specifically shows a processing flow of decompressing a compressed package to obtain a video file.
  • In this embodiment, the second device 20, as the execution subject, first receives the third compressed data 107 sent by the first device 10, and obtains the first compressed data 104 and the second compressed data 106 according to the third compressed data 107.
  • the second device 20 may directly receive the first compressed data 104 and the second compressed data 106 sent by the first device 10.
  • The second device 20 can decompress the first compressed data 104 and the second compressed data 106 respectively. Decompressing the first compressed data 104 yields an object set including the images of multiple objects, for example the images of the objects labeled A–Z shown in FIG. 6, as an image set, and the images of the objects A–Z in the image set can be stored in the second database 108 of the second device 20. That is, after the second device 20 decompresses the first compressed data, the first database in the first device 10 can be restored, and the images of the objects included in the first database can be stored in the second database 108.
  • The second device 20 decompresses the second compressed data 106 to obtain N video images; none of these N video images includes the target areas of the objects stored in the second database 108, and each image carries marking information for its target areas.
  • the marked information includes at least one of the position information of the target area, the rotation transformation information of the object, or the identification information of the object included in the target area in the first database.
  • Optionally, if the images of the multiple objects in the first compressed data 104 are represented by the frame number and position information of the video image of the video file in which each image is located, for example an identified object in the first database 103 is recorded through the 10th frame of the video file and the pixel positions in that 10th frame of video image, then after obtaining the first compressed data 104 the second device 20 obtains the image of the object from those pixel positions of the 10th frame obtained from the second compressed data 106.
  • Next, based on the marking information of the target areas in each of the N video images, the second device 20 determines the images of the objects in the target areas from the second database and performs image splicing to restore each video image. For example, if the mark of a target area includes "A", boundary pixels, and "scaled down by half", the image of object A can be obtained from the second database 108, scaled down by half, and placed at the position of the boundary pixels in that video image, thereby realizing the splicing of the video image.
  • After the second device 20 completes the stitching of all N video images, a complete video file is finally obtained; a minimal sketch of this splicing step is given below.
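  • The splicing on the receiving side can be pictured with the following hedged Python sketch, which pastes the database images back into a decoded frame according to the marks. The mark tuple layout, the naive half-scaling helper, and the top-left placement convention are assumptions for illustration; the embodiment only requires that the marks identify the object, its position, and any transformation.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def scale_half(img: Image) -> Image:
    """Very rough 2x downscale by dropping every other row and column."""
    return [row[::2] for row in img[::2]]


def splice_frame(
    decoded_frame: Image,                              # frame from the second compressed data (target areas missing)
    marks: List[Tuple[str, Tuple[int, int], bool]],    # (object id, top-left position, downscale flag)
    second_database: Dict[str, Image],
) -> Image:
    """Paste the database images back into the decoded frame to restore it."""
    restored = [row[:] for row in decoded_frame]
    for object_id, (top, left), downscaled in marks:
        patch = second_database[object_id]
        if downscaled:                                 # apply the transformation in the mark
            patch = scale_half(patch)
        for r, patch_row in enumerate(patch):
            for c, value in enumerate(patch_row):
                restored[top + r][left + c] = value
    return restored
```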
  • In the video image processing method provided by this embodiment, when the first device compresses a video file, it first stores, through the first machine learning model, the images of the objects among all the video images of the video file that meet the preset conditions into the first database. The second machine learning model is then used to identify, in each video image, the target areas that include objects from the first database, after which the objects in the first database and the areas of each video image other than the target areas are compressed and encoded separately, finally yielding the compressed file of the entire video file.
  • Since the first database is compressed and encoded once as a whole, when a first video image of the video file is subsequently compressed and encoded, the first device does not need to compress the target areas in that first video image; it only needs to compress and encode the areas of the video image other than the target areas of objects present in the database. Because the images of the objects that meet the preset conditions stored in the first database of the first device have also been stored in the second database of the second device, the second device, after receiving the second compressed data, can decompress it to obtain the third video image and combine it with the images of the target areas stored in the second database, finally obtaining the first image.
  • Therefore, in this embodiment, when the first device transmits the video file to the second device in non-real time, the first device does not need to repeatedly compress the target areas of objects that frequently appear in the video images: since the images of the target areas are already stored in the second database of the second device, the first device only needs to compress the areas outside the target areas, which reduces the amount of compressed data, and the second device does not need to repeatedly decompress the target areas. This reduces the data volume of the compressed package transmitted between the first device and the second device and improves the efficiency of video image processing.
  • Furthermore, the first database in this embodiment stores only images of objects, and the machine learning models identify and compare objects based on the object images themselves, so there is no need, as in the technique shown in FIG. 3, to divide the image into a large number of image blocks of different sizes and process these image blocks one by one; this reduces the amount of calculation during video image processing and improves the efficiency of compressing and encoding the entire video file.
  • In addition, this application judges whether objects meet the preset conditions based on the entire video file as a whole, which prevents an object from being repeatedly identified and compared and further improves the efficiency of compressing and encoding the video file.
  • FIG. 10 is an exemplary flowchart of an embodiment of a video image processing method provided by this application, showing the processing of the first device 10 as shown in FIG. 1 when compressing a real-time first video image Process.
  • Since this embodiment is applied in scenarios that need to ensure the real-time nature of the video, such as surveillance video backhaul, the video acquired by the first device 10 is generated in real time and needs to be sent to the second device 20 immediately. Therefore, after receiving a frame of video image, the first device 10 needs to compress and encode the video image in time, send it to the second device 20 in time, and continuously receive new video images and repeat this process.
  • In this embodiment, the first device 10, as the execution subject, first obtains the first video image 201 that needs to be transmitted to the second device 20 in real time, where the first video image 201 may be one of the video images of a continuous video file; that video may be specified for transmission by the user, captured by the first device 10, or acquired by the first device 10 through the Internet, and needs to be sent to the second device 20 in real time.
  • After acquiring the video image 201, the first device 10 first compares, through the second machine learning model 207, the video image 201 with the images of the objects already stored in the first database 204, and determines the area of the video image 201 that includes at least one object stored in the first database 204; this area is recorded as the target area. For example, in the example shown in FIG. 7, the second machine learning model 207 determines the target area including object A in the video image 201 according to the image of object A stored in the database 204. Subsequently, the first device 10 "cuts" the target area out of the video image 201, compresses the video image 201 excluding the target area to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end.
  • the second compressed data 208 may also include marking information of the target area.
  • The marking information includes at least one of: the position information of the target area in the first video image, transformation information, and the identification information, in the database, of the object included in the target area.
  • Since only the area outside the target area of the first video image 201 has been compressed, encoded, and sent to the second device at the receiving end, for the object A included in the target area of the first video image 201, the first device 10 should, before this, compress and encode it from the first database 204 to generate the first compressed data 205 and send it to the second device, so that after receiving the second compressed data 208 the second device can combine it with the first compressed data and obtain the video image 201 by splicing.
  • images of multiple objects may be pre-stored in the first database 204, so that the first device 10 can perform comparisons through the database 204 after obtaining the first video image.
  • Alternatively, when the first device 10 transmits a video file including N video images to the second device 20 in real time, the first M video images among the N video images transmitted by the first device 10 are recorded as second video images. For these second video images, the first device 10 does not directly perform recognition through the second machine learning model 207; instead, it first identifies the objects in the second video images that meet the preset conditions and stores their images in the first database 204. Subsequently, when the video images after the first M of the N video images are transmitted, the first device 10 compares them with the first database 204 through the second machine learning model 207, based on the images of the objects already stored in the first database 204.
  • this application does not limit the method used by the first device 10 to compress and encode.
  • For example, the video image can be roughly divided into several regions, which are then compressed with different parameters according to their characteristics: for regions containing only background, a larger residual can be tolerated and fewer high-frequency components kept, while for regions that may include objects a smaller residual can be used and more high-frequency components kept. Alternatively, an existing video compression protocol such as H.264, H.265, or H.266 can be used for compression encoding. A hedged sketch of such region-dependent parameter selection is given below.
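  • As a rough illustration of per-region parameter selection (not tied to any particular codec), the following Python sketch assigns coarser settings to background regions and finer settings to object regions. The parameter names and numeric values are illustrative assumptions only; a real encoder would map such choices onto its own quantization and transform controls.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]   # (top, left, bottom, right)


def choose_region_parameters(
    regions: List[Tuple[Region, str]],   # each region tagged "background" or "object"
) -> List[Tuple[Region, Dict[str, float]]]:
    """Assign coarser parameters to background regions, finer ones to object regions."""
    params = []
    for region, kind in regions:
        if kind == "background":
            # Tolerate a larger residual and keep fewer high-frequency components.
            params.append((region, {"residual_tolerance": 8.0, "high_freq_kept": 0.25}))
        else:
            # Regions that may contain objects get a smaller residual tolerance
            # and keep more high-frequency components.
            params.append((region, {"residual_tolerance": 2.0, "high_freq_kept": 0.75}))
    return params
```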
  • It is understandable that, because the first device 10 transmits video images one by one, it cannot, when transmitting a single video image, directly determine all the objects that meet the preset conditions in the video file to which the video image belongs. Here, the preset condition may include: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M < N. That is to say, whether an object is to be added to the first database 204 can only be determined after the N video images before the first video image are considered together, so the objects stored in the first database 204 at any given moment may not cover all the objects in the current video file that meet the preset conditions. A sketch of this condition check is given below.
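  • A minimal Python sketch of the stated condition (at least M of the previous N video images contain the object) might look as follows; the per-frame object lists and the example values are hypothetical and only illustrate the counting.

```python
from typing import List


def meets_preset_condition(
    object_id: str,
    previous_frames_objects: List[List[str]],  # objects recognized in each of the last N frames
    m_threshold: int,
) -> bool:
    """True if the object appears in at least M of the N frames before the current one."""
    count = sum(object_id in frame_objects for frame_objects in previous_frames_objects)
    return count >= m_threshold


# Hypothetical example: object "B" appears in 5 of the previous 10 frames; with M = 5 it qualifies.
history: List[List[str]] = [["A"], ["A", "B"], ["A"], ["B"], ["A", "B"],
                            ["A"], ["B"], ["A"], ["A", "B"], ["A"]]
assert meets_preset_condition("B", history, m_threshold=5)
```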
  • the moment when the first device 10 processes the video image 201 is the first moment.
  • Assume that the video image 201 includes object A and object B, but only the image of object A is stored in the first database 204. The second machine learning model 207 can then only identify the target area that includes object A in the video image; even if object B in the video image 201 has, at the first moment, already met the preset condition among the N video images before the first video image 201, the second machine learning model will not recognize the area where object B is located as a target area. After the first moment, however, object B can be added to the first database 204 so as to reduce the size of the area that needs to be compressed in each subsequent frame of video image. Therefore, while processing the video image, the first device 10 also recognizes the objects included in the video image 201 being processed through the first machine learning model 202 and determines the objects other than the background, such as object A and object B shown in FIG. 10.
  • Optionally, as shown in FIG. 10, the order of the two processing steps, namely the first device 10 processing the video image 201 through the second machine learning model 207 and processing it through the first machine learning model 202, is not limited, and the two steps can also be executed at the same time.
  • the management module 203 in the first device 10 manages the identified objects.
  • the management at least includes: adding, deleting, and replacing the image of the object stored in the first database 204, which will be described below with examples.
  • The management module 203 adds to the first database 204 the images of objects that have appeared more than M times (M < N) and are not yet stored in the database. For example, the first machine learning model 202 in FIG. 10 identifies object A and object B in the first video image 201 and determines that the object not yet stored in the first database 204 is object B; the management module 203 then determines that object B is included in 5 of the 10 video images before the video image 201, that is, object B has appeared 5 times in total, so the recognized image of object B needs to be added to the first database 204.
  • Optionally, the management module 203 can cache all the images of object B from the previous video images, and the image of object B with the highest resolution can be taken from the cache and stored in the first database 204, so as to improve the clarity of the image of object B used in subsequent compression processing.
  • Optionally, after the management module 203 adds the new object B to the first database 204, the first device 10 can immediately compress and encode the image of the newly added object B; the resulting compressed data is recorded as third compressed data and is sent to the second device, so that after the second device decompresses the third compressed data, the image of object B is stored in the second database of the second device.
  • Alternatively, after the management module 203 adds the new object B to the first database 204, the first device 10 may also compress and encode the entire first database 204 including the image of the newly added object B; the resulting compressed data is recorded as fourth compressed data and is sent to the second device, so that the second device updates its second database after decompressing the fourth compressed data. A sketch of this management logic is given below.
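  • A hedged Python sketch of this add-and-promote bookkeeping is shown below: objects not yet in the first database are counted and cached, and once an object has been seen often enough its highest-resolution cached image is stored and compressed as third compressed data. The class name, the `compress` callable, and the resolution heuristic are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]


class ManagementModule:
    """Adds newly qualifying objects to the first database and reports them for sending."""

    def __init__(self, first_database: Dict[str, Image], m_threshold: int) -> None:
        self.db = first_database
        self.m = m_threshold
        self.cache: Dict[str, List[Image]] = {}   # recent images of not-yet-stored objects
        self.counts: Dict[str, int] = {}

    def observe(self, object_id: str, image: Image) -> None:
        """Called for every object recognized by the first machine learning model."""
        if object_id in self.db:
            return
        self.counts[object_id] = self.counts.get(object_id, 0) + 1
        self.cache.setdefault(object_id, []).append(image)

    def promote(self, compress: Callable[[Image], bytes]) -> List[Tuple[str, bytes]]:
        """Move objects seen often enough into the database; return their third compressed data."""
        newly_compressed = []
        for object_id, count in list(self.counts.items()):
            if count >= self.m:
                # Keep the cached image with the highest resolution (most pixels).
                best = max(self.cache[object_id], key=lambda img: len(img) * len(img[0]))
                self.db[object_id] = best
                newly_compressed.append((object_id, compress(best)))
                del self.counts[object_id], self.cache[object_id]
        return newly_compressed
```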
  • FIG. 11 is an exemplary flow chart of an embodiment of the video image processing method provided by this application, and shows the processing flow of the first device 10 for the video image 301 that follows the first video image 201 shown in FIG. 10.
  • At this time, the first device 10 can use the second machine learning model 207 to determine, based on the images of object A and object B stored in the first database 204, that the video image 301 includes the target areas of object A and object B.
  • After the first device 10 "cuts" the target areas out of the video image 301, it compresses and encodes the parts of the video image 301 other than the target areas to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end.
  • Optionally, if an object whose image is stored in the first database 204 appears fewer than Y times (Y < X) in the X video images before the first video image 201, the management module 203 deletes the image of that object from the first database 204. For example, the management module 203 can delete the image of object A stored in the first database 204, so that when processing subsequent video images the second machine learning model 207 has fewer object images in the database to compare against, further improving efficiency.
  • Optionally, the management module 203 also compares the image of an object identified in the first video image 201 currently being processed with the image of the same object stored in the first database 204. If the resolution of the image of object A in the first video image 201, for example 128*128, is greater than the resolution of the image of object A in the first database 204, for example 64*64, the image of object A in the first video image 201 is stored in the first database 204 and the originally stored image of object A is deleted. Sketches of this deletion and replacement logic are given below.
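  • The deletion and replacement rules just described might be sketched in Python as follows; the function names, the appearance-count window, and the pixel-count resolution comparison are illustrative assumptions.

```python
from typing import Dict, List

Image = List[List[int]]


def prune_database(first_database: Dict[str, Image],
                   appearance_counts: Dict[str, int],
                   y_threshold: int) -> None:
    """Delete stored objects that appeared fewer than Y times in the recent window."""
    for object_id in list(first_database):
        if appearance_counts.get(object_id, 0) < y_threshold:
            del first_database[object_id]


def maybe_replace(first_database: Dict[str, Image],
                  object_id: str,
                  candidate: Image) -> None:
    """Replace the stored image if the newly recognized one has a higher resolution."""
    stored = first_database.get(object_id)
    if stored is None:
        return
    if len(candidate) * len(candidate[0]) > len(stored) * len(stored[0]):
        first_database[object_id] = candidate   # e.g. 128*128 replaces 64*64
```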
  • FIG. 12 is an exemplary flow chart of an embodiment of a video image processing method provided by this application, in which it shows a processing flow in which the second device 20 decompresses the compressed data to obtain the first video image.
  • In this embodiment, the second device 20, as the execution subject, receives the first compressed data 205 and the second compressed data 208 sent by the first device 10; the two pieces of compressed data may reach the second device 20 at different times. For example, the second device 20 first receives the first compressed data 205 and then receives the second compressed data 208.
  • After receiving the first compressed data 205, the second device 20 can decompress it to obtain the images of multiple objects, for example the images of the objects labeled A, B, ... shown in FIG. 9, as an image collection, and the image collection can be stored in the database 210 of the second device 20.
  • After the second device 20 receives the second compressed data 208, it can decompress the second compressed data 208 to obtain the third video image 211; the third video image 211 does not include the target areas of the objects in the second database 210 that meet the preset conditions. Along with the second compressed data 208, the marking information of the target areas included in the first video image, sent by the first device 10, may also be received; the marking information includes at least one of: the position information of the target area in the first video image, transformation information, and the identification information of the object included in the target area in the first database.
  • the second device 20 determines the image of the object in the target area from the second database 210 according to the mark information of the target area in the first video image and performs image splicing to restore the video image 201.
  • For example, if the mark of the target area includes "A", boundary pixels, and "scaled down by half", the image corresponding to object A can be obtained from the database, scaled down by half, and placed at the position of the boundary pixels in the decompressed third video image, thereby realizing the splicing of the current video image 201 and finally obtaining the first video image 201.
  • In the video image processing method provided by this embodiment, when the first device compresses a first video image acquired in real time, it recognizes, through the second machine learning model, the target areas of the objects stored in the first database, and then compresses and sends the areas of the video image other than the target areas. Optionally, the objects in the first video image can also be identified through the first machine learning model, and operations such as adding, deleting, and modifying the images of the objects stored in the first database can be performed.
  • Because the images of the objects that meet the preset conditions have also been stored in the second database of the second device, the second device, after receiving the second compressed data, can decompress it to obtain the third video image and combine it with the images of the target areas stored in the second database, finally obtaining the first image. Therefore, when the first device transmits the first video image to the second device in real time, the first device does not need to repeatedly compress the target areas of objects that frequently appear in the video images: since the images of the target areas are already stored in the database, the first device only needs to compress the areas outside the target areas, which reduces the amount of compressed data, and the second device does not need to repeatedly decompress the target areas. This reduces the data volume of the compressed package transmitted between the first device and the second device and improves the efficiency of video image processing.
  • Furthermore, the first database in this embodiment stores only the images of objects, and the machine learning models identify and compare objects based on the object images themselves, so there is no need, as in the technique shown in FIG. 3, to divide the image into a large number of image blocks of different sizes and process the image blocks one by one; this reduces the amount of calculation during video image processing and improves the efficiency of compressing and encoding the entire video file.
  • In addition, this application can continue to update the objects stored in the database in real time while encoding and recognizing the continuous video images, so that the images of the objects stored in the database are up to date, ensuring that the database can be used in subsequent comparisons and further improving the efficiency of compressing and encoding the video file.
  • the network device and terminal device as the execution subject may include a hardware structure And/or software modules, in the form of a hardware structure, a software module, or a hardware structure plus a software module to realize the above-mentioned functions. Whether a certain function among the above-mentioned functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraint conditions of the technical solution.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • The device shown in FIG. 13 can be used as the first device 10 in the scene shown in FIG. 1 and execute the functions performed by the first device.
  • Specifically, the device includes: an acquisition module 1301, a first determination module 1302, a compression module 1303, and a sending module 1304.
  • The acquiring module 1301 is used to acquire the first video image; the first determining module 1302 is used to determine the target area in the first video image, where the target area includes the image of an object that meets the preset conditions and is stored in the first database of the first device; the compression module 1303 is used to compress the area of the first video image other than the target area to obtain the second compressed data; and the sending module 1304 is used to send the second compressed data to the second device, where images of objects that meet the preset conditions have been stored in the second database of the second device.
  • the compression module 1303 is further configured to compress the image of the object stored in the first database to obtain first compressed data; the sending module 1304 is also configured to send the first compressed data to the second device; The data is used by the second device to determine the second database.
  • the sending module 1304 is specifically configured to send the second compressed data and the marking information of the target area to the second device; where the marking information includes: position information of the target area in the first video image, and objects included in the target area At least one of the identification information or transformation information of the image in the first database; the transformation information is used to indicate the difference between the image in the first database and the first video image of the object in the target area.
  • Optionally, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M < N.
  • FIG. 14 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • the device shown in FIG. 14 further includes a second determination module 1305 and a storage management module 1306 on the basis of FIG. 13.
  • the device shown in FIG. 14 can be used to execute the video image processing method shown in FIG. 10, for example, the second determining module 1305 is used to identify target objects in the first video image that meet the preset conditions; the storage management module 1306 It is used to add the image corresponding to the new target object in the target object into the first database.
  • the new target object is an object that is not stored in the first database, and the first database is stored in the storage module.
  • the compression module 1303 is further configured to compress the image corresponding to the new target object to obtain third compressed data; the sending module 1304 is also configured to send the third compressed data to the second device.
  • Optionally, the compression module 1303 is further configured to compress the images of the objects stored in the first database to obtain fourth compressed data; the sending module 1304 is also used to send the fourth compressed data to the second device.
  • the storage management module 1306 is further configured to delete the image of the object that does not meet the preset condition stored in the first database.
  • the storage management module 1306 is also configured to: when the sharpness of the first object in the target area in the first video image is better than the sharpness of the first object stored in the first database, use The image of the first object in the first video image replaces the image of the first object stored in the first database.
  • Optionally, the first video image is a video image whose frame number is greater than a preset frame number in the video file being compressed and transmitted by the first device in real time; the acquiring module 1301 is also used to acquire a second video image in the to-be-processed video, where the frame number of the second video image in the to-be-processed video is less than the preset frame number; the second determining module 1305 is also used to identify the objects in the second video image that meet the preset conditions; and the storage management module 1306 is also used to store the objects in the second video image that meet the preset conditions into the first database.
  • the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.
  • FIG. 15 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • the device shown in FIG. 15 further includes a third determining module 1307 and a storage management module 1306 on the basis of that shown in FIG. 13.
  • the device shown in FIG. 15 can be used to execute the video image processing method shown in FIG. 8.
  • the third determining module 1307 is used to identify objects that meet preset conditions in all video images in a video file; storage management module 1306 is used to store the image of the object that meets the preset conditions in the first database.
  • the image of the object stored in the first database includes: the boundary pixel position of the object and the frame number of the video image including the object in the video file.
  • FIG. 16 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • The device shown in FIG. 16 can be used as the second device 20 in the scene shown in FIG. 1 and execute the functions performed by the second device.
  • Specifically, the device includes: a receiving module 1601, a decompression module 1602, an acquiring module 1603, and a determining module 1604.
  • The receiving module 1601 is used to receive the second compressed data sent by the first device, where the second compressed data is obtained by compressing the area of the first video image other than the target area; the decompression module 1602 is used to decompress the second compressed data to obtain a third video image, where the third video image includes the image corresponding to the area of the first video image other than the target area; the acquiring module 1603 is used to acquire the image corresponding to the target area from the second database of the second device; and the determining module 1604 is configured to determine the first video image according to the third video image and the image corresponding to the target area.
  • FIG. 17 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • the device shown in FIG. 17 further includes a storage management module 1605 on the basis of that shown in FIG. 16.
  • The receiving module 1601 is also used to receive the first compressed data sent by the first device; the decompression module 1602 is also used to decompress the first compressed data to obtain an image set corresponding to the objects that meet the preset conditions, where the image set includes the image corresponding to the target area; and the storage management module 1605 is used to store the image set in the second database.
  • the receiving module 1601 is further configured to receive the marking information of the target area sent by the first device; wherein the marking information includes: the position information of the target area in the first video image, and the object included in the target area is in the first device. At least one of the identification information or transformation information in the first database; the transformation information is used to indicate the difference between the image of the object in the target area in the first database and the first video image.
  • the determining module 1604 is specifically configured to stitch the image corresponding to the target area and the third video image to obtain the first video image according to the marking information of the target area.
  • the receiving module 1601 is further configured to receive the third compressed data sent by the first device; the decompression module 1602 is also configured to decompress the third compressed data to obtain the image of the new target object; the storage management module 1605 is also used to add the image of the new target object to the second database.
  • Optionally, the receiving module 1601 is further configured to receive the fourth compressed data sent by the first device; the decompression module 1602 is also configured to decompress the fourth compressed data to obtain an updated image set corresponding to the objects that meet the preset conditions; and the storage management module 1605 is also used to update the second database based on the updated image set corresponding to the objects that meet the preset conditions.
  • Optionally, the preset condition includes: among the N video images before the first video image, the number of video images including the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M < N.
  • the preset condition includes: in the video file where the first video image is located, the number of video images including the object is greater than or equal to the preset number.
  • the division of the various modules of the above device is only a division of logical functions, and may be fully or partially integrated into a physical entity during actual implementation, or may be physically separated.
  • these modules can all be implemented in the form of software called by processing elements; they can also be implemented in the form of hardware; some modules can be implemented in the form of calling software by processing elements, and some of the modules can be implemented in the form of hardware.
  • For example, the determination module may be a separately established processing element, or it may be integrated into a certain chip of the above-mentioned device for implementation; it may also be stored in the memory of the above-mentioned device in the form of program code, and a certain processing element of the above-mentioned device calls and executes the function of the determination module.
  • each step of the above method or each of the above modules can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASIC), or one or more microprocessors (digital signal processor, DSP), or, one or more field programmable gate arrays (FPGA), etc.
  • the processing element may be a general-purpose processor, such as a central processing unit (CPU) or other processors that can call program codes.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • The computer-readable storage medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • FIG. 18 is a schematic structural diagram of an embodiment of a video image processing device provided by this application.
  • The device can be used as the first device or the second device described in any of the foregoing embodiments of this application, and execute the video image processing method executed by the corresponding device.
  • the communication device 1100 may include: a processor 111 (such as a CPU) and a transmission interface.
  • The transmission interface may be a transceiver 113, where the transceiver 113 is coupled to the processor 111 and the processor 111 controls the sending and receiving actions of the transceiver 113.
  • Optionally, the communication device 1100 further includes a memory 112; software instructions can be stored in the memory 112, and the processor 111 is configured to read the software instructions stored in the memory 112 so as to complete various processing functions and implement the methods of the embodiments of the present application.
  • the video image processing device involved in the embodiment of the present application may further include: a power supply 114, a system bus 115, and a communication interface 116.
  • the transceiver 113 may be integrated in the transceiver of the video image processing device, or may be an independent transceiver antenna on the communication device.
  • the system bus 115 is used to implement communication connections between components.
  • the aforementioned communication interface 116 is used to implement connection and communication between the communication device and other peripherals.
  • the above-mentioned processor 111 is configured to couple with the memory 112 to read and execute instructions in the memory 112 to implement the method steps executed by the first device or the second device in the above method embodiment.
  • the transceiver 113 is coupled with the processor 111, and the processor 111 controls the transceiver 113 to send and receive messages.
  • the implementation principles and technical effects are similar, and will not be repeated here.
  • the system bus mentioned in FIG. 18 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the system bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used to realize the communication between the database access device and other devices (such as client, read-write library and read-only library).
  • the memory may be a non-volatile memory, such as a hard disk drive (HDD) or SSD, etc., or may be a volatile memory (volatile memory), such as a random-access memory (RAM).
  • The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • the processor mentioned in Figure 18 can be a general-purpose processor, including a CPU, a network processor (NP), etc.; it can also be a DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device , Discrete hardware components.
  • In addition, an embodiment of the present application further provides a computer-readable storage medium storing instructions; when the instructions are executed by a computer or a processor, the computer or the processor implements the video image processing method executed by the first device or the second device in the foregoing embodiments of the present application.
  • an embodiment of the present application further provides a chip for executing instructions, where the chip is used to execute the video image processing method executed by the first device or the second device in any one of the foregoing embodiments of the present application.
  • The embodiments of the present application also provide a computer program product containing instructions; when the instructions run on a computer or a processor, the computer or the processor implements the video image processing method executed by the first device or the second device in the foregoing embodiments of the present application.
  • It should be understood that the size of the sequence numbers of the foregoing processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Abstract

The present application provides a video image processing method and device. Upon recognizing a target area of an object in a first database comprised in a first video image, a first device performs compression on an area in the first video image, other than the target area, to obtain second compression data and sends same to a second device, such that upon receiving the second compression data, the second device can finally obtain a first image according to a combination of a third video image obtained by performing decompression according to the second compression data and an image of the target area stored in a second database of the second device. Therefore, according to the present application, the first device does not need to repeatedly perform compression on a target area of an object frequently appearing in a video image and the second device does not need to repeatedly perform decompression on the target area, thus reducing a data volume of a compressed packet transmitted between the first device and the second device and improving video image processing efficiency.

Description

Video image processing method and device

Technical field
This application relates to data processing technology, and in particular to a video image processing method and device.
Background technique
Video compression is a technology that recompresses video files. It can compress larger video files into smaller compressed files for transmission or storage without affecting the video content. It is common in network video playback, surveillance video transmission, and other application scenarios that need to transmit or store video files.

In the prior art, video compression protocols such as H.264 (also known as advanced video codec (AVC)), H.265 (also known as high efficiency video coding (HEVC)), and H.266 (also known as versatile video coding (VVC)) can be used to compress video files. In these protocols, all video images in a video file are divided into different image packages; for example, each consecutive 64 frames of images is one image package. When compressing each frame of image in each image package, each frame is divided into image blocks of different sizes. If a certain image block in the current image has a high similarity with an image block in another already compressed image, the contents of the two image blocks in the two frames can be considered the same, and the already compressed image block can be used to represent that image block in the current image; when compressing the current image, only the area of the current image other than that image block needs to be compressed, thereby reducing the amount of calculation when compressing the video file and improving the compression efficiency.

With the prior art, when compressing a video file, each frame of video image in the video file needs to be divided into blocks to obtain multiple image blocks, and different image blocks are compared to obtain similar image blocks. For areas of a video image where objects are densely distributed and boundaries are numerous, denser image blocks need to be set for identification and comparison, so that a larger number of image blocks is processed when each frame of video image is compressed, which reduces the efficiency of compressing video files and ultimately leads to a lower efficiency of video image processing.
Summary of the invention
The present application provides a video image processing method and device, which are applied to compress video images, so as to solve the technical problem in the prior art that compression efficiency is low when compressing video images, resulting in low efficiency of video image processing.
The first aspect of the present application provides a video image processing method whose execution subject is a first device that compresses video images. After the first device recognizes the target area of an object of the first database included in the first video image, it compresses the area of the first video image other than the target area to obtain second compressed data and sends the second compressed data to a second device.
In this process, the first device does not need to compress the target area; it only needs to compress the area of the video image other than the target areas of objects present in the database. Since the images of the objects that meet the preset conditions stored in the first database of the first device have also been stored in the second database of the second device, the second device, after receiving the second compressed data, can decompress it to obtain a third video image and, combined with the image of the target area stored in the second database, finally obtain the first image. Therefore, this embodiment enables the first device not to repeatedly compress the target areas of objects that frequently appear in video images: because the images of the target areas are already stored in the second database of the second device, the first device only needs to compress the areas outside the target areas, so the compression operation does not have to be repeated for recurring target areas, which reduces the data volume of the final compressed package and improves the efficiency of the first device in processing video images.
In an embodiment of the first aspect of the present application, based on the video image processing method provided in the first aspect, the images of the objects that meet the preset conditions stored in the second database of the second device may be sent by the first device. Specifically, the first device may separately send to the second device the first compressed data obtained by compressing the images of the objects in the first database; the first compressed data includes the compression result of the target area, so that after receiving the first compressed data, the second device stores the image set obtained by decompressing the first compressed data into the second database.
Optionally, the first device may send the first compressed data and the second compressed data to the second device at the same time, in which case the second device may first decompress the first compressed data to determine the second database and then decompress the second compressed data. Alternatively, before the method described in the first aspect of this application, the first device may first send the first compressed data to the second device, and after the second device has decompressed the first compressed data and determined the second database, the method described in the first aspect of this application is executed to send the second compressed data to the second device. In an optional solution, after the first device sends the first compressed data to the second device, it can compress the area of the first video image other than the target area to obtain the second compressed data and send it to the second device without waiting for the second device to decompress the first compressed data and determine the second database; that is, the process of the first device obtaining the second compressed data and the process of the second device decompressing the first compressed data to obtain the second database can be parallel, and the embodiments of the present application do not limit the order of these two processes. Therefore, in this embodiment, the first device can compress the images of the objects included in the first database only once and send the obtained first compressed data to the second device, so that the second device decompresses the first compressed data and determines the second database; subsequently, when processing video images, the first device only needs to compress the areas outside the target areas, which reduces the size and number of the images the first device compresses and thereby improves the efficiency of video image processing.
在本申请第一方面一实施例中,由于第一装置仅对第一视频图像中目标区域之外的区域进行压缩,由第二装置根据第二压缩数据解压缩得到的第三视频图像,结合第二压缩数据中存储的目标区域的图像得到第一视频图像,而为了让第二装置更加快捷、准确地确定第一视频图像中,所解压得到的第三视频图像和目标区域的位置关系,作为压缩端的第一装置可以在发送第二压缩数据时,同步向第一装置发送目标区域在第一视频图像中的标记信息,其中,所述标记信息包括第一视频图像中目标区域的位置信息、目标区域中包括的对象的图像在第一数据库中的标识信息获取变换信息中的至少一项。因此,本实施例提供的视频图像处理方法,第一装置在确定目标区域时,同样可以确定出目标区域的标记信息,并在后续将第二压缩数据和目标区域的标记信息同时发送给第二装置,使得第二装置能够更加迅速、准确地确定出第一视频图像中的目标区域,进而能够在接收到第二压缩数据之后更快地确定第一视频图像,进一步提高了视频图像的处理效率。In an embodiment of the first aspect of the present application, since the first device only compresses the area outside the target area in the first video image, the third video image obtained by decompressing the second device according to the second compressed data is combined with The image of the target area stored in the second compressed data obtains the first video image, and in order to allow the second device to more quickly and accurately determine the positional relationship between the decompressed third video image and the target area in the first video image, When sending the second compressed data, the first device as the compression terminal can synchronously send the tag information of the target area in the first video image to the first device, where the tag information includes the location information of the target area in the first video image. , The identification information of the image of the object included in the target area in the first database acquires at least one item of the transformation information. Therefore, in the video image processing method provided by this embodiment, the first device can also determine the marking information of the target area when determining the target area, and subsequently send the second compressed data and the marking information of the target area to the second at the same time. The device enables the second device to more quickly and accurately determine the target area in the first video image, and then can determine the first video image faster after receiving the second compressed data, further improving the processing efficiency of the video image .
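For illustration only, the marking information described above can be thought of as a small per-region record carried alongside the second compressed data. The following is a minimal sketch in Python; the field names (bbox, object_id, transform) and the dictionary form of the transformation are assumptions made here for readability, not definitions taken from the application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RegionMarker:
    """Marking information for one target region of the first video image (illustrative)."""
    # Position of the target region in the first video image: (x, y, width, height).
    bbox: Tuple[int, int, int, int]
    # Identifier of the matching object image in the first/second database.
    object_id: int
    # Optional transformation describing how the stored object image differs from its
    # appearance in the current frame (e.g. scale or rotation); None if identical.
    transform: Optional[dict] = None


# Example: the region at (120, 80) of size 64x128 matches database object 7,
# shown at 0.9x scale relative to the stored image.
marker = RegionMarker(bbox=(120, 80, 64, 128), object_id=7,
                      transform={"scale": 0.9, "rotation_deg": 0.0})
```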
In an embodiment of the first aspect of the present application, when the provided method is applied to a real-time video image transmission scenario, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1 and M < N. Specifically, this embodiment applies to the scenario in which the first device acquires the first video image in real time and sends it to the second device after compression. When the first device compresses the part of the first video image outside the target area as in the foregoing embodiments, the first database it relies on is built from the N video images immediately preceding the first video image, and the preset condition that an object must satisfy is that the number of these N video images in which the object appears is greater than or equal to M. Therefore, the first database on which the first device determines the target area is obtained from the N images preceding the first video image, which keeps the first database up to date and allows the method to be applied when the first device compresses video images acquired in real time, improving the efficiency with which the first device processes real-time video images.
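The real-time preset condition can be checked with a simple sliding window of per-frame detections. A minimal sketch, assuming each of the previous N frames has already been reduced to a set of detected object identifiers; the detection step itself and the names `history`, `obj_id`, `m` are assumptions for illustration.

```python
from collections import deque


def meets_preset_condition(history, obj_id, m):
    """Return True if obj_id appears in at least m of the buffered frames.

    history is a deque of sets; each set holds the object identifiers detected
    in one of the N video images preceding the first video image.
    """
    return sum(1 for frame_objects in history if obj_id in frame_objects) >= m


# Keep only the detections of the most recent N frames.
N, M = 100, 30
history = deque(maxlen=N)
# history.append({...}) would be called once per processed frame.
```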
In an embodiment of the first aspect of the present application, after the first device has determined the target area of the first video image, compressed the remaining area to obtain the second compressed data and sent it to the second device as in the foregoing embodiments, the first device may further update the first database based on the newly acquired first video image. After the first video image is added, the preset condition is evaluated over the N video images consisting of the first video image and the N-1 video images preceding it. When the first device identifies among these N video images a new target object that satisfies the preset condition, that is, an object included in at least M of these N video images, the image of the new target object is added to the first database, updating it, so that the target areas determined by the first device during video image processing remain up to date.
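Combining the condition above with the update described here gives a sliding-window maintenance step run once per frame. A sketch under the same assumptions as the previous block; `detections` stands in for whatever recognition result the first device produces for the current frame, and the condition check is inlined so the snippet is self-contained.

```python
def update_first_database(history, first_database, detections, m):
    """Add newly qualifying objects to the first database (illustrative).

    history:        deque of per-frame detection sets (most recent N frames).
    first_database: dict mapping object_id -> stored object image.
    detections:     dict mapping object_id -> cropped image for the current frame.
    """
    history.append(set(detections))
    for obj_id, crop in detections.items():
        appearances = sum(1 for frame_objects in history if obj_id in frame_objects)
        if obj_id not in first_database and appearances >= m:
            first_database[obj_id] = crop  # new target object enters the database
    return first_database
```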
In an embodiment of the first aspect of the present application, when the first device adds the image of a new target object to the first database, it may send the newly added image to the second device so that the second device updates its stored second database, keeping the first database and the second database consistent.
In one implementation, the first device may compress the image of the new target object and send the resulting third compressed data to the second device; alternatively, in another implementation, the first device may compress the entire first database after the new target object has been added and send the resulting fourth compressed data to the second device.
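The two options above amount to sending either only the new entries (third compressed data) or the whole database again (fourth compressed data). A sketch, assuming a generic serializer/compressor (pickle plus zlib) as a stand-in for whatever coding the devices actually use, and `send` as an abstract transmission callback.

```python
import pickle
import zlib


def sync_database(first_database, new_ids, send, incremental=True):
    """Send either the newly added entries or the whole first database (illustrative)."""
    payload = ({obj_id: first_database[obj_id] for obj_id in new_ids}
               if incremental else dict(first_database))
    # incremental=True corresponds to the third compressed data, False to the fourth.
    send(zlib.compress(pickle.dumps(payload)))
```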
Upon receiving the third compressed data or the fourth compressed data, the second device may update the second database so that the updated second database stores the new target object. From then on, for the video images processed by the first device, the area that includes the new target object can be treated as a target area and left uncompressed; after receiving compressed data that does not include the target area and decompressing it, the second device obtains the image of the new target object from the second database and finally reconstructs the video image.
In an embodiment of the first aspect of the present application, besides adding object images to the first database, the first device may also delete the image of an object stored in the first database once that object no longer satisfies the preset condition, saving storage space in the first database and improving the utilization of the first device's storage space.
In an embodiment of the first aspect of the present application, the first device may also replace an object image stored in the first database when the object in the target area of the first video image has higher definition. Specifically, when processing the target area of the first video image, if the first device determines that the definition of a first object included in the target area is better than the definition of the first object's image stored in the database, it replaces the image of the first object stored in the first database with the image of the first object taken from the first video image. Likewise, this update to the first database may be compressed by the first device and sent to the second device, so that the second device updates the second database. As a result, when the second device later restores the first video image from the objects in the second database, it obtains better definition, avoiding the problem that the target area of the restored first video image is unclear because the object image stored in the second database is less sharp than the object actually appearing in the first video image.
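The application does not fix how "definition" is measured, so the following is only one illustrative choice: comparing the variance of the Laplacian of two crops of the same object, assuming OpenCV is available and images are stored as BGR arrays.

```python
import cv2


def sharpness(img):
    """Variance of the Laplacian as a simple sharpness (definition) score."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()


def maybe_replace(first_database, obj_id, new_crop):
    """Replace the stored image if the new crop of the same object is sharper."""
    if obj_id in first_database and sharpness(new_crop) > sharpness(first_database[obj_id]):
        first_database[obj_id] = new_crop  # this change would also be propagated to the second database
```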
In an embodiment of the first aspect of the present application, when the provided method is applied to a real-time video image transmission scenario, or when no images are stored yet in the first database of the first device, for example when the first device acquires the first video image of a video file transmitted in real time, the target area cannot be determined from the first database. Therefore, to ensure that the first device handles video images completely, when the first device acquires a video image whose frame number in the video file is smaller than a preset frame number, it may encode the video image as a whole, determine from these video images the objects that satisfy the preset condition, and store them in the first database. Subsequently, after the first database has been built from the video images with frame numbers smaller than the preset frame number, the first device may, upon receiving a first video image with a frame number greater than the preset frame number, perform the video image processing method of the foregoing embodiments of this application.
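The bootstrap behaviour is essentially a dispatch on the frame number: early frames are encoded whole (and mined for qualifying objects), later frames go through the target-region path. A sketch only; the encoder, the detector and the database helpers are passed in as callables because the application does not fix them.

```python
def handle_frame(frame, frame_no, preset_frame_no, first_database,
                 encode_whole, detect_objects, add_if_qualifying, encode_without_targets):
    """Whole-frame coding while the first database is being built; target-region path afterwards."""
    if frame_no < preset_frame_no:
        payload = encode_whole(frame)                          # bootstrap: encode the frame in full
        for obj_id, crop in detect_objects(frame).items():
            add_if_qualifying(first_database, obj_id, crop)    # mine objects meeting the preset condition
    else:
        payload = encode_without_targets(frame, first_database)  # the S102/S103 path of the method
    return payload
```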
In an embodiment of the first aspect of the present application, when the provided method is applied to a non-real-time video image transmission scenario, the first device can obtain the entire video file in advance, so the preset condition may be that, across the whole video file, the number of video images that include the object is greater than or equal to a preset number. Therefore, in this embodiment the first database on which the first device determines the target area is obtained from all the images of the video file, which makes the first database complete and ensures that every object added to it satisfies the preset condition. The method can thus be applied to the scenario in which the first device compresses the video images of a non-real-time video file, improving the efficiency with which the first device processes those video images.
In an embodiment of the first aspect of the present application, the first database in the non-real-time video image transmission scenario may also be built by the first device from the video images. Before transmitting the video file, the first device may first identify, among all the video images of the video file, the images of objects that satisfy the preset condition and store them in the first database, and then process each video image of the video file as the first video image described above.
In an embodiment of the first aspect of the present application, instead of directly storing the image of an object that satisfies the preset condition on the second device, a more economical representation can be used. Because the first device does not recognize an object as a target area until it has determined that the object satisfies the preset condition, the second compressed data sent by the first device before that point already includes the image of that object. Therefore, to save the amount of data transmitted between the first device and the second device, the first device may replace the object image it would send with at least one of the boundary pixel positions of the target area or the frame number, so that, after receiving this information, the second device can itself extract the object image at the boundary pixel positions in the frame with the corresponding frame number. Thus, with the video image processing method provided by this embodiment, when the first device sends the images of the objects in the first database to the second device, it only needs to send the boundary pixel positions and the frame number, and the second device obtains the target area from video images it has already received. This reduces the amount of data actually sent when the first device transmits the first video image to the second device, so that the first device compresses and the second device decompresses faster, further improving video image processing efficiency.
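The reference-style representation described above can be sketched as a database entry that stores only a frame number and a bounding box, which the second device resolves against frames it has already decoded. The dictionary layout and the `decoded_frames` store (frame number mapped to a decoded image array) are assumptions made here for illustration.

```python
def store_reference(first_database, obj_id, frame_no, bbox):
    """Record where the object can be found instead of its pixel data."""
    first_database[obj_id] = {"frame": frame_no, "bbox": bbox}


def resolve_reference(entry, decoded_frames):
    """Second device side: crop the object out of an already-received frame."""
    x, y, w, h = entry["bbox"]
    return decoded_frames[entry["frame"]][y:y + h, x:x + w]
```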
A second aspect of the present application provides a video image processing method whose execution subject is the second device, which receives the compressed video file. The second device decompresses the received second compressed data to obtain a third video image that does not include the target area, determines the images of the objects in the target area from the second database, and stitches the two together to obtain the first video image. For the case where the target area appears in many different video images, the second device only needs to decompress the first compressed data once to obtain the images of the objects in the target area; for all other video images the target area does not need to be decompressed and is obtained directly from the second database. Because the target area of each video image is absent from the data being decompressed, the amount of computation the device performs during decompression is reduced, which also improves the efficiency with which the second device processes video images.
In an embodiment of the second aspect of the present application, the images of objects satisfying the preset condition that are stored in the second database of the second device may be sent by the first device. Specifically, after receiving the first compressed data sent by the first device, the second device decompresses it and stores the resulting image set in the second database. Therefore, in this embodiment the first device only needs to compress the object images in the first database once and send the resulting first compressed data to the second device, which decompresses it to determine the second database; when processing subsequent video images, the second device only needs to decompress the second compressed data and no longer needs to repeatedly decompress the objects of the target area, reducing the size and number of decompression operations performed by the second device and thereby improving video image processing efficiency.
In an embodiment of the second aspect of the present application, in order to let the second device determine more quickly and accurately the positional relationship between the decompressed third video image and the target area in the first video image, the first device, acting as the compression end, may, when sending the second compressed data, also send to the second device the marking information of the target area in the first video image, where the marking information includes at least one of: position information of the target area in the first video image, identification information of the object image included in the target area within the first database, or transformation information. After receiving the second compressed data and the marking information of the target area, the second device can determine the target area in the first video image more quickly and accurately, and can therefore reconstruct the first video image faster, further improving video image processing efficiency.
In an embodiment of the second aspect of the present application, as in the corresponding embodiment of the first aspect, instead of directly storing the image of an object that satisfies the preset condition on the second device, the first device may replace the object image it would send with at least one of the boundary pixel positions of the target area or the frame number, because the second compressed data sent before the object was recognized as a target area already includes the image of that object. After receiving this information, the second device can itself extract the object image at the boundary pixel positions in the frame with the corresponding frame number. Thus, when the first device sends the images of the objects in the first database to the second device, it only needs to send the boundary pixel positions and the frame number, and the second device obtains the target area from video images it has already received, reducing the amount of data actually transmitted, so that the first device compresses and the second device decompresses faster, further improving video image processing efficiency.
In an embodiment of the second aspect of the present application, when the provided method is applied to a real-time video image transmission scenario, after the first device has updated the first database based on the newly acquired first video image and added the image of a new target object to it, the first device may send to the second device either the third compressed data obtained by compressing the newly added target object image, or the fourth compressed data obtained by compressing the entire first database including the newly added image. The second device may then update its stored second database according to the third compressed data or the fourth compressed data, keeping the first database and the second database consistent. Thereafter, the area of a video image that includes the new target object can be treated as a target area and left uncompressed; after receiving and decompressing compressed data that does not include the target area, the second device obtains the image of the new target object from the second database and finally reconstructs the video image.
In an embodiment of the second aspect of the present application, when the provided method is applied to a real-time video image transmission scenario, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1 and M < N; when the provided method is applied to a non-real-time video image transmission scenario, the preset condition may be that, across the entire video file, the number of video images that include the object is greater than or equal to a preset number.
A third aspect of the present application provides a video image processing apparatus that can serve as the first device and is configured to perform the video image processing method of any one of the implementations of the first aspect of this application. The apparatus includes an acquisition module, a first determination module, a compression module and a sending module.
The acquisition module is configured to acquire the first video image; the first determination module is configured to determine the target area in the first video image, the target area including the image of an object that satisfies the preset condition and is stored in the first database of the first device; the compression module is configured to compress the area of the first video image other than the target area to obtain the second compressed data; and the sending module is configured to send the second compressed data to the second device, the second database of the second device already storing the image of the object that satisfies the preset condition.
In an embodiment of the third aspect of the present application, the compression module is further configured to compress the object images stored in the first database to obtain the first compressed data, and the sending module is further configured to send the first compressed data to the second device; the first compressed data is used by the second device to determine the second database.
In an embodiment of the third aspect of the present application, the sending module is specifically configured to send to the second device the second compressed data and the marking information of the target area, where the marking information includes at least one of: position information of the target area in the first video image, identification information of the object image included in the target area within the first database, or transformation information; the transformation information is used to represent the difference between the image of the object of the target area as stored in the first database and its appearance in the first video image.
In an embodiment of the third aspect of the present application, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1 and M < N.
In an embodiment of the third aspect of the present application, the apparatus further includes a second determination module and a storage management module.
The second determination module is configured to identify target objects in the first video image that satisfy the preset condition; the storage management module is configured to add, to the first database, the images corresponding to new target objects among the identified target objects, a new target object being an object not yet stored in the first database, the first database being stored in a storage module.
In an embodiment of the third aspect of the present application, the compression module is further configured to compress the image corresponding to the new target object to obtain the third compressed data, and the sending module is further configured to send the third compressed data to the second device.
In an embodiment of the third aspect of the present application, after the storage management module adds the images corresponding to the new target objects to the first database, the compression module is further configured to compress the object images stored in the first database to obtain the fourth compressed data, and the sending module is further configured to send the fourth compressed data to the second device.
In an embodiment of the third aspect of the present application, the storage management module is further configured to delete from the first database the images of objects that no longer satisfy the preset condition.
In an embodiment of the third aspect of the present application, the storage management module is further configured to, when the definition of a first object of the target area in the first video image is better than the definition of the image of the first object stored in the first database, replace the image of the first object stored in the first database with the image of the first object in the first video image.
In an embodiment of the third aspect of the present application, the first video image is a video image, in the video file that the first device is compressing and transmitting in real time, whose frame number is greater than a preset frame number; the acquisition module is further configured to acquire a second video image of the video to be processed whose frame number in the video to be processed is smaller than the preset frame number; the second determination module is further configured to identify objects in the second video image that satisfy the preset condition; and the storage management module is further configured to store the objects of the second video image that satisfy the preset condition in the first database.
In an embodiment of the third aspect of the present application, the preset condition includes: in the video file in which the first video image is located, the number of video images that include the object is greater than or equal to a preset number.
In an embodiment of the third aspect of the present application, the apparatus further includes a third determination module, configured to identify the objects that satisfy the preset condition in all the video images of the video file; the storage management module is configured to store the images of the objects that satisfy the preset condition in the first database.
In an embodiment of the third aspect of the present application, the object images stored in the first database include the boundary pixel positions of the object and the frame number, in the video file, of the video image that includes the object.
A fourth aspect of the present application provides a video image processing apparatus that can serve as the second device and is configured to perform the video image processing method of any one of the implementations of the second aspect of this application. The apparatus includes a receiving module, a decompression module, an acquisition module and a determination module. The receiving module is configured to receive the second compressed data sent by the first device, the second compressed data being obtained by compressing the area of the first video image other than the target area; the decompression module is configured to decompress the second compressed data to obtain a third video image, the third video image including the image corresponding to the area of the first video image other than the target area; the acquisition module is configured to acquire the image corresponding to the target area from the second database of the second device; and the determination module is configured to determine the first video image according to the third video image and the image corresponding to the target area.
In an embodiment of the fourth aspect of the present application, the apparatus further includes a storage management module; the receiving module is further configured to receive the first compressed data sent by the first device; the decompression module is further configured to decompress the first compressed data to obtain an image set corresponding to the objects that satisfy the preset condition, the image set including the image corresponding to the target area; and the storage management module is configured to store the image set in the second database.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the marking information of the target area sent by the first device, where the marking information includes at least one of: position information of the target area in the first video image, identification information of the object included in the target area within the first database of the first device, or transformation information; the transformation information is used to represent the difference between the image of the object of the target area as stored in the first database and its appearance in the first video image.
In an embodiment of the fourth aspect of the present application, the determination module is specifically configured to stitch the image corresponding to the target area and the third video image together according to the marking information of the target area to obtain the first video image.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the third compressed data sent by the first device; the decompression module is further configured to decompress the third compressed data to obtain the image of the new target object; and the storage management module is further configured to add the image of the new target object to the second database.
In an embodiment of the fourth aspect of the present application, the receiving module is further configured to receive the fourth compressed data sent by the first device; the decompression module is further configured to decompress the fourth compressed data to obtain the updated image set corresponding to the objects that satisfy the preset condition; and the storage management module is further configured to update the second database based on the updated image set.
In an embodiment of the fourth aspect of the present application, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1 and M < N; or the preset condition includes: in the video file in which the first video image is located, the number of video images that include the object is greater than or equal to a preset number.
A fifth aspect of the present application provides a video image processing apparatus, including a processor and a transmission interface; the apparatus communicates with other apparatuses through the transmission interface; and the processor is configured to read software instructions stored in a memory to implement the method of any one of the implementations of the first aspect of this application.
A sixth aspect of the present application provides a video image processing apparatus, including a processor and a transmission interface; the apparatus communicates with other apparatuses through the transmission interface; and the processor is configured to read software instructions stored in a memory to implement the method of any one of the implementations of the second aspect of this application.
A seventh aspect of the present application provides a computer-readable storage medium storing instructions that, when run by a computer or a processor, cause the computer or the processor to implement the method of any one of the implementations of the first aspect of this application.
An eighth aspect of the present application provides a computer-readable storage medium storing instructions that, when run by a computer or a processor, cause the computer or the processor to implement the method of any one of the implementations of the second aspect of this application.
A ninth aspect of the present application provides a computer program product containing instructions that, when run on a computer or a processor, cause the computer or the processor to implement the method of any one of the implementations of the first aspect of this application.
A tenth aspect of the present application provides a computer program product containing instructions that, when run on a computer or a processor, cause the computer or the processor to implement the method of any one of the implementations of the second aspect of this application.
Description of the drawings
Figure 1 is a schematic diagram of an application scenario of this application;
Figure 2 is a schematic diagram of a video compression technique;
Figure 3 is a schematic diagram of a video image divided into image blocks;
Figure 4 is a schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 5 is a schematic diagram of the database arrangement provided by this application;
Figure 6 is a schematic diagram of objects in a video image provided by this application;
Figure 7 is a schematic flowchart of another embodiment of the video image processing method provided by this application;
Figure 8 is an exemplary schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 9 is an exemplary schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 10 is an exemplary schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 11 is an exemplary schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 12 is an exemplary schematic flowchart of an embodiment of the video image processing method provided by this application;
Figure 13 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application;
Figure 14 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application;
Figure 15 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application;
Figure 16 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application;
Figure 17 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application;
Figure 18 is a schematic structural diagram of an embodiment of the video image processing apparatus provided by this application.
Detailed description of embodiments
Before introducing the embodiments of this application, the application scenarios to which this application applies and the existing problems are described below with reference to the accompanying drawings.
Figure 1 is a schematic diagram of an application scenario of this application. This application applies to the transmission of video files between different devices. The devices in Figure 1 may be devices with video file processing capabilities such as mobile phones, tablet computers, notebook computers, desktop computers or servers. The embodiments of this application may be executed by a device as described in Figure 1, or by a processor of such a device, for example a central processing unit (CPU) or a graphics processing unit (GPU); in the embodiments of this application, execution by the devices of Figure 1 is taken as an example. As shown in Figure 1, the first device 10 and the second device 20 have a communication connection, through which the first device 10 can send the video file 30 to the second device 20; the transmitted video file 30 can be divided into real-time video files and non-real-time video files according to whether there is a timeliness requirement. For example, in a non-real-time transmission scenario the video file may be the file of a movie: before sending the file of this movie to the second device 20, the first device 10 may compress the whole file at time T1 to obtain the data packet 40 and send the compressed data packet 40 to the second device 20; after receiving the data packet 40 at time T2 and decompressing it, the second device 20 obtains the file of the whole movie. It can be understood that, because the first device needs to compress all the video images in the video file, the time interval between T1 and T2 is relatively large. In a real-time transmission scenario, the video file may be a time-sensitive file such as a surveillance picture or a television picture; the first device cannot obtain every frame of the complete video file in advance, but needs to transmit the most recently acquired video image. The first device 10 may compress the video image at time T1 to obtain the data packet 40 and send it to the second device 20, and the second device 20 receives and decompresses the data packet 40 at time T2. Because the first device only needs to compress one frame of video image, the time interval between T1 and T2 is small; the first device 10 and the second device 20 then repeat this process continuously, so that the first device 10 can send the video images of the most recently obtained surveillance pictures, television pictures and other video files to the second device 20 in real time.
Because a video file consists of consecutive video images, and the number of video images included in a video file keeps growing while each video image itself has an increasingly high resolution, the overall data volume of video files has increased greatly. Therefore, when video files are transmitted between the devices shown in Figure 1 over limited communication resources, the video files can be compressed with high quality, turning large video files into smaller compressed files for transmission and storage. In the process of compressing a video file, not only must the amount of transmitted data be reduced, it must also be guaranteed that the receiving end can completely restore the video file from the compressed file.
To compress video files, video compression protocols such as H.264, H.265 and H.266 proposed in some techniques re-encode the video file by means of compression coding. For example, Figure 2 is a schematic diagram of a video compression technique, in which the execution subject may be the first device 10 shown in Figure 1. In the process of compressing a video file, the consecutive video images of the file are divided into different image packages; in Figure 2, every 64 video images of the video file form one image package, giving image packages 1-64, 65-128, 129-256, and so on. Then, for each image package, some frames are selected as key frames; for example, in image package 1-64, the 1st frame and the 64th frame may be taken as key frames and compression-coded in their entirety, while each of the video images in frames 2 to 63 serves as a non-key frame. Before these non-key frames are compression-coded, each video image is divided into image blocks of different sizes according to the distribution of objects and the density of boundaries in the image, and the blocks are compared with the image blocks of the key frames. For example, in Figure 2, the first device 10 first compression-codes the key frames of the video file; when subsequently processing a non-key frame, if an image block of the currently compressed non-key frame includes object A and has a high similarity to an image block including object A in an already compression-coded key frame, the two image blocks of the two frames can be considered similar. Therefore, when compressing the current non-key frame, the image block of the already compression-coded key frame can be used to represent the similar image block of the current non-key frame, and only the area of the current non-key frame other than the image blocks that can be represented by key-frame image blocks needs to be compression-coded. Similarly, image blocks including object B can be matched among the video images of image package 65-128, and image blocks including object C among the video images of image package 129-256, thereby reducing the amount of computation and improving efficiency when the video file is compressed.
More specifically, when the non-key frames of a video file are compression-coded through a video compression protocol as described above, the video image needs to be divided into different image blocks. Figure 3 is a schematic diagram of a video image divided into image blocks. The image on the left of Figure 3 has a larger number of objects distributed at its four corners, especially the upper-left corner, so more object boundary information needs to be processed there. In the block division of the left image shown on the right, the four corners with more objects are divided into a larger number of image blocks, while the middle area with fewer objects is divided into fewer image blocks. When the non-key frame of Figure 3 is subsequently compared with the already compression-coded key frames, the areas of the non-key frame where objects are more densely distributed can be divided into smaller image blocks for the comparison; because smaller image blocks carry more precise boundary information, the image at the position of each block can be compared with the key frame more precisely, achieving higher accuracy in the comparison with the image blocks of the key frames.
However, in the above process of dividing a video image into different image blocks, the areas of the video image where objects are densely distributed and boundaries are numerous require denser image blocks for identification and comparison, so that a larger number of image blocks must be processed when each frame of the video file is compressed, which reduces the efficiency of compressing each frame and hence of compressing the video file. Moreover, because the above technique divides the video file into different image packages and compares image blocks only within each image package, it lacks an overall identification and comparison: if an object is present in every video image of the entire video file, it is still repeatedly identified and compared in every image package, which likewise reduces the efficiency of compressing the video file.
Therefore, this application provides a video image processing method and apparatus for the video file compression process, in which objects included in the video images of a video file that satisfy a certain preset condition are extracted, compared and compressed separately, so that when the video file is compressed these objects only need to be compressed once, and when each frame of video image is compressed only the area other than these objects needs to be processed. This improves the compression efficiency of each frame of video image and hence the efficiency of compressing the video file.
The technical solution of this application is described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
Figure 4 is a schematic flowchart of an embodiment of the video image processing method provided by this application. The method shown in Figure 4 can be applied to the real-time transmission scenario or the non-real-time transmission scenario of the scene shown in Figure 1, and is executed by the first device and the second device. Specifically, the video image processing method provided by this embodiment includes:
S101: the first device acquires a first video image.
First, as the first device that sends the video file, the first device acquires in S101 the first video image to be sent. The first video image may be one frame of a non-real-time video file that the first device sends to the second device, or the first video image may be a real-time video image that the first device sends to the second device.
S102: the first device determines a target area in the first video image; the target area includes the image of an object that satisfies the preset condition and is stored in the first database of the first device.
The first device then identifies the target area in the first video image, where the target area is an area including the image of an object stored in the first database, and every object stored in the first database satisfies the preset condition. For example, Figure 5 is a schematic diagram of the database arrangement provided by this application, in which the first device is provided with a first database and the second device is provided with a second database. The first database in the first device may be used to store the images of objects that satisfy the preset condition.
Optionally, when applied in a non-real-time transmission scenario, the preset condition may be that, among all the video images of the video file in which the first video image is located, the number of video images that include the object is greater than or equal to a preset number; when applied in a real-time transmission scenario, the preset condition may be that, among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1 and M < N.
Figure 6 is a schematic diagram of objects in a video image provided by this application. Taking the first video image shown on the left of Figure 6 as an example, besides the background, the first video image includes at least an electric bicycle a, a cyclist b, a pedestrian c, a vehicle d and a pedestrian e. These objects are things that may move in the video image; the other, static parts can be regarded as the background of the video image. Assuming that the first database of the first device stores the images of the two objects electric bicycle a and cyclist b, the first device can determine in this step that the target area of the first video image is the area occupied by the images of these two objects.
S103: compress the area of the first video image other than the target area to obtain second compressed data.
After determining the target area of the first video image in S102, the first device compresses only the area of the first video image other than the target area, and the resulting compressed data is recorded as the second compressed data. The way in which the first video image is compressed is not limited and may be done by compression coding.
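S103 can be approximated by neutralizing the target regions before handing the frame to an ordinary encoder, so that those pixels contribute almost nothing to the bitstream. A sketch only, using OpenCV's JPEG encoder as a stand-in for the real video codec and the RegionMarker layout assumed earlier; zeroing the pixels is one possible choice, not the application's prescribed one.

```python
import cv2


def compress_without_targets(frame, markers, quality=80):
    """Blank every target region, then encode the remainder (second compressed data)."""
    masked = frame.copy()
    for m in markers:                       # markers as in the RegionMarker sketch above
        x, y, w, h = m.bbox
        masked[y:y + h, x:x + w] = 0        # no bits are spent on the target pixels
    ok, buf = cv2.imencode(".jpg", masked, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return buf.tobytes() if ok else None
```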
S104:第一装置向第二装置发送第二压缩数据。S104: The first device sends the second compressed data to the second device.
具体地,第一装置将S103中得到的第二压缩数据发送给第二装置,对于第二装置则接收第一装置发送的第二压缩数据。Specifically, the first device sends the second compressed data obtained in S103 to the second device, and for the second device, receives the second compressed data sent by the first device.
可选地,为了让第二装置能够确定第二压缩数据是第一装置对目标区域之外的区域进行了压缩后得到的,第一装置还可以在S104中向第二装置发送第一视频图像中目标区域的标记信息,使得第二装置可以根据目标区域的标记信息确定第一视频图像中的目标区域。Optionally, in order for the second device to determine that the second compressed data is obtained by the first device after compressing the area outside the target area, the first device may also send the first video image to the second device in S104 The mark information of the target area in the middle, so that the second device can determine the target area in the first video image according to the mark information of the target area.
S105:第二装置对第二压缩数据进行解压缩,得到第三视频图像。S105: The second device decompresses the second compressed data to obtain a third video image.
具体地,第二装置在接收到第一装置发送的第二压缩数据之后,对第二压缩数据进行解压缩,得到的视频图像记为第三视频图像。第三视频图像是第一装置所发送的除了目标区域之外的区域对应的图像。Specifically, after receiving the second compressed data sent by the first device, the second device decompresses the second compressed data, and the obtained video image is recorded as the third video image. The third video image is an image corresponding to an area other than the target area sent by the first device.
S106: The second device obtains the image corresponding to the target area from the second database.
Specifically, in the arrangement shown in Fig. 5, a second database is provided in the second device. The second database can be used to store images of objects that meet the preset condition, and it can store the same object images as the first database. The object images stored in the second database may be preset, stored in advance, or sent by the first device to the second device in real time according to the first database and stored in the second database.
Optionally, the second device may obtain the image corresponding to the target area from the second database according to the marking information of the target area carried with the second compressed data.
S107: The second device determines the first video image according to the third video image obtained in S105 and the image of the target area obtained in S106, thereby completing the process of the first device sending the first video image to the second device.
For example, when the target area in the first video image consists of the images of the two objects electric vehicle a and cyclist b shown in Fig. 6, the third video image obtained by decompression in S105 is the part of Fig. 6 that does not include these two objects, and the image of the target area obtained in S106 is the images of electric vehicle a and cyclist b. In S107, the second device can then add the images of electric vehicle a and cyclist b to the corresponding positions of the third video image to obtain the complete first video image.
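The splicing in S105-S107 could, for instance, look like the following sketch; the mark layout (object identifier plus top-left position) and the in-memory form of the second database are assumptions made purely for illustration.

```python
import numpy as np

def reconstruct_first_image(third_image: np.ndarray, marks, second_database):
    """Paste object images from the second database back into the third video image.

    `marks` is a list of (object_id, top, left) entries describing where each object
    belongs; `second_database` maps object_id to its stored image. Both structures
    are assumptions for this sketch.
    """
    result = third_image.copy()
    for object_id, top, left in marks:
        obj = second_database[object_id]          # e.g. electric vehicle a
        h, w = obj.shape[:2]
        result[top:top + h, left:left + w, :] = obj
    return result

db = {"a": np.full((64, 64, 3), 255, dtype=np.uint8)}
third = np.zeros((720, 1280, 3), dtype=np.uint8)
first_image = reconstruct_first_image(third, [("a", 100, 200)], db)
```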
It should be noted that, in the embodiment shown in Fig. 4, when applied in a real-time transmission scenario, the first video image may be a real-time video image acquired by the first device. When applied in a non-real-time transmission scenario, the first video image can be understood as any image of a video file: S101-S103 in Fig. 4 show how any one video image of the video file is processed, and in S104 the first device processes every frame of the video file through S101-S103 and then sends the second compressed data of all the video images to the second device together. The second device then processes each frame of the video file through S105-S107 and finally obtains all the video images of the video file.
In summary, in the video image processing method provided by this embodiment, before sending the first video image to the second device, the first device first identifies the target area of the first video image that includes an object of the first database, compresses the area of the first video image other than the target area to obtain the second compressed data, and sends it to the second device. Since the images of the objects that meet the preset condition stored in the first database of the first device are already stored in the second database of the second device, the second device, after receiving the second compressed data, can combine the third video image obtained by decompressing the second compressed data with the image of the target area stored in the second database to finally obtain the first image. Therefore, with the video image processing method provided by this embodiment, when the first device sends the first video image to the second device, the first device does not need to repeatedly compress the target areas of objects that frequently appear in video images, which reduces the amount of data to be compressed, and the second device does not need to repeatedly decompress the target areas, which reduces the amount of data to be decompressed. This reduces the data volume of the compressed packages transmitted between the first device and the second device and thus improves the efficiency of video image processing.
Optionally, in the embodiment shown in Figs. 4-6, the object images stored in the second database of the second device may be sent by the first device to the second device for storage. Specifically, Fig. 7 is a schematic flowchart of another embodiment of the video image processing method provided by this application. The method shown in Fig. 7 can be applied to the real-time or non-real-time transmission scenario of the scene shown in Fig. 1, is executed by the first device and the second device, and is executed before S101 of the method shown in Fig. 4. Specifically, the video image processing method provided by this embodiment includes:
S201: The first device compresses the images of the objects stored in the first database to obtain first compressed data.
Specifically, after determining the images of the objects that meet the preset condition and storing them in the first database, the first device may compress the first database as a whole, and the resulting compressed data is recorded as the first compressed data. For example, assuming that the images of the two objects electric vehicle a and cyclist b shown in Fig. 6 are stored in the first database, in S201 the first device compresses the images of these objects stored in the first database to obtain the first compressed data.
S202: The first device sends the first compressed data to the second device.
Specifically, the first device sends the first compressed data obtained in S201 to the second device, and the second device receives the first compressed data sent by the first device.
S203: The second device decompresses the first compressed data to obtain the corresponding image set and stores it in the second database.
Specifically, after receiving the first compressed data sent by the first device, the second device decompresses the first compressed data, records the resulting images of multiple objects as an image set, and stores the image set in the second database for use when the embodiment shown in Fig. 4 is subsequently executed.
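A minimal sketch of this database synchronization (S201-S203) is shown below, assuming the first database is a simple mapping from object identifiers to encoded object images; pickle and zlib stand in for whatever serialization and compression coding is actually used.

```python
import pickle
import zlib

# First device side (S201-S202): compress the whole first database once.
first_database = {"a": b"<pixels of electric vehicle a>",
                  "b": b"<pixels of cyclist b>"}
first_compressed_data = zlib.compress(pickle.dumps(first_database))
# ... first_compressed_data is then sent to the second device ...

# Second device side (S203): decompress and store the image set.
second_database = pickle.loads(zlib.decompress(first_compressed_data))
assert second_database.keys() == first_database.keys()
```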
In summary, in this embodiment, because the first device can send the images of the objects in the first database to the second device, the first device only needs to compress the object images included in the first database once, the resulting first compressed data is sent to the second device, and the second device decompresses the first compressed data to determine the second database. When the first device subsequently processes video images, it only needs to compress the areas outside the target areas, which reduces the size and number of compressions performed by the first device on the video images, also reduces the size and number of decompressions performed by the second device on the video images, and further improves the processing efficiency of the video images.
The following describes, with reference to specific embodiments, how the above video image processing method of this application is implemented in real-time transmission scenarios and non-real-time transmission scenarios respectively. The following specific embodiments may be executed by the first device and the second device independently of the embodiments shown in Figs. 4-7, or may be executed on the basis of the embodiments executed by the first device and the second device shown in Figs. 4-7.
1. Non-real-time transmission scenario.
When the video image processing method provided by the embodiments of this application is applied to the scenario shown in Fig. 1 to transmit a non-real-time video file, the first device 10 may send a complete video file as a whole to the second device 20: the first device 10 may first compress the video file to obtain a compressed package with a smaller data volume, and then send that smaller compressed package to the second device 20 to save communication resources. Fig. 8 is an exemplary flowchart of an embodiment of the video image processing method provided by this application, showing the processing flow when the video file is compressed as a whole.
In the embodiment shown in Fig. 8, the first device 10, as the execution subject, first obtains the to-be-processed video file 101, which includes N consecutive video images, N being greater than 1. According to the order of these N video images in the video file 101, the video images are marked 1, 2, ..., N; these labels may also be called frame numbers, and the number of video images included in a video file may also be called the number of frames. Optionally, the to-be-processed video file 101 may be specified by the user of the first device 10, captured by the first device 10, or acquired by the first device 10 through the Internet. It may be compressed through the embodiment shown in Fig. 4 as soon as the first device 10 obtains it, or it may be compressed through the embodiment shown in Fig. 4 when the first device 10 determines that the video file 101 is to be sent to the second device 20.
After acquiring the video file 101, the first device 10 first uses the first machine learning model 102 to recognize the objects included in all N video images of the video file 101 and determines at least one object, included in the N video images, that meets the preset condition. The objects include everything in a video image other than the background. Again taking the video image shown in Fig. 6 as an example, in this embodiment, after the first machine learning model 102 provided in the first device 10 recognizes the video image on the left, the recognition result on the right side of Fig. 6 is obtained and the objects a-e in the video image are identified. After the first machine learning model 102 has recognized all N video images of the entire video file, it can identify the objects included in every one of the N video images. The first device 10 can then screen all objects of all N video images together according to the recognition results of the first machine learning model 102 and store the object images that meet the preset condition in the first database 103 provided in the first device 10. Optionally, several of the N video images may all include the image of an object that meets the preset condition. When multiple video images include the same object at the same resolution, the object image from any one of them may be stored in the first database; when multiple video images include the same object at different resolutions, the object image with the highest resolution may be stored in the first database, since a higher resolution means higher definition.
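A possible realization of this screening step is sketched below; the (object_id, crop) detection format assumed to be produced by the first machine learning model, and the comparison of resolutions by pixel count, are assumptions made only for this sketch.

```python
import numpy as np

def build_first_database(detections, preset_number):
    """Keep, for each object of the video file, its highest-resolution crop.

    `detections` is an assumed list of (object_id, crop) pairs gathered from all
    N video images, where `crop` is an H x W x 3 array. Objects appearing in at
    least `preset_number` images are kept; when the same object appears at several
    resolutions, the largest crop wins.
    """
    counts, best = {}, {}
    for object_id, crop in detections:
        counts[object_id] = counts.get(object_id, 0) + 1
        h, w = crop.shape[:2]
        if object_id not in best or h * w > best[object_id].shape[0] * best[object_id].shape[1]:
            best[object_id] = crop
    return {oid: img for oid, img in best.items() if counts[oid] >= preset_number}

dets = [("c", np.zeros((64, 64, 3), np.uint8)),
        ("c", np.zeros((128, 128, 3), np.uint8))]
db = build_first_database(dets, preset_number=2)
print(db["c"].shape)  # (128, 128, 3): the higher-resolution crop is kept
```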
Optionally, the first machine learning model 102 provided in this embodiment may be a convolutional neural network (CNN) type neural network model, such as AlexNet, ResNet, or Inception v3, that can be applied to object recognition in images.
Optionally, the preset condition described in this embodiment may be that the number of times an object appears in the N video images of the entire video file is greater than a preset number of times. For example, with a preset number of 10 (N>10), when the first machine learning model 102 determines that more than 10 of the N video images of the video file include the object passer-by c, that is, passer-by c appears more than 10 times in the N video images of the entire video file, the image of passer-by c can be stored in the first database 103. On the same principle, the first device 10 can, according to the recognition results of the first machine learning model, store in the first database 103 the images of all objects included in the N video images of the video file that meet the preset condition.
Optionally, in one implementation, what is stored in the first database 103 may be the images of the different objects; the edges of these images are the boundaries of the objects, and apart from the object itself an image contains no background or other information. For example, for the object passer-by c in Fig. 5, what is stored in the first database 103 is the image delimited by the boundary of passer-by c in the video image on the left side of Fig. 5, excluding any other objects or background. Alternatively, in another implementation, since the first device 10 compresses the video file as a whole, the first database 103 may store only the frame number of the video image of the video file in which the object's image appears and the boundary pixel positions of the object; the object's image can later be obtained from that frame number and those boundary pixel positions. For example, after an object of the video file that meets the preset condition has been identified and it is determined that the image of the object is included at the upper-left corner of the 10th frame of the video file, the first database 103 may store data of the form "10, (a, b, c, ...)", indicating that the image of the object is in the 10th frame of the video file and that its boundary in that frame lies at the pixel positions (a, b, c, ...).
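The two storage options could be represented, for example, by a record such as the following; the field names are hypothetical and only mirror the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DatabaseEntry:
    """One object in the first database, under either storage option.

    Option 1: `image` holds the cropped pixels of the object (no background).
    Option 2: `frame_number` plus `boundary_pixels` point into the video file,
    e.g. frame 10 with boundary positions (a, b, c, ...); the object image is
    recovered from that frame when needed. Field names are assumptions.
    """
    object_id: str
    image: Optional[bytes] = None
    frame_number: Optional[int] = None
    boundary_pixels: Optional[List[Tuple[int, int]]] = None

entry_by_image = DatabaseEntry(object_id="c", image=b"<cropped passer-by c>")
entry_by_reference = DatabaseEntry(object_id="c", frame_number=10,
                                   boundary_pixels=[(12, 34), (12, 98), (80, 98)])
```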
Then, after storing all objects of the N video images of the video file that meet the preset condition in the first database 103, the first device 10 further processes the N video images of the video file one by one through the second machine learning model 105. The video image currently being processed by the second machine learning model is recorded as the first video image. The second machine learning model 105 can compare the object images already stored in the first database 103 with the first video image, determine at least one object included in the first video image being processed and the area in which the image of each such object is located, and record the area where the at least one object is located as the target area. For example, assuming that in the example shown in Fig. 4 the first database 103 stores the images of 26 objects labeled A-Z, then after obtaining the first video image of the video file, the second machine learning model compares the first video image with the images in the first database and determines that the first video image currently being processed includes the objects labeled A and B in the first database; the areas of the first video image where the objects labeled A and B are located are recorded as target area A and target area B.
Optionally, the second machine learning model 105 and the first machine learning model 102 described in the embodiments of this application may be the same machine learning model, for example a CNN-type neural network model, or they may be different machine learning models. The difference between them is that the second machine learning model 105 uses the images of the objects already identified in the first database as prior information; the recognition performed by the second machine learning model 105 can therefore be understood as image comparison, whereas the recognition performed by the first machine learning model 102 extracts objects from a new video image. Since the second machine learning model 105 requires less computation, it can be configured as a more lightweight model, so that providing two machine learning models in the first device 10 for recognition and comparison respectively improves the overall processing efficiency.
It can be understood that, in the example shown in Fig. 4, the second machine learning model 105 takes each video image of the video file 101 in turn as the above first video image, compares each of them with the objects in the first database 103, and determines the target areas in each video image. The first device 10 then compresses the areas of the N video images other than the target areas, and the resulting compressed data of the N video images of the video file 101 is recorded as the second compressed data 106. It can be understood that, although the second compressed data 106 includes the compressed data of the N video images, for every video image that contains a target area the target area has been "cropped out" and only the part outside the target area is retained. The compressed video images in the second compressed data 106 are therefore not complete; each such video image lacks the target areas that contain the images of objects in the first database.
In particular, in this embodiment, some video images in the generated second compressed data 106 lack their target areas, which amounts to "cropping out" the images of the first-database objects included in those video images. In order to identify which object of the first database was "cropped" and where the object lies in the video image, the first device 10 also marks, when generating the second compressed data 106, every video image that includes at least one target area. These marks can be carried in the corresponding video images, so that when the video images are later decompressed, information such as the position of the target area in the video image can be determined.
Optionally, in this embodiment, the marked content includes at least one of: the position information of the target area in the first video image, transformation information, and the identification information, in the first database, of the object included in the target area. The transformation information is used to identify the difference between the image of the object in the target area as stored in the first database and as it appears in the first video image. For example, in the example shown in Fig. 4, the objects stored in the first database 103 can be identified by the identification letters A-Z. Assume that the object corresponding to letter A is a pedestrian and the image of this pedestrian stored in the first database has a resolution of 128*128. If the target area in the upper-left corner of the video image currently recognized by the first device 10 through the second machine learning model 105 includes the object corresponding to letter A stored in the first database 103, then when the first device 10 compresses this video image, the identification information of the object in the target area is "A", and the position information of the target area in the first video image includes the pixel positions of the boundary of the target area in the video image. At the same time, if the resolution of the target area in the video image is 64*64, which amounts to scaling down the 128*128 image stored in the first database 103, the target area can be marked with the transformation information "scaled down by half"; in addition, the transformation information may also include transformations such as rotation and stretching.
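Under the assumptions of this example, the marking information attached to a cropped target area might be organized as follows; the field names and the numeric scale convention (0.5 for an image displayed at half its stored side length) are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TargetAreaMark:
    """Marking information carried with a compressed video image (assumed layout)."""
    object_id: str                   # identifier of the object in the first database, e.g. "A"
    boundary: List[Tuple[int, int]]  # pixel positions of the target area's boundary
    scale: float = 1.0               # e.g. 0.5 when the stored 128*128 image appears as 64*64
    rotation_deg: float = 0.0        # further transforms such as rotation or stretching

mark = TargetAreaMark(object_id="A",
                      boundary=[(0, 0), (0, 63), (63, 63), (63, 0)],
                      scale=0.5)
```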
Correspondingly, since the first database 103 stores the images of the objects of the N video images that meet the preset condition, that is, all the "cropped" parts of the incomplete video images in the second compressed data 106 are stored in the first database 103, the first device 10 can, when compressing the video file, also compress the object images stored in the first database 103 to obtain the first compressed data 104.
In this embodiment, the step in which the first device 10 obtains the first compressed data 104 can be executed at any time after the first database 103 is determined; it is independent of the step in which the first device 10 obtains the second compressed data 106, and no particular order is required.
Finally, in one implementation, the first compressed data 104 and the second compressed data 106 can serve as the compressed file obtained after the video file 101 is compressed; after generating the first compressed data 104 and the second compressed data 106, the first device 10 sends them to the second device 20. Alternatively, in another implementation, after obtaining the first compressed data 104 and the second compressed data 106, the first device 10 may combine the two into final third compressed data 107 as the compressed file of the entire video file 101, and after generating the third compressed data 107 send it to the second device 20. This embodiment does not limit the sending manner.
Optionally, in this embodiment, the specific data compression processing used by the first device 10 to obtain the first compressed data 104 and to obtain the second compressed data 106 may be the same or different; for example, both may use a video compression protocol such as H.264 or H.265.
Subsequently, after the second device 20 receives the compressed data sent by the first device 10 in the manner of the embodiment shown in Fig. 8, it can decompress the compressed data to obtain the complete video file. Specifically, Fig. 9 is an exemplary flowchart of an embodiment of the video image processing method provided by this application, showing the processing flow of decompressing the compressed package to obtain the video file.
In the embodiment shown in Fig. 9, the second device 20, as the execution subject, first receives the third compressed data 107 sent by the first device 10 and obtains the first compressed data 104 and the second compressed data 106 from the third compressed data 107. Alternatively, the second device 20 may directly receive the first compressed data 104 and the second compressed data 106 sent by the first device 10.
The second device 20 can then decompress the first compressed data 104 and the second compressed data 106 separately. Decompressing the first compressed data 104 yields an image set including the images of multiple objects, for example the images of the objects labeled A-Z shown in Fig. 6, and the images of the objects A-Z in the image set can be stored in the second database 108 of the second device 20. That is, after decompressing the first compressed data, the second device 20 can restore the first database of the first device 10 and store all the object images included in the first database in the second database 108.
Decompressing the second compressed data 106 yields N video images, none of which includes the target areas of the objects stored in the second database 108, and each of which carries marking information for its target areas. The marking information includes at least one of: the position information of the target area, the rotation or transformation information of the object, and the identification information, in the first database, of the object included in the target area.
Optionally, if the images of the multiple objects in the first compressed data 104 are represented by the frame number and position information of a video image of the video file in which they appear, for example, an identified object is recorded in the first database 103 by the 10th frame of the video file and the pixel positions within that 10th frame, then after obtaining the first compressed data 104 the second device 20 obtains the image of the object from those pixel positions of the 10th frame in the second compressed data 106.
Finally, for each of the N video images, the second device 20 determines the images of the objects in the target areas from the second database according to the marking information of the target areas in that video image, and then performs image splicing to restore the video image. For example, if, in the first video image currently being processed by the second device 20, the marks of a target area include "A", boundary pixels, and "scaled down by half", the second device can obtain the image corresponding to object A from the second database 108, scale it down by half, and place it at the position of the boundary pixels in the first video image, thereby splicing the video image. When the second device 20 has finished splicing all N video images, the complete video file is finally obtained.
In summary, in the video image processing method provided by this embodiment, when the first device compresses a video file, it first stores, through the first machine learning model, the images of the objects of all video images of the video file that meet the preset condition in the database, then identifies, through the second machine learning model, the target areas of each video image that include objects of the first database, and finally compresses and encodes the objects of the first database and the area of each video image other than the target areas separately, obtaining the compressed file of the entire video file. In this process, because the first database is compressed and encoded as a whole, when the first video image of the video file is subsequently compressed and encoded, the first device does not need to compress the target areas of the first video image; it only needs to compress and encode the area of the video image outside the target areas of the objects present in the database. And since the images of the objects that meet the preset condition stored in the first database of the first device are also stored in the second database of the second device, the second device, after receiving the second compressed data, can combine the third video image obtained by decompressing the second compressed data with the images of the target areas stored in the second database to finally obtain the first image. Therefore, with this embodiment, when the first device transmits a video file to the second device non-real-time, the first device does not need to repeatedly compress the target areas of objects that frequently appear in the video images: since the second database of the second device already stores the images of the target areas, the first device only needs to compress the areas outside the target areas, which reduces the amount of data to be compressed, and the second device does not need to repeatedly decompress the target areas, which reduces the data volume of the compressed packages transmitted between the first device and the second device and improves the efficiency of video image processing. In particular, in this embodiment the first database stores only the images of the objects, and the machine learning models provided recognize and compare objects based on the object images themselves, so there is no need, as in the technique shown in Fig. 3, to divide an image into a large number of image blocks of different sizes and process those blocks one by one; this reduces the amount of computation in video image processing and thus improves the efficiency of compressing and encoding the entire video file. Moreover, this application judges, on the basis of the entire video file, whether the objects in all its video images meet the preset condition, which prevents an object from being repeatedly identified and compared and further improves the efficiency of compressing and encoding the video file.
2. Real-time transmission scenario.
When the video image processing method provided by the embodiments of this application is applied to the scenario shown in Fig. 1 to transmit a first video image acquired in real time, the acquired first video image needs to be compressed and sent to the second device 20 as soon as possible. Specifically, Fig. 10 is an exemplary flowchart of an embodiment of the video image processing method provided by this application, showing the processing flow of the first device 10 shown in Fig. 1 when compressing a real-time first video image. Since this embodiment is applied in scenarios where the real-time nature of the video must be guaranteed, such as surveillance video backhaul, the video acquired by the first device 10 is generated in real time and needs to be sent to the second device 20 immediately. Therefore, after receiving a frame of video image, the first device 10 needs to compress and encode that video image promptly and send it to the second device 20 promptly, and it keeps receiving new video images and repeating this process.
In the embodiment shown in Fig. 10, the first device 10, as the execution subject, first obtains the first video image 201 that needs to be transmitted to the second device 20 in real time, where the first video image 201 may be one video image of a continuous video file. The video may be specified for transmission by the user, captured by the first device 10, or acquired by the first device 10 through the Internet, and it needs to be sent to the second device 20 in real time.
After acquiring the video image 201, the first device 10 first compares, through the second machine learning model 207, the video image 201 with the object images already stored in the first database 204 and determines the area of the video image 201 in which at least one object stored in the first database 204 is located, recorded as the target area. For example, in the example shown in Fig. 7, the second machine learning model 207 determines, according to the image of object A stored in the database 204, the target area of the video image 201 that includes object A. The first device 10 then "crops out" the target area of the video image 201, compresses the video image 201 excluding the target area to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end. In addition, to facilitate marking the target area of the video image, the second compressed data 208 may also include marking information for the target area, for example at least one of: the position information of the target area in the first video image, transformation information, and the identification information, in the database, of the object included in the target area.
Since the second machine learning model 207 has already compressed and encoded the area of the first video image 201 outside the target area and sent it to the second device at the receiving end, the object A included in the target area of the first video image 201 should, before this, have been compressed and encoded by the first device 10 from the first database 204 into the first compressed data 205 and sent to the second device, so that after receiving the second compressed data 208 the second device can combine it with the first compressed data and splice them to obtain the video image 201.
In one specific implementation, the images of multiple objects may be pre-stored in the first database 204, so that the first device 10 can perform the comparison through the database 204 as soon as it obtains the first video image. In another specific implementation, when the first device 10 transmits a video file including N video images to the second device 20 in real time, for the first M of the N video images, recorded as second video images (M<N), the first device 10 does not perform recognition directly through the second machine learning model 207; instead, it first performs recognition through the first machine learning model and stores the images of the objects in the second video images that meet the preset condition in the first database 204. Subsequently, when transmitting the video images of the N video images that follow these M video images, the first device 10 performs the comparison through the second machine learning model 207 in combination with the first database 204, based on the object images already stored in the first database 204. As for these M video images, this application does not limit the manner in which the first device 10 compresses and encodes them. For example, a video image can be roughly divided into several regions and compressed and encoded with different parameters according to the characteristics of those regions: a larger residual value and fewer high-frequency components may be used for regions that include the background, while a smaller residual value and more high-frequency components may be used for regions that may include objects; alternatively, an existing video compression protocol such as H.264, H.265 or H.266 may be used for compression and encoding.
Further, since the video images processed in this embodiment arrive in real time, the first device 10 cannot, when transmitting a single video image, directly determine the objects, of the video file to which that video image belongs, that meet the preset condition. The preset condition may include: among the N video images before the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N>1, and M<N. That is, an object can be determined to meet the preset condition and added to the first database 204 only after the N video images preceding the first video image have been taken into account, so the objects stored in the first database 204 at a given moment may not cover all objects of the current video file that meet the preset condition. For example, in the example shown in Fig. 10, the moment at which the first device 10 processes the video image 201 is recorded as the first moment. The video image 201 includes object A and object B, but because the first database 204 stores only the image of object A, the second machine learning model 207 can only identify the target area of the video image that includes object A. Even if, at the first moment, object B already satisfies the preset condition over the N video images including the first video image 201, the second machine learning model will not recognize the area where object B is located as a target area, because the first database 204 does not yet store the image of object B (over the window of N video images preceding the first video image, object B did not meet the preset condition); after the first moment, object B can be added to the first database 204 to reduce the size of the area that needs to be compressed in each subsequent frame. Therefore, in the course of processing the video image, the first device 10 also recognizes, through the first machine learning model 202, the objects included in the processed video image 201 and determines the objects other than the background, such as object A and object B shown in Fig. 10. Optionally, the order of the two processing steps shown in Fig. 10, namely the first device 10 processing the video image 201 through the second machine learning model 207 and processing the video image 201 through the first machine learning model 202, is not limited, and the two steps may also be executed at the same time. Subsequently, the management module 203 in the first device 10 manages the identified objects. The management includes at least adding, deleting and replacing the object images stored in the first database 204, which are described below with examples.
1. Addition of an object. Specifically, based on the number of objects included in the N video images up to and including the currently processed first video image 201, the management module 203 adds to the database 204 the image of any object that has appeared more than M times (M<N) and whose image is not currently stored in the database. For example, the first machine learning model 202 in Fig. 10 identifies object A and object B in the first video image 201, and it is determined that object B is not stored in the first database 204; if 5 of the 10 video images preceding the video image 201 include object B, that is, object B has appeared 5 times in total, the identified image of object B needs to be added to the first database 204.
Optionally, the management module 203 may cache the images of object B from the previous first video images, and when it is later determined that the image of object B is to be added to the database 204, the image of object B with the highest resolution can be taken from the cache and stored in the first database 204, so as to improve the definition of the image of object B processed in subsequent compression.
Further, after the management module 203 adds the new object B to the first database 204, the first device 10 may immediately compress and encode the image of the newly added object B, record the resulting compressed data as third compressed data, and send the third compressed data to the second device, so that the second device, after decompressing the third compressed data, stores the image of object B in the second database of the second device. Alternatively, after the management module 203 adds the new object B to the first database 204, the first device 10 may compress and encode the whole first database 204 including the image of the newly added object B, record the resulting compressed data as fourth compressed data, and send the fourth compressed data to the second device, so that the second device, after decompressing the fourth compressed data, updates the second database of the second device.
Therefore, even though in the example shown in Fig. 10 the second machine learning model 207 of the first device 10 cannot at that time identify the target area of object B in the currently processed first video image 201, when the first device 10 later recognizes and processes subsequent video images, the second machine learning model 207 can use the larger set of objects in the first database 204, now including object B, for recognition. For example, Fig. 11 is an exemplary flowchart of an embodiment of the video image processing method provided by this application, showing the flow in which the first device 10 processes a video image 301 that follows the first video image 201 shown in Fig. 10. Let the moment at which the first device 10 processes the first video image 201 be the first moment, and the moment at which the first device 10 sends the third compressed data or the fourth compressed data to the second device 20 be the second moment. Then, at a third moment after the first moment and the second moment, when the first device 10 receives the video image 301, since the image of object B has already been stored in the database 204 and has already been sent to the second device 20 through the third compressed data or the fourth compressed data, the first device 10 can determine, through the second machine learning model 207 and according to the images of object A and object B stored in the first database 204, the target areas of the video image 301 that include object A and object B. The first device 10 then "crops out" the target areas of the video image 301, compresses and encodes the video image 301 excluding the target areas to obtain the second compressed data 208, and sends the second compressed data 208 to the second device at the receiving end.
2. Deletion of an object. Specifically, based on the X video images up to and including the currently processed first video image 201, the management module 203 deletes from the first database 204 the image of any object that has appeared fewer than Y times (Y<X) and whose image is stored in the first database 204. For example, for object A and object B identified by the first machine learning model 202 in Fig. 7 in the first video image 201, suppose object A has appeared only this once in the 10 video images preceding the currently processed first video image 201, which is less than Y=2 times; the management module 203 can then delete the image of object A stored in the first database 204, so that when the second machine learning model 207 processes subsequent video images, the number of object images to be compared in the database is reduced, further improving efficiency.
3. Replacement of an object. Specifically, the management module 203 compares the image of an object identified in the currently processed first video image 201 with the image of the same object stored in the first database 204. If the resolution of the image of object A in the first video image 201, 128*128, is greater than the resolution of the image of object A in the first database 204, 64*64, the image of object A from the first video image 201 is stored in the first database 204 and the image of object A originally stored in the first database 204 is deleted.
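The three management operations above might be combined into a single module along the following lines; the thresholds, the sliding window and the use of pixel count as a resolution proxy are assumptions made purely for this sketch, not a definitive implementation.

```python
import numpy as np

class DatabaseManager:
    """Sketch of the management module: add, delete and replace object images.

    `history` keeps, per object, which of the last `window` frames contained it;
    crops are assumed to be numpy arrays so that `size` reflects resolution.
    """
    def __init__(self, add_after=5, delete_below=2, window=10):
        self.db = {}                 # object_id -> image (H x W x 3 array)
        self.history = {}            # object_id -> list of 0/1 flags, newest last
        self.add_after, self.delete_below, self.window = add_after, delete_below, window

    def observe_frame(self, detections):
        """`detections` maps object_id -> best crop found in the current frame."""
        seen = set(detections)
        for oid in set(self.history) | seen:
            flags = self.history.setdefault(oid, [])
            flags.append(1 if oid in seen else 0)
            del flags[:-self.window]                       # keep only the last `window` frames
            count = sum(flags)
            if oid in seen:
                crop = detections[oid]
                if oid not in self.db and count >= self.add_after:
                    self.db[oid] = crop                    # 1. addition
                elif oid in self.db and crop.size > self.db[oid].size:
                    self.db[oid] = crop                    # 3. replacement with higher resolution
            if oid in self.db and count < self.delete_below:
                del self.db[oid]                           # 2. deletion of rarely-seen objects

mgr = DatabaseManager(add_after=2, delete_below=1, window=5)
for _ in range(3):
    mgr.observe_frame({"B": np.zeros((64, 64, 3), np.uint8)})
print("B" in mgr.db)  # True once object B has been seen often enough
```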
Subsequently, after the second device 20 receives the compressed data sent by the first device 10 in the manner of the embodiment shown in Fig. 10, it decompresses the compressed data to obtain the first video image. Specifically, Fig. 12 is an exemplary flowchart of an embodiment of the video image processing method provided by this application, showing the processing flow in which the second device 20 decompresses the compressed data to obtain the first video image.
In the embodiment shown in Fig. 12, the second device 20, as the execution subject, receives the first compressed data 205 and the second compressed data 208 sent by the first device 10. The two pieces of compressed data may be received by the second device 20 at different times, the first compressed data 205 being received before the second compressed data 208.
After receiving the first compressed data 205, the second device 20 can decompress it to obtain the images of multiple objects, for example the images of the objects labeled A, B, ... shown in Fig. 9, as an image set, and can store the image set in the database 210 of the second device 20.
After receiving the second compressed data 208, the second device 20 can decompress it to obtain the third video image 211, which does not include the target areas of the objects in the second database 210 that meet the preset condition. When receiving the second compressed data 208, the second device may also receive the marking information, sent by the first device 10, of the target areas included in the first video image; the marking information includes at least one of: the position information of the target area in the first video image, transformation information, and the identification information, in the first database, of the object included in the target area.
Finally, the second device 20 determines the images of the objects in the target areas from the second database 210 according to the marking information of the target areas in the first video image and performs image splicing to restore the video image 201. For example, if, in the first video image currently being processed by the second device 20, the marks of a target area include "A", boundary pixels, and "scaled down by half", the image corresponding to object A can be obtained from the database, scaled down by half, and placed at the position of those boundary pixels in the decompressed third video image, thereby splicing the current video image 201 and finally obtaining the first video image 201.
In summary, the video image processing method provided in this embodiment is applied when the first device compresses the first video image acquired in real time: the second machine learning model identifies the target areas of the video image that contain objects stored in the first database, and the areas of the video image other than the target areas are then compressed and sent. During this process, the first machine learning model can also identify the objects in the first video image, and the images of the objects stored in the first database can be added, deleted, or modified accordingly. When the first device compresses and encodes the first video image, the images of the objects in the first database have already been compressed once as a whole, so the target areas in the first video image do not need to be compressed again; only the areas of the video image outside the target areas that contain the objects stored in the database need to be compressed. Moreover, the images of the objects that meet the preset condition and are stored in the first database of the first device are also already stored in the second database of the second device, so that after receiving the second compressed data the second device can combine the third video image obtained by decompressing the second compressed data with the images of the target areas obtained from the second database to finally obtain the first video image. Therefore, in this embodiment, when the first device transmits the first video image to the second device in real time, the first device does not need to repeatedly compress the target areas of objects that frequently appear in the video images: because the images of the target areas are already stored in the second database of the second device, the first device only needs to compress the areas outside the target areas, which reduces the amount of compressed data, and the second device does not need to repeatedly decompress the target areas. This reduces the amount of data in the compressed packets transmitted between the first device and the second device and improves the efficiency of video image processing. In particular, the first database in this embodiment stores only the images of the objects, and the machine learning models provided identify and compare objects based on the object images themselves, so there is no need, as in the technique shown in FIG. 3, to divide each image into a large number of image blocks of different sizes and process the blocks one by one. This reduces the amount of computation during video image processing and improves the efficiency of compressing and encoding the entire video file. Moreover, while encoding and recognizing consecutive video images, this application can also keep updating the objects stored in the database in real time, so that the images of the objects saved in the database are up to date and the database can be used in subsequent comparisons, further improving the efficiency of compressing and encoding the video file.
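For illustration only, the sender-side behaviour summarized above could be sketched as follows; `detector`, `first_database`, and `codec_encode` are assumed interfaces (a detector returning (object_id, (x, y, w, h)) pairs and an ordinary video encoder), and blanking the target areas before encoding merely stands in for "compressing only the area outside the target areas".

```python
def compress_frame(frame, detector, first_database, codec_encode):
    """Blank out the target areas whose objects already sit in the shared
    database, then encode only the remaining area of the frame."""
    marks = []
    masked = frame.copy()
    for object_id, (x, y, w, h) in detector(frame):
        if object_id in first_database:                # object known to both devices
            masked[y:y + h, x:x + w] = 0               # target area is not re-encoded
            marks.append({"object_id": object_id, "boundary": (x, y, w, h)})
    second_compressed_data = codec_encode(masked)      # e.g. a conventional H.265 encode
    return second_compressed_data, marks
```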
The foregoing embodiments describe the video image processing method provided by the embodiments of this application. To implement the functions of that method, the first device and the second device acting as execution subjects may include hardware structures and/or software modules, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is executed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and the design constraints of the technical solution.
FIG. 13 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. The apparatus shown in FIG. 13 can serve as the first device 10 in the scenario shown in FIG. 1 and perform the functions performed by the first device in the embodiment shown in FIG. 4. Specifically, the apparatus includes: an acquiring module 1301, a first determining module 1302, a compression module 1303, and a sending module 1304. The acquiring module 1301 is configured to acquire a first video image; the first determining module 1302 is configured to determine a target area in the first video image, where the target area includes an image of an object that meets a preset condition and is stored in the first database of the first device; the compression module 1303 is configured to compress the area of the first video image other than the target area to obtain second compressed data; and the sending module 1304 is configured to send the second compressed data to the second device, whose second database already stores the image of the object that meets the preset condition.
Optionally, the compression module 1303 is further configured to compress the images of the objects stored in the first database to obtain first compressed data, and the sending module 1304 is further configured to send the first compressed data to the second device; the first compressed data is used by the second device to determine the second database.
Optionally, the sending module 1304 is specifically configured to send the second compressed data and the marking information of the target area to the second device. The marking information includes the position information of the target area in the first video image and at least one of the identification information, in the first database, of the image of the object included in the target area, or transformation information; the transformation information is used to indicate the difference between the image of the object in the target area as stored in the first database and as it appears in the first video image.
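One possible, purely illustrative layout of the marking information is sketched below; the field names and the position format are assumptions of the sketch, not definitions from the application.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class TargetAreaMark:
    """Illustrative layout of the marking information."""
    position: Tuple[int, int, int, int]            # position of the target area in the first video image
    object_id: Optional[str] = None                # identification in the first database, e.g. "A"
    transform: dict = field(default_factory=dict)  # e.g. {"scale": 0.5} for "scaled down by half"
```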
Optionally, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M < N.
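A simple way to evaluate this condition is a sliding window over the most recent N frames, as in the following sketch (class and method names are illustrative assumptions).

```python
from collections import deque


class ObjectFrequencyTracker:
    """An object meets the condition if it appeared in at least M of the
    previous N frames."""

    def __init__(self, n: int, m: int):
        self.n, self.m = n, m
        self.history = deque(maxlen=n)               # one set of object ids per frame

    def update(self, object_ids_in_frame):
        self.history.append(set(object_ids_in_frame))

    def meets_condition(self, object_id) -> bool:
        return sum(object_id in ids for ids in self.history) >= self.m
```

With N = 64 and M = 8, for example, an object would qualify once it has been seen in at least 8 of the last 64 frames; the specific values are configuration choices, not values fixed by the application.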
FIG. 14 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. On the basis of the apparatus shown in FIG. 13, the apparatus shown in FIG. 14 further includes a second determining module 1305 and a storage management module 1306, and can be used to execute the video image processing method shown in FIG. 10. For example, the second determining module 1305 is configured to identify target objects in the first video image that meet the preset condition, and the storage management module 1306 is configured to add the images corresponding to new target objects among the target objects to the first database, where a new target object is an object not yet stored in the first database and the first database is stored in a storage module.
Optionally, the compression module 1303 is further configured to compress the image corresponding to the new target object to obtain third compressed data, and the sending module 1304 is further configured to send the third compressed data to the second device.
Optionally, after the storage management module 1306 adds the images corresponding to the new target objects to the first database, the compression module 1303 is further configured to compress the images of the objects stored in the first database to obtain fourth compressed data, and the sending module 1304 is further configured to send the fourth compressed data to the second device.
Optionally, the storage management module 1306 is further configured to delete, from the first database, the images of objects that do not meet the preset condition.
Optionally, the storage management module 1306 is further configured to replace the image of a first object stored in the first database with the image of the first object in the first video image when the sharpness of the image of the first object in the target area of the first video image is better than the sharpness of the image of the first object stored in the first database.
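The storage-management behaviour described in the last few paragraphs (adding, deleting, and replacing by sharpness) might be sketched as below; the Laplacian-variance sharpness measure and the reuse of the tracker from the earlier sketch are assumptions, since the application does not prescribe a particular sharpness metric.

```python
import cv2


def sharpness(image) -> float:
    """Variance of the Laplacian of the grayscale image, used here only as a
    simple sharpness proxy."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()


def maintain_database(first_database, detections, tracker):
    """Add newly qualifying objects, keep the sharpest stored copy of each
    object, and drop objects that no longer meet the preset condition."""
    for object_id, crop in detections:               # cropped images of detected objects
        if not tracker.meets_condition(object_id):
            continue
        stored = first_database.get(object_id)
        if stored is None or sharpness(crop) > sharpness(stored):
            first_database[object_id] = crop         # add new object or replace with a sharper image
    for object_id in list(first_database):
        if not tracker.meets_condition(object_id):
            del first_database[object_id]            # delete objects that fell below the condition
```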
Optionally, the first video image is a video image, in the video file that the first device is compressing and transmitting in real time, whose frame number is greater than a preset frame number. The acquiring module 1301 is further configured to acquire a second video image in the video to be processed, where the frame number of the second video image in the video to be processed is less than the preset frame number; the second determining module 1305 is further configured to identify objects in the second video image that meet the preset condition; and the storage management module 1306 is further configured to store the objects of the second video image that meet the preset condition in the first database.
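A two-phase loop of this kind could look as follows; it reuses `detector`, `maintain_database`, and `compress_frame` from the earlier sketches, `crop_of` is a hypothetical helper, and the handling of the frame equal to the preset frame number is an arbitrary choice of the sketch.

```python
def crop_of(frame, box):
    x, y, w, h = box
    return frame[y:y + h, x:x + w]


def process_stream(frames, preset_frame_number, detector, tracker, first_database, codec_encode):
    """Frames below the preset frame number only populate the first database;
    later frames are compressed with their target areas removed."""
    for frame_number, frame in enumerate(frames):
        detections = detector(frame)                 # (object_id, (x, y, w, h)) pairs
        tracker.update([oid for oid, _ in detections])
        crops = [(oid, crop_of(frame, box)) for oid, box in detections]
        maintain_database(first_database, crops, tracker)
        if frame_number < preset_frame_number:
            continue                                  # bootstrap phase: build the database only
        yield compress_frame(frame, detector, first_database, codec_encode)
```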
Optionally, the preset condition includes: in the video file where the first video image is located, the number of video images that include the object is greater than or equal to a preset number.
FIG. 15 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. On the basis of the apparatus shown in FIG. 13, the apparatus shown in FIG. 15 further includes a third determining module 1307 and a storage management module 1306, and can be used to execute the video image processing method shown in FIG. 8. For example, the third determining module 1307 is configured to identify objects that meet the preset condition in all video images of the video file, and the storage management module 1306 is configured to store the images of the objects that meet the preset condition in the first database.
Optionally, the image of an object stored in the first database includes: the boundary pixel positions of the object and the frame numbers, in the video file, of the video images that include the object.
FIG. 16 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. The apparatus shown in FIG. 16 can serve as the second device 20 in the scenario shown in FIG. 1 and perform the functions performed by the second device in the embodiment shown in FIG. 4. Specifically, the apparatus includes: a receiving module 1601, a decompression module 1602, an acquiring module 1603, and a determining module 1604. For example, the receiving module 1601 is configured to receive the second compressed data sent by the first device, where the second compressed data is obtained by compressing the area of the first video image other than the target area; the decompression module 1602 is configured to decompress the second compressed data to obtain a third video image, where the third video image includes the image corresponding to the area of the first video image other than the target area; the acquiring module 1603 is configured to acquire the image corresponding to the target area from the second database of the second device; and the determining module 1604 is configured to determine the first video image according to the third video image and the image corresponding to the target area.
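On the receiving side, the cooperation of these modules might be sketched as below; `codec_decode` is an assumed decoder matching the sender's encoder, and `stitch_frame` is the earlier sketch.

```python
def restore_frame(second_compressed_data, marks, second_database, codec_decode):
    """Decompress the background (the third video image) and paste the object
    images from the second database back into their marked positions."""
    third_video_image = codec_decode(second_compressed_data)
    return stitch_frame(third_video_image, marks, second_database)


def apply_database_update(compressed_object_images, second_database, codec_decode):
    """Keep the second database in sync: images of qualifying objects arrive as
    first, third or fourth compressed data and are stored under their ids."""
    for object_id, data in compressed_object_images.items():
        second_database[object_id] = codec_decode(data)
```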
FIG. 17 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. On the basis of the apparatus shown in FIG. 16, the apparatus shown in FIG. 17 further includes a storage management module 1605. In this embodiment, the receiving module 1601 is further configured to receive the first compressed data sent by the first device; the decompression module 1602 is further configured to decompress the first compressed data to obtain an image set corresponding to the objects that meet the preset condition, where the image set includes the image corresponding to the target area; and the storage management module 1605 is configured to store the image set in the second database.
Optionally, the receiving module 1601 is further configured to receive the marking information of the target area sent by the first device, where the marking information includes the position information of the target area in the first video image and at least one of the identification information, in the first database of the first device, of the object included in the target area, or transformation information; the transformation information is used to indicate the difference between the image of the object in the target area as stored in the first database and as it appears in the first video image.
Optionally, the determining module 1604 is specifically configured to stitch the image corresponding to the target area and the third video image according to the marking information of the target area to obtain the first video image.
Optionally, the receiving module 1601 is further configured to receive the third compressed data sent by the first device; the decompression module 1602 is further configured to decompress the third compressed data to obtain the image of the new target object; and the storage management module 1605 is further configured to add the image of the new target object to the second database.
Optionally, the receiving module 1601 is further configured to receive the fourth compressed data sent by the first device; the decompression module 1602 is further configured to decompress the fourth compressed data to obtain the updated image set corresponding to the objects that meet the preset condition; and the storage management module 1605 is further configured to update the second database based on the updated image set corresponding to the objects that meet the preset condition.
Optionally, the preset condition includes: among the N video images preceding the first video image, the number of video images that include the object is greater than or equal to M, where M and N are both positive integers, N > 1, and M < N; or, the preset condition includes: in the video file where the first video image is located, the number of video images that include the object is greater than or equal to a preset number.
For the methods executed by the modules in the embodiments of the video image processing apparatus provided by this application, reference may be made to the description of the video image processing method recorded in this application; the implementations and principles are the same and are not repeated here.
It should be noted that the division of the modules of the above apparatus is only a division of logical functions; in actual implementation, they may be fully or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, or all in the form of hardware; alternatively, some modules may be implemented as software invoked by a processing element and others as hardware. For example, the determining module may be a separately established processing element, or it may be integrated into a chip of the above apparatus; it may also be stored in the memory of the above apparatus in the form of program code, with a processing element of the apparatus invoking and executing the functions of the determining module. The implementation of the other modules is similar. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In the implementation process, the steps of the above method or the above modules can be completed by integrated logic circuits of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). For another example, when one of the above modules is implemented in the form of a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form of a computer program product, in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive (SSD)).
FIG. 18 is a schematic structural diagram of an embodiment of a video image processing apparatus provided by this application. The apparatus can serve as the first device or the second device described in any of the foregoing embodiments of this application and execute the video image processing method executed by the corresponding device. As shown in FIG. 18, the communication apparatus 1100 may include a processor 111 (for example, a CPU) and a transmission interface, which may be a transceiver 113; the transceiver 113 is coupled to the processor 111, and the processor 111 controls the transceiving actions of the transceiver 113. Optionally, the communication apparatus 1100 further includes a memory 112, which may store software instructions; the processor 111 is configured to read the software instructions stored in the memory 112 to complete various processing functions and implement the method steps executed by the first device or the second device in the embodiments of this application.
Optionally, the video image processing apparatus involved in the embodiments of this application may further include: a power supply 114, a system bus 115, and a communication interface 116. The transceiver 113 may be integrated in the transceiver of the video image processing apparatus, or may be an independent transceiving antenna on the communication apparatus. The system bus 115 is used to implement communication connections between the components, and the communication interface 116 is used to implement connection and communication between the communication apparatus and other peripherals.
In the embodiments of this application, the processor 111 is configured to be coupled with the memory 112 and to read and execute the instructions in the memory 112 to implement the method steps executed by the first device or the second device in the above method embodiments. The transceiver 113 is coupled with the processor 111, which controls the transceiver 113 to send and receive messages; the implementation principles and technical effects are similar and are not repeated here.
The system bus mentioned in FIG. 18 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The system bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to implement communication between the database access apparatus and other devices (such as clients, read-write libraries, and read-only libraries). The memory may be a non-volatile memory, such as a hard disk drive (HDD) or an SSD, or a volatile memory, such as a random-access memory (RAM). The memory is any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, used to store program instructions and/or data.
The processor mentioned in FIG. 18 may be a general-purpose processor, including a CPU, a network processor (NP), and the like; it may also be a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, an embodiment of this application further provides a computer-readable storage medium that stores instructions; when the instructions are executed by a computer or processor, the computer or processor implements the video image processing method executed by the first device or the second device in any of the foregoing embodiments of this application.
Optionally, an embodiment of this application further provides a chip for executing instructions, where the chip is used to execute the video image processing method executed by the first device or the second device in any of the foregoing embodiments of this application.
An embodiment of this application further provides a computer program product that contains instructions; when the instructions run on a computer or processor, the computer or processor implements the video image processing method executed by the first device or the second device in any of the foregoing embodiments of this application.
It can be understood that the various numerical designations involved in the embodiments of this application are only for ease of distinction in the description and are not intended to limit the scope of the embodiments of this application.
It can be understood that, in the embodiments of the present invention, the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation process of the embodiments of this application.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some or all of the technical features, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (43)

1. A video image processing method, applied to a first device, characterized in that the method comprises:
acquiring a first video image;
determining a target area in the first video image, wherein the target area comprises an image of an object that meets a preset condition and is stored in a first database of the first device;
compressing an area of the first video image other than the target area to obtain second compressed data; and
sending the second compressed data to a second device, wherein a second database of the second device already stores the image of the object that meets the preset condition.
2. The method according to claim 1, characterized in that, before the acquiring a first video image, the method further comprises:
compressing the image of the object stored in the first database to obtain first compressed data; and
sending the first compressed data to the second device, wherein the first compressed data is used by the second device to determine the second database.
3. The method according to claim 1 or 2, characterized in that the sending the second compressed data to a second device comprises:
sending the second compressed data and marking information of the target area to the second device, wherein the marking information comprises: position information of the target area in the first video image, and at least one of identification information, in the first database, of the image of the object comprised in the target area, or transformation information, wherein the transformation information is used to indicate a difference between the image, in the first database, of the object in the target area and the first video image.
4. The method according to any one of claims 1 to 3, characterized in that the preset condition comprises: among N video images preceding the first video image, the number of video images comprising the object is greater than or equal to M, wherein M and N are both positive integers, N > 1, and M < N.
5. The method according to claim 4, characterized in that, after the acquiring a first video image, the method further comprises:
identifying a target object in the first video image that meets the preset condition; and
adding an image corresponding to a new target object among the target objects to the first database, wherein the new target object is an object that is not stored in the first database.
6. The method according to claim 5, characterized in that, after the adding the image corresponding to the new target object among the target objects to the first database, the method further comprises:
compressing the image corresponding to the new target object to obtain third compressed data; and
sending the third compressed data to the second device.
7. The method according to claim 5, characterized in that, after the adding the image corresponding to the new target object among the target objects to the first database, the method further comprises:
compressing the images of the objects stored in the first database to obtain fourth compressed data; and
sending the fourth compressed data to the second device.
8. The method according to claim 7, characterized in that, after the identifying the target object in the first video image that meets the preset condition, the method further comprises:
deleting, from the first database, the images of objects that do not meet the preset condition.
9. The method according to claim 7, characterized in that, after the acquiring a first video image, the method further comprises:
when the sharpness of the image, in the first video image, of a first object in the target area is better than the sharpness of the image of the first object stored in the first database, replacing the image of the first object stored in the first database with the image of the first object in the first video image.
10. The method according to any one of claims 4 to 9, characterized in that:
the first video image is a video image, in a video file that the first device is compressing and transmitting in real time, whose frame number is greater than a preset frame number; and
before the acquiring a first video image, the method further comprises:
acquiring a second video image in the video file, wherein the frame number of the second video image in the video file is less than the preset frame number; and
identifying objects in the second video image that meet the preset condition, and storing them in the first database.
11. The method according to any one of claims 1 to 3, characterized in that the preset condition comprises: in the video file where the first video image is located, the number of video images comprising the object is greater than or equal to a preset number.
12. The method according to claim 11, characterized in that, before the acquiring a first video image, the method further comprises:
identifying objects that meet the preset condition in all video images of the video file; and
storing the images of the objects that meet the preset condition in the first database.
13. The method according to claim 11 or 12, characterized in that the image of the object stored in the first database comprises: boundary pixel positions of the object and the frame number, in the video file, of the video image comprising the object.
14. A video image processing method, applied to a second device, characterized in that the method comprises:
receiving second compressed data sent by a first device, wherein the second compressed data is obtained by compressing an area of a first video image other than a target area;
decompressing the second compressed data to obtain a third video image, wherein the third video image comprises an image corresponding to the area of the first video image other than the target area;
acquiring an image corresponding to the target area from a second database of the second device; and
determining the first video image according to the third video image and the image corresponding to the target area.
15. The method according to claim 14, characterized in that the method further comprises:
receiving first compressed data sent by the first device; and
decompressing the first compressed data to obtain an image set corresponding to objects that meet a preset condition, and storing the image set in the second database, wherein the image set comprises the image corresponding to the target area.
16. The method according to claim 14 or 15, characterized in that the method further comprises:
receiving marking information of the target area sent by the first device, wherein the marking information comprises: position information of the target area in the first video image, and at least one of identification information, in a first database of the first device, of the object comprised in the target area, or transformation information, wherein the transformation information is used to indicate a difference between the image, in the first database, of the object in the target area and the first video image.
17. The method according to claim 16, characterized in that the determining the first video image according to the third video image and the image corresponding to the target area comprises:
stitching the image corresponding to the target area and the third video image according to the marking information of the target area to obtain the first video image.
18. The method according to any one of claims 14 to 17, characterized in that, after the determining the first video image, the method further comprises:
receiving third compressed data sent by the first device; and
decompressing the third compressed data to obtain an image of a new target object, and storing the image in the second database.
19. The method according to any one of claims 14 to 17, characterized in that, after the determining the first video image, the method further comprises:
receiving fourth compressed data sent by the first device;
decompressing the fourth compressed data to obtain an updated image set corresponding to the objects that meet the preset condition; and
updating the second database based on the updated image set corresponding to the objects that meet the preset condition.
20. The method according to claim 15, characterized in that:
the preset condition comprises: among N video images preceding the first video image, the number of video images comprising the object is greater than or equal to M, wherein M and N are both positive integers, N > 1, and M < N; or
the preset condition comprises: in the video file where the first video image is located, the number of video images comprising the object is greater than or equal to a preset number.
21. A video image processing apparatus, characterized in that it comprises:
an acquiring module, configured to acquire a first video image;
a first determining module, configured to determine a target area in the first video image, wherein the target area comprises an image of an object that meets a preset condition and is stored in a first database of a first device;
a compression module, configured to compress an area of the first video image other than the target area to obtain second compressed data; and
a sending module, configured to send the second compressed data to a second device, wherein a second database of the second device already stores the image of the object that meets the preset condition.
22. The apparatus according to claim 21, characterized in that:
the compression module is further configured to compress the image of the object stored in the first database to obtain first compressed data; and
the sending module is further configured to send the first compressed data to the second device, wherein the first compressed data is used by the second device to determine the second database.
23. The apparatus according to claim 21 or 22, characterized in that the sending module is specifically configured to send the second compressed data and marking information of the target area to the second device, wherein the marking information comprises: position information of the target area in the first video image, and at least one of identification information, in the first database, of the image of the object comprised in the target area, or transformation information, wherein the transformation information is used to indicate a difference between the image, in the first database, of the object in the target area and the first video image.
24. The apparatus according to any one of claims 21 to 23, characterized in that the preset condition comprises: among N video images preceding the first video image, the number of video images comprising the object is greater than or equal to M, wherein M and N are both positive integers, N > 1, and M < N.
25. The apparatus according to claim 24, characterized in that it further comprises:
a second determining module, configured to identify a target object in the first video image that meets the preset condition; and
a storage management module, configured to add an image corresponding to a new target object among the target objects to the first database, wherein the new target object is an object that is not stored in the first database, and the first database is stored in a storage module.
26. The apparatus according to claim 25, characterized in that:
the compression module is further configured to compress the image corresponding to the new target object to obtain third compressed data; and
the sending module is further configured to send the third compressed data to the second device.
27. The apparatus according to claim 25, characterized in that, after the storage management module adds the image corresponding to the new target object among the target objects to the first database:
the compression module is further configured to compress the images of the objects stored in the first database to obtain fourth compressed data; and
the sending module is further configured to send the fourth compressed data to the second device.
28. The apparatus according to claim 27, characterized in that the storage management module is further configured to delete, from the first database, the images of objects that do not meet the preset condition.
29. The apparatus according to claim 27, characterized in that the storage management module is further configured to: when the sharpness of the image, in the first video image, of a first object in the target area is better than the sharpness of the image of the first object stored in the first database, replace the image of the first object stored in the first database with the image of the first object in the first video image.
30. The apparatus according to any one of claims 24 to 29, characterized in that:
the first video image is a video image, in a video file that the first device is compressing and transmitting in real time, whose frame number is greater than a preset frame number;
the acquiring module is further configured to acquire a second video image in the video file, wherein the frame number of the second video image in the video file is less than the preset frame number;
the second determining module is further configured to identify objects in the second video image that meet the preset condition; and
the storage management module is further configured to store the objects of the second video image that meet the preset condition in the first database.
31. The apparatus according to any one of claims 21 to 23, characterized in that the preset condition comprises: in the video file where the first video image is located, the number of video images comprising the object is greater than or equal to a preset number.
32. The apparatus according to claim 31, characterized in that it further comprises:
a third determining module, configured to identify objects that meet the preset condition in all video images of the video file; and
a storage management module, configured to store the images of the objects that meet the preset condition in the first database.
33. The apparatus according to claim 31 or 32, characterized in that the image of the object stored in the first database comprises: boundary pixel positions of the object and the frame number, in the video file, of the video image comprising the object.
34. A video image processing apparatus, characterized in that it comprises:
a receiving module, configured to receive second compressed data sent by a first device, wherein the second compressed data is obtained by compressing an area of a first video image other than a target area;
a decompression module, configured to decompress the second compressed data to obtain a third video image, wherein the third video image comprises an image corresponding to the area of the first video image other than the target area;
an acquiring module, configured to acquire an image corresponding to the target area from a second database of a second device; and
a determining module, configured to determine the first video image according to the third video image and the image corresponding to the target area.
35. The apparatus according to claim 34, characterized in that it further comprises a storage management module, wherein:
the receiving module is further configured to receive first compressed data sent by the first device;
the decompression module is further configured to decompress the first compressed data to obtain an image set corresponding to objects that meet a preset condition, wherein the image set comprises the image corresponding to the target area; and
the storage management module is configured to store the image set in the second database.
36. The apparatus according to claim 34 or 35, characterized in that the receiving module is further configured to receive marking information of the target area sent by the first device, wherein the marking information comprises: position information of the target area in the first video image, and at least one of identification information, in a first database of the first device, of the object comprised in the target area, or transformation information, wherein the transformation information is used to indicate a difference between the image, in the first database, of the object in the target area and the first video image.
37. The apparatus according to claim 36, characterized in that the determining module is specifically configured to stitch the image corresponding to the target area and the third video image according to the marking information of the target area to obtain the first video image.
38. The apparatus according to any one of claims 34 to 37, characterized in that:
the receiving module is further configured to receive third compressed data sent by the first device;
the decompression module is further configured to decompress the third compressed data to obtain an image of a new target object; and
the storage management module is further configured to add the image of the new target object to the second database.
39. The apparatus according to any one of claims 34 to 37, characterized in that:
the receiving module is further configured to receive fourth compressed data sent by the first device;
the decompression module is further configured to decompress the fourth compressed data to obtain an updated image set corresponding to the objects that meet the preset condition; and
the storage management module is further configured to update the second database based on the updated image set corresponding to the objects that meet the preset condition.
40. The apparatus according to claim 35, characterized in that:
the preset condition comprises: among N video images preceding the first video image, the number of video images comprising the object is greater than or equal to M, wherein M and N are both positive integers, N > 1, and M < N; or
the preset condition comprises: in the video file where the first video image is located, the number of video images comprising the object is greater than or equal to a preset number.
41. A video image processing apparatus, characterized in that it comprises: a processor and a transmission interface, wherein
the apparatus communicates with other apparatuses through the transmission interface; and
the processor is configured to read software instructions stored in a memory to implement the method according to any one of claims 1 to 13 or 14 to 20.
42. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when executed by a computer or a processor, cause the computer or the processor to implement the method according to any one of claims 1 to 13 or 14 to 20.
43. A computer program product, characterized in that the computer program product contains instructions which, when run on a computer or a processor, cause the computer or the processor to implement the method according to any one of claims 1 to 13 or 14 to 20.
PCT/CN2020/092377 2020-05-26 2020-05-26 Video image processing method and device WO2021237464A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/092377 WO2021237464A1 (en) 2020-05-26 2020-05-26 Video image processing method and device
CN202080101403.9A CN115699725A (en) 2020-05-26 2020-05-26 Video image processing method and device

Publications (1)

Publication Number Publication Date
WO2021237464A1 (en) 2021-12-02

Family

ID=78745214

Country Status (2)

Country Link
CN (1) CN115699725A (en)
WO (1) WO2021237464A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070030896A1 (en) * 2001-12-20 2007-02-08 Dorin Comaniciu Real-time video object generation for smart cameras
CN103475882A (en) * 2013-09-13 2013-12-25 北京大学 Surveillance video encoding and recognizing method and surveillance video encoding and recognizing system
CN103581603A (en) * 2012-07-24 2014-02-12 联想(北京)有限公司 Multimedia data transmission method and electronic equipment
CN108282674A (en) * 2018-02-05 2018-07-13 天地融科技股份有限公司 A kind of video transmission method, terminal and system
CN109783680A (en) * 2019-01-16 2019-05-21 北京旷视科技有限公司 Image method for pushing, image acquiring method, device and image processing system
CN109831638A (en) * 2019-01-23 2019-05-31 广州视源电子科技股份有限公司 Video image transmission method, device, interactive intelligent tablet computer and storage medium

Also Published As

Publication number Publication date
CN115699725A (en) 2023-02-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20937889; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20937889; Country of ref document: EP; Kind code of ref document: A1)