WO2019219065A1 - Method and apparatus for video analysis - Google Patents

Method and apparatus for video analysis

Info

Publication number
WO2019219065A1
WO2019219065A1 (PCT/CN2019/087288)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
resolution
image frame
original resolution
Prior art date
Application number
PCT/CN2019/087288
Other languages
English (en)
French (fr)
Inventor
冯仁光
Original Assignee
Hangzhou Hikvision Digital Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd.
Publication of WO2019219065A1

Classifications

    • G06V 20/40 - Scenes; scene-specific elements in video content
    • G06V 20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440281 - Reformatting by altering the temporal resolution, e.g. by frame skipping
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of video surveillance technologies, and in particular, to a method and apparatus for video analysis.
  • In the related art, intelligent analysis of a video may proceed as follows: the original video is hardware-decoded to obtain YUV-format frames (YUV is a color image data encoding method) at the original resolution, and the original-resolution YUV frames are then encoded into a low-resolution video. The low-resolution video is then software-decoded by a CPU (Central Processing Unit) to obtain low-resolution YUV-format frames, and video analysis is performed based on those low-resolution frames.
  • Embodiments of the present application provide a method and apparatus for video analysis.
  • the technical solution is as follows:
  • a method of video analysis, comprising: acquiring video data to be analyzed; performing hardware decoding on the video data to obtain an image frame of an original resolution; performing hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and performing video analysis processing based on the image frame of the target resolution.
  • performing video analysis processing based on the image frame of the target resolution includes:
  • Video analysis processing is performed based on the image frame of the target resolution and the image frame of the original resolution.
  • performing video analysis processing based on the image frame of the target resolution and the image frame of the original resolution includes:
  • a second region image that matches the target image is intercepted and displayed.
  • determining, according to the target resolution and the original resolution, the second location information in the image frame of the original resolution that corresponds to the first location information in the image frame of the target resolution includes:
  • a second region image that matches the target image is determined based on the relative position.
  • the intercepting and displaying the second area image that matches the target image in the image frame of the original resolution based on the second location information includes:
  • the second region image is intercepted and displayed.
  • the determining an information integrity score of the second area image includes:
  • an apparatus for video analysis comprising:
  • a first processing module configured to perform hardware decoding on the video data to obtain an image frame of an original resolution;
  • a second processing module configured to perform hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution;
  • an analysis module configured to perform video analysis processing on the image frame based on the target resolution.
  • the analyzing module is configured to:
  • Video analysis processing is performed based on the image frame of the target resolution and the image frame of the original resolution.
  • the analyzing module is configured to:
  • a second region image that matches the target image is intercepted and displayed.
  • the analyzing module is configured to:
  • a second region image that matches the target image is determined based on the relative position.
  • the analyzing module is configured to:
  • the second region image is intercepted and displayed.
  • the analyzing module is configured to:
  • an electronic device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method of video analysis described in the first aspect above.
  • a fourth aspect provides a computer readable storage medium, where the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method of video analysis described in the first aspect above.
  • In the embodiments of the present application, video data to be analyzed is acquired; the video data is hardware-decoded to obtain an image frame of the original resolution; the image frame of the original resolution is hardware-downsampled to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • the processing steps can be simplified, thereby improving the processing efficiency of intelligent analysis of the video.
  • FIG. 1 is a flowchart of a method for video analysis provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for video analysis provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for video analysis provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an interface of a video analysis method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for video analysis according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the embodiment of the present application provides a method for video analysis, which may be implemented by an electronic device.
  • the electronic device can be a terminal or a server.
  • the terminal can include components such as a processor, a memory, and the like.
  • The processor may be a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and may be used to hardware-decode video data, hardware-downsample image frames of the original resolution, perform video analysis processing based on image frames of the target resolution, and the like.
  • the memory may be a RAM (Random Access Memory), a Flash (flash memory), etc., and may be used to store received data, data required for processing, data generated during processing, and the like, such as video data, Image frames of the original resolution, image frames of the target resolution, and the like.
  • the terminal may also include a screen, a transceiver, an image detecting component, an audio output component, an audio input component, and the like.
  • the screen can be used to display the captured area image and so on.
  • the transceiver can be used for data transmission with other devices, and can include an antenna, a matching circuit, a modem, and the like.
  • the image detecting unit may be a camera or the like.
  • the audio output unit can be a speaker, a headphone, or the like.
  • the audio input component can be a microphone or the like.
  • the server can include components such as a processor, a memory, a transceiver, and the like.
  • The processor may be a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and may be used to hardware-decode video data, hardware-downsample image frames of the original resolution, perform video analysis processing based on image frames of the target resolution, and the like.
  • the memory may be a RAM (Random Access Memory), a Flash (flash memory), etc., and may be used to store received data, data required for processing, data generated during processing, and the like, such as video data, Image frames of the original resolution, image frames of the target resolution, and the like.
  • the transceiver can be used for data transmission with a terminal or other server (such as a positioning server), for example, sending a second area image to the terminal, and the transceiver can include an antenna, a matching circuit, a modem, and the like.
  • the processing flow of the method may include the following steps:
  • Step 101: video data to be analyzed is acquired.
  • the video data can be acquired first.
  • The acquired video data may be a segment of a surveillance video, for example when the user wants to find a certain shooting target in the surveillance video; it may also be a clip from a movie, for example when the user wants to apply special-effects processing to a clip of the movie. The video data may also be acquired in other ways, which is not limited in this application.
  • Step 102: the video data is hardware-decoded to obtain an image frame of the original resolution.
  • Decoding is a process of restoring a digital code to the content it represents, or of converting an electrical pulse signal into the information or data it represents.
  • Hardware decoding is a way of decoding an encoded video stream using hardware such as a GPU (Graphics Processing Unit).
  • the video data is hardware-decoded by the GPU to obtain an image frame of the original resolution.
  • the method used for hardware decoding is an existing hardware decoding method, which is not described herein.
  • Step 103: the image frame of the original resolution is hardware-downsampled to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution.
  • downsampling is a processing method that reduces the number of sampling points.
  • Downsampling reduces image quality, but processing a downsampled image requires less computation. For example, if the downsampling coefficient of an image is k, one point is taken every k points in each row and each column of the original image to form the new image.
  • Hardware downsampling is a downsampling method implemented in hardware.
  • In this embodiment, the image frame of the original resolution may be downsampled. To improve processing efficiency and reduce the burden on the CPU, the image frame of the original resolution can be downsampled directly in hardware according to a preset downsampling rate to obtain an image frame of a preset target resolution, where the target resolution must be smaller than the original resolution. In this way, the original-resolution image frame does not need to be copied to the CPU, which alleviates the inefficiency caused by copying the data multiple times and reduces the processing load on the CPU.
  • The downsampling may be proportional, i.e., the obtained image frame of the target resolution has the same aspect ratio as the image frame of the original resolution; it may also be non-proportional, i.e., the aspect ratios of the two image frames differ. This is not limited in this application.
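The decimation rule described above (one point kept every k points in each row and column) can be sketched in software as follows. This is only an illustration: the application performs the operation in hardware, and the function name and the list-of-lists frame representation are assumptions made for the example.

```python
def downsample(frame, kx, ky=None):
    """Keep one sample every kx columns and every ky rows.

    `frame` is a 2D list of pixel values. Using ky == kx gives
    proportional downsampling (aspect ratio preserved); ky != kx gives
    non-proportional downsampling, both of which the application allows.
    """
    if ky is None:
        ky = kx
    return [row[::kx] for row in frame[::ky]]

# A 4x4 frame downsampled with k = 2 becomes a 2x2 frame.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
small = downsample(frame, 2)
```

Passing different `kx` and `ky` values corresponds to the non-proportional scheme mentioned above.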
  • Step 104: video analysis processing is performed based on the image frame of the target resolution.
  • different video analysis processes may be performed based on the image frames of the target resolution and the image frames of the original resolution, respectively.
  • An algorithm module that requires higher speed and efficiency can operate on the image frame of the target resolution, while an algorithm module that requires higher image quality can operate on the image frame of the original resolution. The flow of this solution can be as shown in FIG. 2.
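The division of labor described above can be sketched with hypothetical placeholder modules standing in for the real analysis algorithms; the module names and the returned fields are assumptions made for illustration.

```python
def speed_oriented_module(frame):
    # Placeholder for a speed/efficiency-oriented algorithm: it receives
    # the small target-resolution frame, so its per-frame cost is low.
    h, w = len(frame), len(frame[0])
    return {"resolution": (w, h), "pixels_processed": w * h}

def quality_oriented_module(frame):
    # Placeholder for a quality-oriented algorithm: it receives the
    # original-resolution frame and pays the full per-frame cost.
    h, w = len(frame), len(frame[0])
    return {"resolution": (w, h), "pixels_processed": w * h}

def analyze_frame_pair(original_frame, target_frame):
    """Route each resolution to the module class suited to it."""
    return (speed_oriented_module(target_frame),
            quality_oriented_module(original_frame))

original = [[0] * 720 for _ in range(960)]  # 720x960 original resolution
target = [[0] * 180 for _ in range(240)]    # 180x240 target resolution
fast, fine = analyze_frame_pair(original, target)
```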
  • Taking an algorithm that recognizes and captures faces as an example, a human face can be recognized based on the image frame of the target resolution and the image frame of the original resolution, and the recognized face image can be intercepted and displayed, as shown in FIG. 3.
  • The processing of the above steps may be: determining, based on an acquired target image, first position information of the first area image that matches the target image in the image frame of the target resolution; determining, based on the target resolution and the original resolution, second position information in the image frame of the original resolution that corresponds to the first position information in the image frame of the target resolution; and, based on the second position information, intercepting and displaying the second area image that matches the target image in the image frame of the original resolution.
  • The location information may be the coordinate information of the four vertices of the area image.
  • Specifically, an image of the shooting target can be input to the electronic device; the target image is matched against the image frame of the target resolution, the area image that matches the target image (i.e., the first area image) is determined, and the position information of the first area image (i.e., the first position information) is determined.
  • the second location information in the image frame of the original resolution corresponding to the first location information is obtained by scaling.
  • A proportional downsampling scheme may be used. If the ratio is 1:1, i.e., the image frame of the original resolution is the same size as the image frame of the target resolution, the first position information and the second position information are identical; the area image corresponding to the second position information (i.e., the second area image) can therefore be determined directly in the image frame of the original resolution. This second area image is the area image matching the target image, and it is intercepted and displayed to the user. If proportional downsampling is used but the ratio is not 1:1, i.e., the sizes of the two image frames differ, the relative position of the first position information within the image frame of the target resolution may be determined first, and then, in the image frame of the original resolution, the second area image matching the target image is determined according to that relative position.
  • For example, suppose the image frame of the target resolution is 180×240 pixels, the image frame of the original resolution is 720×960 pixels, and the obtained first position information is (30, 40), (120, 40), (30, 180), (120, 180). The relative positions of the first position information within the target-resolution image frame are (1/6, 1/6), (2/3, 1/6), (1/6, 3/4), (2/3, 3/4). Multiplying these ratios by the size of the original-resolution image frame gives the second position information of the second region image: (120, 160), (480, 160), (120, 720), (480, 720). Based on the obtained second position information, the second region image is determined, then intercepted and displayed to the user.
  • The above conversion may also use a non-proportional downsampling scheme; the method of determining the second region image from the position information is then the same as in the proportional case with a ratio other than 1:1: the relative position of the first position information within the image frame of the target resolution is determined first, and then, in the image frame of the original resolution, the second area image matching the target image is determined according to that relative position.
  • For example, suppose the image frame of the target resolution is 180×240 pixels, the image frame of the original resolution is 720×720 pixels, and the obtained first position information of the first area image is (30, 40), (120, 40), (30, 180), (120, 180). The relative positions (i.e., the relative position information) within the target-resolution image frame are (1/6, 1/6), (2/3, 1/6), (1/6, 3/4), (2/3, 3/4). Multiplying these ratios by the size of the original-resolution image frame gives the second position information of the second area image: (120, 120), (480, 120), (120, 540), (480, 540). Based on the obtained second position information, the second region image is determined, then intercepted and displayed to the user.
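Both worked examples above follow the same relative-position rule, which can be sketched as follows. The function name is an assumption, and integer arithmetic is used because the example coordinates divide evenly.

```python
def map_positions(points, target_size, original_size):
    """Map vertex coordinates found in the target-resolution frame to the
    original-resolution frame via their relative position. The same rule
    works for proportional and non-proportional downsampling."""
    tw, th = target_size
    ow, oh = original_size
    return [(x * ow // tw, y * oh // th) for x, y in points]

first_position = [(30, 40), (120, 40), (30, 180), (120, 180)]

# Proportional case from the text: 180x240 -> 720x960
equal = map_positions(first_position, (180, 240), (720, 960))

# Non-proportional case from the text: 180x240 -> 720x720
unequal = map_positions(first_position, (180, 240), (720, 720))
```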
  • The manner of calculating the ratio information illustrated above is only one feasible approach; other calculations may also be used. For example, the ratio between the size of the original-resolution image frame and the size of the target-resolution image frame may be calculated first, and the position information of the second region image then calculated according to that ratio.
  • The matching of the target image with the image frame of the target resolution may proceed as follows. The target image is compared with a preset first area image of the target-resolution image frame whose size is exactly the same as that of the target image; a first matching degree between the two is calculated and stored together with the coordinates of the four vertices of the preset first area image. The abscissas of the four vertices are then all increased by the same preset first increment to obtain a new preset first area image, which is matched with the target image to obtain a second matching degree. The second matching degree is compared with the stored first matching degree; the larger of the two, together with the vertex coordinates of its preset first area image, is stored, and the smaller matching degree and its corresponding coordinates are deleted. The abscissas are increased by the first increment again, a third matching degree is obtained and compared with the stored matching degree in the same way, and so on, until the abscissas of the vertices of the preset first area image reach the maximum (or minimum) value, i.e., the window has reached the edge of the target-resolution image frame. The ordinates of the four vertices are then increased by a preset second increment to obtain a new preset first area image, which is matched with the target image as before; the abscissas are then all decreased step by step by the same preset first increment, each new matching degree being compared with the stored matching degree, the larger one kept and the smaller one deleted, until the whole frame has been traversed. In this way, the maximum matching degree between the target image and all preset first area images, and the vertex coordinates of the corresponding preset first area image, are obtained; these coordinates are the first location information of the first area image.
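The exhaustive window search described above can be sketched as follows. The sum-of-absolute-differences score and the simple raster scan order are simplifying assumptions: the application does not fix a particular matching measure, and its traversal alternates direction row by row.

```python
def match_score(window, target):
    """Similarity score: negative sum of absolute differences
    (higher means a better match)."""
    return -sum(abs(w - t)
                for wrow, trow in zip(window, target)
                for w, t in zip(wrow, trow))

def find_best_match(frame, target, step=1):
    """Slide a target-sized window over the frame, keeping only the best
    matching degree and its top-left corner, as the text keeps only the
    larger of each compared pair of matching degrees."""
    th, tw = len(target), len(target[0])
    best = None
    for y in range(0, len(frame) - th + 1, step):
        for x in range(0, len(frame[0]) - tw + 1, step):
            window = [row[x:x + tw] for row in frame[y:y + th]]
            s = match_score(window, target)
            if best is None or s > best[0]:
                best = (s, (x, y))
    return best  # (score, top-left corner of the first area image)

frame = [[0] * 6 for _ in range(6)]
frame[2][3], frame[2][4], frame[3][3], frame[3][4] = 9, 8, 7, 6
target = [[9, 8], [7, 6]]
score, (x, y) = find_best_match(frame, target)
```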
  • The matching of the target image with the image frame of the target resolution, the determination of the first region image that matches the target image, and the determination of its position information may also be implemented with an image recognition model. The model is trained on samples to obtain a trained image recognition model; the target image and the image frame of the target resolution are input into the model, and the first region image of the target-resolution image frame that matches the target image, together with its position information, is obtained. Determining the position information of the first region image with an image recognition model is faster and more efficient.
  • Other implementations of matching the target image with the image frame of the target resolution, determining the first region image that matches the target image, and determining the location information of the first region image are not enumerated in this application: any method that accomplishes these steps can be used, and which method is used is not limited in this application.
  • In a possible implementation, information integrity scoring may be performed on the second region image before it is intercepted, as shown in FIG. 3. The corresponding processing may be as follows: based on the second location information, the second region image that matches the target image is determined in the image frame of the original resolution; an information integrity score of the second region image is determined; and when the information integrity score is greater than a preset score threshold, the second area image is intercepted and displayed.
  • The position information of the first area image is the coordinate information of the four vertices of the first area image; the four vertices of the second area image are determined in the original-resolution image frame according to these four coordinates, and the second area image is then determined from those four vertices.
  • Information integrity scoring is performed on each of the at least one second region image to obtain its information integrity score; the higher the score, the more comprehensive the display information of the corresponding second area image. The information integrity score is therefore compared with a preset score threshold: if the score of a second area image is greater than the threshold, the second area image is intercepted according to its location information and displayed to the user, as shown in FIG. 4. In this way, the second area image displayed to the user is more complete and contains more useful information, so that the information acquired by the user is more comprehensive.
  • The information integrity scoring step may be as follows: the information integrity score of the second region image is determined based on the sharpness of the region image corresponding to the location information, the completeness of the shooting target, and the shooting angle of the shooting target.
  • one or more of the sharpness of the regional image, the completeness of the shooting target, and the shooting angle of the shooting target may be used as the basis of the information integrity scoring process when performing the information integrity scoring process on the regional image.
  • Sharpness refers to the clarity of each detail and its boundary in the image. The higher the sharpness, the higher the sharpness component of the information integrity score; the lower the sharpness, the lower that component.
  • The completeness of the shooting target refers to how completely the area image contains each component of the shooting target. For example, if the shooting target is a dog, it is determined whether the area image contains all of the dog's body parts, such as the head, ears, limbs, and tail; the more body parts it contains, the more complete the shooting target in the area image. If the shooting target is a human face, it is determined whether the area image contains all the parts of the face, such as the hair, ears, both eyes, mouth, and chin; the more parts it contains, the higher the completeness. The higher the completeness of the target, the higher the target-completeness component of the information integrity score; the lower the completeness, the lower that component.
  • The shooting angle of the shooting target is used as a scoring basis when, for example, the shooting target is a human face. When the captured face is a frontal face, the information the user can obtain is the most comprehensive; the more the face is turned to the side, the less information the user can obtain. Therefore, it can be set that when the captured face is frontal, the shooting angle is 0 degrees and the shooting-angle component of the information integrity score is the highest; the larger the turning angle, the larger the shooting angle and the lower the shooting-angle component of the score.
  • the above-mentioned definition, completeness of the shooting target, and shooting angle of the shooting target are only examples of scoring bases given in the present application. In practical applications, other information may also be used to perform information integrity scoring on the region image, such as the contrast of the region image, which is not limited in this application.
  • information integrity scoring is performed on the region image according to the preset scoring basis, and the information integrity score of the region image is finally obtained.
  • the video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of the original resolution; hardware downsampling is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • the processing steps can be simplified, thereby improving the processing efficiency of intelligent analysis of the video.
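The acquire, hardware-decode, hardware-downsample, analyze flow summarized above can be sketched as follows. This is a minimal illustration only: `hw_decode` and `hw_downsample` are hypothetical stand-ins for platform GPU primitives (the application names no specific API), and frames are represented as plain 2-D lists.

```python
# Hypothetical pipeline sketch: hw_decode / hw_downsample are
# placeholders for hardware (GPU) primitives, not real APIs.

def hw_decode(video_data):
    # Stand-in for hardware decoding: treat each element of
    # video_data as an already decoded frame (a 2-D list of pixels).
    return list(video_data)

def hw_downsample(frame, k):
    # Keep every k-th pixel in each row and column, as if done
    # directly in hardware without copying the frame to the CPU.
    return [row[::k] for row in frame[::k]]

def analyze(frame):
    # Placeholder analysis: report the frame size actually processed.
    return (len(frame), len(frame[0]))

def video_analysis(video_data, k=4):
    results = []
    for frame in hw_decode(video_data):    # original resolution
        small = hw_downsample(frame, k)    # target resolution (smaller)
        results.append(analyze(small))     # analyze the small frame
    return results

# One 8x8 "frame"; with k=4 the analyzed frame is 2x2.
frames = [[[0] * 8 for _ in range(8)]]
print(video_analysis(frames))  # [(2, 2)]
```

The point of the sketch is that the analysis step only ever touches the smaller, target-resolution frame, which is where the efficiency gain comes from.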
  • the embodiment of the present application further provides a device for video analysis, which may be the electronic device in the above embodiment.
  • the device includes: an obtaining module 510, a first processing module 520, a second processing module 530, and an analysis module 540.
  • the obtaining module 510 is configured to acquire video data to be analyzed.
  • the first processing module 520 is configured to perform hardware decoding on the video data to obtain an image frame of original resolution
  • the second processing module 530 is configured to perform hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution;
  • the analysis module 540 is configured to perform video analysis processing based on the image frame of the target resolution.
  • the analyzing module 540 is configured to:
  • Video analysis processing is performed based on the image frame of the target resolution and the image frame of the original resolution.
  • the analyzing module 540 is configured to:
  • based on an acquired target image, determine, in the image frame of the target resolution, first position information of a first region image that matches the target image;
  • based on the target resolution and the original resolution, determine second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution;
  • based on the second position information, intercept and display, in the image frame of the original resolution, a second region image that matches the target image.
  • the analyzing module 540 is configured to:
  • determine the relative position of the first position information in the image frame of the target resolution;
  • in the image frame of the original resolution, determine, according to the relative position, a second region image that matches the target image.
  • the analyzing module 540 is configured to:
  • based on the second position information, determine, in the image frame of the original resolution, a second region image that matches the target image;
  • determine the information integrity score of the second region image;
  • when the information integrity score is greater than a preset score threshold, intercept and display the second region image.
  • the analyzing module 540 is configured to:
  • determine the information integrity score of the second region image based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
  • the video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of the original resolution; hardware downsampling is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • it should be noted that the device for video analysis provided by the foregoing embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the electronic device is divided into different functional modules to complete all or part of the functions described above.
  • the device for video analysis provided by the foregoing embodiment belongs to the same concept as the method embodiments of video analysis; its specific implementation process is described in detail in the method embodiments, and details are not repeated here.
  • FIG. 6 is a structural block diagram of a terminal according to an embodiment of the present application.
  • the terminal 600 can be a portable mobile terminal, such as a smart phone or a tablet computer.
  • Terminal 600 may also be referred to as a user device, a portable terminal, or the like.
  • the terminal 600 includes a processor 601 and a memory 602.
  • Processor 601 can include one or more processing cores, such as a 4-core processor, a 6-core processor, and the like.
  • the processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array).
  • the processor 601 may also include a main processor and a coprocessor.
  • the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state.
  • the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display.
  • the processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
  • Memory 602 can include one or more computer readable storage media, which can be tangible and non-transitory. Memory 602 can also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices and flash storage devices. In some embodiments, the non-transitory computer readable storage medium in memory 602 is used to store at least one instruction, the at least one instruction being executed by processor 601 to implement the method of video analysis provided in this application.
  • the terminal 600 optionally further includes: a peripheral device interface 603 and at least one peripheral device.
  • the peripheral device includes at least one of a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608, and a power source 609.
  • the peripheral device interface 603 can be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 601 and the memory 602.
  • In some embodiments, the processor 601, the memory 602, and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the RF circuit 604 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
  • Radio frequency circuit 604 communicates with the communication network and other communication devices via electromagnetic signals.
  • the RF circuit 604 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • Radio frequency circuitry 604 can communicate with other terminals via at least one wireless communication protocol.
  • the wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 604 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the touch display 605 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • Touch display 605 also has the ability to collect touch signals on or above the surface of touch display 605.
  • the touch signal can be input to the processor 601 as a control signal for processing.
  • Touch display 605 is used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • In some embodiments, there may be one touch display screen 605, disposed on the front panel of the terminal 600; in other embodiments, there may be at least two touch display screens 605, respectively disposed on different surfaces of the terminal 600 or in a folded design.
  • In still other embodiments, the touch display 605 can be a flexible display disposed on a curved surface or a folded surface of the terminal 600. The touch display screen 605 can even be set in a non-rectangular irregular shape, that is, an irregularly shaped screen.
  • the touch display screen 605 can be prepared by using an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • Camera component 606 is used to capture images or video.
  • camera assembly 606 includes a front camera and a rear camera.
  • Generally, the front camera is used for video calls or selfies,
  • the rear camera is used for photo or video capture.
  • In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, and a wide-angle camera, so that the main camera and the depth-of-field camera are fused to realize a background blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 606 can also include a flash.
  • the flash can be a monochrome temperature flash or a two-color temperature flash.
  • the two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.
  • Audio circuit 607 is used to provide an audio interface between the user and terminal 600.
  • the audio circuit 607 can include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals to be input to the processor 601 for processing, or input to the radio frequency circuit 604 for voice communication.
  • For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively disposed at different parts of the terminal 600.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is then used to convert electrical signals from the processor 601 or the RF circuit 604 into sound waves.
  • the speaker can be either a conventional thin-film speaker or a piezoelectric ceramic speaker.
  • audio circuit 607 can also include a headphone jack.
  • the location component 608 is used to locate the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service).
  • the positioning component 608 can be a positioning component based on a US-based GPS (Global Positioning System), a Chinese Beidou system, or a Russian Galileo system.
  • Power source 609 is used to power various components in terminal 600.
  • the power source 609 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery that is charged by a wired line
  • a wireless rechargeable battery is a battery that is charged by a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • terminal 600 also includes one or more sensors 610.
  • the one or more sensors 610 include, but are not limited to, an acceleration sensor 611, a gyro sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
  • the acceleration sensor 611 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the terminal 600.
  • the acceleration sensor 611 can be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 601 can control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 611.
  • the acceleration sensor 611 can also be used for the acquisition of game or user motion data.
  • the gyro sensor 612 can detect the body direction and the rotation angle of the terminal 600, and the gyro sensor 612 can cooperate with the acceleration sensor 611 to collect the 3D motion of the user to the terminal 600. Based on the data collected by the gyro sensor 612, the processor 601 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.
  • the pressure sensor 613 may be disposed at a side border of the terminal 600 and/or a lower layer of the touch display screen 605.
  • When the pressure sensor 613 is disposed at the side frame of the terminal 600, a user's holding signal to the terminal 600 can be detected, and left/right hand recognition or a shortcut operation is performed according to the holding signal.
  • When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the operability controls on the UI interface can be controlled according to the user's pressure operation on the touch display screen 605.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 614 is configured to collect a fingerprint of the user to identify the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform related sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying and changing settings, and the like.
  • the fingerprint sensor 614 can be disposed on the front, back, or side of the terminal 600. When a physical button or a manufacturer logo is provided on the terminal 600, the fingerprint sensor 614 can be integrated with the physical button or the manufacturer logo.
  • Optical sensor 615 is used to collect ambient light intensity.
  • the processor 601 can control the display brightness of the touch display 605 according to the ambient light intensity acquired by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 605 is lowered.
  • the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
  • Proximity sensor 616, also referred to as a distance sensor, is typically disposed on the front side of terminal 600. Proximity sensor 616 is used to collect the distance between the user and the front side of terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front side of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the bright-screen state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front side of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the bright-screen state.
  • Those skilled in the art can understand that the structure shown in FIG. 6 does not constitute a limitation on the terminal 600, and the terminal may include more or fewer components than those illustrated, or combine some components, or adopt a different component arrangement.
  • In an exemplary embodiment, a computer readable storage medium is further provided, in which at least one instruction, at least one program, a code set, or an instruction set is stored; the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method of video analysis in the above embodiments.
  • the computer readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of the original resolution; hardware downsampling is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
  • the server 700 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction that is loaded and executed by the processor 701 to implement the following method steps of video analysis:
  • the video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of the original resolution; hardware downsampling is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
  • Video analysis processing is performed based on the image frame of the target resolution and the image frame of the original resolution.
  • the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
  • based on an acquired target image, in the image frame of the target resolution, first position information of a first region image that matches the target image is determined;
  • based on the target resolution and the original resolution, second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution is determined;
  • based on the second position information, in the image frame of the original resolution, a second region image that matches the target image is intercepted and displayed.
  • the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
  • the relative position of the first position information in the image frame of the target resolution is determined;
  • in the image frame of the original resolution, a second region image that matches the target image is determined based on the relative position.
  • the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
  • based on the second position information, in the image frame of the original resolution, a second region image that matches the target image is determined;
  • the information integrity score of the second region image is determined;
  • when the information integrity score is greater than a preset score threshold, the second region image is intercepted and displayed.
  • the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
  • the information integrity score of the second region image is determined based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
  • the video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of the original resolution; hardware downsampling is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution.
  • a person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application discloses a method and device for video analysis, belonging to the field of digital surveillance. The method includes: acquiring video data to be analyzed; performing hardware decoding on the video data to obtain an image frame of an original resolution; performing hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and performing video analysis processing based on the image frame of the target resolution. The present application can improve the processing efficiency of intelligent analysis of video.

Description

Method and device for video analysis
This application claims priority to Chinese Patent Application No. 201810473779.7, filed on May 17, 2018 and entitled "Method and device for video analysis", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of video surveillance, and in particular, to a method and device for video analysis.
Background
With the development of video surveillance technology, intelligent analysis of surveillance video has been widely applied, including motion detection, face analysis, packet loss detection, crowd density detection, and the like.
At present, a method of intelligent video analysis may be as follows: an original video is hardware-decoded to obtain YUV-format (a color image data encoding method) frames of the original resolution; then the original-resolution YUV frames are encoded into a low-resolution video, which is soft-decoded by a CPU (Central Processing Unit) to obtain low-resolution YUV frames; video analysis can then be performed based on the low-resolution YUV frames.
In the process of implementing the present application, the applicant found that the related art has at least the following problems:
The above video analysis process is relatively complex, resulting in low efficiency of intelligent analysis of the video.
Summary
Embodiments of the present application provide a method and device for video analysis. The technical solutions are as follows:
In a first aspect, a method of video analysis is provided, the method including:
acquiring video data to be analyzed;
performing hardware decoding on the video data to obtain an image frame of an original resolution;
performing hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and
performing video analysis processing based on the image frame of the target resolution.
Optionally, performing video analysis processing based on the image frame of the target resolution includes:
performing video analysis processing based on the image frame of the target resolution and the image frame of the original resolution.
Optionally, performing video analysis processing based on the image frame of the target resolution and the image frame of the original resolution includes:
based on an acquired target image, determining, in the image frame of the target resolution, first position information of a first region image that matches the target image;
based on the target resolution and the original resolution, determining second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution; and
based on the second position information, intercepting and displaying, in the image frame of the original resolution, a second region image that matches the target image.
Optionally, determining, based on the target resolution and the original resolution, the second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution includes:
determining the relative position of the first position information in the image frame of the target resolution; and
determining, in the image frame of the original resolution, a second region image that matches the target image according to the relative position.
Optionally, intercepting and displaying, based on the second position information, the second region image that matches the target image in the image frame of the original resolution includes:
based on the second position information, determining, in the image frame of the original resolution, a second region image that matches the target image;
determining an information integrity score of the second region image; and
when the information integrity score is greater than a preset score threshold, intercepting and displaying the second region image.
Optionally, determining the information integrity score of the second region image includes:
determining the information integrity score of the second region image based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
In a second aspect, a device for video analysis is provided, the device including:
an obtaining module, configured to acquire video data to be analyzed;
a first processing module, configured to perform hardware decoding on the video data to obtain an image frame of an original resolution;
a second processing module, configured to perform hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and
an analysis module, configured to perform video analysis processing based on the image frame of the target resolution.
Optionally, the analysis module is configured to:
perform video analysis processing based on the image frame of the target resolution and the image frame of the original resolution.
Optionally, the analysis module is configured to:
based on an acquired target image, determine, in the image frame of the target resolution, first position information of a first region image that matches the target image;
based on the target resolution and the original resolution, determine second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution; and
based on the second position information, intercept and display, in the image frame of the original resolution, a second region image that matches the target image.
Optionally, the analysis module is configured to:
determine the relative position of the first position information in the image frame of the target resolution; and
determine, in the image frame of the original resolution, a second region image that matches the target image according to the relative position.
Optionally, the analysis module is configured to:
based on the second position information, determine, in the image frame of the original resolution, a second region image that matches the target image;
determine an information integrity score of the second region image; and
when the information integrity score is greater than a preset score threshold, intercept and display the second region image.
Optionally, the analysis module is configured to:
determine the information integrity score of the second region image based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
In a third aspect, an electronic device is provided. The electronic device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method of video analysis according to the first aspect.
In a fourth aspect, a computer readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method of video analysis according to the first aspect.
The beneficial effects brought by the technical solutions provided by the embodiments of the present application include at least the following:
In the embodiments of the present application, video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of an original resolution; hardware downsampling processing is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution. In this way, the processing steps during video analysis can be simplified, so that the processing efficiency of intelligent analysis of the video can be improved. Moreover, CPU resources do not need to be occupied for decoding and downsampling, which reduces the resource occupation of the CPU.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method of video analysis provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method of video analysis provided by an embodiment of the present application;
FIG. 3 is a flowchart of a method of video analysis provided by an embodiment of the present application;
FIG. 4 is a schematic interface diagram of a method of video analysis provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for video analysis provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are further described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides a method of video analysis, which may be implemented by an electronic device. The electronic device may be a terminal or a server.
The terminal may include components such as a processor and a memory. The processor may be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like, and may be configured to perform processing such as hardware decoding of video data, hardware downsampling of image frames of the original resolution, and video analysis based on image frames of the target resolution. The memory may be a RAM (Random Access Memory), Flash, or the like, and may be configured to store received data, data required in processing, data generated in processing, and the like, such as video data, image frames of the original resolution, and image frames of the target resolution. The terminal may further include a screen, a transceiver, an image detection component, an audio output component, an audio input component, and the like. The screen may be used to display intercepted region images and the like. The transceiver may be used for data transmission with other devices and may include an antenna, a matching circuit, a modem, and the like. The image detection component may be a camera or the like. The audio output component may be a speaker, an earphone, or the like. The audio input component may be a microphone or the like.
The server may include components such as a processor, a memory, and a transceiver. The processor may be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like, and may be configured to perform processing such as hardware decoding of video data, hardware downsampling of image frames of the original resolution, and video analysis based on image frames of the target resolution. The memory may be a RAM (Random Access Memory), Flash, or the like, and may be configured to store received data, data required in processing, data generated in processing, and the like, such as video data, image frames of the original resolution, and image frames of the target resolution. The transceiver may be used for data transmission with a terminal or another server (such as a positioning server), for example, sending the second region image to the terminal; the transceiver may include an antenna, a matching circuit, a modem, and the like.
As shown in FIG. 1, the processing flow of the method may include the following steps:
In step 101, video data to be analyzed is acquired.
In implementation, when a user wants to perform video analysis on a piece of video data, the video data may be acquired first. The acquired video data may be a segment of a surveillance video, for example, when the user wants to search for a certain shooting target through the surveillance video. The acquired video data may also be a clip of a movie video, for example, when the user wants to apply special effects to a clip of a movie. Besides, the video data may also be acquired in other ways, which is not limited in this application.
In step 102, hardware decoding is performed on the video data to obtain an image frame of the original resolution.
Decoding is the process of restoring a digital code to the content it represents, or converting an electrical pulse signal into the information or data it represents, by a specific method. Hardware decoding is a decoding manner in which a video stream is decoded by hardware, such as a GPU (Graphics Processing Unit).
In implementation, after acquiring the video data, the electronic device performs hardware decoding on the video data through the GPU to obtain image frames of the original resolution. The method used for hardware decoding is an existing hardware decoding method, which is not described in detail here.
In step 103, hardware downsampling processing is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution.
Downsampling is a processing manner that reduces the number of sampling points. The quality of a downsampled image is reduced, but performing image processing on the downsampled image reduces the computation required. For example, for an image with a downsampling factor of k, one point is taken every k points in each row and each column of the original image to form a new image. Hardware downsampling is a downsampling manner implemented by hardware.
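The rule above (for a downsampling factor k, keep one point every k points in each row and each column) can be sketched as follows; representing the image as a list of pixel rows is an illustrative assumption.

```python
def downsample(image, k):
    # Keep one point every k points in each row and each column,
    # as described above for a downsampling factor k.
    return [row[::k] for row in image[::k]]

image = [
    [ 1,  2,  3,  4],
    [ 5,  6,  7,  8],
    [ 9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(downsample(image, 2))  # [[1, 3], [9, 11]]
```

A 4x4 image with k = 2 thus becomes a 2x2 image, reducing the pixel count (and the cost of any later per-pixel processing) by a factor of k squared.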
In implementation, after obtaining the image frames of the original resolution through hardware decoding, the electronic device may perform downsampling processing on them in order to reduce the computation in some subsequent processing. To improve processing efficiency and lighten the CPU's load, the image frames of the original resolution may be directly downsampled in hardware according to a preset downsampling rate, to obtain image frames of the preset target resolution, which is necessarily smaller than the original resolution. In this way, the image frames of the original resolution do not need to be copied to the CPU, which both alleviates the inefficiency caused by multiple data copies and reduces the processing burden on the CPU.
It should be noted that the above downsampling processing may be proportional downsampling, i.e., the obtained image frame of the target resolution has the same aspect ratio as the image frame of the original resolution; it may also be non-proportional downsampling, i.e., the two aspect ratios differ, which is not limited in this application.
In step 104, video analysis processing is performed based on the image frame of the target resolution.
Optionally, according to different video analysis algorithms, after the image frames of the target resolution are obtained as described above, different video analysis processing may be performed based on the image frames of the target resolution and the image frames of the original resolution respectively.
In implementation, since the target resolution is smaller than the original resolution, the definition of an image frame of the target resolution is lower than that of an image frame of the original resolution, but operations based on an image frame of the target resolution require less computation than operations based on an image frame of the original resolution. Therefore, algorithm modules with higher requirements on speed and efficiency can operate on the image frames of the target resolution, while algorithm modules with higher requirements on image quality can operate on the image frames of the original resolution. The flow of the solution may be as shown in FIG. 2.
Optionally, taking a face recognition and capture algorithm as an example, a face may be recognized according to the image frames of the target resolution and the image frames of the original resolution, and the recognized face image may be intercepted and displayed. As shown in FIG. 3, the processing of the above step may be: based on an acquired target image, determining, in the image frame of the target resolution, first position information of a first region image that matches the target image; based on the target resolution and the original resolution, determining second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution; and based on the second position information, intercepting and displaying, in the image frame of the original resolution, a second region image that matches the target image.
The position information may be the coordinate information of the four corners of the region image.
In implementation, taking the face recognition and capture algorithm as an example, when a user wants to find a certain person (i.e., a capture target) in a piece of video data, an image of the capture target (the target image) may be input into the electronic device. The target image is matched against the image frames of the target resolution, the region image matching the target image in the image frame of the target resolution (i.e., the first region image) is determined, and the position information of the first region image (i.e., the first position information) is determined.
Then, according to the target resolution, the original resolution, and the first position information, the second position information in the image frame of the original resolution corresponding to the first position information is obtained by conversion.
For the conversion process, a proportional downsampling scheme may be adopted. If the ratio is 1:1, i.e., the size of the image frame of the original resolution is the same as that of the image frame of the target resolution, the first position information is the same as the second position information. Therefore, the region image corresponding to the second position information (i.e., the second region image) can be determined in the image frame of the original resolution; this second region image is the region image matching the target image, and is intercepted and displayed to the user. If proportional downsampling is used and the ratio is not 1:1, i.e., the size of the image frame of the original resolution differs from that of the image frame of the target resolution, ratio information of the first position information relative to the width and height of the image frame of the target resolution may be calculated first, and through this ratio information, the region image matching the target image (i.e., the second region image) is determined in the image frame of the original resolution. For example, the relative position of the first position information in the image frame of the target resolution may be determined first, and then the second region image matching the target image may be determined in the image frame of the original resolution according to the determined relative position. For example, if the size of the image frame of the target resolution is 180×240 pixels, the size of the image frame of the original resolution is 720×960 pixels, and the obtained first position information is (30, 40), (120, 40), (30, 180), (120, 180), then the ratio information of the first position information relative to the width and height of the image frame of the target resolution (i.e., the relative position information) is (1/6, 1/6), (2/3, 1/6), (1/6, 3/4), (2/3, 3/4). Multiplying this ratio information by the size of the image frame of the original resolution gives the second position information of the second region image: (120, 160), (480, 160), (120, 720), (480, 720). According to the obtained second position information, the second region image is determined, intercepted, and displayed to the user.
The above conversion process may also adopt a non-proportional downsampling scheme. The method of determining the second region image according to the position information is the same as in the proportional case with a ratio other than 1:1; that is, the relative position of the first position information in the image frame of the target resolution is determined first, and then the second region image matching the target image is determined in the image frame of the original resolution according to the determined relative position. For example, if the size of the image frame of the target resolution is 180×240 pixels, the size of the image frame of the original resolution is 720×720 pixels, and the obtained first position information of the first region image is (30, 40), (120, 40), (30, 180), (120, 180), then the ratio information of the first position information relative to the width and height of the image frame of the target resolution (i.e., the relative position information) is (1/6, 1/6), (2/3, 1/6), (1/6, 3/4), (2/3, 3/4). Multiplying this ratio information by the size of the image frame of the original resolution gives the second position information of the second region image: (120, 120), (480, 120), (120, 540), (480, 540). According to the obtained second position information, the second region image is determined, intercepted, and displayed to the user.
It should be noted that the above way of calculating the ratio information is only one feasible way; other calculation ways are possible, for example, first calculating the ratio between the width and height of the image frame of the original resolution and those of the image frame of the target resolution, and then calculating the position information of the second region image according to that ratio, which is not limited in this application.
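The two worked examples above can be reproduced with a single relative-position computation; this is a sketch, and the integer (floor) division for pixel coordinates is an assumption the application does not specify.

```python
def map_position(corners, target_size, original_size):
    """Map corner coordinates from the target-resolution frame to the
    original-resolution frame via their relative positions, as in the
    worked examples above."""
    tw, th = target_size
    ow, oh = original_size
    # relative position (x/tw, y/th), then scaled by the original size
    return [(x * ow // tw, y * oh // th) for (x, y) in corners]

first_position = [(30, 40), (120, 40), (30, 180), (120, 180)]
# 180x240 target frame, 720x960 original frame (proportional case)
print(map_position(first_position, (180, 240), (720, 960)))
# [(120, 160), (480, 160), (120, 720), (480, 720)]
# 180x240 target frame, 720x720 original frame (non-proportional case)
print(map_position(first_position, (180, 240), (720, 720)))
# [(120, 120), (480, 120), (120, 540), (480, 540)]
```

Because only the relative position is used, the same function covers the proportional and non-proportional downsampling cases.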
Optionally, the above-mentioned matching of the target image against the image frame of the target resolution may be performed as follows. The target image is matched against a preset first region image of the image frame of the target resolution, a first matching degree between the target image and the preset first region image is calculated, and the first matching degree is stored in correspondence with the coordinates of the four corners of the preset first region image, where the size of the preset first region image is exactly the same as that of the target image. Then the abscissas of the four corners of the preset first region image are all increased by the same preset first increment to obtain a new preset first region image, which is matched against the target image to obtain a second matching degree; the second matching degree is compared with the stored first matching degree, the larger of the two and the corner coordinates of the preset first region image corresponding to it are stored, and the smaller matching degree and its corresponding corner coordinates are deleted. Then the abscissas of the four corners are again increased by the same preset first increment to obtain a new preset first region image, which is matched against the target image to obtain a third matching degree; the third matching degree is compared with the stored matching degree, the larger of the two and its corresponding corner coordinates are stored, and the smaller and its corner coordinates are deleted; and so on, until the abscissas of two of the four corners of the preset first region image reach the maximum or minimum value, i.e., the preset first region image has reached the edge of the image frame of the target resolution. At that point, the ordinates of the four corners are increased by a preset second increment to obtain a new preset first region image, which is matched against the target image to obtain a fourth matching degree; the fourth matching degree is compared with the stored matching degree, the larger of the two and its corresponding corner coordinates are stored, and the smaller and its corner coordinates are deleted. In this way, the stored matching degree is always the maximum matching degree so far, and the stored corner coordinates are always those of the region image with the maximum matching degree.
Then, the abscissas of the four corners of the preset first region image are all decreased by the same preset first increment to obtain a new preset first region image, which is matched against the target image to obtain a fifth matching degree; the fifth matching degree is compared with the stored matching degree, the larger of the two and its corresponding corner coordinates are stored, and the smaller and its corner coordinates are deleted. By analogy, the maximum matching degree between the target image and all preset first region images, together with the corner coordinates of the corresponding preset first region image, is finally obtained; these coordinates are the position information of the first region image described above.
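The exhaustive scan described above can be sketched as follows. For brevity the sketch visits candidate regions with two nested loops rather than the back-and-forth corner-increment traversal just described (the set of candidates and the retained maximum are the same), and it uses a toy fraction-of-equal-pixels matching degree; both are illustrative assumptions, since the application does not fix a similarity measure.

```python
def match_degree(window, target):
    # Toy matching degree: fraction of equal pixels, a placeholder
    # for a real similarity measure.
    total = len(target) * len(target[0])
    equal = sum(
        1
        for r, row in enumerate(target)
        for c, v in enumerate(row)
        if window[r][c] == v
    )
    return equal / total

def best_match(frame, target):
    th, tw = len(target), len(target[0])
    best = (-1.0, None)  # (matching degree, top-left corner)
    # Visit every candidate region of the target's size; keep only the
    # larger matching degree and its corner, as described above.
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            window = [row[x:x + tw] for row in frame[y:y + th]]
            score = match_degree(window, target)
            if score > best[0]:
                best = (score, (x, y))
    return best

frame = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
target = [[1, 2], [3, 4]]
print(best_match(frame, target))  # (1.0, (1, 1))
```

The returned corner, together with the target's width and height, gives the four corner coordinates that serve as the first position information.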
In addition to the above method, the matching of the target image against the image frame of the target resolution, the determination of the first region image matching the target image, and the determination of its position information may also be performed as follows: an image recognition model is trained with samples to obtain a trained image recognition model, and the target image and the image frame of the target resolution are input into the image recognition model, which outputs the first region image matching the target image and the position information of the first region image. Using an image recognition model to determine the position information of the first region image is fast and efficient.
It should be noted that there are many other methods for matching the target image against the image frame of the target resolution, determining the first region image matching the target image, and determining its position information; they are not enumerated here one by one. Any method that can achieve this may be used, and which method is adopted is not limited in this application.
Optionally, so that the user can extract more useful information from the displayed image, information integrity scoring processing may be performed on the second region image before it is intercepted. As shown in FIG. 3, the corresponding processing may be as follows: based on the second position information, the second region image matching the target image is determined in the image frame of the original resolution; the information integrity score of the second region image is determined; and when the information integrity score is greater than a preset score threshold, the second region image is intercepted and displayed.
In implementation, after the position information of the first region image is determined through the above steps, the second position information of the second region image is determined according to the first position information of the first region image, the original resolution, and the target resolution, and the second region image is then determined. For example, if the position information of the first region image is the coordinate information of the four vertices of the first region image, the four vertices of the second region image are determined in the image frame of the original resolution according to the four pieces of coordinate information, and the second region image is then determined according to the four vertices.
After at least one second region image is determined, information integrity scoring processing is performed on each second region image to obtain the information integrity score of each second region image. The higher the information integrity score, the more comprehensive the information displayed by the corresponding second region image. Therefore, the information integrity score is compared with a preset score threshold; if the information integrity score of a second region image is greater than the preset score threshold, the second region image is intercepted according to the position information and displayed to the user, as shown in FIG. 4. In this way, the second region image displayed to the user is more complete and contains more useful information, so that the information obtained by the user is more comprehensive.
Optionally, the basis of the above information integrity scoring processing may be: determining the information integrity score of the second region image based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
In implementation, when information integrity scoring processing is performed on a region image, one or more of the definition of the region image, the completeness of the shooting target, and the shooting angle of the shooting target may be used as the basis of the scoring. The definition refers to the clarity of each detail and its boundary in the image: the higher the definition, the higher the definition score of the information integrity score; the lower the definition, the lower the definition score of the information integrity score.
The completeness of the shooting target refers to how completely the region image contains each part of the shooting target. For example, if the shooting target is a dog, it is determined whether the region image contains all the body parts of the dog, such as the head, the ears, the limbs, and the tail; the more body parts are included, the higher the completeness of the shooting target in the region image. As another example, if the shooting target is a human face, it is determined whether the region image contains all the parts of the face, such as the hair, the ears, the two eyes, the mouth, and the chin; the more parts are included, the higher the completeness of the shooting target in the region image. The higher the completeness of the shooting target, the higher the target-completeness score of the information integrity score; the lower the completeness, the lower that score.
The shooting angle of the shooting target is a scoring basis used when the shooting target is a human face. When the captured face is a frontal face, the user can obtain more comprehensive information from the intercepted and displayed image; the larger the angle by which the face is turned to the side, the less information the user can obtain. Therefore, it can be set that when the captured face is a frontal face, the shooting angle of the shooting target is 0 degrees and the shooting-angle score of the information integrity score is the highest; the larger the side-turn angle, the larger the shooting angle of the shooting target, and the lower the shooting-angle score of the information integrity score.
It should be noted that the above definition, completeness of the shooting target, and shooting angle of the shooting target are only examples of scoring bases given in the present application. In practical applications, other scoring bases may also be used to perform information integrity scoring on the region image, such as the contrast of the region image, which is not limited in this application. Information integrity scoring is performed on the region image according to the preset scoring basis, and the information integrity score of the region image is finally obtained.
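One way to combine the three scoring bases above into a single information integrity score is a weighted sum; the weights, the input ranges, the linear angle penalty, and the threshold value below are all illustrative assumptions, since the application does not fix a particular formula.

```python
def information_integrity_score(definition, completeness, face_angle,
                                weights=(0.4, 0.4, 0.2)):
    """Combine the three scoring bases above into one score in [0, 1].
    definition and completeness are assumed to be in [0, 1]; face_angle
    is the side-turn angle in degrees (0 = frontal face)."""
    # A frontal face (0 degrees) scores highest; 90 degrees of side
    # turn scores 0, matching "the larger the angle, the lower the score".
    angle_score = max(0.0, 1.0 - abs(face_angle) / 90.0)
    w_def, w_comp, w_angle = weights
    return w_def * definition + w_comp * completeness + w_angle * angle_score

score = information_integrity_score(0.9, 0.75, 30)
print(round(score, 3))  # 0.793
if score > 0.6:  # preset score threshold (assumed value)
    print("intercept and display the second region image")
```

Other bases mentioned above, such as contrast, could be added as further weighted terms in the same way.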
In the embodiments of the present application, video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of an original resolution; hardware downsampling processing is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution. In this way, the processing steps during video analysis can be simplified, so that the processing efficiency of intelligent analysis of the video can be improved. Moreover, CPU resources do not need to be occupied for decoding and downsampling, which reduces the resource occupation of the CPU.
Based on the same technical concept, an embodiment of the present application further provides a device for video analysis, which may be the electronic device in the above embodiments. As shown in FIG. 5, the device includes: an obtaining module 510, a first processing module 520, a second processing module 530, and an analysis module 540.
The obtaining module 510 is configured to acquire video data to be analyzed;
the first processing module 520 is configured to perform hardware decoding on the video data to obtain an image frame of an original resolution;
the second processing module 530 is configured to perform hardware downsampling processing on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution;
the analysis module 540 is configured to perform video analysis processing based on the image frame of the target resolution.
Optionally, the analysis module 540 is configured to:
perform video analysis processing based on the image frame of the target resolution and the image frame of the original resolution.
Optionally, the analysis module 540 is configured to:
based on an acquired target image, determine, in the image frame of the target resolution, first position information of a first region image that matches the target image;
based on the target resolution and the original resolution, determine second position information in the image frame of the original resolution corresponding to the first position information in the image frame of the target resolution; and
based on the second position information, intercept and display, in the image frame of the original resolution, a second region image that matches the target image.
Optionally, the analysis module 540 is configured to:
determine ratio information of the first region image relative to the width and height of the image frame of the target resolution; and
determine, according to the ratio information, a second region image that matches the target image in the image frame of the original resolution.
Optionally, the analysis module 540 is configured to:
based on the second position information, determine, in the image frame of the original resolution, a second region image that matches the target image;
determine an information integrity score of the second region image; and
when the information integrity score is greater than a preset score threshold, intercept and display the second region image.
Optionally, the analysis module 540 is configured to:
determine the information integrity score of the second region image based on the definition, the completeness of the shooting target, and the shooting angle of the shooting target of the region image corresponding to the position information.
In the embodiments of the present application, video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain an image frame of an original resolution; hardware downsampling processing is performed on the image frame of the original resolution to obtain an image frame of a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frame of the target resolution. In this way, the processing steps during video analysis can be simplified, so that the processing efficiency of intelligent analysis of the video can be improved. Moreover, CPU resources do not need to be occupied for decoding and downsampling, which reduces the resource occupation of the CPU.
Regarding the device in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method, and will not be elaborated here.
It should be noted that when the device for video analysis provided by the above embodiment performs video analysis, it is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the electronic device is divided into different functional modules to complete all or part of the functions described above. In addition, the device for video analysis provided by the above embodiment belongs to the same concept as the method embodiments of video analysis; its specific implementation process is described in detail in the method embodiments, and will not be repeated here.
图6是本申请实施例提供的一种终端的结构框图。该终端600可以是便携式移动终端,比如:智能手机、平板电脑。终端600还可能被称为用户设备、便携式终端等其他名称。
通常,终端600包括有:处理器601和存储器602。
处理器601可以包括一个或多个处理核心,比如4核心处理器、6核心处理 器等。处理器601可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器601也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器601可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器601还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器602可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是有形的和非暂态的。存储器602还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器602中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器601所执行以实现本申请中提供的视频分析的方法。
在一些实施例中,终端600还可选包括有:外围设备接口603和至少一个外围设备。具体地,外围设备包括:射频电路604、触摸显示屏605、摄像头606、音频电路607、定位组件608和电源609中的至少一种。
外围设备接口603可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器601和存储器602。在一些实施例中,处理器601、存储器602和外围设备接口603被集成在同一芯片或电路板上;在一些其他实施例中,处理器601、存储器602和外围设备接口603中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路604用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路604通过电磁信号与通信网络以及其他通信设备进行通信。射频电路604将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路604包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路604可以通过至少一种无线通信协议来与其它终端进行通信。 该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路604还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
触摸显示屏605用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。触摸显示屏605还具有采集在触摸显示屏605的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器601进行处理。触摸显示屏605用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,触摸显示屏605可以为一个,设置终端600的前面板;在另一些实施例中,触摸显示屏605可以为至少两个,分别设置在终端600的不同表面或呈折叠设计;在再一些实施例中,触摸显示屏605可以是柔性显示屏,设置在终端600的弯曲表面上或折叠面上。甚至,触摸显示屏605还可以设置成非矩形的不规则图形,也即异形屏。触摸显示屏605可以采用LCD(Liquid Crystal Display,液晶显示器)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件606用于采集图像或视频。可选地,摄像头组件606包括前置摄像头和后置摄像头。通常,前置摄像头用于实现视频通话或自拍,后置摄像头用于实现照片或视频的拍摄。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能,主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能。在一些实施例中,摄像头组件606还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
The audio circuit 607 is configured to provide an audio interface between the user and the terminal 600. The audio circuit 607 may include a microphone and a speaker. The microphone is configured to collect sound waves from the user and the environment and convert them into electrical signals that are input to the processor 601 for processing, or to the radio frequency circuit 604 for voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively disposed at different parts of the terminal 600. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is configured to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 607 may further include a headphone jack.
The positioning component 608 is configured to determine the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 609 is configured to supply power to the components in the terminal 600. The power supply 609 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 600 further includes one or more sensors 610. The one or more sensors 610 include, but are not limited to, an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with respect to the terminal 600. For example, the acceleration sensor 611 may be used to detect the components of gravitational acceleration on the three coordinate axes. Based on the gravitational acceleration signal collected by the acceleration sensor 611, the processor 601 may control the touch display screen 605 to display the user interface in a landscape or portrait view. The acceleration sensor 611 may also be used to collect motion data for games or of the user.
The gyroscope sensor 612 can detect the body orientation and rotation angle of the terminal 600, and may cooperate with the acceleration sensor 611 to collect the user's 3D actions on the terminal 600. Based on the data collected by the gyroscope sensor 612, the processor 601 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed on the side frame of the terminal 600 and/or on the lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, it can detect the user's grip signal on the terminal 600, and left/right-hand recognition or shortcut operations can be performed based on the grip signal. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, operable controls on the UI can be controlled according to the user's pressure operation on the touch display screen 605. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is configured to collect the user's fingerprint so as to identify the user's identity based on the collected fingerprint. When the user's identity is identified as trusted, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or a manufacturer logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or the manufacturer logo.
The optical sensor 615 is configured to collect ambient light intensity. In one embodiment, the processor 601 may control the display brightness of the touch display screen 605 based on the ambient light intensity collected by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is decreased. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera component 606 based on the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also called a distance sensor, is usually disposed on the front of the terminal 600. The proximity sensor 616 is configured to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, or combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a computer-readable storage medium is further provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the video analysis method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the embodiments of this application, video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain image frames at an original resolution; hardware downsampling is performed on the image frames at the original resolution to obtain image frames at a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frames at the target resolution. In this way, the processing steps of the video analysis process can be simplified, thereby improving the efficiency of intelligent video analysis. Moreover, no CPU resources need to be occupied for decoding or downsampling, reducing the resource occupation of the CPU.
FIG. 7 is a schematic structural diagram of a server provided by an embodiment of this application. The server 700 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the following steps of the video analysis method:
acquiring video data to be analyzed;
performing hardware decoding on the video data to obtain an image frame at an original resolution;
performing hardware downsampling on the image frame at the original resolution to obtain an image frame at a preset target resolution, where the target resolution is smaller than the original resolution; and
performing video analysis processing based on the image frame at the target resolution.
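The decode-downsample-analyze flow above can be sketched in software. The `hardware_decode` and `hardware_downsample` functions below are hypothetical stand-ins (mocked here with NumPy) for the dedicated hardware paths the embodiment relies on; the frame sizes are illustrative:

```python
import numpy as np

def hardware_decode(video_data, height=1080, width=1920):
    # Hypothetical stand-in for the hardware decoder: in the embodiment this
    # runs on dedicated decode hardware and yields an original-resolution
    # frame. The compressed input is ignored in this mock.
    rng = np.random.default_rng(0)
    return rng.integers(0, 256, (height, width, 3), dtype=np.uint8)

def hardware_downsample(frame, target_h, target_w):
    # Stand-in for hardware downsampling: nearest-neighbour subsampling
    # by index, producing the preset (smaller) target resolution.
    h, w = frame.shape[:2]
    rows = (np.arange(target_h) * h) // target_h
    cols = (np.arange(target_w) * w) // target_w
    return frame[rows][:, cols]

video_data = b"..."                       # compressed bitstream (placeholder)
original = hardware_decode(video_data)    # image frame at the original resolution
target = hardware_downsample(original, 270, 480)  # preset target resolution

assert target.shape[:2] == (270, 480)
# Video analysis (e.g. matching against a target image) now runs on `target`.
```

Because both steps stay on the hardware path, the CPU only ever touches the already-downsampled frame, which is the resource saving the embodiment claims.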
Optionally, the at least one instruction is loaded and executed by the processor 701 to implement the following method step:
performing video analysis processing based on the image frame at the target resolution and the image frame at the original resolution.
Optionally, the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
determining, based on an acquired target image, first position information of a first region image matching the target image in the image frame at the target resolution;
determining, based on the target resolution and the original resolution, second position information in the image frame at the original resolution corresponding to the first position information in the image frame at the target resolution; and
cropping and displaying, based on the second position information, a second region image matching the target image from the image frame at the original resolution.
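Assuming the first position information is an axis-aligned box `(x, y, w, h)` in pixels of the target-resolution frame (the embodiment does not fix a representation), the second position information follows by per-axis scaling between the two resolutions; a minimal sketch:

```python
def map_box_to_original(box, target_res, original_res):
    """Scale a box (x, y, w, h) from the target-resolution frame to the
    original-resolution frame; resolutions are (width, height) pairs."""
    x, y, w, h = box
    sx = original_res[0] / target_res[0]   # horizontal scale factor
    sy = original_res[1] / target_res[1]   # vertical scale factor
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))

# A box found at (100, 50, 60, 80) in a 480x270 frame maps to the
# corresponding region of the 1920x1080 original.
print(map_box_to_original((100, 50, 60, 80), (480, 270), (1920, 1080)))
# (400, 200, 240, 320)
```

The second region image is then cropped from the original-resolution frame with these scaled coordinates, so the displayed detail keeps the full capture quality.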
Optionally, the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
determining a relative position of the first position information in the image frame at the target resolution; and
determining, in the image frame at the original resolution, the second region image matching the target image according to the relative position.
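The relative-position variant can be read as first expressing the box as fractions of the target-resolution frame, then applying those fractions to the original-resolution frame; a sketch under the same illustrative `(x, y, w, h)` box format:

```python
def relative_position(box, res):
    # Express (x, y, w, h) as fractions of the frame size (width, height).
    x, y, w, h = box
    return (x / res[0], y / res[1], w / res[0], h / res[1])

def apply_relative(rel, res):
    # Map fractional coordinates back to pixel coordinates in another frame.
    rx, ry, rw, rh = rel
    return (round(rx * res[0]), round(ry * res[1]),
            round(rw * res[0]), round(rh * res[1]))

rel = relative_position((100, 50, 60, 80), (480, 270))
print(apply_relative(rel, (1920, 1080)))   # same region in the original frame
```

Since the downsampling preserves the aspect of each axis, the fractional coordinates are resolution-independent, which is why the same relative position locates the matching region in both frames.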
Optionally, the at least one instruction is loaded and executed by the processor 701 to implement the following method steps:
determining, based on the second position information, the second region image matching the target image in the image frame at the original resolution;
determining an information completeness score of the second region image; and
cropping and displaying the second region image when the information completeness score is greater than a preset score threshold.
Optionally, the at least one instruction is loaded and executed by the processor 701 to implement the following method step:
determining the information completeness score of the second region image based on the clarity of the region image corresponding to the position information, the completeness of the captured target, and the shooting angle of the captured target.
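The embodiment names the three inputs (clarity, captured-target completeness, shooting angle) but fixes no scoring formula; one plausible sketch is a weighted sum of the three factors, each normalized to [0, 1]. The weights and threshold below are illustrative assumptions, not taken from the application:

```python
def information_completeness_score(clarity, completeness, angle_score,
                                   weights=(0.4, 0.4, 0.2)):
    # Weighted combination of the three normalized factors; the weights are
    # illustrative, not specified by the application.
    return sum(w * f for w, f in zip(weights, (clarity, completeness, angle_score)))

score = information_completeness_score(0.9, 0.8, 0.5)
if score > 0.7:        # preset score threshold (illustrative)
    pass               # crop and display the second region image
```

Thresholding the score before cropping filters out regions that are too blurry, too truncated, or captured at too oblique an angle to be worth displaying.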
In the embodiments of this application, video data to be analyzed is acquired; hardware decoding is performed on the video data to obtain image frames at an original resolution; hardware downsampling is performed on the image frames at the original resolution to obtain image frames at a preset target resolution, where the target resolution is smaller than the original resolution; and video analysis processing is performed based on the image frames at the target resolution. In this way, the processing steps of the video analysis process can be simplified, thereby improving the efficiency of intelligent video analysis. Moreover, no CPU resources need to be occupied for decoding or downsampling, reducing the resource occupation of the CPU.
Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (14)

  1. A video analysis method, the method comprising:
    acquiring video data to be analyzed;
    performing hardware decoding on the video data to obtain an image frame at an original resolution;
    performing hardware downsampling on the image frame at the original resolution to obtain an image frame at a preset target resolution, wherein the target resolution is smaller than the original resolution; and
    performing video analysis processing based on the image frame at the target resolution.
  2. The method according to claim 1, wherein the performing video analysis processing based on the image frame at the target resolution comprises:
    performing video analysis processing based on the image frame at the target resolution and the image frame at the original resolution.
  3. The method according to claim 2, wherein the performing video analysis processing based on the image frame at the target resolution and the image frame at the original resolution comprises:
    determining, based on an acquired target image, first position information of a first region image matching the target image in the image frame at the target resolution;
    determining, based on the target resolution and the original resolution, second position information in the image frame at the original resolution corresponding to the first position information in the image frame at the target resolution; and
    cropping and displaying, based on the second position information, a second region image matching the target image from the image frame at the original resolution.
  4. The method according to claim 3, wherein the determining, based on the target resolution and the original resolution, the second position information in the image frame at the original resolution corresponding to the first position information in the image frame at the target resolution comprises:
    determining a relative position of the first position information in the image frame at the target resolution; and
    determining, in the image frame at the original resolution, the second region image matching the target image according to the relative position.
  5. The method according to claim 3, wherein the cropping and displaying, based on the second position information, the second region image matching the target image from the image frame at the original resolution comprises:
    determining, based on the second position information, the second region image matching the target image in the image frame at the original resolution;
    determining an information completeness score of the second region image; and
    cropping and displaying the second region image when the information completeness score is greater than a preset score threshold.
  6. The method according to claim 5, wherein the determining the information completeness score of the second region image comprises:
    determining the information completeness score of the second region image based on the clarity of the region image corresponding to the position information, the completeness of the captured target, and the shooting angle of the captured target.
  7. A video analysis apparatus, the apparatus comprising:
    an acquisition module, configured to acquire video data to be analyzed;
    a first processing module, configured to perform hardware decoding on the video data to obtain an image frame at an original resolution;
    a second processing module, configured to perform hardware downsampling on the image frame at the original resolution to obtain an image frame at a preset target resolution, wherein the target resolution is smaller than the original resolution; and
    an analysis module, configured to perform video analysis processing based on the image frame at the target resolution.
  8. The apparatus according to claim 7, wherein the analysis module is configured to:
    perform video analysis processing based on the image frame at the target resolution and the image frame at the original resolution.
  9. The apparatus according to claim 8, wherein the analysis module is configured to:
    determine, based on an acquired target image, first position information of a first region image matching the target image in the image frame at the target resolution;
    determine, based on the target resolution and the original resolution, second position information in the image frame at the original resolution corresponding to the first position information in the image frame at the target resolution; and
    crop and display, based on the second position information, a second region image matching the target image from the image frame at the original resolution.
  10. The apparatus according to claim 9, wherein the analysis module is configured to:
    determine a relative position of the first position information in the image frame at the target resolution; and
    determine, in the image frame at the original resolution, the second region image matching the target image according to the relative position.
  11. The apparatus according to claim 9, wherein the analysis module is configured to:
    determine, based on the second position information, the second region image matching the target image in the image frame at the original resolution;
    determine an information completeness score of the second region image; and
    crop and display the second region image when the information completeness score is greater than a preset score threshold.
  12. The apparatus according to claim 11, wherein the analysis module is configured to:
    determine the information completeness score of the second region image based on the clarity of the region image corresponding to the position information, the completeness of the captured target, and the shooting angle of the captured target.
  13. An electronic device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the video analysis method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the video analysis method according to any one of claims 1 to 6.
PCT/CN2019/087288 2018-05-17 2019-05-16 Video analysis method and apparatus WO2019219065A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810473779.7 2018-05-17
CN201810473779.7A CN110502954B (zh) Video analysis method and apparatus

Publications (1)

Publication Number Publication Date
WO2019219065A1 true WO2019219065A1 (zh) 2019-11-21

Family

ID=68539492

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087288 WO2019219065A1 (zh) Video analysis method and apparatus

Country Status (2)

Country Link
CN (1) CN110502954B (zh)
WO (1) WO2019219065A1 (zh)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726536A (zh) * 2020-07-03 2020-09-29 腾讯科技(深圳)有限公司 Video generation method and apparatus, storage medium, and computer device
CN111753784A (zh) * 2020-06-30 2020-10-09 广州酷狗计算机科技有限公司 Video special-effect processing method and apparatus, terminal, and storage medium
CN111859549A (zh) * 2020-07-28 2020-10-30 奇瑞汽车股份有限公司 Method for determining vehicle weight and center-of-gravity information for a single configuration, and related device
CN112135190A (zh) * 2020-09-23 2020-12-25 成都市喜爱科技有限公司 Video processing method, apparatus, system, server, and storage medium
CN112528760A (zh) * 2020-11-24 2021-03-19 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer device, and medium
CN112541429A (zh) * 2020-12-08 2021-03-23 浙江大华技术股份有限公司 Intelligent image capture method and apparatus, electronic device, and storage medium
CN112804573A (zh) * 2021-01-08 2021-05-14 Oppo广东移动通信有限公司 Television signal processing method and apparatus, electronic device, and storage medium
CN112817768A (zh) * 2021-02-26 2021-05-18 北京梧桐车联科技有限责任公司 Animation processing method, apparatus, and device, and computer-readable storage medium
CN113032590A (zh) * 2021-03-29 2021-06-25 广州繁星互娱信息科技有限公司 Special-effect display method and apparatus, computer device, and computer-readable storage medium
CN113365027A (zh) * 2021-05-28 2021-09-07 上海商汤智能科技有限公司 Video processing method and apparatus, electronic device, and storage medium
CN113392267A (zh) * 2020-03-12 2021-09-14 平湖莱顿光学仪器制造有限公司 Method and device for generating two-dimensional microscopic video information of a target object
CN113422967A (zh) * 2021-06-07 2021-09-21 深圳康佳电子科技有限公司 Screen-projection display control method and apparatus, terminal device, and storage medium
CN113435530A (zh) * 2021-07-07 2021-09-24 腾讯科技(深圳)有限公司 Image recognition method and apparatus, computer device, and computer-readable storage medium
CN113561916A (zh) * 2021-08-31 2021-10-29 长沙德壹科技有限公司 In-vehicle display system, vehicle, and in-vehicle camera image display method
CN113992880A (zh) * 2021-10-15 2022-01-28 上海佰贝科技发展股份有限公司 4K video recognition method, system, and device, and computer-readable storage medium
CN114187349A (zh) * 2021-11-03 2022-03-15 深圳市正运动技术有限公司 Product machining method and apparatus, terminal device, and storage medium
CN114827567A (zh) * 2022-03-23 2022-07-29 阿里巴巴(中国)有限公司 Video quality analysis method, device, and readable medium
CN116684626A (zh) * 2023-08-04 2023-09-01 广东星云开物科技股份有限公司 Video compression method and shared vending cabinet

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112068771A (zh) * 2020-08-17 2020-12-11 Oppo广东移动通信有限公司 Video processing method, video processing apparatus, terminal device, and storage medium
CN112330541A (zh) * 2020-11-11 2021-02-05 广州博冠信息科技有限公司 Live-streaming video processing method and apparatus, electronic device, and storage medium
CN113852757B (zh) * 2021-09-03 2023-05-26 维沃移动通信(杭州)有限公司 Video processing method, apparatus, device, and storage medium
CN114710649B (zh) * 2022-06-01 2023-01-24 广东中浦科技有限公司 Pollution source video monitoring method and system
CN117495854B (zh) * 2023-12-28 2024-05-03 淘宝(中国)软件有限公司 Video data processing method, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000031981A1 (en) * 1998-11-20 2000-06-02 Koninklijke Philips Electronics N.V. Extraction of foreground information for stereoscopic video coding
CN1960479A (zh) * 2005-11-03 2007-05-09 中国科学院自动化研究所 Master-slave video tracking method using a single camera
CN101826157A (zh) * 2010-04-28 2010-09-08 华中科技大学 Real-time recognition and tracking method for stationary ground targets
CN104067310A (zh) * 2011-12-23 2014-09-24 超威半导体公司 Display image improvements
CN105163127A (zh) * 2015-09-07 2015-12-16 浙江宇视科技有限公司 Video analysis method and apparatus
CN106817608A (zh) * 2015-11-27 2017-06-09 小米科技有限责任公司 Method and apparatus for implementing partial display

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7116833B2 (en) * 2002-12-23 2006-10-03 Eastman Kodak Company Method of transmitting selected regions of interest of digital video data at selected resolutions
CN105989802B (zh) * 2015-03-05 2018-11-30 西安诺瓦电子科技有限公司 Programmable logic device, sub-pixel downsampling method thereof, and related applications
CN105472272A (zh) * 2015-11-25 2016-04-06 浙江工业大学 FPGA-based multi-channel video stitching method and apparatus
CN105718929B (zh) * 2016-01-21 2019-04-30 成都信息工程大学 High-precision fast circular target positioning method and system for all-weather unknown environments
CN106791710B (zh) * 2017-02-10 2020-12-04 北京地平线信息技术有限公司 Target detection method, apparatus, and electronic device
CN108012157B (zh) * 2017-11-27 2020-02-04 上海交通大学 Method for constructing a convolutional neural network for fractional-pixel interpolation in video coding

Also Published As

Publication number Publication date
CN110502954B (zh) 2023-06-16
CN110502954A (zh) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2019219065A1 (zh) Video analysis method and apparatus
US11205282B2 (en) Relocalization method and apparatus in camera pose tracking process and storage medium
CN110189340B (zh) Image segmentation method and apparatus, electronic device, and storage medium
WO2021008456A1 (zh) Image processing method and apparatus, electronic device, and storage medium
CN108594997B (zh) Gesture skeleton construction method, apparatus, device, and storage medium
WO2019101021A1 (zh) Image recognition method and apparatus, and electronic device
JP7058760B2 (ja) Image processing method and apparatus, terminal, and computer program
CN110647865A (zh) Face pose recognition method, apparatus, device, and storage medium
WO2020221012A1 (zh) Method for determining motion information of image feature points, task execution method, and device
CN112907725B (zh) Image generation, image processing model training, and image processing methods and apparatuses
KR20140104753A (ko) Image preview using body part detection
CN109886208B (zh) Object detection method and apparatus, computer device, and storage medium
CN110839128B (zh) Photographing behavior detection method, apparatus, and storage medium
CN109360222B (zh) Image segmentation method, apparatus, and storage medium
CN111723803B (zh) Image processing method, apparatus, device, and storage medium
CN111027490A (zh) Face attribute recognition method and apparatus, and storage medium
CN111754386A (zh) Image region masking method, apparatus, device, and storage medium
CN112135191A (zh) Video editing method, apparatus, terminal, and storage medium
CN110675473B (zh) Method and apparatus for generating an animated GIF, electronic device, and medium
WO2022199102A1 (zh) Image processing method and apparatus
CN112235650A (zh) Video processing method, apparatus, terminal, and storage medium
CN110232417B (zh) Image recognition method, apparatus, computer device, and computer-readable storage medium
CN111931712A (zh) Face recognition method, apparatus, capture camera, and system
CN110853124A (zh) Method and apparatus for generating an animated GIF, electronic device, and medium
CN111860064A (zh) Video-based target detection method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19803393

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19803393

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 01.06.2021)
