WO2020155299A1 - Method, System and Device for Fitting a Target Object in a Video Frame - Google Patents

Method, System and Device for Fitting a Target Object in a Video Frame

Info

Publication number
WO2020155299A1
Authority
WO
WIPO (PCT)
Prior art keywords
geometric
target object
video frame
fitting
area
Prior art date
Application number
PCT/CN2019/077236
Other languages
English (en)
French (fr)
Inventor
孙磊
王风雷
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司 (Wangsu Science & Technology Co., Ltd.)
Priority to EP19727579.5A priority Critical patent/EP3709666A1/en
Priority to US16/442,081 priority patent/US10699751B1/en
Publication of WO2020155299A1 publication Critical patent/WO2020155299A1/zh

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 - Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4318 - Generation of visual interfaces by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/435 - Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/454 - Content or additional data filtering, e.g. blocking advertisements
    • H04N 21/4545 - Input to filtering algorithms, e.g. filtering a region of the image

Definitions

  • This application relates to the field of Internet technology, and in particular to a method, system and device for fitting a target object in a video frame.
  • In the prior art, the target object in a video frame is usually fitted by means of a binary mask map. Specifically, a binary mask map consistent with the video frame can be generated, in which the area occupied by the target object and the other areas have different pixel values, so that subsequent processing can be performed on the binary mask map. However, since the data volume of a binary mask map is usually relatively large, fitting the target object according to the binary mask map increases the amount of data that must be processed subsequently, resulting in lower processing efficiency.
  • In view of this, the purpose of some embodiments of the present application is to provide a method, system, and device for fitting a target object in a video frame, which can reduce the amount of data after fitting and thereby improve subsequent processing efficiency.
  • An embodiment of the present application provides a method for fitting a target object in a video frame, the method comprising: identifying the area where the target object is located in the video frame; selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and generating the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and using the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  • An embodiment of the present application also provides a system for fitting a target object in a video frame, the system including: an area identification unit, configured to identify the area where the target object is located in the video frame; a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  • An embodiment of the present application also provides a device for fitting a target object in a video frame. The device includes a processor and a memory, where the memory is used to store a computer program; when the computer program is executed by the processor, the above-mentioned fitting method is implemented.
  • Compared with the prior art, the embodiments of the present application can identify, for a target object in a video frame, the area where the target object is located. Then, a combination of one or more geometric figures can be used to cover the target object in the video frame by way of geometric figure fitting. After the several geometric figures covering the target object are determined, the fitting parameters of these geometric figures can be generated; the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of geometric figures are not image data, they usually occupy only a small number of bytes, which reduces the amount of data after fitting and thereby improves subsequent processing efficiency.
  • Fig. 1 is a schematic diagram of a method for fitting a target object according to an embodiment of the present application;
  • Fig. 2 is a schematic diagram of fitting a target object with geometric figures according to an embodiment of the present application;
  • Fig. 3 is a schematic diagram of a rectangular area according to an embodiment of the present application;
  • Fig. 4 is a schematic diagram of an elliptical area according to an embodiment of the present application;
  • Fig. 5 is a schematic diagram of the structure of mask information and video frame data according to an embodiment of the present application;
  • Fig. 6 is a schematic diagram of one implementation of the auxiliary identification bits according to an embodiment of the present application;
  • Fig. 7 is a schematic structural diagram of a device for fitting a target object according to an embodiment of the present application.
  • This application provides a method for fitting a target object in a video frame. The method can be applied to a device with image processing functions. Referring to Fig. 1, the method includes the following steps.
  • S1: Identify the area where the target object is located in the video frame.
  • In this embodiment, the video frame may be any video frame in the video data to be parsed. The video data to be parsed may be the video data of an on-demand video that has been uploaded to the device, or the video data of a live video stream received by the device, and the video data may include the data of each video frame. The device can read the video data to be parsed and process each video frame in it.
  • Specifically, the device may determine in advance the target object to be recognized in the video data; the target object may be, for example, a person appearing in the video picture. Of course, depending on the video content, the target object can be flexibly changed: in a live video showing the daily life of a cat, for instance, the target object may be the cat.
  • For any video frame in the video data, the area where the target object is located can be identified from the video frame. Identifying the target object from the video frame can be achieved in a variety of ways, for example through an instance segmentation algorithm or a semantic segmentation algorithm. In practical application scenarios, neural network systems such as Faster R-CNN and Mask R-CNN can be used to identify the target object. Specifically, the video frame can be input into the model of such a neural network system, and the result output by the model can mark the position information of the target object contained in the video frame. The position information may be represented by the coordinate values of the pixels that constitute the target object in the video frame; in this way, the set of coordinate values of these pixels can represent the area where the target object is located in the video frame.
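  • As an illustration of this step, the sketch below uses a pretrained Mask R-CNN from torchvision to recover the pixel coordinate set of a detected person; the specific model, the COCO person label, and the 0.5 thresholds are assumptions made for the example rather than choices fixed by this application.

```python
# A minimal sketch of the recognition step with an off-the-shelf
# instance segmentation network (assumed here: torchvision's Mask R-CNN).
import numpy as np
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def locate_person_pixels(frame_rgb: np.ndarray) -> np.ndarray:
    """Return (row, col) coordinates of the pixels constituting the most
    confident detected person (COCO label 1) in one video frame."""
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb)])[0]
    person = (pred["labels"] == 1) & (pred["scores"] > 0.5)
    if not person.any():
        return np.empty((0, 2), dtype=np.int64)
    best = person.nonzero()[0][0]         # detections are sorted by score
    mask = pred["masks"][best, 0] > 0.5   # soft mask -> binary region
    return np.argwhere(mask.numpy())      # coordinate set of the region
```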
  • S3: Select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located.
  • In this embodiment, after the area where the target object is located in the video frame has been determined, one or more geometric figures may be selected to jointly fit that area, the result of the fitting being that the combination of the one or more geometric figures just covers the area where the target object is located. For example, referring to Fig. 2, suppose the target object to be recognized in the current video frame is a human body. After the human body shown in Fig. 2 is identified from the current video frame, ellipses and rectangles can be used to fit the area the human body occupies: an ellipse can fit the head, and rectangles can fit the upper body and the lower body.
  • When determining the above one or more geometric figures, the area where the target object is located may be divided into one or more sub-areas according to the physical characteristics of the target object. The physical characteristics can be set flexibly according to the type of the target object; when the target object is a human body, for example, they may be the head, the torso, the limbs, and so on. Of course, depending on the required fitting precision, the number of sub-areas obtained by segmentation can also differ: when high precision is not required, the torso and limbs need not be segmented too finely and can simply be divided into an upper body and a lower body. In practical applications, the area where the target object is located can be divided into one or more sub-areas through a variety of pose algorithms, which may include, for example, the DensePose algorithm, the OpenPose algorithm, the Realtime Multi-Person Pose Estimation algorithm, the AlphaPose algorithm, the Human Body Pose Estimation algorithm, and the DeepPose algorithm.
  • After the sub-areas are obtained, a geometric figure suitable for each sub-area can be selected: a circle or an ellipse for the head of a human body, for example, and rectangles for the torso and limbs. In this way, the combination of the geometric figures corresponding to these sub-areas can cover the area where the target object is located.
  • S5: Generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  • After selecting several geometric figures that just cover the target object, the layout parameters of each geometric figure can then be determined, so that the geometric figure drawn according to its layout parameters covers the corresponding sub-area. In practical applications, the layout parameters to be determined differ with the geometric figure. For a rectangle, the layout parameters may be the coordinate values of two diagonal vertices of the rectangle in the video frame and the angle between a side of the rectangle and the horizontal: as shown in Fig. 3, the coordinate values of vertex a and vertex b and the angle between side ac and the horizontal (the dotted line in Fig. 3) can be determined, and from these layout parameters the rectangular area can be located in the video frame. For the ellipse in Fig. 4, the layout parameters may include the coordinates of the center point of the ellipse, its major axis, its minor axis, and the angle between the major axis and the horizontal (the dotted line in Fig. 4). For a circle, the layout parameters may include the center and the radius of the circle.
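  • The layout parameters above map naturally onto small record types. The sketch below, with hypothetical class and field names, shows one way such parameters could be held in memory; the application fixes only the information content, not any concrete data structure.

```python
from dataclasses import dataclass

# Hypothetical containers for the layout parameters named in the text;
# coordinates are pixel positions, angles are measured from the horizontal.

@dataclass
class RectLayout:
    vertex_a: tuple[int, int]  # one diagonal vertex (x, y), as in Fig. 3
    vertex_b: tuple[int, int]  # the opposite diagonal vertex
    angle: float               # angle between side ac and the horizontal

@dataclass
class EllipseLayout:
    center: tuple[int, int]    # center point of the ellipse, as in Fig. 4
    major_axis: int            # major axis length in pixels
    minor_axis: int            # minor axis length in pixels
    angle: float               # angle between major axis and the horizontal

@dataclass
class CircleLayout:
    center: tuple[int, int]    # pixel coordinate of the circle's center
    radius: int                # radius as a number of covered pixels
```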
  • In this embodiment, the fitting parameters of a geometric figure can be generated according to the type of the selected geometric figure and the layout parameters of the geometric figure. Specifically, the fitting parameters can be represented by encoded values. The type of the geometric figure can be represented by a preset figure identifier: for example, the preset figure identifier of a circle is 0, that of an ellipse is 1, that of a rectangle is 2, that of a triangle is 3, and so on. The layout parameters of a geometric figure can be expressed by the coordinates of pixels or by the number of pixels covered; for instance, the center of a circle can be represented by the coordinate value of the pixel at the center of the circle, and the radius by the number of pixels the radius covers.
  • The preset figure identifiers and layout parameters determined above may all be decimal, while in computer languages they are usually expressed in binary or hexadecimal. Therefore, after obtaining the preset figure identifier and the layout parameters corresponding to a geometric figure, the identifier and the layout parameters can be encoded separately, for example by binary coding. Suppose that, in decimal, the preset figure identifier of a circle is 0, the coordinates of the center in the layout parameters are (16, 32), and the radius is 8. After binary coding, the preset figure identifier can be 00, the center coordinates can be expressed as 010000 100000, and the radius as 001000; combined, this gives 00 010000 100000 001000. The encoded data can then be used as the fitting parameter of the geometric figure. For each geometric figure contained in the video frame, a fitting parameter can be generated in the above manner, and finally the combination of the fitting parameters of the geometric figures can be used as the fitting parameter of the video frame.
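  • The circle example above can be reproduced in a few lines. In the sketch below, the 2-bit figure identifier and the 6-bit fields for the center coordinates and the radius are taken from that example; they are assumed widths for illustration, not widths mandated by this application.

```python
# Encode one circle as <2-bit type id><6-bit x><6-bit y><6-bit radius>,
# reproducing the example: identifier 0, center (16, 32), radius 8.
def encode_circle(x: int, y: int, radius: int) -> str:
    type_id = format(0, "02b")                        # circle -> identifier 0
    fields = [format(v, "06b") for v in (x, y, radius)]
    return type_id + "".join(fields)

bits = encode_circle(16, 32, 8)
assert bits == "00" "010000" "100000" "001000"        # 20 bits in total
```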
  • In one embodiment, after the fitting parameters of the video frame have been generated, the mask information of the video frame may also be generated from the fitting parameters of these geometric figures. In addition to the encoded fitting parameters, the mask information may contain auxiliary identification bits added for the fitting parameters. The purpose of the auxiliary identification bits is to distinguish the mask information of a video frame from the real data of the video frame. Referring to Fig. 5, the processed video data can be divided by video frame, and for one and the same video frame, the mask information and the frame data are connected end to end. Without the auxiliary identification bits, downstream devices reading the video data could not tell which part is mask information and which part is the data of the video frame that needs to be rendered. In view of this, auxiliary identification bits can be added to the fitting parameter, and the combination of the auxiliary identification bits and the fitting parameter can be used as the mask information of the video frame.
  • In practical applications, the auxiliary identification bits can be implemented in various ways. For example, they may indicate the data size of the fitting parameter in binary, as a binary number with a specified number of digits placed before the fitting parameter. Suppose the auxiliary identification bits form a 6-bit binary number: for a fitting parameter such as 00 010000 100000 001000, whose data size is 20 bits, the auxiliary identification bits can be expressed as 010100, and the final mask information can be 010100 00 010000 100000 001000. After reading the 6 auxiliary bits, another device knows that the data size of the fitting parameter is 20 bits, reads the next 20 bits as the content of the fitting parameter, and treats the data after those 20 bits as the data of the video frame to be rendered.
  • In some other embodiments, the auxiliary identification bits may instead indicate the number of geometric figures contained in the fitting parameters: once another device has read from the video data the fitting parameters of as many geometric figures as the auxiliary bits indicate, the data read thereafter is the data of the video frame to be rendered. Furthermore, the auxiliary identification bits may mark the end position of the fitting parameter data. As shown in Fig. 6, they can be a string of preset fixed characters; when another device reads these fixed characters, it knows that the fitting parameters have been read completely and that what follows is the data of the video frame to be rendered.
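  • A sketch of the first variant, the 6-bit length prefix, is given below; treating the whole payload as a Python string of "0"/"1" characters is an assumption made only to keep the example self-contained.

```python
# Pack mask information as <6-bit length><fitting parameter bits>, and
# split a combined stream back into the fitting parameter and frame data.
def pack_mask_info(fitting_bits: str) -> str:
    return format(len(fitting_bits), "06b") + fitting_bits

def split_stream(stream: str) -> tuple[str, str]:
    size = int(stream[:6], 2)                  # e.g. "010100" -> 20 bits
    return stream[6:6 + size], stream[6 + size:]

mask_info = pack_mask_info("00010000100000001000")
assert mask_info.startswith("010100")          # 20 written as 6-bit binary
params, frame = split_stream(mask_info + "11110000")   # toy frame bits
assert params == "00010000100000001000" and frame == "11110000"
```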
  • In one embodiment, to fit the area where the target object is located more conveniently, a binary mask map of the video frame may also be generated after that area has been identified. The pixels of the binary mask map take only two different values: the pixels constituting the area where the target object is located have a first pixel value, and the other pixels have a second pixel value. In practical applications, to match the original video frame, the generated binary mask map can be consistent with the size of the video frame, where the same size means the same picture width and height and the same resolution, so that the original video frame and the generated binary mask map contain the same number of pixels. Of course, to reduce the data volume of the binary mask image, the generated image may contain only the region corresponding to the target object rather than the full area of the original video frame; in that case its size matches a sub-region cropped from the original video frame instead of the whole frame. After the binary mask map is generated, the region formed by the pixels with the first pixel value can be fitted directly in the binary mask map with the several geometric figures in the manner described above, so as to obtain the fitting parameters of each geometric figure.
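  • One possible realization of this step is sketched below: the binary mask map is built from the pixel coordinates identified earlier, and an ellipse is then fitted to the pixels with the first pixel value using OpenCV; cv2.fitEllipse is an assumed stand-in, since this application does not name a particular fitting routine.

```python
import cv2
import numpy as np

def mask_from_pixels(coords: np.ndarray, height: int, width: int) -> np.ndarray:
    """Binary mask map: first pixel value 255 for the target object,
    second pixel value 0 everywhere else."""
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[coords[:, 0], coords[:, 1]] = 255
    return mask

def fit_ellipse(mask: np.ndarray):
    """Fit one ellipse to the foreground region (needs >= 5 points);
    returns center, axis lengths and rotation angle, i.e. exactly the
    ellipse layout parameters described above."""
    ys, xs = np.nonzero(mask)
    points = np.stack([xs, ys], axis=1).astype(np.int32).reshape(-1, 1, 2)
    (cx, cy), axes, angle = cv2.fitEllipse(points)
    return (cx, cy), axes, angle
```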
  • In one embodiment, the fitting parameters of the video frame can also be determined by means of machine learning, where different target objects can train the recognition model through different training sample sets. First, a training sample set of the target object may be obtained; the training sample set may include several image samples, all of which contain the target object. Each image sample can be labeled manually with the geometric figures required to cover the target object in that sample. These marked geometric figures can be represented by their fitting parameters, which may include the types of the geometric figures and their layout parameters. In other words, when labeling the training samples, the fitting parameters corresponding to each image sample can be generated and used as the annotation label of that sample.
  • A preset recognition model can then be trained with the manually labeled image samples. The recognition model may include a deep neural network whose neurons carry initial weight values. After the deep neural network carrying the initial weights processes an input image sample, a prediction result corresponding to that sample is obtained, indicating the fitting parameters of the geometric figures required to cover the target object in the input sample. Since the weights carried by the recognition model in the initial stage are not accurate enough, there is a certain gap between the fitting parameters represented by the prediction result and the manually labeled fitting parameters. After the prediction result is obtained, the difference between the two can therefore be calculated and provided to the recognition model as feedback data so as to adjust the weight values of its neurons. By repeatedly correcting the weights in this way, the prediction output by the trained recognition model for any input image sample eventually agrees with the fitting parameters represented by that sample's annotation label, at which point the training process is complete.
  • Subsequently, when the fitting parameters of a video frame are needed, the video frame can be input into the trained recognition model, and the prediction result output by the trained model can be used as the fitting parameter of the video frame.
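  • A compact sketch of this training scheme follows; the convolutional backbone, the fixed number of predicted figures, and the mean-squared-error feedback are illustrative assumptions, since this application specifies only labeled fitting parameters and weight correction driven by the gap between prediction and label.

```python
import torch
import torch.nn as nn

class FittingParamNet(nn.Module):
    """Toy recognition model: a CNN that regresses the fitting parameters
    of a fixed number of geometric figures from one input frame."""
    def __init__(self, n_figures: int = 3, params_per_figure: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, n_figures * params_per_figure)

    def forward(self, x):
        return self.head(self.features(x))

def train_step(model, optimizer, frames, labeled_params):
    """One round of weight correction from the prediction/label gap."""
    optimizer.zero_grad()
    predicted = model(frames)
    loss = nn.functional.mse_loss(predicted, labeled_params)  # difference value
    loss.backward()                                           # feedback data
    optimizer.step()                                          # adjust weights
    return loss.item()
```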
  • This application also provides a system for fitting a target object in a video frame, the system including:
  • an area identification unit, configured to identify the area where the target object is located in the video frame;
  • a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers that area;
  • a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  • In one embodiment, the geometric figure selection unit includes:
  • a sub-area segmentation module, configured to divide the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object;
  • a layout parameter determination module, configured to select, for any of the sub-areas, a geometric figure suitable for that sub-area and to determine the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.
  • In one embodiment, the fitting parameter generation unit includes:
  • an encoding module, configured to identify the preset figure identifier corresponding to the type of the geometric figure, encode the preset figure identifier and the layout parameters of the geometric figure separately, and use the encoded data as the fitting parameter of the geometric figure.
  • In one embodiment, the fitting parameter generation unit includes:
  • a training sample set obtaining module, configured to obtain in advance a training sample set of the target object, the training sample set including several image samples that all contain the target object, each image sample being provided with an annotation label that represents the fitting parameters of the geometric figures required to cover the target object in the image sample;
  • a training module, configured to train a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained model is consistent with the fitting parameters represented by the annotation label of the input sample;
  • a result prediction module, configured to input the video frame into the trained recognition model and use the prediction result output by the trained model as the fitting parameter of the video frame.
  • Referring to Fig. 7, this application also provides a device for fitting a target object in a video frame. The device includes a memory and a processor; the memory is used to store a computer program, and when the computer program is executed by the processor, the method for generating mask information described above can be realized. Specifically, at the hardware level, the device may include a processor, an internal bus, and a memory, where the memory may include internal memory as well as non-volatile memory; the processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it. Those of ordinary skill in the art will understand that the structure shown in Fig. 7 is only illustrative and does not limit the structure of the device: the device may include more or fewer components than shown in Fig. 7, for example additional processing hardware such as a GPU (Graphics Processing Unit), or a configuration different from that shown in Fig. 7.
  • Of course, in addition to software implementations, this application does not exclude other implementations, such as logic devices or a combination of software and hardware. In this embodiment, the processor may include a central processing unit (CPU) or a graphics processing unit (GPU), and may of course also include other single-chip microcomputers, logic gate circuits, integrated circuits, and the like with logic processing capabilities, or a suitable combination thereof. The memory described in this embodiment may be a memory device for storing information: in a digital system, a device that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but without physical form can also be a memory, such as a RAM or a FIFO; and in a system, a storage device in physical form can also be called a memory. When implemented, the memory can also take the form of cloud storage; the specific implementation is not limited in this specification.
  • As can be seen from the above, the technical solution provided by the present application can identify, for a target object in a video frame, the area where the target object is located. Then, a combination of one or more geometric figures can be used to cover the target object in the video frame by way of geometric figure fitting. After the several geometric figures covering the target object are determined, the fitting parameters of these geometric figures can be generated; the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of geometric figures are not image data, they usually occupy few bytes, which reduces the amount of data after fitting and thereby improves subsequent processing efficiency.
  • From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, or the part of it that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to make a computer device (which may be a personal computer, a server, a network device, or the like) execute the methods described in each embodiment or in certain parts of an embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

Some embodiments of the present application provide a method, system and device for fitting a target object in a video frame, wherein the method comprises: identifying the area where the target object is located in the video frame; selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and generating the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and using the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame. With the embodiments of the present application, the amount of data after fitting can be reduced, thereby improving subsequent processing efficiency.

Description

Method, System and Device for Fitting a Target Object in a Video Frame

Technical Field

This application relates to the field of Internet technology, and in particular to a method, system and device for fitting a target object in a video frame.

Background

With the continuous development of video playback technology, the demand for image processing of video pictures keeps growing. At present, many application scenarios require the main target object to be fitted out of the video picture first, with subsequent processing then carried out based on the fitted object. For example, some self-media creators need to produce illustrated plot outlines based on the content of a video; in this case, the main characters must be fitted out of the video pictures, and the plot outline of the video is then produced from the fitted main characters together with text added afterwards. As another example, when bullet-screen comments are displayed over a playing video, it is sometimes necessary, in order to prevent the comments from occluding the main object in the picture, to first fit the main object out of the video picture and then use bullet-screen processing techniques to avoid occluding the fitted object.

The inventors found that the prior art has at least the following problems: at present, the target object in a video frame is usually fitted by means of a binary mask map. Specifically, a binary mask map consistent with the video frame can be generated, in which the area occupied by the target object and the other areas can have different pixel values, so that subsequent processing can be performed on the binary mask map. However, since the data volume of a binary mask map is usually rather large, fitting the target object according to the binary mask map increases the amount of data to be processed subsequently, resulting in lower processing efficiency.
Summary

The purpose of some embodiments of this application is to provide a method, system and device for fitting a target object in a video frame, which can reduce the amount of data after fitting and thereby improve subsequent processing efficiency.

An embodiment of this application provides a method for fitting a target object in a video frame, the method comprising: identifying the area where the target object is located in the video frame; selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and generating the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and using the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.

An embodiment of this application also provides a system for fitting a target object in a video frame, the system comprising: an area identification unit, configured to identify the area where the target object is located in the video frame; a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.

An embodiment of this application also provides a device for fitting a target object in a video frame, the device comprising a processor and a memory, the memory being used to store a computer program which, when executed by the processor, implements the above fitting method.

Compared with the prior art, the embodiments of this application can identify, for a target object in a video frame, the area where the target object is located. Then a combination of one or more geometric figures can be used to cover the target object in the video frame by way of geometric figure fitting. After the several geometric figures covering the target object are determined, the fitting parameters of these geometric figures can be generated; the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of geometric figures are not image data, they usually occupy few bytes, which reduces the amount of data after fitting and thereby improves subsequent processing efficiency.
Brief Description of the Drawings

One or more embodiments are exemplarily illustrated by the figures in the corresponding drawings; these exemplary illustrations do not constitute a limitation on the embodiments.

Fig. 1 is a schematic diagram of a method for fitting a target object according to an embodiment of this application;

Fig. 2 is a schematic diagram of fitting a target object with geometric figures according to an embodiment of this application;

Fig. 3 is a schematic diagram of a rectangular area according to an embodiment of this application;

Fig. 4 is a schematic diagram of an elliptical area according to an embodiment of this application;

Fig. 5 is a schematic diagram of the structure of mask information and video frame data according to an embodiment of this application;

Fig. 6 is a schematic diagram of one implementation of the auxiliary identification bits according to an embodiment of this application;

Fig. 7 is a schematic structural diagram of a device for fitting a target object according to an embodiment of this application.

Detailed Description

To make the purpose, technical solution and advantages of this application clearer, some embodiments of this application are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides a method for fitting a target object in a video frame; the method can be applied to a device with image processing functions. Referring to Fig. 1, the method includes the following steps.

S1: Identify the area where the target object is located in the video frame.

In this embodiment, the video frame may be any video frame in the video data to be parsed. The video data to be parsed may be the video data of an on-demand video that has been uploaded to the device, or the video data of a live video stream received by the device, and the video data may include the data of each video frame. The device can read the video data to be parsed and process each video frame in it. Specifically, the device may determine in advance the target object to be recognized in the video data; the target object may be, for example, a person appearing in the video picture. Of course, depending on the video content, the target object can be flexibly changed: in a live video showing the daily life of a cat, for instance, the target object may be the cat.

In this embodiment, for any video frame in the video data, the area where the target object is located can be identified from the video frame. Specifically, identifying the target object from the video frame can be achieved in a variety of ways, for example through an instance segmentation algorithm or a semantic segmentation algorithm. In practical application scenarios, neural network systems such as Faster R-CNN and Mask R-CNN can be used to identify the target object. Specifically, the video frame can be input into the model of such a neural network system, and the result output by the model can mark the position information of the target object contained in the video frame. The position information can be represented by the coordinate values of the pixels that constitute the target object in the video frame; in this way, the set of coordinate values of the pixels constituting the target object can represent the area where the target object is located in the video frame.
S3: Select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located.

In this embodiment, after the area where the target object is located in the video frame has been determined, one or more geometric figures can be selected to jointly fit that area, the result of the fitting being that the combination of the one or more geometric figures just covers the area where the target object is located. For example, referring to Fig. 2, the target object to be recognized in the current video frame is a human body; after the human body shown in Fig. 2 is identified from the current video frame, ellipses and rectangles can be used to fit the area the human body occupies in the video frame. For example, an ellipse can fit the head of the human body, and rectangles can fit the upper body and the lower body.

In this embodiment, when determining the above one or more geometric figures, the area where the target object is located can be divided into one or more sub-areas according to the physical characteristics of the target object. Specifically, the physical characteristics can be set flexibly according to the type of the target object; for example, when the target object is a human body, the physical characteristics may be the head, the torso, the limbs, and so on. Of course, depending on the fitting precision, the number of sub-areas obtained by segmentation can also differ: when the required fitting precision is not high, the torso and limbs need not be segmented too finely and can simply be divided into an upper body and a lower body. In practical applications, the area where the target object is located can be divided into one or more sub-areas through a variety of pose algorithms, such as the DensePose algorithm, the OpenPose algorithm, the Realtime Multi-Person Pose Estimation algorithm, the AlphaPose algorithm, the Human Body Pose Estimation algorithm, and the DeepPose algorithm.

In this embodiment, after the sub-areas are obtained, a geometric figure suitable for each sub-area can be selected: for the head of a human body, for example, a circle or an ellipse can be selected, while for the torso and limbs, rectangles can be selected. In this way, the combination of the geometric figures corresponding to these sub-areas can cover the area where the target object is located.
S5: Generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.

In this embodiment, after selecting several geometric figures that can just cover the target object, the layout parameters of the geometric figures can be determined, so that the geometric figures drawn according to the layout parameters can cover the corresponding sub-areas. In practical applications, the layout parameters to be determined differ with the geometric figure. For example, for a rectangle, the layout parameters may be the coordinate values of two diagonal vertices of the rectangle in the video frame and the angle between a side of the rectangle and the horizontal. As shown in Fig. 3, to determine the layout parameters of the rectangle in the video frame, the coordinate values of vertex a and vertex b and the angle between side ac and the horizontal (the dotted line in Fig. 3) can be determined; from these layout parameters, the rectangular area can be determined in the video frame. As another example, to determine the area covered by the ellipse in Fig. 4, the determined layout parameters can include the coordinates of the center point of the ellipse, the major axis and minor axis of the ellipse, and the angle between the major axis and the horizontal (the dotted line in Fig. 4). As yet another example, to determine the area covered by a circle, the determined layout parameters can include the center and radius of the circle.

In this embodiment, the fitting parameters of a geometric figure can be generated according to the type of the selected geometric figure and the layout parameters of the geometric figure. Specifically, the fitting parameters can be represented by encoded values. The type of the geometric figure can be represented by a preset figure identifier: for example, the preset figure identifier of a circle is 0, that of an ellipse is 1, that of a rectangle is 2, that of a triangle is 3, and so on. The layout parameters of a geometric figure can be expressed by the coordinates of pixels or by the number of pixels covered: for example, the center of a circle can be represented by the coordinate value of the pixel at the center of the circle, and the radius can be represented by the number of pixels the radius covers. The preset figure identifiers and layout parameters determined above can all be decimal, while in computer languages they can usually be expressed in binary or hexadecimal. Therefore, after obtaining the preset figure identifier and layout parameters corresponding to a geometric figure, the preset figure identifier and the layout parameters can be encoded separately, for example by binary coding. Suppose that in decimal notation the preset figure identifier of a circle is 0, the coordinates of the center in the layout parameters are (16, 32), and the radius is 8; after binary coding, the preset figure identifier can be 00, the center coordinates can be expressed as 010000 100000, and the radius as 001000, which combined gives 00 010000 100000 001000. The encoded data can then finally be used as the fitting parameter of the geometric figure. For each geometric figure contained in the video frame, its own fitting parameter can be generated in the above manner, and finally the combination of the fitting parameters of the geometric figures can be used as the fitting parameter of the video frame.
In one embodiment, after the fitting parameters of the video frame have been generated, the mask information of the video frame can also be generated from the fitting parameters of these geometric figures. Specifically, in addition to the encoded fitting parameters, the mask information can also contain auxiliary identification bits added for the fitting parameters. The purpose of adding the auxiliary identification bits is to be able to distinguish the mask information of a video frame from the real data of the video frame. Referring to Fig. 5, the processed video data can be divided by video frame, and for one and the same video frame, the mask information of the frame and the data of the frame are connected end to end. If no auxiliary identification bits were added, downstream devices reading the video data could not tell which part is mask information and which part is the data of the video frame to be rendered. In view of this, auxiliary identification bits can be added for the fitting parameter, and the combination of the auxiliary identification bits and the fitting parameter can be used as the mask information of the video frame. In this way, when other devices read the video data, they can determine which fields are mask information by recognizing the auxiliary identification bits. In practical applications, the auxiliary identification bits can be implemented in various ways. For example, the auxiliary identification bits can indicate the data size of the fitting parameter in binary and can be a binary number with a specified number of digits placed before the fitting parameter. For instance, the auxiliary identification bits can be a 6-bit binary number: for a fitting parameter such as 00 010000 100000 001000, whose data size is 20 bits, the auxiliary identification bits can be expressed as 010100, and the resulting mask information can be 010100 00 010000 100000 001000. After reading the 6 auxiliary identification bits, another device knows that the data size of the fitting parameter is 20 bits, can then read the next 20 bits of data content and take them as the content of the fitting parameter, and the data after these 20 bits can be taken as the data of the video frame to be rendered.

In addition, in some other embodiments, the auxiliary identification bits can also indicate the number of geometric figures contained in the fitting parameters; once another device has read from the video data the fitting parameters of as many geometric figures as the auxiliary identification bits indicate, the data it continues to read is the data of the video frame to be rendered. Furthermore, the auxiliary identification bits can also indicate the end position of the fitting parameter data. As shown in Fig. 6, the auxiliary identification bits can be a string of preset fixed characters; when another device reads these fixed characters, it knows that the fitting parameters have been read completely and that what follows the fixed characters is the data of the video frame to be rendered.

In one embodiment, to fit the area where the target object is located in the video frame more conveniently, a binary mask map of the video frame can also be generated after that area is identified. The pixels in the binary mask map can take only two different values: the pixels constituting the area where the target object is located can have a first pixel value, and the other pixels can have a second pixel value. In practical applications, to match the original video frame, the generated binary mask map can be consistent with the size of the video frame; consistent size can be understood as the same picture length and width and the same resolution, so that the original video frame and the generated binary mask map contain the same number of pixels. Of course, to reduce the data volume of the binary mask image, the generated binary mask image can contain only the region corresponding to the target object without showing the full area of the original video frame; in that case, the size of the generated binary mask image can be consistent with the size of a sub-region cropped out of the original video frame rather than with the size of the original video frame. In this embodiment, after the binary mask map is generated, the region formed by the pixels with the first pixel value can be fitted directly in the binary mask map with the several geometric figures in the manner described above, so as to obtain the fitting parameters of each geometric figure.
In one embodiment, the fitting parameters of the video frame can also be determined by means of machine learning. Specifically, different target objects can train the recognition model through different training sample sets. First, a training sample set of the target object can be obtained; the training sample set can include several image samples, all of which contain the target object. For the training samples, each image sample can be manually annotated with the geometric figures required to cover the target object in that sample. These annotated geometric figures can be represented by the fitting parameters of the geometric figures, which can include the types of the geometric figures and the layout parameters of the geometric figures. That is to say, when annotating the training samples, the fitting parameters corresponding to each image sample can be generated and used as the annotation label of that image sample.

Then, a preset recognition model can be trained with the manually annotated image samples. The recognition model can include a deep neural network, and the neurons in the deep neural network can carry initial weight values. After the deep neural network carrying the initial weight values processes an input image sample, the prediction result corresponding to the input image sample can be obtained; the prediction result can indicate the fitting parameters of the geometric figures required to cover the target object in the input image sample. Since the weight values carried by the recognition model in the initial stage are not accurate enough, there will be a certain gap between the fitting parameters represented by the prediction result and the manually annotated fitting parameters. After the prediction result is obtained, the difference between the fitting parameters represented by the prediction result and the manually annotated fitting parameters can therefore be calculated and provided to the recognition model as feedback data, so as to change the weight values of the neurons in the recognition model. In this way, by repeatedly correcting the weight values, it can finally be achieved that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters represented by the annotation label of the input image sample, and the training process can thus be completed.

Subsequently, when the fitting parameters of a video frame need to be determined, the video frame can be input into the trained recognition model, and the prediction result output by the trained recognition model can be used as the fitting parameter of the video frame.
This application also provides a system for fitting a target object in a video frame, the system including:

an area identification unit, configured to identify the area where the target object is located in the video frame;

a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located;

a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.

In one embodiment, the geometric figure selection unit includes:

a sub-area segmentation module, configured to divide the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object;

a layout parameter determination module, configured to select, for any of the sub-areas, a geometric figure suitable for that sub-area and to determine the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.

In one embodiment, the fitting parameter generation unit includes:

an encoding module, configured to identify the preset figure identifier corresponding to the type of the geometric figure, encode the preset figure identifier and the layout parameters of the geometric figure separately, and use the encoded data as the fitting parameter of the geometric figure.

In one embodiment, the fitting parameter generation unit includes:

a training sample set obtaining module, configured to obtain in advance a training sample set of the target object, the training sample set including several image samples that all contain the target object, each image sample being provided with an annotation label that represents the fitting parameters of the geometric figures required to cover the target object in the image sample;

a training module, configured to train a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters represented by the annotation label of the input image sample;

a result prediction module, configured to input the video frame into the trained recognition model and use the prediction result output by the trained recognition model as the fitting parameter of the video frame.
Referring to Fig. 7, this application also provides a device for fitting a target object in a video frame. The device includes a memory and a processor; the memory is used to store a computer program, and when the computer program is executed by the processor, the method for generating mask information described above can be realized. Specifically, as shown in Fig. 7, at the hardware level the device can include a processor, an internal bus, and a memory; the memory can include internal memory and non-volatile memory, and the processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it. Those of ordinary skill in the art can understand that the structure shown in Fig. 7 is only illustrative and does not limit the structure of the above device. For example, the device can also include more or fewer components than shown in Fig. 7, for example other processing hardware such as a GPU (Graphics Processing Unit), or a configuration different from that shown in Fig. 7. Of course, in addition to software implementations, this application does not exclude other implementations, such as logic devices or a combination of software and hardware, and so on.

In this embodiment, the processor can include a central processing unit (CPU) or a graphics processing unit (GPU), and of course can also include other single-chip microcomputers, logic gate circuits, integrated circuits, and the like with logic processing capabilities, or a suitable combination thereof. The memory described in this embodiment can be a memory device for storing information. In a digital system, a device that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but without physical form can also be a memory, such as a RAM or a FIFO; in a system, a storage device in physical form can also be called a memory, and so on. When implemented, the memory can also take the form of cloud storage; the specific implementation is not limited by this specification.

It should be noted that for the specific implementation of the system and the device in this specification, reference can be made to the description of the method embodiments, which will not be repeated here one by one.

As can be seen from the above, the technical solution provided by this application can identify, for a target object in a video frame, the area where the target object is located. Then a combination of one or more geometric figures can be used to cover the target object in the video frame by way of geometric figure fitting. After the several geometric figures covering the target object are determined, the fitting parameters of these geometric figures can be generated; the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of geometric figures are not image data, they usually occupy few bytes, which can reduce the amount of data after fitting and thereby improve subsequent processing efficiency.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to make a computer device (which may be a personal computer, a server, a network device, or the like) execute the methods described in each embodiment or in certain parts of an embodiment.

The above are only preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included within its scope of protection.

Claims (13)

  1. A method for fitting a target object in a video frame, wherein the method comprises:
    identifying the area where the target object is located in the video frame;
    selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located;
    generating the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and using the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  2. The method according to claim 1, wherein after identifying the area where the target object is located in the video frame, the method further comprises:
    generating a binary mask map of the video frame, in which the pixels constituting the area where the target object is located have a first pixel value and the other pixels have a second pixel value, the first pixel value being different from the second pixel value.
  3. The method according to claim 2, wherein selecting several geometric figures to fit the area where the target object is located comprises:
    fitting, in the binary mask map, the region formed by the pixels having the first pixel value with the several geometric figures.
  4. The method according to claim 1 or 2, wherein selecting several geometric figures to fit the area where the target object is located comprises:
    dividing the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object;
    for any of the sub-areas, selecting a geometric figure suitable for the sub-area and determining the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.
  5. The method according to claim 1, wherein the layout parameters of the geometric figure in the video frame are represented by coordinate values of pixels and/or a number of pixels.
  6. The method according to claim 1, wherein generating the fitting parameters of each geometric figure comprises:
    identifying the preset figure identifier corresponding to the type of the geometric figure, encoding the preset figure identifier and the layout parameters of the geometric figure separately, and using the encoded data as the fitting parameter of the geometric figure.
  7. The method according to claim 1, wherein generating the fitting parameters of each geometric figure comprises:
    obtaining in advance a training sample set of the target object, the training sample set including several image samples that all contain the target object, each image sample being provided with an annotation label that represents the fitting parameters of the geometric figures required to cover the target object in the image sample;
    training a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters represented by the annotation label of the input image sample;
    inputting the video frame into the trained recognition model, and using the prediction result output by the trained recognition model as the fitting parameter of the video frame.
  8. The method according to claim 1, wherein the method further comprises:
    adding auxiliary identification bits for the fitting parameter of the video frame, and generating the mask information of the video frame based on the combination of the auxiliary identification bits and the fitting parameter of the video frame; wherein the auxiliary identification bits serve at least one of the following functions:
    indicating the data size of the fitting parameter of the video frame;
    indicating the number of geometric figures contained in the fitting parameter of the video frame; or
    indicating the end position of the data of the fitting parameter of the video frame.
  9. A system for fitting a target object in a video frame, wherein the system comprises:
    an area identification unit, configured to identify the area where the target object is located in the video frame;
    a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located;
    a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameter of the video frame.
  10. The system according to claim 9, wherein the geometric figure selection unit comprises:
    a sub-area segmentation module, configured to divide the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object;
    a layout parameter determination module, configured to select, for any of the sub-areas, a geometric figure suitable for the sub-area and to determine the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.
  11. The system according to claim 9, wherein the fitting parameter generation unit comprises:
    an encoding module, configured to identify the preset figure identifier corresponding to the type of the geometric figure, encode the preset figure identifier and the layout parameters of the geometric figure separately, and use the encoded data as the fitting parameter of the geometric figure.
  12. The system according to claim 9, wherein the fitting parameter generation unit comprises:
    a training sample set obtaining module, configured to obtain in advance a training sample set of the target object, the training sample set including several image samples that all contain the target object, each image sample being provided with an annotation label that represents the fitting parameters of the geometric figures required to cover the target object in the image sample;
    a training module, configured to train a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters represented by the annotation label of the input image sample;
    a result prediction module, configured to input the video frame into the trained recognition model and use the prediction result output by the trained recognition model as the fitting parameter of the video frame.
  13. A device for fitting a target object in a video frame, wherein the device comprises a processor and a memory, the memory being used to store a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 8.
PCT/CN2019/077236 2019-02-01 2019-03-06 Method, system and device for fitting a target object in a video frame WO2020155299A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19727579.5A EP3709666A1 (en) 2019-02-01 2019-03-06 Method for fitting target object in video frame, system, and device
US16/442,081 US10699751B1 (en) 2019-03-06 2019-06-14 Method, system and device for fitting target object in video frame

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910105682.5A 2019-02-01 2019-02-01 Method, system and device for fitting a target object in a video frame
CN201910105682.5 2019-02-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/442,081 Continuation US10699751B1 (en) 2019-03-06 2019-06-14 Method, system and device for fitting target object in video frame

Publications (1)

Publication Number Publication Date
WO2020155299A1 true WO2020155299A1 (zh) 2020-08-06

Family

ID=67437365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077236 2019-02-01 2019-03-06 Method, system and device for fitting a target object in a video frame

Country Status (3)

Country Link
EP (1) EP3709666A1 (zh)
CN (1) CN111526422B (zh)
WO (1) WO2020155299A1 (zh)


Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402680B (zh) * 2010-09-13 2014-07-30 株式会社理光 Method for locating hands and pointing points and determining gestures in a human-computer interaction system
EP2812894A4 (en) * 2012-02-06 2016-04-06 Legend3D Inc MANAGEMENT SYSTEM FOR CINEMATOGRAPHIC PROJECTS
CN103700112A (zh) * 2012-09-27 2014-04-02 Occluded target tracking method based on a hybrid prediction strategy
CN102970529B (zh) * 2012-10-22 2016-02-17 北京航空航天大学 Object-based fractal coding compression and decompression method for multi-view video
CN103236074B (zh) * 2013-03-25 2015-12-23 2D/3D image processing method and device
WO2015198323A2 (en) * 2014-06-24 2015-12-30 Pic2Go Ltd Photo tagging system and method
CN104299186A (zh) * 2014-09-30 2015-01-21 Method and device for applying mosaic processing to pictures
US9864901B2 (en) * 2015-09-15 2018-01-09 Google Llc Feature detection and masking in images based on color distributions
CN106022236A (zh) * 2016-05-13 2016-10-12 Action recognition method based on human body contours
CN107133604A (zh) * 2017-05-25 2017-09-05 Method for detecting abnormal pig gait based on ellipse fitting and a predictive neural network
CN108665490B (zh) * 2018-04-02 2022-03-22 浙江大学 Graph matching method based on multi-attribute coding and dynamic weights

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150117792A1 (en) * 2013-10-30 2015-04-30 Ricoh Imaging Company, Ltd. Image-processing system, imaging apparatus and image-processing method
EP2905738A1 (en) * 2014-02-05 2015-08-12 Panasonic Intellectual Property Management Co., Ltd. Monitoring apparatus, monitoring system, and monitoring method
CN106951820A (zh) * 2016-08-31 2017-07-14 Passenger flow counting method based on annular template and ellipse fitting
CN109173263A (zh) * 2018-08-31 2019-01-11 Image data processing method and apparatus
CN109242868A (zh) * 2018-09-17 2019-01-18 Image processing method and apparatus, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3709666A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347955A (zh) * 2020-11-12 2021-02-09 Method, system and medium for fast object recognition in video based on frame prediction
WO2022116977A1 (zh) * 2020-12-04 2022-06-09 Action driving method and apparatus for a target object, device, storage medium and computer program product

Also Published As

Publication number Publication date
EP3709666A4 (en) 2020-09-16
CN111526422A (zh) 2020-08-11
CN111526422B (zh) 2021-08-27
EP3709666A1 (en) 2020-09-16

Similar Documents

Publication Publication Date Title
CN108304835B (zh) Text detection method and apparatus
CN110176027B (zh) Video target tracking method, apparatus, device and storage medium
US11734851B2 Face key point detection method and apparatus, storage medium, and electronic device
US10699751B1 Method, system and device for fitting target object in video frame
US10614574B2 Generating image segmentation data using a multi-branch neural network
CN108122234B (zh) Convolutional neural network training and video processing method, apparatus and electronic device
WO2020155297A1 (zh) Generation of video mask information, bullet-screen anti-occlusion method, server and client
WO2022156640A1 (zh) Gaze correction method and apparatus for an image, electronic device, computer-readable storage medium and computer program product
CN111291629A (zh) Method and apparatus for recognizing text in an image, computer device and computer storage medium
JP4738469B2 (ja) Image processing apparatus, image processing program and image processing method
CN111292334B (zh) Panoramic image segmentation method, apparatus and electronic device
WO2020155299A1 (zh) Method, system and device for fitting a target object in a video frame
CN112801236A (zh) Image recognition model migration method, apparatus, device and storage medium
CN114549557A (zh) Portrait segmentation network training method, apparatus, device and medium
CN114511041A (zh) Model training method, image processing method, apparatus, device and storage medium
CN111274863A (zh) Text prediction method based on text peak probability density
WO2022127865A1 (zh) Video processing method and apparatus, electronic device and storage medium
CN114612976A (zh) Key point detection method and apparatus, computer-readable medium and electronic device
WO2023272495A1 (zh) Logo labeling method and apparatus, logo detection model updating method and system, and storage medium
CN114494302A (zh) Image processing method, apparatus, device and storage medium
CN111159976A (zh) Text position labeling method and apparatus
CN115050086B (zh) Sample image generation method, model training method, image processing method and apparatus
CN117649358B (zh) Image processing method, apparatus, device and storage medium
CN117037276A (zh) Posture information determination method, apparatus, electronic device and computer-readable medium
CN117011848A (zh) Prompt-point-based semantic segmentation auxiliary labeling method, system and storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019727579

Country of ref document: EP

Effective date: 20190610

NENP Non-entry into the national phase

Ref country code: DE