WO2020155299A1 - Method, system and device for fitting a target object in a video frame - Google Patents
Method, system and device for fitting a target object in a video frame
- Publication number
- WO2020155299A1 (PCT/CN2019/077236)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- geometric
- target object
- video frame
- fitting
- area
- Prior art date
Classifications
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- G06F18/214—Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- H04N21/4312—Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4318—Generation of visual interfaces for content selection or interaction by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/4545—Input to filtering algorithms, e.g. filtering a region of the image
Definitions
- This application relates to the field of Internet technology, and in particular to a method, system and device for fitting a target object in a video frame.
- In the prior art, the target object in a video frame is usually fitted by means of a binary mask map.
- Specifically, a binary mask image consistent with the video frame can be generated, in which the area occupied by the target object and the other areas have different pixel values.
- Subsequent processing can then be performed on the binary mask image.
- However, the data volume of a binary mask map is usually relatively large. When the target object is fitted according to the binary mask map, the amount of data that needs to be processed subsequently increases, resulting in lower processing efficiency.
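To make the data-volume gap concrete, here is a back-of-the-envelope comparison in Python; the full-HD frame size, 1-bit mask depth, and five-figure count are illustrative assumptions, not figures from this application:

```python
# Back-of-the-envelope comparison of per-frame data volume.
width, height = 1920, 1080            # assumed full-HD frame
mask_bytes = width * height / 8       # binary mask at 1 bit per pixel
params_bytes = 5 * 20 / 8             # five figures at 20 bits each, as in the encoding example later in the text
print(f"mask: {mask_bytes / 1024:.0f} KiB, parameters: {params_bytes:.1f} bytes")
# -> mask: 253 KiB, parameters: 12.5 bytes
```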
- the purpose of some embodiments of the present application is to provide a method, system, and device for fitting a target object in a video frame, which can reduce the amount of data after fitting, thereby improving subsequent processing efficiency.
- The embodiment of the present application provides a method for fitting a target object in a video frame, the method comprising: identifying the area where the target object is located in the video frame; selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and generating fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and using the combination of the fitting parameters of the geometric figures as the fitting parameters of the video frame.
- An embodiment of the present application also provides a fitting system for a target object in a video frame, the system including: an area recognition unit for recognizing the area where the target object is located in the video frame; a geometric figure selection unit for selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and a fitting parameter generation unit for generating the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and for using the combination of the fitting parameters of the geometric figures as the fitting parameters of the video frame.
- An embodiment of the present application also provides a fitting device for a target object in a video frame.
- the device includes a processor and a memory.
- the memory is used to store a computer program.
- When the computer program is executed by the processor, the above-mentioned fitting method is implemented.
- the embodiments of the present application can identify the area where the target object is located for the target object in the video frame. Then, a combination of one or more geometric figures can be used to cover the target object in the video frame by means of geometric figure fitting. After several geometric figures covering the target object are determined, fitting parameters of these geometric figures can be generated, and the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of the geometric figures are not image data, the bytes occupied are usually small, which can reduce the amount of data after fitting, thereby improving subsequent processing efficiency.
- Fig. 1 is a schematic diagram of a fitting method for a target object in an embodiment of the present application
- Fig. 2 is a schematic diagram of fitting a target object with a geometric figure in an embodiment of the present application
- Fig. 3 is a schematic diagram of a rectangular area according to an embodiment of the present application.
- Fig. 4 is a schematic diagram of an elliptical area according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of the structure of mask information and video frame data according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of an implementation manner of an auxiliary identification bit in an embodiment of the present application.
- Fig. 7 is a schematic structural diagram of a fitting device for a target object in an embodiment of the present application.
- This application provides a method for fitting a target object in a video frame.
- the method can be applied to a device with image processing functions. Please refer to Figure 1.
- the method includes the following steps.
- the video frame may be any video frame in the video data to be analyzed.
- The video data to be parsed may be video data of an on-demand video uploaded to the device, or video data of a live video stream received by the device; in either case, the video data may include the data of each video frame.
- the device can read the video data to be parsed, and can process each video frame in the video data.
- the device may predetermine the target object to be recognized in the video data, and the target object may be, for example, a person appearing in the video screen.
- the target object can also be flexibly changed. For example, in a live video showing the daily life of a cat, the target object may be a cat.
- the area where the target object is located can be identified from the video frame.
- identifying the target object from the video frame can be achieved in a variety of ways.
- the target object can be identified from the video frame through the instance segmentation algorithm or the semantic segmentation algorithm.
- neural network systems such as Faster-rcnn and Mask-rcnn can be used to identify target objects.
- the video frame can be input to the above-mentioned neural network system model, and the output result of the model can be marked with the position information of the target object contained in the video frame.
- the position information may be represented by the coordinate values of the pixels constituting the target object in the video frame. In this way, the set of coordinate values of the pixel points constituting the target object can represent the area where the target object is located in the video frame.
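As an illustration of this identification step, the sketch below uses the off-the-shelf Mask R-CNN from torchvision to recover the coordinate set of each detected object; the library choice, the score threshold, and the 0.5 mask cut-off are assumptions made for the example, not requirements of the method:

```python
# A minimal sketch of identifying the target region with torchvision's
# Mask R-CNN (an assumption; the text only names Faster-rcnn / Mask-rcnn
# as example networks). `frame` is an H x W x 3 RGB frame as a NumPy array.
import numpy as np
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def target_object_regions(frame: np.ndarray, score_thresh: float = 0.7):
    """Return one (y, x) pixel-coordinate set per detected object."""
    tensor = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = model([tensor])[0]        # dict with boxes, labels, scores, masks
    regions = []
    for mask, score in zip(output["masks"], output["scores"]):
        if score >= score_thresh:
            # mask is 1 x H x W with soft values; binarize at 0.5
            coords = np.argwhere(mask[0].numpy() > 0.5)
            regions.append(coords)         # coordinate set for one object
    return regions
```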
- S3: Select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located.
- In this embodiment, one or more geometric figures may be selected to jointly fit the area where the target object is located, and the result of the fitting may be that the combination of the one or more geometric figures just covers the area where the target object is located.
- Suppose the target object to be recognized in the current video frame is a human body. After the human body shown in Figure 2 is identified from the current video frame, ellipses and rectangles can be used to fit the area of the human body in the video frame. For example, an ellipse can fit the head of the human body, and rectangles can fit the upper body and the lower body.
- the area where the target object is located may be divided into one or more sub-areas according to the physical characteristics of the target object.
- the physical characteristics can be flexibly set according to the type of the target object.
- the physical features may be the head, torso, limbs, etc.
- the number of sub-regions obtained by segmentation can also be different.
- the trunk and limbs may not be segmented too finely, but can be simply divided into upper body and lower body.
- the area where the target object is located can be divided into one or more sub-areas through a variety of pose algorithms.
- the pose algorithm may include, for example, DensePose algorithm, OpenPose algorithm, Realtime Multi-Person Pose Estimation algorithm, AlphaPose algorithm, Human Body Pose Estimation algorithm, DeepPose algorithm, etc.
- a geometric figure suitable for the sub-region can be selected. For example, for the head of the human body, a circle or an ellipse can be selected, and for the torso and limbs of the human body, a rectangle can be selected. In this way, the combination of geometric figures corresponding to these sub-regions can cover the region where the target object is located.
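A minimal sketch of this sub-region fitting is shown below, assuming 2D keypoints produced by one of the pose estimators listed above; the keypoint names and the sizing heuristics are illustrative assumptions:

```python
# A hedged sketch of deriving sub-region figures from pose keypoints.
# `kp` maps keypoint names (assumed here) to (x, y) pixel coordinates.
import math

def head_and_torso_figures(kp: dict):
    # Head: a circle centred between the eyes, sized from the eye spacing.
    eye_dist = math.dist(kp["left_eye"], kp["right_eye"])
    head_cx = (kp["left_eye"][0] + kp["right_eye"][0]) / 2
    head_cy = (kp["left_eye"][1] + kp["right_eye"][1]) / 2
    head = ("circle", head_cx, head_cy, 2.0 * eye_dist)

    # Upper body: a rectangle spanning shoulders to hips.
    pts = [kp[k] for k in ("left_shoulder", "right_shoulder",
                           "left_hip", "right_hip")]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    torso = ("rectangle", min(xs), min(ys), max(xs), max(ys))
    return head, torso
```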
- Then, for each sub-area, the layout parameters of the selected geometric figure can be determined, so that the geometric figure drawn according to the layout parameters covers the corresponding sub-area.
- The layout parameters to be determined differ according to the type of geometric figure.
- For a rectangle, the layout parameters may be the coordinate values of two diagonal vertices of the rectangle in the video frame and the angle between a side of the rectangle and the horizontal line.
- For example, in FIG. 3, the coordinate values of vertex a and vertex b, and the angle between side ac and the horizontal line (the dotted line in FIG. 3), can be determined.
- In this way, the rectangular area can be determined in the video frame.
- For an ellipse, the determined layout parameters may include the coordinates of the center point of the ellipse, the major axis and minor axis of the ellipse, and the angle between the major axis and the horizontal line (the dotted line in FIG. 4).
- For a circle, the layout parameters may include the center and radius of the circle.
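One way to model these per-figure layout parameters in code is sketched below; the class and field names are illustrative, not taken from this application:

```python
# Illustrative data model for the layout parameters described above.
from dataclasses import dataclass

@dataclass
class Circle:
    cx: int       # x coordinate of the centre, in pixels
    cy: int       # y coordinate of the centre, in pixels
    radius: int   # radius, as a number of pixels covered

@dataclass
class Ellipse:
    cx: int
    cy: int
    major_axis: int
    minor_axis: int
    angle_deg: float  # angle between the major axis and the horizontal

@dataclass
class Rectangle:
    ax: int           # first diagonal vertex (a)
    ay: int
    bx: int           # opposite diagonal vertex (b)
    by: int
    angle_deg: float  # angle between side ac and the horizontal
```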
- the fitting parameters of the geometric figure can be generated according to the selected type of the geometric figure and the layout parameter of the geometric figure.
- the fitting parameters can be represented by encoded values.
- the type of the geometric figure can be represented by a preset figure identifier.
- the preset graphic identifier of a circle is 0, the preset graphic identifier of an ellipse is 1, the preset graphic identifier of a rectangle is 2, and the preset graphic identifier of a triangle is 3, and so on.
- the layout parameters of the geometric figures can be expressed by the coordinates of the pixels or the number of pixels covered.
- the center of a circle can be represented by the coordinate value of the pixel at the center of the circle, and the radius can be represented by the number of pixels covered by the radius.
- the preset graphic identifiers and layout parameters determined above can all be in decimal, and in computer languages, they can usually be expressed in binary or hexadecimal. Therefore, after obtaining the preset graphic identifier and the layout parameter corresponding to the geometric graphic, the preset graphic identifier and the layout parameter can be coded separately. For example, binary coding may be performed on the preset graphic identifier and the layout parameter.
- For example, assume the circle's preset graphic identifier is 0, the coordinates of the center of the circle in the layout parameters are (16, 32), and the radius is 8. After binary coding, the preset graphic identifier can be 00, the center coordinates can be expressed as 010000 100000, the radius can be expressed as 001000, and the combination is 00 010000 100000 001000. Finally, the encoded data can be used as the fitting parameter of the geometric figure. For each geometric figure included in the video frame, a fitting parameter can be generated in the above-mentioned manner, and the combination of the fitting parameters of the geometric figures can be used as the fitting parameters of the video frame.
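The worked example above can be reproduced in a few lines; the 2-bit figure identifier and the 6-bit layout fields are the widths used in this example, not widths fixed by the method:

```python
# Binary encoding of a circle's fitting parameters, per the example above.
FIGURE_IDS = {"circle": 0, "ellipse": 1, "rectangle": 2, "triangle": 3}

def encode_circle(cx: int, cy: int, radius: int) -> str:
    figure_bits = format(FIGURE_IDS["circle"], "02b")   # 2-bit identifier
    layout_bits = "".join(format(v, "06b") for v in (cx, cy, radius))
    return figure_bits + layout_bits

# encode_circle(16, 32, 8) -> "00010000100000001000",
# i.e. 00 010000 100000 001000 as in the example above.
```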
- the mask information of the video frame may also be generated according to the fitting parameters of these geometric figures.
- the mask information may also contain auxiliary flags added for the fitting parameters.
- The purpose of adding the auxiliary identification bits is to distinguish the mask information of the video frame from the real data of the video frame. Referring to FIG. 5, the processed video data can be divided according to each video frame, where, for the same video frame, the mask information of the video frame and the data of the video frame are connected end to end. Without the auxiliary identification bits, subsequent devices reading the video data could not tell which part is the mask information and which part is the video frame data to be rendered.
- an auxiliary flag can be added to the fitting parameter, and a combination of the auxiliary flag and the fitting parameter can be used as the mask information of the video frame.
- the auxiliary identification bit may indicate the data size of the fitting parameter in a binary manner, and the auxiliary identification bit may be a binary number with a specified number of digits before the fitting parameter.
- For example, assume the auxiliary identification bits form a 6-bit binary number and the data size of the fitting parameters is 20 bits; the auxiliary identification bits can then be expressed as 010100.
- The final mask information can then be 010100 00 010000 100000 001000.
- In this way, after reading the auxiliary identification bits, other devices know that the data size of the fitting parameters is 20 bits; they can then read the next 20 bits of data and treat them as the content of the fitting parameters.
- the data after the 20-bit data can be used as the data of the video frame to be rendered.
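A sketch of assembling and parsing mask information with such a size header follows; the 6-bit header width is the one used in the example above, and a real system would size it to the maximum expected parameter length:

```python
# Assemble and parse mask information with a 6-bit size header,
# following the 010100 00 010000 100000 001000 example.
def build_mask_info(fitting_params: str) -> str:
    header = format(len(fitting_params), "06b")  # data size of the parameters
    return header + fitting_params

def split_mask_info(stream: str) -> tuple[str, str]:
    size = int(stream[:6], 2)                    # read the auxiliary bits
    fitting_params = stream[6:6 + size]          # next `size` bits
    frame_data = stream[6 + size:]               # remainder: frame to render
    return fitting_params, frame_data

# build_mask_info("00010000100000001000") -> "010100" + the 20 parameter bits
```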
- The auxiliary identification bits can also characterize the number of geometric figures included in the fitting parameters. After other devices have read from the video data the fitting parameters of as many geometric figures as the auxiliary identification bits indicate, the data read next is the data of the video frame to be rendered.
- The auxiliary identification bits can also characterize the end position of the fitting parameter data. As shown in Figure 6, the auxiliary identification bits can be a series of preset fixed characters; when other devices read these fixed characters, they know that the fitting parameters have been fully read, and what follows the fixed characters is the data of the video frame to be rendered.
- A binary mask map of the video frame may also be generated.
- the pixels constituting the area where the target object is located may have a first pixel value, and other pixels may have a second pixel value.
- the generated binary mask map may be consistent with the size of the video frame. The same size can be understood as the same length and width of the picture, and the same resolution, so that the number of pixels contained in the original video frame and the generated binary mask image are the same.
- the size of the generated binary mask image can be consistent with the size of a sub-region cropped in the original video frame, and does not need to be consistent with the size of the original video frame.
- In this case, the region formed by the pixels with the first pixel value can be fitted directly in the binary mask map with the several geometric figures in the above-mentioned manner, so as to obtain the fitting parameters of each geometric figure.
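For illustration, the sketch below builds such a binary mask and fits one rotated rectangle to it with OpenCV; this application does not prescribe OpenCV or any particular fitting routine:

```python
# Build a binary mask (first pixel value 1 for the target region, second
# pixel value 0 elsewhere) and fit a rotated rectangle to it.
import cv2
import numpy as np

def binary_mask(shape, region_coords):
    """region_coords: iterable of (y, x) pixels belonging to the target."""
    mask = np.zeros(shape, dtype=np.uint8)        # second pixel value: 0
    for y, x in region_coords:
        mask[y, x] = 1                            # first pixel value: 1
    return mask

def fit_rotated_rect(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # ((cx, cy), (w, h), angle): centre, side lengths, rotation angle
    return cv2.minAreaRect(max(contours, key=cv2.contourArea))
```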
- the fitting parameters of the video frame can also be determined by means of machine learning.
- For different target objects, recognition models can be trained with different training sample sets.
- a training sample set of the target object may be obtained, the training sample set may include several image samples, and the several image samples all contain the target object.
- Each image sample can be manually annotated to mark the geometric figures required to cover the target object in that image sample.
- These marked geometric figures may be represented by fitting parameters of the geometric figures, and the fitting parameters may include the type of the geometric figures and the layout parameters of the geometric figures.
- fitting parameters corresponding to each image sample can be generated, and the fitting parameters can be used as an annotation label of the image sample.
- the recognition model may include a deep neural network, and neurons in the deep neural network may have initial weight values. After the deep neural network carrying the initial weight value processes the input image samples, the prediction results corresponding to the input image samples can be obtained.
- the prediction result can indicate the fitting parameters of the geometric figure required to cover the target object in the input image sample. Since the weight value carried by the recognition model in the initial stage is not accurate enough, there will be a certain gap between the fitting parameters represented by the prediction results and the fitting parameters manually labeled.
- During training, the difference between the fitting parameters represented by the prediction result and the manually labeled fitting parameters can be calculated, and this difference can be fed back to the recognition model to adjust the weight values of the neurons in the recognition model.
- In this way, by repeatedly correcting the weight values, the prediction result output by the trained recognition model for any input image sample becomes consistent with the fitting parameters represented by the annotation label of that image sample, at which point the training process is complete.
- the video frame may be input to the trained recognition model, and the prediction result output by the trained recognition model can be used as the fitting parameter of the video frame .
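A minimal PyTorch-style training sketch consistent with this description is given below; the network architecture, the mean-squared-error loss, and the fixed-length parameter vector are all illustrative assumptions:

```python
# Train a model that maps an image to a fixed-length vector of fitting
# parameters, correcting the weights from the prediction/label difference.
import torch
import torch.nn as nn

class FittingParamModel(nn.Module):
    def __init__(self, n_params: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, n_params)   # predicted fitting parameters

    def forward(self, x):
        return self.head(self.features(x))

def train(model, loader, epochs: int = 10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()  # difference between prediction and label
    for _ in range(epochs):
        for images, labeled_params in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labeled_params)
            loss.backward()   # feed the difference back to adjust weights
            optimizer.step()
```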
- This application also provides a fitting system for a target object in a video frame, the system including:
- An area identification unit configured to identify the area where the target object is located in the video frame
- a geometric figure selection unit for selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located;
- a fitting parameter generation unit, configured to generate the fitting parameters of each geometric figure according to the type of each geometric figure and the layout parameters of each geometric figure in the video frame, and to use the combination of the fitting parameters of the geometric figures as the fitting parameters of the video frame.
- the geometric figure selection unit includes:
- a sub-region segmentation module configured to segment the area where the target object is located into one or more sub-regions according to the physical characteristics of the target object
- a layout parameter determination module, configured to select, for any one of the sub-regions, a geometric figure suited to the sub-region, and to determine the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-region.
- the fitting parameter generation unit includes:
- an encoding module, used to identify the preset graphic identifier corresponding to the type of the geometric figure, to encode the preset graphic identifier and the layout parameters of the geometric figure respectively, and to use the encoded data as the fitting parameters of the geometric figure.
- In another embodiment, the fitting parameter generation unit includes:
- a training sample set obtaining module, used to obtain a training sample set of the target object in advance, where the training sample set includes several image samples, each of the image samples contains the target object, and each image sample is provided with an annotation label used to characterize the fitting parameters of the geometric figures required to cover the target object in the image sample;
- a training module, used to train a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters represented by the annotation label of the input image sample;
- the result prediction module is configured to input the video frame into the trained recognition model, and use the prediction result output by the trained recognition model as the fitting parameter of the video frame.
- this application also provides a fitting device for a target object in a video frame.
- the device includes a memory and a processor.
- The memory is used to store a computer program.
- When the computer program is executed by the processor, the fitting method described above can be implemented.
- the device may include a processor, an internal bus, and a memory.
- The memory may include an internal memory and a non-volatile memory.
- The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it.
- FIG. 7 is only for illustration, and it does not limit the structure of the foregoing device.
- The device may also include more or fewer components than those shown in FIG. 7; for example, it may also include other processing hardware, such as a GPU (Graphics Processing Unit), or it may have a configuration different from that shown in FIG. 7.
- this application does not exclude other implementations, such as logic devices or a combination of software and hardware, and so on.
- The processor may include a central processing unit (CPU) or a graphics processing unit (GPU), and of course may also include other single-chip microcomputers, logic gate circuits, integrated circuits, etc. with logic processing capabilities, or an appropriate combination thereof.
- the memory described in this embodiment may be a memory device for storing information.
- For example, a device that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but without physical form can also be a memory, such as a RAM or a FIFO; and in a system, a storage device in physical form can also be called a memory, and so on.
- the memory can also be implemented in the form of cloud storage, and the specific implementation manner is not limited in this specification.
- the technical solution provided by the present application can identify the area where the target object is located for the target object in the video frame. Then, a combination of one or more geometric figures can be used to cover the target object in the video frame by means of geometric figure fitting. After several geometric figures covering the target object are determined, fitting parameters of these geometric figures can be generated, and the fitting parameters can characterize the type of each geometric figure and the layout of each geometric figure in the video frame. Since the fitting parameters of the geometric figures are not image data, the bytes occupied are usually small, which can reduce the amount of data after fitting, thereby improving subsequent processing efficiency.
- each embodiment can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
- The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in each embodiment or in some parts of an embodiment.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Claims (13)
- 1. A method for fitting a target object in a video frame, wherein the method comprises: identifying the area where the target object is located in the video frame; selecting several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and generating fitting parameters of each of the geometric figures according to the type of each of the geometric figures and the layout parameters of each of the geometric figures in the video frame, and using the combination of the fitting parameters of each of the geometric figures as the fitting parameters of the video frame.
- 2. The method according to claim 1, wherein, after identifying the area where the target object is located in the video frame, the method further comprises: generating a binary mask map of the video frame, in which the pixels constituting the area where the target object is located have a first pixel value, the other pixels have a second pixel value, and the first pixel value is different from the second pixel value.
- 3. The method according to claim 2, wherein selecting several geometric figures to fit the area where the target object is located comprises: in the binary mask map, fitting the area formed by the pixels having the first pixel value with the several geometric figures.
- 4. The method according to claim 1 or 2, wherein selecting several geometric figures to fit the area where the target object is located comprises: dividing the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object; and, for any one of the sub-areas, selecting a geometric figure suited to the sub-area and determining the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.
- 5. The method according to claim 1, wherein the layout parameters of the geometric figures in the video frame are expressed by the coordinate values of pixels and/or the number of pixels.
- 6. The method according to claim 1, wherein generating the fitting parameters of each of the geometric figures comprises: identifying the preset figure identifier corresponding to the type of the geometric figure, encoding the preset figure identifier and the layout parameters of the geometric figure respectively, and using the encoded data as the fitting parameters of the geometric figure.
- 7. The method according to claim 1, wherein generating the fitting parameters of each of the geometric figures comprises: obtaining in advance a training sample set of the target object, wherein the training sample set includes several image samples, each of the image samples contains the target object, and each of the image samples is provided with an annotation label used to characterize the fitting parameters of the geometric figures required to cover the target object in the image sample; training a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters characterized by the annotation label of the input image sample; and inputting the video frame into the trained recognition model, and using the prediction result output by the trained recognition model as the fitting parameters of the video frame.
- 8. The method according to claim 1, wherein the method further comprises: adding auxiliary identification bits for the fitting parameters of the video frame, and generating the mask information of the video frame based on the combination of the auxiliary identification bits and the fitting parameters of the video frame; wherein the auxiliary identification bits provide at least one of the following functions: characterizing the data size of the fitting parameters of the video frame; characterizing the number of geometric figures included in the fitting parameters of the video frame; or characterizing the end position of the data of the fitting parameters of the video frame.
- 9. A system for fitting a target object in a video frame, wherein the system comprises: an area recognition unit, configured to identify the area where the target object is located in the video frame; a geometric figure selection unit, configured to select several geometric figures to fit the area where the target object is located, so that the combination of the several geometric figures covers the area where the target object is located; and a fitting parameter generation unit, configured to generate the fitting parameters of each of the geometric figures according to the type of each of the geometric figures and the layout parameters of each of the geometric figures in the video frame, and to use the combination of the fitting parameters of each of the geometric figures as the fitting parameters of the video frame.
- 10. The system according to claim 9, wherein the geometric figure selection unit comprises: a sub-area segmentation module, configured to divide the area where the target object is located into one or more sub-areas according to the physical characteristics of the target object; and a layout parameter determination module, configured to select, for any one of the sub-areas, a geometric figure suited to the sub-area and to determine the layout parameters of the geometric figure, so that the geometric figure drawn according to the layout parameters covers the sub-area.
- 11. The system according to claim 9, wherein the fitting parameter generation unit comprises: an encoding module, configured to identify the preset figure identifier corresponding to the type of the geometric figure, to encode the preset figure identifier and the layout parameters of the geometric figure respectively, and to use the encoded data as the fitting parameters of the geometric figure.
- 12. The system according to claim 9, wherein the fitting parameter generation unit comprises: a training sample set obtaining module, configured to obtain in advance a training sample set of the target object, wherein the training sample set includes several image samples, each of the image samples contains the target object, and each of the image samples is provided with an annotation label used to characterize the fitting parameters of the geometric figures required to cover the target object in the image sample; a training module, configured to train a recognition model with the image samples in the training sample set, so that after any image sample is input into the trained recognition model, the prediction result output by the trained recognition model is consistent with the fitting parameters characterized by the annotation label of the input image sample; and a result prediction module, configured to input the video frame into the trained recognition model and to use the prediction result output by the trained recognition model as the fitting parameters of the video frame.
- 13. A device for fitting a target object in a video frame, wherein the device comprises a processor and a memory, the memory being configured to store a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19727579.5A EP3709666A1 (en) | 2019-02-01 | 2019-03-06 | Method for fitting target object in video frame, system, and device |
US16/442,081 US10699751B1 (en) | 2019-03-06 | 2019-06-14 | Method, system and device for fitting target object in video frame |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910105682.5A CN111526422B (zh) | 2019-02-01 | 2019-02-01 | Method, system and device for fitting a target object in a video frame |
CN201910105682.5 | 2019-02-01 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/442,081 Continuation US10699751B1 (en) | 2019-03-06 | 2019-06-14 | Method, system and device for fitting target object in video frame |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020155299A1 true WO2020155299A1 (zh) | 2020-08-06 |
Family
ID=67437365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/077236 WO2020155299A1 (zh) | 2019-02-01 | 2019-03-06 | Method, system and device for fitting a target object in a video frame |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3709666A1 (zh) |
CN (1) | CN111526422B (zh) |
WO (1) | WO2020155299A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112347955A (zh) * | 2020-11-12 | 2021-02-09 | 上海影卓信息科技有限公司 | Method, system and medium for fast object recognition in video based on frame prediction |
WO2022116977A1 (zh) * | 2020-12-04 | 2022-06-09 | 腾讯科技(深圳)有限公司 | Action driving method and apparatus for a target object, device, storage medium, and computer program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150117792A1 (en) * | 2013-10-30 | 2015-04-30 | Ricoh Imaging Company, Ltd. | Image-processing system, imaging apparatus and image-processing method |
EP2905738A1 (en) * | 2014-02-05 | 2015-08-12 | Panasonic Intellectual Property Management Co., Ltd. | Monitoring apparatus, monitoring system, and monitoring method |
CN106951820A (zh) * | 2016-08-31 | 2017-07-14 | 江苏慧眼数据科技股份有限公司 | Passenger flow counting method based on annular templates and ellipse fitting |
CN109173263A (zh) * | 2018-08-31 | 2019-01-11 | 腾讯科技(深圳)有限公司 | Image data processing method and apparatus |
CN109242868A (zh) * | 2018-09-17 | 2019-01-18 | 北京旷视科技有限公司 | Image processing method and apparatus, electronic device, and storage medium |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102402680B (zh) * | 2010-09-13 | 2014-07-30 | 株式会社理光 | Method for locating hands and pointing points and determining gestures in a human-computer interaction system |
EP2812894A4 (en) * | 2012-02-06 | 2016-04-06 | Legend3D Inc | MANAGEMENT SYSTEM FOR CINEMATOGRAPHIC PROJECTS |
CN103700112A (zh) * | 2012-09-27 | 2014-04-02 | 中国航天科工集团第二研究院二O七所 | Occluded target tracking method based on a hybrid prediction strategy |
CN102970529B (zh) * | 2012-10-22 | 2016-02-17 | 北京航空航天大学 | Object-based fractal coding, compression and decompression method for multi-view video |
CN103236074B (zh) * | 2013-03-25 | 2015-12-23 | 深圳超多维光电子有限公司 | 2D/3D image processing method and device |
WO2015198323A2 (en) * | 2014-06-24 | 2015-12-30 | Pic2Go Ltd | Photo tagging system and method |
CN104299186A (zh) * | 2014-09-30 | 2015-01-21 | 珠海市君天电子科技有限公司 | Method and device for applying mosaic processing to pictures |
US9864901B2 (en) * | 2015-09-15 | 2018-01-09 | Google Llc | Feature detection and masking in images based on color distributions |
CN106022236A (zh) * | 2016-05-13 | 2016-10-12 | 上海宝宏软件有限公司 | Action recognition method based on human body contours |
CN107133604A (zh) * | 2017-05-25 | 2017-09-05 | 江苏农林职业技术学院 | Pig gait abnormality detection method based on ellipse fitting and a predictive neural network |
CN108665490B (zh) * | 2018-04-02 | 2022-03-22 | 浙江大学 | Graph matching method based on multi-attribute encoding and dynamic weights |
- 2019-02-01 CN CN201910105682.5A patent/CN111526422B/zh active Active
- 2019-03-06 EP EP19727579.5A patent/EP3709666A1/en not_active Withdrawn
- 2019-03-06 WO PCT/CN2019/077236 patent/WO2020155299A1/zh unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP3709666A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3709666A4 (en) | 2020-09-16 |
CN111526422A (zh) | 2020-08-11 |
CN111526422B (zh) | 2021-08-27 |
EP3709666A1 (en) | 2020-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304835B (zh) | Text detection method and device | |
CN110176027B (zh) | Video target tracking method, apparatus, device and storage medium | |
US11734851B2 (en) | Face key point detection method and apparatus, storage medium, and electronic device | |
US10699751B1 (en) | Method, system and device for fitting target object in video frame | |
US10614574B2 (en) | Generating image segmentation data using a multi-branch neural network | |
CN108122234B (zh) | Convolutional neural network training and video processing method, apparatus, and electronic device | |
WO2020155297A1 (zh) | Generation of video mask information, bullet-screen anti-occlusion method, server, and client | |
WO2022156640A1 (zh) | Gaze correction method and apparatus for an image, electronic device, computer-readable storage medium, and computer program product | |
CN111291629A (zh) | Method and apparatus for recognizing text in an image, computer device, and computer storage medium | |
JP4738469B2 (ja) | Image processing apparatus, image processing program, and image processing method | |
CN111292334B (zh) | Panoramic image segmentation method and apparatus, and electronic device | |
WO2020155299A1 (zh) | Method, system and device for fitting a target object in a video frame | |
CN112801236A (zh) | Migration method, apparatus, device and storage medium for an image recognition model | |
CN114549557A (zh) | Portrait segmentation network training method, apparatus, device and medium | |
CN114511041A (zh) | Model training method, image processing method, apparatus, device and storage medium | |
CN111274863A (zh) | Text prediction method based on text peak probability density | |
WO2022127865A1 (zh) | Video processing method and apparatus, electronic device, and storage medium | |
CN114612976A (zh) | Key point detection method and apparatus, computer-readable medium and electronic device | |
WO2023272495A1 (zh) | Logo labeling method and apparatus, logo detection model updating method and system, and storage medium | |
CN114494302A (zh) | Image processing method, apparatus, device and storage medium | |
CN111159976A (zh) | Text position labeling method and apparatus | |
CN115050086B (zh) | Sample image generation method, model training method, image processing method and apparatus | |
CN117649358B (zh) | Image processing method, apparatus, device and storage medium | |
CN117037276A (zh) | Pose information determination method and apparatus, electronic device and computer-readable medium | |
CN117011848A (zh) | Prompt-point-based semantic segmentation auxiliary labeling method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2019727579 Country of ref document: EP Effective date: 20190610 |
|
|
NENP | Non-entry into the national phase |
Ref country code: DE |