WO2013097072A1 - Method and apparatus for recognizing characters of a video - Google Patents

Method and apparatus for recognizing characters of a video (识别视频的字符的方法和装置) Download PDF

Info

Publication number
WO2013097072A1
WO2013097072A1 · PCT/CN2011/084642 · CN2011084642W
Authority
WO
WIPO (PCT)
Prior art keywords
character
connected domain
determining
video
class
Prior art date
Application number
PCT/CN2011/084642
Other languages
English (en)
French (fr)
Inventor
杨杰
万华林
张军
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2011/084642 (WO2013097072A1)
Priority to CN201280000022.7A (CN103493067B)
Publication of WO2013097072A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program

Definitions

  • The present invention relates to the field of video and, more particularly, to a method and apparatus for recognizing characters of a video. Background Art
  • The existing methods for recognizing the characters of a video generally treat the character image as an ordinary image and use a segmentation algorithm such as connected-domain analysis, graph cut, or K-means clustering to segment out the characters and then determine the character text. To achieve a good segmentation result, such segmentation usually requires a large amount of computation, which lengthens the whole character recognition process and reduces the real-time performance of the whole video analysis process.
  • Embodiments of the present invention provide a method and apparatus for recognizing characters of a video, which can shorten the time of the online character recognition process and improve the real-time performance of the video analysis process.
  • In one aspect, a method for recognizing characters of a video is provided, the method comprising: determining a character model according to a source video; determining, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and determining, according to the character pixels, at least one character text representing the characters.
  • In another aspect, an apparatus for recognizing characters of a video is provided, the apparatus comprising: a character model determining module, configured to determine a character model according to a source video; a character pixel determining module, configured to determine, according to the character model that is determined by the character model determining module and corresponds to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and a character text determining module, configured to determine, according to the character pixels determined by the character pixel determining module, at least one character text representing the characters.
  • With the method and apparatus for recognizing characters of a video according to the embodiments, a character model is determined from a source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video. This eliminates the need to segment the characters out of the images of the target video with a segmentation algorithm or the like, thereby shortening the time of the online character recognition process and improving the real-time performance of the video analysis process.
  • FIG. 1 is a schematic flowchart of a method of identifying characters of a video according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method of determining a character model according to an embodiment of the present invention.
  • FIG. 3 is a schematic flow diagram of a method of identifying characters of a video in accordance with another embodiment of the present invention.
  • FIG. 4 is a schematic block diagram of an apparatus for recognizing characters of a video according to an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of a character model determination module according to an embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of an apparatus for recognizing characters of a video according to another embodiment of the present invention. Detailed Description
  • FIG. 1 shows a schematic flowchart of a method of recognizing characters of a video according to an embodiment of the present invention.
  • The method includes:
  • S110: Determine a character model according to a source video.
  • S120: Determine, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels included in the target video.
  • S130: Determine, according to the character pixels, at least one character text representing the characters.
  • Specifically, in S110, a character model may be determined according to the source video, where the character model is a probability model: by substituting a pixel of the target video into the character model, the probability that the pixel belongs to the character corresponding to the character model can be calculated.
  • In the embodiments of the present invention, the correspondence between a character model and characters can be reflected by the color, size, shape, and the like of the characters.
  • For example, a white-character model can be determined according to the white characters in the source video; by substituting a pixel of the target video into the white-character model, the probability that the pixel belongs to a white character can be determined. It should be understood that the correspondence between character models and characters is not limited to the color, size, and shape of the characters; any parameter that can represent character features falls within the scope of the embodiments of the present invention.
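  • As a concrete illustration of such a probability model, the minimal Python sketch below scores pixels against a single diagonal-covariance Gaussian for white characters. The mean and variance values are illustrative assumptions only; the patent itself adopts a Gaussian mixture model, as formula (1) below shows.

```python
import numpy as np

# Assumed mean colour and per-channel variance of white title text; both
# vectors are illustrative choices, not values fixed by the patent.
MU = np.array([240.0, 240.0, 240.0])
VAR = np.array([120.0, 120.0, 120.0])

def white_character_score(rgb):
    """Unnormalised diagonal-Gaussian density of a pixel under the model."""
    d = (np.asarray(rgb, dtype=float) - MU) ** 2 / VAR
    return float(np.exp(-0.5 * d.sum()))

print(white_character_score([250, 248, 246]))  # near-white pixel: ~0.43
print(white_character_score([30, 60, 200]))    # blue pixel: effectively 0
```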
  • In the embodiments of the present invention, the character model can be determined according to the method for determining a character model shown in FIG. 2. As shown in FIG. 2, the method includes:
  • S210: Determine, according to a character region determination parameter, that a first image of the source video includes a first character region.
  • S220: Perform a connected-domain labeling operation on the first character region to determine the connected domains of the first character region.
  • S230: Perform a clustering operation on the connected domains to determine the connected-domain classes of the first character region.
  • S240: Determine, according to a training data determination parameter, a training connected-domain class from the connected-domain classes, and determine the average value of each connected domain included in the training connected-domain class as training data.
  • S250: Determine the character model according to the training data.
  • Specifically, because the region in which characters (for example, titles) appear is usually fixed in each video (for example, a news video), in S210 it can be determined, according to the character region determination parameter, whether a frame image of the source video includes a character region. In the embodiments of the present invention, the character region determination parameter may include: the ratio of the number of edges included in the character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction. If the parameter satisfies the following three conditions, it can be determined that the image includes a character region, and this step can be performed automatically in an offline manner.
  • 1. The ratio of the number of edges to the area of the title region is greater than a prescribed threshold; for example, the threshold can be set to 0.1.
  • 2. The ratio of the number of edges in the horizontal direction to that in the vertical direction is within a prescribed range; for example, the range can be set to [0.5, 2].
  • 3. The degree of symmetry of the number of edges in the vertical direction, that is, the ratio of the number of edges in the upper half of the character region to that in the lower half, is within a prescribed range; for example, the range can be set to [0.5, 2].
  • It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention. Other parameters that can be used to determine that an image includes a character region, and the thresholds and ranges of those parameters, fall within the protection scope of the present invention. Further, the specific values of the thresholds and ranges above are merely one embodiment of the present invention, and the present invention is not limited thereto.
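  • A minimal sketch of such a region test, assuming a binary edge map as input and the example thresholds above; counting horizontal and vertical edges through adjacent-pixel transitions is an interpretation made by this sketch, not something the text fixes:

```python
import numpy as np

def looks_like_character_region(edges):
    """Check the three example conditions on a 2-D 0/1 edge map of a candidate
    region; returns True when all of them hold."""
    h, w = edges.shape
    if edges.sum() / (h * w) <= 0.1:              # condition 1: edge density
        return False
    vert = np.abs(np.diff(edges, axis=1)).sum()   # left-right transitions
    horiz = np.abs(np.diff(edges, axis=0)).sum()  # up-down transitions
    if not 0.5 <= horiz / max(vert, 1) <= 2.0:    # condition 2: orientation ratio
        return False
    top, bottom = edges[: h // 2].sum(), edges[h // 2:].sum()
    return 0.5 <= top / max(bottom, 1) <= 2.0     # condition 3: vertical symmetry
```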
  • In S220, a connected-domain labeling operation is performed on the character region to determine all connected domains of the character region.
  • Connected-domain labeling is a standard algorithm in image analysis: the input is an image, and the output is a number of regions of that image, where the pixels within each region share the same or similar characteristics, for example, the same color. This step can therefore be performed automatically in an offline manner.
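  • For illustration, this step maps directly onto a stock labeling routine such as SciPy's ndimage.label; the tiny binary mask below stands in for one colour layer of a real character region:

```python
import numpy as np
from scipy import ndimage

region = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 1],
                   [0, 0, 1, 1]])
labels, n = ndimage.label(region)  # 4-connectivity by default
print(n)       # -> 2 connected domains
print(labels)  # per-pixel domain index, 0 for background
```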
  • In S230, a clustering operation is performed on the connected domains to determine all connected-domain classes of the character region.
  • The K-means algorithm may be used for the clustering. K-means is a standard clustering algorithm: the input is all of the data (for example, the connected domains above) and the number of classes, and the output is the data of each class (for example, the connected domains above).
  • In the embodiments of the present invention, color may be used as the clustering parameter, that is, clustering is performed according to color; for example, all white connected domains are grouped into one connected-domain class. This step can therefore be performed automatically in an offline manner. It should be understood that the clustering parameter is not limited to the color of the connected domains; any parameter that can express a common feature of the connected domains falls within the scope of the embodiments of the present invention.
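  • A minimal sketch of the clustering step, running scikit-learn's K-means on per-domain mean colours; the sample colours and the choice of two classes are assumptions made for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

domain_colors = np.array([[250, 250, 250],   # three near-white domains
                          [245, 248, 252],
                          [252, 244, 249],
                          [20, 30, 200],     # two blue domains
                          [25, 35, 190]], dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(domain_colors)
print(km.labels_)  # e.g. [0 0 0 1 1]: the white domains form one class
```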
  • In S240, the connected-domain classes that satisfy the requirements, that is, the training connected-domain classes, are determined according to the training data determination parameter, and the average value of each connected domain in a training connected-domain class is computed; for example, if the class includes 10 connected domains, 10 average values are obtained and saved as the training data corresponding to that class. In the embodiments of the present invention, the training data determination parameter may include: the number of connected domains included in the connected-domain class, the ratio of the area of the connected-domain class to the area of the character region, and the degree of symmetry of the area of the connected-domain class in the vertical direction. If the parameter satisfies the following three conditions, it can be determined that the connected domains in the class belong to characters and meet the requirements for training data, and this step can be performed automatically in an offline manner.
  • 1. The number of connected domains included in the connected-domain class is greater than a prescribed threshold; for example, the threshold may be set to 20.
  • 2. The ratio of the area of the connected-domain class to the area of the entire character region is within a prescribed range; for example, the range may be set to [0.3, 0.9].
  • 3. The degree of symmetry of the area of the connected-domain class in the vertical direction, that is, the ratio of the area of the class above the midline of the character region to the area below the midline, is within a prescribed range; for example, the range may be set to [0.5, 2].
  • It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention. Other parameters that can determine that the connected domains in a class belong to characters, and their thresholds and ranges, fall within the protection scope of the present invention, and the specific values above are merely one embodiment.
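  • The three conditions can be checked with a small filter such as the sketch below; the per-domain record layout and the default thresholds (taken from the example values above) are assumptions of the sketch:

```python
def is_training_class(domains, region_area, min_count=20,
                      area_ratio=(0.3, 0.9), symmetry=(0.5, 2.0)):
    """Return True when a connected-domain class qualifies as training data.

    `domains` is a list of dicts with keys 'area', 'area_above_mid', and
    'area_below_mid' for each connected domain in the class."""
    if len(domains) <= min_count:                      # condition 1
        return False
    class_area = sum(d['area'] for d in domains)
    if not area_ratio[0] <= class_area / region_area <= area_ratio[1]:
        return False                                   # condition 2
    above = sum(d['area_above_mid'] for d in domains)
    below = sum(d['area_below_mid'] for d in domains)
    return symmetry[0] <= above / max(below, 1) <= symmetry[1]  # condition 3
```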
  • In S250, the character model can be determined based on the training data.
  • In the embodiments of the present invention, a Gaussian mixture model may be adopted, and the character model is represented by formula (1):
  • $p(c;\lambda)=\sum_{i=1}^{m} w_i\,p_i(c)$, with $p_i(c)=\frac{1}{(2\pi)^{d/2}\lvert\Sigma_i\rvert^{1/2}}\exp\!\big(-\tfrac{1}{2}(c-\mu_i)^{\top}\Sigma_i^{-1}(c-\mu_i)\big)$ (1)
  • In formula (1), $p(c;\lambda)$ represents the probability of the Gaussian mixture model; $w_i$ represents the weight of the i-th Gaussian component; $p_i$ represents the probability of the i-th Gaussian component; $\mu_i$ represents the mean of that Gaussian component; $\Sigma_i$ represents its variance; the number of Gaussian components $m$ takes a value of 2 to 3; and $d$ represents the dimension of the feature vector $c$.
  • The Gaussian mixture model may be trained with the EM algorithm, that is, the expectation-maximization algorithm: by gradually improving the parameters of the model, the likelihood of the parameters and the training data increases step by step and finally terminates at a maximum point.
  • Intuitively, the EM algorithm can also be regarded as a successive approximation algorithm: the parameters of the model are not known in advance, so a set of parameters can be selected at random, or an initial parameter $\lambda_0$ can be roughly given in advance, and the most probable state corresponding to these parameters is determined; the probability of each possible result of every training sample is calculated, the parameters are then corrected by the samples in the current state to re-estimate the parameter $\lambda$, and the state of the model is re-determined under the new parameters. Iterating in this way until a convergence condition is satisfied makes the parameters of the model gradually approach the true parameters, so this step can also be performed automatically in an offline manner.
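  • As an illustration, scikit-learn's GaussianMixture is trained with exactly this kind of EM procedure; the synthetic near-white colours below stand in for the real training data (the per-class connected-domain averages):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
training_data = rng.normal(loc=[240.0, 240.0, 240.0], scale=8.0, size=(200, 3))
gmm = GaussianMixture(n_components=2, covariance_type='diag', random_state=0)
gmm.fit(training_data)  # EM: E-step responsibilities, M-step parameter updates
# score_samples returns log-densities; exponentiate for the mixture density
print(np.exp(gmm.score_samples([[245.0, 243.0, 241.0]]))[0])
```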
  • In the embodiments of the present invention, a plurality of character models of different types may be determined according to the type of the source video, and distinguishing marks are attached to the models. The type may be determined according to the source video information (for example, the origin of the source video, its production style, and the like), and the distinguishing mark may be embodied in the name of the character model.
  • The source video information is not limited to the origin, production style, and the like of the source video; other information that can reflect the common features of a certain type of video falls within the protection scope of the present invention.
  • The above way of embodying the distinguishing mark is merely one embodiment of the present invention; other ways of identifying the type of a character model fall within the protection scope of the present invention.
  • Because each step in the method for determining the character model can be performed automatically in an offline manner, no manual intervention is required: the training data can be obtained automatically and the appearance model of the characters can be established in advance, which accelerates online character recognition, shortens the time of the online character recognition process, and improves the real-time performance of the video analysis process.
  • In S120, the character model corresponding to the target video may be determined according to the target video information (for example, the origin of the target video, its production style, and the like) and the distinguishing marks of the character models. All pixels in each frame image of the target video are traversed; by substituting a pixel into the character model, the probability that the pixel belongs to a character of the target video can be determined, and when the probability is greater than a prescribed threshold (for example, 0.8), the pixel can be determined as a character pixel (seed point) of the character. The specific value of this threshold is merely one embodiment of the present invention: the threshold may be set for the target video, and a relatively high threshold removes noise points.
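  • A minimal sketch of this seed-point test; normalising the mixture density by the densest training point, so that scores fall in (0, 1], is an assumption of the sketch, since the text only requires a probability that is compared with a threshold:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
train = rng.normal([240.0, 240.0, 240.0], 8.0, size=(500, 3))
gmm = GaussianMixture(n_components=2, random_state=0).fit(train)
peak = np.exp(gmm.score_samples(train)).max()  # rough normaliser

def is_seed_point(rgb, threshold=0.8):
    """True when the pixel's normalised score exceeds the example threshold."""
    p = np.exp(gmm.score_samples(np.asarray(rgb, float).reshape(1, -1)))[0]
    return p / peak > threshold

print(is_seed_point([240.0, 240.0, 240.0]))  # at the model mean
print(is_seed_point([10.0, 10.0, 10.0]))     # far from the model: False
```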
  • Optionally, it may further be determined, according to the character region determination parameter, whether a frame image of the target video includes a character region. If a character region is included, all pixels of the character region may be traversed; if not, the process can move directly to the next frame image. Therefore, determining, according to the character model, the character pixels belonging to the characters from the pixels included in the video includes: determining, according to the character region determination parameter, that a second image of the target video includes a second character region; and
  • determining, according to the character model corresponding to the target video, the character pixels belonging to the characters of the target video from the pixels included in the second character region.
  • The character region determination parameter may include: the ratio of the number of edges included in the character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction. If the parameter satisfies the following conditions, it can be determined that the image includes a character region.
  • 1. The ratio of the number of edges to the area of the title region is greater than a prescribed threshold; for example, the threshold can be set to 0.1.
  • 2. The ratio of the number of edges in the horizontal direction to that in the vertical direction is within a prescribed range; for example, the range can be set to [0.5, 2].
  • 3. The degree of symmetry of the number of edges in the vertical direction, that is, the ratio of the number of edges in the upper half of the character region to that in the lower half, is within a prescribed range; for example, the range can be set to [0.5, 2].
  • It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention. Other parameters that can be used to determine that an image includes a character region, and the thresholds and ranges of those parameters, fall within the protection scope of the present invention. Further, the specific values above are merely one embodiment of the present invention, and the present invention is not limited thereto.
  • By judging whether a frame image of the target video includes a character region (traversing all pixels of the character region when it does, and moving directly to the next frame image when it does not), the speed of online character recognition can be further increased, the time of the online recognition process shortened, and the real-time performance of the video analysis process improved.
  • In S130, the seed points determined in S120 may be filled into a character image (for example, a binary image) according to a seed filling algorithm, and the character image is sent to an OCR (Optical Character Recognition) engine, which finally outputs the character text.
  • The seed filling algorithm is also called the boundary filling algorithm.
  • Its basic idea is as follows: starting from an interior point of a polygonal region, points are drawn from the inside outward with a given color until the boundary is reached. If the boundary is specified in one color, the seed filling algorithm can proceed pixel by pixel until the boundary color is encountered. Because a video may contain multiple frames with images of different characters (differing, for example, in color, number of words, font, or shape), more than one character text may be output.
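  • The filling step can be read as a plain 4-neighbour flood fill from the seed points, as in the sketch below; the boundary-mask representation is an assumption, and pytesseract appears only as one example of an OCR engine:

```python
import numpy as np
from collections import deque

def seed_fill(seeds, boundary):
    """Grow each seed point outward until the boundary mask is hit.

    `seeds` is a list of (row, col) pairs; `boundary` is a 2-D bool array that
    is True on boundary pixels. Returns the filled binary character image."""
    h, w = boundary.shape
    filled = np.zeros((h, w), dtype=bool)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < h and 0 <= c < w) or filled[r, c] or boundary[r, c]:
            continue
        filled[r, c] = True
        queue.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return filled

# character_image = seed_fill(seed_points, edge_mask)
# text = pytesseract.image_to_string(character_image)  # any OCR engine works
```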
  • With the method for recognizing characters of a video according to the embodiments, a character model is determined according to the source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video. The characters therefore need not be segmented out of the images of the target video with a segmentation algorithm (for example, connected-domain analysis, graph cut, or K-means clustering), which shortens the time of the online recognition process and improves the real-time performance of the video analysis process.
  • The same characters may persist in a video for a certain time (that is, the same characters appear in multiple frame images), and individual words in the characters may be recognized incorrectly. Therefore, preferably, as shown in FIG. 3, the method may further include, after S130:
  • S140: Determine, according to the edit distance between the character texts and the number of characters they include, the similarity between the character texts.
  • S150: Determine, according to the similarity, a character text class, where the character text class includes at least three character texts whose pairwise similarity is less than a first threshold.
  • S160: Determine, according to the similarity between the character texts included in the character text class, the representative character text of the character text class.
  • In S140, a similarity model can be used to express the relationship between the edit distance between character texts and the number of characters they include; it is expressed by the following formula (2):
  • $S = 1 - \mathrm{Dis}(cap1, cap2)\,/\,\max(\lvert cap1\rvert, \lvert cap2\rvert)$ (2)
  • In formula (2), $S$ represents the similarity of the character texts cap1 and cap2, and its value range can be set to [0, 1]; Dis(cap1, cap2) is the edit distance between the character texts cap1 and cap2, which may represent the steps required to convert cap1 into cap2; and |cap1| and |cap2| are the numbers of single words included in cap1 and cap2, respectively.
  • In S150, the character texts whose pairwise similarity is less than a prescribed threshold (for example, 0.5) are grouped into one class as a character text class; that is, the character texts in the same character text class can be considered identical.
  • In S160, for each character text class, the similarity between each character text it includes and the other character texts of the same class can be compared and summed, and the character text with the largest sum of similarities is taken as the representative character text of the class.
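  • The sketch below implements formula (2) with a classic Levenshtein edit distance and also shows the representative-text selection of S160; treating each character of a string as one single word is a simplifying assumption of the sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance by single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete from a
                                     dp[j - 1] + 1,      # insert into a
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def similarity(cap1, cap2):
    """Formula (2): S = 1 - Dis(cap1, cap2) / max(|cap1|, |cap2|)."""
    return 1 - edit_distance(cap1, cap2) / max(len(cap1), len(cap2))

def representative(texts):
    """S160: the text whose summed similarity to the others is largest."""
    return max(texts, key=lambda t: sum(similarity(t, u) for u in texts))

caps = ["breaking news", "breaking newz", "breaking news"]
print(round(similarity(caps[0], caps[1]), 3))  # 1 - 1/13, roughly 0.923
print(representative(caps))                    # "breaking news"
```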
  • FIG. 4 shows a schematic block diagram of an apparatus for recognizing characters of a video according to an embodiment of the present invention. As shown in FIG. 4, the apparatus may include:
  • a character model determining module 410, configured to determine a character model according to a source video;
  • a character pixel determining module 420, configured to determine, according to the character model that is determined by the character model determining module 410 and corresponds to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and
  • a character text determining module 430, configured to determine, according to the character pixels determined by the character pixel determining module 420, at least one character text representing the characters.
  • The apparatus for recognizing characters of a video according to the embodiment of the present invention may correspond to the execution body of the method for recognizing characters of a video of the embodiment of the present invention, and the modules of the apparatus and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the method in FIG. 1; for brevity, details are not repeated here.
  • FIG. 5 shows a schematic block diagram of the character model determining module 410 according to an embodiment of the present invention.
  • As shown in FIG. 5, the character model determining module 410 may include:
  • a character region determining unit 411, configured to determine, according to a character region determination parameter, that a first image of the source video includes a first character region;
  • a connected-domain labeling unit 412, configured to perform a connected-domain labeling operation on the first character region determined by the character region determining unit 411 and determine the connected domains of the first character region;
  • a connected-domain clustering unit 413, configured to perform a clustering operation on the connected domains determined by the connected-domain labeling unit 412 and determine the connected-domain classes of the first character region;
  • a training data determining unit 414, configured to determine, according to a training data determination parameter, a training connected-domain class from the connected-domain classes determined by the connected-domain clustering unit 413, and to determine the average value of each connected domain included in the training connected-domain class as training data; and
  • a character model determining unit 415, configured to determine the character model based on the training data determined by the training data determining unit 414.
  • Because the operations performed by the character model determining module 410 and the units it includes can all be performed automatically in an offline manner, no manual intervention is required: the training data can be obtained automatically and the appearance model of the characters can be established in advance, which accelerates online character recognition, shortens the time of the online image recognition process, and improves the real-time performance of the video analysis process.
  • The character model determining module 410 according to the embodiment of the present invention may correspond to the execution body of the method for determining a character model of the embodiment of the present invention, and the units of the character model determining module 410 and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the method in FIG. 2; for brevity, details are not repeated here.
  • Optionally, the character pixel determining module 420 is further configured to determine, according to the character region determination parameter, that a second image of the target video includes a second character region; and
  • to determine, according to the character model that is determined by the character model determining module 410 and corresponds to the target video, the character pixels belonging to the characters of the target video from the pixels included in the second character region.
  • By judging whether a frame image of the target video includes a character region (traversing all pixels of the character region when it does, and moving directly to the next frame image when it does not), the speed of online character recognition can be further increased, the time of the online image recognition process shortened, and the real-time performance of the video analysis process improved.
  • With the apparatus for recognizing characters of a video according to the embodiments, a character model is determined according to the source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video. The characters therefore need not be segmented out of the images of the target video with a segmentation algorithm (for example, connected-domain analysis, graph cut, or K-means clustering), which shortens the time of the online image recognition process and improves the real-time performance of the video analysis process.
  • As shown in FIG. 6, the apparatus for recognizing characters of a video according to the embodiment of the present invention may further include:
  • a similarity confirmation module 440, configured to determine, according to the edit distance between the character texts determined by the character text determining module 430 and the number of characters they include, the similarity between the character texts;
  • a character text class determining module 450, configured to determine, according to the similarity determined by the similarity confirmation module 440, a character text class, where the character text class includes at least three character texts whose pairwise similarity is less than a first threshold; and
  • a representative character text determining module 460, configured to determine, according to the similarity between the character texts included in the character text class determined by the character text class determining module 450, the representative character text of the character text class. Through clustering based on the similarity model and the selection of a representative character text, repetitions of the character text can be removed and some of the errors introduced by OCR can be corrected.
  • The apparatus for recognizing characters of a video according to the embodiment of the present invention may correspond to the execution body of the method for recognizing characters of a video of the embodiment of the present invention, and the modules of the apparatus and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the methods in FIGS. 1 to 3; for brevity, details are not repeated here.
  • In the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present invention.
  • In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways.
  • For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

Embodiments of the present invention provide a method and apparatus for recognizing characters of a video. The method includes: determining a character model according to a source video; determining, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and determining, according to the character pixels, at least one character text representing the characters. With the method and apparatus for recognizing characters of a video according to the embodiments, a character model is determined from the source video and the characters are determined from the pixels of the video according to that model; the characters therefore need not be segmented out of the images of the video by a segmentation algorithm or the like, which shortens the time of the online image recognition process and improves the real-time performance of the video analysis process.

Description

Method and Apparatus for Recognizing Characters of a Video

Technical Field

The present invention relates to the field of video and, more particularly, to a method and apparatus for recognizing characters of a video.

Background Art

With the rapid development of multimedia and network technologies, digital video has grown explosively, and obtaining information by video has become a convenient approach; news video in particular is one of the common ways for people to obtain the latest information. Because the amount of video is huge, however, watching large-scale video (tens or even hundreds of hours) sequentially and linearly has become unacceptable, and people prefer to selectively watch the videos they are interested in from a large collection. Content-based video analysis and retrieval technology makes this possible: the traditional technology uses multimodal features such as audio, video, and text to split a video into segments for convenient browsing. The characters of a video (for example, the titles of a news video) often highly summarize its main content, so character recognition plays a crucial role in high-level semantic analysis of video.

The existing methods for recognizing the characters of a video generally treat the character image as an ordinary image and use a segmentation algorithm such as connected-domain analysis, graph cut, or K-means clustering to segment out the characters and then determine the character text. To achieve a good segmentation result, the segmentation usually requires a large amount of computation, which lengthens the whole character recognition process and reduces the real-time performance of the whole video analysis process.

A suitable scheme for recognizing the characters of a video is therefore needed to shorten the time of online real-time character recognition and improve the real-time performance of the video analysis process.

Summary of the Invention

Embodiments of the present invention provide a method and apparatus for recognizing characters of a video, which can shorten the time of the online character recognition process and improve the real-time performance of the video analysis process.

In one aspect, a method for recognizing characters of a video is provided, the method including: determining a character model according to a source video; determining, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and determining, according to the character pixels, at least one character text representing the characters.

In another aspect, an apparatus for recognizing characters of a video is provided, the apparatus including: a character model determining module, configured to determine a character model according to a source video; a character pixel determining module, configured to determine, according to the character model that is determined by the character model determining module and corresponds to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and a character text determining module, configured to determine, according to the character pixels determined by the character pixel determining module, at least one character text representing the characters.

With the method and apparatus for recognizing characters of a video according to the embodiments of the present invention, a character model is determined according to the source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video; the characters therefore need not be segmented out of the images of the target video by a segmentation algorithm or the like, which shortens the time of the online character recognition process and improves the real-time performance of the video analysis process.

Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for recognizing characters of a video according to an embodiment of the present invention.

FIG. 2 is a schematic flowchart of a method for determining a character model according to an embodiment of the present invention.

FIG. 3 is a schematic flowchart of a method for recognizing characters of a video according to another embodiment of the present invention.

FIG. 4 is a schematic block diagram of an apparatus for recognizing characters of a video according to an embodiment of the present invention.

FIG. 5 is a schematic block diagram of a character model determining module according to an embodiment of the present invention.

FIG. 6 is a schematic block diagram of an apparatus for recognizing characters of a video according to another embodiment of the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some rather than all of the embodiments of the present invention, and all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 shows a schematic flowchart of a method for recognizing characters of a video according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

S110: Determine a character model according to a source video.

S120: Determine, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels included in the target video.

S130: Determine, according to the character pixels, at least one character text representing the characters.

Specifically, in S110, a character model may be determined according to the source video. The character model is a probability model: by substituting a pixel of the target video into the character model, the probability that the pixel belongs to the character corresponding to the character model can be calculated. In the embodiments of the present invention, the correspondence between a character model and characters can be reflected by the color, size, shape, and the like of the characters. For example, a white-character model can be determined according to the white characters in the source video; by substituting a pixel of the target video into the white-character model, the probability that the pixel belongs to a white character can be determined. It should be understood that the correspondence between character models and characters is not limited to the color, size, and shape of the characters; any parameter that can reflect character features falls within the scope of the embodiments of the present invention.

In the embodiments of the present invention, the character model can be determined according to the method for determining a character model shown in FIG. 2. As shown in FIG. 2, the method includes:

S210: Determine, according to a character region determination parameter, that a first image of the source video includes a first character region.

S220: Perform a connected-domain labeling operation on the first character region to determine the connected domains of the first character region.

S230: Perform a clustering operation on the connected domains to determine the connected-domain classes of the first character region.

S240: Determine, according to a training data determination parameter, a training connected-domain class from the connected-domain classes, and determine the average value of each connected domain included in the training connected-domain class as training data.

S250: Determine the character model according to the training data.

Specifically, because the region in which characters (for example, titles) appear is usually fixed in each video (for example, a news video), in S210 it can be determined, according to the character region determination parameter, whether a frame image of the source video includes a character region. In the embodiments of the present invention, the character region determination parameter may include: the ratio of the number of edges included in the character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction. If the character region determination parameter satisfies the following conditions, it can be determined that the image includes a character region; this step can therefore be performed automatically in an offline manner.

1. The ratio of the number of edges to the area of the title region is greater than a prescribed threshold; for example, the threshold can be set to 0.1.

2. The ratio of the number of edges in the horizontal direction to that in the vertical direction is within a prescribed range; for example, the range can be set to [0.5, 2].

3. The degree of symmetry of the number of edges in the vertical direction, that is, the ratio of the number of edges in the upper half of the character region to that in the lower half, is within a prescribed range; for example, the range can be set to [0.5, 2].

It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention; other parameters that can be used to determine that an image includes a character region, and the thresholds and ranges of those parameters, fall within the protection scope of the present invention. Moreover, the specific values of the thresholds and ranges above are merely one embodiment of the present invention, and the present invention is not limited thereto.

In S220, a connected-domain labeling operation is performed on the character region to determine all connected domains of the character region. Connected-domain labeling is a standard algorithm in image analysis: the input is an image, and the output is a number of regions of that image, where the pixels in each region share the same or similar characteristics, for example, the same color. This step can therefore be performed automatically in an offline manner.

In S230, a clustering operation is performed on the connected domains to determine all connected-domain classes of the character region. The K-means algorithm, a standard clustering algorithm, may be used here: the input is all of the data (for example, the connected domains above) and the number of classes, and the output is the data of each class (for example, the connected domains above). In the embodiments of the present invention, color may be used as the clustering parameter, that is, clustering is performed according to color; for example, all white connected domains are grouped into one connected-domain class. This step can therefore be performed automatically in an offline manner. It should be understood that the clustering parameter is not limited to the color of the connected domains; any parameter that can reflect a common feature of the connected domains falls within the scope of the embodiments of the present invention.

In S240, the connected-domain classes that satisfy the requirements, that is, the training connected-domain classes, are determined according to the training data determination parameter, and the average value of each connected domain in a training connected-domain class is computed; for example, if the class includes 10 connected domains, 10 average values are obtained and saved as the training data corresponding to that class. In the embodiments of the present invention, the training data determination parameter may include: the number of connected domains included in the connected-domain class, the ratio of the area of the connected-domain class to the area of the character region, and the degree of symmetry of the area of the connected-domain class in the vertical direction. If the training data determination parameter satisfies the following conditions, it can be determined that the connected domains in the class belong to characters and meet the requirements for training data; this step can therefore be performed automatically in an offline manner.

1. The number of connected domains included in the connected-domain class is greater than a prescribed threshold; for example, the threshold may be set to 20.

2. The ratio of the area of the connected-domain class to the area of the entire character region is within a prescribed range; for example, the range may be set to [0.3, 0.9].

3. The degree of symmetry of the area of the connected-domain class in the vertical direction, that is, the ratio of the area of the class above the midline of the character region to the area below the midline, is within a prescribed range; for example, the range may be set to [0.5, 2].

It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention; other parameters that can determine that the connected domains in a connected-domain class belong to characters, and the thresholds and ranges of those parameters, fall within the protection scope of the present invention. Moreover, the specific values of the thresholds and ranges above are merely one embodiment of the present invention, and the present invention is not limited thereto.

In S250, the character model can be determined according to the training data. In the embodiments of the present invention, a Gaussian mixture model may be adopted, and the character model is represented by formula (1):

$p(c;\lambda)=\sum_{i=1}^{m} w_i\,p_i(c)$, with $p_i(c)=\frac{1}{(2\pi)^{d/2}\lvert\Sigma_i\rvert^{1/2}}\exp\!\big(-\tfrac{1}{2}(c-\mu_i)^{\top}\Sigma_i^{-1}(c-\mu_i)\big)$ (1)

In formula (1), $p(c;\lambda)$ represents the probability of the Gaussian mixture model; $w_i$ represents the weight of the i-th Gaussian component; $p_i$ represents the probability of the i-th Gaussian component; $\mu_i$ represents the mean of that Gaussian component; $\Sigma_i$ represents its variance; the number of Gaussian components $m$ takes a value of 2 to 3; and $d$ represents the dimension of the feature vector $c$. In the embodiments of the present invention, the Gaussian mixture model may be trained with the EM algorithm, that is, the expectation-maximization algorithm: by gradually improving the parameters of the model, the likelihood of the parameters and the training data increases step by step and finally terminates at a maximum point. Intuitively, the EM algorithm can also be regarded as a successive approximation algorithm: the parameters of the model are not known in advance, so a set of parameters can be selected at random, or an initial parameter $\lambda_0$ can be roughly given in advance, and the most probable state corresponding to this set of parameters is determined; the probability of each possible result of every training sample is calculated, the parameters are corrected by the samples in the current state so as to re-estimate the parameter $\lambda$, and the state of the model is re-determined under the new parameters. By iterating in this way until a convergence condition is satisfied, the parameters of the model gradually approach the true parameters. This step can therefore be performed automatically in an offline manner.

It should be understood that the expression of the character model and the training method listed above are merely one embodiment of the present invention, and the present invention is not limited thereto.
In the embodiments of the present invention, a plurality of character models of different types may be determined according to the type of the source video, and distinguishing marks are attached to the models. The type may be determined according to the source video information (for example, the origin of the source video, its production style, and the like), and the distinguishing mark may be embodied in the name of the character model. It should be understood that, in the embodiments of the present invention, the source video information is not limited to the origin, production style, and the like of the source video; other information that can reflect the common features of a certain type of video falls within the protection scope of the present invention. Moreover, the above way of embodying the distinguishing mark is merely one embodiment of the present invention, and other ways of identifying the type of a character model fall within the protection scope of the present invention.

In the embodiments of the present invention, in order to give the determined character model statistical significance, it is preferable to use …

In the embodiments of the present invention, because each step of the method for determining the character model can be performed automatically in an offline manner, no manual intervention is required: the training data can be obtained automatically and the appearance model of the characters can be established in advance, which accelerates online character recognition, shortens the time of the online character recognition process, and improves the real-time performance of the video analysis process.

Returning to FIG. 1, in S120, the character model corresponding to the target video may be determined according to the target video information (for example, the origin of the target video, its production style, and the like) and the distinguishing marks of the character models. All pixels in each frame image of the target video are traversed, and by substituting a pixel into the character model, the probability that the pixel belongs to a character of the target video can be determined; when the probability is greater than a prescribed threshold (for example, 0.8), the pixel can be determined as a character pixel (seed point) of the character. It should be understood that the specific value of the threshold is merely one embodiment of the present invention, and the present invention is not limited thereto; that is, the threshold may be set for the target video, and a relatively high threshold can remove noise points.

Optionally, in the embodiments of the present invention, it may further be determined, according to the character region determination parameter, whether a frame image of the target video includes a character region; if a character region is included, all pixels of the character region may be traversed, and if no character region is included, the process can move directly to the next frame image. Therefore, the determining, according to the character model, character pixels that belong to the characters from the pixels included in the video includes:

determining, according to the character region determination parameter, that a second image of the target video includes a second character region; and

determining, according to the character model corresponding to the target video, the character pixels that belong to the characters of the target video from the pixels included in the second character region.

Specifically, because the region in which characters (for example, titles) appear is usually fixed in each video (for example, a news video), it can be determined, according to the character region determination parameter, whether a frame image of the target video includes a character region. In the embodiments of the present invention, the character region determination parameter may include: the ratio of the number of edges included in the character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction. If the character region determination parameter satisfies the following conditions, it can be determined that the image includes a character region.

1. The ratio of the number of edges to the area of the title region is greater than a prescribed threshold; for example, the threshold can be set to 0.1.

2. The ratio of the number of edges in the horizontal direction to that in the vertical direction is within a prescribed range; for example, the range can be set to [0.5, 2].

3. The degree of symmetry of the number of edges in the vertical direction, that is, the ratio of the number of edges in the upper half of the character region to that in the lower half, is within a prescribed range; for example, the range can be set to [0.5, 2].

It should be understood that the parameters listed above are merely exemplary descriptions of the embodiments of the present invention; other parameters that can be used to determine that an image includes a character region, and the thresholds and ranges of those parameters, fall within the protection scope of the present invention. Moreover, the specific values of the thresholds and ranges above are merely one embodiment of the present invention, and the present invention is not limited thereto.

In the embodiments of the present invention, by judging whether a frame image of the target video includes a character region (all pixels of the character region are traversed when it does, and the process moves directly to the next frame image when it does not), the speed of online character recognition can be further increased, the time of the online image recognition process shortened, and the real-time performance of the video analysis process improved.

Returning to FIG. 1, in S130, the seed points determined in S120 may be filled into a character image (for example, a binary image) according to a seed filling algorithm, and the character image is sent to an OCR (Optical Character Recognition) engine, which finally outputs the character text. The seed filling algorithm is also called the boundary filling algorithm; its basic idea is to start from an interior point of a polygonal region and draw points from the inside outward with a given color until the boundary is reached. If the boundary is specified in one color, the seed filling algorithm can proceed pixel by pixel until the boundary color is encountered. Because a video may contain multiple frames with images of different characters (differing, for example, in color, number of words, font, or shape), more than one character text may be output.

Therefore, with the method for recognizing characters of a video according to the embodiments of the present invention, a character model is determined according to the source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video; the characters therefore need not be segmented out of the images of the target video by a segmentation algorithm (for example, connected-domain analysis, graph cut, or K-means clustering), which shortens the time of the online image recognition process and improves the real-time performance of the video analysis process.

The same characters may persist in a video for a certain time (that is, the same characters appear in multiple frame images), and individual words in the characters may be recognized incorrectly because of the accuracy of recognition. Therefore, preferably, as shown in FIG. 3, the method for recognizing characters of a video according to the embodiment of the present invention shown in FIG. 1 may further include, after S130:

S140: Determine, according to the edit distance between the character texts and the number of characters they include, the similarity between the character texts.

S150: Determine, according to the similarity, a character text class, where the character text class includes at least three character texts whose pairwise similarity is less than a first threshold.

S160: Determine, according to the similarity between the character texts included in the character text class, the representative character text of the character text class.

Specifically, in S140, a similarity model can be used to express the relationship between the edit distance between character texts and the number of characters they include; it is expressed by the following formula (2):

$S = 1 - \mathrm{Dis}(cap1, cap2)\,/\,\max(\lvert cap1\rvert, \lvert cap2\rvert)$ (2)

In formula (2), $S$ represents the similarity of the character texts cap1 and cap2, and its value range can be set to [0, 1]; Dis(cap1, cap2) is the edit distance between the character texts cap1 and cap2, which may represent the steps required to convert the character text cap1 into the character text cap2; and |cap1| and |cap2| are the numbers of single words included in cap1 and cap2, respectively.

In S150, the character texts whose pairwise similarity is less than a prescribed threshold (for example, 0.5) are grouped into one class as a character text class; that is, the character texts in the same character text class can be considered identical.

In S160, for each character text class, the similarity between each character text it includes and the other character texts of the same class can be compared and summed, and the character text with the largest sum of similarities is taken as the representative character text of the character text class.

It should be understood that the specific value of the prescribed threshold above is merely one embodiment of the present invention, and the present invention is not limited thereto.

Therefore, through clustering according to the similarity model and the determination of a representative character text, repetitions of the character text can be removed and some of the errors introduced by OCR can be corrected.
The method for recognizing characters of a video according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 3; the apparatus for recognizing characters of a video according to the embodiments of the present invention is described in detail below with reference to FIG. 4 to FIG. 6.

FIG. 4 shows a schematic block diagram of an apparatus for recognizing characters of a video according to an embodiment of the present invention. As shown in FIG. 4, the apparatus may include:

a character model determining module 410, configured to determine a character model according to a source video;

a character pixel determining module 420, configured to determine, according to the character model that is determined by the character model determining module 410 and corresponds to a target video, character pixels that belong to characters of the target video from the pixels included in the target video; and

a character text determining module 430, configured to determine, according to the character pixels determined by the character pixel determining module 420, at least one character text representing the characters.

The apparatus for recognizing characters of a video according to the embodiment of the present invention may correspond to the execution body of the method for recognizing characters of a video of the embodiment of the present invention, and the modules of the apparatus and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the method in FIG. 1; for brevity, details are not repeated here.

FIG. 5 shows a schematic block diagram of the character model determining module 410 according to an embodiment of the present invention. As shown in FIG. 5, the character model determining module 410 may include:

a character region determining unit 411, configured to determine, according to a character region determination parameter, that a first image of the source video includes a first character region;

a connected-domain labeling unit 412, configured to perform a connected-domain labeling operation on the first character region determined by the character region determining unit 411 and determine the connected domains of the first character region;

a connected-domain clustering unit 413, configured to perform a clustering operation on the connected domains determined by the connected-domain labeling unit 412 and determine the connected-domain classes of the first character region;

a training data determining unit 414, configured to determine, according to a training data determination parameter, a training connected-domain class from the connected-domain classes determined by the connected-domain clustering unit 413, and determine the average value of each connected domain included in the training connected-domain class as training data; and

a character model determining unit 415, configured to determine the character model according to the training data determined by the training data determining unit 414.

In the embodiments of the present invention, because the operations performed by the character model determining module 410 and the units it includes can all be performed automatically in an offline manner, no manual intervention is required: the training data can be obtained automatically and the appearance model of the characters can be established in advance, which accelerates online character recognition, shortens the time of the online image recognition process, and improves the real-time performance of the video analysis process.

The character model determining module 410 according to the embodiment of the present invention may correspond to the execution body of the method for determining a character model of the embodiment of the present invention, and the units of the character model determining module 410 and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the method in FIG. 2; for brevity, details are not repeated here.

Optionally, in the embodiments of the present invention, it may further be determined, according to the character region determination parameter, whether a frame image of the target video includes a character region; if a character region is included, all pixels of the character region may be traversed, and if no character region is included, the process can move directly to the next frame image. Therefore, the character pixel determining module 420 is further configured to determine, according to the character region determination parameter, that a second image of the target video includes a second character region; and

to determine, according to the character model that is determined by the character model determining module 410 and corresponds to the target video, the character pixels that belong to the characters of the target video from the pixels included in the second character region.

In the embodiments of the present invention, by judging whether a frame image of the target video includes a character region (all pixels of the character region are traversed when it does, and the process moves directly to the next frame image when it does not), the speed of online character recognition can be further increased, the time of the online image recognition process shortened, and the real-time performance of the video analysis process improved.

With the apparatus for recognizing characters of a video according to the embodiments of the present invention, a character model is determined according to the source video, and the characters of the target video are determined from the pixels of the target video according to the character model corresponding to the target video; the characters therefore need not be segmented out of the images of the target video by a segmentation algorithm (for example, connected-domain analysis, graph cut, or K-means clustering), which shortens the time of the online image recognition process and improves the real-time performance of the video analysis process.

The same characters may persist in a video for a certain time (that is, the same characters appear in multiple frame images), and individual words in the characters may be recognized incorrectly because of the accuracy of recognition. Therefore, as shown in FIG. 6, the apparatus for recognizing characters of a video according to the embodiment of the present invention may further include:

a similarity confirmation module 440, configured to determine, according to the edit distance between the character texts determined by the character text determining module 430 and the number of characters they include, the similarity between the character texts;

a character text class determining module 450, configured to determine, according to the similarity determined by the similarity confirmation module 440, a character text class, where the character text class includes at least three character texts whose pairwise similarity is less than a first threshold; and

a representative character text determining module 460, configured to determine, according to the similarity between the character texts included in the character text class determined by the character text class determining module 450, the representative character text of the character text class.

Therefore, through clustering according to the similarity model and the determination of a representative character text, repetitions of the character text can be removed and some of the errors introduced by OCR can be corrected.

The apparatus for recognizing characters of a video according to the embodiment of the present invention may correspond to the execution body of the method for recognizing characters of a video of the embodiment of the present invention, and the modules of the apparatus and the other operations and/or functions described above are respectively intended to implement the corresponding processes of the methods in FIG. 1 to FIG. 3; for brevity, details are not repeated here.

In the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present invention.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether the functions are executed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.

A person skilled in the art can clearly understand that, for convenience and brevity of description, reference can be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any person skilled in the art can readily think of changes or replacements within the technical scope disclosed by the present invention, and they shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for recognizing characters of a video, wherein the method comprises:
determining a character model according to a source video;
determining, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels comprised in the target video; and
determining, according to the character pixels, at least one character text representing the characters.

2. The method according to claim 1, wherein the method further comprises:
determining, according to the edit distance between the character texts and the number of characters they comprise, the similarity between the character texts;
determining, according to the similarity, a character text class, wherein the character text class comprises at least three character texts whose pairwise similarity is less than a set first threshold; and
determining, according to the similarity between the character texts comprised in the character text class, a representative character text of the character text class.

3. The method according to claim 1 or 2, wherein the determining a character model according to a source video comprises:
determining, according to a character region determination parameter, that a first image of the source video comprises a first character region;
performing a connected-domain labeling operation on the first character region to determine connected domains of the first character region;
performing a clustering operation on the connected domains to determine connected-domain classes of the first character region;
determining, according to a training data determination parameter, a training connected-domain class from the connected-domain classes, and determining the average value of each connected domain comprised in the training connected-domain class as training data; and
determining the character model according to the training data.

4. The method according to claim 3, wherein the training data determination parameter comprises: the number of connected domains comprised in the connected-domain class, the ratio of the area of the connected-domain class to the area of the first character region, and the degree of symmetry of the area of the connected-domain class in the vertical direction.

5. The method according to any one of claims 1 to 4, wherein the determining, according to the character model corresponding to a target video, character pixels that belong to characters of the target video from the pixels comprised in the target video comprises:
determining, according to a character region determination parameter, that a second image of the target video comprises a second character region; and
determining, according to the character model corresponding to the target video, the character pixels that belong to the characters of the target video from the pixels comprised in the second character region.

6. The method according to any one of claims 3 to 5, wherein the character region determination parameter comprises: the ratio of the number of edges comprised in a character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction.

7. An apparatus for recognizing characters of a video, wherein the apparatus comprises:
a character model determining module, configured to determine a character model according to a source video;
a character pixel determining module, configured to determine, according to the character model that is determined by the character model determining module and corresponds to a target video, character pixels that belong to characters of the target video from the pixels comprised in the target video; and
a character text determining module, configured to determine, according to the character pixels determined by the character pixel determining module, at least one character text representing the characters.

8. The apparatus according to claim 7, wherein the apparatus further comprises:
a similarity confirmation module, configured to determine, according to the edit distance between the character texts determined by the character text determining module and the number of characters they comprise, the similarity between the character texts;
a character text class determining module, configured to determine, according to the similarity determined by the similarity confirmation module, a character text class, wherein the character text class comprises at least three character texts whose pairwise similarity is less than a set first threshold; and
a representative character text determining module, configured to determine, according to the similarity between the character texts comprised in the character text class determined by the character text class determining module, a representative character text of the character text class.

9. The apparatus according to claim 7 or 8, wherein the character model determining module comprises:
a character region determining unit, configured to determine, according to a character region determination parameter, that a first image of the source video comprises a first character region;
a connected-domain labeling unit, configured to perform a connected-domain labeling operation on the first character region determined by the character region determining unit and determine connected domains of the first character region;
a connected-domain clustering unit, configured to perform a clustering operation on the connected domains determined by the connected-domain labeling unit and determine connected-domain classes of the first character region;
a training data determining unit, configured to determine, according to a training data determination parameter, a training connected-domain class from the connected-domain classes determined by the connected-domain clustering unit, and determine the average value of each connected domain comprised in the training connected-domain class as training data; and
a character model determining unit, configured to determine the character model according to the training data determined by the training data determining unit.

10. The apparatus according to claim 9, wherein the training data determination parameter comprises: the number of connected domains comprised in the connected-domain class, the ratio of the area of the connected-domain class to the area of the first character region, and the degree of symmetry of the area of the connected-domain class in the vertical direction.

11. The apparatus according to any one of claims 7 to 10, wherein the character pixel determining module is further configured to determine, according to a character region determination parameter, that a second image of the target video comprises a second character region; and
to determine, according to the character model that is determined by the character model determining module and corresponds to the target video, the character pixels that belong to the characters of the target video from the pixels comprised in the second character region.

12. The apparatus according to any one of claims 9 to 11, wherein the character region determination parameter comprises: the ratio of the number of edges comprised in a character region to the area of the character region, the ratio of the number of edges in the horizontal direction to that in the vertical direction, and the degree of symmetry of the number of edges in the vertical direction.
PCT/CN2011/084642 2011-12-26 2011-12-26 Method and apparatus for recognizing characters of a video WO2013097072A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/084642 WO2013097072A1 (zh) 2011-12-26 2011-12-26 Method and apparatus for recognizing characters of a video
CN201280000022.7A CN103493067B (zh) 2011-12-26 2011-12-26 Method and apparatus for recognizing characters of a video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/084642 WO2013097072A1 (zh) 2011-12-26 2011-12-26 Method and apparatus for recognizing characters of a video

Publications (1)

Publication Number Publication Date
WO2013097072A1 (zh)

Family

ID=48696168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/084642 WO2013097072A1 (zh) 2011-12-26 2011-12-26 Method and apparatus for recognizing characters of a video

Country Status (2)

Country Link
CN (1) CN103493067B (zh)
WO (1) WO2013097072A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389115A (zh) * 2017-08-11 2019-02-26 腾讯科技(上海)有限公司 Text recognition method and apparatus, storage medium, and computer device
CN111310413A (zh) * 2020-02-20 2020-06-19 阿基米德(上海)传媒有限公司 Method and apparatus for intelligent segmentation of broadcast program audio based on a program rundown
CN112749690A (zh) * 2020-03-27 2021-05-04 腾讯科技(深圳)有限公司 Text detection method and apparatus, electronic device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156711B (zh) * 2015-04-21 2020-06-30 华中科技大学 Method and apparatus for locating text lines
CN111832554A (zh) * 2019-04-15 2020-10-27 顺丰科技有限公司 Image detection method, apparatus, and storage medium
CN110532983A (zh) * 2019-09-03 2019-12-03 北京字节跳动网络技术有限公司 Video processing method, apparatus, medium, and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101115151A (zh) * 2007-07-10 2008-01-30 北京大学 Method for extracting video captions
CN101261722A (zh) * 2008-01-17 2008-09-10 北京航空航天大学 Intelligent back-end management and automatic enforcement system for electronic police
CN101334836A (zh) * 2008-07-30 2008-12-31 电子科技大学 License plate localization method fusing color, size, and texture features
CN101599124A (zh) * 2008-06-03 2009-12-09 汉王科技股份有限公司 Method and apparatus for segmenting characters from a video image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185329B1 (en) * 1998-10-13 2001-02-06 Hewlett-Packard Company Automatic caption text detection and processing for digital images
CN101615252B (zh) * 2008-06-25 2012-07-04 中国科学院自动化研究所 Adaptive method for extracting text information from images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101115151A (zh) * 2007-07-10 2008-01-30 北京大学 一种视频字幕提取的方法
CN101261722A (zh) * 2008-01-17 2008-09-10 北京航空航天大学 电子警察后台智能管理和自动实施系统
CN101599124A (zh) * 2008-06-03 2009-12-09 汉王科技股份有限公司 一种从视频图像中分割字符的方法和装置
CN101334836A (zh) * 2008-07-30 2008-12-31 电子科技大学 一种融合色彩、尺寸和纹理特征的车牌定位方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389115A (zh) * 2017-08-11 2019-02-26 腾讯科技(上海)有限公司 Text recognition method and apparatus, storage medium, and computer device
CN109389115B (zh) * 2017-08-11 2023-05-23 腾讯科技(上海)有限公司 Text recognition method and apparatus, storage medium, and computer device
CN111310413A (zh) * 2020-02-20 2020-06-19 阿基米德(上海)传媒有限公司 Method and apparatus for intelligent segmentation of broadcast program audio based on a program rundown
CN111310413B (zh) * 2020-02-20 2023-03-03 阿基米德(上海)传媒有限公司 Method and apparatus for intelligent segmentation of broadcast program audio based on a program rundown
CN112749690A (zh) * 2020-03-27 2021-05-04 腾讯科技(深圳)有限公司 Text detection method and apparatus, electronic device, and storage medium
CN112749690B (zh) * 2020-03-27 2023-09-12 腾讯科技(深圳)有限公司 Text detection method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN103493067B (zh) 2018-01-02
CN103493067A (zh) 2014-01-01

Similar Documents

Publication Publication Date Title
WO2018108129A1 (zh) Method and apparatus for recognizing object categories, and electronic device
WO2018103608A1 (zh) Text detection method, apparatus, and storage medium
CN107193962B (zh) Intelligent image-matching method and apparatus for Internet promotional information
CN111967302B (zh) Video tag generation method and apparatus, and electronic device
CN113313022B (zh) Training method for a character recognition model and method for recognizing characters in an image
WO2013097072A1 (zh) Method and apparatus for recognizing characters of a video
KR102576344B1 (ko) Method, apparatus, electronic device, medium, and computer program for processing video
US20210406549A1 Method and apparatus for detecting information insertion region, electronic device, and storage medium
CN110334753B (zh) Video classification method and apparatus, electronic device, and storage medium
CN108734159B (zh) Method and system for detecting sensitive information in an image
CN110502664A (zh) Method for creating a video tag index library, and video tag generation method and apparatus
CN113159010A (zh) Video classification method, apparatus, device, and storage medium
CN109241299B (zh) Multimedia resource search method, apparatus, storage medium, and device
CN112434553A (zh) Video identification method and system based on deep dictionary learning
CN113642584A (zh) Character recognition method, apparatus, device, storage medium, and smart dictionary pen
CN113609892A (zh) Handwritten poetry recognition method fusing deep learning with a scenic-spot knowledge graph
US20160283582A1 Device and method for detecting similar text, and application
CN116645624A (zh) Video content understanding method and system, computer device, and storage medium
CN113780276B (zh) Text recognition method and system combined with text classification
CN114445826A (zh) Visual question answering method and apparatus, electronic device, and storage medium
WO2021114634A1 (zh) Text annotation method, device, and storage medium
CN111680190B (zh) Video thumbnail recommendation method fusing visual semantic information
CN113569119A (zh) System and method for extracting the body text of news web pages based on multimodal machine learning
CN114241490A (zh) Method for improving handwriting recognition model performance based on stroke perturbation and post-processing
CN114996360A (zh) Data analysis method and system, readable storage medium, and computer device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11879152

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11879152

Country of ref document: EP

Kind code of ref document: A1
