WO2001037212A1 - Video stream classifiable symbol isolation method and system - Google Patents
Video stream classifiable symbol isolation method and system
- Publication number
- WO2001037212A1 PCT/EP2000/010730
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- pixels
- text
- regions
- edge
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the present invention is related to one described in United States Patent Application entitled “SYSTEM AND METHOD FOR ANALYZING VIDEO CONTENT USING DETECTED TEXT IN VIDEO FRAMES,” filed August 9, 1999, Ser. No. 09/370,931, which is commonly assigned to the assignee of the present invention and the entirety of which is hereby incorporated by reference as if fully set forth herein.
- the invention is also related to one disclosed in United States Provisional Patent Application No. 60/117,658, filed on January 28, 1999, entitled “METHOD AND APPARATUS FOR DETECTION AND LOCALIZATION OF TEXT IN VIDEO,” which is commonly assigned to the assignee of the present invention.
- the present invention relates to systems that recognize patterns in digitized images and more particularly to such systems that isolate symbols such as text characters in video data streams.
- Real-time broadcast, analog tape, and digital video are important for education, entertainment, and a host of multimedia applications. With the size of video collections being in the millions of hours, technology is needed to interpret video data to allow this material to be used and accessed more effectively.
- Various such enhanced uses have been proposed. For example, the use of text and sound recognition can lead to the creation of a synopsis of an original video and the automatic generation of keys for indexing video content.
- Another range of applications relies on rapid real-time classification of text and/or other symbols in broadcast (or multicast, etc.) video data streams.
- text recognition can be used for any suitable purpose, for example video content indexing.
- OCR document optical character recognition
- the general model for all of these techniques is that an input vector is derived from an image, the input vector characterizing the raw pattern.
- the vector is mapped to one of a fixed number or range of symbol classes to "recognize" the image.
- the pixel values of a bitmap image may serve as an input vector and the corresponding classification set may be an alphabet, for example, the English alphabet.
- No particular technique for pattern recognition has achieved universal dominance.
- Each recognition problem has its own set of application difficulties: the size of the classification set, the size of the input vector, the required speed and accuracy, and other issues.
- reliability is an area that cries out for improvement in nearly every area of application.
- pattern recognition is a field of continuous active research, the various applications receiving varying degrees of attention based on their respective perceived merits, such as utility and practicability.
- OCR optical character recognition
- This technology has developed because of the desirability and practicality of converting printed subject matter to computer-readable characters.
- printed documents offer a data source that is relatively clear and consistent. Such documents are generally characterized by high-contrast patterns set against a uniform background and are storable with high resolution. For example, printed documents may be scanned at arbitrary resolution to form a binary image of the printed characters.
- there is a clear need for such an application of pattern recognition in that the conversion of documents to computer-based text avoids the labor of keyboard transcription, realizes economy in data storage, permits documents to be searched, etc.
- an image processing device and method for classifying symbols relies on a connected-component technique for isolating symbol regions.
- the device and method form connected components from an image derived by the application of an edge detection filter.
- the formation of connected components from this filtered image defines the edges of character shapes.
- the connected components are filtered based on threshold criteria such as area, height, width, aspect ratio, etc. As a result, the number of pixels that must be connected to define each connected component is substantially reduced and the speed of processing thereby increased.
- the application of the method is discussed primarily with respect to text in a video stream.
- a character classifier for text in video streams employs a back propagation neural network (BPNN) whose feature space is derived from size, translation, and rotation invariant shape-dependent features.
- BPNN back propagation neural network
- Such feature spaces are made practicable by the accurate isolation of character regions using the above technique. Examples of such feature spaces include regular and invariant moments and an angle histogram derived from a Delaunay triangulation of the thinned, thresholded, character.
- Such feature spaces provide a good match to BPNN as a classifier because of the poor resolution of characters in video streams.
- the ability to detect and classify text appearing in video streams has many uses. For example, video sequences and portions thereof, can be characterized and indexed according to classifications derived from such text.
- recognition of text in a video stream can permit the presentation of context-sensitive features such as an invokable link to a web site generated in response to the appearance of a web address in a broadcast video stream.
- Text in video presents a very different problem set from that of document OCR, which is a well-developed, but still maturing technology. Text in documents tends to be uni-colored and high quality.
- scaled-down scene images may contain noise and uncontrolled illumination. Characters appearing in video can be of varying color, sizes, fonts, orientation, thickness, backgrounds can be complex and temporally variant, etc. Also, many applications for video symbol recognition require high speed.
- the technique employed by the invention for classifying video text employs an accurate high speed technique for symbol isolation.
- the symbol bitmap is then used to generate a shape-dependent feature vector, which is applied to a BPNN.
- the feature vector provides greater emphasis on overall image shape while being relatively insensitive to the variability problems identified above.
- connected component structures are defined based on the edges detected. Since edge detection produces far fewer pixels overall than binarizing the entire field occupied by a symbol, the process of generating connected components can be much more rapid.
- the selection of feature space also enhances recognition speed. With simulated BPNNs the size of the input vector can seriously affect throughput. It is very important to be selective with regard to the components used from the selected feature space. Of course, heterogeneous feature spaces may be formed by combining mixes of different features such as moments and line-segment features. Also, computational economies may be realized where the selected features share computational steps.
- FIG. 1 is a diagram illustrating machinery that may be used to implement the invention.
- FIG. 2 is a flowchart showing a character classification method according to an embodiment of the invention.
- FIGS. 3A and 3B illustrate text regions in a video screen that contain information classifiable according to an embodiment of the invention.
- FIG. 4A shows the appearance of a text segment from a captured digital image of a video frame.
- FIG. 4B shows the text segment after edge detection filtering.
- FIG. 4C illustrates the effect of several stages of filtering within or prior to edge detection, noting that these may not actually show intermediate results but are shown for purposes of illustrating concepts associated with the invention.
- FIGS. 5A and 5B illustrate the effect of edge filtering according to an embodiment of the invention.
- FIG. 5C illustrates an example of a gap-closing algorithm that can be used in the invention.
- FIGS. 6A-6D illustrate a technique for text line segmentation according to an embodiment of the invention.
- FIGS. 7A and 7B are flow diagrams illustrating a technique for the creation and management of connected components according to a filtering process of an embodiment of the invention.
- FIG. 8 is a flowchart illustrating a character classification method according to an embodiment of the invention.
- FIGS. 9A-9D illustrate the filtering of a segmented character to derive a feature vector precursor.
- FIGS. 10A and 10B illustrate Delaunay triangulation and Voronoi diagram stages in an image filtering step in a character classification process according to an embodiment of the invention.
- FIGS. 11A and 11B illustrate an angle histogram-type feature space according to an embodiment of the invention.
- an image text analysis system 100 employs a video processing device 110, video source 180, and possibly, monitor 185 to receive video input and generate and store character information embedded in it.
- Video processing device 110 receives video images, parses frames, isolates text areas and character regions, and classifies the text and/or character regions according to procedures discussed in detail below.
- Video is supplied from the video source 180.
- Video source 180 can be any source of video data including a VCR with an analog-digital converter (ADC), a disk with digitized video, a cable box with an ADC, a DVD or CD-ROM drive, digital video home system (DVHS), digital video recorder (DVR), hard disk drive (HDD), etc.
- ADC analog-digital converter
- DVHS digital video home system
- DVR digital video recorder
- HDD hard disk drive
- Video source 180 may be capable of providing a few short clips or multiple clips, including longer length digitized video images. Video source 180 may provide video data in any analog or digital format, such as MPEG-2, MJPEG.
- Video processing device 110 may include image processor 120, RAM 130, storage 140, user I/O card 150, video card 160, I/O buffer 170, and processor bus 175. Processor bus 175 transfers data between the various elements of video processing device 110.
- RAM 130 further comprises image text work space 132 and text analysis controller 134.
- Image processor 120 provides overall control for video processing device 110 and performs the image processing required for image text analysis system 100, including analyzing text in video frames based upon system-selected and user-selected attributes.
- This also includes implementing editing processes, processing digitized video images for display on monitor 185 and/or storage in storage 140, and transferring data between the various elements of image text analysis system 100.
- The requirements and capabilities for image processor 120 are well known in the art and need not be described in greater detail, other than as required for the present invention.
- RAM 130 provides random access memory for temporary storage of data produced by video processing device 110, which is not otherwise provided by components within the system.
- RAM 130 includes memory for image text work space 132 and text analysis controller 134, as well as other memory required by image processor 120 and associated devices.
- Image text work space 132 represents the portion of RAM 130 in which video images associated with a particular video clip are temporarily stored during the text analysis process. Image text work space 132 allows copies of frames to be modified without affecting the original data, so that the original data may later be recovered.
- text analysis controller 134 represents the portion of RAM 130 dedicated to storage of an application program executed by image processor 120 that performs the analysis of video images on the basis of system- or user-defined text attributes.
- Text analysis controller 134 may execute well-known editing techniques, such as morphing or boundary detection between scenes, as well as the novel techniques for video text recognition associated with the present invention.
- Text analysis controller 134 may also be embodied as a program on a CD-ROM, computer diskette, or other storage media that may be loaded into a removable disk port in storage 140 or elsewhere, such as in video source 180.
- Storage 140 comprises one or more disk systems, including removable disks
- storage 140 may be configured to interface with one or more bi-directional buses for the transfer of video and audio data to and from video source(s) 180, as well as the rest of the system. Storage 140 is capable of transferring data at video rates, as required. Storage 140 is sized to provide adequate storage for several minutes of video for editing purposes, including text attribute analysis. Depending upon specific applications and the capability of image processor 120, storage 140 may be configured to provide capability for storage of a large number of video clips.
- User I/O card 150 may interface various user device(s) (not shown) to the rest of image text analysis system 100.
- User I/O card 150 converts data received from the user devices to the format of interface bus 175 for transfer to image processor 120 or to RAM 130 for subsequent access by image processor 120.
- User I/O card 150 also transfers data to user output devices such as printers (not shown).
- Video card 160 provides an interface between monitor 185 and the rest of video processing device 110 through data bus 175.
- I/O buffer 170 interfaces between video source 180 and the rest of image text analysis system 100 through bus 175.
- video source 180 has at least one bi-directional bus to interface with I/O buffer 170.
- I/O buffer 170 transfers data to/from video source 180 at the required video image transfer rate.
- I/O buffer 170 transfers data received from video source 180 to storage 140, to image processor 120, or to RAM 130, as required. Simultaneous transfer of video data to image processor 120 provides means for displaying video images as they are received.
- a text extraction and recognition operation 100 (as outlined in FIG. 2) can be performed by the video processing device 110 or any other suitable device on a video sequence containing text, such as illustrated in FIGS. 3A and 3B.
- Individual frames 305 are subjected to the procedure outlined in FIG. 2 to result in an isolation of individual text regions such as 310, 315, 360, 365, 370, and 375. Note that the procedure can be applied to an integral of multiple frames integrated to reduce the complexity of the background and increase the clarity of the text.
- image processor 120 may separate colors of one or more frames of the video image and store a reduced color image for use in extracting text.
- image processor 120 uses a red-green-blue (RGB) color space model to isolate the red component of the pixels.
- RGB red-green-blue
- An example of how a text portion of a frame might look is shown in FIG. 4A. The red component is often the most useful for detecting white, yellow, and black colors, which are predominantly used for video text.
- the isolated red frame provides sharp, high-contrast edges for the common text colors.
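A minimal sketch of isolating the red plane, assuming a frame is held as rows of `(r, g, b)` tuples (the function name `red_channel` and this data layout are illustrative assumptions, not part of the described system). It shows why the red plane separates the common text colors: white and yellow keep a high red value while black does not.

```python
def red_channel(frame):
    """Return the red plane of an RGB frame given as rows of (r, g, b) tuples."""
    return [[pixel[0] for pixel in row] for row in frame]


# Hypothetical 2x2 frame: white and yellow text pixels, black text, blue background.
frame = [
    [(255, 255, 255), (255, 255, 0)],  # white, yellow
    [(0, 0, 0), (10, 20, 200)],        # black, blue
]
red = red_channel(frame)
print(red)
```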
- the current method may also be used to extract text that is not overlaid on the video but is actually part of it such as a film sequence that dwells on a billboard or street sign.
- the red frame may not be the best to use.
- a gray scale alpha channel
- image processor 120 may use various color space models, such as the gray scale image or the Y component of a YIQ video frame, etc.
- the isolated frame image is stored in image text work space 132.
- the captured image may be sharpened.
- the following 3x3 mask could be used: $\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$
- each pixel is the sum of eight times itself plus the negative of each of its neighbors.
- the above matrix representation for bitmap filters is a common notation in the art. There are many such derivative filters that are known in the art and invention contemplates the use of any of a variety of different techniques for isolating text regions. The above is merely a very simple example.
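The sharpening step can be sketched as a plain 3x3 mask applied to every interior pixel. The mask values follow the description above (eight times the pixel plus the negative of each neighbor); the function name `apply_mask`, the list-of-lists image layout, and leaving border pixels at zero are assumptions made for illustration.

```python
SHARPEN_MASK = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def apply_mask(image, mask):
    """Apply a 3x3 mask to a 2-D list of intensities.

    Border pixels have no full neighbourhood and are left at 0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            out[m][n] = sum(
                mask[i + 1][j + 1] * image[m + i][n + j]
                for i in (-1, 0, 1)
                for j in (-1, 0, 1)
            )
    return out


flat = [[5] * 3 for _ in range(3)]          # uniform region: no response
spot = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]   # isolated bright pixel: strong response
print(apply_mask(flat, SHARPEN_MASK)[1][1], apply_mask(spot, SHARPEN_MASK)[1][1])
```

Because the mask coefficients sum to zero, uniform regions map to zero and only local intensity changes survive.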
- the filtering step can include multiple passes, for example gradient detection along one dimension followed by gradient detection along the other dimension (while simultaneously smoothing in the respective orthogonal directions) followed by addition of the two filtering results.
- random noise may be reduced using, for example, a median filter as described by R.C. Gonzalez and R.E. Woods in "Digital Image Processing," Addison-Wesley Publishing Company, Inc. (1992).
- Edge detection may employ another edge filter. Through this filter, the edges in the sharpened (red, gray-scale, etc.) image may be (and preferably are) amplified and non-edges attenuated, using, for example, the following edge mask. -1 -1 -1
- each pixel is the sum of the above respective coefficients (weights) applied to itself and the neighboring pixels.
- the results of the previous filtering steps are illustrated in FIG. 4C.
- the original image 163 is edge-filtered to result in a differential image 164, which is then edge-enhanced to result in a final image 165 that is subjected to the following filtering.
- a threshold edge filter or "edge detector" is applied.
- If $Edge_{m,n}$ represents the (m,n) pixel of an M×N edge image and $F_{m,n}$ the enhanced image resulting from step S210, the following equation may be used for edge detection: Equation 1
- the values $w_{i,j}$ are weights from the edge mask. The outermost pixels may be ignored in the edge detection process. Note, again, that the sharpening filter may also be applied implicitly in this thresholding operation.
- the edge threshold $L_{edge}$ is a pre-determined threshold value, which may be a fixed value or a variable value. The use of a fixed threshold may result in excessive salt and pepper noise and cause discontinuities in the fixed edges around the text. Known methods of opening (e.g., erosion followed by dilation) result in loss of parts of text.
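A minimal sketch of the fixed-threshold edge detector, assuming Equation 1 takes the form of a 3x3 weighted sum of the enhanced image that is compared against the edge threshold. The weights in `EXAMPLE_WEIGHTS` are placeholders (the edge mask values are not fully given in this text), and the rule "edge when the sum exceeds the threshold" is likewise an assumption.

```python
# Placeholder weights; the actual edge mask may differ.
EXAMPLE_WEIGHTS = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def detect_edges(enhanced, weights, l_edge):
    """Mark interior pixels whose weighted 3x3 sum exceeds l_edge.

    The outermost pixels are ignored and stay 0.
    """
    rows, cols = len(enhanced), len(enhanced[0])
    edges = [[0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            response = sum(
                weights[i + 1][j + 1] * enhanced[m + i][n + j]
                for i in (-1, 0, 1)
                for j in (-1, 0, 1)
            )
            if response > l_edge:
                edges[m][n] = 1
    return edges


# Vertical step from dark (0) to bright (10) between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = detect_edges(image, EXAMPLE_WEIGHTS, l_edge=20)
print(edges[2])
```

With these placeholder weights only the bright side of the step produces a positive response, so a single column of edge pixels is marked.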
- An adaptive threshold edge filter, one with a variable threshold, ameliorates these tendencies and is a great improvement over the use of a static threshold.
- in one mode of adjusting the edge detection threshold (step S220), after a first fixed threshold is applied using the edge detector, the local threshold for any pixels neighboring (within a specified tolerance) edge pixels identified in the fixed threshold step is lowered, and the filter reapplied.
- the latter effect may as easily be accomplished by applying to the result of the threshold step, a smoothing function (assuming the result is stored with a pixel depth greater than two), and then thresholding again. This would cause pixels, marked as non-edges, to become marked as edges.
- the degree of threshold-lowering for a pixel preferably depends on the number of neighboring pixels marked as edges. The rationale behind this is that when neighboring pixels are edges, it is more likely that the current pixel is an edge. The edge pixels resulting from the lowering of their local threshold are not used for calculating the reduced threshold for neighboring pixels.
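The adaptive threshold can be sketched as a two-pass operation over precomputed edge-filter responses. The linear lowering rule (`step` per neighboring first-pass edge pixel) and the 8-pixel neighborhood are assumptions chosen for illustration; the text only requires that the lowering depend on the number of neighboring edge pixels and that second-pass edges not feed back into the count.

```python
def adaptive_edges(response, l_edge, step):
    """Two-pass thresholding of a 2-D list of edge-filter responses.

    Pass 1 applies the fixed threshold l_edge. Pass 2 lowers the threshold
    of each remaining pixel by `step` for every first-pass edge pixel among
    its 8 neighbours. Only first-pass edges are counted.
    """
    rows, cols = len(response), len(response[0])
    first = [[1 if response[m][n] > l_edge else 0 for n in range(cols)]
             for m in range(rows)]
    result = [row[:] for row in first]
    for m in range(rows):
        for n in range(cols):
            if first[m][n]:
                continue
            neighbours = sum(
                first[m + i][n + j]
                for i in (-1, 0, 1)
                for j in (-1, 0, 1)
                if (i, j) != (0, 0) and 0 <= m + i < rows and 0 <= n + j < cols
            )
            if neighbours and response[m][n] > l_edge - step * neighbours:
                result[m][n] = 1
    return result


# A weak response (30) flanked by two strong ones (50) is promoted to an edge.
responses = [[50, 30, 50], [0, 0, 0]]
print(adaptive_edges(responses, l_edge=40, step=6))
```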
- a fixed threshold value may be used with a low-pass weighting function to ensure that single or small numbers of non-edge pixels surrounded by strong edge pixels (pixels that have a high gradient) are marked as edge pixels.
- all the steps S210 - S220 described above can be described by a single numerical operation in the form of Equation 1, but with wider ranges on the summations. Their separation into distinct steps should not be considered necessary or limiting and may depend on particulars of the computational equipment and software as well as other considerations.
- image processor 120 performs preliminary edge filtering to remove image regions that do not contain text or in which text cannot reliably be detected. For example, frames with an extreme paucity of edges, a very low edge density (number of edge pixels per unit area), or low degree of aggregation of edge pixels (that is, they do not form long-range structures, e.g., noise) may be excluded from further processing.
- Image processor 120 may perform edge filtering at different levels. For instance, edge filtering may be performed at a frame level or a sub-frame level. At the frame level, image processor 120 may ignore a frame if more than a reasonable fraction of the frame appears to be composed of edges. Alternatively, filter functions such as spectral analysis can be applied to determine if the frame is likely to have too many edges. This could result from a high density of strong-edge objects in the frame. The assumption is that overly complex frames contain a high proportion of non-character detail and that it would be disproportionately burdensome to filter it through character classification.
- When frame-level filtering is used, image processor 120 maintains an edge counter to determine the number of edge pixels in the image frame. This, however, can lead to the skipping and ignoring of frames that contain intelligible text, such as frames with noisy portions as well as portions with intelligible text. To avoid the exclusion of such image frames or sub-frames, image processor 120 may perform edge filtering at a sub-frame level. To do this, image processor 120 may divide the frame into smaller areas, for example, into three groups of pixel columns and three groups of pixel rows.
- image processor 120 determines the number of edges in each sub-frame and sets its associated counter accordingly. If a subframe has more than a predetermined number of edges, the processor may abandon that subframe.
- the predetermined maximum edge count per region may be set according to the amount of time required to process the image region or the probability that their size relative to the pixel density would render the accuracy of recognition below a desired minimum. A greater number of sub-frames may be utilized to ensure against missing smaller regions of clean text surrounded by regions identified as uninterpretable.
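Sub-frame edge filtering can be sketched as a per-cell edge count over a 3x3 grid, followed by a test against a maximum count. The function names, the integer mapping of pixels to grid cells, and the example limit are assumptions for illustration.

```python
def subframe_edge_counts(edges, grid=3):
    """Count edge pixels in each cell of a grid x grid partition of the frame."""
    rows, cols = len(edges), len(edges[0])
    counts = [[0] * grid for _ in range(grid)]
    for m in range(rows):
        for n in range(cols):
            counts[m * grid // rows][n * grid // cols] += edges[m][n]
    return counts


def keep_subframes(counts, max_edges):
    """True for sub-frames worth processing, False for those to abandon."""
    return [[count <= max_edges for count in row] for row in counts]


# 6x6 edge map: a dense 2x2 block in the top-left cell, one edge in the bottom-right.
edge_map = [[0] * 6 for _ in range(6)]
for m in (0, 1):
    for n in (0, 1):
        edge_map[m][n] = 1
edge_map[5][5] = 1

counts = subframe_edge_counts(edge_map)
print(counts)
print(keep_subframes(counts, max_edges=3))
```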
- In step S225, image processor 120 performs a connected component (CC) analysis on edges generated in the previous step.
- CC connected component
- This analysis groups all edge pixels that are contiguous within a specified tolerance. That is, every edge pixel that is adjacent, or within a certain distance of another edge pixel, is merged together with that pixel.
- this merging process defines structures, or connected components each having a contiguous or near-contiguous set of edge pixels. The motivation for this is that each text character region is assumed to correspond to a single CC.
- the tolerance may be set to any suitable value depending on the resolution of the image capture, the degree of upsampling (the proportion of pixels added by interpolation from the original image) or downsampling (the proportion of pixels removed from the original image).
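A minimal sketch of connected component grouping with a distance tolerance, using a breadth-first search over a binary edge map. Measuring the tolerance as a Chebyshev distance (a square window of half-width `tol`) is an assumption; the text only requires that pixels adjacent to, or within a certain distance of, another edge pixel be merged.

```python
from collections import deque


def connected_components(edges, tol=1):
    """Group edge pixels that lie within `tol` pixels of each other.

    Returns a list of components, each a sorted list of (row, col) tuples.
    """
    rows, cols = len(edges), len(edges[0])
    seen = set()
    components = []
    for m in range(rows):
        for n in range(cols):
            if not edges[m][n] or (m, n) in seen:
                continue
            seen.add((m, n))
            queue = deque([(m, n)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in range(-tol, tol + 1):
                    for dx in range(-tol, tol + 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edges[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(sorted(component))
    return components


row = [[1, 1, 0, 1, 0, 0, 1]]
print(len(connected_components(row, tol=1)))
print(len(connected_components(row, tol=2)))
```

Raising the tolerance bridges the one-pixel gap between columns 1 and 3 but not the two-pixel gap before column 6.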
- inadvertent gaps or breaks between CCs corresponding to contiguous characters may appear as a result of edge detection with a fixed threshold. For example, breaks such as shown at 171 or 172, can occur.
- the use of the edge-detection scheme described helps to ensure the merging of such broken CC portions.
- the CC merging method results in the points in the breaks 174, 175, and 176 being identified as edge points and being merged into the single connected component structures at 181 and 182, respectively.
- the closing of "bad" breaks in connected regions can be accomplished by various mechanisms in addition to the particular method described above. For example, dilation could be applied after erosion or thinning.
- the gray scale depth of the binarized thresholded image resulting from the application of Equation 1 could be increased and then a smoothing function could be applied and thresholding (Equation 1) performed again.
- There are many image processing techniques that could be used to accomplish the desired closing effect.
- Still another alternative is to mark pixels as edges when they are substantially surrounded by edge pixels in a contiguous series such as illustrated in FIG. 5C. That is, each of the 24 cases illustrated is a pixel with its neighborhood of eight pixels. In each of these cases, the neighborhood has 5 or more edge pixels in a contiguous series.
- the number in the contiguous series could be changed or special cases added to the group as well.
- the size of the matrices could be increased.
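The neighborhood rule can be sketched by walking the eight neighbors of a non-edge pixel in circular order and measuring the longest contiguous run of edge pixels. The ring ordering, the function names, and treating the run as circular (wrapping from the last neighbor back to the first) are assumptions; the set of cases actually drawn in FIG. 5C may differ.

```python
# The 8 neighbours in clockwise order, starting at the top-left.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def longest_circular_run(bits):
    """Length of the longest run of truthy values, allowing wrap-around."""
    if all(bits):
        return len(bits)
    best = run = 0
    for bit in bits + bits:  # doubling the list handles wrap-around runs
        run = run + 1 if bit else 0
        best = max(best, run)
    return best


def close_gaps(edges, min_run=5):
    """Mark interior non-edge pixels ringed by >= min_run contiguous edge pixels."""
    rows, cols = len(edges), len(edges[0])
    out = [row[:] for row in edges]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            if edges[m][n]:
                continue
            ring = [edges[m + dy][n + dx] for dy, dx in RING]
            if longest_circular_run(ring) >= min_run:
                out[m][n] = 1
    return out


hooked = [[1, 1, 1], [1, 0, 1], [0, 0, 0]]    # 5 contiguous neighbours
parallel = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]  # 6 neighbours, but in two runs of 3
print(close_gaps(hooked)[1][1], close_gaps(parallel)[1][1])
```

The second case shows that the count of edge neighbors alone is not enough; a pixel between two parallel strokes is left as a genuine break.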
- the type of pixels favored to be marked as edges by an algorithm such as defined with respect to FIG. 5C are those where a pixel is deemed less likely to be part of a continuous break. A similar result may be obtained by closing
- the CC is a set of pixels determined to form a contiguous series with no non-edge pixels dividing one portion from another.
- a list is made of each CC, which contains the coordinate of the leftmost, rightmost, topmost, and bottommost pixels in the structure, along with an indication of the location of the structure, for example, the coordinates of the center of the structure.
- Also stored can be the number of pixels that form the connected component structure. Note that the pixel count represents the area of the particular connected component structure.
- Predetermined system and/or user thresholds may be used to define the maximum and minimum limits for area, height and width of the connected component structure to determine which connected component structures to pass on to the next processing stage.
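A sketch of the per-component record and the threshold filter, assuming each component is a list of `(row, col)` pixels. The dictionary field names and the example limits are hypothetical; the stored quantities (extreme coordinates, center, pixel count) follow the description above.

```python
def describe(component):
    """Summarise a connected component given as a list of (row, col) pixels."""
    ys = [y for y, _ in component]
    xs = [x for _, x in component]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    return {
        "left": left, "right": right, "top": top, "bottom": bottom,
        "center": ((left + right) / 2, (top + bottom) / 2),  # (x, y)
        "area": len(component),  # pixel count stands in for area
        "height": bottom - top + 1,
        "width": right - left + 1,
    }


def passes(cc, limits):
    """Check area, height and width against (minimum, maximum) limits."""
    return all(limits[key][0] <= cc[key] <= limits[key][1]
               for key in ("area", "height", "width"))


# Hypothetical limits; real values would depend on resolution and font sizes.
LIMITS = {"area": (3, 100), "height": (2, 50), "width": (1, 50)}

stroke = describe([(0, 0), (0, 1), (1, 0), (2, 0)])
speck = describe([(5, 5)])
print(stroke["area"], stroke["height"], stroke["width"], passes(stroke, LIMITS))
print(passes(speck, LIMITS))
```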
- the last step is a filter to determine if a CC may qualify as a character or not.
- Other heuristics can be used to assemble CCs too small to meet CC heuristics by themselves or to split ones that are too large.
- image processor 120 sorts the connected components satisfying the criteria in the previous steps in ascending order based on the location of the bottom left pixel.
- Image processor 120 sorts on the basis of the pixel coordinate. The sorted list of connected components is traversed to determine which CCs form blocks ("boxes") of text.
- Image processor 120 assigns the first CC to the first box and also as the initial or current box for analysis.
- Image processor 120 tests each subsequent CC to see if its bottommost pixel lies on the same horizontal line (or a nearby one) as the corresponding pixel of the first CC. If it does, it is assumed to belong to the same line of text; that is, it is added to the current text box if its vertical location is close to that of the current CC.
- the vertical coordinate difference threshold may be fixed or variable.
- the closeness of the horizontal coordinate of the second CC is a function of the height of the CCs.
- the horizontal distance of the candidate new addition to the current text box is also tested to see if it lies within an acceptable range.
- Otherwise, a new text box is generated with the failing CC marked as its first element. This process may result in multiple text boxes for a single line of text in the image.
- If the next connected component in a series has a substantially different vertical coordinate, or a horizontal coordinate that is lower than that of the last CC, the current text box may be closed at the end of the horizontal traverse and a new one started.
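The grouping of sorted CCs into text boxes can be sketched as a single pass that compares each CC with the last member of the current box. Sorting by `(bottom, left)`, the vertical tolerance `v_tol`, and the horizontal gap limit of `gap_factor` times the CC height are illustrative assumptions; the text only states that the horizontal closeness is a function of CC height.

```python
def group_into_boxes(ccs, v_tol=2, gap_factor=1.5):
    """Group CC bounding boxes (dicts with left/right/top/bottom) into text boxes."""
    ordered = sorted(ccs, key=lambda c: (c["bottom"], c["left"]))
    boxes = []
    for cc in ordered:
        if boxes:
            last = boxes[-1][-1]
            height = last["bottom"] - last["top"] + 1
            same_line = abs(cc["bottom"] - last["bottom"]) <= v_tol
            gap = cc["left"] - last["right"]
            # A negative gap means the CC starts left of the previous one.
            if same_line and 0 <= gap <= gap_factor * height:
                boxes[-1].append(cc)
                continue
        boxes.append([cc])  # the failing CC starts a new box
    return boxes


ccs = [
    {"left": 0, "right": 4, "top": 0, "bottom": 9},
    {"left": 7, "right": 11, "top": 0, "bottom": 9},    # close: same box
    {"left": 40, "right": 44, "top": 0, "bottom": 9},   # same line, too far
    {"left": 0, "right": 4, "top": 20, "bottom": 29},   # next line
]
print([len(box) for box in group_into_boxes(ccs)])
```

The third CC shows how one line of text can end up in more than one box, which the second merging level is meant to repair.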
- For each box, image processor 120 then performs a second level of merging for each of the text boxes created by the initial character merging process. This merges text boxes that might have been erroneously interpreted as separate lines of text and therefore placed in separate boxes.
- Image processor 120 compares each box to the text boxes following it for a set of conditions.
- the multiple test conditions for two text boxes are: a) The bottom of one box is within a specified vertical-spacing of the other, the spacing corresponding to an expected line spacing. Also, the horizontal spacing between the two boxes is less than a variable threshold based on the average width of characters in the first box. b) The center of either of the boxes lies within the area of the other text box, or c) The top of the first box overlaps with the bottom of the second text box and the left or right side of one box is within a few pixels of the left or right side of the other, respectively.
- If the test conditions are met, image processor 120 deletes the second box from the list of text boxes and merges it into the first box. Image processor 120 repeats the process until all text boxes are tested relative to each other and combined as far as possible.
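One possible reading of the three merge conditions can be sketched as below. Several details are assumptions: condition a) is taken to compare the bottoms of the two boxes, the horizontal threshold is `width_factor` times the average character width of the first box, and `side_tol` stands in for "a few pixels". Boxes are dictionaries with `left`, `top`, `right`, `bottom`, with the vertical coordinate increasing downward.

```python
def center(box):
    return ((box["left"] + box["right"]) / 2, (box["top"] + box["bottom"]) / 2)


def contains(box, point):
    x, y = point
    return box["left"] <= x <= box["right"] and box["top"] <= y <= box["bottom"]


def should_merge(first, second, line_spacing, avg_char_width,
                 width_factor=1.0, side_tol=3):
    """Evaluate conditions a), b) and c); any one of them triggers a merge."""
    gap = max(first["left"], second["left"]) - min(first["right"], second["right"])
    cond_a = (abs(first["bottom"] - second["bottom"]) <= line_spacing
              and gap < width_factor * avg_char_width)
    cond_b = contains(second, center(first)) or contains(first, center(second))
    cond_c = (first["top"] <= second["bottom"] <= first["bottom"]
              and (abs(first["left"] - second["left"]) <= side_tol
                   or abs(first["right"] - second["right"]) <= side_tol))
    return cond_a or cond_b or cond_c


def merge(first, second):
    """Bounding box covering both inputs."""
    return {
        "left": min(first["left"], second["left"]),
        "top": min(first["top"], second["top"]),
        "right": max(first["right"], second["right"]),
        "bottom": max(first["bottom"], second["bottom"]),
    }


a = {"left": 0, "top": 0, "right": 50, "bottom": 10}
b = {"left": 53, "top": 0, "right": 90, "bottom": 10}      # same line, 3 px gap
far = {"left": 300, "top": 200, "right": 350, "bottom": 210}
print(should_merge(a, b, line_spacing=4, avg_char_width=5))
print(should_merge(a, far, line_spacing=4, avg_char_width=5))
print(merge(a, b))
```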
- In step S235, image processor 120 accepts the text boxes obtained from step 235 as text lines if they conform to specified constraints of area, width and height. For each of the text boxes, image processor 120 extracts the sub-image corresponding to the text box from the original image. Image processor 120 then binarizes the subimage in preparation for character recognition. That is, the color depth is decreased to 2, with thresholding set to a value that ensures the characters are properly set off from the background. This is a difficult problem and it may involve a number of steps, such as integrating multiple frames to simplify a complex background.
- the threshold for binarizing the image can be determined as follows.
- Image processor 120 modifies the text box image by calculating the average grayscale value of the pixels in the text box (AvgFG). This is used as the threshold for binarizing the image. Also calculated is the average grayscale value of a region (for example, 5 pixels) around the text box (AvgBG).
- the subimage is binarized by marking anything above AvgFG as white and anything below AvgFG as black.
- the average for the pixels marked as white, Avg1, is calculated along with the average for the pixels marked as black, Avg2.
- image processor 120 compares Avg1 and Avg2 to AvgBG.
- the region that has an average closer to AvgBG is assigned as the background and the other region is assigned as the foreground (or text). For example, if the black region average is closer to AvgBG, the black region is converted to white and vice versa. This assures that the text is always a consistent value for input to an OCR program.
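The binarization and polarity fix can be sketched as follows, assuming the text box is a 2-D list of grayscale values and the surrounding region is supplied as a flat list of samples. The function name, the output encoding (0 for text, 1 for background), and treating values equal to the threshold as black are assumptions.

```python
def binarize_text_box(box, border):
    """Binarize a text box so that text is always 0 and background is 1.

    box:    2-D list of grayscale values inside the text box.
    border: flat list of grayscale values sampled around the text box.
    """
    pixels = [value for row in box for value in row]
    avg_fg = sum(pixels) / len(pixels)   # AvgFG, used as the threshold
    avg_bg = sum(border) / len(border)   # AvgBG, from the surrounding region

    white = [v for v in pixels if v > avg_fg]
    black = [v for v in pixels if v <= avg_fg]
    avg1 = sum(white) / len(white) if white else avg_fg  # mean of white class
    avg2 = sum(black) / len(black) if black else avg_fg  # mean of black class

    binary = [[1 if v > avg_fg else 0 for v in row] for row in box]
    # If the black class resembles the surroundings it is the background,
    # so swap the classes to keep the background white and the text black.
    if abs(avg2 - avg_bg) < abs(avg1 - avg_bg):
        binary = [[1 - bit for bit in row] for row in binary]
    return binary


light_on_dark = [[20, 200, 20], [20, 210, 20]]
dark_on_light = [[230, 30, 230]]
print(binarize_text_box(light_on_dark, border=[15, 25, 20, 18]))
print(binarize_text_box(dark_on_light, border=[240, 235]))
```

Both polarities end up with the character strokes encoded as 0, which gives a downstream recognizer a consistent input.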
- Image processor 120 subsequently stores the extracted frame text in image text work space 132 and the process continues with the next frame at process step 205. Note that, prior to local thresholding, a super-resolution step can be performed to enhance the text resolution. Next, individual character regions must be isolated before classification can be done.
- Various heuristics may be used, for example, ratios of character height to width, ceilings and thresholds for height and width, etc. These heuristics generally fall into the category of predictions of permissible values for various dimensional features.
- Connected components may fail to correspond to a character because of a lack of clarity in the original text.
- Another tool may be used for partitioning the characters along a horizontal line.
- One example is a vertical projection 425 that is a function of the horizontal coordinate and whose value is proportional to the number (and possibly also, the gray-scale value, as illustrated) of foreground pixels in a vertical column coinciding with the x-coordinate and contained within the current text box. That is, the vertical column over which the pixels are integrated does not exceed the size of the text box so only the current row of characters is measured this way.
- This "Gray-scale" vertical projection 425 may also be weighted by a window function 425 whose width is proportional to an expected width for the next character in a sequence.
- The result of weighting by the window function 425 is illustrated at 420.
- The minimum projection values may be used to define the left and right edges of the character.
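A minimal sketch of the projection idea follows. It assumes the text box is a 2-D list in which 0 means background and larger values mean stronger foreground, and it replaces the window function with a simple rectangular search window around the expected character width; the names `vertical_projection` and `find_cut` and the `tolerance` parameter are hypothetical.

```python
def vertical_projection(box, weighted=True):
    """Sum foreground pixels column by column within one text box.

    With weighted=True the gray-scale values are summed; otherwise each
    foreground pixel simply counts as 1.
    """
    width = len(box[0])
    return [
        sum((row[x] if weighted else int(row[x] > 0)) for row in box)
        for x in range(width)
    ]

def find_cut(projection, start, expected_width, tolerance=0.5):
    """Return the column with the smallest projection value inside a
    window placed around start + expected_width, or None if the window
    falls outside the text box. The rectangular window is an assumption
    standing in for the window function mentioned in the text."""
    lo = start + max(1, int(expected_width * (1 - tolerance)))
    hi = min(len(projection) - 1, start + int(expected_width * (1 + tolerance)))
    if lo > hi:
        return None
    return min(range(lo, hi + 1), key=lambda x: projection[x])
```

For two touching characters, the column returned by `find_cut` serves as the right edge of the first character and the left edge of the next.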
- A method for isolating the character regions starts with a first CC and proceeds sequentially through a text box. Beginning at step S310, a first, or next, CC is selected. At step S312 the selected CC is tested against dimensional heuristics to see if the CC meets them. The heuristic tests on the CC may indicate that the CC is unlikely to be a full character or that it is too large and likely includes more than one character. If the CC is found to be too big in step S314, an alternative method of partitioning the characters is applied in step S316, for example, the gray-scale projection described above.
- If the CC is found to be too small in step S322, the next CC is tested against the heuristics in step S318. If this shows, in step S320, that the following CC is too small also, then the current and following CCs are merged in step S326 and flow proceeds back to step S310 until all the character regions have been isolated. If the following CC is not too small, the current CC is discarded in step S324 and flow proceeds to step S310. Referring to FIG. 7B, another way of partitioning the characters saves alternative character regions that fail the heuristics and attempts to classify the alternatives. Upon classification, the alternative that achieves the highest confidence level is chosen. Other character regions are then treated accordingly.
- In step S330, a first, or next, CC is selected.
- The selected CC is tested against dimensional heuristics to see if the CC meets them. If the CC is found to be too big in step S334, an alternative method of partitioning the characters is applied in step S336. If the CC is found to be too small in step S338, the current CC, and the current CC combined with the next CC, are both retained as alternative character fields.
- When the character fields are submitted for classification as described below, a confidence measure is used to choose between the alternatives. Then flow proceeds back to step S310 until all the character regions have been isolated. If the break operation of step S336 produces a low-confidence measure, then the oversized and fractured fields are retained as alternatives for use in classification and the classification results used to choose between the alternatives.
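The idea of retaining alternatives and letting the classifier decide can be sketched as below. This is a simplified illustration under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples, `is_too_small` and `confidence` are caller-supplied hypothetical callables (the latter standing in for the classifier's confidence measure), and the handling of oversized components is left out.

```python
def merge(a, b):
    """Bounding box that encloses boxes a and b, each (x0, y0, x1, y1)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def isolate_with_alternatives(ccs, is_too_small, confidence):
    """Walk the connected components of one text box from left to right.

    When a component is too small, both it and its merge with the next
    component are kept as alternatives, and the one with the higher
    confidence is chosen. Splitting oversized components is omitted.
    """
    fields = []
    i = 0
    while i < len(ccs):
        current = ccs[i]
        if is_too_small(current) and i + 1 < len(ccs):
            merged = merge(current, ccs[i + 1])
            best = max([current, merged], key=confidence)
            fields.append(best)
            # A chosen merge consumes the following component as well.
            i += 2 if best is merged else 1
        else:
            fields.append(current)
            i += 1
    return fields
```

For example, two narrow fragments of a broken character are merged when the merged box scores higher than the first fragment alone.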
- Regions coinciding with characters need not be defined to be rectilinear boxes. They could be rubber-band type bounding regions (a convex polygon with an arbitrary number of sides), an orthogonally convex rectilinear polygon (a rectilinear polygon in which every horizontal or vertical segment connecting two points inside also lies totally inside), or any other suitable shape substantially enclosing the interesting features of the expected symbols or characters.
- In step S410, a first or sequential character region is selected.
- In step S415, the part of the original image (or the red portion thereof) is then subjected to some appropriate image analysis to prepare for feature analysis.
- The image may be binarized (thresholded), gray-scaled, binarized and thinned, etc.
- The pretreatment varies based on the feature space used. Referring also to FIGS. 9A-9D, for example, a feature space may make use of certain feature points (as described below). The feature points are identifiable with skeleton characters. To derive these from the regular video characters (FIG. 9A), the image may be binarized (FIG. 9B) and then thinned (FIG. 9C). Then the feature points (FIG. 9D, 465-468) may be derived as the corner points 465, bends 466, crossing points 467, and end points 468 of the thinned character 460, 470.
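The text does not say how the feature points are located on the thinned character. One common approach, shown here purely as an assumption, is the crossing-number test: walk the eight neighbors of each skeleton pixel in circular order and count background-to-foreground transitions. One transition marks an end point and three or more mark a crossing point. Corner points and bends need additional curvature analysis that this sketch does not attempt; the name `end_and_crossing_points` is hypothetical.

```python
def end_and_crossing_points(skel):
    """Find end points and crossing points of a one-pixel-wide skeleton.

    skel is a 2-D list of 0/1 values. Returns (ends, crossings), each a
    list of (row, col) tuples in row-major order.
    """
    h, w = len(skel), len(skel[0])
    # Eight neighbors in clockwise order starting from north.
    ring = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def at(r, c):
        # Pixels outside the image count as background.
        return skel[r][c] if 0 <= r < h and 0 <= c < w else 0

    ends, crossings = [], []
    for r in range(h):
        for c in range(w):
            if not skel[r][c]:
                continue
            vals = [at(r + dr, c + dc) for dr, dc in ring]
            transitions = sum(
                1 for k in range(8) if vals[k] == 0 and vals[(k + 1) % 8] == 1
            )
            if transitions == 1:
                ends.append((r, c))
            elif transitions >= 3:
                crossings.append((r, c))
    return ends, crossings
```

On a plus-shaped skeleton this reports the four arm tips as end points and the center as the single crossing point.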
- This sort of image processing is well-suited to the angle-histogram feature space described below. A lower degree of image processing would be needed for calculating size-invariant moments. Note that other feature point definition systems may be used as well.
- The original character may be subjected to various different analyses to define a feature vector that may be applied to the inputs of a suitably trained back propagation neural network (BPNN).
- The unthinned or thinned characters may be used.
- The chosen feature vector is generated by the appropriate image analysis.
- A variety of these can be used.
- A number of different feature spaces have been defined for the application that concerns the instant patent.
- The defined feature spaces, which are described in detail below, are size and rotation invariant and are considered particularly suitable to video character classification using a BPNN classifier.
- A first feature space is derived from the feature points of the thinned character as illustrated by FIGS. 9A-9D.
- A Delaunay triangulation (FIG. 10A) or a Voronoi diagram (FIG. 10B) is derived from the feature points 12.
- The image processor 120 performs the triangulation and then, for each triangle 1-6, generates an inventory of the internal angles. It then uses this inventory to generate a histogram of the angles, as illustrated in FIG. 11A.
- The histogram simply represents the frequency of angles A, B, and C of a given size range in the set of triangles 1-6 defined by the triangulation. Note that other triangulation methods or polygon-generating methods can be used.
- A set of Voronoi polygons 17 and 18 can be used to define a set of angles A', B', and C', each associated with a vertex 14 of the Voronoi diagram.
- The angle histogram that results serves as the feature vector for the particular character from which the feature points were derived.
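The Delaunay-based angle histogram can be sketched with the standard library alone. The brute-force triangulation below tests every triple of points against the empty-circumcircle property, which is adequate only for the handful of feature points of a single character and assumes no four points lie on a common circle. The bin count of 9 (20 degrees per bin) and all function names are assumptions for illustration.

```python
import math
from itertools import combinations

def _circumcircle(a, b, c):
    """Center and radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def delaunay(points):
    """Brute-force Delaunay triangulation: keep each triangle whose
    circumcircle contains no other point. Returns index triples."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = _circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), radius = circle
        if all(
            math.hypot(px - ux, py - uy) >= radius - 1e-9
            for n, (px, py) in enumerate(points)
            if n not in (i, j, k)
        ):
            triangles.append((i, j, k))
    return triangles

def interior_angles(a, b, c):
    """Interior angles in degrees at vertices a, b, c (law of cosines)."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    clamp = lambda v: max(-1.0, min(1.0, v))
    ang_a = math.degrees(math.acos(clamp((lb * lb + lc * lc - la * la) / (2 * lb * lc))))
    ang_b = math.degrees(math.acos(clamp((la * la + lc * lc - lb * lb) / (2 * la * lc))))
    return ang_a, ang_b, 180.0 - ang_a - ang_b

def angle_histogram(points, n_bins=9):
    """Count the interior angles of all Delaunay triangles per size range."""
    width = 180.0 / n_bins
    hist = [0] * n_bins
    for i, j, k in delaunay(points):
        for angle in interior_angles(points[i], points[j], points[k]):
            hist[min(int(angle / width), n_bins - 1)] += 1
    return hist
```

Because only angles enter the histogram, scaling or rotating the feature points leaves the feature vector unchanged, up to binning effects at range boundaries.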
- One variant of the angle histogram uses only the two largest (or smallest) of the interior angles of each triangle.
- Another variant uses a two-dimensional angle histogram instead of the one-dimensional angle histogram.
- The largest (or smallest) pair of angles for each triangle defines an ordered pair (ordered by size) for each triangle in the Delaunay triangulation (or each vertex of the Voronoi diagram).
- The first element of each ordered pair is used for the first dimension of the matrix and the second element for the second dimension of the matrix. In this way, the association between angles is preserved as information for training and classifying using the BPNN classifier.
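A two-dimensional angle histogram of this kind can be sketched as follows, assuming the interior angles of each triangle are already available in degrees. The bin count of 6 (30 degrees per bin) and the name `angle_pair_histogram` are illustrative assumptions.

```python
def angle_pair_histogram(triangle_angles, n_bins=6, use_largest=True):
    """Build an n_bins x n_bins matrix from ordered angle pairs.

    triangle_angles -- iterable of (A, B, C) interior angles in degrees
    use_largest     -- take the two largest angles (ordered largest
                       first) or the two smallest (ordered smallest first)
    """
    width = 180.0 / n_bins
    hist = [[0] * n_bins for _ in range(n_bins)]
    for angles in triangle_angles:
        ordered = sorted(angles, reverse=use_largest)
        first, second = ordered[0], ordered[1]
        row = min(int(first / width), n_bins - 1)   # first dimension
        col = min(int(second / width), n_bins - 1)  # second dimension
        hist[row][col] += 1
    return hist
```

Flattening the matrix row by row gives a fixed-length vector that could be fed to the classifier inputs.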
- Still another feature space considered particularly suitable for the video character BPNN classifier is an array of size-invariant moments. These moments are defined by the following equations. Although there is a large number of separate moments that could be used in the circumstance, a particular few are selected for this application. First, the pixel indices of the pixel location coinciding with the center of mass, $(\bar{i}, \bar{j})$, are given by
  $\bar{i} = \frac{1}{A}\sum_{i=1}^{n}\sum_{j=1}^{m} i\,B[i][j], \qquad \bar{j} = \frac{1}{A}\sum_{i=1}^{n}\sum_{j=1}^{m} j\,B[i][j]$
- where B[i][j] is 1 where the (i, j)th pixel of the thresholded image is a foreground pixel and 0 otherwise, and A is the aggregate area of the foreground pixels, given by
  $A = \sum_{i=1}^{n}\sum_{j=1}^{m} B[i][j]$
- The invariant moments selected for input to the BPNN are:
- $\phi_3 = (\eta_{3,0} - \eta_{1,2})^2 + (3\eta_{2,1} - \eta_{0,3})^2$;
- $\phi_5 = (3\eta_{2,1} - \eta_{0,3})(\eta_{2,1} - \eta_{0,3})\,[\,3(\eta_{3,0} - \eta_{1,2})^2 - 3(\eta_{2,1} - \eta_{0,3})^2\,] + \cdots$
- $\phi_6 = (\eta_{2,0} - \eta_{0,2})\,[\,(\eta_{3,0} - 3\eta_{1,2})^2 - (\eta_{2,1} - \eta_{0,3})^2\,] + 4\eta_{1,1}(\eta_{3,0} + \eta_{1,2})(\eta_{2,1} - \eta_{0,3})$
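The quantities that feed these moments can be sketched as follows. The centroid and area follow directly from the definitions of B and A in the text. The normalization of the central moments by A raised to the power 1 + (p + q)/2 is the usual textbook definition of normalized central moments and is an assumption here, since this excerpt does not state it; the function names are hypothetical and indices are 0-based rather than 1-based.

```python
def centroid_and_area(B):
    """Center of mass (i_bar, j_bar) and area A of a 0/1 image B."""
    area = sum(sum(row) for row in B)
    if area == 0:
        raise ValueError("image has no foreground pixels")
    i_bar = sum(i * v for i, row in enumerate(B) for v in row) / area
    j_bar = sum(j * v for row in B for j, v in enumerate(row)) / area
    return i_bar, j_bar, area

def normalized_moment(B, p, q):
    """Central moment of order (p, q), normalized by area so that it is
    unaffected by translation and approximately unaffected by scale.
    The normalization exponent is the standard one (assumption)."""
    i_bar, j_bar, area = centroid_and_area(B)
    mu = sum(
        ((i - i_bar) ** p) * ((j - j_bar) ** q) * v
        for i, row in enumerate(B)
        for j, v in enumerate(row)
    )
    return mu / area ** (1 + (p + q) / 2)
```

Moving the same character shape elsewhere in the image leaves every normalized moment unchanged, which is what makes combinations of them usable as position-independent classifier inputs.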
- Each feature vector is applied to the trained BPNN, which outputs various candidate classes and hopefully, depending on the inputs, one very strong candidate. If there are multiple candidate characters, a best guess may be made in step S430 by combining the probability output by the BPNN with frequency-of-use data for the presumed language and context. Such data may be compiled from different types of material, for example, television advertising transcripts, printed material, and streaming or downloaded files from the Internet. One way to combine is to weight the probabilities output by the BPNN by the corresponding probabilities associated with frequency-of-use statistics.
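The weighting described above amounts to multiplying each candidate's classifier probability by its frequency-of-use probability and renormalizing. A minimal sketch, in which the function name, the dictionary inputs, and the small `floor` value for characters missing from the frequency table are all assumptions:

```python
def combine_with_frequency(bpnn_probs, char_freq, floor=1e-6):
    """Weight classifier probabilities by frequency-of-use statistics.

    bpnn_probs -- dict mapping candidate character to classifier output
    char_freq  -- dict mapping character to relative frequency of use
    floor      -- frequency assumed for characters absent from char_freq
    Returns a dict of renormalized scores that sum to 1.
    """
    scores = {ch: p * char_freq.get(ch, floor) for ch, p in bpnn_probs.items()}
    total = sum(scores.values())
    if total == 0:
        # Nothing to reweight; fall back to the classifier output.
        return dict(bpnn_probs)
    return {ch: s / total for ch, s in scores.items()}
```

When the classifier cannot separate the letter O from the digit 0, for instance, the more frequent symbol in the presumed context wins.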
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001539232A JP2003515230A (en) | 1999-11-17 | 2000-10-27 | Method and system for separating categorizable symbols of video stream |
KR1020017008973A KR20010110416A (en) | 1999-11-17 | 2000-10-27 | Video stream classifiable symbol isolation method and system |
EP00975971A EP1147485A1 (en) | 1999-11-17 | 2000-10-27 | Video stream classifiable symbol isolation method and system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/441,943 US6614930B1 (en) | 1999-01-28 | 1999-11-17 | Video stream classifiable symbol isolation method and system |
US09/441,943 | 1999-11-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001037212A1 (en) | 2001-05-25 |
Family
ID=23754912
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2000/010730 WO2001037212A1 (en) | 1999-11-17 | 2000-10-27 | Video stream classifiable symbol isolation method and system |
Country Status (6)
Country | Link |
---|---|
US (1) | US6614930B1 (en) |
EP (1) | EP1147485A1 (en) |
JP (1) | JP2003515230A (en) |
KR (1) | KR20010110416A (en) |
CN (1) | CN1276384C (en) |
WO (1) | WO2001037212A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0255419A1 (en) * | 1986-07-01 | 1988-02-03 | Thomson-Csf | Method for extracting and modeling the contours in an image and device for implementing this method |
EP0473476A1 (en) * | 1990-07-31 | 1992-03-04 | Thomson-Trt Defense | Straight edge real-time localisation device and process in a numerical image, especially for pattern recognition in a scene analysis process |
EP0720114A2 (en) * | 1994-12-28 | 1996-07-03 | Siemens Corporate Research, Inc. | Method and apparatus for detecting and interpreting textual captions in digital video signals |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4965763A (en) * | 1987-03-03 | 1990-10-23 | International Business Machines Corporation | Computer method for automatic extraction of commonly specified information from business correspondence |
US5131053A (en) * | 1988-08-10 | 1992-07-14 | Caere Corporation | Optical character recognition method and apparatus |
JPH03290774A (en) * | 1990-04-06 | 1991-12-20 | Fuji Facom Corp | Sentence area extracting device for document picture |
US5568571A (en) * | 1992-12-14 | 1996-10-22 | University Microfilms, Inc. | Image enhancement system |
US5774579A (en) * | 1995-08-11 | 1998-06-30 | Canon Kabushiki Kaisha | Block selection system in which overlapping blocks are decomposed |
US6009196A (en) * | 1995-11-28 | 1999-12-28 | Xerox Corporation | Method for classifying non-running text in an image |
US5852678A (en) * | 1996-05-30 | 1998-12-22 | Xerox Corporation | Detection and rendering of text in tinted areas |
US5892843A (en) * | 1997-01-21 | 1999-04-06 | Matsushita Electric Industrial Co., Ltd. | Title, caption and photo extraction from scanned document images |
US6128414A (en) * | 1997-09-29 | 2000-10-03 | Intermec Ip Corporation | Non-linear image processing and automatic discriminating method and apparatus for images such as images of machine-readable symbols |
JP3008908B2 (en) * | 1997-11-10 | 2000-02-14 | 日本電気株式会社 | Character extraction device and character extraction method |
US6366699B1 (en) * | 1997-12-04 | 2002-04-02 | Nippon Telegraph And Telephone Corporation | Scheme for extractions and recognitions of telop characters from video data |
-
1999
- 1999-11-17 US US09/441,943 patent/US6614930B1/en not_active Expired - Lifetime
-
2000
- 2000-10-27 JP JP2001539232A patent/JP2003515230A/en not_active Withdrawn
- 2000-10-27 WO PCT/EP2000/010730 patent/WO2001037212A1/en not_active Application Discontinuation
- 2000-10-27 EP EP00975971A patent/EP1147485A1/en not_active Withdrawn
- 2000-10-27 KR KR1020017008973A patent/KR20010110416A/en not_active Application Discontinuation
- 2000-10-27 CN CNB008050112A patent/CN1276384C/en not_active Expired - Fee Related
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003090155A1 (en) * | 2002-04-19 | 2003-10-30 | Hewlett-Packard Development Company, L. P. | System and method for identifying and extracting character strings from captured image data |
EP1569162A1 (en) * | 2004-02-18 | 2005-08-31 | Samsung Electronics Co., Ltd. | Method and apparatus for detecting text of video |
US7446817B2 (en) | 2004-02-18 | 2008-11-04 | Samsung Electronics Co., Ltd. | Method and apparatus for detecting text associated with video |
CN104053048A (en) * | 2014-06-13 | 2014-09-17 | 无锡天脉聚源传媒科技有限公司 | Method and device for video localization |
Also Published As
Publication number | Publication date |
---|---|
JP2003515230A (en) | 2003-04-22 |
US6614930B1 (en) | 2003-09-02 |
KR20010110416A (en) | 2001-12-13 |
CN1276384C (en) | 2006-09-20 |
EP1147485A1 (en) | 2001-10-24 |
CN1343339A (en) | 2002-04-03 |
Similar Documents
Publication | Title |
---|---|
US6614930B1 (en) | Video stream classifiable symbol isolation method and system |
US6731788B1 (en) | Symbol Classification with shape features applied to neural network |
Gllavata et al. | A robust algorithm for text detection in images |
Gllavata et al. | Text detection in images based on unsupervised classification of high-frequency wavelet coefficients |
Shivakumara et al. | A laplacian approach to multi-oriented text detection in video |
Lyu et al. | A comprehensive method for multilingual video text detection, localization, and extraction |
US6993185B2 (en) | Method of texture-based color document segmentation |
US5335290A (en) | Segmentation of text, picture and lines of a document image |
Ngo et al. | Video text detection and segmentation for optical character recognition |
Lu et al. | Video text detection |
Gllavata et al. | A text detection, localization and segmentation system for OCR in images |
CN107368826A (en) | Method and apparatus for text detection |
Agrawal et al. | Text extraction from images |
Huang et al. | A new video text extraction approach |
Vu et al. | Automatic extraction of text regions from document images by multilevel thresholding and k-means clustering |
Liu et al. | A simple and fast text localization algorithm for indoor mobile robot navigation |
Felhi et al. | Multiscale stroke-based page segmentation approach |
Gllavata et al. | Finding text in images via local thresholding |
Okun et al. | A survey of texture-based methods for document layout analysis |
EP1147484A1 (en) | Symbol classification with shape features applied to neural network |
Chun et al. | Text extraction in videos using topographical features of characters |
Chen et al. | Video-text extraction and recognition |
Ghai et al. | Comparison of Different Text Extraction Techniques for Complex Color Images |
Al-Asadi et al. | Arabic-text extraction from video images |
Awoke et al. | Ethiopic and latin multilingual text detection from images using hybrid techniques |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 00805011.2; Country of ref document: CN |
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN JP KR |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
WWE | WIPO information: entry into national phase | Ref document number: 2000975971; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 1020017008973; Country of ref document: KR |
ENP | Entry into the national phase | Ref document number: 2001 539232; Country of ref document: JP; Kind code of ref document: A |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWP | WIPO information: published in national office | Ref document number: 2000975971; Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 1020017008973; Country of ref document: KR |
WWW | WIPO information: withdrawn in national office | Ref document number: 2000975971; Country of ref document: EP |