WO2020052085A1 - Video text detection method and device, and computer-readable storage medium - Google Patents

Video text detection method and device, and computer-readable storage medium

Info

Publication number
WO2020052085A1
WO2020052085A1 PCT/CN2018/117715 CN2018117715W
Authority
WO
WIPO (PCT)
Prior art keywords
image block
image
score
text
text information
Prior art date
Application number
PCT/CN2018/117715
Other languages
English (en)
Chinese (zh)
Inventor
周多友
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020052085A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present disclosure relates to the technical field of information processing, and in particular, to a video text detection method, device, and computer-readable storage medium.
  • the technical problem solved by the present disclosure is to provide a video text detection method, so as to at least partially solve the technical problem that OCR has a poor recognition effect and low recognition accuracy when recognizing small characters.
  • a video text detection device, a video text detection hardware device, a computer-readable storage medium, and a video text detection terminal are also provided.
  • a video text detection method includes:
  • the step of determining whether text information is included in the video to be detected according to a text detection result on the image block includes:
  • the method further includes:
  • a deep learning classification algorithm is used to perform training and learning on the labeled training samples to obtain an image classifier.
  • the step of segmenting the to-be-detected picture extracted from the to-be-detected video to obtain at least one image block includes:
  • the method further includes:
  • Text detection is performed on the image block by the image classifier, and a text detection result of the image block is determined according to a classification result of the image classifier.
  • the steps of performing text detection on the image block by the image classifier, and determining the text detection result of the image block based on the classification result of the image classifier include:
  • a text detection result of the image block is determined according to the score.
  • the step of determining a text detection result of the image block according to the score includes:
  • if the score exceeds a preset score, it is determined that the image block contains text information; or a maximum score is selected from the scores, and if the maximum score exceeds the preset score, it is determined that the image block contains text information; or, if the score is smaller than a preset score, it is determined that the image block contains text information; or a minimum score is selected from the scores, and if the minimum score is smaller than the preset score, it is determined that the image block contains text information.
  • the step of performing text detection on the image block by the image classifier, and determining a text detection result of the image block according to a classification result of the image classifier includes:
  • the output result is used as a text detection result of the image block.
  • a video text detection device includes:
  • a picture blocking module configured to divide the picture to be detected, extracted from the video to be detected, into blocks to obtain at least one image block;
  • a text determining module is configured to determine whether text information is included in the video to be detected according to a text detection result of the image block.
  • the text determination module is specifically configured to: perform text detection on each image block; if it is detected that any image block contains text information, determine that the video to be detected includes text information.
  • the device further includes:
  • a classifier training module configured to divide pictures that are known to contain text information and/or pictures that are known not to contain text information into blocks to obtain at least one image block as training samples; label the training samples according to whether they contain text information; and use a deep learning classification algorithm to train and learn on the labeled training samples to obtain an image classifier.
  • the picture segmentation module is specifically configured to: input the picture to be detected into the image classifier, and divide the picture to be detected by the image classifier to obtain at least one image block;
  • the device further includes:
  • a text detection module is configured to perform text detection on the image block through the image classifier, and determine a text detection result of the image block according to a classification result of the image classifier.
  • the character detection module includes:
  • a scoring unit configured to score each image block through the image classifier to obtain a score value of each image block
  • a character detection unit configured to determine a character detection result of the image block according to the score.
  • the character detection unit is specifically configured to:
  • if the score exceeds a preset score, determine that the image block contains text information; or select a maximum score from the scores, and if the maximum score exceeds the preset score, determine that the image block contains text information; or, if the score is less than a preset score, determine that the image block contains text information; or select a minimum score from the scores, and if the minimum score is less than the preset score, determine that the image block contains text information.
  • the text detection module is specifically configured to perform text detection on each image block through the image classifier, and directly output any one of the following results through the image classifier: including text information and not including text information; The output result is used as a text detection result of the image block.
  • a video text detection hardware device includes:
  • Memory for storing non-transitory computer-readable instructions
  • a processor configured to run the computer-readable instructions, so that, when the computer-readable instructions are executed, the processor implements the steps described in any of the foregoing technical solutions of the video text detection method.
  • a computer-readable storage medium is used for storing non-transitory computer-readable instructions.
  • when the non-transitory computer-readable instructions are executed by a computer, the computer is caused to execute the steps described in any of the technical solutions of the video text detection method described above.
  • a video text detection terminal includes any of the video text detection devices described above.
  • Embodiments of the present disclosure provide a video text detection method, a video text detection device, a video text detection hardware device, a computer-readable storage medium, and a video text detection terminal.
  • the video text detection method includes segmenting a to-be-detected picture extracted from the to-be-detected video to obtain at least one image block; and determining whether the to-be-detected video contains text information based on a text detection result of the image block.
  • the embodiment of the present disclosure first divides the to-be-detected picture extracted from the to-be-detected video into at least one image block, and then determines whether the to-be-detected video contains text information according to the text detection result on the image block, which can improve text detection accuracy.
  • FIG. 1a is a schematic flowchart of a video text detection method according to an embodiment of the present disclosure
  • FIG. 1b is a schematic flowchart of a video text detection method according to another embodiment of the present disclosure.
  • FIG. 1c is a schematic flowchart of a video text detection method according to another embodiment of the present disclosure.
  • FIG. 2a is a schematic structural diagram of a video text detection device according to an embodiment of the present disclosure
  • FIG. 2b is a schematic structural diagram of a video text detection device according to another embodiment of the present disclosure.
  • FIG. 3 is a schematic structural diagram of a video text detection hardware device according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a video text detection terminal according to an embodiment of the present disclosure.
  • the video text detection method mainly includes the following steps S1 to S2:
  • Step S1: Divide the picture to be detected, extracted from the video to be detected, into blocks to obtain at least one image block.
  • the picture to be detected may be one frame or multiple frames. When the picture to be detected is multiple frames, the pictures to be detected are divided into blocks.
  • the number of image blocks or the size of the image blocks may be specifically determined according to the size of the picture to be detected. Specifically, in order to improve the accuracy of text detection, a plurality of pictures of different sizes can be divided into blocks in advance, and text detection can be performed, and the optimal number or size of blocks can be determined according to the accuracy of text detection.
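  • Purely as an illustration of the blocking step, the following sketch (not part of the original disclosure) divides a picture into a fixed grid of image blocks; the grid size is a hypothetical choice and could be tuned against detection accuracy as described above.
```python
import numpy as np

def split_into_blocks(picture: np.ndarray, rows: int = 3, cols: int = 3) -> list:
    """Divide a picture (H x W x C array) into a rows x cols grid of image blocks.

    Illustrative only: the 3 x 3 grid is an assumption, not taken from the patent;
    any pixels left over by integer division at the right/bottom edges are dropped.
    """
    h, w = picture.shape[:2]
    block_h, block_w = h // rows, w // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            blocks.append(picture[r * block_h:(r + 1) * block_h,
                                  c * block_w:(c + 1) * block_w])
    return blocks
```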
  • Step S2: Determine whether text information is included in the video to be detected according to the text detection result of the image block.
  • the text information includes, but is not limited to, any one or combination of numbers, Chinese characters, and foreign languages.
  • the text information can be enlarged by segmentation, thereby improving the accuracy of text detection.
  • At least one image block is obtained by dividing the to-be-detected picture extracted from the to-be-detected video, and then determining whether the to-be-detected video contains text information based on the text detection result of the image block, which can improve the accuracy of text detection.
  • step S2 includes:
  • the text detection method in the prior art can be used to perform text detection on the image block. Because the picture to be detected is divided into blocks, the text contained in an image block may be incomplete; for example, a detected image block may contain only part of a character or part of a piece of text. In this case, it is still determined that the image block contains text information.
  • At least one image block is obtained by dividing the picture to be detected, extracted from the video to be detected, into blocks, and text detection is performed on each image block. If any image block is detected to contain text information, it is determined that the video to be detected contains text information. Because blocking enlarges the text information contained in the picture to be detected, the accuracy of text detection is improved.
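  • The aggregation rule described above ("any block contains text implies the video contains text") can be sketched as follows; here `frames` is a list of pictures already extracted from the video, `block_contains_text` is a hypothetical per-block detector, and `split_into_blocks` is the sketch shown earlier.
```python
def video_contains_text(frames, block_contains_text, rows: int = 3, cols: int = 3) -> bool:
    """Return True as soon as any block of any extracted frame is detected to contain text."""
    for frame in frames:
        for block in split_into_blocks(frame, rows, cols):
            if block_contains_text(block):
                return True
    return False
```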
  • the method in this embodiment further includes:
  • Step S3: Divide pictures that are known to contain text information and/or pictures that are known not to contain text information into blocks to obtain at least one image block as training samples.
  • Step S4: Annotate the training samples according to whether text information is included.
  • each image block needs to be labeled. For example, an image block containing text information is marked with 1 and an image block without text information is marked with 0.
  • Step S5: A deep learning classification algorithm is used to train and learn on the labeled training samples to obtain an image classifier.
  • the deep learning classification algorithms include, but are not limited to, any of the following: Naive Bayes algorithm, artificial neural network algorithm, genetic algorithm, K-Nearest Neighbor (KNN) classification algorithm, clustering algorithm, and the like.
  • the image classifier obtained through this embodiment not only has an automatic block function, but also can directly determine whether each image block contains text information.
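  • The disclosure does not fix a particular model or framework; as one hypothetical example only, the labeled image blocks (1 = contains text, 0 = no text) could be used to train a small convolutional classifier, sketched here in PyTorch.
```python
import torch
import torch.nn as nn

class BlockTextClassifier(nn.Module):
    """Toy binary classifier for image blocks: class 0 = no text, class 1 = contains text."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_classifier(model, loader, epochs: int = 5, lr: float = 1e-3):
    """loader yields (block_tensor, label) batches; labels follow the 0/1 annotation above.

    Illustrative only: the architecture and hyper-parameters are assumptions,
    not taken from the patent.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for blocks, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(blocks), labels)
            loss.backward()
            optimizer.step()
    return model
```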
  • step S1 specifically includes:
  • the picture to be detected is input to an image classifier, and the picture to be detected is divided into blocks by the image classifier to obtain at least one image block.
  • Step S6: Perform text detection on the image block through the image classifier, and determine the text detection result of the image block according to the classification result of the image classifier.
  • step S6 specifically includes:
  • Step S61: Score each image block by the image classifier to obtain a score value for each image block.
  • the score may be a normalized score, for example, any value in the range 0 to 100 or 0 to 1.
  • Step S62: Determine the text detection result of the image block according to the score.
  • step S62 specifically includes:
  • If the score exceeds the preset score, it is determined that the image block contains text information; or the maximum score is selected from the scores, and if the maximum score exceeds the preset score, it is determined that the image block contains text information; or, if the score is smaller than the preset score, it is determined that the image block contains text information; or the minimum score is selected from the scores, and if the minimum score is smaller than the preset score, it is determined that the image block contains text information.
  • a scoring rule can be set in advance. For example, the larger the score, the higher the probability that the character information is included, or the smaller the score, the higher the possibility that the character information is included. Based on the scoring rules set above, it is determined whether the image block contains text information.
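  • A minimal sketch of such a decision rule, assuming scores normalized to [0, 1], a hypothetical preset score of 0.5, and the convention that a larger score means text is more likely (the opposite convention would simply reverse the comparison).
```python
def block_has_text(score: float, preset: float = 0.5) -> bool:
    """Per-block decision: larger score = more likely to contain text (assumed convention)."""
    return score > preset

def picture_has_text(scores: list, preset: float = 0.5) -> bool:
    """Picture-level decision using the maximum block score, as one of the options above."""
    return max(scores) > preset
```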
  • step S6 specifically includes:
  • Step S63: Perform text detection on each image block through the image classifier, and directly output either of the following results through the image classifier: containing text information or not containing text information. The output result is used as the text detection result of the image block.
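  • Alternatively, reusing the hypothetical PyTorch classifier sketched earlier, the detection result can be output directly as a label rather than a score, e.g. by taking the arg-max over the two classes.
```python
import torch

@torch.no_grad()
def classify_block(model, block_tensor) -> bool:
    """Return True if the classifier's arg-max class is 'contains text' (class 1)."""
    logits = model(block_tensor.unsqueeze(0))  # add a batch dimension
    return bool(logits.argmax(dim=1).item() == 1)
```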
  • the following is a device embodiment of the present disclosure.
  • the device embodiment of the present disclosure can be used to perform the steps implemented by the method embodiments of the present disclosure.
  • Only parts related to the embodiments of the present disclosure are shown. Specific technical details are not disclosed. Reference is made to the method embodiments of the present disclosure.
  • an embodiment of the present disclosure provides a video text detection device.
  • the device can perform the steps in the foregoing embodiment of the video text detection method.
  • the device mainly includes: a picture blocking module 21 and a text determination module 22. The picture blocking module 21 is configured to divide the picture to be detected, extracted from the video to be detected, into blocks to obtain at least one image block; the text determination module 22 is configured to determine whether text information is included in the video to be detected according to the text detection result on the image block.
  • the picture to be detected may be one frame or multiple frames. When the picture to be detected is multiple frames, the pictures to be detected are divided into blocks.
  • the number of image blocks or the size of the image blocks may be specifically determined according to the size of the picture to be detected. Specifically, in order to improve the accuracy of text detection, a plurality of pictures of different sizes can be divided into blocks in advance, and text detection can be performed, and the optimal number or size of blocks can be determined according to the accuracy of text detection.
  • the text information includes, but is not limited to, any one or combination of numbers, Chinese characters, and foreign languages.
  • the text information can be enlarged by segmentation, thereby improving the accuracy of text detection.
  • the picture blocking module 21 is used to divide the picture to be detected, extracted from the video to be detected, into blocks to obtain at least one image block, and the text determination module 22 then determines, according to the text detection result of the image block, whether the video to be detected contains text information, thereby improving text detection accuracy.
  • the text determination module 22 is specifically configured to: perform text detection on each image block; if it is detected that any image block contains text information, determine that the video to be detected contains text information.
  • the text determination module 22 may use text detection methods in the prior art to perform text detection on the image blocks. Because the picture to be detected is divided into blocks, the text contained in an image block may be incomplete; for example, a detected image block may contain only part of a character or part of a piece of text. In this case, it is determined that the image block contains text information.
  • the picture blocking module 21 is used to divide the picture to be detected, extracted from the video to be detected, into blocks to obtain at least one image block;
  • the text determination module 22 is used to perform text detection on each image block. If any image block is detected to contain text information, it is determined that the video to be detected contains text information. Since the text information contained in the picture to be detected can be enlarged by blocking, the accuracy of text detection is improved.
  • the apparatus in this embodiment further includes a classifier training module 23, which is configured to divide pictures that are known to contain text information and/or pictures that are known not to contain text information into blocks to obtain at least one image block as training samples; label the training samples according to whether they contain text information; and use a deep learning classification algorithm to train and learn on the labeled training samples to obtain an image classifier.
  • the classifier training module 23 needs to label each image block in order to distinguish different image blocks, that is, image blocks containing text information and image blocks that do not contain text information. For example, an image block containing text information is marked with 1 and an image block without text information is marked with 0.
  • the deep learning classification algorithms include, but are not limited to, any of the following: Naive Bayes algorithm, artificial neural network algorithm, genetic algorithm, K-Nearest Neighbor (KNN) classification algorithm, clustering algorithm, and the like.
  • the image classifier obtained through this embodiment not only has an automatic block function, but also can directly determine whether each image block contains text information.
  • the picture blocking module 21 is specifically configured to: input a picture to be detected into an image classifier, and divide the picture to be detected by the image classifier to obtain at least one image block;
  • the device of this embodiment further includes a text detection module 24; wherein the text detection module 24 is configured to perform text detection on the image block through the image classifier, and determine the text detection result of the image block according to the classification result of the image classifier.
  • the text detection module 24 includes: a scoring unit 241 and a text detection unit 242; wherein the scoring unit 241 is configured to score each image block through an image classifier to obtain a score value of each image block; the text detection unit 242 is configured to: Determine the text detection result of the image block according to the score.
  • the score may be a normalized score, for example, any value in the range 0 to 100 or 0 to 1.
  • the character detection unit 242 is specifically configured to: if the score exceeds a preset score, determine that the image block contains text information; or select a maximum score from the scores, and if the maximum score exceeds the preset score, determine that the image block contains text information; or, if the score is less than a preset score, determine that the image block contains text information; or select a minimum score from the scores, and if the minimum score is less than the preset score, determine that the image block contains text information.
  • a scoring rule can be set in advance. For example, the larger the score, the higher the probability that the character information is included, or the smaller the score, the higher the probability that the character information is included. Based on the scoring rules set above, it is determined whether the image block contains text information.
  • the text detection module 24 is specifically configured to: perform text detection on each image block through an image classifier, and directly output any of the following results through the image classifier: including text information and not including text information; and using the output result as an image Block text detection results.
  • FIG. 3 is a hardware block diagram illustrating a video text detection hardware device according to an embodiment of the present disclosure.
  • a video text detection hardware device 30 according to an embodiment of the present disclosure includes a memory 31 and a processor 32.
  • the memory 31 is configured to store non-transitory computer-readable instructions.
  • the memory 31 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory.
  • the volatile memory may include, for example, a random access memory (RAM) and / or a cache memory.
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • the processor 32 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and / or instruction execution capabilities, and may control other components in the video text detection hardware device 30 to perform a desired function.
  • the processor 32 is configured to run the computer-readable instructions stored in the memory 31, so that the video text detection hardware device 30 executes all or part of the steps of the foregoing video text detection method of the embodiments of the present disclosure.
  • this embodiment may also include well-known structures such as a communication bus and an interface, and these well-known structures should also be included in the protection scope of the present disclosure.
  • FIG. 4 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present disclosure.
  • a computer-readable storage medium 40 according to an embodiment of the present disclosure stores non-transitory computer-readable instructions 41 thereon.
  • when the non-transitory computer-readable instructions 41 are executed by a processor, all or part of the steps of the video text detection method of the foregoing embodiments of the present disclosure are performed.
  • the computer-readable storage medium 40 includes, but is not limited to, optical storage media (for example, CD-ROM and DVD), magneto-optical storage media (for example, MO), magnetic storage media (for example, magnetic tape or mobile hard disk), Non-volatile memory rewritable media (for example: memory card) and media with built-in ROM (for example: ROM box).
  • FIG. 5 is a schematic diagram illustrating a hardware structure of a terminal according to an embodiment of the present disclosure. As shown in FIG. 5, the video text detection terminal 50 includes the foregoing video text detection device embodiment.
  • the terminal may be implemented in various forms, and the terminal in the present disclosure may include, but is not limited to, such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP ( Portable multimedia players), navigation devices, on-board terminals, on-board display terminals, on-board electronic rear-view mirrors, and other mobile terminals, and fixed terminals such as digital TVs, desktop computers, and the like.
  • the terminal may further include other components.
  • the video text detection terminal 50 may include a power supply unit 51, a wireless communication unit 52, an A/V (audio/video) input unit 53, a user input unit 54, a sensing unit 55, an interface unit 56, a controller 57, an output unit 58, a memory 59, and the like.
  • FIG. 5 shows a terminal with various components, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the wireless communication unit 52 allows radio communication between the terminal 50 and a wireless communication system or network.
  • the A / V input unit 53 is used to receive audio or video signals.
  • the user input unit 54 may generate key input data according to a command input by the user to control various operations of the terminal.
  • the sensing unit 55 detects the current state of the terminal 50, the position of the terminal 50, the presence or absence of a user's touch input to the terminal 50, the orientation of the terminal 50, the acceleration or deceleration movement and direction of the terminal 50, and the like, and generates commands or signals for controlling the operation of the terminal 50.
  • the interface unit 56 functions as an interface through which at least one external device can be connected to the terminal 50.
  • the output unit 58 is configured to provide an output signal in a visual, audio, and / or tactile manner.
  • the memory 59 may store software programs and the like for processing and control operations performed by the controller 57, or may temporarily store data that has been output or is to be output.
  • the memory 59 may include at least one type of storage medium.
  • the terminal 50 may cooperate with a network storage device that performs a storage function of the memory 59 through a network connection.
  • the controller 57 generally controls the overall operation of the terminal.
  • the controller 57 may include a multimedia module for reproducing or playing back multimedia data.
  • the controller 57 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as characters or images.
  • the power supply unit 51 receives external power or internal power under the control of the controller 57 and provides appropriate power required to operate each element and component.
  • Various embodiments of the video text detection method proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof.
  • various embodiments of the video text detection method proposed in the present disclosure can be implemented by using an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein.
  • In some cases, various embodiments of the video text detection method proposed in the present disclosure may be implemented in the controller 57.
  • various embodiments of the video text detection method proposed by the present disclosure can be implemented with a separate software module that allows at least one function or operation to be performed.
  • the software codes may be implemented by a software application (or program) written in any suitable programming language, and the software codes may be stored in the memory 59 and executed by the controller 57.
  • an "or” used in an enumeration of items beginning with “at least one” indicates a separate enumeration such that, for example, an "at least one of A, B, or C” enumeration means A or B or C, or AB or AC or BC, or ABC (ie A and B and C).
  • the word "exemplary” does not mean that the described example is preferred or better than other examples.
  • each component or each step can be disassembled and / or recombined.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a video text detection method, a video text detection device, a video text detection hardware device, and a computer-readable storage medium. The video text detection method comprises the following steps: dividing a picture to be detected, extracted from a video to be detected, into blocks so as to obtain at least one image block; and determining whether said video contains text information according to the text detection results of the image blocks. In embodiments of the present invention, the picture to be detected extracted from the video to be detected is first divided into blocks to obtain at least one image block, and it is then determined whether said video contains text information according to the text detection results of the image blocks; in this way, the accuracy of text detection can be improved.
PCT/CN2018/117715 2018-09-13 2018-11-27 Procédé et dispositif de détection de texte vidéo et support d'informations lisible par ordinateur WO2020052085A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811065276.2A CN109299682A (zh) 2018-09-13 2018-09-13 视频文字检测方法、装置和计算机可读存储介质
CN201811065276.2 2018-09-13

Publications (1)

Publication Number Publication Date
WO2020052085A1 true WO2020052085A1 (fr) 2020-03-19

Family

ID=65166772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117715 WO2020052085A1 (fr) 2018-09-13 2018-11-27 Procédé et dispositif de détection de texte vidéo et support d'informations lisible par ordinateur

Country Status (2)

Country Link
CN (1) CN109299682A (fr)
WO (1) WO2020052085A1 (fr)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833664A (zh) * 2010-04-21 2010-09-15 中国科学院自动化研究所 基于稀疏表达的视频图像文字检测方法
CN102915438B (zh) * 2012-08-21 2016-11-23 北京捷成世纪科技股份有限公司 一种视频字幕的提取方法及装置
CN103955718A (zh) * 2014-05-15 2014-07-30 厦门美图之家科技有限公司 一种图像主体对象的识别方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050281455A1 (en) * 2004-06-17 2005-12-22 Chun-Chia Huang System of using neural network to distinguish text and picture in images and method thereof
CN104281850A (zh) * 2013-07-09 2015-01-14 腾讯科技(深圳)有限公司 一种文字区域识别方法和装置
CN104463103A (zh) * 2014-11-10 2015-03-25 小米科技有限责任公司 图像处理方法及装置
CN104484867A (zh) * 2014-12-30 2015-04-01 小米科技有限责任公司 图片处理方法及装置
CN106156777A (zh) * 2015-04-23 2016-11-23 华中科技大学 文本图片检测方法及装置
CN106257496A (zh) * 2016-07-12 2016-12-28 华中科技大学 海量网络文本与非文本图像分类方法
CN106385592A (zh) * 2016-08-31 2017-02-08 苏睿 图像压缩方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753133A (zh) * 2020-06-11 2020-10-09 北京小米松果电子有限公司 视频分类方法、装置及存储介质
CN111832082A (zh) * 2020-08-20 2020-10-27 支付宝(杭州)信息技术有限公司 图文完整性检测方法及装置

Also Published As

Publication number Publication date
CN109299682A (zh) 2019-02-01

Similar Documents

Publication Publication Date Title
WO2020052084A1 (fr) Procédé de sélection de couverture vidéo, dispositif et support d'informations lisible par ordinateur
WO2020052083A1 (fr) Procédé et dispositif de reconnaissance d'image de violation, et support d'informations lisible par ordinateur
US10963504B2 (en) Zero-shot event detection using semantic embedding
KR101428715B1 (ko) 인물 별로 디지털 컨텐츠를 분류하여 저장하는 시스템 및방법
KR102402511B1 (ko) 영상 검색 방법 및 이를 위한 장치
US8965115B1 (en) Adaptive multi-modal detection and fusion in videos via classification-based-learning
JP7152528B2 (ja) フェイシャル特殊効果による複数のフェイシャルの追跡方法、装置および電子機器
WO2021143624A1 (fr) Procédé de détermination d'étiquette vidéo, dispositif, terminal et support d'informations
US20170017844A1 (en) Image content providing apparatus and image content providing method
Escalante et al. A naive bayes baseline for early gesture recognition
CN111930809A (zh) 数据处理方法、装置及设备
WO2022227218A1 (fr) Procédé et appareil de reconnaissance de nom de médicament, dispositif informatique et support de stockage
TW201546636A (zh) 註解顯示器輔助裝置及輔助方法
US20210089334A1 (en) Electronic device and screen capturing method thereof
CN107861948B (zh) 一种标签提取方法、装置、设备和介质
CN107924398B (zh) 用于提供以评论为中心的新闻阅读器的系统和方法
WO2019137259A1 (fr) Procédé et appareil de traitement d'image, support de stockage et dispositif électronique
WO2020052085A1 (fr) Procédé et dispositif de détection de texte vidéo et support d'informations lisible par ordinateur
US20200236421A1 (en) Extracting Session Information From Video Content To Facilitate Seeking
WO2024179519A1 (fr) Procédé et appareil de reconnaissance sémantique
US11184670B2 (en) Display apparatus and control method thereof
CN111104572A (zh) 用于模型训练的特征选择方法、装置及电子设备
US8498978B2 (en) Slideshow video file detection
WO2024131398A1 (fr) Procédé et appareil d'interaction vocale et support de stockage
WO2018166499A1 (fr) Procédé et dispositif de classification de texte, et support de stockage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933590

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18933590

Country of ref document: EP

Kind code of ref document: A1