CN116389855A - Video tagging method based on OCR - Google Patents

Video tagging method based on OCR

Info

Publication number
CN116389855A
CN116389855A
Authority
CN
China
Prior art keywords
video
image
ocr
analysis
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310639256.6A
Other languages
Chinese (zh)
Inventor
杨龑骄
田国彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kuangzhi Zhongke Beijing Technology Co ltd
Original Assignee
Kuangzhi Zhongke Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kuangzhi Zhongke Beijing Technology Co ltd filed Critical Kuangzhi Zhongke Beijing Technology Co ltd
Priority to CN202310639256.6A
Publication of CN116389855A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • H04N21/8405Generation or processing of descriptive data, e.g. content descriptors represented by keywords
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention discloses an OCR-based video tagging method in which an input on-site command video and an on-site radar video are analyzed separately: the command video is defined as the left-path video and the radar video as the right-path video. Both videos are analyzed in segments, the length of each segment is set through front-end configuration, a video tag is generated for each time interval, and the uniqueness of each tag is established by analysis against the records of historical intervals. The method begins by processing the left-path video: step one, a video frame sequence is continuously acquired, and each time interval T1 (T1 ≥ 1 min) is set as one video analysis unit, yielding S video analysis units. When the method is implemented, each video unit obtains a unique tag, enabling quick preview of key events and saving storage space.

Description

Video tagging method based on OCR
Technical Field
The invention belongs to the technical field of intelligent video analysis, and particularly relates to an OCR-based video tagging method.
Background
A video tag is a short phrase that describes the features of a video; tagging a video helps users retrieve its content quickly and efficiently. Existing tag-generation methods rely mainly on manual annotation, while online tag-generation methods typically start from image, video, or speech-and-text understanding. From the image perspective, frames are extracted from the video to obtain pictures, each picture is annotated, and the per-image tags are finally integrated into a video tag. From the video perspective, tags are obtained with video-understanding methods.
In actual command operations, a command video image and a radar video image are usually combined into one video stream by a video acquisition device, and this video contains a large amount of useless information. The combined video data therefore needs to be tagged so that more of the useful video information is retained.
Disclosure of Invention
Accordingly, the present invention has been made to solve the above-mentioned problems in the prior art by providing an OCR-based video tagging method.
In order to achieve the above object, the present invention provides the following technical solutions:
In the OCR-based video tagging method, the input on-site command video and radar video are analyzed separately: the command video is defined as the left-path video and the radar video as the right-path video. Both videos are analyzed in segments, the length of each segment can be set through front-end configuration, a video tag is generated for each time interval, and the uniqueness of each tag is established by analysis against the records of historical intervals.
Further, the specific steps are as follows:
Left-path video processing:
Step one: a video frame sequence is continuously acquired, and each time interval T1 (T1 ≥ 1 min) is set as one video analysis unit, yielding S video analysis units;
Step two: in the Si-th video analysis unit, one frame image Ik is taken every N seconds for edge-feature extraction, so the Si-th unit yields T1 × 60/N image features containing edge information. Specifically: the image Ik is first converted from a color image to a gray image Gk; the edges of Gk are then extracted, where the edge-extraction algorithm may first use conventional prior-art operators such as Sobel or Canny to obtain a binarized feature image;
Step three: each image feature containing edge information is normalized. Specifically, the number of binarized feature points is counted and recorded as Numk, and the feature value is normalized as
fk = Numk / (w × h),
where w is the image width and h is the image height, yielding a set of T1 × 60/N-dimensional vectors ArrayL_Si;
Step four: each video unit is analyzed by computing the Euclidean distance between the current ArrayL_Si vector and each vector in the history queue ArrayL_H. If the distance is greater than a threshold T, FlagL = 0 is set; if it is less than or equal to T, FlagL = 1 is set. The current ArrayL_Si is then added to the history queue ArrayL_H as the judgment basis for the next round.
Further, the specific steps are as follows:
Right-path video processing:
s1, continuously collecting a right-path video frame sequence, T2,
Figure SMS_4
obtaining M right-path video analysis units for one video analysis unit;
s2, in a Mi video analysis unit, taking a frame of image Pk every N seconds to obtain T2 x 60/N pieces of image data to be analyzed;
s3, T1 x 60/N-dimensional vector array based on left-path video analysis s The RN vectors in the samples are recorded as
Figure SMS_5
For ArrayR N The vectors are ordered from big to small;
s4, selecting the first Q ArrayRs N OCR recognition is carried out on the image corresponding to the vector value;
s5, merging the recognition results based on a statistical method for the Q recognition results obtained by recognition, namely, performing comparative analysis on the recognition results of the OCR at the corresponding positions, and recording the statistical results in a form of a table;
s6, comparing the historical OCR information with the historical OCR information, if the similarity is larger than a threshold D, setting flagR=0, and setting the similarity smaller than or equal to D, setting flagR=1, and simultaneously adding the current OCR recognition information into a historical information queue to be used as a judgment basis of the next round.
Further, the specific steps are as follows:
Data fusion processing:
The analyses of the left-path and right-path videos yield a result value for each configured video unit, and the next stage of analysis proceeds from these values, implemented as follows:
the state values of the left-path video flag FlagL and the right-path video flag FlagR are obtained;
an AND operation is performed on the state values of FlagL and FlagR; when the result is 1, the video tag is a unique tag;
the character information of the unique tag obtained by OCR is combined into a character string, and the video unit is then renamed to generate a tagged video file.
Further, the interval of each video segment may be 5 minutes, 10 minutes, or 15 minutes.
The invention has the following advantages:
when the method is implemented, each video unit obtains the unique label, and the purposes of quickly previewing the key event and saving more storage space are achieved.
Drawings
Fig. 1 is a flow chart of an OCR-based video tagging method provided in some embodiments of the present invention.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes, by way of illustration, certain specific embodiments but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Example 1
As shown in fig. 1, in the OCR-based video tagging method according to the embodiment of the first aspect of the present invention, the input on-site command video and radar video are analyzed separately: the command video is defined as the left-path video and the radar video as the right-path video. To tag the two videos more precisely, each is analyzed in segments, and the length of each segment can be set through front-end configuration; the time interval is typically 5, 10, or 15 minutes, so one video tag is generated per interval, and the uniqueness of each tag is established by analysis against the records of historical intervals. The method specifically includes the following steps:
Left-path video processing:
A video frame sequence is continuously acquired, and each time interval T1 (T1 ≥ 1 min) is set as one video analysis unit, yielding S video analysis units;
In the Si-th video analysis unit, one frame image Ik is taken every N seconds for edge-feature extraction, so the Si-th unit yields T1 × 60/N image features containing edge information. Specifically: the image Ik is first converted from a color image to a gray image Gk; the edges of Gk are then extracted, where the edge-extraction algorithm may first use conventional prior-art operators such as Sobel or Canny to obtain a binarized feature image;
Each image feature containing edge information is then normalized: the number of binarized feature points is counted and recorded as Numk, and the feature value is normalized as
fk = Numk / (w × h),
where w is the image width and h is the image height, yielding a set of T1 × 60/N-dimensional vectors ArrayL_Si.
Each video unit is analyzed by computing the Euclidean distance between the current ArrayL_Si vector and each vector in the history queue ArrayL_H. If the distance is greater than a threshold T, FlagL = 0 is set; if it is less than or equal to T, FlagL = 1 is set. The current ArrayL_Si is then added to the history queue ArrayL_H as the judgment basis for the next round.
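For illustration only, the left-path processing above can be sketched in Python with OpenCV and NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the choice of Canny as the edge operator, its thresholds, and the rule that any history distance above T clears the flag (the patent does not say how distances to multiple history vectors are combined) are all illustrative.

```python
import cv2
import numpy as np

def extract_unit_vector(frames, fps, n_seconds):
    """Sample one frame every N seconds, convert to gray, extract a
    binarized edge image, and normalize the edge-point count by w*h."""
    step = max(1, int(fps * n_seconds))
    features = []
    for k in range(0, len(frames), step):
        gray = cv2.cvtColor(frames[k], cv2.COLOR_BGR2GRAY)   # graying: color -> gray image Gk
        edges = cv2.Canny(gray, 100, 200)                    # binarized edge feature image
        h, w = edges.shape
        features.append(cv2.countNonZero(edges) / (w * h))   # fk = Numk / (w x h)
    return np.array(features)                                # ArrayL_Si: one value per sampled frame

def flag_left(array_l_si, history, threshold_t):
    """Euclidean distance of the current unit vector against each history
    vector; FlagL = 0 if any distance exceeds T, else 1 (combining rule assumed)."""
    flag_l = 1
    for past in history:
        if np.linalg.norm(array_l_si - past) > threshold_t:
            flag_l = 0
            break
    history.append(array_l_si)  # judgment basis for the next round
    return flag_l
```

For example, with T1 = 5 minutes and N = 10 seconds, each ArrayL_Si is a 5 × 60 / 10 = 30-dimensional vector.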
Right-path video processing:
A right-path video frame sequence is continuously acquired, and each time interval T2 (T2 ≥ 1 min) is set as one video analysis unit, yielding M right-path video analysis units;
In the Mi-th video analysis unit, one frame image Pk is taken every N seconds, yielding T2 × 60/N images to be analyzed;
Based on the T1 × 60/N-dimensional vector ArrayL_Si from the left-path video analysis, the RN vector values in the samples are recorded as ArrayR_N, and the values in ArrayR_N are sorted in descending order;
The first Q images corresponding to the largest ArrayR_N vector values are selected for OCR recognition;
the Q OCR recognition results obtained by recognition are combined based on a statistical method, and the recognition results are specifically: performing comparative analysis on OCR recognition results of corresponding positions, and recording statistical results in a form of a table;
The merged result is compared with the historical OCR information; if the similarity is greater than a threshold D, FlagR = 0 is set, and if it is less than or equal to D, FlagR = 1 is set; the current OCR recognition information is then added to the history queue as the judgment basis for the next round.
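A corresponding sketch of the right-path steps, again illustrative only: pytesseract stands in for the unspecified OCR engine, a per-position majority vote stands in for the "statistical method" of the merging step, and difflib's ratio stands in for the unspecified similarity measure used against the history.

```python
from collections import Counter
from difflib import SequenceMatcher

import pytesseract  # assumed OCR engine; the patent names none

def ocr_top_q(images, scores, q):
    """Rank frames by their feature score in descending order and OCR the top Q."""
    ranked = sorted(zip(scores, images), key=lambda pair: pair[0], reverse=True)
    return [pytesseract.image_to_string(img).strip() for _, img in ranked[:q]]

def merge_results(texts):
    """Merge the Q recognition results by majority vote at each character
    position (one plausible reading of the 'statistical method')."""
    merged = []
    for i in range(max(len(t) for t in texts)):
        chars = [t[i] for t in texts if i < len(t)]
        merged.append(Counter(chars).most_common(1)[0][0])
    return "".join(merged)

def flag_right(current_text, history, threshold_d):
    """Similarity to any historical OCR string above D -> FlagR = 0, else 1."""
    flag_r = 1
    for past in history:
        if SequenceMatcher(None, current_text, past).ratio() > threshold_d:
            flag_r = 0
            break
    history.append(current_text)  # judgment basis for the next round
    return flag_r
```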
Data fusion processing:
The analyses of the left-path and right-path videos yield a result value for each configured video unit, and the next stage of analysis proceeds from these values, implemented as follows:
the state values of the left-path video flag FlagL and the right-path video flag FlagR are obtained;
an AND operation is performed on the state values of FlagL and FlagR; when the result is 1, the video tag is a unique tag;
the character information of the unique tag obtained by OCR is combined into a character string, and the video unit is renamed to generate a tagged video file.
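The fusion stage then reduces to an AND of the two flags followed by a file rename. A minimal sketch, assuming a simple sanitization rule and naming pattern that the patent does not specify:

```python
import re
from pathlib import Path

def fuse_and_tag(flag_l, flag_r, ocr_text, unit_path):
    """AND FlagL and FlagR; when the result is 1 the tag is unique, so the
    video unit file is renamed after the OCR-derived character string."""
    if (flag_l & flag_r) == 1:
        tag = re.sub(r"[^\w-]+", "_", ocr_text).strip("_")  # file-system-safe tag string
        src = Path(unit_path)
        return src.rename(src.with_name(f"{tag}{src.suffix}"))
    return Path(unit_path)  # flags not both 1: keep the original name
```

For instance, with this rule a unit file segment_007.mp4 whose merged OCR string is "EXERCISE 0601 1430" would become EXERCISE_0601_1430.mp4 (names hypothetical).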
The standard parts used in the invention are commercially available, and special-shaped parts can be custom-made according to the description and drawings. The parts are connected by conventional prior-art means such as bolts, rivets, and welding; the machinery, parts, and equipment are of conventional prior-art types; and the circuit connections are conventional prior-art connections. These are therefore not described in detail here, as they belong to the prior art known to those skilled in the art.
Although the present invention has been described with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the described embodiments may be modified or their elements replaced by equivalents. Any modifications, equivalents, and improvements made without departing from the spirit and principles of the present invention are intended to fall within its scope.

Claims (5)

1. An OCR-based video tagging method, characterized in that an input on-site command video and an on-site radar video are analyzed separately: the command video is defined as the left-path video and the radar video as the right-path video; both videos are analyzed in segments, the length of each segment can be set through front-end configuration, a video tag is generated for each time interval, and the uniqueness of each tag is established by analysis against the records of historical intervals.
2. The OCR-based video tagging method of claim 1, comprising the specific steps of:
left-path video processing:
step one, a video frame sequence is continuously acquired, and each time interval T1 (T1 ≥ 1 min) is set as one video analysis unit, yielding S video analysis units;
step two, in the Si-th video analysis unit, one frame image Ik is taken every N seconds for edge-feature extraction, so the Si-th unit yields T1 × 60/N image features containing edge information; specifically: the image Ik is first converted from a color image to a gray image Gk, and the edges of Gk are then extracted, where the edge-extraction algorithm may first use conventional prior-art operators such as Sobel or Canny to obtain a binarized feature image;
step three, each image feature containing edge information is normalized; specifically, the number of binarized feature points is counted and recorded as Numk, and the feature value is normalized as fk = Numk / (w × h), where w is the image width and h is the image height, yielding a set of T1 × 60/N-dimensional vectors ArrayL_Si;
step four, each video unit is analyzed by computing the Euclidean distance between the current ArrayL_Si vector and each vector in the history queue ArrayL_H; if the distance is greater than a threshold T, FlagL = 0 is set, and if it is less than or equal to T, FlagL = 1 is set; the current ArrayL_Si is then added to the history queue ArrayL_H as the judgment basis for the next round.
3. The OCR-based video tagging method of claim 2, comprising the specific steps of:
right-path video processing:
s1, continuously collecting a right-path video frame sequence, T2,
Figure QLYQS_4
obtaining M right-path video analysis units for one video analysis unit;
s2, in a Mi video analysis unit, taking a frame of image Pk every N seconds to obtain T2 x 60/N pieces of image data to be analyzed;
s3, T1 x 60/N-dimensional vector array based on left-path video analysis s The RN vectors in the samples are recorded as
Figure QLYQS_5
For ArrayR N The vectors are ordered from big to small;
s4, selecting the first Q ArrayRs N OCR recognition is carried out on the image corresponding to the vector value;
s5, merging the recognition results based on a statistical method for the Q recognition results obtained by recognition, namely, performing comparative analysis on the recognition results of the OCR at the corresponding positions, and recording the statistical results in a form of a table;
s6, comparing the historical OCR information with the historical OCR information, if the similarity is larger than a threshold D, setting flagR=0, and setting the similarity smaller than or equal to D, setting flagR=1, and simultaneously adding the current OCR recognition information into a historical information queue to be used as a judgment basis of the next round.
4. The OCR-based video tagging method of claim 3, comprising the specific steps of:
data fusion processing:
the analyses of the left-path and right-path videos yield a result value for each configured video unit, and the next stage of analysis proceeds from these values, implemented as follows:
the state values of the left-path video flag FlagL and the right-path video flag FlagR are obtained;
an AND operation is performed on the state values of FlagL and FlagR; when the result is 1, the video tag is a unique tag;
the character information of the unique tag obtained by OCR is combined into a character string, and the video unit is then renamed to generate a tagged video file.
5. The OCR-based video tagging method of claim 1, wherein the interval of each video segment may be 5 minutes, 10 minutes, or 15 minutes.
CN202310639256.6A 2023-06-01 2023-06-01 Video tagging method based on OCR Pending CN116389855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310639256.6A CN116389855A (en) 2023-06-01 2023-06-01 Video tagging method based on OCR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310639256.6A CN116389855A (en) 2023-06-01 2023-06-01 Video tagging method based on OCR

Publications (1)

Publication Number Publication Date
CN116389855A (en) 2023-07-04

Family

ID=86971385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310639256.6A Pending CN116389855A (en) 2023-06-01 2023-06-01 Video tagging method based on OCR

Country Status (1)

Country Link
CN (1) CN116389855A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663391A (en) * 2012-02-27 2012-09-12 安科智慧城市技术(中国)有限公司 Image multifeature extraction and fusion method and system
CN113490049A (en) * 2021-08-10 2021-10-08 深圳市前海动竞体育科技有限公司 Sports event video editing method and system based on artificial intelligence
CN113534146A (en) * 2021-07-26 2021-10-22 中国人民解放军海军航空大学 Radar video image target automatic detection method and system
CN113901259A (en) * 2021-09-09 2022-01-07 特赞(上海)信息科技有限公司 Video annotation method and system based on artificial intelligence and storage medium
CN114089370A (en) * 2021-11-17 2022-02-25 海华电子企业(中国)有限公司 Method, system and equipment for processing radar echo video data vectorization
CN116027319A (en) * 2021-10-27 2023-04-28 南方海洋科学与工程广东省实验室(广州) Radar automatic labeling system and method based on radar photoelectric target fusion


Similar Documents

Publication Publication Date Title
Shahab et al. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images
US9626594B2 (en) Method and system to perform text-to-image queries with wildcards
US9384423B2 (en) System and method for OCR output verification
Yang et al. A framework for improved video text detection and recognition
US8315465B1 (en) Effective feature classification in images
Zhang et al. Automatic discrimination of text and non-text natural images
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
Mirza et al. Urdu caption text detection using textural features
Dinh et al. An efficient method for text detection in video based on stroke width similarity
Gao et al. Automatic news video caption extraction and recognition
CN116389855A (en) Video tagging method based on OCR
Zhuge et al. Robust video text detection with morphological filtering enhanced MSER
Yang A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions
Kovar et al. Logo detection and classification in a sport video: video indexing for sponsorship revenue control
Paliwal et al. A survey on various text detection and extraction techniques from videos and images
Lokkondra et al. ETDR: An Exploratory View of Text Detection and Recognition in Images and Videos.
Wang et al. An efficient coarse-to-fine scheme for text detection in videos
Smitha et al. Illumination invariant text recognition system based on contrast limit adaptive histogram equalization in videos/images
Lalonde et al. Key-text spotting in documentary videos using adaboost
Islam et al. Towards a standard bangla photoocr: Text detection and localization
Al-Asadi et al. Arabic-text extraction from video images
Thounaojam et al. Video shot boundary detection using gray level cooccurrence matrix
Cao et al. Adaptive and robust feature selection for low bitrate mobile augmented reality applications
Dahake Face Recognition from Video using Threshold based Clustering
Aggarwal et al. Event summarization in videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230704