CN112990142A - Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium

Info

Publication number: CN112990142A (application CN202110478515.2A; granted as CN112990142B)
Authority: CN (China)
Language: Chinese (zh)
Prior art keywords: key frame, video, data, OCR, picture
Inventor: 许丹 (Xu Dan)
Original and current assignee: Ping An Technology Shenzhen Co Ltd
Legal status: Active (granted)

Classifications

    • G06V 20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 30/10: Character recognition


Abstract

The embodiment of the application belongs to the technical field of image processing in artificial intelligence and relates to an OCR-based video guide generation method. The application also provides an OCR-based video guide generation apparatus, a computer device, and a storage medium, and further relates to blockchain technology, in which the user's original video data can be stored. According to the method and the device, the association relationships between the key frame data are obtained by analyzing the key frame text data, so these relationships effectively help a learning user grasp quickly how different pieces of knowledge are connected, strengthen the user's abilities to memorize, organize ideas, and follow jumps in thought, and thereby improve the user's learning efficiency.

Description

Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium
Technical Field
The present application relates to the field of image processing technology in artificial intelligence, and in particular, to a method and an apparatus for generating a video guide based on OCR, a computer device, and a storage medium.
Background
Teaching videos have become an important learning medium in some fields, especially in enterprise training. Compared with text, video engages more of the student's senses during learning, so students gain stronger learning interest and a better learning experience. On the basis of a teaching video, learning with a mind map is a convenient and effective method: it aids memory, helps organize one's ideas, and makes it easier to follow jumps in thought. As a form of learning output, the mind map is an effective means of deepening course understanding.
In the existing video guide generation method, video images are captured from an original video and sorted by video playing time to obtain a sequence of video images, thereby generating a video guide.
However, the conventional video guide generation method is not intelligent: it relies only on video playing time to relate different video images, so the associations between different video images are weak, which weakens the user's abilities to memorize, organize ideas, and follow jumps in thought, and reduces the user's learning efficiency.
Disclosure of Invention
An embodiment of the present application aims to provide an OCR-based video guide generation method and apparatus, a computer device, and a storage medium, so as to solve the problem that conventional video guide generation methods reduce the user's learning efficiency.
In order to solve the foregoing technical problem, an embodiment of the present application provides a video guide generating method based on OCR, which adopts the following technical solutions:
responding to a guide map generation request carrying original video data;
performing key frame extraction operation on the original video data to obtain key frame data carrying video time information;
sequentially performing text recognition operation on the key frame data based on an OCR technology and the sequence of the video time information to obtain key frame text data;
confirming key frame corresponding relation among the key frame data based on text content recorded in the key frame text data;
establishing an association relationship between the key frame data based on the key frame corresponding relation to obtain a target video guide picture;
and outputting the target video guide picture.
In order to solve the foregoing technical problem, an embodiment of the present application further provides an OCR-based video guide generating apparatus, which adopts the following technical solution:
the request response module is used for responding to a guide map generation request carrying original video data;
the key frame extraction module is used for carrying out key frame extraction operation on the original video data to obtain key frame data carrying video time information;
the text recognition module is used for sequentially carrying out text recognition operation on the key frame data based on an OCR technology and the sequence of the video time information to obtain key frame text data;
a correspondence obtaining module, configured to determine a key frame correspondence between the key frame data based on text content recorded in the key frame text data;
the guide image acquisition module is used for establishing an association relationship between the key frame data based on the key frame corresponding relation to obtain a target video guide picture;
and the guide output module is used for outputting the target video guide picture.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
the computer device comprises a memory in which computer readable instructions are stored, and a processor that implements the steps of the OCR-based video guide generation method described above when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the OCR-based video guide generation method as described above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the application provides an OCR-based video guide generation method, which comprises the following steps: responding to a guide map generation request carrying original video data; performing key frame extraction operation on the original video data to obtain key frame data carrying video time information; sequentially performing text Recognition operation on the key frame data based on an OCR (Optical Character Recognition) technology and the sequence of the video time information to obtain key frame text data; confirming key frame corresponding relation among the key frame data based on text content recorded in the key frame text data; establishing an incidence relation between the key frame data based on the key frame corresponding relation to obtain a target video guide picture; and outputting the target video guide picture. The method comprises the steps of sequentially obtaining key frame data with two frames which are not completely the same through video time information of original video data, extracting key frame text data of the key frame data by utilizing an Optical Character Recognition (OCR) technology, obtaining the association relation between the key frame data based on the key frame text data, and finally splicing the key frame data based on the association relation to obtain a target video guide picture corresponding to the original video data.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flowchart illustrating an implementation of a method for generating a video guide based on OCR according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of step S102 in FIG. 1;
FIG. 3 is a flowchart of an implementation of step S203 in FIG. 2;
FIG. 4 is a flowchart of an implementation of step S104 in FIG. 1;
FIG. 5 is a flowchart of an implementation of step S403 in FIG. 4;
FIG. 6 is a schematic structural diagram of an OCR-based video guide generation apparatus according to the second embodiment of the present application;
FIG. 7 is a schematic diagram of the structure of the key frame extraction module 120 in FIG. 6;
FIG. 8 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a flowchart for implementing an OCR-based video guide generation method provided in an embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
The OCR-based video guide generation method comprises the following steps:
step S101, step S102, step S103, step S104, step S105, and step S106.
Step S101: and responding to a guide map generation request carrying original video data.
In the embodiment of the application, the original video data refers to a video object that the user needs to learn or summarize. The original video data is usually relatively concise video content, such as a recording of PPT (PowerPoint presentation) slides, so as to improve the accuracy of subsequent video frame extraction and OCR; of course, the original video data may also be other, more complex video content. It should be understood that these examples of original video data are only for convenience of understanding and are not intended to limit the present application.
In the embodiment of the present application, the user may send the guide generation request through a user terminal, which may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, or a fixed terminal such as a digital TV or a desktop computer. It should be understood that these examples of the user terminal are only for convenience of understanding and are not intended to limit the present application.
Step S102: and performing key frame extraction operation on the original video data to obtain key frame data carrying video time information.
In the embodiment of the application, the key frame extraction operation is mainly used to screen out repeated video frames: whether two frames repeat can be confirmed by comparing the picture similarity of the two adjacent pictures, so that key frame data can be extracted. In practical applications, cosine similarity or a hash algorithm may be used to measure the picture similarity.
In the embodiment of the present application, taking original video data showing PPT slides as an example: since a slide is a still picture until the page turns, whenever the two successive pictures are not completely identical it is judged that the PPT has turned the page, the later picture is extracted as a key frame, and the time at which that key frame appears is stored. All key frames and their corresponding times are recorded as the set {(f_i, t_i)}, where f_i is the i-th key frame and t_i is the time at which it appears.
In the embodiment of the application, when the instructor makes many annotations on the PPT slides while explaining, the number of extracted key frames may become too large. In practical applications, the similarity threshold used for picture similarity in the key frame extraction operation can be relaxed so that annotated variants of the same slide are treated as repeats, reducing repeated picture frames and keeping the extracted key frame data concise.
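The patent leaves the similarity measure open (cosine similarity or a hash algorithm). The following is a minimal sketch of this extraction step using an average hash, assuming OpenCV is available; the constants and function names are illustrative, not prescribed by the patent.

```python
import cv2  # assumes OpenCV; the patent names no specific library

HASH_SIZE = 8            # 8x8 average hash -> 64-bit fingerprint per frame
SAME_FRAME_MAX_DIST = 5  # Hamming-distance threshold; tune per video

def average_hash(frame):
    """64-bit average hash of a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (HASH_SIZE, HASH_SIZE))
    return (small > small.mean()).flatten()

def extract_keyframes(video_path):
    """Step S102 in miniature: keep a frame (with its timestamp) only when
    it is not completely identical to the previous frame."""
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_hash = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0  # video time in seconds
        h = average_hash(frame)
        if prev_hash is None or (h != prev_hash).sum() > SAME_FRAME_MAX_DIST:
            keyframes.append((frame, t))  # slide changed: new key frame
        prev_hash = h
    cap.release()
    return keyframes
```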
Step S103: and sequentially carrying out text recognition operation on the key frame data based on an OCR technology and the sequence of the video time information to obtain the key frame text data.
In the embodiment of the present application, OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text by a character recognition method; that is, for printed characters, the text in a paper document is optically converted into an image file of a black-and-white dot matrix, and recognition software converts the text in the image into a text format for further editing and processing by word-processing software.
In this embodiment of the application, the text recognition operation obtains the text information of the video content of each key frame in sequence based on OCR technology, and sorts it by the video time corresponding to each key frame to obtain the key frame text data.
In the embodiment of the present application, at each key frame time point t_i, Optical Character Recognition (OCR) technology is used to obtain each piece of text (the Key Points) on the PPT page together with the position data corresponding to that text. The recognized text and position information remain unchanged over the interval [t_i, t_{i+1}), that is, until the next key frame occurs.
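As a sketch of this text recognition step, the following assumes the pytesseract binding to the Tesseract OCR engine (the patent does not name an OCR engine); per key frame, it returns the text pieces (Key Points) and their bounding boxes, in video-time order.

```python
import pytesseract  # assumed OCR engine binding; not specified by the patent
from pytesseract import Output

def recognize_keyframe_text(keyframes):
    """Step S103 in miniature: for each (frame, timestamp) pair, in
    video-time order, return the recognized text pieces and positions."""
    results = []
    for frame, t in sorted(keyframes, key=lambda kf: kf[1]):
        data = pytesseract.image_to_data(frame, output_type=Output.DICT)
        key_points = [
            {"text": txt, "box": (x, y, w, h)}
            for txt, x, y, w, h in zip(data["text"], data["left"],
                                       data["top"], data["width"],
                                       data["height"])
            if txt.strip()  # drop empty detections
        ]
        results.append({"time": t, "key_points": key_points})
    return results
```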
Step S104: the key frame correspondence between the pieces of key frame data is confirmed based on the text content described in the key frame text data.
In the embodiment of the present application, the key frame correspondence relationship is mainly used to indicate an association relationship existing between different key frame data, where the association relationship may be an upper-lower level relationship, a same-level relationship, an inclusion relationship, and the like.
In the embodiment of the application, the key frame correspondence can be determined as follows: judge whether the first key frame text data has the same type number format; if it does not, judge whether the next key frame text data has the same type number format; if it does, take the same type number format as a first-level association relationship, and take the other key frame text data whose content matches the same type number format as same-level association relationships; the key frame correspondence is obtained once this judgment operation has been completed for the last key frame text data.
Step S105: and establishing an incidence relation between key frame data based on the key frame corresponding relation to obtain the target video guide picture.
In the embodiment of the present application, since the key frame correspondence indicates the association relationships that exist between different key frame data, a mind map of the key frame data, that is, the target video guide picture, is built from those relationships using a standard mind-map construction method.
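A minimal sketch of this construction step follows, assuming each key point has already been assigned a hierarchy level (0 for a slide title, 1 for a first-level bullet, and so on) by the correspondence step; the tree it returns is the skeleton of the target video guide picture.

```python
def build_guide_tree(leveled_points):
    """Step S105 in miniature. leveled_points: [(level, text, time), ...]
    in video order; returns the root of a mind-map tree."""
    root = {"text": "ROOT", "time": None, "children": []}
    stack = [(-1, root)]  # (level, node) path from the root to current branch
    for level, text, t in leveled_points:
        node = {"text": text, "time": t, "children": []}
        while stack[-1][0] >= level:
            stack.pop()  # climb back up to this point's parent level
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root

# Usage: a title with two same-level sub-points becomes one branch.
tree = build_guide_tree([(0, "Chapter 1", 12.0),
                         (1, "1.1 Overview", 12.0),
                         (1, "1.2 Scope", 40.5)])
```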
Step S106: and outputting the target video guide picture.
In some optional embodiments of the present application, during video playback, when the user moves the mouse over any piece of text and double-clicks it, a Key Point interception function is triggered, and the text content at that position enters the web-side mind map editing window as a tag. The tag also stores the trigger time. The user edits the mind map in the web-side mind map editor; once editing is finished, the mind map file is stored on the web side and can be downloaded as a PDF. While studying, or the next time the user opens the same video, clicking a tag returns to the corresponding video position for review. A completed mind map can be shared, becoming a new form of UGC (user-generated content).
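A sketch of such a tag is shown below, assuming a simple record pairing the intercepted text with its trigger time so that clicking the tag can seek back to the video position; the field and method names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class KeyPointTag:
    text: str            # text content intercepted by the double-click
    trigger_time: float  # video position (seconds) stored with the tag

    def seek_position(self) -> float:
        """Clicking the tag returns the stored position for review."""
        return self.trigger_time
```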
In an embodiment of the present application, an OCR-based video guide generation method is provided, comprising: responding to a guide map generation request carrying original video data; performing a key frame extraction operation on the original video data to obtain key frame data carrying video time information; sequentially performing a text recognition operation on the key frame data based on OCR technology and the order of the video time information to obtain key frame text data; confirming the key frame correspondence between the key frame data based on the text content recorded in the key frame text data; establishing association relationships between the key frame data based on the key frame correspondence to obtain a target video guide picture; and outputting the target video guide picture. The method sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using OCR, derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data.
With continued reference to fig. 2, a flowchart for implementing step S102 in fig. 1 is shown, and for convenience of explanation, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, step S102 specifically includes: step S201, step S202, and step S203.
Step S201: and carrying out video frame extraction operation on the original video data to obtain video frame data.
In the embodiment of the present application, the video frame extraction operation mainly extracts the above-mentioned original video data frame by frame into individual pictures, i.e., the video frame data. The video frame extraction operation may be implemented by an existing method of reading a video stream to extract video frames, or by other technical means common in the art. It should be understood that these examples of the video frame extraction operation are only for convenience of understanding and are not intended to limit the present application.
Step S202: and performing screen capture sampling operation on the video frame data to obtain a video picture sequence.
Step S203: and carrying out similarity comparison on the video picture sequence and filtering the same pictures to obtain key frame data.
In the embodiment of the present application, the similarity comparison may be performed on the video picture sequence by using a cosine similarity or a hash algorithm.
In the embodiment of the application, taking original video data showing PPT slides as an example: since a slide is a still picture until the page turns, if the two pictures before and after are not identical, a page turn of the PPT can be determined, the later picture can be extracted as a key frame, and the time at which that key frame appears is stored. All key frames and their corresponding times are recorded, as above, as the set {(f_i, t_i)}.
Continuing to refer to fig. 3, a flowchart for implementing step S203 in fig. 2 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, step S203 specifically includes: step S301, step S302, step S303, and step S304.
Step S301: and sequentially carrying out similarity comparison operation on two adjacent pictures of the picture sequence to obtain the picture similarity.
Step S302: and judging whether the image similarity meets a preset similarity threshold value.
In the embodiment of the present application, the similarity threshold is the criterion used to determine whether the two compared pictures are the same.
Step S303: and if the picture similarity meets a preset similarity threshold, determining that the two adjacent picture pictures are the same.
Step S304: and if the picture similarity does not meet the preset similarity threshold, determining that the pictures of the two adjacent pictures are different, and taking the tail pictures of the two adjacent pictures as key frame data.
Continuing to refer to fig. 4, a flowchart for implementing step S104 in fig. 1 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.
In some optional implementations of this embodiment, step S104 specifically includes: step S401, step S402, step S403, and step S404.
Step S401: and judging whether the first key frame text data has the same serial number format.
In the embodiment of the present application, the video content of a teaching video is usually displayed in a structured form, and the text portions are distinguished by numbering formats that reflect the hierarchy of the content itself, for example bullets, numbers, and multilevel lists of text paragraphs; based on these different numbering formats, the association relationships between different video frames can be obtained quickly.
In the embodiment of the present application, a same type number format means numbering formats that are identical or at a consistent list level; it is mainly used to judge whether two compared video frames belong to the same-level relationship, so that the mind map can be constructed subsequently.
Step S402: if the first key frame text data does not have the same type number format, judging whether the next key frame text data has the same type number format.
Step S403: if the same type number format exists in the first key frame text data, the same type number format is determined as a first-level association relationship, and other key frame text data corresponding to the same type number format content are determined as a same-level association relationship.
Step S404: and obtaining the corresponding relation of the key frames after finishing the judgment operation of the text data of the last key frame.
In the embodiment of the application, by screening the number formats in the key frame text data frame by frame, the position of each video frame within the guide structure of the whole teaching video can be obtained quickly, improving the efficiency of obtaining the key frame correspondence of the key frame data.
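As a sketch of this numbering-format screening, the following uses a few regular expressions; the pattern set is illustrative and would need to match the numbering conventions actually used in the slides.

```python
import re

# Illustrative same-type numbering formats; real slides may need more.
NUMBER_PATTERNS = {
    "decimal":    re.compile(r"^\d+\.\s"),        # "1. ", "2. "
    "multilevel": re.compile(r"^\d+(\.\d+)+\s"),  # "1.1 ", "2.3.1 "
    "paren":      re.compile(r"^\(\d+\)\s"),      # "(1) ", "(2) "
}

def numbering_format(line):
    """Name of the first numbering format the line matches, or None."""
    for name, pattern in NUMBER_PATTERNS.items():
        if pattern.match(line.strip()):
            return name
    return None

def group_same_level(key_points):
    """Steps S401-S404 in miniature: key points that share a same type
    number format are placed in the same-level association relationship."""
    levels = {}
    for text in key_points:
        fmt = numbering_format(text)
        if fmt is not None:
            levels.setdefault(fmt, []).append(text)
    return levels
```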
Continuing to refer to fig. 5, a flowchart of an implementation of step S403 in fig. 4 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, step S403 specifically includes: step S501, step S502, step S503, and step S504.
Step S501: and respectively inputting the text data of other key frames into a semantic analysis model to perform word meaning identification operation to obtain real word meaning information.
In the embodiment of the application, because an instructor's spoken explanation is generally condensed into short text on the slides, two PPT pages often have different teaching content but extremely similar text. Classifying only by the same type number format would then wrongly confirm a same-level relationship, disordering the subsequent key frame correspondence and harming the accuracy of the target video guide picture.
In the embodiment of the application, the semantic analysis model is a pre-trained deep recognition network model, and the semantic analysis model can acquire the real meaning of the target vocabulary by analyzing the associated text content.
In the embodiment of the present application, the real word meaning information is the true meaning of an ambiguous word, predicted by the semantic analysis model from the associated text information, so as to avoid misjudgment.
Step S502: and judging whether the real word meaning information is the same as the content of the same type number format.
In the embodiment of the application, the real word meaning information and the same type number format are used together to determine whether two video frames are in a same-level relationship, effectively preventing the key frame correspondence from becoming disordered.
Step S503: and if the real word meaning information is the same as the content of the same type number format, determining that the current key frame text data and the first key frame text data have an association relation.
Step S504: and if the real word meaning information is not the same as the content of the same type number format, determining that the current key frame text data and the first key frame text data do not have an association relation.
In the embodiment of the application, the real meaning of the text content in the key frame text data is analyzed to further confirm the association relationships between different key frame data, which effectively prevents the key frame correspondence from becoming disordered and improves the accuracy of the target video guide picture.
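The patent does not name the semantic analysis model. As a hedged sketch, the check in steps S501 to S504 could be approximated with pre-trained sentence embeddings (here the sentence-transformers package, an assumption standing in for the patent's unspecified model), treating high similarity between the texts surrounding two key points as "same real word meaning".

```python
from sentence_transformers import SentenceTransformer, util
# Assumption: sentence-transformers stands in for the patent's unspecified
# "pre-trained deep recognition network model".

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
SAME_SENSE_THRESHOLD = 0.8                       # illustrative threshold

def same_word_meaning(candidate_context, reference_context):
    """Steps S501-S504 in miniature: embed the text surrounding both key
    points; a high similarity confirms the same-level association, a low
    one rejects it."""
    emb = model.encode([candidate_context, reference_context])
    score = float(util.cos_sim(emb[0], emb[1]))
    return score >= SAME_SENSE_THRESHOLD
```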
In summary, the present application provides an OCR-based video guide generation method, comprising: responding to a guide map generation request carrying original video data; performing a key frame extraction operation on the original video data to obtain key frame data carrying video time information; sequentially performing a text recognition operation on the key frame data based on OCR technology and the order of the video time information to obtain key frame text data; confirming the key frame correspondence between the key frame data based on the text content recorded in the key frame text data; establishing association relationships between the key frame data based on the key frame correspondence to obtain a target video guide picture; and outputting the target video guide picture. The method sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using OCR, derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data. Meanwhile, by screening the numbering formats in the key frame text data frame by frame, the position of each video frame within the guide structure of the whole teaching video can be obtained quickly, further improving the efficiency of obtaining the key frame correspondence; and by analyzing the real meaning of the text content in the key frame text data, the association relationships between different key frame data are further confirmed, which effectively prevents the key frame correspondence from becoming disordered and improves the accuracy of the target video guide picture.
It is emphasized that, to further ensure the privacy and security of the original video data, the original video data may also be stored in a node of a block chain.
The blockchain referred to by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
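To make the hash-linking concrete, here is a minimal sketch of such a chain of data blocks, storing only a digest of the original video data; it illustrates the structure described above, not the platform layers a production blockchain would add.

```python
import hashlib
import json
import time

def make_block(prev_hash, video_digest):
    """Append-style block: tampering with any block breaks every later link."""
    block = {
        "timestamp": time.time(),
        "video_digest": video_digest,  # fingerprint of the original video data
        "prev_hash": prev_hash,
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# Usage: chain two blocks and verify the cryptographic link between them.
genesis = make_block("0" * 64, hashlib.sha256(b"raw video bytes").hexdigest())
second = make_block(genesis["hash"], hashlib.sha256(b"next batch").hexdigest())
assert second["prev_hash"] == genesis["hash"]
```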
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on the performance of these steps, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Example two
With further reference to fig. 6, as an implementation of the method shown in fig. 1, the present application provides an embodiment of an OCR-based video guide generation apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the OCR-based video guide generating apparatus 100 of the present embodiment includes: a request response module 110, a key frame extraction module 120, a text recognition module 130, a correspondence obtaining module 140, a guide image obtaining module 150, and a guide output module 160. Wherein:
a request response module 110, configured to respond to a map generation request carrying original video data;
a key frame extraction module 120, configured to perform a key frame extraction operation on the original video data to obtain key frame data carrying video time information;
the text recognition module 130 is configured to perform text recognition operations on the key frame data in sequence based on an OCR technology and the sequence of the video time information to obtain key frame text data;
a correspondence obtaining module 140, configured to confirm a key frame correspondence between key frame data based on text content recorded in the key frame text data;
the guide image obtaining module 150 is configured to establish an association relationship between the key frame data based on the key frame correspondence relationship, so as to obtain a target video guide image;
and a guide output module 160 for outputting the target video guide.
In the embodiment of the application, the original video data refers to a video object that the user needs to learn or summarize. The original video data is usually relatively concise video content, such as a recording of PPT (PowerPoint presentation) slides, so as to improve the accuracy of subsequent video frame extraction and OCR; of course, the original video data may also be other, more complex video content. It should be understood that these examples of original video data are only for convenience of understanding and are not intended to limit the present application.
In the embodiment of the present application, the user may send the guide generation request through a user terminal, which may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, or a fixed terminal such as a digital TV or a desktop computer. It should be understood that these examples of the user terminal are only for convenience of understanding and are not intended to limit the present application.
In the embodiment of the application, the key frame extraction operation is mainly used for screening repeated video frames, and whether the repeated video frames are repeated or not can be confirmed by comparing the picture similarity of two opposite pictures, so that key frame data can be extracted. In practical applications, cosine similarity or a hash algorithm may be used to measure the similarity of the pictures.
In the embodiment of the application, taking original video data showing PPT slides as an example: since a slide is a still picture until the page turns, if the two pictures before and after are not identical, a page turn of the PPT can be determined, the later picture can be extracted as a key frame, and the time at which that key frame appears is stored. All key frames and their corresponding times are recorded as the set {(f_i, t_i)}.
In the embodiment of the application, when the instructor makes many annotations on the PPT slides while explaining, the number of extracted key frames may become too large. In practical applications, the similarity threshold used for picture similarity in the key frame extraction operation can be relaxed so that annotated variants of the same slide are treated as repeats, reducing repeated picture frames and keeping the extracted key frame data concise.
In the embodiment of the present application, OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text by a character recognition method; that is, for printed characters, the text in a paper document is optically converted into an image file of a black-and-white dot matrix, and recognition software converts the text in the image into a text format for further editing and processing by word-processing software.
In this embodiment of the application, the text recognition operation obtains the text information of the video content of each key frame in sequence based on OCR technology, and sorts it by the video time corresponding to each key frame to obtain the key frame text data.
In the embodiment of the present application, at each key frame time point t_i, Optical Character Recognition (OCR) technology is used to obtain each piece of text (the Key Points) on the PPT page together with the position data corresponding to that text. The recognized text and position information remain unchanged over the interval [t_i, t_{i+1}), that is, until the next key frame occurs.
In the embodiment of the present application, the key frame correspondence relationship is mainly used to indicate an association relationship existing between different key frame data, where the association relationship may be an upper-lower level relationship, a same-level relationship, an inclusion relationship, and the like.
In the embodiment of the application, the key frame correspondence can be determined as follows: judge whether the first key frame text data has the same type number format; if it does not, judge whether the next key frame text data has the same type number format; if it does, take the same type number format as a first-level association relationship, and take the other key frame text data whose content matches the same type number format as same-level association relationships; the key frame correspondence is obtained once this judgment operation has been completed for the last key frame text data.
In the embodiment of the present application, since the key frame correspondence indicates the association relationships that exist between different key frame data, a mind map of the key frame data, that is, the target video guide picture, is built from those relationships using a standard mind-map construction method.
In some optional embodiments of the present application, during video playback, when the user moves the mouse over any piece of text and double-clicks it, a Key Point interception function is triggered, and the text content at that position enters the web-side mind map editing window as a tag. The tag also stores the trigger time. The user edits the mind map in the web-side mind map editor; once editing is finished, the mind map file is stored on the web side and can be downloaded as a PDF. While studying, or the next time the user opens the same video, clicking a tag returns to the corresponding video position for review. A completed mind map can be shared, becoming a new form of UGC (user-generated content).
In the embodiment of the application, an OCR-based video guide generation apparatus is provided. It sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using Optical Character Recognition (OCR), derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data. Because the association relationships between the key frame data are obtained by analyzing the key frame text data, they effectively help a learning user grasp quickly how different pieces of knowledge are connected, strengthen the user's abilities to memorize, organize ideas, and follow jumps in thought, and thereby improve the user's learning efficiency.
Continuing to refer to fig. 7, a schematic diagram of the structure of the key frame extraction module 120 in fig. 6 is shown, and for convenience of illustration, only the relevant portions of the present application are shown.
In some optional implementations of this embodiment, the key frame extraction module 120 further includes: a video frame extraction sub-module 121, a screen capture sampling sub-module 122, and a key frame extraction sub-module 123. Wherein:
the video frame extraction submodule 121 is configured to perform a video frame extraction operation on the original video data to obtain video frame data;
the screen capture sampling submodule 122 is configured to perform screen capture sampling operation on video frame data to obtain a video picture sequence;
and the key frame extraction submodule 123 is configured to perform similarity comparison on the video picture sequence and filter the same picture to obtain the key frame data.
In the embodiment of the present application, the video frame extraction operation mainly extracts the above-mentioned original video data frame by frame into individual pictures, i.e., the video frame data. The video frame extraction operation may be implemented by an existing method of reading a video stream to extract video frames, or by other technical means common in the art. It should be understood that these examples of the video frame extraction operation are only for convenience of understanding and are not intended to limit the present application.
In the embodiment of the present application, the similarity comparison may be performed on the video picture sequence by using a cosine similarity or a hash algorithm.
In the embodiment of the application, taking original video data showing PPT slides as an example: since a slide is a still picture until the page turns, if the two pictures before and after are not identical, a page turn of the PPT can be determined, the later picture can be extracted as a key frame, and the time at which that key frame appears is stored. All key frames and their corresponding times are recorded as the set {(f_i, t_i)}.
In some optional implementations of this embodiment, the key frame extraction sub-module 123 includes: the device comprises a similarity comparison unit, a similarity threshold judgment unit, a same picture unit and a different picture unit. Wherein:
the similarity comparison unit is used for sequentially carrying out similarity comparison operation on two adjacent pictures of the picture sequence to obtain picture similarity;
a similarity threshold judging unit, configured to judge whether the picture similarity satisfies a preset similarity threshold;
the picture identity unit is used for confirming that the pictures of the two adjacent pictures are identical if the picture similarity meets a preset similarity threshold;
and the picture difference unit is used for confirming that the two adjacent pictures are different and taking the later of the two adjacent pictures as the key frame data if the picture similarity does not satisfy the preset similarity threshold.
In some optional implementations of this embodiment, the correspondence obtaining module 140 includes: a same type number judgment sub-module, a non-existence sub-module, an existence sub-module, and a correspondence obtaining sub-module. Wherein:
the same type number judging submodule is used for judging whether the first key frame text data has the same type number format or not;
the non-existence submodule is used for judging whether the next key frame text data has the same type number format or not if the first key frame text data does not have the same type number format;
the existing submodule is used for determining the same type number format as a primary association relationship if the same type number format exists in the first key frame text data, and determining other key frame text data corresponding to the same type number format content as a same level association relationship;
and the corresponding relation obtaining submodule is used for obtaining the corresponding relation of the key frames after the judgment operation of the last key frame text data is finished.
In summary, the present application provides an OCR-based video guide generation apparatus, which sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using Optical Character Recognition (OCR), derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data. Because the association relationships between the key frame data are obtained by analyzing the key frame text data, they effectively help a learning user grasp quickly how different pieces of knowledge are connected, strengthen the user's abilities to memorize, organize ideas, and follow jumps in thought, and thereby improve the user's learning efficiency. Meanwhile, by screening the numbering formats in the key frame text data frame by frame, the position of each video frame within the guide structure of the whole teaching video can be obtained quickly, further improving the efficiency of obtaining the key frame correspondence; and by analyzing the real meaning of the text content in the key frame text data, the association relationships between different key frame data are further confirmed, which effectively prevents the key frame correspondence from becoming disordered and improves the accuracy of the target video guide picture.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 8, fig. 8 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 includes a memory 210, a processor 220, and a network interface 230, communicatively coupled to each other via a system bus. It is noted that only a computer device 200 having the components 210 to 230 is shown, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 210 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 210 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 200. Of course, the memory 210 may also include both internal and external storage devices of the computer device 200. In this embodiment, the memory 210 is generally used for storing an operating system installed in the computer device 200 and various types of application software, such as computer readable instructions of a video guide generation method based on OCR. In addition, the memory 210 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 220 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 220 is generally operative to control overall operation of the computer device 200. In this embodiment, the processor 220 is configured to execute the computer readable instructions or process data stored in the memory 210, for example, execute the computer readable instructions of the OCR-based video guide generation method.
The network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
The computer device sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using Optical Character Recognition (OCR), derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data. Because the association relationships between the key frame data are obtained by analyzing the key frame text data, they effectively help a learning user grasp quickly how different pieces of knowledge are connected, strengthen the user's abilities to memorize, organize ideas, and follow jumps in thought, and thereby improve the user's learning efficiency.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the OCR-based video guide generation method as described above.
The computer-readable storage medium provided by the application sequentially obtains key frame data whose adjacent frames are not completely identical from the video time information of the original video data, extracts key frame text data using Optical Character Recognition (OCR), derives the association relationships between the key frame data from that text data, and finally splices the key frame data according to those relationships to obtain the target video guide picture corresponding to the original video data. Because the association relationships between the key frame data are obtained by analyzing the key frame text data, they effectively help a learning user grasp quickly how different pieces of knowledge are connected, strengthen the user's abilities to memorize, organize ideas, and follow jumps in thought, and thereby improve the user's learning efficiency.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software running on a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and including instructions that enable a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that the solutions recorded in the foregoing embodiments may still be modified, or some of their features may be replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An OCR-based video guide generation method, comprising the steps of:
responding to a guide picture generation request carrying original video data;
performing key frame extraction operation on the original video data to obtain key frame data carrying video time information;
sequentially performing a text recognition operation on the key frame data based on an OCR technology and the order of the video time information to obtain key frame text data;
confirming a key frame corresponding relation among the key frame data based on the text content recorded in the key frame text data;
establishing an association relation between the key frame data based on the key frame corresponding relation to obtain a target video guide picture;
and outputting the target video guide picture.
2. The OCR-based video guide generation method according to claim 1, wherein the step of performing a key frame extraction operation on the original video data to obtain key frame data carrying video time information specifically comprises:
performing video frame extraction operation on the original video data to obtain video frame data;
performing screen capture sampling operation on the video frame data to obtain a video picture sequence;
and carrying out similarity comparison on the video picture sequence and filtering out identical pictures to obtain the key frame data.
3. The OCR-based video guide generation method according to claim 2, wherein the step of carrying out similarity comparison on the video picture sequence and filtering out identical pictures to obtain the key frame data specifically comprises:
sequentially carrying out a similarity comparison operation on every two adjacent pictures of the video picture sequence to obtain a picture similarity;
judging whether the picture similarity meets a preset similarity threshold;
if the picture similarity meets the preset similarity threshold, confirming that the two adjacent pictures are the same;
and if the picture similarity does not meet the preset similarity threshold, confirming that the two adjacent pictures are different, and taking the later of the two adjacent pictures as the key frame data.
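As a non-authoritative sketch of the judgment recited in claim 3, the following Python function compares every two adjacent pictures with a histogram-correlation score and applies a preset similarity threshold. The claim fixes neither a similarity metric nor a threshold value, so the use of histogram correlation and the 0.98 value are assumptions.

```python
# Sketch of the claim-3 decision: two adjacent pictures are judged "the same"
# when their similarity meets a preset threshold and are filtered out;
# otherwise the later of the two pictures is kept as key frame data.
# The metric and the threshold value are illustrative assumptions.
import cv2

def filter_key_frames(pictures, sim_threshold=0.98):
    def gray_hist(img):
        hist = cv2.calcHist([cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)],
                            [0], None, [64], [0, 256])
        return cv2.normalize(hist, hist)

    key_frames = []
    for earlier, later in zip(pictures, pictures[1:]):
        similarity = cv2.compareHist(gray_hist(earlier), gray_hist(later),
                                     cv2.HISTCMP_CORREL)
        if similarity < sim_threshold:   # pictures differ: keep the later one
            key_frames.append(later)
        # similarity >= threshold: pictures are the same, later one is dropped
    return key_frames
```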
4. The OCR-based video guide generation method according to claim 1, wherein the step of confirming the key frame corresponding relation among the key frame data based on the text content recorded in the key frame text data specifically comprises:
judging whether a same-type number format exists in the first key frame text data;
if no same-type number format exists in the first key frame text data, judging whether a same-type number format exists in the next key frame text data;
if a same-type number format exists in the first key frame text data, determining the same-type number format as a primary association relation, and determining the other key frame text data corresponding to the content of the same-type number format as a same-level association relation;
and obtaining the key frame corresponding relation after the judgment operation on the last key frame text data is finished.
5. The OCR-based video guide generation method according to claim 4, wherein the step of determining the same-type number format as a primary association relation and determining the other key frame text data corresponding to the content of the same-type number format as a same-level association relation specifically comprises:
respectively inputting the other key frame text data into a semantic analysis model for a word meaning recognition operation to obtain real word meaning information;
judging whether the real word meaning information is the same as the content of the same-type number format;
if the real word meaning information is the same as the content of the same-type number format, confirming that the current key frame text data has an association relation with the first key frame text data;
and if the real word meaning information is not the same as the content of the same-type number format, confirming that the current key frame text data has no association relation with the first key frame text data.
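To make the same-type number format test of claims 4 and 5 concrete, the sketch below detects decimal outline numbering (such as "1." or "2.3") in each key frame's recognized text and groups frames into primary and same-level relations. The regular expression, the data shapes, and the function name build_associations are illustrative assumptions, and the semantic analysis model of claim 5 is not reproduced here.

```python
# Illustrative reading of claim 4: scan each key frame's text for a number
# format such as "1." or "2.3"; frames sharing the same top-level number are
# grouped as one primary association, and frames at the same outline depth
# form a same-level association. Regex and data shapes are assumptions.
import re
from collections import defaultdict

NUMBERING = re.compile(r"^\s*(\d+(?:\.\d+)*)[.)]?\s+", re.MULTILINE)

def build_associations(key_frame_texts):
    """key_frame_texts: list of (timestamp, recognized_text) in time order."""
    primary = defaultdict(list)      # top-level number -> timestamps under it
    same_level = defaultdict(list)   # outline depth    -> timestamps at depth
    for timestamp, text in key_frame_texts:
        match = NUMBERING.search(text)
        if match is None:
            continue                 # no number format found in this frame
        number = match.group(1)      # e.g. "2.3"
        primary[number.split(".")[0]].append(timestamp)
        same_level[number.count(".")].append(timestamp)
    return primary, same_level
```

For example, key frames whose text begins with "2.1" and "2.2" would fall into the same primary group ("2") and at the same outline depth (1).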
6. The OCR-based video guide generation method according to claim 1, wherein after the step of responding to the guide picture generation request carrying the original video data, the method further comprises:
storing the original video data into a blockchain.
7. An OCR-based video guide generation apparatus, comprising:
the request response module is used for responding to a guide picture generation request carrying original video data;
the key frame extraction module is used for carrying out key frame extraction operation on the original video data to obtain key frame data carrying video time information;
the text recognition module is used for sequentially carrying out text recognition operation on the key frame data based on an OCR technology and the sequence of the video time information to obtain key frame text data;
the correspondence obtaining module is used for confirming a key frame corresponding relation among the key frame data based on the text content recorded in the key frame text data;
the guide picture acquisition module is used for establishing an association relation between the key frame data based on the key frame corresponding relation to obtain a target video guide picture;
and the guide picture output module is used for outputting the target video guide picture.
8. The OCR-based video guide generation apparatus according to claim 7, wherein the key frame extraction module comprises:
the video frame extraction submodule is used for carrying out video frame extraction operation on the original video data to obtain video frame data;
the screen capture sampling submodule is used for carrying out screen capture sampling operation on the video frame data to obtain a video picture sequence;
and the key frame extraction submodule is used for carrying out similarity comparison on the video picture sequence and filtering out identical pictures to obtain the key frame data.
9. A computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, implement the steps of the OCR-based video guide generation method according to any one of claims 1 to 6.
10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the OCR-based video guide generation method according to any one of claims 1 to 6.
CN202110478515.2A 2021-04-30 2021-04-30 Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium Active CN112990142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110478515.2A CN112990142B (en) 2021-04-30 2021-04-30 Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110478515.2A CN112990142B (en) 2021-04-30 2021-04-30 Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium

Publications (2)

Publication Number Publication Date
CN112990142A true CN112990142A (en) 2021-06-18
CN112990142B CN112990142B (en) 2021-08-10

Family

ID=76336873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110478515.2A Active CN112990142B (en) 2021-04-30 2021-04-30 Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium

Country Status (1)

Country Link
CN (1) CN112990142B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339299A (en) * 2021-12-27 2022-04-12 司法鉴定科学研究院 Video evidence obtaining method for automobile driving recorder
CN114339285A (en) * 2021-12-28 2022-04-12 腾讯科技(深圳)有限公司 Knowledge point processing method, video processing method and device and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900896A (en) * 2018-05-29 2018-11-27 深圳天珑无线科技有限公司 Video clipping method and device
CN109889882A (en) * 2019-01-24 2019-06-14 北京亿幕信息技术有限公司 A kind of video clipping synthetic method and system
CN110110320A (en) * 2019-03-12 2019-08-09 平安科技(深圳)有限公司 Automatic treaty review method, apparatus, medium and electronic equipment
CN110427884A (en) * 2019-08-01 2019-11-08 达而观信息科技(上海)有限公司 The recognition methods of the document structure of an article, device, equipment and storage medium
US20200202147A1 (en) * 2018-12-21 2020-06-25 Continental Automotive Systems Inc. Systems and methods for automatic labeling of images for supervised machine learning
CN111524206A (en) * 2020-03-23 2020-08-11 杨春成 Method and device for generating thinking guide graph
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium
CN112637687A (en) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Video playback method and device based on embedded point behaviors, computer equipment and medium
CN112632948A (en) * 2020-12-29 2021-04-09 天津汇智星源信息技术有限公司 Case document ordering method and related equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900896A (en) * 2018-05-29 2018-11-27 深圳天珑无线科技有限公司 Video clipping method and device
US20200202147A1 (en) * 2018-12-21 2020-06-25 Continental Automotive Systems Inc. Systems and methods for automatic labeling of images for supervised machine learning
CN109889882A (en) * 2019-01-24 2019-06-14 北京亿幕信息技术有限公司 A kind of video clipping synthetic method and system
CN110110320A (en) * 2019-03-12 2019-08-09 平安科技(深圳)有限公司 Automatic treaty review method, apparatus, medium and electronic equipment
CN110427884A (en) * 2019-08-01 2019-11-08 达而观信息科技(上海)有限公司 The recognition methods of the document structure of an article, device, equipment and storage medium
CN111524206A (en) * 2020-03-23 2020-08-11 杨春成 Method and device for generating thinking guide graph
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium
CN112637687A (en) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Video playback method and device based on embedded point behaviors, computer equipment and medium
CN112632948A (en) * 2020-12-29 2021-04-09 天津汇智星源信息技术有限公司 Case document ordering method and related equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339299A (en) * 2021-12-27 2022-04-12 司法鉴定科学研究院 Video evidence obtaining method for automobile driving recorder
CN114339285A (en) * 2021-12-28 2022-04-12 腾讯科技(深圳)有限公司 Knowledge point processing method, video processing method and device and electronic equipment
CN114339285B (en) * 2021-12-28 2024-04-23 腾讯科技(深圳)有限公司 Knowledge point processing method, video processing method, device and electronic equipment

Also Published As

Publication number Publication date
CN112990142B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN111476067B (en) Character recognition method and device for image, electronic equipment and readable storage medium
CN110363252B (en) End-to-end trend scene character detection and identification method and system
CN109684803B (en) Man-machine verification method based on gesture sliding
CN111191695A (en) Website picture tampering detection method based on deep learning
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN112990142B (en) Video guide generation method, device and equipment based on OCR (optical character recognition), and storage medium
CN110175609B (en) Interface element detection method, device and equipment
CN114495128B (en) Subtitle information detection method, device, equipment and storage medium
CN112699297A (en) Service recommendation method, device and equipment based on user portrait and storage medium
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
CN112330331A (en) Identity verification method, device and equipment based on face recognition and storage medium
CN112214984A (en) Content plagiarism identification method, device, equipment and storage medium
Le et al. Deep learning approach for receipt recognition
CN116912847A (en) Medical text recognition method and device, computer equipment and storage medium
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
WO2019140641A1 (en) Information processing method and system, cloud processing device and computer program product
US9378428B2 (en) Incomplete patterns
CN110674678A (en) Method and device for identifying sensitive mark in video
CN112395450B (en) Picture character detection method and device, computer equipment and storage medium
CN114219514A (en) Illegal advertisement identification method and device and electronic equipment
US20230036812A1 (en) Text Line Detection
CN112395834A (en) Brain graph generation method, device and equipment based on picture input and storage medium
US11869260B1 (en) Extracting structured data from an image
CN111103987B (en) Formula input method and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant