CN117648445A - Rich media training courseware knowledge graph construction method and device and computer equipment - Google Patents
- Publication number
- CN117648445A (application number CN202311342061.1A)
- Authority
- CN
- China
- Prior art keywords
- courseware
- rich media
- script
- time sequence
- knowledge graph
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the invention discloses a method, a device and computer equipment for constructing a knowledge graph of rich media training courseware. The method comprises the following steps: acquiring rich media training courseware; preprocessing the rich media training courseware to obtain a text script; and constructing a knowledge graph according to the text script. By implementing the method provided by the embodiment of the invention, the knowledge graph can be established for the rich media courseware library, and the scene requirements such as searching, intelligent recommendation and the like are met.
Description
Technical Field
The invention relates to a knowledge graph construction method, in particular to a knowledge graph construction method, a knowledge graph construction device and computer equipment for rich media training courseware.
Background
Rich media training courseware combines media elements such as text, pictures, audio, video and animation. Because courseware produced with rich media is vivid, visual and easy to understand, it is widely used in electric power safety education and training. Electric power training departments generally accumulate a large number of rich media courseware and establish rich media courseware libraries consisting mainly of video and audio courseware. Rich media cannot be directly indexed and searched because its content is presented as video and audio rather than as traditional text. Search engines typically rely on text and markup to understand and organize content, whereas rich media content is not easily interpreted and analyzed directly. The existing way to make rich media courseware more searchable is to add metadata to the video, such as titles, descriptions, keywords and labels, so that a search engine can better understand the content and theme of the courseware. This scheme has several defects: because the metadata covers only a small part of the information in the courseware, the knowledge in the courseware cannot be searched completely; a search result cannot be located directly to a specific position in the video; and semantic search cannot be realized.
Therefore, it is necessary to design a new method that establishes a knowledge graph for the rich media courseware library and meets scenario requirements such as search and intelligent recommendation.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a rich media training courseware knowledge graph construction method, a rich media training courseware knowledge graph construction device and computer equipment.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the rich media training courseware knowledge graph construction method comprises the following steps:
acquiring rich media training courseware;
preprocessing the rich media training courseware to obtain a text script;
and constructing a knowledge graph according to the text script.
The further technical scheme is as follows: the preprocessing of the rich media training courseware to obtain a text script comprises the following steps:
preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script;
preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script;
and combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
The further technical scheme is as follows: the preprocessing of the audio courseware in the rich media training courseware to obtain a courseware time sequence script comprises the following steps:
and identifying the speech content of the audio courseware in the rich media training courseware by using speech recognition and natural language processing technology, converting the speech content into text, and generating a courseware time sequence script.
The further technical scheme is as follows: the preprocessing of the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script comprises the following steps:
extracting the audio track from the video courseware in the rich media training courseware, and identifying the speech content by using speech recognition and natural language processing technology so as to generate a voice and courseware time sequence script containing time sequence information;
extracting the video courseware in the rich media training courseware frame by frame into pictures;
performing OCR recognition on a designated area of the pictures so as to generate an image and courseware time sequence script containing time sequence information.
The further technical scheme is as follows: the performing OCR recognition on the designated area of the pictures to generate an image and courseware time sequence script containing time sequence information comprises the following steps:
performing OCR recognition on the designated area of the pictures so as to extract caption information;
and removing repeated captions from the caption information to generate the image and courseware time sequence script containing time sequence information.
The further technical scheme is as follows: the construction of the knowledge graph according to the text script comprises the following steps:
extracting a named entity from the text script;
extracting the relation of the named entity from the text script;
constructing an entity relation triplet with a context by adopting the relation of the named entity;
and constructing a knowledge graph by using the entity relation triplet.
The invention also provides a rich media training courseware knowledge graph construction device, which comprises:
the courseware acquisition unit is used for acquiring rich media training courseware;
the preprocessing unit is used for preprocessing the rich media training courseware to obtain a text script;
and the construction unit is used for constructing a knowledge graph according to the text script.
The further technical scheme is as follows: the preprocessing unit includes:
the first preprocessing subunit is used for preprocessing the audio courseware in the rich media training courseware so as to obtain a courseware time sequence script;
the second preprocessing subunit is used for preprocessing video courseware in the rich media training courseware so as to obtain a voice and courseware time sequence script and an image and courseware time sequence script;
and the combining subunit is used for combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the invention has the following beneficial effects: a text script that reflects the content of the rich media training courseware is extracted through methods such as speech recognition, image recognition and natural language processing, and a knowledge graph with time sequence information is constructed based on the text script. This solves the problem that a knowledge graph cannot be constructed for rich media courseware because entities and entity relations cannot be extracted from it directly, and supports semantic search of the rich media training library. Because the time sequence information is stored in the knowledge graph, a search result can be located directly to a specific position in the rich media courseware. A knowledge graph is thus established for the rich media courseware library, meeting scenario requirements such as search and intelligent recommendation.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application scenario of the rich media training courseware knowledge graph construction method provided by an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for constructing knowledge graphs of rich media training courseware according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of a method for constructing a knowledge graph of a rich media training courseware according to an embodiment of the present invention;
FIG. 4 is a schematic sub-flowchart of a method for constructing a knowledge graph of a rich media training courseware according to an embodiment of the present invention;
FIG. 5 is a schematic sub-flowchart of a method for constructing a knowledge graph of a rich media training courseware according to an embodiment of the present invention;
FIG. 6 is a schematic sub-flowchart of a method for constructing a knowledge graph of a rich media training courseware according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a rich media training courseware knowledge graph construction device provided by an embodiment of the invention;
FIG. 8 is a schematic block diagram of a preprocessing unit of the rich media training courseware knowledge graph construction device provided by an embodiment of the invention;
FIG. 9 is a schematic block diagram of a second preprocessing subunit of the rich media training courseware knowledge graph construction apparatus provided by an embodiment of the present invention;
FIG. 10 is a schematic block diagram of an identification module of a rich media training courseware knowledge graph construction device provided by an embodiment of the invention;
FIG. 11 is a schematic block diagram of a construction unit of the rich media training courseware knowledge graph construction device provided by the embodiment of the invention;
FIG. 12 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of the rich media training courseware knowledge graph construction method provided by an embodiment of the present invention. The method is applied to a server, which performs data interaction with a terminal. The knowledge graph is constructed from each courseware in the rich media training courseware library and contains most of the knowledge in that library, so it can effectively support application scenarios such as semantic search and intelligent recommendation over the library. In particular, the knowledge graph constructed by the method includes time sequence information for each knowledge entity relation, i.e., the specific position at which it appears in the corresponding rich media courseware, such as the second of a video courseware at which the knowledge entity appears. When the knowledge graph is applied to a search scenario, the time sequence information can be used to locate precisely the position in the courseware that contains the search result, so that a user does not need to watch the whole courseware to locate the result manually, which improves both user experience and search efficiency.
Fig. 2 is a schematic flow chart of a method for constructing a knowledge graph of a rich media training courseware according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S130.
S110, acquiring rich media training courseware.
In this embodiment, the rich media training courseware refers to each courseware from the rich media training courseware library.
S120, preprocessing the rich media training courseware to obtain a text script.
In this embodiment, the text script refers to a text script extracted from an audio courseware and a video courseware.
In one embodiment, referring to fig. 3, the step S120 may include steps S121 to S123.
S121, preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script.
In this embodiment, the courseware time sequence script is generated as follows: when a text script is generated from a rich media courseware, each sentence is saved together with its time sequence information, i.e., the time at which the sentence occurs in the rich media courseware. The finally generated script is a set of <timestamp, sentence> pairs, and the specific physical storage format of the script is not limited. One implementation saves the pairs in a txt file in the format "<timestamp 1, sentence 1> <timestamp 2, sentence 2>".
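As an illustration only, the <timestamp, sentence> format described above can be sketched as follows. The function names `dump_script` and `load_script`, the use of seconds as the timestamp unit, and the three-decimal formatting are assumptions made for this sketch and are not specified by the embodiment.

```python
import re
from typing import List, Tuple

# A timing script is a list of (timestamp in seconds, sentence) pairs.
Script = List[Tuple[float, str]]

# Matches one "<timestamp, sentence>" pair; the sentence must not contain ">".
_PAIR = re.compile(r"<\s*([0-9]+(?:\.[0-9]+)?)\s*,\s*(.*?)\s*>")


def dump_script(pairs: Script) -> str:
    """Serialize pairs as "<t1, s1> <t2, s2>" for storage in a txt file."""
    return " ".join(f"<{ts:.3f}, {sentence}>" for ts, sentence in pairs)


def load_script(text: str) -> Script:
    """Parse the txt representation back into (timestamp, sentence) pairs."""
    return [(float(ts), sentence) for ts, sentence in _PAIR.findall(text)]
```

A sentence that itself contains ">" would break this simple format; a real implementation would need escaping or a structured format such as JSON.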
The rich media courseware content is converted into a text script, and the knowledge graph is created based on the text script. The courseware time sequence script is constructed so that, when viewing a search result obtained with the established knowledge graph, a user can jump directly to the video or audio position in the rich media courseware that corresponds to the result.
Specifically, speech recognition and natural language processing technology are applied to the audio courseware in the rich media training courseware to recognize the speech content, which is converted into text to generate the courseware time sequence script.
The speech content includes human speech such as narration and dialogue.
For audio training courseware in formats such as mp3 and WAV, for example speeches by important leaders or songs, the content of the courseware is essentially contained in the speech. The speech is therefore converted into text using speech recognition technology, and the text is stored as the courseware time sequence script of the courseware. A script obtained in this way retains most of the information of the courseware. The specific speech recognition method is prior art; for example, it can be implemented with the SAN-M model in an intelligent speech interaction service such as that of Alibaba Cloud, and is not described in detail here.
S122, preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script.
In this embodiment, the voice and courseware timing script refers to a courseware timing script composed of text converted from voice.
The image and courseware time sequence script refers to a courseware time sequence script formed by text converted from video images.
Specifically, for video training courseware such as mp4, because the information is distributed in two parts of the sound and the picture of the video, the information needs to be extracted from two dimensions of the sound and the picture, and a time sequence script of the voice and the courseware and a time sequence script of the image and the courseware are respectively generated, and the two types of scripts are combined together, so that the main information in the courseware can be reserved.
In one embodiment, referring to fig. 4, the step S122 may include steps S1221 to S1223.
S1221, extracting a video courseware audio track from the video courseware in the rich media training courseware, and identifying the voice speaking content by using voice identification and natural language processing technology so as to generate a voice and courseware time sequence script containing time sequence information.
In this embodiment, rich media training courseware of the course explanation type is generally provided with narration, so the speech contains essentially all of the courseware information; for courseware of the film type, the spoken dialogue contains part of the courseware information. Therefore, the audio track of the courseware can be extracted and then processed with speech recognition and natural language processing technology to recognize the speech content and generate the voice and courseware time sequence script of the courseware. There are many mature methods for extracting audio tracks, such as the extract_audio function using the open source tool FFmpeg, which are not described in detail here.
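One possible way to extract the audio track is to call the FFmpeg command-line tool. The sketch below only assembles and runs such a command; the helper names `build_audio_extract_cmd` and `extract_audio_track`, the 16 kHz mono WAV output, and the file names are assumptions chosen for illustration and are not taken from the embodiment.

```python
import subprocess
from typing import List


def build_audio_extract_cmd(video_path: str, wav_path: str,
                            sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and
    writes the audio as mono 16-bit PCM WAV at the given sample rate."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video courseware
        "-vn",                   # no video in the output
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", "1",              # one audio channel (mono)
        wav_path,
    ]


def extract_audio_track(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; requires the ffmpeg executable to be installed."""
    subprocess.run(build_audio_extract_cmd(video_path, wav_path), check=True)
```

The resulting WAV file would then be passed to whichever speech recognition service is used to produce the voice and courseware time sequence script.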
S1222, extracting the video courseware in the rich media training courseware frame by frame into pictures.
In this embodiment, the information of video rich media training courseware is mainly contained in the picture, but existing artificial intelligence technology cannot accurately identify and understand picture content. Considering that rich media courseware is generally provided with subtitles for training purposes, including bottom captions and transition captions shown when key paragraphs are switched, and that these subtitles explain the image content of the courseware, the script obtained by recognizing the subtitles reflects most of the image information and can be used for constructing the knowledge graph.
Specifically, the video courseware is first split into pictures frame by frame, one picture per frame; each picture is then processed with OCR technology to extract the text information in it. There are various mature schemes for frame splitting and OCR; one implementation splits the video into frame-by-frame pictures with FFmpeg and recognizes the caption characters in the pictures with Tesseract OCR, which is not described in detail here.
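A minimal sketch of the frame splitting step, assuming the FFmpeg command-line tool and a constant frame rate. The helper names, the output file pattern and the example paths are hypothetical. Because FFmpeg numbers image sequence output starting at 1, the frame number can be mapped back to a timestamp for the time sequence script.

```python
import os
from typing import List


def build_frame_extract_cmd(video_path: str, out_dir: str) -> List[str]:
    """Build an ffmpeg command that writes every frame of the video
    as a numbered PNG file (000001.png, 000002.png, ...) in out_dir."""
    return ["ffmpeg", "-i", video_path, os.path.join(out_dir, "%06d.png")]


def frame_timestamp(frame_number: int, fps: float) -> float:
    """Timestamp in seconds of a 1-based frame number, assuming the
    video has a constant frame rate of fps frames per second."""
    return (frame_number - 1) / fps
```

For a 25 fps video, picture 000051.png would correspond to the position 2.0 seconds into the courseware. Variable frame rate video would need the actual presentation timestamps instead of this calculation.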
S1223, performing OCR (optical character recognition) on the designated area of the picture to generate an image and courseware time sequence script containing time sequence information.
In one embodiment, referring to fig. 5, the step S1223 may include steps S12231 to S12232.
S12231, performing OCR recognition on the specified area of the picture to extract subtitle information.
In this embodiment, when OCR is used to recognize characters in a picture in order to extract subtitles, a large amount of irrelevant text also appears in the video, such as text on billboards and road signs. Such text has little to do with the courseware content and would pollute the script if it were recognized and included. One feasible approach is to restrict the recognition area. Because subtitles in rich media training courseware typically have a fixed display area, such as a narrow rectangular area at the bottom of the screen, only the text recognized in this area is taken as subtitle text and written into the script. Various mature schemes exist for the specific implementation. One implementation uses the image_to_string function in the Tesseract OCR SDK package: the value returned after the function recognizes a picture includes the region information of the text in the picture, so only the coordinates of the upper left and lower right corners of the rectangular subtitle display area need to be given in order to extract the subtitle text within that area from the return value.
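The region restriction can be sketched independently of any particular OCR engine. The example below assumes that the OCR step has already produced word-level bounding boxes; the `OcrWord` structure, the `caption_text` function and the pixel coordinates are hypothetical and serve only to illustrate the filtering by the upper left and lower right corners of the subtitle rectangle.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrWord:
    """One recognized word with its bounding box in pixels (hypothetical)."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Subtitle rectangle as (x1, y1, x2, y2): upper left and lower right corners.
Rect = Tuple[int, int, int, int]


def caption_text(words: List[OcrWord], region: Rect, sep: str = " ") -> str:
    """Keep only words whose box lies entirely inside the subtitle
    region, preserving the order in which the OCR engine returned them."""
    x1, y1, x2, y2 = region
    inside = [
        w for w in words
        if w.left >= x1 and w.top >= y1
        and w.left + w.width <= x2 and w.top + w.height <= y2
    ]
    return sep.join(w.text for w in inside)
```

For Chinese subtitles, `sep=""` would be the more natural choice, since the characters are not separated by spaces.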
S12232, removing repeated subtitles of the subtitle information to generate an image and courseware time sequence script containing time sequence information.
In this embodiment, since the same subtitle will appear in consecutive multi-frame pictures, the text sequence extracted from each frame of picture needs to be compared with the text sequence extracted from the previous frame and de-duplicated, so as to prevent the script volume from being too large caused by repeated text.
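The comparison with the previous frame can be sketched as follows. The function name `dedupe_captions` and the exact-match comparison are assumptions of this sketch; OCR noise may make two readings of the same subtitle differ slightly, in which case a similarity threshold could be used instead.

```python
from typing import Iterable, List, Optional, Tuple


def dedupe_captions(frames: Iterable[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Collapse runs of identical subtitle text across consecutive frames.

    frames yields (timestamp, subtitle text) in playback order. Each
    subtitle is kept once, with the timestamp of the first frame in
    which it appears; frames without subtitle text are skipped.
    """
    script: List[Tuple[float, str]] = []
    previous: Optional[str] = None
    for timestamp, text in frames:
        text = text.strip()
        if not text:
            # No subtitle in this frame; the next subtitle starts a new run.
            previous = None
            continue
        if text != previous:
            script.append((timestamp, text))
            previous = text
    return script
```

The output is the image and courseware time sequence script: one <timestamp, sentence> pair per distinct run of a subtitle.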
S123, combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
In this embodiment, the text script includes three types of scripts, a courseware timing script, a voice and courseware timing script, and an image and courseware timing script.
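The embodiment does not specify how the scripts are combined. One simple possibility, shown here purely as an assumption, is to tag each sentence with the script it came from and merge the scripts of one courseware in timestamp order. The function name `combine_scripts` and the source labels are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def combine_scripts(
    scripts: Dict[str, List[Tuple[float, str]]]
) -> List[Tuple[float, str, str]]:
    """Merge several timing scripts into one list of
    (timestamp, source, sentence) entries ordered by timestamp.

    scripts maps a source label (e.g. "voice", "image") to its list of
    (timestamp, sentence) pairs.
    """
    tagged = [
        [(ts, source, sentence) for ts, sentence in sorted(pairs)]
        for source, pairs in scripts.items()
    ]
    # heapq.merge combines already sorted sequences without re-sorting.
    return list(heapq.merge(*tagged))
```

Keeping the source label makes it possible to tell later whether a sentence came from the narration or from a subtitle.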
S130, constructing a knowledge graph according to the text script.
Specifically, the text scripts corresponding to all courseware in the rich media training library are obtained; the text of these scripts essentially reflects the specific content of the rich media courseware. The three types of scripts also contain the time sequence index information of the specific courseware. All scripts are then analyzed to construct the knowledge graph of the rich media courseware library.
Named entities and relations in the three types of scripts are extracted using NLP technology; the named entities are disambiguated; entity relation triplets (SPO) with context information are constructed; and the knowledge graph is constructed.
In one embodiment, referring to fig. 6, the step S130 may include steps S131 to S134.
S131, extracting a named entity from the text script;
S132, extracting the relation of the named entity from the text script;
S133, constructing an entity relation triplet with a context by adopting the relation of the named entity;
S134, constructing a knowledge graph by utilizing the entity relation triplet.
Specifically, by performing the first stage of processing on all courseware in the rich media training courseware library, a courseware time sequence script library is obtained. Each audio courseware in the rich media training courseware library corresponds to one script in the courseware time sequence script library, and each video courseware corresponds to two scripts (a voice and courseware time sequence script and an image and courseware time sequence script). The scripts are natural language texts and reflect the knowledge contained in the corresponding rich media courseware, so a knowledge graph can be constructed directly on this basis.
An existing mature scheme for constructing a knowledge graph from text is adopted. One implementation is briefly described as follows: candidate named entities are extracted from the text script using the named entity recognition function provided by the natural language processing toolkit HanLP, and entity disambiguation is performed with a BERT neural network model to obtain the named entities; entity relations are extracted using the deep learning semantic analysis function of HanLP; triplets are constructed from the named entities and entity relations obtained in these steps, and the time sequence information, namely the number of the courseware containing the relation and the time at which it appears, is stored in the corresponding triplet. The generated entity relation triplets are imported into the graph database Neo4j: entities are stored as "nodes" of the graph in Neo4j, relations are stored as "relationships" in the graph, and the time sequence information in each entity relation triplet is stored as an attribute of the corresponding "relationship". If a relation occurs multiple times in the rich media courseware library, all of its time sequence information is saved as a set in the attribute of the "relationship". Through these steps, the knowledge graph corresponding to the rich media courseware library is constructed.
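The way time sequence information is collected per relation can be sketched without a database. The example below is an in-memory stand-in for the graph store, not the Neo4j import itself; the function name `build_graph`, the record layout and the sample entities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]    # (subject, predicate, object)
Occurrence = Tuple[str, float]   # (courseware number, time in seconds)


def build_graph(
    records: Iterable[Tuple[str, str, str, str, float]]
) -> Dict[Triple, Set[Occurrence]]:
    """Group extracted relations by triplet.

    Each record is (subject, predicate, object, courseware number, time).
    A relation that occurs several times in the library keeps every
    occurrence in one set, mirroring the set-valued attribute that the
    embodiment stores on the "relationship".
    """
    graph: Dict[Triple, Set[Occurrence]] = defaultdict(set)
    for subject, predicate, obj, courseware, seconds in records:
        graph[(subject, predicate, obj)].add((courseware, seconds))
    return dict(graph)
```

In a graph database, the two entities of each key would become nodes, the predicate a relationship between them, and the occurrence set a property of that relationship.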
According to the knowledge graph construction method for rich media training courseware, a text script that reflects the content of the rich media training courseware is extracted through methods such as speech recognition, image recognition and natural language processing, and a knowledge graph with time sequence information is constructed based on the text script. This solves the problem that a knowledge graph cannot be constructed for rich media courseware because entities and entity relations cannot be extracted from it directly, and supports semantic search of the rich media training library. Because the time sequence information is stored in the knowledge graph, a search result can be located directly to a specific position in the rich media courseware. A knowledge graph is thus established for the rich media courseware library, meeting scenario requirements such as search and intelligent recommendation.
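To illustrate how stored time sequence information lets a search result point to a position in a courseware, the sketch below looks up an entity in a small in-memory mapping from triplets to occurrences. It uses exact name matching rather than semantic search, and the function name `locate`, the data layout and the sample data are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]    # (subject, predicate, object)
Occurrence = Tuple[str, float]   # (courseware number, time in seconds)


def locate(graph: Dict[Triple, Set[Occurrence]],
           entity: str) -> List[Tuple[str, float, Triple]]:
    """Return every (courseware number, time, triplet) in which the
    entity appears as subject or object, ordered by courseware and time."""
    hits = [
        (courseware, seconds, triple)
        for triple, occurrences in graph.items()
        if entity in (triple[0], triple[2])
        for courseware, seconds in occurrences
    ]
    return sorted(hits)
```

Each hit gives the courseware and the second to which a player could seek, so the user does not have to watch the whole courseware to find the relevant passage.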
Fig. 7 is a schematic block diagram of a rich media training courseware knowledge graph construction apparatus 300 according to an embodiment of the invention. As shown in fig. 7, corresponding to the above rich media training courseware knowledge graph construction method, the present invention further provides a rich media training courseware knowledge graph construction apparatus 300. The apparatus 300 includes units for performing the above-described rich media training courseware knowledge graph construction method and may be configured in a server. Specifically, referring to fig. 7, the rich media training courseware knowledge graph construction apparatus 300 includes a courseware acquisition unit 301, a preprocessing unit 302, and a construction unit 303.
A courseware obtaining unit 301, configured to obtain a rich media training courseware; the preprocessing unit 302 is configured to preprocess the rich media training courseware to obtain a text script; and the construction unit 303 is used for constructing a knowledge graph according to the text script.
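The division into three units can be sketched as a simple chain of pluggable callables. The class name `GraphConstructionDevice` and the placeholder lambdas are hypothetical; real units would perform the acquisition, preprocessing and construction described in the method embodiment.

```python
from typing import Callable, List


class GraphConstructionDevice:
    """Chains three pluggable units: acquisition -> preprocessing -> construction."""

    def __init__(self,
                 acquire: Callable[[], List[str]],
                 preprocess: Callable[[List[str]], str],
                 construct: Callable[[str], dict]):
        self.acquire = acquire        # plays the role of unit 301
        self.preprocess = preprocess  # plays the role of unit 302
        self.construct = construct    # plays the role of unit 303

    def run(self) -> dict:
        courseware = self.acquire()
        text_script = self.preprocess(courseware)
        return self.construct(text_script)


# Placeholder units, used only to show the data flow.
device = GraphConstructionDevice(
    acquire=lambda: ["lesson1.mp4", "lesson2.mp3"],
    preprocess=lambda items: " | ".join(items),
    construct=lambda script: {"source_script": script},
)
result = device.run()
```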
In one embodiment, as shown in fig. 8, the preprocessing unit 302 includes a first preprocessing subunit 3021, a second preprocessing subunit 3022, and a combining subunit 3023.
A first preprocessing subunit 3021, configured to preprocess audio courseware in the rich media training courseware to obtain a courseware time sequence script; a second preprocessing subunit 3022, configured to preprocess video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script; and the combination subunit 3023 is configured to combine the courseware time sequence script, the voice and courseware time sequence script, and the image and courseware time sequence script to obtain a text script.
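One way the combination subunit could merge the three time sequence scripts is to concatenate them and order the lines by courseware number and time. The source does not specify the merge rule, so the ordering below, the `ScriptLine` record and the sample lines are assumptions.

```python
from typing import List, NamedTuple


class ScriptLine(NamedTuple):
    courseware_id: str
    seconds: float
    source: str  # "audio", "video_speech" or "video_image"
    text: str


def combine_scripts(*scripts: List[ScriptLine]) -> List[ScriptLine]:
    """Merge several time sequence scripts into one ordered text script."""
    merged = [line for script in scripts for line in script]
    return sorted(merged, key=lambda l: (l.courseware_id, l.seconds, l.source))


# Hypothetical sample scripts.
audio = [ScriptLine("A01", 3.0, "audio", "welcome")]
speech = [ScriptLine("V01", 5.0, "video_speech", "open the valve")]
image = [
    ScriptLine("V01", 4.0, "video_image", "Step 1"),
    ScriptLine("V01", 5.0, "video_image", "open the valve"),
]
text_script = combine_scripts(audio, speech, image)
```

Keeping the `source` field on each line preserves which modality produced the text, which may be useful if speech and subtitle lines later need to be reconciled.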
In an embodiment, the first preprocessing subunit 3021 is configured to recognize the speech content of the audio courseware in the rich media training courseware and convert the speech content into text to generate a courseware time sequence script.
In one embodiment, as shown in fig. 9, the second preprocessing subunit 3022 includes an audio track processing module 30221, a picture extraction module 30222, and an identification module 30223.
The audio track processing module 30221 is configured to extract the audio track from the video courseware in the rich media training courseware and recognize the speech content using voice recognition and natural language processing technology, so as to generate a voice and courseware time sequence script containing time sequence information; the picture extraction module 30222 is configured to extract the video courseware in the rich media training courseware frame by frame into pictures; and the recognition module 30223 is configured to perform OCR (optical character recognition) on the designated area of each picture, so as to generate an image and courseware time sequence script containing time sequence information.
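The frame-by-frame step can be sketched as follows: each frame is cropped to the designated area, passed to an OCR function, and tagged with a time derived from its frame index. The toy pixel representation, the `toy_ocr` stand-in and the frame rate are assumptions; an actual system would plug in a real video decoder and OCR engine.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Frame = Sequence[Sequence[int]]     # rows of pixel values (toy representation)
Region = Tuple[int, int, int, int]  # top, bottom, left, right (end-exclusive)


def crop(frame: Frame, region: Region) -> List[List[int]]:
    """Cut the designated area out of one frame."""
    top, bottom, left, right = region
    return [list(row[left:right]) for row in frame[top:bottom]]


def frames_to_timed_text(frames: Iterable[Frame], fps: float, region: Region,
                         ocr: Callable[[List[List[int]]], str]) -> List[Tuple[float, str]]:
    """Run OCR on the designated area of every frame and attach its time in seconds."""
    timed = []
    for index, frame in enumerate(frames):
        text = ocr(crop(frame, region)).strip()
        if text:
            timed.append((index / fps, text))
    return timed


def toy_ocr(pixels: List[List[int]]) -> str:
    # Stand-in for a real OCR engine: reports text whenever any pixel is set.
    return "HI" if any(any(row) for row in pixels) else ""


blank = [[0] * 4 for _ in range(4)]
with_subtitle = [[0] * 4 for _ in range(3)] + [[0, 1, 1, 0]]
top_only = [[1, 1, 1, 1]] + [[0] * 4 for _ in range(3)]
frames = [blank, with_subtitle, top_only, with_subtitle]

# The designated area is the bottom row, where subtitles are assumed to appear.
timed_text = frames_to_timed_text(frames, fps=2.0, region=(3, 4, 0, 4), ocr=toy_ocr)
```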
In an embodiment, as shown in fig. 10, the recognition module 30223 includes a subtitle extraction sub-module 302231 and a deduplication sub-module 302232.
The subtitle extraction sub-module 302231 is configured to perform OCR (optical character recognition) on the designated area of the picture so as to extract subtitle information; and the de-duplication sub-module 302232 is configured to remove repeated subtitles from the subtitle information so as to generate an image and courseware time sequence script containing time sequence information.
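Because a subtitle usually stays on screen for many consecutive frames, frame-by-frame OCR yields the same text repeatedly. A minimal sketch of the de-duplication step, assuming that "repeated" means identical text in consecutive frames after whitespace normalization, keeps only the first occurrence of each run:

```python
from typing import List, Tuple


def deduplicate_subtitles(timed_text: List[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Collapse runs of identical consecutive subtitles, keeping the first timestamp."""
    deduped = []
    previous = None
    for seconds, text in timed_text:
        normalized = " ".join(text.split())  # ignore spacing differences from OCR
        if normalized and normalized != previous:
            deduped.append((seconds, normalized))
        previous = normalized
    return deduped


# Hypothetical OCR output, one entry per frame.
raw = [
    (0.00, "Open the valve"),
    (0.04, "Open the valve"),
    (0.08, "Open  the valve"),
    (0.12, "Check pressure"),
    (0.16, ""),
    (0.20, "Check pressure"),
]
subtitles = deduplicate_subtitles(raw)
```

Note that a subtitle reappearing after an empty frame is treated as a new entry; whether that is desirable depends on the courseware and is a design choice of this sketch.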
In one embodiment, as shown in fig. 11, the construction unit 303 includes an entity extraction subunit 3031, a relationship extraction subunit 3032, a triplet construction subunit 3033, and a map construction subunit 3034.
An entity extraction subunit 3031, configured to extract a named entity from the text script; a relationship extraction subunit 3032, configured to extract, for the text script, a relationship of the named entity; a triplet construction subunit 3033, configured to construct an entity relation triplet with a context by using the named entity and the relation of the named entity; and the map construction subunit 3034 is configured to construct a knowledge map by using the entity relationship triplet.
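The triplet construction step can be illustrated with a sketch that keeps only relation candidates whose two ends are recognized named entities and attaches the context of the script line they came from. It assumes that the "context" is the courseware number and appearance time; `ContextTriple`, `build_triples` and the sample inputs are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class ContextTriple(NamedTuple):
    head: str
    relation: str
    tail: str
    courseware_id: str
    seconds: float


def build_triples(entities: Set[str],
                  relations: List[Tuple[int, str, str, str]],
                  line_context: Dict[int, Tuple[str, float]]) -> List[ContextTriple]:
    """Join relation candidates (line_no, head, relation, tail) with their line context."""
    triples = []
    for line_no, head, relation, tail in relations:
        # Discard candidates whose ends were not confirmed as named entities.
        if head in entities and tail in entities and line_no in line_context:
            courseware_id, seconds = line_context[line_no]
            triples.append(ContextTriple(head, relation, tail, courseware_id, seconds))
    return triples


entities = {"pump", "valve"}
relations = [(0, "pump", "feeds", "valve"), (1, "pump", "feeds", "tank")]
line_context = {0: ("V01", 5.0), 1: ("V01", 9.0)}
triples = build_triples(entities, relations, line_context)
```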
It should be noted that, as those skilled in the art can clearly understand, the specific implementation process of the above-mentioned rich media training courseware knowledge graph construction device 300 and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, the description is omitted here.
The rich media training courseware knowledge graph construction apparatus 300 described above may be implemented in the form of a computer program that may run on a computer device as shown in fig. 12.
Referring to fig. 12, fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, where the server may be a stand-alone server or may be a server cluster formed by a plurality of servers.
With reference to FIG. 12, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a rich media training courseware knowledge graph construction method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a rich media training courseware knowledge graph construction method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device 500 to which the present application is applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
acquiring rich media training courseware; preprocessing the rich media training courseware to obtain a text script; and constructing a knowledge graph according to the text script.
In one embodiment, when the step of preprocessing the rich media training courseware to obtain a text script is implemented by the processor 502, the following steps are specifically implemented:
preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script; preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script; and combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
In an embodiment, when implementing the step of preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script, the processor 502 specifically implements the following steps:
recognizing the speech content of the audio courseware in the rich media training courseware by using voice recognition and natural language processing technology, converting the speech content into text, and generating a courseware time sequence script.
In an embodiment, when implementing the step of preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script, the processor 502 specifically implements the following steps:
extracting the audio track from the video courseware in the rich media training courseware, and recognizing the speech content by using voice recognition and natural language processing technology, so as to generate a voice and courseware time sequence script containing time sequence information; extracting the video courseware in the rich media training courseware frame by frame into pictures; and performing OCR recognition on the designated area of each picture so as to generate an image and courseware time sequence script containing time sequence information.
In one embodiment, when implementing the step of performing OCR recognition on the designated area of the picture to generate an image and courseware time sequence script containing time sequence information, the processor 502 specifically implements the following steps:
performing OCR recognition on the designated area of the picture so as to extract subtitle information; and removing repeated subtitles from the subtitle information so as to generate an image and courseware time sequence script containing time sequence information.
In an embodiment, when implementing the step of constructing a knowledge graph according to the text script, the processor 502 specifically implements the following steps:
extracting a named entity from the text script; extracting the relation of the named entity from the text script; constructing an entity relation triplet with a context by using the named entity and the relation of the named entity; and constructing a knowledge graph by using the entity relation triplet.
It should be appreciated that in embodiments of the present application, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or any conventional processor.
Those skilled in the art will appreciate that all or part of the flow in a method embodying the above described embodiments may be accomplished by computer programs instructing the relevant hardware. The computer program comprises program instructions, and the computer program can be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring rich media training courseware; preprocessing the rich media training courseware to obtain a text script; and constructing a knowledge graph according to the text script.
In one embodiment, when the processor executes the computer program to perform the preprocessing on the rich media training courseware to obtain a text script, the following steps are specifically implemented:
preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script; preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script; and combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
In an embodiment, when the processor executes the computer program to implement the step of preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script, the following steps are specifically implemented:
recognizing the speech content of the audio courseware in the rich media training courseware by using voice recognition and natural language processing technology, converting the speech content into text, and generating a courseware time sequence script.
In an embodiment, when the processor executes the computer program to implement the step of preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script, the following steps are specifically implemented:
extracting the audio track from the video courseware in the rich media training courseware, and recognizing the speech content by using voice recognition and natural language processing technology, so as to generate a voice and courseware time sequence script containing time sequence information; extracting the video courseware in the rich media training courseware frame by frame into pictures; and performing OCR recognition on the designated area of each picture so as to generate an image and courseware time sequence script containing time sequence information.
In one embodiment, when the processor executes the computer program to implement the step of performing OCR recognition on the designated area of the picture to generate an image and courseware time sequence script containing time sequence information, the following steps are specifically implemented:
performing OCR recognition on the designated area of the picture so as to extract subtitle information; and removing repeated subtitles from the subtitle information so as to generate an image and courseware time sequence script containing time sequence information.
In one embodiment, when the processor executes the computer program to implement the step of constructing a knowledge graph according to the text script, the following steps are specifically implemented:
extracting a named entity from the text script; extracting the relation of the named entity from the text script; constructing an entity relation triplet with a context by using the named entity and the relation of the named entity; and constructing a knowledge graph by using the entity relation triplet.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.
The steps in the method of the embodiment of the invention can be reordered, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
1. The rich media training courseware knowledge graph construction method is characterized by comprising the following steps of:
acquiring rich media training courseware;
preprocessing the rich media training courseware to obtain a text script;
and constructing a knowledge graph according to the text script.
2. The method for constructing a knowledge graph of a rich media training courseware according to claim 1, wherein the preprocessing the rich media training courseware to obtain a text script comprises:
preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script;
preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script;
and combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
3. The method for constructing a knowledge graph of a rich media training courseware according to claim 2, wherein the preprocessing the audio courseware in the rich media training courseware to obtain a courseware time sequence script comprises:
and recognizing the speech content of the audio courseware in the rich media training courseware by using voice recognition and natural language processing technology, converting the speech content into text, and generating a courseware time sequence script.
4. The method for constructing a knowledge graph of a rich media training courseware according to claim 2, wherein the preprocessing the video courseware in the rich media training courseware to obtain a voice and courseware time sequence script and an image and courseware time sequence script comprises:
extracting the audio track from the video courseware in the rich media training courseware, and recognizing the speech content by using voice recognition and natural language processing technology, so as to generate a voice and courseware time sequence script containing time sequence information;
extracting the video courseware in the rich media training courseware frame by frame into pictures;
and performing OCR recognition on the designated area of each picture so as to generate an image and courseware time sequence script containing time sequence information.
5. The method for constructing a knowledge graph of a rich media training courseware according to claim 4, wherein the performing OCR recognition on the designated area of the picture to generate an image and courseware time sequence script containing time sequence information comprises:
performing OCR recognition on the designated area of the picture so as to extract subtitle information;
and removing repeated subtitles from the subtitle information so as to generate an image and courseware time sequence script containing time sequence information.
6. The method for constructing a knowledge graph of a rich media training courseware according to claim 1, wherein the constructing a knowledge graph according to the text script comprises:
extracting a named entity from the text script;
extracting the relation of the named entity from the text script;
constructing an entity relation triplet with a context by using the named entity and the relation of the named entity;
and constructing a knowledge graph by using the entity relation triplet.
7. The rich media training courseware knowledge graph construction device is characterized by comprising:
the courseware acquisition unit is used for acquiring rich media training courseware;
the preprocessing unit is used for preprocessing the rich media training courseware to obtain a text script;
and the construction unit is used for constructing a knowledge graph according to the text script.
8. The rich media training courseware knowledge graph construction apparatus of claim 7, wherein the preprocessing unit comprises:
the first preprocessing subunit is used for preprocessing the audio courseware in the rich media training courseware so as to obtain a courseware time sequence script;
the second preprocessing subunit is used for preprocessing video courseware in the rich media training courseware so as to obtain a voice and courseware time sequence script and an image and courseware time sequence script;
and the combining subunit is used for combining the courseware time sequence script, the voice and courseware time sequence script and the image and courseware time sequence script to obtain a text script.
9. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-6.
10. A storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311342061.1A CN117648445A (en) | 2023-10-17 | 2023-10-17 | Rich media training courseware knowledge graph construction method and device and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311342061.1A CN117648445A (en) | 2023-10-17 | 2023-10-17 | Rich media training courseware knowledge graph construction method and device and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117648445A (en) | 2024-03-05 |
Family ID: 90046710
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311342061.1A Pending CN117648445A (en) | 2023-10-17 | 2023-10-17 | Rich media training courseware knowledge graph construction method and device and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117648445A (en) |
- 2023-10-17: CN application CN202311342061.1A, published as CN117648445A; status: active, pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||