CN114048368A - Method, device and medium for extracting data based on unstructured intelligence


Info

Publication number
CN114048368A
Authority
CN
China
Prior art keywords
information, audio, picture, text, video
Prior art date
Legal status
Pending
Application number
CN202110933584.8A
Other languages
Chinese (zh)
Inventor
Cui Pengfei (崔鹏飞)
Current Assignee
Beijing Gengtu Technology Co ltd
Original Assignee
Beijing Gengtu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Gengtu Technology Co ltd filed Critical Beijing Gengtu Technology Co ltd
Priority to CN202110933584.8A
Publication of CN114048368A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/908 Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/906 Clustering; Classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method, a device and a medium for extracting data based on unstructured intelligence, in the field of information extraction. Processed intelligence is input into a trained LSTM network model for processing, and a vector sequence corresponding to the processed intelligence is output; the intelligence comprises at least one of collected video intelligence, audio intelligence, picture intelligence and text intelligence. The vector sequence is then input into a trained CRF network model for labeling, and a labeling sequence corresponding to the vector sequence is output, the labeling sequence being intelligence phrases labeled with parts of speech. The application has the effect of facilitating the rapid extraction of useful information from unstructured intelligence.

Description

Method, device and medium for extracting data based on unstructured intelligence
Technical Field
The present application relates to the field of information extraction, and in particular, to a method, an apparatus, and a medium for extracting data based on unstructured intelligence.
Background
In the fields of military affairs and public opinion, it is generally necessary to collect intelligence information and then analyze the collected intelligence information to make a decision.
At present, collected intelligence is generally unstructured text, video, audio, pictures and the like. The traditional manual interpretation mode places high demands on intelligence analysts: on the one hand, intelligence is highly time-sensitive; on the other hand, a large amount of potential information is usually hidden within a single piece of intelligence and can only be obtained by comparing a large amount of related intelligence. Both aspects require analysts to be highly efficient and professional.
How to quickly extract useful information from unstructured intelligence and assist intelligence personnel in quickly completing intelligence compilation and automatically mining the potential value of intelligence is the focus of the invention.
Disclosure of Invention
To facilitate rapid extraction of useful information from unstructured intelligence, the present application provides a method, apparatus, and medium for extracting data based on unstructured intelligence.
In a first aspect, the present application provides a method for extracting data based on unstructured intelligence, which adopts the following technical scheme:
a method for extracting data based on unstructured intelligence, comprising: inputting the processed information into a trained LSTM network model for processing, and outputting a vector sequence corresponding to the processed information, wherein the information comprises at least one of collected video information, audio information, picture information and text information;
and inputting the vector sequence into a trained CRF network model for labeling, and outputting a labeling sequence corresponding to the vector sequence, wherein the labeling sequence is an information phrase labeled with part of speech.
By adopting the technical scheme, after the processed intelligence is input into the trained LSTM network model, it is processed in the LSTM network model, which outputs, for each word in the intelligence, an array of probability vectors over all labels. The vector sequence is then input into the CRF network model for labeling, and the CRF network model outputs a labeling sequence, which gives the part of speech corresponding to each useful intelligence phrase. This makes it convenient to quickly collect the useful information in unstructured intelligence such as videos, audio and pictures.
In another possible implementation manner, the inputting the processed intelligence information into the trained LSTM network model and outputting the vector sequence includes:
generating a video database based on all the collected video intelligence;
generating an audio database based on all the collected audio intelligence;
generating a picture database based on all the collected picture intelligence;
generating a text database based on all the collected text intelligence;
processing the contents stored in the video database, the audio database, the picture database and the character database;
and generating a standard information base based on the processing result, wherein the standard information base comprises processed information.
Through the technical scheme, the collected video information, audio information, picture information and text information are classified, and the classified video information, audio information, picture information and text information are stored in the corresponding databases respectively, so that various information can be processed conveniently.
In another possible implementation manner, the processing the video database and the audio database includes:
extracting audio information in each video information;
storing the audio information into an audio database;
converting all audio information in the audio database into first text information, wherein the all audio information comprises each audio information and audio information extracted from each video information, and the first text information comprises text information converted from each audio information and text information converted from audio signals extracted from each video information;
and storing the first text information into a standard situation message library.
By adopting the technical scheme, since video intelligence comprises both audio and pictures, the audio information in the video intelligence is extracted first, then stored in the audio database and converted, together with the collected audio intelligence, into first text information; the first text information is then stored in the standard situation message library. Extracting the audio information in the video intelligence makes intelligence collection more convenient.
In another possible implementation manner, the processing the content stored in the video database, the audio database, the picture database, and the text database further includes:
detecting all transition moments in each video information, wherein the transition moments are moments when the information in the video pictures changes;
intercepting two pieces of picture information before and after each transition moment;
storing the two cut-off picture information into a picture database;
performing character recognition on all picture information and the picture information in the picture database and generating second text information, wherein the second text information comprises text information recognized by all the picture information and text information recognized by the picture information;
storing the second text information into a standard situation message library;
converting each piece of text intelligence in the text database into text information;
and storing the character information into a standard situation message library.
By adopting the technical scheme, in addition to audio information, each piece of video intelligence also comprises picture information, which usually contains useful information. When the video switches from one picture to another, the information in the picture changes; therefore all transition moments in the video intelligence are detected, the two pieces of picture information before and after each transition are intercepted and stored in the picture database, and character recognition is performed on them together with the collected picture intelligence to generate second text information, which is then stored in the standard situation message library. Intercepting the picture information before and after the transition moments makes it convenient to extract the information in the video intelligence. The text information in the text database is also stored in the standard intelligence document library, so that a standard intelligence document library containing all the intelligence is obtained.
In another possible implementation manner, after inputting the vector sequence into the trained CRF network model for labeling and outputting the labeling sequence corresponding to the vector sequence, the method further includes:
carrying out the same labeling on each audio message, the first text information extracted from each audio message and the word group in the labeling sequence corresponding to the first text information corresponding to each audio message, wherein the first text information corresponding to each audio message is the text information converted from each audio message;
carrying out the same labeling on each piece of video information, audio information extracted from each piece of video information, first text information corresponding to the audio information and phrases in a labeling sequence corresponding to the first text information, wherein the first text information corresponding to the audio information is text information converted from the audio signal extracted from each piece of video information;
carrying out the same labeling on each piece of video information, the image information extracted from each piece of video information, second text information corresponding to the image information and phrases in a labeling sequence corresponding to the second text information, wherein the second text information corresponding to the image information is the text information identified by the image information;
carrying out the same labeling on the word groups in the labeling sequences corresponding to each picture intelligence, the second text information corresponding to each picture intelligence and the second text information corresponding to each picture intelligence, wherein the second text information corresponding to each picture intelligence is the text information identified by all the picture intelligence;
and carrying out the same labeling on each character message, the character message extracted from each character message and the phrase in the labeling sequence corresponding to the character message.
By adopting the technical scheme, a phrase in the labeling sequence can be used to locate which text intelligence, picture intelligence, audio intelligence or video intelligence the phrase originally appeared in, so that personnel can quickly find the original intelligence and analyze it.
In another possible implementation, the method includes: and marking the picture information intercepted from the video information, wherein the mark is the time point of the picture information appearing in the video information.
By adopting the technical scheme, if the phrase in the labeling sequence corresponds to the picture information intercepted from the video information, the personnel can position the time point of the phrase appearing in the corresponding video information according to the phrase, thereby facilitating the analysis of the video content of the phrase before and after the time point in the video information by the personnel.
In another possible implementation manner, the method further includes:
and generating map information based on the labeling sequence corresponding to the vector sequence, wherein the map information is the relation information between the phrases in the labeling sequence.
By adopting the technical scheme, the electronic equipment generates the map information based on the labeling sequence, so that a more visual information relation can be obtained, and the personnel can conveniently check the obtained information.
In a second aspect, the present application provides an apparatus for extracting data based on unstructured intelligence, which adopts the following technical solution:
an apparatus for extracting data based on unstructured intelligence, comprising:
the first processing module is used for inputting the processed information into a trained LSTM network model for processing and outputting a vector sequence corresponding to the processed information, wherein the information comprises at least one of collected video information, audio information, picture information and text information;
and the first labeling module is used for inputting the vector sequence into a trained CRF network model for labeling and outputting a labeling sequence corresponding to the vector sequence, wherein the labeling sequence is an information phrase labeled with part of speech.
By adopting the technical scheme, after the processed intelligence is input into the trained LSTM network model, the first processing module processes it in the LSTM network model, which outputs, for each word in the intelligence, an array of probability vectors over all labels. The first labeling module then inputs the vector sequence into the CRF network model for labeling, and the CRF network model outputs a labeling sequence, which gives the part of speech corresponding to each useful intelligence phrase. This makes it convenient to quickly collect the useful information in unstructured intelligence such as videos, audio and pictures.
In another possible implementation manner, the apparatus further includes:
the first generation module is used for generating a video database based on all the collected video intelligence;
the second generation module is used for generating an audio database based on all the collected audio intelligence;
the third generation module is used for generating a picture database based on all the collected picture intelligence;
a fourth generation module for generating a character database based on all the collected character intelligence;
the second processing module is used for processing the contents stored in the video database, the audio database, the picture database and the character database;
and the fifth generation module is used for generating a standard information base based on the processing result, and the standard information base comprises processed information.
In another possible implementation manner, the second processing module, when processing the video database, the audio database, the picture database, and the character database, is specifically configured to:
extracting audio information in each video information;
storing the audio information into an audio database;
converting all audio information in the audio database into first text information, wherein the all audio information comprises each audio information and audio information extracted from each video information, and the first text information comprises text information converted from each audio information and text information converted from audio signals extracted from each video information;
and storing the first text information into a standard situation message library.
In another possible implementation manner, when the second processing module processes the video database, the audio database, the picture database, and the text database, the second processing module is further specifically configured to:
detecting all transition moments in each video information, wherein the transition moments are moments when the information in the video pictures changes;
intercepting two pieces of picture information before and after each transition moment;
storing the two cut-off picture information into a picture database;
performing character recognition on all picture information and the picture information in the picture database and generating second text information, wherein the second text information comprises text information recognized by all the picture information and text information recognized by the picture information;
storing the second text information into a standard situation message library;
converting each piece of text intelligence in the text database into text information;
and storing the character information into a standard situation message library.
In another possible implementation manner, the apparatus further includes:
the second labeling module is used for carrying out the same labeling on each audio message, the first text information extracted from each audio message and the word group in the labeling sequence corresponding to the first text information corresponding to each audio message, wherein the first text information corresponding to each audio message is the text information converted from each audio message;
the third labeling module is used for carrying out the same labeling on each piece of video information and audio information extracted from each piece of video information, first text information corresponding to the audio information and phrases in a labeling sequence corresponding to the first text information, wherein the first text information corresponding to the audio information is text information converted from an audio signal extracted from each piece of video information;
the fourth labeling module is used for carrying out the same labeling on each piece of video information, the picture information extracted from each piece of video information, the second text information corresponding to the picture information and the word group in the labeling sequence corresponding to the second text information, wherein the second text information corresponding to the picture information is the text information identified by the picture information;
the fifth labeling module is used for labeling the word groups in the labeling sequence corresponding to each picture intelligence, the second text information corresponding to each picture intelligence and the second text information corresponding to each picture intelligence in the same way, wherein the second text information corresponding to each picture intelligence is text information identified by all the picture intelligence;
and the sixth labeling module is used for carrying out the same labeling on each character information, the character information extracted from each character information and the word group in the labeling sequence corresponding to the character information.
In another possible implementation manner, the apparatus further includes:
and the seventh marking module is used for marking the picture information intercepted from the video information, wherein the marking is the time point of the picture information appearing in the video information.
In another possible implementation manner, the apparatus further includes:
and the sixth generating module is used for generating map information based on the labeling sequence corresponding to the vector sequence, wherein the map information is the relation information between the phrases in the labeling sequence.
In a third aspect, the present application provides an electronic device, which adopts the following technical solutions:
an electronic device, comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method for extracting data based on unstructured intelligence shown in any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, comprising: there is stored a computer program that can be loaded by a processor and that can perform a method for extracting data based on unstructured intelligence that is shown in any of the possible implementations of the first aspect.
In summary, the present application includes at least one of the following beneficial technical effects:
1. The processed intelligence is input into a trained LSTM network model and processed there; for each word in the intelligence, the LSTM network model outputs an array of probability vectors over all labels. The vector sequence is then input into a CRF network model for labeling, and the CRF network model outputs a labeling sequence giving the part of speech corresponding to each useful intelligence phrase, so that the useful information in unstructured intelligence such as videos, audio and pictures can be collected conveniently and quickly;
2. In addition to audio information, each piece of video intelligence also comprises picture information, which usually contains useful information. When the video switches from one picture to another, the information in the picture changes; therefore all transition moments in the video intelligence are detected, the two pieces of picture information before and after each transition are intercepted and stored in the picture database, and character recognition is performed on them together with the collected picture intelligence to generate second text information, which is then stored in the standard situation message library. Intercepting the picture information before and after the transition moments makes it convenient to extract the information in the video intelligence. The text information in the text database is also stored in the standard intelligence message library, so that a standard intelligence document library containing all the intelligence is obtained;
3. Since video intelligence comprises both audio and pictures, the audio information in the video intelligence is extracted first, then stored in the audio database and converted, together with the collected audio intelligence, into first text information; the first text information is then stored in the standard situation message library. Extracting the audio information in the video intelligence makes intelligence collection more convenient.
Drawings
Fig. 1 is a flowchart illustrating a method for extracting data based on unstructured intelligence according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of generation of map information in an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an apparatus for extracting data based on unstructured intelligence according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the attached drawings.
After reading the present description, a person skilled in the art may modify the embodiments as required without making any inventive contribution; such modifications are protected by the patent laws within the scope of the claims of the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship, unless otherwise specified.
The embodiments of the present application will be described in further detail with reference to the drawings attached hereto.
The embodiment of the application provides a method for extracting data based on unstructured intelligence, which is executed by an electronic device. The electronic device may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application. As shown in fig. 1, the method includes steps S101 and S102, wherein,
s101, inputting the processed information into the trained LSTM network model for processing, and outputting a vector sequence corresponding to the processed information, wherein the information comprises at least one of collected video information, audio information, picture information and text information.
For the embodiment of the application, a manually labeled corpus is input into the LSTM network model as a training sample, the corpus includes various phrases with labels, for example, "XXX" and "YYY", "ZZZ" and the like are "place nouns"; "AA", "BB", and "CC" are "position nouns"; "Zhangsomewhat" and "Lijiasomewhat" are the "personal name"; "deploy", "camp", and "buy" etc. are "action words"; "a fighter plane" and "a naval vessel" are "weapon nouns". And inputting the processed information into the trained LSTM network model to obtain a scoring matrix of each label corresponding to each word. For example, inputting "XXX deploys a fighter on YYY" into the trained LSTM network model, a score is obtained for each word corresponding to each label. "XXX" belonging to the tag "place noun" has a score of 0.8, "XXX" belonging to the tag "position noun" has a score of 0.2, "XXX" belonging to the tag "person noun" has a score of 0.3, and so on to form a score matrix for each word corresponding to each tag.
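The following is a minimal sketch of how such per-word label scores (the emission score matrix) could be produced with a bidirectional LSTM, assuming PyTorch; the tag set, vocabulary size and layer dimensions are illustrative placeholders rather than the configuration used in this application.

```python
import torch
import torch.nn as nn

TAGS = ["place noun", "position noun", "person name", "action word", "weapon noun", "other"]

class BiLSTMEmitter(nn.Module):
    """Maps a token-id sequence to one score per (word, label) pair."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.to_tags(hidden)               # (batch, seq_len, num_tags) score matrix

# Toy usage: a 5-word sentence drawn from a hypothetical 1000-word vocabulary.
model = BiLSTMEmitter(vocab_size=1000)
scores = model(torch.randint(0, 1000, (1, 5)))
print(scores.shape)  # torch.Size([1, 5, 6]): one score per word for each label
```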
And S102, inputting the vector sequence into the trained CRF network model for labeling, and outputting a labeling sequence corresponding to the vector sequence, wherein the labeling sequence is an information phrase labeled with the part of speech.
Specifically, the probability is input into a prediction formula, the maximum value of the prediction formula is solved, and a labeling sequence with the highest score is calculated through a CRF network model, so that the label of each information word group is obtained.
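A sketch of this decoding step, using the third-party pytorch-crf package as a stand-in implementation (the application does not name a specific CRF library): the CRF layer combines the per-word emission scores with learned label-transition scores and returns the highest-scoring labeling sequence by Viterbi search.

```python
import torch
from torchcrf import CRF  # pip install pytorch-crf (assumed implementation choice)

num_tags, seq_len = 6, 5
crf = CRF(num_tags, batch_first=True)

# Emission scores for one 5-word sentence, as produced by the LSTM (random here).
emissions = torch.randn(1, seq_len, num_tags)

# Training: maximize the log-likelihood of the gold labeling sequence.
gold_tags = torch.tensor([[0, 5, 3, 5, 4]])   # e.g. place, other, action, other, weapon
loss = -crf(emissions, gold_tags)

# Inference: Viterbi decoding of the highest-scoring labeling sequence.
best_tags = crf.decode(emissions)             # e.g. [[0, 5, 3, 5, 4]]
print(loss.item(), best_tags)
```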
The LSTM network model can well combine the characteristics of the context, and effectively considers the label information of phrases before and after the sentence through the CRF network model. Useful information is extracted from unstructured information through the LSTM network model and the CRF network model more efficiently and quickly, and the method is applied to the military field, so that key information can be conveniently extracted from massive and complex unstructured information, and military personnel can master situation development and make decisions conveniently.
In a possible implementation manner of the embodiment of the present application, the step S101 includes a step S103 (not shown), a step S104 (not shown), a step S105 (not shown), a step S106 (not shown), a step S107 (not shown), and a step S108 (not shown), and the execution sequence of the step S103, the step S104, the step S105, and the step S106 is not limited, wherein,
s103, generating a video database based on all the collected video intelligence.
For the embodiment of the application, when the electronic device classifies the collected video information, the video information can be classified by detecting the format of the video information file, for example, the electronic device may extract the name of each video information file first, detect the suffix information in the name of the video information file, assign the video information files with the suffix information in the same video format (such as AVI, WMV, MP4, etc.) to the same class, create a video database for all the video information files, and store all the video information files in the video database.
And S104, generating an audio database based on all the collected audio intelligence.
For the embodiment of the application, when the electronic device classifies the collected audio information, the electronic device may classify the collected audio information by detecting the format of the audio information file, for example, the electronic device may extract the name of each audio information file first, detect suffix information in the name of the audio information file, assign the audio information files with the suffix information in the same audio format (such as MP3, WAV, FLAC, and the like) to the same class, create and generate an audio database for all the audio information files, and store all the audio information files in the audio database.
S105, generating a picture database based on all the collected picture intelligence.
For the embodiment of the application, when the electronic device classifies the collected picture information, the format of the picture information files can be detected for classification, for example, the electronic device can extract the name of each picture information file first, detect suffix information in the name of the picture information file, belong the picture information files with the suffix information in the same picture format (such as JPEG, RAW, BMP, GIF, PNG, and the like) to the same class, establish a generated picture database for all the picture information files, and store all the picture information files in the picture database.
S106, generating a character database based on all the collected character intelligence.
For example, the electronic device may extract the name of each text information file, detect suffix information in the text information file name, assign the suffix information to the text information files belonging to the same text format (such as TXT and DOC) to the same class, create a text database for all the text information files, and store all the text information files in the text database.
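A minimal sketch of the suffix-based classification in steps S103 to S106, assuming the collected intelligence files sit in one folder; the suffix-to-database mapping is illustrative and not exhaustive.

```python
from collections import defaultdict
from pathlib import Path

# Illustrative suffix-to-database mapping (not an exhaustive list of formats).
CATEGORIES = {
    "video": {".avi", ".wmv", ".mp4"},
    "audio": {".mp3", ".wav", ".flac"},
    "picture": {".jpeg", ".jpg", ".raw", ".bmp", ".gif", ".png"},
    "text": {".txt", ".doc", ".docx"},
}

def classify_intelligence(folder: str) -> dict:
    """Group collected intelligence files into per-type databases by file suffix."""
    databases = defaultdict(list)
    for path in Path(folder).iterdir():
        for category, suffixes in CATEGORIES.items():
            if path.suffix.lower() in suffixes:
                databases[category].append(path)
                break
    return databases

# databases = classify_intelligence("collected_intelligence")
# databases["video"], databases["audio"], ... are then stored as the four databases.
```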
And S107, processing the contents stored in the video database, the audio database, the picture database and the character database.
For the embodiment of the application, after all the collected intelligence files are classified, the intelligence in the video database, the audio database, the picture database and the text database is more conveniently processed. For example, the intelligence in the video database, audio database, picture database and text database is all converted into text form.
And S108, generating a standard information base based on the processing result, wherein the standard information base comprises processed information.
For the embodiment of the application, the processed information of the video database, the audio database, the picture database and the character database is stored in the standard information database, and then the standard information database is input into the LSTM network model for processing more conveniently.
In one possible implementation manner of the embodiment of the present application, the step S107 includes a step S1071 (not shown), a step S1072 (not shown), a step S1073 (not shown), and a step S1074 (not shown) when processing the video database and the audio database,
s1071, extracts audio information in each piece of video information.
For the embodiment of the application, the video information file generally comprises pictures and audio, so when the video information in the video database is processed, the audio information in the video information file in the video information is extracted first. For example, the audio information of a video intelligence file includes a company profile, and the picture does not have a caption related to the company profile, so the audio information in the video intelligence file is extracted to better acquire the intelligence.
S1072, storing the audio information into an audio database.
For the embodiment of the application, the audio information in the video information file is extracted to generate the audio format file, and then the audio format file is stored in the audio database. After the audio information in all the video information files is extracted and stored in the audio database, the electronic equipment is convenient to process and analyze all the audio information and the audio information extracted from the video information.
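A sketch of extracting the audio track of a video intelligence file into the audio database, assuming the ffmpeg command-line tool is installed; the WAV output format and 16 kHz mono settings are arbitrary choices made for downstream speech recognition.

```python
import subprocess
from pathlib import Path

def extract_audio(video_path: str, audio_dir: str = "audio_db") -> Path:
    """Strip the audio track from a video intelligence file and store it as 16 kHz mono WAV."""
    out = Path(audio_dir) / (Path(video_path).stem + ".wav")
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,       # input video, overwrite existing output
         "-vn",                                   # drop the video stream
         "-ac", "1", "-ar", "16000",              # mono, 16 kHz
         str(out)],
        check=True)
    return out
```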
S1073, converting all audio information in the audio database into first text information, the all audio information including each audio intelligence and audio information extracted from each video intelligence, the first text information including text information into which each audio intelligence is converted and text information into which an audio signal extracted from each video intelligence is converted.
For the embodiment of the application, a corpus can be used for training a neural network capable of completing audio-to-character conversion in advance, after the neural network for audio-to-character conversion is trained, all audio information files in an audio database and audio information extracted from video information files are input into the trained neural network for audio-to-character conversion, and first text information is output and comprises information contained in all audio files in the audio database. For example, the audio database includes an audio file, the intelligence contained in the audio file is a recording of a person interviewing a company, after the audio file is converted into first text information, the content in the audio file is embodied in a text form, and the first text information corresponding to the audio file may be "XXX AA announces marching ZZZ".
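A sketch of the audio-to-text conversion, using the open-source Whisper speech-recognition model as a stand-in; the application only specifies a neural network trained for audio-to-character conversion and does not name a particular tool, so the library and model size below are assumptions.

```python
import whisper  # pip install openai-whisper (assumed speech-to-text choice)

def audio_to_first_text(audio_path: str, model_name: str = "base") -> str:
    """Convert one audio file (collected or extracted from video) into first text information."""
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)   # language is detected automatically
    return result["text"].strip()

# e.g. audio_to_first_text("audio_db/interview.wav")
```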
S1074, storing the first text information into a standard situation message library.
For the embodiment of the present application, after all audio files in the audio database are converted into the first text information, the first text information is stored in the standard situation message library. Taking the example in step S1073, the first text information "XXX AA announces marching ZZZ" is stored in the standard situation message library, thereby forming part of the standard situation message library.
In one possible implementation manner of the embodiment of the present application, step S107 further includes step S1075 (not shown), step S1076 (not shown), step S1077 (not shown), step S1078 (not shown), step S1079 (not shown), step S1080 (not shown), and step S1081 (not shown) when processing the contents stored in the video database, the picture database, and the text database, wherein,
s1075, all transition moments in each video information are detected, and the transition moments are moments when the information in the video pictures changes.
For the embodiment of the application, the picture information in video intelligence likewise usually contains intelligence, so the information in the video pictures needs to be extracted. The information in a video picture changes as the picture changes, and the information contained before and after a picture change is usually different.
For the embodiment of the present application, the electronic device detects the transition moments of all pictures in the video intelligence. Transition moments can be detected by image processing: the electronic device divides the video picture, for example into a 3×3 grid of regions, and monitors the color in each grid cell; when the color in one or more cells changes greatly at a certain moment, it can be determined that a picture transition occurs at that moment. The more regions the video picture is divided into, the more accurately the transition moments are detected. For a longer video, the electronic device can also play the video at double speed, thereby improving the efficiency of transition-moment detection.
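A minimal sketch of this grid-based transition detection, assuming OpenCV; the 3×3 grid and the color-difference threshold are illustrative parameters, and the returned timestamps can also serve as the time-point marks described later for the intercepted pictures.

```python
import cv2
import numpy as np

def detect_transitions(video_path: str, grid: int = 3, threshold: float = 40.0):
    """Return (frame_index, seconds) for frames where any grid cell's mean color jumps."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    transitions, prev_means, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        cells = [frame[r * h // grid:(r + 1) * h // grid,
                       c * w // grid:(c + 1) * w // grid]
                 for r in range(grid) for c in range(grid)]
        means = np.array([cell.reshape(-1, 3).mean(axis=0) for cell in cells])
        if prev_means is not None:
            # A large mean-color change in one or more cells marks a picture transition.
            if np.linalg.norm(means - prev_means, axis=1).max() > threshold:
                transitions.append((idx, idx / fps))
        prev_means, idx = means, idx + 1
    cap.release()
    return transitions
```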
S1076, intercepting two pieces of picture information before and after each transition moment.
For the embodiment of the present application, the two pictures before and after the transition time generally represent different information, and therefore, after the transition time is detected, the two pictures before and after the transition time are respectively intercepted. For example, a place name "XXX" appears in a picture before a certain transition point, and "AA" appears in a picture after a certain transition point. Therefore, the information of the two pictures before and after the transition is intercepted, and the information in the video information can be more conveniently extracted.
S1077, storing the two cut pictures into a picture database.
For the embodiment of the application, the electronic device intercepts two pieces of picture information before and after the transition moment, generates a picture format file based on the two pieces of picture information, and then stores the picture format file corresponding to the two pieces of picture information into the picture database. The electronic equipment can conveniently analyze and process the collected picture information and the intercepted video information together.
S1078, character recognition is performed on all the picture information and the screen information in the picture database to generate second text information, and the second text information includes text information recognized for all the picture information and text information recognized for the screen information.
For the embodiment of the present application, for example, an OCR (Optical Character Recognition) text recognition technology may be utilized to perform text recognition on the text in a picture. Taking step S1077 as an example, the characters in the two pieces of screen information in step S1077 are recognized by using the OCR character recognition technique, and the second text information corresponding to the two pieces of screen information is "XXX" and "AA". Character recognition is likewise performed on the picture intelligence collected in the picture database by using the OCR character recognition technique, so as to generate the second text information corresponding to that picture intelligence.
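A sketch of this character-recognition step, using pytesseract (a wrapper around the Tesseract OCR engine) as a stand-in; the application only refers to OCR in general, and the simplified-Chinese-plus-English language setting is an assumption.

```python
from PIL import Image
import pytesseract  # pip install pytesseract; requires the Tesseract OCR engine

def picture_to_second_text(image_path: str, lang: str = "chi_sim+eng") -> str:
    """Recognize the characters in a picture or intercepted video frame as second text information."""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang).strip()

# e.g. picture_to_second_text("picture_db/frame_after_transition.png")
```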
S1079, storing the second text information into a standard situation message library.
For the embodiment of the application, after character recognition is carried out on the picture information collected in the picture database and the picture information in the captured video information, the generated second text information is stored in the standard situation message library, and the second text information forms a part of the standard situation message library.
S1080, each text intelligence in the text database is converted into text information.
For the embodiment of the application, for example, the text intelligence in the text database is in document format; the electronic device opens all text intelligence files in the text database and converts the content of each document into text information.
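A sketch of flattening document-format text intelligence into plain text, assuming the files are TXT or DOCX readable with python-docx; legacy DOC files would first need conversion, which is omitted here.

```python
from pathlib import Path
from docx import Document  # pip install python-docx

def document_to_text(doc_path: str) -> str:
    """Convert one text intelligence file (TXT or DOCX) into plain text information."""
    path = Path(doc_path)
    if path.suffix.lower() == ".txt":
        return path.read_text(encoding="utf-8", errors="ignore")
    doc = Document(str(path))
    return "\n".join(paragraph.text for paragraph in doc.paragraphs)
```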
S1081, storing the character information into a standard situation message library.
For the embodiment of the application, the converted text information is stored in the standard situation message library and further becomes a part of the standard situation message library. Therefore, the first text information, the second text information and all the text information in the text information file form the whole standard situation message library.
In a possible implementation manner of the embodiment of the present application, after the step S102, the method further includes a step S109 (not shown in the figure), a step S110 (not shown in the figure), a step S111 (not shown in the figure), a step S112 (not shown in the figure), and a step S113 (not shown in the figure); step S109, step S110, step S111, step S112, and step S113 may be executed simultaneously or in other orders, and are not limited herein, wherein,
and S109, carrying out the same labeling on each audio message, the first text message extracted from each audio message and the word group in the labeling sequence corresponding to the first text message corresponding to each audio message, wherein the first text message corresponding to each audio message is the text message converted from each audio message.
For the embodiment of the present application, for example, an audio intelligence file is labeled "audio A", the sentence "XXX stations troops in YYY" is extracted from that audio intelligence file, and "XXX stations troops in YYY" is also labeled "audio A". After "XXX stations troops in YYY" is processed by the LSTM network model and the CRF network model, three phrases "XXX", "YYY" and "stations troops" are obtained, and "XXX", "YYY" and "stations troops" are all labeled "audio A".
By applying the same label to the above information, the electronic device enables personnel to locate, from a phrase output by the CRF network model, which audio intelligence the phrase originally appeared in. This makes it convenient to trace a phrase to its source, so that personnel can study the audio intelligence in which the phrase originally appeared.
And S110, carrying out the same labeling on each piece of video information, audio information extracted from each piece of video information, first text information corresponding to the audio information and phrases in a labeling sequence corresponding to the first text information, wherein the first text information corresponding to the audio information is text information converted from the audio signal extracted from each piece of video information.
In the embodiment of the present application, for example, the audio information extracted from a certain video is labeled "video A". Suppose the audio information extracted from that video includes "XXX purchases a certain fighter from YYY"; then "XXX purchases a certain fighter from YYY" is also labeled "video A". After "XXX purchases a certain fighter from YYY" is processed by the LSTM network model and the CRF network model, four phrases "XXX", "YYY", "purchase" and "certain fighter" are obtained, and the phrases "XXX", "YYY", "purchase" and "certain fighter" are all labeled "video A".
By applying the same label to the above information, the electronic device enables personnel to locate, from a phrase output by the CRF network model, the audio of which video intelligence the phrase originally appeared in. This makes it convenient to trace a phrase to its source, so that personnel can study the video intelligence in which the phrase originally appeared.
And S111, labeling the image information extracted from each video information and each video information, the second text information corresponding to the image information and the word group in the labeling sequence corresponding to the second text information in the same way, wherein the second text information corresponding to the image information is the text information identified by the image information.
For the embodiment of the present application, taking step S110 as an example, all the pieces of screen information extracted from the video are labeled "video A", and all the pieces of second text information extracted from the screen information are labeled "video A". Suppose the second text information "XXX deploys a certain fighter on WWW" is extracted from the screen information; then "XXX deploys a certain fighter on WWW" is labeled "video A". After "XXX deploys a certain fighter on WWW" is processed by the LSTM network model and the CRF network model, four phrases "XXX", "a certain fighter", "deploy" and "WWW" are obtained, and "XXX", "a certain fighter", "deploy" and "WWW" are all labeled "video A".
By applying the same label to the above information, the electronic device enables personnel to locate, from a phrase output by the CRF network model, the picture information of which video intelligence the phrase originally appeared in. This makes it convenient to trace a phrase to its source, so that personnel can study the video intelligence in which the phrase originally appeared.
And S112, carrying out the same labeling on each picture information, the second text information corresponding to each picture information and the word group in the labeling sequence corresponding to the second text information corresponding to each picture information, wherein the second text information corresponding to each picture information is the text information identified by all the picture information.
For the embodiment of the present application, each picture in the picture database is labeled. For example, a certain picture is labeled "picture A", the second text information extracted from the picture is "XXX and YYY reach a certain agreement", and "XXX and YYY reach a certain agreement" is also labeled "picture A". After "XXX and YYY reach a certain agreement" is processed by the LSTM network model and the CRF network model, four phrases "XXX", "YYY", "reach" and "a certain agreement" are obtained, and "XXX", "YYY", "reach" and "a certain agreement" are all labeled "picture A".
By applying the same label to the above information, the electronic device enables personnel to locate, from a phrase output by the CRF network model, which picture intelligence the phrase originally appeared in. This makes it convenient to trace a phrase to its source, so that personnel can study the picture intelligence in which the phrase originally appeared.
And S113, carrying out same labeling on each character information, the character information extracted from each character information and the word group in the labeling sequence corresponding to the character information.
For the embodiment of the present application, each document in the text database is labeled. For example, a document is labeled "document A", and the text information in the document is also labeled "document A". Suppose the text information in the document is "the military expenditure of XXX in a certain year is N yuan"; then "the military expenditure of XXX in a certain year is N yuan" is also labeled "document A". After "the military expenditure of XXX in a certain year is N yuan" is processed by the LSTM network model and the CRF network model, four phrases "XXX", "a certain year", "military expenditure" and "N yuan" are obtained, and "XXX", "a certain year", "military expenditure" and "N yuan" are all labeled "document A".
By applying the same label to the above information, the electronic device enables personnel to locate, from a phrase output by the CRF network model, which text intelligence the phrase originally appeared in. This makes it convenient to trace a phrase to its source, so that personnel can study the text intelligence in which the phrase originally appeared.
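A minimal sketch of the source-labeling idea in steps S109 to S113, using a plain dictionary as the label store; the label names and example phrases simply mirror the examples above and are illustrative.

```python
from collections import defaultdict

# phrase -> set of source labels such as "audio A", "video A", "picture A", "document A"
phrase_sources = defaultdict(set)

def label_phrases(phrases, source_label):
    """Give every phrase extracted from one piece of intelligence the same source label."""
    for phrase in phrases:
        phrase_sources[phrase].add(source_label)

label_phrases(["XXX", "YYY", "stations troops"], "audio A")
label_phrases(["XXX", "YYY", "purchase", "certain fighter"], "video A")

# Tracing a phrase back to the intelligence it originally appeared in:
print(phrase_sources["XXX"])   # {'audio A', 'video A'}
```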
One possible implementation manner of the embodiment of the present application further includes step S113 (not shown in the figure), and step S113 may be executed after step S1076 or simultaneously with step S1076, wherein,
s113, marking the picture information intercepted from the video information as the time point of the picture information appearing in the video information.
For the embodiment of the present application, for example, after a transition moment of a certain video is detected, the two pieces of picture information before and after the transition are intercepted. Suppose the picture information before the transition appears at the tenth second of the video and the picture information after the transition appears at the eleventh second of the video; then the tenth second is marked on the picture information before the transition moment, and the eleventh second is marked on the picture information after the transition moment.
If a phrase output by the CRF network model appears in certain picture information of certain video intelligence, the time point at which that picture information appears in the video intelligence can be located, so that personnel can conveniently study the video content corresponding to the phrase.
In a possible implementation manner of the embodiment of the present application, step S102 further includes step S114 (not shown in the figure), wherein,
and S114, generating map information based on the labeling sequence corresponding to the vector sequence, wherein the map information is the relation information between word groups in the labeling sequence.
For the embodiment of the application, referring to fig. 2, after the CRF network model outputs the labeling sequence, the information in the labeling sequence cannot be displayed intuitively, so the electronic device generates a relationship map of the intelligence phrases based on the labeling sequence, for example from information such as "XXX", "YYY", "sign", "certain agreement", "sell", "certain fighter", "deploy" and "MMM". The electronic device generates the map information shown in fig. 2 based on the above information.
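A sketch of assembling the map information from labeled phrases, assuming the networkx library; the relation triples below are illustrative and only echo the example phrases listed for fig. 2.

```python
import networkx as nx

# Illustrative (subject, relation, object) triples built from labeled intelligence phrases.
triples = [
    ("XXX", "sign", "certain agreement"),
    ("YYY", "sign", "certain agreement"),
    ("YYY", "sell", "certain fighter"),
    ("XXX", "deploy", "certain fighter"),
    ("certain fighter", "deployed at", "MMM"),
]

graph = nx.DiGraph()
for subject, relation, obj in triples:
    graph.add_edge(subject, obj, relation=relation)

# Each edge's 'relation' attribute records how two phrases are connected in the map.
for u, v, data in graph.edges(data=True):
    print(f"{u} -[{data['relation']}]-> {v}")
```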
The above embodiments describe a method for extracting data based on unstructured intelligence from the perspective of method flow, and the following embodiments describe an apparatus 20 for extracting data based on unstructured intelligence from the perspective of virtual modules or virtual units, which are described in detail in the following embodiments.
The embodiment of the present application provides an apparatus 20 for extracting data based on unstructured intelligence. As shown in fig. 3, the apparatus 20 for extracting data based on unstructured intelligence may specifically include:
the first processing module 201 is used for inputting the processed information into the trained LSTM network model for processing and outputting a vector sequence corresponding to the processed information, wherein the information comprises at least one of collected video information, audio information, picture information and text information;
the first labeling module 202 is configured to input the vector sequence into a trained CRF network model for labeling, and output a labeling sequence corresponding to the vector sequence, where the labeling sequence is an intelligence phrase labeled with a part of speech.
For the embodiment of the application, after the processed intelligence is input into the trained LSTM network model, the first processing module 201 processes it in the LSTM network model, which outputs, for each word in the processed intelligence, an array of probability vectors over all labels. The first labeling module 202 then inputs the vector sequence into the CRF network model for labeling, and the CRF network model outputs a labeling sequence, which gives the part of speech corresponding to each useful intelligence phrase. This makes it convenient to quickly collect the useful information in unstructured intelligence such as videos, audio and pictures.
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
the first generation module is used for generating a video database based on all the collected video intelligence;
the second generation module is used for generating an audio database based on all the collected audio intelligence;
the third generation module is used for generating a picture database based on all the collected picture intelligence;
a fourth generation module for generating a character database based on all the collected character intelligence;
the second processing module is used for processing the contents stored in the video database, the audio database, the picture database and the character database;
and the fifth generation module is used for generating a standard information base based on the processing result, wherein the standard information base comprises processed information.
In a possible implementation manner of the embodiment of the present application, the second processing module is specifically configured to, when processing the video database, the audio database, the picture database, and the character database:
extracting audio information in each video information;
storing the audio information into an audio database;
converting all audio information in an audio database into first text information, wherein all audio information comprises each audio information and audio information extracted from each video information, and the first text information comprises text information converted from each audio information and text information converted from audio signals extracted from each video information;
and storing the first text information into the standard information base.
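The following sketch shows one way the audio-related steps above could be carried out, assuming the ffmpeg command-line tool is installed; transcribe_audio is a hypothetical placeholder for whichever speech-to-text engine is chosen, and the 16 kHz mono WAV format is an illustrative assumption.

```python
import subprocess
from pathlib import Path


def extract_audio(video_path: Path, audio_dir: Path) -> Path:
    """Strip the audio track from a video file into a 16 kHz mono WAV file."""
    audio_path = audio_dir / (video_path.stem + ".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video_path), "-vn",
         "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(audio_path)],
        check=True,
    )
    return audio_path


def audio_to_first_text(audio_paths, transcribe_audio, standard_information_base):
    """Convert every audio item to 'first text information' and store it."""
    for audio_path in audio_paths:
        # transcribe_audio is supplied by the caller; the ASR engine itself
        # is an assumption and not specified here.
        text = transcribe_audio(audio_path)
        standard_information_base.append({"source": str(audio_path), "text": text})
```

Passing the transcriber in as a parameter keeps the sketch runnable without committing to any particular speech recognition API.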
In a possible implementation manner of the embodiment of the application, the second processing module, when processing the video database, the audio database, the picture database and the text database, is further specifically configured to:
detecting all transition moments in each video information, wherein the transition moments are moments when the information in the video pictures changes;
capturing two pieces of picture information, one immediately before and one immediately after each transition moment;
storing the two captured pieces of picture information into the picture database;
performing character recognition on all the picture intelligence and on the picture information in the picture database, and generating second text information, wherein the second text information comprises the text information recognized from all the picture intelligence and the text information recognized from the picture information;
storing the second text information into the standard information base;
converting each piece of text intelligence in the text database into text information;
and storing the text information into the standard information base.
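One possible realisation of the transition detection, frame capture, time-point marking and character recognition steps is sketched below, assuming OpenCV and pytesseract; the mean frame-difference threshold used to declare a transition is an illustrative assumption.

```python
import cv2
import pytesseract


def capture_transition_frames(video_path, diff_threshold=30.0):
    """Yield (timestamp_seconds, frame) pairs around every detected transition."""
    capture = cv2.VideoCapture(video_path)
    ok, previous = capture.read()
    while ok:
        ok, current = capture.read()
        if not ok:
            break
        timestamp = capture.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
        # A transition is assumed when the mean grey-level difference between
        # consecutive frames exceeds the threshold.
        diff = cv2.absdiff(
            cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY),
            cv2.cvtColor(current, cv2.COLOR_BGR2GRAY),
        )
        if diff.mean() > diff_threshold:
            yield timestamp, previous   # picture just before the transition
            yield timestamp, current    # picture just after the transition
        previous = current
    capture.release()


def frames_to_second_text(video_path, standard_information_base):
    """OCR each captured frame into 'second text information' with its time point."""
    for timestamp, frame in capture_transition_frames(video_path):
        text = pytesseract.image_to_string(frame)
        standard_information_base.append(
            {"source": video_path, "time_point": timestamp, "text": text}
        )
```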
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
the second labeling module is used for applying the same label to each piece of audio intelligence, the first text information converted from that audio intelligence, and the phrases in the labeling sequence corresponding to that first text information, wherein the first text information corresponding to each piece of audio intelligence is the text information converted from that audio intelligence;
the third labeling module is used for applying the same label to each piece of video intelligence, the audio information extracted from that video intelligence, the first text information corresponding to the audio information, and the phrases in the labeling sequence corresponding to that first text information, wherein the first text information corresponding to the audio information is the text information converted from the audio information extracted from that video intelligence;
the fourth labeling module is used for applying the same label to each piece of video intelligence, the picture information captured from that video intelligence, the second text information corresponding to the picture information, and the phrases in the labeling sequence corresponding to that second text information, wherein the second text information corresponding to the picture information is the text information recognized from the picture information;
the fifth labeling module is used for applying the same label to each piece of picture intelligence, the second text information corresponding to that picture intelligence, and the phrases in the labeling sequence corresponding to that second text information, wherein the second text information corresponding to each piece of picture intelligence is the text information recognized from that picture intelligence;
and the sixth labeling module is used for applying the same label to each piece of text intelligence, the text information converted from that text intelligence, and the phrases in the labeling sequence corresponding to that text information.
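As a sketch of this "same labeling" idea (the record layout is assumed, for illustration only), every artifact derived from one piece of intelligence receives the same label so that the source and all of its derivatives can be retrieved together.

```python
import itertools

_label_counter = itertools.count(1)


def label_derivatives(source_item, derived_items, registry):
    """Assign one shared label to a source item and everything derived from it."""
    label = next(_label_counter)
    registry[label] = {"source": source_item, "derived": list(derived_items)}
    return label


# Usage: a piece of audio intelligence, its first text information and its
# tagging sequence all receive the same label, so retrieving any one of them
# finds the others.
registry = {}
label = label_derivatives(
    "audio_0001.wav",
    ["audio_0001 transcript", "audio_0001 tagging sequence"],
    registry,
)
```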
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
and the seventh labeling module is used for marking the picture information captured from each piece of video information with the time point at which that picture information appears in the video information.
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
and the sixth generating module is used for generating map information based on the labeling sequence corresponding to the vector sequence, wherein the map information is the relation information between word groups in the labeling sequence.
For the embodiment of the present application, the first processing module 201 and the second processing module may be the same processing module, or may be different processing modules. The first labeling module 202, the second labeling module, the third labeling module, the fourth labeling module, the fifth labeling module, the sixth labeling module, and the seventh labeling module may all be the same labeling module, or may be different labeling modules, or may be partially the same labeling module. The first generating module, the second generating module, the third generating module, the fourth generating module, the fifth generating module and the sixth generating module may all be the same generating module, may be different generating modules, or may be partially the same generating module.
The embodiment of the present application provides an apparatus for extracting data based on unstructured information, which is suitable for the above method embodiment and is not described herein again.
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 30 includes a processor 301 and a memory 303, wherein the processor 301 is coupled to the memory 303, for example via a bus 302. Optionally, the electronic device 30 may also include a transceiver 304. It should be noted that, in practical applications, the transceiver 304 is not limited to one, and the structure of the electronic device 30 does not constitute a limitation on the embodiments of the present application.
The processor 301 may be a CPU (central processing unit), a general-purpose processor, a DSP (digital signal processor), an ASIC (application specific integrated circuit), an FPGA (field programmable gate array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 301 may also be a combination of computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 302 may include a path that transfers information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The memory 303 may be a ROM (read only memory) or other type of static storage device capable of storing static information and instructions, a RAM (random access memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (electrically erasable programmable read only memory), a CD-ROM (compact disc read only memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute application program code stored in the memory 303 to implement the aspects illustrated in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers; it may also be a server or the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program runs on a computer, the computer is enabled to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the processed information is input into the trained LSTM network model and processed in the LSTM network model, and for each word in the processed information the LSTM network model outputs an array of probability vectors over the labels; the vector sequence is then input into the CRF network model for labeling, and the CRF network model outputs the labeling sequence, which gives the part of speech corresponding to each useful intelligence phrase, so that the intelligence in videos, audios and pictures can be collected conveniently and quickly.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turns or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for extracting data based on unstructured intelligence, which is characterized by comprising the following steps:
inputting the processed information into a trained LSTM network model for processing, and outputting a vector sequence corresponding to the processed information, wherein the information comprises at least one of collected video information, audio information, picture information and text information;
and inputting the vector sequence into a trained CRF network model for labeling, and outputting a labeling sequence corresponding to the vector sequence, wherein the labeling sequence is an information phrase labeled with part of speech.
2. The method of claim 1, wherein before the inputting the processed information into the trained LSTM network model for processing and outputting the vector sequence, the method further comprises:
generating a video database based on all the collected video intelligence;
generating an audio database based on all the collected audio intelligence;
generating a picture database based on all the collected picture intelligence;
generating a text database based on all the collected text intelligence;
processing the contents stored in the video database, the audio database, the picture database and the character database;
and generating a standard information base based on the processing result, wherein the standard information base comprises processed information.
3. The method of claim 2, wherein processing the video database and the audio database comprises:
extracting audio information in each video information;
storing the audio information into an audio database;
converting all audio information in the audio database into first text information, wherein the all audio information comprises each audio information and audio information extracted from each video information, and the first text information comprises text information converted from each audio information and text information converted from audio signals extracted from each video information;
and storing the first text information into the standard information base.
4. The method of claim 2, wherein the processing of the content stored in the video database, the audio database, the picture database and the text database further comprises:
detecting all transition moments in each video information, wherein the transition moments are moments when the information in the video pictures changes;
capturing two pieces of picture information, one immediately before and one immediately after each transition moment;
storing the two captured pieces of picture information into the picture database;
performing character recognition on all the picture intelligence and on the picture information in the picture database, and generating second text information, wherein the second text information comprises the text information recognized from all the picture intelligence and the text information recognized from the picture information;
storing the second text information into the standard information base;
converting each piece of text intelligence in the text database into text information;
and storing the text information into the standard information base.
5. The method of claim 4, wherein after the inputting the vector sequence into the trained CRF network model for labeling and outputting the labeling sequence corresponding to the vector sequence, the method further comprises:
applying the same label to each piece of audio intelligence, the first text information converted from that audio intelligence, and the phrases in the labeling sequence corresponding to that first text information, wherein the first text information corresponding to each piece of audio intelligence is the text information converted from that audio intelligence;
applying the same label to each piece of video intelligence, the audio information extracted from that video intelligence, the first text information corresponding to the audio information, and the phrases in the labeling sequence corresponding to that first text information, wherein the first text information corresponding to the audio information is the text information converted from the audio information extracted from that video intelligence;
applying the same label to each piece of video intelligence, the picture information captured from that video intelligence, the second text information corresponding to the picture information, and the phrases in the labeling sequence corresponding to that second text information, wherein the second text information corresponding to the picture information is the text information recognized from the picture information;
applying the same label to each piece of picture intelligence, the second text information corresponding to that picture intelligence, and the phrases in the labeling sequence corresponding to that second text information, wherein the second text information corresponding to each piece of picture intelligence is the text information recognized from that picture intelligence;
and applying the same label to each piece of text intelligence, the text information converted from that text intelligence, and the phrases in the labeling sequence corresponding to that text information.
6. The method of claim 4, further comprising:
marking the picture information captured from each piece of video information with the time point at which that picture information appears in the video information.
7. The method of claim 1, further comprising:
and generating map information based on the labeling sequence corresponding to the vector sequence, wherein the map information is the relation information between the phrases in the labeling sequence.
8. An apparatus for extracting data based on unstructured intelligence, comprising:
the first processing module is used for inputting the processed information into a trained LSTM network model for processing and outputting a vector sequence corresponding to the processed information;
and the first labeling module is used for inputting the vector sequence into a trained CRF network model for labeling and outputting a labeling sequence corresponding to the vector sequence.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, and the one or more applications are configured to: perform the method for extracting data based on unstructured intelligence according to any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a method for extracting data based on unstructured intelligence according to any of claims 1 to 7.