CN110347894A - Crawler-based knowledge graph processing method, apparatus, computer device and storage medium
- Publication number: CN110347894A
- Application number: CN201910471975.5A
- Authority: CN (China)
- Prior art keywords: information, data, relationship, entity, knowledge graph
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application relates to the field of data collection, specifically to building a knowledge graph by means of data crawling, and discloses a crawler-based knowledge graph processing method, apparatus, computer device and storage medium. A crawler system crawls the data information in web page information; data cleansing is performed on the crawled data information; information completion is performed on the crawled and cleansed data information to obtain target information; feature extraction is performed on the target information, and a relation classification model is built from the extracted feature information; the relation classification model is used to obtain a series of relation entity pair data from the data information, and the relation entity pairs are stored as the basic knowledge carrier of the knowledge graph, which completes construction of the knowledge graph. In this way, the application can greatly improve working efficiency by obtaining information quickly, and the knowledge provided by the knowledge graph helps users understand the information content better and carry out public opinion analysis on it.
Description
Technical field
This application relates to the field of network information acquisition, and in particular to a crawler-based knowledge graph processing method, apparatus, computer device and storage medium.
Background
In the field of network information acquisition, a web crawler (also known as a web spider or web robot, and more often called a web chaser in the FOAF community) is a program or script that automatically fetches World Wide Web information according to certain rules. Other, less commonly used names include ant, automatic indexer, emulator and worm.
With the rapid development of the network, the World Wide Web has become the carrier of a vast amount of information, and extracting and using this information efficiently has become a major challenge. Search engines, such as the traditional general-purpose search engines Baidu and Google, are tools that help people retrieve information and have become the entrance and guide through which users access the World Wide Web. However, these general-purpose search engines also have certain limitations, such as:
(1) Users in different fields and with different backgrounds often have different retrieval purposes and needs, and the results returned by a general-purpose search engine include a large number of web pages that the user does not care about.
(2) The goal of a general-purpose search engine is the largest possible network coverage, so the contradiction between limited search engine server resources and unlimited network data resources will deepen further.
(3) As the forms of data on the World Wide Web grow richer and network technology continues to develop, large amounts of different data such as pictures, databases, audio and video multimedia appear. General-purpose search engines are often powerless against such information-dense data that has a certain structure, and cannot discover and obtain it well.
(4) General-purpose search engines mostly provide keyword-based retrieval and have difficulty supporting queries posed in terms of semantic information.
It is well known that in the financial technology field, information, public opinion and the like need to be understood in a timely manner and also need to be analysed and processed quickly by category. However, existing network information acquisition technology has the defects described above, so it cannot meet users' needs and can hardly meet the needs of technological development.
Summary of the invention
This application provides a crawler-based knowledge graph processing method, apparatus, device and storage medium. For fields that are sensitive to information, public opinion and the like, it provides a fast and effective method and approach for acquiring network information and processing it by category, meeting the needs of users and of technological development.
In a first aspect, this application provides a crawler-based knowledge graph processing method, comprising:
crawling the data information in web page information using a crawler system;
performing data cleansing on the crawled data information;
performing information completion on the crawled and cleansed data information to obtain target information;
performing feature extraction on the target information, and building a relation classification model from the extracted feature information;
obtaining a series of relation entity pair data from the data information using the relation classification model, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete construction of the knowledge graph.
In a second aspect, this application provides a crawler-based knowledge graph processing apparatus, comprising:
an information crawling module, for crawling the data information in web page information using a crawler system;
a data cleansing module, for performing data cleansing on the crawled data information;
an information completion module, for performing information completion on the crawled and cleansed data information to obtain target information;
a model construction module, for performing feature extraction on the target information and building a relation classification model from the extracted feature information;
a graph construction module, for obtaining a series of relation entity pair data from the data information using the relation classification model, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete construction of the knowledge graph.
In a third aspect, this application also provides a computer device comprising a memory and a processor;
the memory is used to store a computer program;
the processor is used to execute the computer program and, when executing it, to implement the knowledge graph processing method described in the first aspect.
In a fourth aspect, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the knowledge graph processing method described in the first aspect.
This application discloses a crawler-based knowledge graph processing method, apparatus, computer device and storage medium. A crawler system crawls the data information in web page information; data cleansing is performed on the crawled data information; information completion is performed on the crawled and cleansed data information to obtain target information; feature extraction is performed on the target information, and a relation classification model is built from the extracted feature information; the relation classification model is used to obtain a series of relation entity pair data from the data information, and the relation entity pairs are stored as the basic knowledge carrier of the knowledge graph, which completes construction of the knowledge graph. In this way, the application can greatly improve working efficiency by obtaining information quickly, and the knowledge provided by the knowledge graph helps users understand the information content better and carry out public opinion analysis more accurately.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the steps of the crawler-based knowledge graph processing method provided by an embodiment of this application;
Fig. 2 is a schematic flow chart of the steps of one embodiment of performing data cleansing on the crawled data information;
Fig. 3 is a schematic flow chart of the steps of another embodiment of performing data cleansing on the crawled data information;
Fig. 4 is a schematic flow chart of the steps of one embodiment of processing the data information to obtain data information in ontology format;
Fig. 5 is a schematic flow chart of the steps of one embodiment of obtaining a series of relation entity pair data from the data information using the relation classification model and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph;
Fig. 6 is a schematic block diagram of a crawler-based knowledge graph processing apparatus provided by an embodiment of this application;
Fig. 7 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application without creative effort fall within the scope of protection of this application.
The flow charts shown in the drawings are only illustrative. They need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed, combined or partially merged, so the actual order of execution may change according to the actual situation.
The embodiments of this application provide a crawler-based knowledge graph processing method, apparatus, computer device and storage medium. For fields that are sensitive to information, public opinion and the like, they provide a fast and effective method and approach for acquiring network information and processing it by category, meeting the needs of users and of technological development.
Some embodiments of this application are described in detail below with reference to the drawings. Where no conflict arises, the following embodiments and the features in them may be combined with one another.
Referring to Fig. 1, Fig. 1 is a schematic flow chart of the steps of the crawler-based knowledge graph processing method provided by an embodiment of this application. It should first be explained that a knowledge graph (Knowledge Graph) in this embodiment, also called a scientific knowledge map and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a series of different graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyse, construct, draw and display knowledge and the interconnections between items of knowledge.
As shown in Fig. 1, the crawler-based knowledge graph processing method described in this application may include, but is not limited to, steps S101 to S105.
S101: Crawl the data information in web page information using a crawler system.
In this application, S101 requires reading the web page, that is, reading the complete content of the page. This may specifically include asynchronously loaded content, i.e. the content as fully rendered in the browser window.
For example, owing to differences in network speed, data traffic, device speed and screen size across user terminals (such as mobile phones or computers), the page content fetched by common tools such as Python requests, Python urllib, wget or curl may be incomplete. It may contain only the skeleton of the page and no content, because the content is loaded asynchronously by JavaScript (JS). In this case, the application can use a browser driver with a JS execution engine to run the asynchronously loading JS in the page, which solves the asynchronous loading problem. This can also be combined with a headless browser (a browser without a graphical interface).
In this application, this embodiment needs to analyse the web page, for example analysing the crawled page content and extracting the required content, that is, extracting the URLs in the page that need to be crawled further, or building URLs from information in the page.
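The URL extraction described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the crawler system of the application; the class name LinkExtractor, the function extract_links and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the URLs found in `html` that a crawler could fetch next."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used only for illustration.
sample_html = (
    '<a href="/news/1.html">one</a> '
    '<a href="https://example.com/b#top">two</a> '
    '<a href="mailto:x@example.com">mail</a>'
)
links = extract_links("https://example.com/index.html", sample_html)
```

Relative links are resolved against the page address and non-HTTP links are skipped, so the result can be fed straight back into the crawl queue.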
It should be noted that the content targeted by web page analysis in this embodiment may be structured content (such as HTML and JSON), semi-structured content, or unstructured content (such as plain text).
It is worth noting that this embodiment also performs task deduplication and scheduling during crawling. Deduplication prevents pages from being crawled repeatedly: for example, page A contains the address of page B, and page B in turn contains the address of page A, so without deduplication the crawler would loop endlessly between A and B. Scheduling, meanwhile, is considered from the perspective of system performance. Most of the time spent crawling a page goes to network interaction: waiting for DNS resolution of the address, the request, the returned data, completion of asynchronous loading and so on, which can take several seconds or even longer.
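A minimal sketch of the deduplication idea, assuming a breadth-first task queue and a set of already seen URLs. The in-memory FAKE_WEB mapping stands in for real page fetches and is purely hypothetical; the names crawl and fetch_links are likewise illustrative.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs found on that page.
FAKE_WEB = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": [],
}


def crawl(seed, fetch_links):
    """Breadth-first crawl that visits every reachable URL exactly once."""
    seen = {seed}          # URLs already queued or fetched (the deduplication set)
    queue = deque([seed])  # pending tasks, in scheduling order
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # skip anything queued before, e.g. the B -> A link
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("A", lambda url: FAKE_WEB.get(url, []))
```

Because each fetch spends most of its time waiting on the network, a real scheduler would probably overlap many fetches (for example with threads or asynchronous I/O); that part is left out of this sketch.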
It should be added that this embodiment can first customise what the crawler needs to crawl, that is, the crawler can be configured to crawl customised information such as financial news.
S102: Perform data cleansing on the crawled data information.
It should be particularly noted that performing data cleansing on the crawled data information in this embodiment may include steps S1021 to S1023, as shown in Fig. 2.
S1021: Process the data information to obtain data information in ontology format.
S1022: Perform data integration on the ontology-format data information using a progressive disambiguation algorithm to obtain the link relations between identical entities from different data sources.
It is worth noting that performing data integration on the ontology-format data information using the progressive disambiguation algorithm to obtain the link relations between identical entities from different data sources may specifically include the following processing: input a target entity name and a first context parameter; search the knowledge database according to the target entity name to obtain the number of entries identical to the target entity name; if the number is a first quantity, judge whether the target entity name is an original entity noun; if the number is a second quantity, output the first entity name that is identical to the target entity name; if the number is a third quantity, perform disambiguation on the multiple second entity names.
Furthermore, judging whether the target entity name is an original entity noun in the embodiment of this application may specifically include the following processing: if the target entity name is an original entity noun, split the original entity noun into multiple entity nouns, and search the knowledge database with each of these entity nouns to obtain an entity name identical to the target entity name.
In addition, performing disambiguation on the multiple second entity names in this embodiment may include the following processing: apply natural language processing to the context parameter of the target entity name and to the context parameters of the multiple second entity names to obtain a bag of words and a set of bags of words respectively; compute the similarity between the bag of words and each bag in the set to obtain the word frequency with the greatest similarity; and output the word frequency with the greatest similarity.
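The progressive lookup and bag-of-words comparison described above might look like the sketch below. Several points are assumptions, because the text does not fix them: the "first", "second" and "third" quantities are taken to be zero, one and more than one matches; a context parameter is treated as plain text; similarity is cosine similarity over word counts; and the function returns the best-matching entity rather than a word frequency. The knowledge base KB and all names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a stand-in for the NLP step in the text."""
    return Counter(text.lower().split())


def cosine(bag_a, bag_b):
    """Cosine similarity between two word-count bags (0.0 if either is empty)."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve_entity(name, context, knowledge_base):
    """Return the id of the knowledge-base entity that `name` refers to, or None."""
    candidates = knowledge_base.get(name, [])
    if len(candidates) == 0:
        # Assumed "first quantity": no hit, so retry with the parts of a compound name.
        for part in name.split():
            if part != name and part in knowledge_base:
                return resolve_entity(part, context, knowledge_base)
        return None
    if len(candidates) == 1:
        # Assumed "second quantity": a single match needs no disambiguation.
        return candidates[0][0]
    # Assumed "third quantity": several matches, so pick the most similar context.
    target_bag = bag_of_words(context)
    best = max(candidates, key=lambda c: cosine(target_bag, bag_of_words(c[1])))
    return best[0]


# Hypothetical knowledge base: name -> list of (entity id, context text).
KB = {
    "apple": [
        ("apple_fruit", "fruit tree orchard sweet"),
        ("apple_inc", "company iphone shares technology"),
    ],
    "bank": [("bank_fin", "loan deposit finance")],
}

picked = resolve_entity("apple", "shares of the technology company rose", KB)
single = resolve_entity("bank", "any context", KB)
compound = resolve_entity("central bank", "interest rate", KB)
missing = resolve_entity("unknown", "", KB)
```

Splitting a compound name on whitespace is a simplification for English text; for Chinese input a word segmenter would take its place.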
S1023: Obtain key information by automatic mining, where the key information includes summary information and title information.
It should be particularly noted that this embodiment may perform the crawling operation and the corresponding classification in a manner based on natural language processing (NLP, Natural Language Processing). Specifically, referring to Fig. 3, performing data cleansing on the crawled data information may include steps S1021' to S1023':
S1021': Process the data information to obtain data information in ontology format.
S1022': Perform data integration on the ontology-format data information using an NLP Chinese word segmentation disambiguation algorithm to obtain the link relations between identical entities from different data sources.
It should be noted that performing data integration on the ontology-format data information using the NLP Chinese word segmentation disambiguation algorithm to obtain the link relations between identical entities from different data sources, as shown in Fig. 4, may include the following processing steps S401 to S404.
S401: Obtain a Chinese sentence, detect the overlapping (crossing) ambiguities present in it using a maximum matching algorithm, and put them into a crossing-ambiguity set. If the set is empty, the input sentence contains no crossing ambiguity and is returned directly without any processing; otherwise, traverse all the ambiguities in the set.
S402: Perform full path segmentation on each ambiguity using a recursive method based on depth-first search to obtain the set of all paths, and traverse the path set.
S403: For each path, model the ambiguity segmentation path according to a given mathematical model for computing selection likelihood, compute and record the selection likelihood value of the path, and compute the difference between the two largest selection likelihood values in the ambiguity's path set.
S404: If the difference is within a given threshold, confirm that the ambiguity is a true ambiguity, stop resolving it, and handle it by the preset true-ambiguity resolution method; otherwise, judge the ambiguity to be a pseudo-ambiguity and take the path with the largest selection likelihood value as the resolution result of the ambiguity.
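Steps S401 to S404 can be sketched as follows, under several stated assumptions: ambiguity is detected by comparing forward and backward maximum matching (one common way of using maximum matching, not necessarily the one intended), the whole sentence is segmented rather than only the ambiguous span, and the "selection likelihood" is taken to be a sum of log unigram probabilities. The word frequencies in FREQ are invented for the example.

```python
import math


def forward_max_match(text, vocab, max_len):
    """Greedy longest-match segmentation from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len):
    """Greedy longest-match segmentation from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def all_segmentations(text, vocab, max_len):
    """S402: depth-first enumeration of every dictionary segmentation."""
    if not text:
        return [[]]
    paths = []
    for size in range(1, min(max_len, len(text)) + 1):
        head = text[:size]
        if head in vocab:
            for rest in all_segmentations(text[size:], vocab, max_len):
                paths.append([head] + rest)
    return paths


def path_score(path, freq):
    """S403 (assumed model): sum of log unigram probabilities."""
    total = sum(freq.values())
    return sum(math.log(freq[word] / total) for word in path)


def resolve_ambiguity(text, freq, threshold=0.5):
    vocab = set(freq)
    max_len = max(len(word) for word in vocab)
    forward = forward_max_match(text, vocab, max_len)
    if forward == backward_max_match(text, vocab, max_len):
        return "none", forward  # S401: no crossing ambiguity detected
    ranked = sorted(all_segmentations(text, vocab, max_len),
                    key=lambda p: path_score(p, freq), reverse=True)
    if not ranked:
        return "none", forward  # dictionary cannot cover the text; keep greedy result
    if len(ranked) > 1 and path_score(ranked[0], freq) - path_score(ranked[1], freq) <= threshold:
        return "true", None     # S404: true ambiguity, left to a separate handler
    return "pseudo", ranked[0]  # S404: pseudo-ambiguity, best path wins


# Invented frequencies for illustration only.
FREQ = {"研究": 50, "研究生": 10, "生命": 40, "命": 5}
kind, words = resolve_ambiguity("研究生命", FREQ)
```

Here forward matching gives 研究生 / 命 and backward matching gives 研究 / 生命, so an ambiguity is flagged; the two candidate paths differ widely in score, so the case is treated as a pseudo-ambiguity and the higher-scoring path is returned.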
S1023': Obtain key information by automatic mining, where the key information includes summary information and title information.
S103: Perform information completion on the crawled and cleansed data information to obtain target information.
It should be noted that the information completion of this embodiment may include the basic annotation needed for the subject completion operation, such as word segmentation (word-seg), part-of-speech tagging (POS), named entity recognition (NER) and syntactic dependency parsing (dep-parser).
In this embodiment, performing information completion on the crawled and cleansed data information may include: performing NLP annotation on the crawled and cleansed data information; splitting the web page information into sentences according to paragraph marks or punctuation marks; and performing word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each sentence in turn.
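Sentence splitting on paragraph marks or punctuation can be sketched with a regular expression. The set of sentence-ending marks below is an assumption (Chinese and Western full stops, question marks and exclamation marks, with the Western full stop left out to avoid splitting decimals), and split_sentences is an illustrative name.

```python
import re

# A sentence is a run of non-terminator characters plus an optional terminator.
# Line breaks act as paragraph marks and always end a sentence.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")


def split_sentences(text):
    """Split text into sentences at terminal punctuation or line breaks."""
    return [piece.strip() for piece in _SENTENCE.findall(text) if piece.strip()]


sentences = split_sentences("Stocks rose today! Will they fall tomorrow?\nNobody knows")
```

Each resulting sentence would then be passed to word segmentation, part-of-speech tagging, NER and dependency parsing in turn.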
During NER processing, a dictionary may be combined with an entity recognition model. The entity recognition model is trained on annotations from a crowdsourcing platform, and the final result is given in combination with a domain dictionary. This embodiment can restore entity words that were split apart according to the entity recognition result. For example, "Space Science and Technology" may be split into "space flight" and "science and technology", but the pieces can be recombined into "Space Science and Technology" according to the subsequent NER result.
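One way to restore entity words that the segmenter split apart is a greedy longest-span lookup against the set of recognised entities, sketched below. The function merge_entities and its sep parameter are illustrative; for Chinese text the separator would be the empty string instead of a space.

```python
def merge_entities(tokens, entities, sep=" "):
    """Re-join adjacent tokens whose concatenation is a known entity."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest span starting at i first (spans of two or more tokens).
        for j in range(len(tokens), i + 1, -1):
            candidate = sep.join(tokens[i:j])
            if candidate in entities:
                merged.append(candidate)
                i = j
                break
        else:
            # No multi-token entity starts here, so keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["Space", "Science", "and", "Technology", "announced", "results"]
restored = merge_entities(tokens, {"Space Science and Technology"})
```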
It should be pointed out that performing information completion on the crawled and cleansed data information in this embodiment may also include: performing syntactic dependency parsing based on the contextual dependency information of the sentences in the data information; and, if dependency parsing finds that a target sentence lacks a subject or uses a referring word in place of the subject, but the target sentence itself contains relation features of a preset strength, completing and filling in the subject of the target sentence.
Furthermore, completing and filling in the subject of the target sentence as described in this application may include the following processing:
Judge whether the target sentence contains a subject;
If the target sentence contains a subject, judge whether the subject is a pronoun; if the subject is a pronoun, judge whether the sentence preceding the target sentence contains a subject; if the preceding sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it;
If the target sentence does not contain a subject, judge whether the sentence preceding the target sentence contains a subject; if the preceding sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it.
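The decision procedure above can be written compactly, as in the sketch below. It assumes the dependency parser and NER have already filled in each sentence's subject and entity flag; the Sentence class, the PRONOUNS list and the sample sentences are hypothetical, and the check for relation features of a preset strength is left out.

```python
from dataclasses import dataclass
from typing import Optional

PRONOUNS = {"it", "he", "she", "they", "this", "that"}  # illustrative list


@dataclass
class Sentence:
    text: str
    subject: Optional[str]           # subject found by the dependency parser, if any
    subject_is_entity: bool = False  # whether NER tagged that subject as an entity


def complete_subject(target, previous):
    """Return the subject to use for `target`, borrowing from `previous` if needed."""
    has_own_subject = target.subject is not None
    is_pronoun = has_own_subject and target.subject.lower() in PRONOUNS
    if has_own_subject and not is_pronoun:
        return target.subject  # nothing to complete
    # The subject is missing or a pronoun: look at the preceding sentence.
    if previous is not None and previous.subject is not None and previous.subject_is_entity:
        return previous.subject
    return target.subject  # unchanged when no entity subject is available


prev = Sentence("Acme Corp acquired Beta Ltd.", "Acme Corp", True)
from_pronoun = complete_subject(Sentence("It will pay in cash.", "It"), prev)
from_missing = complete_subject(Sentence("Plans to close the deal in May.", None), prev)
unchanged = complete_subject(Sentence("Beta Ltd agreed.", "Beta Ltd", True), prev)
```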
S104: Perform feature extraction on the target information, and build a relation classification model from the extracted feature information.
Specifically, the feature extraction performed on the target information in S104 of this embodiment may extract word embedding features based on a neural network language model, lexical-level features based on the co-occurrence order of words, and grammatical features based on syntactic structure.
In one embodiment, word embedding refers to representing the semantic information of a word in distributed form as a dense, low-dimensional real-valued vector. The word embedding feature is the cosine distance between the embedding vectors of two entity words, computed from pre-trained word2vec word vectors by exploiting the translation invariance of the distributed word vector space.
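The cosine measure between two embedding vectors can be computed as in this sketch. The vectors are toy values, not real word2vec output, and cosine distance is taken here as one minus cosine similarity, which is one common convention.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """One minus the cosine similarity (assumed convention)."""
    return 1.0 - cosine_similarity(u, v)


# Toy three-dimensional "embeddings" for two entity words.
entity_a = [1.0, 2.0, 3.0]
entity_b = [2.0, 4.0, 6.0]
feature = cosine_similarity(entity_a, entity_b)
```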
In one embodiment, grammatical features refer to sentence structure features based on dependency analysis and part of speech, such as the dependency word D1 of entity word c1, the dependency word D2 of entity word c2, the part of speech POSD1 of dependency word D1, the part of speech POSD2 of dependency word D2, and so on.
For example, after the sentence sequence and the dependency information of each word inside a sentence have been obtained, sentence context features are extracted, such as the verbs between the two entities, the word before the first entity and the word after the second entity. Next, model training is carried out on the feature extraction results to build the relation classifier.
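The sentence context features named above can be collected from a tokenised, POS-tagged sentence as sketched below. The tag set (tags starting with "V" mark verbs), the span convention and the boundary markers are assumptions made for the example.

```python
def context_features(tokens, pos_tags, first_span, second_span):
    """Extract context features for an entity pair.

    tokens and pos_tags are parallel lists; each span is (start, end) with an
    exclusive end, and the first entity is assumed to precede the second.
    """
    (s1, e1), (s2, e2) = first_span, second_span
    return {
        "verbs_between": [tokens[i] for i in range(e1, s2) if pos_tags[i].startswith("V")],
        "word_before_first": tokens[s1 - 1] if s1 > 0 else "<BOS>",
        "word_after_second": tokens[e2] if e2 < len(tokens) else "<EOS>",
    }


tokens = ["Yesterday", "Acme", "acquired", "Beta", "quickly"]
pos_tags = ["NN", "NNP", "VBD", "NNP", "RB"]
features = context_features(tokens, pos_tags, (1, 2), (3, 4))
```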
In this application, the relation classifier is preferably a Bayes classifier.
Specifically, the relation classifier of this embodiment can be built in either of the following two ways:
Method one: first collect a small number of entity relation instances, crawl their related text with a targeted crawler, manually label a small number of samples, and pre-train a relation extraction model; then carry out model training on the feature extraction results to build the relation classifier.
Method two: carry out model training directly on the feature extraction results to build the relation classifier.
It should be particularly noted that the classifier of the invention decides only the positive or negative class of a single relation; to decide among multiple relations, multiple classifiers can be built in parallel.
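The one-classifier-per-relation arrangement can be illustrated with a small naive Bayes classifier written from scratch. This is a sketch with invented training data and relation names, using multinomial naive Bayes with add-one smoothing as one possible form of Bayes classifier; it also assumes both the positive and the negative class appear in the training labels.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing for one relation (yes/no)."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = {True: Counter(), False: Counter()}
        for features, label in zip(samples, labels):
            self.feature_counts[label].update(features)
        self.vocabulary = set(self.feature_counts[True]) | set(self.feature_counts[False])
        return self

    def _log_score(self, features, label):
        total = sum(self.feature_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for feature in features:
            count = self.feature_counts[label][feature]
            score += math.log((count + 1) / (total + len(self.vocabulary)))
        return score

    def predict(self, features):
        return self._log_score(features, True) > self._log_score(features, False)


# Invented feature lists and per-relation labels, for illustration only.
SAMPLES = [["verb=acquired"], ["verb=bought"], ["verb=hired"], ["verb=employed"], ["verb=met"]]
LABELS = {
    "acquires": [True, True, False, False, False],
    "employs": [False, False, True, True, False],
}

# One independent yes/no classifier per relation.
classifiers = {name: BinaryNaiveBayes().fit(SAMPLES, labels) for name, labels in LABELS.items()}


def predict_relations(features):
    """Names of all relations whose classifier votes positive."""
    return [name for name, clf in classifiers.items() if clf.predict(features)]


hired = predict_relations(["verb=hired"])
acquired = predict_relations(["verb=acquired"])
```

Keeping each relation in its own binary classifier means a new relation can be added by training one more classifier without retraining the others.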
S105: Obtain a series of relation entity pair data from the data information using the relation classification model, and store the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete construction of the knowledge graph.
Corresponding to Fig. 2, and referring to Fig. 5, obtaining a series of relation entity pair data from the data information using the relation classification model and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph may include steps S1051 to S1053.
S1051: Build an association graph from the summary information and the title information using the relation classification model.
S1052: Infer potential association information from the association graph and expand the association graph.
S1053: Construct a relation entity pair knowledge database from the data information of the different data sources, the link relations between identical entities and the association graph, to serve as the basic knowledge carrier of the knowledge graph.
It is easy to see that this application can organically combine the crawler system, NLP (natural language processing), and the relation classifier to perform data screening, feature extraction, and classification, and finally to build a knowledge graph.
As one application example, once the knowledge graph relation network has been built, it can serve interface queries, for example for the popularization of scientific and technical knowledge or for financial industry analysis.
For example, when NLP combined with the knowledge graph is applied to the financial industry, the implementation process includes: processing the question raised by the customer with NLP techniques; selecting, from the processed information, representative entity/relation phrases and keywords for subsequent retrieval; retrieving in the finance vertical graph according to the information obtained by semantic analysis and understanding; and generating and outputting an answer from the retrieval results to answer the customer's question, while guiding and mining further questions. The finance vertical graph is obtained by using machine learning techniques to organize financial domain knowledge, and the relations among that knowledge, and saving them in a database.
It is worth noting that, with the national reform and advancement of finance in 2019, the economy is gradually transitioning toward the financial industry, and this application can provide technical support for that field, for example: preparing financial-domain relation learning data; learning from that data with machine learning techniques in a semi-supervised manner; organizing the learned financial-domain knowledge entities and the relations between them and saving them in a database to obtain a graph database; performing semi-supervised maintenance of the relations between the financial-domain knowledge entities, aggregating synonyms that express the same relation within the scope of the entities that relation involves; and adding picture descriptions to common entities, to be selectively extracted when the entity is output as a search term in subsequent results, which diversifies the form of interactive answers. When each entity concept is stored, it can be associated with other existing entities, ultimately forming a finance vertical knowledge graph in which the entities are interrelated.
In summary, by combining a knowledge graph with NLP to classify information, this application avoids the information acquisition defects of the prior art. Furthermore, in semantic representation, this application moves from the traditional keyword bag-of-words approach to a more multi-dimensional semantic network approach, so that any target the customer cares about can be covered by the model, and the machine learning algorithm can, starting from manual definitions, further self-learn and expand to discover more knowledge points, such as more competitor companies and upstream and downstream companies. In addition, in knowledge-based reasoning, because a large number of knowledge points and their relations are clearly represented and defined, a computer can automatically discover hidden relations between knowledge points, for example mining "the influence of technology on finance" from massive text, or reasoning over the existing relation network to predict whether certain knowledge points have relations that were not listed manually. Because this reasoning rests on a manually defined knowledge system architecture, it is an interpretable form of intelligence and therefore works better in practice.
Referring to Fig. 6, Fig. 6 is a schematic block diagram of a crawler-based knowledge graph processing device provided by an embodiment of this application. The device is used to execute the aforementioned crawler-based knowledge graph processing method, and may be configured in a server or a terminal.
As shown in Fig. 6, the crawler-based knowledge graph processing device 200 includes: an information crawling module 201, a data cleansing module 202, an information completion module 203, a model construction module 204, and a graph construction module 205.
The information crawling module 201 is used to crawl data information from web page information using a crawler system.
The data cleansing module 202 is used to perform data cleansing on the crawled data information.
The information completion module 203 is used to perform information completion on the crawled and cleansed data information to obtain target information.
The model construction module 204 is used to perform feature extraction on the target information and to build a relation classification model from the extracted feature information.
The graph construction module 205 is used to obtain a series of relation entity pair data from the data information using the relation classification model, and to store the relation entity pairs as the basic knowledge carrier of the knowledge graph, thereby completing construction of the knowledge graph.
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working processes of the device and modules described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
With reference to the above embodiments, please refer to Fig. 7, which is a structural block diagram of one embodiment of the computer equipment of this application. It should first be explained that the knowledge graph (Knowledge Graph) of this embodiment, also called a knowledge domain map and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a series of different graphs that display the development process and structural relationships of knowledge; it describes knowledge resources and their carriers with visualization techniques, and mines, analyzes, constructs, draws, and displays knowledge and the interconnections within it.
The computer equipment of this embodiment may be configured with an operating system and computer programs, which may be stored in a non-volatile storage medium; it may further be configured with internal memory, a network interface, a system bus connecting the modules, and so on.
Briefly, as shown in Fig. 7, the computer equipment of this application may include a memory 71 and a processor 72. The memory 71 is used to store a computer program, and the processor 72 is used to execute the computer program and, when executing it, to implement a method that includes:
crawling data information from web page information using a crawler system;
performing data cleansing on the crawled data information;
performing information completion on the crawled and cleansed data information to obtain target information;
performing feature extraction on the target information, and building a relation classification model from the extracted feature information;
using the relation classification model to obtain a series of relation entity pair data from the data information, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete construction of the knowledge graph.
In this application, the processor 72 needs to perform web page reading, that is, to read the complete content of the web page, which may specifically include asynchronously loaded content, i.e. the content as fully rendered in the browser window.
For example, because of differences in network speed, traffic, device speed, screen size, and so on across user terminals (such as mobile phones or computers), the web page content obtained by common tools such as python requests, python urllib, wget, and curl may be incomplete, for instance only the skeleton of the page without its content, which has to wait for JS to load it asynchronously. In that case, this application can use a browser driver with a JS execution engine to execute the asynchronous-loading JS in the page, which solves the asynchronous loading problem; in addition, this can be combined with a headless browser.
In this application, this embodiment needs to analyze web pages, for example analyzing the crawled web page content and extracting the needed content, that is, extracting the URLs in the page that need further crawling, or constructing URLs from information in the page.
It should be noted that the content targeted by web page analysis in this embodiment may be structured content (such as HTML and JSON), semi-structured content, or unstructured content (such as plain txt).
It is worth noting that this embodiment also needs to perform task deduplication and scheduling during crawling. Deduplication prevents repeated fetching of web pages: for example, page A contains the address of B, and B in turn contains an address back to A, so deduplication keeps the crawler from looping endlessly between A and B. Scheduling, meanwhile, is considered from the perspective of system performance: the main time cost of crawling a web page is waiting on network interaction, i.e. DNS resolution of the address, the request, the returned data, completion of asynchronous loading, and so on, which can take several seconds or even longer.
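The deduplication idea can be sketched with a visited set over a toy in-memory link graph. The function name `crawl_order` and the `links` dictionary are hypothetical; a real crawler would fetch pages over the network and, since fetching is dominated by waiting, would typically overlap many requests, which this sketch does not attempt.

```python
from collections import deque

def crawl_order(start, links):
    """Breadth-first traversal of `links` (url -> list of urls), visiting each URL once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)            # a real crawler would fetch and parse the page here
        for nxt in links.get(url, []):
            if nxt not in seen:      # deduplication: skip URLs already scheduled
                seen.add(nxt)
                queue.append(nxt)
    return order

# A links to B, and B links back to A: without `seen` this would loop forever
links = {"A": ["B"], "B": ["A", "C"], "C": []}
order = crawl_order("A", links)
```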
It should be added that this embodiment can first customize the relevant information the crawler needs to crawl, i.e. the crawler can be set to crawl customized information, such as financial information.
In this application, when performing data cleansing on the crawled data information, the processor 72 is used to: process the data information to obtain data information in ontology format; integrate the ontology-format data information with a progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources; and obtain key information through automatic mining, where the key information includes summary information and title information.
It is worth noting that the integration, by the processor 72, of the ontology-format data information with the progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources may specifically include the following process: input a target entity name and a first context parameter; search the knowledge database by the target entity name and obtain the number of entries identical to the target entity name; if the number is a first quantity, judge whether the target entity name is a primary entity noun; if the number is a second quantity, output the first entity name identical to the target entity name; if the number is a third quantity, perform disambiguation on the multiple second entity names.
Furthermore, the judging of whether the target entity name is a primary entity noun in this embodiment may specifically include the following process: if the target entity name is a primary entity noun, the processor 72 splits the primary entity noun into multiple entity nouns, searches the knowledge database with each of the entity nouns, and obtains the entity names identical to the target entity name.
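The branching on the number of matches can be sketched as below. The application does not state what the first, second, and third quantities are; this sketch assumes they mean zero, exactly one, and more than one match respectively. The function name, the dictionary-shaped knowledge base, and the `parts_of` and `disambiguate` callbacks are likewise hypothetical.

```python
def resolve_entity(name, kb, parts_of=None, disambiguate=None):
    """Look `name` up in `kb` (entity name -> list of entity ids) and return matched ids."""
    matches = kb.get(name, [])
    if len(matches) == 1:            # assumed "second quantity": a unique match
        return matches
    if len(matches) > 1:             # assumed "third quantity": needs disambiguation
        return [disambiguate(matches)] if disambiguate else matches
    # assumed "first quantity" (no match): split the name and look up each part
    found = []
    for part in (parts_of(name) if parts_of else []):
        found.extend(kb.get(part, []))
    return found

kb = {"Acme": ["e1"], "Bank": ["e2", "e3"]}
unique = resolve_entity("Acme", kb)
picked = resolve_entity("Bank", kb, disambiguate=lambda ms: ms[0])
split = resolve_entity("Acme Bank", kb, parts_of=str.split)
```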
In addition, the disambiguation of the multiple second entity names by the processor 72 in this embodiment may include the following process: the processor 72 performs natural language processing separately on the context parameter of the target entity name and on the context parameters of the multiple second entity names, obtaining a bag of words and a set of bags of words; the processor 72 computes the similarity between the bag of words and each bag in the set; and the processor 72 obtains the word frequency with the greatest similarity and outputs it.
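A minimal sketch of bag-of-words disambiguation follows. It assumes cosine similarity over raw word counts, whitespace tokenization (Chinese text would first need a word segmenter), and that the output is the candidate whose context is most similar; none of these choices is specified in the application, and the candidate names and contexts are invented.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count whitespace-separated, lower-cased tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def disambiguate(target_context, candidates):
    """Return the candidate name whose context bag is most similar to the target's."""
    target = bag_of_words(target_context)
    return max(candidates, key=lambda name: cosine(target, bag_of_words(candidates[name])))

candidates = {
    "Apple (company)": "technology company that makes phones and computers",
    "apple (fruit)": "sweet fruit that grows on trees",
}
best = disambiguate("the company released new phones", candidates)
```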
It should be particularly noted that the processor 72 in this embodiment can perform the crawler operation and the corresponding classification in a manner based on natural language processing (NLP). When performing data cleansing on the crawled data information, the processor 72 may: process the data information to obtain data information in ontology format; integrate the ontology-format data information with an NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources; and obtain key information through automatic mining, where the key information includes summary information and title information.
It should be noted that, in this embodiment, integrating the ontology-format data information with the NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources may include the following processing: obtain a Chinese sentence, detect the crossing (overlapping) ambiguities present in it with a maximum matching algorithm, and put them into a crossing-ambiguity set; if the set is empty, the input sentence contains no crossing ambiguity, so return directly without any processing; otherwise, traverse all ambiguities in the set and, using a recursive method based on depth-first search, perform full path segmentation of each ambiguity to obtain the set of all paths; traverse the path set, model each ambiguity segmentation path according to the given selection-likelihood mathematical model, and compute and record the selection-likelihood value of each path as well as the difference between the two largest selection-likelihood values in the ambiguity's path set; if the difference is within a given threshold, the ambiguity is confirmed as a true ambiguity, resolution stops, and it is handled by the preset true-ambiguity resolution method; otherwise, the ambiguity is judged to be a pseudo-ambiguity, and the path with the largest selection-likelihood value is taken as its resolution result.
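The path enumeration and threshold test can be sketched as follows. The sketch skips the maximum-matching detection step and starts from an already identified ambiguous span; it also assumes that the selection likelihood of a path is the sum of log word probabilities from a toy dictionary, which is only one possible model. Latin letters stand in for Chinese characters, and it assumes the span has at least one valid segmentation.

```python
import math

def all_segmentations(text, vocab):
    """Depth-first enumeration of every way to split `text` into dictionary words."""
    if not text:
        return [[]]
    paths = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            for rest in all_segmentations(text[end:], vocab):
                paths.append([word] + rest)
    return paths

def resolve(text, vocab, threshold):
    """Classify the span as a true or pseudo ambiguity and pick a path if possible."""
    scored = sorted(
        ((sum(math.log(vocab[w]) for w in p), p) for p in all_segmentations(text, vocab)),
        reverse=True,
    )
    # Top two scores too close together: treat as a true ambiguity, no automatic choice
    if len(scored) >= 2 and scored[0][0] - scored[1][0] <= threshold:
        return "true_ambiguity", None
    return "pseudo_ambiguity", scored[0][1]

# Hypothetical word probabilities; "abc" can be split as ab|c or a|bc
vocab = {"ab": 0.4, "c": 0.3, "a": 0.2, "bc": 0.05}
paths = all_segmentations("abc", vocab)
clear_case = resolve("abc", vocab, threshold=1.0)
close_case = resolve("abc", vocab, threshold=3.0)
```

With the toy probabilities the two paths differ by about 2.49 in log likelihood, so the span counts as a pseudo-ambiguity under a threshold of 1.0 and as a true ambiguity under a threshold of 3.0.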
When the processor 72 performs information completion on the crawled and cleansed data information to obtain target information, the information completion of this embodiment may include the basic annotation needed for the subject completion operation, for example word segmentation (word-seg), part-of-speech tagging (POS), named entity recognition (NER), and syntactic dependency parsing (dep-parser).
In this embodiment, performing information completion on the crawled and cleansed data information may include: performing natural language processing (NLP) annotation on the crawled and cleansed data information, splitting the web page information into sentences at paragraph marks or punctuation marks, and then performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on each sentence in turn.
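The sentence-splitting step can be sketched with a regular expression. The set of sentence-ending marks below is an assumption (the application only says paragraph marks or punctuation marks), and the later annotation steps would be delegated to an NLP toolkit that is not shown here.

```python
import re

def split_sentences(text):
    """Split at line breaks and after Chinese or Western sentence-ending marks."""
    parts = re.split(r"(?<=[。！？!?])\s*|\n+", text)
    return [p.strip() for p in parts if p and p.strip()]

sentences = split_sentences("Shares rose today! Did profit grow?\nThe report is out")
```

The full stop "." is left out of the pattern on purpose, since it would also split decimals such as "3.5"; a production splitter would need a more careful rule.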
In the NER process, a dictionary can be combined with an entity recognition model; the entity recognition model is trained on annotations from a crowdsourcing platform, and the final result is given in combination with a domain dictionary. This embodiment can restore entity words that were split apart, according to the entity recognition result. For example, "Space Science and Technology" may be segmented into "space flight" and "science and technology", but the pieces can be recombined into "Space Science and Technology" according to the subsequent NER result.
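Restoring split entity words can be sketched as a longest-span merge over the token list. The function name, the `sep` parameter, and the use of a plain set of entity strings as the NER result are assumptions; for Chinese text the tokens would be joined with an empty separator.

```python
def merge_entities(tokens, entities, sep=" "):
    """Rejoin runs of adjacent tokens whose concatenation is a recognized entity."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest span starting at i first (spans of two or more tokens)
        for j in range(len(tokens), i + 1, -1):
            candidate = sep.join(tokens[i:j])
            if candidate in entities:
                merged.append(candidate)
                i = j
                break
        else:
            merged.append(tokens[i])   # no entity starts here; keep the token as is
            i += 1
    return merged

tokens = ["Space", "Science", "and", "Technology", "announced", "results"]
restored = merge_entities(tokens, {"Space Science and Technology"})
```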
It should be pointed out that performing information completion on the crawled and cleansed data information in this embodiment may also include: performing syntactic dependency parsing based on the dependency information of the sentence context of the data information; if the parse shows that the target sentence lacks a subject or uses a referring word in place of the subject, but the target sentence itself contains relation features of a preset strength, then completing and filling in the subject of the target sentence.
Furthermore, the completing and filling in of the subject of the target sentence in this application may include the following process:
Judge whether the target sentence contains a subject;
If the target sentence contains a subject, judge whether the subject is a pronoun; if the subject is a pronoun, judge whether the previous sentence contains a subject; if the previous sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it;
If the target sentence does not contain a subject, judge whether the previous sentence contains a subject; if the previous sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it.
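The decision flow above can be condensed into a small function. It assumes the subjects of the target sentence and of the previous sentence have already been extracted by dependency parsing, and that pronouns and entity words are available as sets; the function name and the example words are hypothetical.

```python
def complete_subject(subject, prev_subject, pronouns, entity_words):
    """Return the subject to use for the target sentence.

    `subject` is None when the target sentence has no subject. If completion is
    not possible, the original value is returned unchanged.
    """
    needs_completion = subject is None or subject in pronouns
    if not needs_completion:
        return subject                       # explicit, non-pronoun subject: keep it
    if prev_subject is not None and prev_subject in entity_words:
        return prev_subject                  # borrow the previous sentence's entity subject
    return subject

pronouns = {"it", "they"}
entity_words = {"Acme Corp"}
from_pronoun = complete_subject("it", "Acme Corp", pronouns, entity_words)
from_missing = complete_subject(None, "Acme Corp", pronouns, entity_words)
not_entity = complete_subject("it", "yesterday", pronouns, entity_words)
```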
In addition, when performing feature extraction on the target information, the processor 72 of this application may extract word embedding features based on a neural network language model, lexical-level features based on the co-occurrence order of words, and grammatical features based on syntactic structure.
In one embodiment, word embedding refers to representing the semantic information of a word in distributed form as a dense, low-dimensional, real-valued vector. The word embedding feature is based on pre-trained word2vec word vectors: exploiting the translation invariance of the distributed word vector space, the cosine distance between the embedding vectors of the two entity words is computed.
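As a small worked example, the cosine measure between two embedding vectors can be computed as below. The vectors are made-up stand-ins for pre-trained word2vec embeddings, and defining cosine distance as one minus cosine similarity is a common convention assumed here rather than something the application states.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)

# Toy 3-dimensional "embeddings" of two entity words
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```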
In one embodiment, grammatical features refer to sentence structure features based on dependency analysis and part of speech, such as the dependency word D1 of entity word c1, the dependency word D2 of entity word c2, the part of speech POSD1 of dependency word D1, the part of speech POSD2 of dependency word D2, and so on.
For example, after the sentence order and the dependency information of each word within the sentence have been obtained, sentence context features are extracted, such as: the verb between the two entities, the word preceding the first entity, the word following the second entity, and so on. Next, model training is performed on the feature extraction results to build the relation classifier.
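The three context features named above can be sketched over a tokenized, part-of-speech-tagged sentence. The function name, the feature keys, and the convention that verb tags start with "V" are assumptions for illustration; the tags would come from whichever POS tagger is used.

```python
def context_features(tokens, pos_tags, e1, e2):
    """Extract context features for two entities at token indices e1 < e2.

    `tokens` and `pos_tags` are parallel lists.
    """
    verbs_between = [tokens[i] for i in range(e1 + 1, e2) if pos_tags[i].startswith("V")]
    return {
        "verbs_between": verbs_between,
        "prev_of_e1": tokens[e1 - 1] if e1 > 0 else None,
        "next_of_e2": tokens[e2 + 1] if e2 + 1 < len(tokens) else None,
    }

tokens = ["Yesterday", "Acme", "acquired", "Beta", "quietly"]
pos_tags = ["NN", "NNP", "VBD", "NNP", "RB"]
feats = context_features(tokens, pos_tags, 1, 3)
```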
In this application, the relation classifier is preferably a Bayes classifier.
Specifically, the relation classifier of this embodiment can be built in either of the following two ways:
Method one: first collect a small number of entity-relation examples, use the crawler to crawl their related text in a targeted manner, manually label a small number of samples, and pre-train a relation extraction model; then perform model training on the feature extraction results to build the relation classifier.
Or method two: perform model training directly on the feature extraction results to build the relation classifier.
It should be particularly noted that each classifier of the present invention only decides the positive or negative class of a single relation; to decide multiple relations, multiple classifiers can be established in parallel.
It is worth noting that, in this embodiment, the processor 72's use of the relation classification model to obtain a series of relation entity pair data from the data information, and its storing of the relation entity pairs as the basic knowledge carrier of the knowledge graph, may include: using the relation classification model to build an association graph from the summary information and the title information; inferring potential association information from the association graph and expanding the association graph; and building a knowledge database from the data information of the different data sources, the linking relationships between the identical entities, and the relation entity pair data of the association graph, to serve as the basic knowledge carrier of the knowledge graph.
It is easy to see that this application can organically combine the crawler system, NLP (natural language processing), and the relation classifier to perform data screening, feature extraction, and classification, and finally to build a knowledge graph.
As one application example, once the knowledge graph relation network has been built, it can serve interface queries, for example for the popularization of scientific and technical knowledge or for financial industry analysis.
For example, when NLP combined with the knowledge graph is applied to the financial industry, the implementation process includes: processing the question raised by the customer with NLP techniques; selecting, from the processed information, representative entity/relation phrases and keywords for subsequent retrieval; retrieving in the finance vertical graph according to the information obtained by semantic analysis and understanding; and generating and outputting an answer from the retrieval results to answer the customer's question, while guiding and mining further questions. The finance vertical graph is obtained by using machine learning techniques to organize financial domain knowledge, and the relations among that knowledge, and saving them in a database.
It is worth noting that, with the national reform and advancement of finance in 2019, the economy is gradually transitioning toward the financial industry, and this application can provide technical support for that field, for example: preparing financial-domain relation learning data; learning from that data with machine learning techniques in a semi-supervised manner; organizing the learned financial-domain knowledge entities and the relations between them and saving them in a database to obtain a graph database; performing semi-supervised maintenance of the relations between the financial-domain knowledge entities, aggregating synonyms that express the same relation within the scope of the entities that relation involves; and adding picture descriptions to common entities, to be selectively extracted when the entity is output as a search term in subsequent results, which diversifies the form of interactive answers. When each entity concept is stored, it can be associated with other existing entities, ultimately forming a finance vertical knowledge graph in which the entities are interrelated.
In summary, by combining a knowledge graph with NLP to classify information, this application avoids the information acquisition defects of the prior art. Furthermore, in semantic representation, this application moves from the traditional keyword bag-of-words approach to a more multi-dimensional semantic network approach, so that any target the customer cares about can be covered by the model, and the machine learning algorithm can, starting from manual definitions, further self-learn and expand to discover more knowledge points, such as more competitor companies and upstream and downstream companies. In addition, in knowledge-based reasoning, because a large number of knowledge points and their relations are clearly represented and defined, a computer can automatically discover hidden relations between knowledge points, for example mining "the influence of technology on finance" from massive text, or reasoning over the existing relation network to predict whether certain knowledge points have relations that were not listed manually. Because this reasoning rests on a manually defined knowledge system architecture, it is an interpretable form of intelligence and therefore works better in practice.
With reference to one or more of the above embodiments, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the knowledge graph processing method described in Figs. 1 to 5 and the embodiments.
It should be understood that the processor of this embodiment may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The computer-readable storage medium may be an internal storage unit of the computer equipment described in the foregoing embodiments, such as the hard disk or memory of the computer equipment. The computer-readable storage medium may also be an external storage device of the computer equipment, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer equipment.
The above are only specific embodiments of this application, but the scope of protection of this application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by this application, and these modifications or substitutions shall all fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.
Claims (10)
1. A crawler-based knowledge graph processing method, characterized by comprising:
crawling data information from web page information using a crawler system;
performing data cleansing on the crawled data information;
performing information completion on the crawled and cleansed data information to obtain target information;
performing feature extraction on the target information, and building a relation classification model from the extracted feature information;
using the relation classification model to obtain a series of relation entity pair data from the data information, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete construction of the knowledge graph.
2. The knowledge graph processing method according to claim 1, characterized in that performing data cleansing on the crawled data information comprises:
processing the data information to obtain data information in ontology format;
integrating the ontology-format data information with a progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources;
obtaining key information through automatic mining, wherein the key information includes summary information and title information.
3. The knowledge graph processing method according to claim 2, characterized in that using the relation classification model to obtain a series of relation entity pair data from the data information, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, comprises:
using the relation classification model to build an association graph from the summary information and the title information;
inferring potential association information from the association graph and expanding the association graph;
building a knowledge database from the data information of the different data sources, the linking relationships between the identical entities, and the relation entity pair data of the association graph, to serve as the basic knowledge carrier of the knowledge graph.
4. The knowledge graph processing method according to claim 1, characterized in that performing data cleansing on the crawled data information comprises:
processing the data information to obtain data information in ontology format;
integrating the ontology-format data information with a natural language processing (NLP) Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources;
obtaining key information through automatic mining, wherein the key information includes summary information and title information.
5. The knowledge graph processing method according to claim 4, characterized in that integrating the ontology-format data information with the natural language processing (NLP) Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources comprises:
obtaining a Chinese sentence, detecting the crossing (overlapping) ambiguities present in the Chinese sentence with a maximum matching algorithm, and putting them into a crossing-ambiguity set; if the set is empty, the input sentence contains no crossing ambiguity, and returning directly without any processing; otherwise, traversing all ambiguities in the set;
using a recursive method based on depth-first search to perform full path segmentation of each ambiguity, obtaining the set of all paths, and traversing the path set;
modeling each ambiguity segmentation path according to the given selection-likelihood mathematical model, and computing and recording the selection-likelihood value of each path as well as the difference between the two largest selection-likelihood values in the ambiguity's path set;
if the difference is within a given threshold, confirming that the ambiguity is a true ambiguity, stopping resolution, and handling it by the preset true-ambiguity resolution method; otherwise, judging the ambiguity to be a pseudo-ambiguity and taking the path with the largest selection-likelihood value as the resolution result of the ambiguity.
6. The knowledge mapping processing method according to claim 1, 2 or 4, wherein performing information completion on the crawled and data-cleaned data information comprises:
performing NLP annotation on the crawled and data-cleaned data information: splitting the web page information into sentences according to paragraph marks or punctuation marks, and then performing, on each sentence in turn, word segmentation, part-of-speech tagging, named entity recognition and dependency syntactic analysis.
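The sentence-splitting step of claim 6 could look like the minimal Python sketch below. The set of sentence-ending punctuation marks and the function name `split_sentences` are assumptions, since the claim only mentions "paragraph marks or punctuation marks". The later steps (word segmentation, part-of-speech tagging, named entity recognition, dependency analysis) would be applied to each returned sentence by an NLP toolkit that the claim does not name, so they are omitted here.

```python
import re

# Assumed sentence terminators (full-width and ASCII); newlines act as paragraph marks.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")


def split_sentences(page_text):
    """Split cleaned page text into sentences, keeping the closing punctuation."""
    sentences = (match.strip() for match in _SENTENCE.findall(page_text))
    return [s for s in sentences if s]
```

For example, `split_sentences("小明出生于北京。他毕业于清华大学！\n现居上海")` returns three sentences, the last of which has no closing punctuation because it ends at the end of the text.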
7. The knowledge mapping processing method according to claim 1, 2 or 4, wherein performing information completion on the crawled and data-cleaned data information comprises:
performing dependency parsing based on the contextual dependency information of the sentences in the data information;
if the dependency parsing shows that a target sentence lacks a subject, or uses a referring word in place of the subject, but the target sentence itself contains a relationship feature of a preset strength, completing and filling in the subject of the target sentence.
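The subject-completion rule of claim 7 is sketched below in Python under several assumptions. The input is taken to be already parsed, with each sentence given as a dict holding its text and the subject found by the dependency parser (or `None`). The "relationship feature of a preset strength" is modelled as a weighted cue-word table compared against `min_strength`. The missing subject is filled with the most recent explicit subject. The pronoun list, cue weights and function names are all hypothetical.

```python
# Assumed referring words and relation cue weights; none of these come from the patent.
PRONOUNS = {"他", "她", "它", "他们", "她们", "它们"}
RELATION_CUES = {"出生于": 0.9, "毕业于": 0.9, "任职于": 0.8, "喜欢": 0.2}


def relation_strength(text):
    """Strongest relation cue found in the sentence, or 0.0 if none is present."""
    return max((w for cue, w in RELATION_CUES.items() if cue in text), default=0.0)


def complete_subjects(sentences, min_strength=0.5):
    """Fill missing or pronoun subjects with the most recent explicit subject."""
    last_subject, completed = None, []
    for sentence in sentences:
        subject, text = sentence.get("subject"), sentence["text"]
        if subject and subject not in PRONOUNS:
            last_subject = subject  # explicit subject: remember it, keep the text
        elif last_subject and relation_strength(text) >= min_strength:
            if subject:  # referring word used in place of the subject
                text = text.replace(subject, last_subject, 1)
            else:        # subject missing altogether
                text = last_subject + text
        completed.append(text)
    return completed
```

A sentence whose strongest cue falls below `min_strength` is left unchanged, which mirrors the claim's condition that completion only happens when the relationship feature is strong enough.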
8. A crawler-based knowledge mapping processing device, comprising:
an information crawling module, configured to crawl data information from web page information using a crawler system;
a data cleaning module, configured to perform data cleaning on the crawled data information;
an information completion module, configured to perform information completion on the crawled and data-cleaned data information, to obtain target information;
a model construction module, configured to perform feature extraction on the target information and to construct a relationship classification model according to the extracted feature information;
a map construction module, configured to obtain a series of relationship entity pair data from the data information using the relationship classification model, and to store the relationship entity pairs as the basic knowledge carriers of the knowledge mapping, thereby completing the construction of the knowledge mapping.
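One way to picture how the modules of claim 8 fit together is the Python sketch below. It is an assumption-laden outline rather than the claimed device. The crawling module is replaced by an iterable of already fetched page texts. The cleaning, completion and classification stages are passed in as plain callables. The relationship entity pairs are stored as (head, relation, tail) tuples in an in-memory adjacency map. The names `RelationPair`, `KnowledgeGraph` and `build_graph` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class RelationPair(NamedTuple):
    """A relationship entity pair: two entities joined by a relation label."""
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    """Minimal in-memory store keyed by head entity."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, pair):
        self._edges[pair.head].add((pair.relation, pair.tail))

    def neighbours(self, head):
        return sorted(self._edges.get(head, ()))


def build_graph(pages, clean, complete, classify):
    """Run cleaning, completion and relation classification, then store the pairs."""
    graph = KnowledgeGraph()
    for page in pages:
        for sentence in complete(clean(page)):
            for pair in classify(sentence):
                graph.add(pair)
    return graph
```

Passing the stages as callables keeps the sketch independent of any particular crawler, cleaner or trained classifier, none of which the claim specifies.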
9. A computer device, comprising a memory and a processor;
the memory being configured to store a computer program;
the processor being configured to execute the computer program and, when executing the computer program, to implement the knowledge mapping processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the knowledge mapping processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471975.5A CN110347894A (en) | 2019-05-31 | 2019-05-31 | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471975.5A CN110347894A (en) | 2019-05-31 | 2019-05-31 | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110347894A (en) | 2019-10-18 |
Family
ID=68174536
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471975.5A (Pending) CN110347894A (en) | 2019-05-31 | 2019-05-31 | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347894A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852104A (en) * | 2019-11-04 | 2020-02-28 | 合肥工业大学 | Family tree identification method and device, storage medium and processor |
CN110971754A (en) * | 2019-10-28 | 2020-04-07 | 深圳绿米联创科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN111198941A (en) * | 2020-01-03 | 2020-05-26 | 联想(北京)有限公司 | Problem discovery method and device, electronic equipment and storage medium |
CN111309827A (en) * | 2020-03-23 | 2020-06-19 | 平安医疗健康管理股份有限公司 | Knowledge graph construction method and device, computer system and readable storage medium |
CN111428052A (en) * | 2020-03-30 | 2020-07-17 | 中国科学技术大学 | Method for constructing educational concept graph with multiple relations from multi-source data |
CN111428047A (en) * | 2020-03-19 | 2020-07-17 | 东南大学 | Knowledge graph construction method and device based on UCL semantic indexing |
CN111563170A (en) * | 2020-04-30 | 2020-08-21 | 北京明略软件系统有限公司 | Knowledge graph generation method and device, computer storage medium and terminal |
CN111585809A (en) * | 2020-04-29 | 2020-08-25 | 北京润通丰华科技有限公司 | Method for auditing network equipment configuration by utilizing big data statistical analysis |
CN111708882A (en) * | 2020-05-29 | 2020-09-25 | 西安理工大学 | Transformer-based Chinese text information missing completion method |
CN111797296A (en) * | 2020-07-08 | 2020-10-20 | 中国人民解放军军事科学院军事医学研究院 | Method and system for mining poison-target literature knowledge based on network crawling |
CN111966836A (en) * | 2020-08-29 | 2020-11-20 | 深圳呗佬智能有限公司 | Knowledge graph vector representation method and device, computer equipment and storage medium |
CN112182235A (en) * | 2020-08-29 | 2021-01-05 | 深圳呗佬智能有限公司 | Method and device for constructing knowledge graph, computer equipment and storage medium |
CN112270196A (en) * | 2020-12-14 | 2021-01-26 | 完美世界(北京)软件科技发展有限公司 | Entity relationship identification method and device and electronic equipment |
CN112307292A (en) * | 2020-10-30 | 2021-02-02 | 中国信息安全测评中心 | Information processing method and system based on advanced persistent threat attack |
CN112328806A (en) * | 2020-10-30 | 2021-02-05 | 广州市西美信息科技有限公司 | Data processing method, system, computer equipment and storage medium |
CN112463985A (en) * | 2020-12-04 | 2021-03-09 | 北京明略软件系统有限公司 | Government affair map model construction method, device, equipment and computer readable medium |
CN112800305A (en) * | 2021-01-12 | 2021-05-14 | 厦门渊亭信息科技有限公司 | Knowledge graph data extraction method and device based on web crawler |
CN112836919A (en) * | 2020-11-30 | 2021-05-25 | 广东电网有限责任公司 | Supplier association analysis method and device based on knowledge graph |
CN113673956A (en) * | 2021-08-23 | 2021-11-19 | 湖北三新文化传媒有限公司 | Book information completion method, equipment and storage medium |
CN114528413A (en) * | 2022-02-18 | 2022-05-24 | 北京融信数联科技有限公司 | Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking |
CN116432965A (en) * | 2023-04-17 | 2023-07-14 | 北京正曦科技有限公司 | Post capability analysis method and tree diagram generation method based on knowledge graph |
CN116910386A (en) * | 2023-09-14 | 2023-10-20 | 深圳市智慧城市科技发展集团有限公司 | Address completion method, terminal device and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105224630A (en) * | 2015-09-24 | 2016-01-06 | 中国科学院自动化研究所 | Based on the integrated approach of Ontology on Semantic Web data |
US20170068903A1 (en) * | 2015-09-04 | 2017-03-09 | Microsoft Technology Licensing, Llc | Semantic entity relation detection classifier training |
CN108664618A (en) * | 2018-05-14 | 2018-10-16 | 江苏号百信息服务有限公司 | A kind of NLP Chinese word segmentation ambiguity recognition methods based on brand analysis system |
CN108874878A (en) * | 2018-05-03 | 2018-11-23 | 众安信息技术服务有限公司 | A kind of building system and method for knowledge mapping |
CN109684483A (en) * | 2018-12-11 | 2019-04-26 | 平安科技(深圳)有限公司 | Construction method, device, computer equipment and the storage medium of knowledge mapping |
-
2019
- 2019-05-31 CN CN201910471975.5A patent/CN110347894A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170068903A1 (en) * | 2015-09-04 | 2017-03-09 | Microsoft Technology Licensing, Llc | Semantic entity relation detection classifier training |
CN105224630A (en) * | 2015-09-24 | 2016-01-06 | 中国科学院自动化研究所 | Based on the integrated approach of Ontology on Semantic Web data |
CN108874878A (en) * | 2018-05-03 | 2018-11-23 | 众安信息技术服务有限公司 | A kind of building system and method for knowledge mapping |
CN108664618A (en) * | 2018-05-14 | 2018-10-16 | 江苏号百信息服务有限公司 | A kind of NLP Chinese word segmentation ambiguity recognition methods based on brand analysis system |
CN109684483A (en) * | 2018-12-11 | 2019-04-26 | 平安科技(深圳)有限公司 | Construction method, device, computer equipment and the storage medium of knowledge mapping |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110971754A (en) * | 2019-10-28 | 2020-04-07 | 深圳绿米联创科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN110852104A (en) * | 2019-11-04 | 2020-02-28 | 合肥工业大学 | Family tree identification method and device, storage medium and processor |
CN111198941A (en) * | 2020-01-03 | 2020-05-26 | 联想(北京)有限公司 | Problem discovery method and device, electronic equipment and storage medium |
CN111428047A (en) * | 2020-03-19 | 2020-07-17 | 东南大学 | Knowledge graph construction method and device based on UCL semantic indexing |
CN111428047B (en) * | 2020-03-19 | 2023-04-21 | 东南大学 | Knowledge graph construction method and device based on UCL semantic indexing |
CN111309827A (en) * | 2020-03-23 | 2020-06-19 | 平安医疗健康管理股份有限公司 | Knowledge graph construction method and device, computer system and readable storage medium |
CN111428052A (en) * | 2020-03-30 | 2020-07-17 | 中国科学技术大学 | Method for constructing educational concept graph with multiple relations from multi-source data |
CN111428052B (en) * | 2020-03-30 | 2023-06-16 | 中国科学技术大学 | Method for constructing education conceptual diagram with multiple relations from multi-source data |
CN111585809A (en) * | 2020-04-29 | 2020-08-25 | 北京润通丰华科技有限公司 | Method for auditing network equipment configuration by utilizing big data statistical analysis |
CN111563170A (en) * | 2020-04-30 | 2020-08-21 | 北京明略软件系统有限公司 | Knowledge graph generation method and device, computer storage medium and terminal |
CN111708882B (en) * | 2020-05-29 | 2022-09-30 | 西安理工大学 | Transformer-based Chinese text information missing completion method |
CN111708882A (en) * | 2020-05-29 | 2020-09-25 | 西安理工大学 | Transformer-based Chinese text information missing completion method |
CN111797296B (en) * | 2020-07-08 | 2024-04-09 | 中国人民解放军军事科学院军事医学研究院 | Method and system for mining poison-target literature knowledge based on network crawling |
CN111797296A (en) * | 2020-07-08 | 2020-10-20 | 中国人民解放军军事科学院军事医学研究院 | Method and system for mining poison-target literature knowledge based on network crawling |
CN111966836A (en) * | 2020-08-29 | 2020-11-20 | 深圳呗佬智能有限公司 | Knowledge graph vector representation method and device, computer equipment and storage medium |
CN112182235A (en) * | 2020-08-29 | 2021-01-05 | 深圳呗佬智能有限公司 | Method and device for constructing knowledge graph, computer equipment and storage medium |
CN112307292A (en) * | 2020-10-30 | 2021-02-02 | 中国信息安全测评中心 | Information processing method and system based on advanced persistent threat attack |
CN112328806A (en) * | 2020-10-30 | 2021-02-05 | 广州市西美信息科技有限公司 | Data processing method, system, computer equipment and storage medium |
CN112836919A (en) * | 2020-11-30 | 2021-05-25 | 广东电网有限责任公司 | Supplier association analysis method and device based on knowledge graph |
CN112463985A (en) * | 2020-12-04 | 2021-03-09 | 北京明略软件系统有限公司 | Government affair map model construction method, device, equipment and computer readable medium |
CN112270196A (en) * | 2020-12-14 | 2021-01-26 | 完美世界(北京)软件科技发展有限公司 | Entity relationship identification method and device and electronic equipment |
CN112800305A (en) * | 2021-01-12 | 2021-05-14 | 厦门渊亭信息科技有限公司 | Knowledge graph data extraction method and device based on web crawler |
CN113673956A (en) * | 2021-08-23 | 2021-11-19 | 湖北三新文化传媒有限公司 | Book information completion method, equipment and storage medium |
CN114528413B (en) * | 2022-02-18 | 2022-08-12 | 北京融信数联科技有限公司 | Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking |
CN114528413A (en) * | 2022-02-18 | 2022-05-24 | 北京融信数联科技有限公司 | Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking |
CN116432965A (en) * | 2023-04-17 | 2023-07-14 | 北京正曦科技有限公司 | Post capability analysis method and tree diagram generation method based on knowledge graph |
CN116432965B (en) * | 2023-04-17 | 2024-03-22 | 北京正曦科技有限公司 | Post capability analysis method and tree diagram generation method based on knowledge graph |
CN116910386A (en) * | 2023-09-14 | 2023-10-20 | 深圳市智慧城市科技发展集团有限公司 | Address completion method, terminal device and computer-readable storage medium |
CN116910386B (en) * | 2023-09-14 | 2024-02-02 | 深圳市智慧城市科技发展集团有限公司 | Address completion method, terminal device and computer-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110347894A (en) | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler | |
CN112199511B (en) | Cross-language multi-source vertical domain knowledge graph construction method | |
CN110633409B (en) | Automobile news event extraction method integrating rules and deep learning | |
CN111026842B (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
US9779085B2 (en) | Multilingual embeddings for natural language processing | |
CN112232058B (en) | False news identification method and system based on deep learning three-layer semantic extraction framework | |
CN110633366B (en) | Short text classification method, device and storage medium | |
CN110457479A (en) | A kind of judgement document's analysis method based on criminal offence chain | |
CN106886580A (en) | A kind of picture feeling polarities analysis method based on deep learning | |
CN111597803B (en) | Element extraction method and device, electronic equipment and storage medium | |
CN107679110A (en) | The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction | |
CN109933671A (en) | Construct method, apparatus, computer equipment and the storage medium of personal knowledge map | |
CN113312480A (en) | Scientific and technological thesis level multi-label classification method and device based on graph convolution network | |
CN113196277A (en) | System for retrieving natural language documents | |
CN113742733A (en) | Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device | |
CN112883182A (en) | Question-answer matching method and device based on machine reading | |
CN115390806A (en) | Software design mode recommendation method based on bimodal joint modeling | |
CN114840685A (en) | Emergency plan knowledge graph construction method | |
CN117574898A (en) | Domain knowledge graph updating method and system based on power grid equipment | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium | |
CN116244277A (en) | NLP (non-linear point) identification and knowledge base construction method and system | |
CN116956869A (en) | Text normalization method, device, electronic equipment and storage medium | |
CN115129885A (en) | Entity chain pointing method, device, equipment and storage medium | |
Xu et al. | Estimating similarity of rich internet pages using visual information | |
Moreira et al. | Deepex: A robust weak supervision system for knowledge base augmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||