CN110347894A - Crawler-based knowledge graph processing method, apparatus, computer device and storage medium - Google Patents

Info

Publication number
CN110347894A
Authority
CN
China
Prior art keywords
information
data
relationship
entity
knowledge mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910471975.5A
Other languages
Chinese (zh)
Inventor
王涛
朱葛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the field of data collection, specifically to constructing a knowledge graph by means of data crawling, and discloses a crawler-based knowledge graph processing method, apparatus, computer device and storage medium. A crawler system crawls data information from web page information; data cleaning is performed on the crawled data information; information completion is performed on the crawled and cleaned data information to obtain target information; feature extraction is performed on the target information, and a relation classification model is built from the extracted feature information; a series of relation entity pair data is obtained from the data information using the relation classification model, and the relation entity pairs are stored as the basic knowledge carrier of the knowledge graph, completing the construction of the knowledge graph. In this way, the application can obtain news information quickly, which greatly improves work efficiency, and the knowledge provided by the knowledge graph helps in better understanding the information content and in carrying out public opinion analysis.

Description

Crawler-based knowledge graph processing method, apparatus, computer device and storage medium
Technical field
This application relates to the field of network information acquisition, and in particular to a crawler-based knowledge graph processing method, apparatus, computer device and storage medium.
Background
In the field of network information acquisition, a web crawler (also known as a web spider or web robot, and more often called a web chaser in the FOAF community) is a program or script that automatically fetches World Wide Web information according to certain rules. Other, less commonly used names include ant, automatic indexer, emulator and worm.
With the rapid development of the network, the World Wide Web has become the carrier of a vast amount of information, and how to extract and use this information effectively has become a huge challenge. Search engines, such as the traditional general-purpose search engines Baidu and Google, serve as tools that assist people in retrieving information and have become the entrance and guide for users accessing the World Wide Web. However, these general-purpose search engines also have certain limitations, such as:
(1) Users in different fields and with different backgrounds often have different retrieval purposes and needs, and the results returned by a general-purpose search engine include a large number of web pages that the user does not care about.
(2) The goal of a general-purpose search engine is the largest possible network coverage, so the contradiction between limited search engine server resources and unlimited network data resources will deepen further.
(3) As World Wide Web data forms become richer and network technology continues to develop, large amounts of different data such as pictures, databases, audio and video multimedia appear. General-purpose search engines are often powerless with such information-dense, structured data and cannot discover and acquire it well.
(4) General-purpose search engines mostly provide keyword-based retrieval and have difficulty supporting queries posed in terms of semantic information.
As is well known, in the financial technology field, news and public opinion need to be understood in a timely manner and also need to be analysed and processed quickly by category. Existing network information acquisition technology has the defects described above, so it cannot meet users' needs and struggles to meet the needs of technological development.
Summary of the invention
This application provides a crawler-based knowledge graph processing method, apparatus, device and storage medium, which give fields that are sensitive to news and public opinion a fast and effective way to acquire and classify network information, meeting users' needs and the needs of technological development.
In a first aspect, this application provides a crawler-based knowledge graph processing method, comprising:
crawling data information from web page information using a crawler system;
performing data cleaning on the crawled data information;
performing information completion on the crawled and cleaned data information to obtain target information;
performing feature extraction on the target information, and building a relation classification model from the extracted feature information;
obtaining a series of relation entity pair data from the data information using the relation classification model, and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete the construction of the knowledge graph.
In a second aspect, this application provides a crawler-based knowledge graph processing apparatus, comprising:
an information crawling module, configured to crawl data information from web page information using a crawler system;
a data cleaning module, configured to perform data cleaning on the crawled data information;
an information completion module, configured to perform information completion on the crawled and cleaned data information to obtain target information;
a model construction module, configured to perform feature extraction on the target information and build a relation classification model from the extracted feature information;
a graph construction module, configured to obtain a series of relation entity pair data from the data information using the relation classification model, and to store the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete the construction of the knowledge graph.
In a third aspect, this application further provides a computer device, the computer device comprising a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement the knowledge graph processing method described in the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the knowledge graph processing method described in the first aspect.
This application discloses a crawler-based knowledge graph processing method, apparatus, computer device and storage medium. A crawler system crawls data information from web page information; data cleaning is performed on the crawled data information; information completion is performed on the crawled and cleaned data information to obtain target information; feature extraction is performed on the target information, and a relation classification model is built from the extracted feature information; a series of relation entity pair data is obtained from the data information using the relation classification model, and the relation entity pairs are stored as the basic knowledge carrier of the knowledge graph, completing the construction of the knowledge graph. In this way, the application can obtain news information quickly, which greatly improves work efficiency, and the knowledge provided by the knowledge graph helps in better understanding the information content and in carrying out public opinion analysis more accurately.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the steps of the crawler-based knowledge graph processing method provided by an embodiment of this application;
Fig. 2 is a schematic flowchart of the steps of one embodiment of performing data cleaning on the crawled data information;
Fig. 3 is a schematic flowchart of the steps of another embodiment of performing data cleaning on the crawled data information;
Fig. 4 is a schematic flowchart of the steps of one embodiment of processing the data information to obtain data information in ontology format;
Fig. 5 is a schematic flowchart of the steps of one embodiment of obtaining a series of relation entity pair data from the data information using the relation classification model and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph;
Fig. 6 is a schematic block diagram of a crawler-based knowledge graph processing apparatus provided by an embodiment of this application;
Fig. 7 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
The flowcharts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed, combined or partially merged, so the actual order of execution may change according to the actual situation.
The embodiments of this application provide a crawler-based knowledge graph processing method, apparatus, computer device and storage medium, which give fields that are sensitive to news and public opinion a fast and effective way to acquire and classify network information, meeting users' needs and the needs of technological development.
Some embodiments of this application are described in detail below with reference to the drawings. Where no conflict arises, the following embodiments and the features in them may be combined with one another.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the steps of the crawler-based knowledge graph processing method provided by an embodiment of this application. It should first be noted that a knowledge graph (Knowledge Graph), also called a scientific knowledge graph and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a series of different graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyse, construct, draw and display knowledge and the interconnections between items of knowledge.
As shown in Fig. 1, the crawler-based knowledge graph processing method of this application may include, but is not limited to, steps S101~S105.
S101, crawl data information from web page information using a crawler system.
In this application, S101 requires web page reading, that is, reading the complete content of the web page, which may specifically include asynchronously loaded content, i.e. the content as fully rendered in the browser window.
For example, because of differences in network speed, traffic, device speed and screen size across clients (such as mobile phones or computers), the page content fetched by common tools such as python requests, python urllib, wget or curl may be incomplete: only the page skeleton is returned without content, because the content is loaded asynchronously by JS. In this case, this application may use a browser driver with a JS execution engine to execute the asynchronous JS in the page, which solves the asynchronous loading problem; this may also be combined with a headless browser.
In this application, the present embodiment also needs to analyse the web page, for example analysing the crawled page content and extracting the required content, that is, extracting the URLs in the page that need to be crawled further, or constructing URLs from information in the page.
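The URL extraction step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the method the application itself specifies; the class name `LinkExtractor`, the helper `extract_links` and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the list of crawlable URLs found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content used only for illustration.
page = (
    '<a href="/news/1.html">one</a> '
    '<a href="https://example.com/a#top">two</a> '
    '<a href="mailto:x@example.com">mail</a>'
)
links = extract_links(page, "https://example.com/index.html")
print(links)
```

Relative links are resolved against the page address, and non-HTTP links such as `mailto:` are discarded so that only addresses the crawler can actually fetch are queued.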
It should be noted that the content targeted by the web page analysis of this embodiment may be structured content (such as HTML and JSON), semi-structured content or unstructured content (such as plain txt).
It is worth mentioning that this embodiment also performs task deduplication and scheduling during crawling. Deduplication prevents a page from being fetched repeatedly: for example, page A contains the address of page B, and B in turn contains the address of A, so deduplication keeps the crawler from looping endlessly between A and B. Scheduling is considered from the angle of system performance: the main time cost of web crawling is network interaction, namely waiting for DNS resolution of the address, the request, the returned data, completion of asynchronous loading and so on, which can take several seconds or even longer.
It should be added that this embodiment may first customize the information the crawler needs to crawl, that is, the crawler may be configured to crawl customized information such as financial news.
S102, perform data cleaning on the crawled data information.
In particular, performing data cleaning on the crawled data information in this embodiment may include steps S1021~S1023 as shown in Fig. 2.
S1021, process the data information to obtain data information in ontology format.
S1022, integrate the ontology-format data information through a progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources.
It is worth mentioning that integrating the ontology-format data information through the progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources may specifically include the following process: input a target entity name and a first context parameter; look up the target entity name in the knowledge database to obtain the number of entity names identical to the target entity name; if the number is a first quantity, judge whether the target entity name is an original entity noun; if the number is a second quantity, output the first entity name that is identical to the target entity name; if the number is a third quantity, perform disambiguation on the multiple second entity names.
Further, judging whether the target entity name is an original entity noun in the embodiments of this application may specifically include the following process: if the target entity name is an original entity noun, split the original entity noun into multiple entity nouns, and look each of them up in the knowledge database to obtain entity names identical to the target entity name.
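The lookup flow above might be sketched as below. The source does not state what the first, second and third quantities are; this sketch assumes they mean zero, exactly one, and more than one match respectively. The function name `lookup`, the dictionary-based knowledge base and the whitespace splitter are all illustrative assumptions.

```python
def lookup(target, knowledge_base, split):
    """Return (status, candidates) for a target entity name.

    Assumed reading of the three quantities: 0 matches -> try the parts of
    the name, 1 match -> output it, more than 1 -> needs disambiguation.
    """
    matches = knowledge_base.get(target, [])
    if len(matches) == 1:
        return "unique", matches
    if len(matches) > 1:
        return "ambiguous", matches
    # No direct match: split the name and look each part up separately.
    found = []
    for part in split(target):
        found.extend(knowledge_base.get(part, []))
    return ("from_parts", found) if found else ("unknown", [])


# Hypothetical knowledge base mapping entity names to entity identifiers.
kb = {
    "Tencent": ["org:tencent"],
    "Alibaba": ["org:alibaba"],
    "Apple": ["org:apple", "fruit:apple"],
}
print(lookup("Tencent", kb, str.split))
print(lookup("Apple", kb, str.split))
print(lookup("Tencent Alibaba", kb, str.split))
```

An "ambiguous" result is the case that would be handed on to the context-based disambiguation step.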
In addition, performing disambiguation on the multiple second entity names in this embodiment may include the following process: apply natural language processing to the context parameter of the target entity name and to the context parameters of the multiple second entity names to obtain a bag of words and a set of bags of words respectively; compute the similarity between the bag of words and each bag in the set; obtain the word frequency with the greatest similarity; and output the word frequency with the greatest similarity.
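A bag-of-words comparison of this kind can be sketched with word counts and cosine similarity. The similarity measure is an assumption (the source only says "similarity calculation"), the tokenizer is a plain whitespace split, and this sketch returns the identifier of the best-matching candidate rather than a word frequency. The candidate contexts are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(target_context, candidates):
    """candidates maps an entity id to its context text; return the best id."""
    target_bag = bag_of_words(target_context)
    return max(
        candidates,
        key=lambda eid: cosine(target_bag, bag_of_words(candidates[eid])),
    )


# Hypothetical candidate entities that share the name "apple".
candidates = {
    "org:apple": "apple released a new phone and its share price rose",
    "fruit:apple": "apple is a sweet fruit that grows on trees",
}
best = disambiguate("the share price of apple rose after the phone launch", candidates)
print(best)
```

The company sense wins here because its context shares several words (share, price, rose, phone) with the target context, while the fruit sense shares only the name itself.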
S1023, obtain key information by automatic mining, where the key information includes summary information and title information.
In particular, this embodiment may carry out the crawler operation and the corresponding classification in a manner based on natural language processing (NLP, Natural Language Processing). Specifically, referring to Fig. 3, performing data cleaning on the crawled data information may include steps S1021'~S1023':
S1021', process the data information to obtain data information in ontology format.
S1022', integrate the ontology-format data information through an NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources.
It should be noted that integrating the ontology-format data information through the NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources may, as shown in Fig. 4, include the following processing steps S401~S404.
S401, obtain a Chinese sentence, detect the overlapping ambiguities present in the sentence with a maximum matching algorithm, and put them into an overlapping ambiguity set. If the set is empty, the input sentence contains no overlapping ambiguity and is returned directly without any processing; otherwise, traverse all ambiguities in the set.
S402, using a recursive method based on depth-first search, perform full path segmentation on the ambiguity to obtain the set of all paths, and traverse the path set.
S403, for each path, model the ambiguity segmentation path according to a given mathematical model for computing selection likelihood, compute and record the selection likelihood value of the path, and compute the difference between the two largest selection likelihood values in the ambiguity's path set.
S404, if the difference is within a given threshold, confirm that the ambiguity is a true ambiguity, stop resolution and handle it with the preset true-ambiguity resolution method; otherwise, judge that the ambiguity is a pseudo ambiguity and take the path with the largest selection likelihood value as the resolution result of the ambiguity.
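Steps S402~S404 can be illustrated with the sketch below. It makes several assumptions that the source leaves open: the whole input string is treated as the ambiguous span (the maximum matching detection of S401 is omitted), the selection likelihood of a path is modelled as the product of hypothetical unigram probabilities, and the vocabulary and threshold values are invented.

```python
def full_segmentation(text, vocab):
    """Enumerate every way to cut text into dictionary words (depth-first)."""
    if not text:
        return [[]]
    paths = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            for rest in full_segmentation(text[end:], vocab):
                paths.append([word] + rest)
    return paths


def path_score(path, vocab):
    """Assumed likelihood model: product of per-word probabilities."""
    score = 1.0
    for word in path:
        score *= vocab[word]
    return score


def resolve(text, vocab, threshold):
    """Return (kind, best_path) where kind is 'none', 'pseudo' or 'true'."""
    paths = full_segmentation(text, vocab)
    ranked = sorted(paths, key=lambda p: path_score(p, vocab), reverse=True)
    if len(ranked) < 2:
        return "none", (ranked[0] if ranked else None)
    gap = path_score(ranked[0], vocab) - path_score(ranked[1], vocab)
    if gap <= threshold:
        return "true", None      # too close to call: defer to a preset handler
    return "pseudo", ranked[0]   # clear winner: use the most likely path


# Hypothetical dictionary with made-up word probabilities.
vocab = {"研究": 0.3, "研究生": 0.1, "生命": 0.3, "命": 0.05, "起源": 0.2}
print(resolve("研究生命起源", vocab, threshold=0.005))
```

With these numbers the string has two segmentations, 研究/生命/起源 and 研究生/命/起源, whose scores differ by more than the threshold, so the first is returned as the resolution of a pseudo ambiguity. Raising the threshold above the gap makes the same input count as a true ambiguity.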
S1023 ' obtains key message by automatic excavating, wherein the key message includes summary info and title Information.
S103, the data information to crawling and Jing Guo data cleansing carries out information completion, to obtain target information.
It should be noted that the information completion of the present embodiment may include the basis mark for carrying out subject completion operation, than It can such as carry out including participle (word-seg), part-of-speech tagging (POS), name Entity recognition (NER), syntax dependency parsing (dep-parser), the basic mark movement of subject completion operation.
In the present embodiment, it crawls and data information Jing Guo data cleansing carries out information completion for described pair, may include: Data information to crawling and Jing Guo data cleansing carries out natural language processing NLP labeling operation, according to paragraph symbol or mark Point symbol to webpage information carry out sentence cutting, each sentence is successively segmented, part-of-speech tagging, name Entity recognition and Interdependent syntactic analysis.
In NER treatment process, the method that can be combined with dictionary and entity recognition model, entity recognition model is used Crowdsourcing platform mark plus model training, last recombination region dictionary provide result.The present embodiment can be according to Entity recognition As a result the entity word that those are cut open is restored.Such as: " Space Science and Technology " may be cut into " space flight " and " science and technology ", but It is that incision can be reassembled into " Space Science and Technology " according to the result of NER below.
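The recombination of split entity words can be sketched as a longest-match merge over the token list. This is only an illustration: the set `entities` stands in for the output of an NER model plus domain dictionary, and the English tokens are joined with spaces, whereas Chinese tokens would be concatenated directly.

```python
def merge_entities(tokens, entities):
    """Re-join adjacent tokens whose concatenation is a recognized entity."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if j - i > 1 and candidate in entities:
                merged.append(candidate)
                i = j
                break
        else:
            # No multi-token entity starts here; keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical segmenter output and NER result.
tokens = ["Space", "Science", "and", "Technology", "released", "a", "report"]
entities = {"Space Science and Technology"}
print(merge_entities(tokens, entities))
```

Trying the longest span first means that a longer entity is preferred over a shorter one that begins at the same token.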
It should be pointed out that performing information completion on the crawled and cleaned data information in this embodiment may further include: performing syntactic dependency parsing based on the contextual dependency information of the sentences in the data information; if the parsing shows that a target sentence lacks a subject or uses a referring word in place of the subject, but the target sentence itself contains relation features of a preset strength, completing and filling in the subject of the target sentence.
Further, completing and filling in the subject of the target sentence may include the following process:
Judge whether the target sentence contains a subject;
If the target sentence contains a subject, judge whether the subject is a referring pronoun. If the subject is a referring pronoun, judge whether the sentence preceding the target sentence contains a subject; if the preceding sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it;
If the target sentence does not contain a subject, judge whether the sentence preceding the target sentence contains a subject; if the preceding sentence contains a subject, judge whether that subject is an entity word; if it is an entity word, complete the subject of the target sentence with it.
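The decision procedure above can be written compactly as follows. The `Sentence` record and its fields are hypothetical; in practice they would be filled from the dependency parse and the NER result, which this sketch does not perform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    text: str
    subject: Optional[str] = None    # None when the parser found no subject
    subject_is_pronoun: bool = False
    subject_is_entity: bool = False


def complete_subject(target, previous):
    """Return the subject to use for target, borrowing from previous if needed."""
    needs_completion = target.subject is None or target.subject_is_pronoun
    if not needs_completion:
        return target.subject
    # Borrow only when the preceding sentence has a subject that is an entity word.
    if previous is not None and previous.subject and previous.subject_is_entity:
        return previous.subject
    return target.subject  # nothing suitable to borrow; leave unchanged


# Hypothetical parsed sentences.
prev = Sentence("Acme Corp released its results.", "Acme Corp", False, True)
pronoun = Sentence("It acquired a startup.", "It", True, False)
missing = Sentence("Acquired a startup.")
print(complete_subject(pronoun, prev), complete_subject(missing, prev))
```

Both branches of the described process (pronoun subject and missing subject) end in the same check on the preceding sentence, which is why the sketch folds them into a single `needs_completion` test.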
S104, perform feature extraction on the target information, and build a relation classification model from the extracted feature information.
Specifically, the feature extraction on the target information in S104 may use word embedding features based on a neural network language model, lexical-level features based on the co-occurrence order between words, and grammatical features based on syntactic structure.
In one embodiment, word embedding refers to representing the semantic information of a word in distributed form as a dense, low-dimensional, real-valued vector. The word embedding feature is based on pre-trained word2vec word vectors and uses the translation invariance of the distributed word vector space to compute the cosine distance between the embedding vectors of the two entity words.
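The cosine measure between two embedding vectors can be computed as below. The vectors are toy values chosen for illustration, not real word2vec output, and the sketch takes cosine distance to mean one minus cosine similarity, which is a common convention that the source does not spell out.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy 3-dimensional "embeddings" for two entity words.
entity_a = [1.0, 2.0, 3.0]
entity_b = [2.0, 4.0, 6.0]
print(cosine_similarity(entity_a, entity_b), cosine_distance(entity_a, entity_b))
```

The resulting value can be used directly as one numeric feature of the entity pair when training the relation classifier.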
In one embodiment, grammatical features refer to sentence structure features based on dependency parsing and part of speech, such as the dependent word D1 of entity word c1, the dependent word D2 of entity word c2, the part of speech POSD1 of dependent word D1, and the part of speech POSD2 of dependent word D2.
For example, after the sentence sequence and the dependency information of each word within the sentence have been obtained, sentence context features are extracted, such as the verb between the two entities, the word before the first entity and the word after the second entity. Next, model training is carried out on the feature extraction results to build the relation classifier.
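The three context features named above can be extracted from a tagged sentence as sketched here. The function name, the part-of-speech tag set and the boundary markers `<BOS>` and `<EOS>` are assumptions made for the example; the tokens and tags would normally come from the annotation step.

```python
def context_features(tokens, pos_tags, e1, e2):
    """Extract simple context features for an entity pair.

    tokens and pos_tags are parallel lists; e1 and e2 are the token indices
    of the first and second entity, with e1 < e2.
    """
    verbs_between = [
        tokens[i] for i in range(e1 + 1, e2) if pos_tags[i] == "VERB"
    ]
    return {
        "verbs_between": verbs_between,
        "word_before_e1": tokens[e1 - 1] if e1 > 0 else "<BOS>",
        "word_after_e2": tokens[e2 + 1] if e2 + 1 < len(tokens) else "<EOS>",
    }


# Hypothetical tagged sentence with entities at positions 1 and 3.
tokens = ["Yesterday", "AcmeBank", "acquired", "BetaPay", "quietly"]
pos_tags = ["ADV", "PROPN", "VERB", "PROPN", "ADV"]
features = context_features(tokens, pos_tags, 1, 3)
print(features)
```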
In this application, the relation classifier is preferably a Bayes classifier.
Specifically, the relation classifier of this embodiment can be built in either of the following two ways:
Method one: first collect a small number of entity relation instances, use the crawler to fetch their related text in a targeted manner, manually annotate a small number of samples, and pre-train a relation extraction model; then carry out model training on the feature extraction results to build the relation classifier.
Method two: carry out model training directly on the feature extraction results to build the relation classifier.
In particular, the classifier of this application only decides the positive or negative class for a single relation; to decide among multiple relations, multiple classifiers can be established in parallel.
S105, obtain a series of relation entity pair data from the data information using the relation classification model, and store the relation entity pairs as the basic knowledge carrier of the knowledge graph, to complete the construction of the knowledge graph.
Corresponding to Fig. 2, and referring to Fig. 5, obtaining a series of relation entity pair data from the data information using the relation classification model and storing the relation entity pairs as the basic knowledge carrier of the knowledge graph may include steps S1051~S1053.
S1051, build an association graph from the summary information and the title information using the relation classification model.
S1052, infer potential association information from the association graph and expand the association graph.
S1053, build a relation entity pair knowledge database from the data information of the different data sources, the linking relationships between identical entities and the association graph, to serve as the basic knowledge carrier of the knowledge graph.
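Storing relation entity pairs and expanding the graph by inference can be illustrated with a set of (head, relation, tail) triples. The source does not say which inference rule is used in S1052; the transitive rule below is only one possible example, and the entities and relation names are hypothetical.

```python
def infer_transitive(triples, relation):
    """Expand a set of (head, relation, tail) triples with transitive edges."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the edges of this relation before adding new ones.
        edges = [(h, t) for h, r, t in graph if r == relation]
        for h1, t1 in edges:
            for h2, t2 in edges:
                if t1 == h2 and h1 != t2 and (h1, relation, t2) not in graph:
                    graph.add((h1, relation, t2))
                    changed = True
    return graph


# Hypothetical relation entity pairs produced by the classifiers.
triples = {
    ("A", "subsidiary_of", "B"),
    ("B", "subsidiary_of", "C"),
    ("C", "competes_with", "D"),
}
expanded = infer_transitive(triples, "subsidiary_of")
print(sorted(expanded - triples))
```

The loop repeats until no new triple is added, so chains longer than two hops are also closed.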
It is not difficult to find out that the application can use crawler system, NLP (natural language processing), relationship classifier carry out it is organic Complete data screening, feature extraction, the function of classifying and be finally built into knowledge mapping in ground.
As one application example, after the knowledge graph relationship network has been built, an interface can be provided for querying it, for example for the popularization of scientific and technical knowledge or for financial industry analysis.
For example, when NLP combined with a knowledge graph is applied to the financial industry, the implementation process includes: processing the question raised by a client using NLP techniques; selecting, from the processed information, the phrases and keywords that represent entities or relationships, for use in subsequent retrieval; retrieving from a financial vertical graph according to the information obtained by semantic analysis and understanding; and generating and outputting an answer from the retrieval result to answer the client's question, while also guiding and mining further questions. The financial vertical graph is obtained by using machine learning techniques to organize financial domain knowledge and the relationships among that knowledge and to save them in a database.
It is worth mentioning that, with the national reform and promotion of finance in 2019, the economy is gradually shifting toward the financial industry, and the present application can provide technical support for this field, for example: preparing financial-domain relationship learning data; learning from the financial-domain relationship learning data in a semi-supervised manner using machine learning techniques; organizing the learned financial-domain knowledge entities and the relationships between them and saving them in a database to obtain a graph database; performing semi-supervised maintenance on the relationships between financial-domain knowledge entities, and aggregating synonyms that express the same relationship within the scope of the entities that relationship involves; and adding picture descriptions to common entities, which serve as search terms that can be selectively extracted when subsequent results are output, thereby diversifying the form of interactive answers. When each entity concept is stored, it can be associated with other existing entities, so that a financial vertical knowledge graph with correlations between its entities is finally formed.
In summary, the present application classifies information by combining a knowledge graph with NLP, and avoids the information acquisition defects of the prior art. Furthermore, in semantic representation, the application moves from the traditional keyword bag-of-words approach to a richer semantic network approach, so that any target a client is concerned with can be included in the model, and a machine learning algorithm can, starting from manual definitions, extend itself through further self-learning to discover more knowledge points, such as more competitor enterprises and upstream and downstream enterprises. In addition, in knowledge-based reasoning, because a large number of knowledge points and their relationships are explicitly represented and defined, a computer can automatically discover relationships hidden between knowledge points, for example mining "the influence of science and technology on finance" from massive text, or reasoning over the existing relationship network to predict whether certain knowledge points have relationships that were not listed manually. Because this reasoning is based on a manually defined knowledge architecture, it is an interpretable form of intelligence and therefore performs better in practice.
Referring to Fig. 6, Fig. 6 is a schematic block diagram of a crawler-based knowledge graph processing device provided by an embodiment of the present application. The device is used to execute the aforementioned crawler-based knowledge graph processing method, and may be configured in a server or a terminal.
As shown in Fig. 6, the crawler-based knowledge graph processing device 200 comprises: an information crawling module 201, a data cleaning module 202, an information completion module 203, a model construction module 204 and a graph construction module 205.
The information crawling module 201 is used for crawling data information from web page information using a crawler system.
The data cleaning module 202 is used for performing data cleaning on the crawled data information.
The information completion module 203 is used for performing information completion on the crawled and cleaned data information to obtain target information.
The model construction module 204 is used for performing feature extraction on the target information and constructing a relationship classification model from the extracted feature information.
The graph construction module 205 is used for obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, so as to complete the construction of the knowledge graph.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
With reference to the above embodiments, refer to Fig. 7, which is a schematic structural block diagram of an embodiment of the computer equipment of the present application. It should first be explained that the knowledge graph (Knowledge Graph) of this embodiment, also called a knowledge domain map and known in the library and information science community as knowledge domain visualization or knowledge domain mapping, is a series of different graphs that show the development process and structural relationships of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interconnections between pieces of knowledge.
The computer equipment of this embodiment may be configured with an operating system and a computer program, which may be stored in a non-volatile storage medium; it may further be configured with internal memory, a network interface, a system bus for connecting the modules, and so on.
Briefly, as shown in Fig. 7, the computer equipment described in the present application may include a memory 71 and a processor 72. The memory 71 is used to store a computer program, and the processor 72 is used to execute the computer program and, when executing it, to implement a method comprising:
crawling data information from web page information using a crawler system;
performing data cleaning on the crawled data information;
performing information completion on the crawled and cleaned data information to obtain target information;
performing feature extraction on the target information, and constructing a relationship classification model from the extracted feature information;
obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, so as to complete the construction of the knowledge graph.
In the present application, the processor 72 needs to read web pages, that is, to read the complete content of a web page, which may specifically include asynchronously loaded content, i.e. the full content as presented in a browser window.
For example, because of differences in network speed, traffic, device speed, screen size and so on across clients (such as mobile phones or computers), the page content obtained by common tools such as python requests, python urllib, wget or curl may be incomplete, for example only the skeleton of the page without its content, because the content is loaded asynchronously by JS. In this case, the application can use a browser driver with a JS execution engine to execute the asynchronously loading JS in the page, which resolves the asynchronous loading problem; this can also be combined with a headless browser.
In the present application, this embodiment needs to analyze web pages, for example analyzing the crawled page content and extracting the content needed, that is, extracting from the page the URLs that need to be crawled further, or constructing URLs from information in the page.
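The URL extraction step described above can be sketched with the Python standard library. This is a minimal illustration, not the applicant's implementation: the names `LinkExtractor` and `extract_links` and the sample page are hypothetical, and only `<a href>` links are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content used only for illustration.
page = (
    '<a href="/news/1.html">one</a> '
    '<a href="https://example.com/a#top">two</a> '
    '<a href="mailto:x@example.com">mail</a>'
)
links = extract_links(page, "https://example.com/index.html")
```

Relative links are resolved against the URL of the page they were found on, and non-HTTP schemes such as `mailto:` are skipped so that only crawlable addresses are returned.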
It should be noted that the content targeted by the web page analysis of this embodiment may be structured content (such as HTML and JSON), semi-structured content, or unstructured content (such as plain txt).
It is worth mentioning that this embodiment also needs to perform task deduplication and scheduling during crawling. Deduplication prevents pages from being fetched repeatedly: for example, if page A contains the address of B and page B in turn contains the address of A, deduplication keeps the crawler from looping endlessly between A and B. Scheduling, meanwhile, is considered from the perspective of system performance: the main time cost of page crawling is network interaction, such as waiting for DNS resolution of the address, the request, the returned data and the completion of asynchronous loading, which can take several seconds or even longer.
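As a rough sketch of deduplication and scheduling, the following keeps a set of already scheduled URLs and fetches each frontier concurrently so that network waits overlap. The function `crawl`, the `fetch_links` callback and the toy link graph are assumptions made for illustration; the source does not specify a scheduling strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(seed, fetch_links, max_workers=4):
    """Level-by-level crawl that never schedules the same URL twice."""
    seen = {seed}      # every URL that has ever been scheduled
    frontier = [seed]  # URLs to fetch in the current round
    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            order.extend(frontier)
            # Fetch the whole frontier concurrently; map() keeps input order.
            results = list(pool.map(fetch_links, frontier))
            next_frontier = []
            for found in results:
                for link in found:
                    if link not in seen:  # task deduplication
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return order


# Hypothetical link graph: A links to B, and B links back to A.
GRAPH = {"A": ["B"], "B": ["A", "C"], "C": []}
visited = crawl("A", lambda url: GRAPH.get(url, []))
```

Because B's link back to A is already in `seen`, the crawl terminates after visiting each page once. A thread pool is used here because the work is dominated by waiting on the network rather than by computation.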
It should be added that this embodiment may first customize the information the crawler needs to crawl, that is, the crawler may be set to crawl customized information, such as financial information.
In the present application, when performing data cleaning on the crawled data information, the processor 72 is used to: process the data information to obtain data information in ontology format; perform data integration on the ontology-format data information using a progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources; and obtain key information by automatic mining, wherein the key information includes summary information and title information.
It is worth mentioning that the data integration performed by the processor 72 on the ontology-format data information using the progressive disambiguation algorithm, to obtain the linking relationships between identical entities from different data sources, may specifically include the following process: inputting a target entity name and a first context parameter; looking up the target entity name in the knowledge database to obtain the number of entries identical to the target entity name; if the number is a first quantity, judging whether the target entity name is a primary entity noun; if the number is a second quantity, outputting the first entity name identical to the target entity name; and if the number is a third quantity, performing disambiguation on the multiple second entity names.
Furthermore, judging whether the target entity name is a primary entity noun, as described in this embodiment, may specifically include the following process: if the target entity name is a primary entity noun, the processor 72 splits the primary entity noun into multiple entity nouns, looks up each of the multiple entity nouns in the knowledge database, and obtains the entity names identical to the target entity name.
In addition, the disambiguation of the multiple second entity names by the processor 72 in this embodiment may include the following process: the processor 72 performs natural language processing on the context parameter of the target entity name and on the context parameters of the multiple second entity names to obtain a bag of words and a set of bags of words respectively; the processor 72 computes the similarity between the bag of words and each bag in the set; and the processor 72 obtains the word frequency with the greatest similarity and outputs it.
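The progressive lookup, splitting and bag-of-words comparison described above can be sketched as follows. Several points are assumptions, since the source does not define them: the first, second and third quantities are read as zero, one and more than one matches; splitting is done on whitespace; similarity is cosine similarity over word counts; and the best-matching candidate is returned. The knowledge database `KB` and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge database: entity name -> list of (entity id, context text).
KB = {
    "apple": [("apple_fruit", "fruit tree orchard sweet"),
              ("apple_company", "company phone computer stock")],
    "bank": [("bank_finance", "loan deposit finance")],
}


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_entity(name, context):
    candidates = KB.get(name.lower(), [])
    if len(candidates) == 1:  # assumed "second quantity": exactly one match
        return [candidates[0][0]]
    if len(candidates) > 1:   # assumed "third quantity": several matches
        target = bag_of_words(context)
        best = max(candidates, key=lambda c: cosine(target, bag_of_words(c[1])))
        return [best[0]]
    # Assumed "first quantity": no match, so try to split the name into parts.
    parts = name.split()
    if len(parts) < 2:
        return []
    linked = []
    for part in parts:
        linked.extend(link_entity(part, context))
    return linked


result = link_entity("apple bank", "loan from the phone company")
```

Here "apple bank" has no direct entry, so it is split; "apple" has two candidates and is resolved by context, while "bank" has a single candidate and is returned directly.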
It should be particularly noted that the processor 72 of this embodiment may perform the crawling operation and the corresponding classification in a manner based on natural language processing (NLP). When performing data cleaning on the crawled data information, the processor 72 may: process the data information to obtain data information in ontology format; perform data integration on the ontology-format data information using an NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources; and obtain key information by automatic mining, wherein the key information includes summary information and title information.
It should be noted that, in this embodiment, performing data integration on the ontology-format data information using the NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources may include the following processing: obtaining a Chinese sentence, detecting the overlapping ambiguities present in the sentence using a maximum matching algorithm, and putting them into an overlapping ambiguity set; if the set is empty, the input sentence contains no overlapping ambiguity, and the process returns directly without any further processing; otherwise, traversing all ambiguities in the set, and performing full segmentation of each ambiguity using a recursive method based on depth-first search to obtain the set of all segmentation paths; traversing the path set, computing a selection likelihood value for each path according to a given selection likelihood model so as to model the ambiguous segmentation paths, and recording the value for each path; computing the difference between the two largest selection likelihood values in the path set of the ambiguity; if the difference is within a given threshold, confirming that the ambiguity is a true ambiguity, stopping resolution and handling it by a preset true-ambiguity resolution method; otherwise, judging the ambiguity to be a pseudo-ambiguity and taking the path with the largest selection likelihood value as the resolution result of the ambiguity.
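A minimal sketch of this procedure is given below. It assumes, beyond what the source states, that overlapping ambiguity is detected by comparing forward and backward maximum matching, and that the selection likelihood of a path is the sum of log unigram probabilities; the toy dictionary `PROB`, its probabilities and the threshold are hypothetical.

```python
import math

# Hypothetical unigram probabilities for a toy dictionary.
PROB = {"研究": 0.04, "研究生": 0.01, "生命": 0.03,
        "命": 0.002, "生": 0.003, "研": 0.0005, "究": 0.0005}
MAX_LEN = max(len(w) for w in PROB)


def forward_max_match(text):
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in PROB or size == 1:
                words.append(text[i:i + size])
                i += size
                break
    return words


def backward_max_match(text):
    words, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            if text[j - size:j] in PROB or size == 1:
                words.insert(0, text[j - size:j])
                j -= size
                break
    return words


def all_paths(text):
    """Depth-first enumeration of every segmentation path."""
    if not text:
        return [[]]
    paths = []
    for size in range(1, min(MAX_LEN, len(text)) + 1):
        head = text[:size]
        if head in PROB or size == 1:  # unknown single characters are allowed
            paths.extend([head] + rest for rest in all_paths(text[size:]))
    return paths


def score(path):
    # Assumed likelihood model: sum of log unigram probabilities.
    return sum(math.log(PROB.get(w, 1e-8)) for w in path)


def resolve(text, threshold=0.5):
    fmm, bmm = forward_max_match(text), backward_max_match(text)
    if fmm == bmm:
        return "no ambiguity", fmm
    ranked = sorted(all_paths(text), key=score, reverse=True)
    if len(ranked) > 1 and score(ranked[0]) - score(ranked[1]) < threshold:
        return "true ambiguity", None  # left to a separate true-ambiguity handler
    return "pseudo ambiguity", ranked[0]


outcome = resolve("研究生命")
```

For "研究生命", forward matching yields 研究生 / 命 and backward matching yields 研究 / 生命, so an overlapping ambiguity is detected. With the toy probabilities the best path is far ahead of the runner-up, so the ambiguity is treated as a pseudo-ambiguity and the best path is chosen.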
When the processor 72 performs information completion on the crawled and cleaned data information to obtain the target information, the information completion of this embodiment may include the basic annotation needed for the subject completion operation, for example word segmentation (word-seg), part-of-speech tagging (POS), named entity recognition (NER) and syntactic dependency parsing (dep-parser).
In this embodiment, performing information completion on the crawled and cleaned data information may include: performing NLP annotation on the crawled and cleaned data information; cutting the web page information into sentences according to paragraph marks or punctuation marks; and performing, in order, word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each sentence.
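The sentence cutting step alone can be sketched with a regular expression. The set of sentence-final marks used here is an assumption; the later annotation steps (segmentation, POS tagging, NER, dependency parsing) require an NLP toolkit and are not shown.

```python
import re


def split_sentences(text):
    """Cut text at line breaks and after sentence-final punctuation."""
    pieces = re.findall(r"[^。！？!?\n]+[。！？!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


sentences = split_sentences("爬虫抓取了网页。内容完整吗？\n并不完整！")
```

Each returned sentence keeps its closing punctuation mark, so that downstream annotators see the sentence as it appeared in the page.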
In NER processing, a dictionary may be combined with an entity recognition model: the entity recognition model is trained on annotations from a crowdsourcing platform, and the final result is given in combination with a domain dictionary. This embodiment can restore entity words that were split apart during segmentation according to the entity recognition result. For example, "Space Science and Technology" may be cut into "space" and "science and technology", but the pieces can be recombined into "Space Science and Technology" according to the subsequent NER result.
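The restoration of split entity words can be illustrated as follows, assuming the NER stage yields a set of recognized entity strings. The function name `merge_entity_tokens` and the sample tokens are hypothetical.

```python
def merge_entity_tokens(tokens, entities):
    """Re-join adjacent tokens whose concatenation is a recognized entity."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest span first so that the largest entity wins.
        for j in range(len(tokens), i, -1):
            candidate = "".join(tokens[i:j])
            if j - i > 1 and candidate in entities:
                merged.append(candidate)
                i = j
                break
        else:
            # No multi-token entity starts here; keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["航天", "科技", "发布", "公告"]
merged = merge_entity_tokens(tokens, {"航天科技"})
```

The segmenter's output "航天" and "科技" is recombined into the single entity "航天科技", while the remaining tokens are left untouched.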
It should be pointed out that performing information completion on the crawled and cleaned data information, as described in this embodiment, may further include: performing syntactic dependency parsing based on the dependency information of the sentence context of the data information; and, if the dependency parsing finds that a target sentence lacks a subject or uses a referring word in place of the subject, but the target sentence itself contains a relationship feature of preset strength, completing and filling in the subject of the target sentence.
Furthermore, completing and filling in the subject of the target sentence, as described in the present application, may include the following process:
judging whether the target sentence contains a subject;
if the target sentence contains a subject, judging whether the subject is a pronoun; if the subject is a pronoun, judging whether the previous sentence of the target sentence contains a subject; if the previous sentence contains a subject, judging whether that subject is an entity word; and if that subject is an entity word, completing the subject of the target sentence with it;
if the target sentence does not contain a subject, judging whether the previous sentence of the target sentence contains a subject; if the previous sentence contains a subject, judging whether that subject is an entity word; and if that subject is an entity word, completing the subject of the target sentence with it.
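The decision procedure above can be written compactly. In this sketch the `Sentence` record, the pronoun list and the function name are hypothetical, the subject and its entity flag are assumed to come from the dependency parsing and NER stages, and the precondition on relationship feature strength is omitted.

```python
from dataclasses import dataclass
from typing import Optional

PRONOUNS = {"it", "he", "she", "they", "this", "that"}  # hypothetical pronoun list


@dataclass
class Sentence:
    subject: Optional[str]           # subject found by dependency parsing, if any
    subject_is_entity: bool = False  # True when NER marked the subject as an entity word


def complete_subject(target, previous):
    """Return the subject to use for `target`, filled from `previous` when allowed."""
    needs_fill = target.subject is None or target.subject.lower() in PRONOUNS
    if not needs_fill:
        return target.subject
    if previous is not None and previous.subject is not None and previous.subject_is_entity:
        return previous.subject
    # The previous sentence offers no entity subject; leave the target unchanged.
    return target.subject


previous = Sentence("Ping An Technology", subject_is_entity=True)
filled_pronoun = complete_subject(Sentence("It"), previous)
filled_missing = complete_subject(Sentence(None), previous)
```

Both the pronoun case and the missing-subject case end in the same check on the previous sentence, which is why the two branches of the procedure share one code path here.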
In addition, when performing feature extraction on the target information, the processor 72 may extract word embedding features based on a neural network language model, lexical-level features based on the co-occurrence order of words, and grammatical features based on syntactic structure.
As one embodiment, word embedding refers to representing the semantic information of a word in distributed form as a dense, low-dimensional real-valued vector. The word embedding feature is computed from pre-trained word2vec word vectors: exploiting the translation invariance of the distributed word vector space, the cosine distance between the embedding vectors of the two entity words is computed.
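For reference, the cosine measure between two embedding vectors can be computed as below. The vectors are toy values standing in for word2vec embeddings, and cosine distance is taken as one minus cosine similarity, which is a common convention rather than something the source specifies.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


# Toy 3-dimensional vectors standing in for two entity word embeddings.
entity_a = [1.0, 2.0, 3.0]
entity_b = [2.0, 4.0, 6.0]
distance = cosine_distance(entity_a, entity_b)
```

Vectors that point in the same direction have a distance close to zero regardless of their length, which is why the measure suits embeddings whose magnitudes vary.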
As one embodiment, grammatical features refer to sentence structure features based on dependency analysis and part of speech, such as the dependent word D1 of entity word c1, the dependent word D2 of entity word c2, the part of speech POSD1 of dependent word D1, and the part of speech POSD2 of dependent word D2.
For example, after the sentence sequence and the dependency information of each word within the sentence are obtained, sentence context features are extracted, such as the verb between the two entities, the word preceding the first entity, and the word following the second entity. Model training is then performed on the feature extraction results to construct the relationship classifier.
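The three context features named above can be extracted from a tokenized, POS-tagged sentence as in this sketch. It assumes that verb tags start with "V" (as in Penn Treebank-style tag sets) and that the first entity precedes the second; the function name and the sample sentence are hypothetical.

```python
def context_features(tokens, pos_tags, e1_index, e2_index):
    """Lexical context features for an entity pair; e1 is assumed to precede e2."""
    between = range(e1_index + 1, e2_index)
    verbs = [tokens[i] for i in between if pos_tags[i].startswith("V")]
    return {
        "verb_between": verbs[0] if verbs else None,
        "before_e1": tokens[e1_index - 1] if e1_index > 0 else None,
        "after_e2": tokens[e2_index + 1] if e2_index + 1 < len(tokens) else None,
    }


# Hypothetical tagged sentence with entities at positions 1 and 3.
tokens = ["Yesterday", "CompanyA", "acquired", "CompanyB", "quietly"]
pos_tags = ["NN", "NNP", "VBD", "NNP", "RB"]
features = context_features(tokens, pos_tags, 1, 3)
```

A feature is set to `None` when the entity sits at the sentence boundary or when no verb separates the two entities.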
In the present application, the relationship classifier is preferably a Bayes classifier.
Specifically, the relationship classifier of this embodiment may be constructed by either of the following two methods:
Method one: first collect a small number of entity relationship instances, use the crawler to crawl their related text in a targeted manner, manually label a small number of samples, and pre-train a relation extraction model; then perform model training on the feature extraction results to construct the relationship classifier.
Method two: perform model training directly on the feature extraction results to construct the relationship classifier.
It should be particularly noted that the classifier of the present invention only decides the positive or negative class of one kind of relationship; to decide among multiple relationships, multiple classifiers can be established in parallel.
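One way to realize "one binary Bayes classifier per relationship" is sketched below with a small multinomial naive Bayes model using Laplace smoothing. The class name, the feature strings, the relation names and the training samples are all hypothetical, and the training data for each classifier must contain both positive and negative examples.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial naive Bayes that answers yes or no for a single relation."""

    def fit(self, samples, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter(labels)
        for features, label in zip(samples, labels):
            self.counts[label].update(features)
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, features, label):
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for f in features:
            # Laplace smoothing keeps unseen features from zeroing the score.
            score += math.log((self.counts[label][f] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, features):
        return self._log_score(features, True) > self._log_score(features, False)


# Hypothetical training samples: each is a list of feature strings.
samples = [["verb=acquired"], ["verb=acquired"], ["verb=sued"], ["verb=sued"]]
classifiers = {
    "acquires": BinaryNaiveBayes().fit(samples, [True, True, False, False]),
    "sues": BinaryNaiveBayes().fit(samples, [False, False, True, True]),
}
predicted = [name for name, clf in classifiers.items() if clf.predict(["verb=acquired"])]
```

Each classifier votes independently, so an entity pair may be assigned zero, one or several relationships depending on which classifiers return a positive result.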
It is worth mentioning that, in this embodiment, the processor 72 obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, may include the following: constructing an association graph from the summary information and the title information using the relationship classification model; inferring potential association information from the association graph and expanding the association graph; and building a knowledge database from the data information of the different data sources, the linking relationships between the identical entities, and the relationship entity pair data of the association graph, to serve as the basic knowledge carrier of the knowledge graph.
It is easy to see that the present application can organically combine the crawler system, NLP (natural language processing) and the relationship classifier to accomplish data screening, feature extraction, classification and, finally, construction of the knowledge graph.
In combination with one or more of the above embodiments, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the knowledge graph processing method described with reference to Fig. 1 to Fig. 5 and the embodiments.
It should be understood that the processor of this embodiment may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The computer-readable storage medium may be an internal storage unit of the computer equipment of the foregoing embodiment, such as the hard disk or memory of the computer equipment. It may also be an external storage device of the computer equipment, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) with which the computer equipment is equipped.
The above are only specific embodiments of the present application, but the scope of protection of the application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present application, and these modifications or substitutions shall all fall within the scope of protection of the application. Therefore, the scope of protection of the application shall be subject to the scope of protection of the claims.

Claims (10)

1. A crawler-based knowledge graph processing method, characterized by comprising:
crawling data information from web page information using a crawler system;
performing data cleaning on the crawled data information;
performing information completion on the crawled and cleaned data information to obtain target information;
performing feature extraction on the target information, and constructing a relationship classification model from the extracted feature information;
obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, so as to complete the construction of the knowledge graph.
2. The knowledge graph processing method according to claim 1, characterized in that performing data cleaning on the crawled data information comprises:
processing the data information to obtain data information in ontology format;
performing data integration on the ontology-format data information using a progressive disambiguation algorithm to obtain the linking relationships between identical entities from different data sources;
obtaining key information by automatic mining, wherein the key information includes summary information and title information.
3. The knowledge graph processing method according to claim 2, characterized in that obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, comprises:
constructing an association graph from the summary information and the title information using the relationship classification model;
inferring potential association information from the association graph and expanding the association graph;
building a knowledge database from the data information of the different data sources, the linking relationships between the identical entities, and the relationship entity pair data of the association graph, to serve as the basic knowledge carrier of the knowledge graph.
4. The knowledge graph processing method according to claim 1, characterized in that performing data cleaning on the crawled data information comprises:
processing the data information to obtain data information in ontology format;
performing data integration on the ontology-format data information using a natural language processing (NLP) Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources;
obtaining key information by automatic mining, wherein the key information includes summary information and title information.
5. The knowledge graph processing method according to claim 4, characterized in that performing data integration on the ontology-format data information using the NLP Chinese word segmentation disambiguation algorithm to obtain the linking relationships between identical entities from different data sources comprises:
obtaining a Chinese sentence, detecting the overlapping ambiguities present in the Chinese sentence using a maximum matching algorithm, and putting them into an overlapping ambiguity set; if the set is empty, the input sentence contains no overlapping ambiguity, and the process returns directly without any further processing; otherwise, traversing all ambiguities in the set;
performing full segmentation of each ambiguity using a recursive method based on depth-first search to obtain the set of all segmentation paths, and traversing the path set;
computing a selection likelihood value for each path according to a given selection likelihood model so as to model the ambiguous segmentation paths, recording the selection likelihood value of each path, and computing the difference between the two largest selection likelihood values in the path set of the ambiguity;
if the difference is within a given threshold, confirming that the ambiguity is a true ambiguity, stopping resolution and handling it by a preset true-ambiguity resolution method; otherwise, judging the ambiguity to be a pseudo-ambiguity and taking the path with the largest selection likelihood value as the resolution result of the ambiguity.
6. The knowledge graph processing method according to claim 1, 2 or 4, characterized in that performing information completion on the crawled and cleaned data information comprises:
performing natural language processing (NLP) annotation on the crawled and cleaned data information, cutting the web page information into sentences according to paragraph marks or punctuation marks, and performing, in order, word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each sentence.
7. The knowledge graph processing method according to claim 1, 2 or 4, characterized in that performing information completion on the crawled and cleaned data information comprises:
performing syntactic dependency parsing based on the dependency information of the sentence context of the data information;
if the syntactic dependency parsing finds that a target sentence lacks a subject or uses a referring word in place of the subject, but the target sentence itself contains a relationship feature of preset strength, completing and filling in the subject of the target sentence.
8. A crawler-based knowledge graph processing device, characterized by comprising:
an information crawling module, for crawling data information from web page information using a crawler system;
a data cleaning module, for performing data cleaning on the crawled data information;
an information completion module, for performing information completion on the crawled and cleaned data information to obtain target information;
a model construction module, for performing feature extraction on the target information and constructing a relationship classification model from the extracted feature information;
a graph construction module, for obtaining a series of relationship entity pair data from the data information using the relationship classification model, and storing the relationship entity pairs as the basic knowledge carrier of the knowledge graph, so as to complete the construction of the knowledge graph.
9. A computer equipment, characterized in that the computer equipment comprises a memory and a processor;
the memory is used for storing a computer program;
the processor is used for executing the computer program and, when executing the computer program, implementing the knowledge graph processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the knowledge graph processing method according to any one of claims 1 to 7.
CN201910471975.5A 2019-05-31 2019-05-31 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler Pending CN110347894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910471975.5A CN110347894A (en) 2019-05-31 2019-05-31 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910471975.5A CN110347894A (en) 2019-05-31 2019-05-31 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler

Publications (1)

Publication Number Publication Date
CN110347894A (en) 2019-10-18

Family

ID=68174536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910471975.5A Pending CN110347894A (en) 2019-05-31 2019-05-31 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler

Country Status (1)

Country Link
CN (1) CN110347894A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852104A (en) * 2019-11-04 2020-02-28 合肥工业大学 Family tree identification method and device, storage medium and processor
CN110971754A (en) * 2019-10-28 2020-04-07 深圳绿米联创科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN111198941A (en) * 2020-01-03 2020-05-26 联想(北京)有限公司 Problem discovery method and device, electronic equipment and storage medium
CN111309827A (en) * 2020-03-23 2020-06-19 平安医疗健康管理股份有限公司 Knowledge graph construction method and device, computer system and readable storage medium
CN111428052A (en) * 2020-03-30 2020-07-17 中国科学技术大学 Method for constructing educational concept graph with multiple relations from multi-source data
CN111428047A (en) * 2020-03-19 2020-07-17 东南大学 Knowledge graph construction method and device based on UCL semantic indexing
CN111563170A (en) * 2020-04-30 2020-08-21 北京明略软件系统有限公司 Knowledge graph generation method and device, computer storage medium and terminal
CN111585809A (en) * 2020-04-29 2020-08-25 北京润通丰华科技有限公司 Method for auditing network equipment configuration by utilizing big data statistical analysis
CN111708882A (en) * 2020-05-29 2020-09-25 西安理工大学 Transformer-based Chinese text information missing completion method
CN111797296A (en) * 2020-07-08 2020-10-20 中国人民解放军军事科学院军事医学研究院 Method and system for mining poison-target literature knowledge based on network crawling
CN111966836A (en) * 2020-08-29 2020-11-20 深圳呗佬智能有限公司 Knowledge graph vector representation method and device, computer equipment and storage medium
CN112182235A (en) * 2020-08-29 2021-01-05 深圳呗佬智能有限公司 Method and device for constructing knowledge graph, computer equipment and storage medium
CN112270196A (en) * 2020-12-14 2021-01-26 完美世界(北京)软件科技发展有限公司 Entity relationship identification method and device and electronic equipment
CN112307292A (en) * 2020-10-30 2021-02-02 中国信息安全测评中心 Information processing method and system based on advanced persistent threat attack
CN112328806A (en) * 2020-10-30 2021-02-05 广州市西美信息科技有限公司 Data processing method, system, computer equipment and storage medium
CN112463985A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Government affair map model construction method, device, equipment and computer readable medium
CN112800305A (en) * 2021-01-12 2021-05-14 厦门渊亭信息科技有限公司 Knowledge graph data extraction method and device based on web crawler
CN112836919A (en) * 2020-11-30 2021-05-25 广东电网有限责任公司 Supplier association analysis method and device based on knowledge graph
CN113673956A (en) * 2021-08-23 2021-11-19 湖北三新文化传媒有限公司 Book information completion method, equipment and storage medium
CN114528413A (en) * 2022-02-18 2022-05-24 北京融信数联科技有限公司 Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking
CN116432965A (en) * 2023-04-17 2023-07-14 北京正曦科技有限公司 Post capability analysis method and tree diagram generation method based on knowledge graph
CN116910386A (en) * 2023-09-14 2023-10-20 深圳市智慧城市科技发展集团有限公司 Address completion method, terminal device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224630A (en) * 2015-09-24 2016-01-06 中国科学院自动化研究所 Based on the integrated approach of Ontology on Semantic Web data
US20170068903A1 (en) * 2015-09-04 2017-03-09 Microsoft Technology Licensing, Llc Semantic entity relation detection classifier training
CN108664618A (en) * 2018-05-14 2018-10-16 江苏号百信息服务有限公司 A kind of NLP Chinese word segmentation ambiguity recognition methods based on brand analysis system
CN108874878A (en) * 2018-05-03 2018-11-23 众安信息技术服务有限公司 A kind of building system and method for knowledge mapping
CN109684483A (en) * 2018-12-11 2019-04-26 平安科技(深圳)有限公司 Construction method, device, computer equipment and the storage medium of knowledge mapping

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170068903A1 (en) * 2015-09-04 2017-03-09 Microsoft Technology Licensing, Llc Semantic entity relation detection classifier training
CN105224630A (en) * 2015-09-24 2016-01-06 中国科学院自动化研究所 Based on the integrated approach of Ontology on Semantic Web data
CN108874878A (en) * 2018-05-03 2018-11-23 众安信息技术服务有限公司 A kind of building system and method for knowledge mapping
CN108664618A (en) * 2018-05-14 2018-10-16 江苏号百信息服务有限公司 A kind of NLP Chinese word segmentation ambiguity recognition methods based on brand analysis system
CN109684483A (en) * 2018-12-11 2019-04-26 平安科技(深圳)有限公司 Construction method, device, computer equipment and the storage medium of knowledge mapping

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110971754A (en) * 2019-10-28 2020-04-07 深圳绿米联创科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN110852104A (en) * 2019-11-04 2020-02-28 合肥工业大学 Family tree identification method and device, storage medium and processor
CN111198941A (en) * 2020-01-03 2020-05-26 联想(北京)有限公司 Problem discovery method and device, electronic equipment and storage medium
CN111428047A (en) * 2020-03-19 2020-07-17 东南大学 Knowledge graph construction method and device based on UCL semantic indexing
CN111428047B (en) * 2020-03-19 2023-04-21 东南大学 Knowledge graph construction method and device based on UCL semantic indexing
CN111309827A (en) * 2020-03-23 2020-06-19 平安医疗健康管理股份有限公司 Knowledge graph construction method and device, computer system and readable storage medium
CN111428052A (en) * 2020-03-30 2020-07-17 中国科学技术大学 Method for constructing educational concept graph with multiple relations from multi-source data
CN111428052B (en) * 2020-03-30 2023-06-16 中国科学技术大学 Method for constructing education conceptual diagram with multiple relations from multi-source data
CN111585809A (en) * 2020-04-29 2020-08-25 北京润通丰华科技有限公司 Method for auditing network equipment configuration by utilizing big data statistical analysis
CN111563170A (en) * 2020-04-30 2020-08-21 北京明略软件系统有限公司 Knowledge graph generation method and device, computer storage medium and terminal
CN111708882B (en) * 2020-05-29 2022-09-30 西安理工大学 Transformer-based Chinese text information missing completion method
CN111708882A (en) * 2020-05-29 2020-09-25 西安理工大学 Transformer-based Chinese text information missing completion method
CN111797296B (en) * 2020-07-08 2024-04-09 中国人民解放军军事科学院军事医学研究院 Method and system for mining poison-target literature knowledge based on network crawling
CN111797296A (en) * 2020-07-08 2020-10-20 中国人民解放军军事科学院军事医学研究院 Method and system for mining poison-target literature knowledge based on network crawling
CN111966836A (en) * 2020-08-29 2020-11-20 深圳呗佬智能有限公司 Knowledge graph vector representation method and device, computer equipment and storage medium
CN112182235A (en) * 2020-08-29 2021-01-05 深圳呗佬智能有限公司 Method and device for constructing knowledge graph, computer equipment and storage medium
CN112307292A (en) * 2020-10-30 2021-02-02 中国信息安全测评中心 Information processing method and system based on advanced persistent threat attack
CN112328806A (en) * 2020-10-30 2021-02-05 广州市西美信息科技有限公司 Data processing method, system, computer equipment and storage medium
CN112836919A (en) * 2020-11-30 2021-05-25 广东电网有限责任公司 Supplier association analysis method and device based on knowledge graph
CN112463985A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Government affair map model construction method, device, equipment and computer readable medium
CN112270196A (en) * 2020-12-14 2021-01-26 完美世界(北京)软件科技发展有限公司 Entity relationship identification method and device and electronic equipment
CN112800305A (en) * 2021-01-12 2021-05-14 厦门渊亭信息科技有限公司 Knowledge graph data extraction method and device based on web crawler
CN113673956A (en) * 2021-08-23 2021-11-19 湖北三新文化传媒有限公司 Book information completion method, equipment and storage medium
CN114528413B (en) * 2022-02-18 2022-08-12 北京融信数联科技有限公司 Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking
CN114528413A (en) * 2022-02-18 2022-05-24 北京融信数联科技有限公司 Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking
CN116432965A (en) * 2023-04-17 2023-07-14 北京正曦科技有限公司 Post capability analysis method and tree diagram generation method based on knowledge graph
CN116432965B (en) * 2023-04-17 2024-03-22 北京正曦科技有限公司 Post capability analysis method and tree diagram generation method based on knowledge graph
CN116910386A (en) * 2023-09-14 2023-10-20 深圳市智慧城市科技发展集团有限公司 Address completion method, terminal device and computer-readable storage medium
CN116910386B (en) * 2023-09-14 2024-02-02 深圳市智慧城市科技发展集团有限公司 Address completion method, terminal device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN110347894A (en) Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
CN112199511B (en) Cross-language multi-source vertical domain knowledge graph construction method
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
US9779085B2 (en) Multilingual embeddings for natural language processing
CN112232058B (en) False news identification method and system based on deep learning three-layer semantic extraction framework
CN110633366B (en) Short text classification method, device and storage medium
CN110457479A (en) A kind of judgement document's analysis method based on criminal offence chain
CN106886580A (en) A kind of picture feeling polarities analysis method based on deep learning
CN111597803B (en) Element extraction method and device, electronic equipment and storage medium
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN109933671A (en) Construct method, apparatus, computer equipment and the storage medium of personal knowledge map
CN113312480A (en) Scientific and technological thesis level multi-label classification method and device based on graph convolution network
CN113196277A (en) System for retrieving natural language documents
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN112883182A (en) Question-answer matching method and device based on machine reading
CN115390806A (en) Software design mode recommendation method based on bimodal joint modeling
CN114840685A (en) Emergency plan knowledge graph construction method
CN117574898A (en) Domain knowledge graph updating method and system based on power grid equipment
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN116244277A (en) NLP (non-linear point) identification and knowledge base construction method and system
CN116956869A (en) Text normalization method, device, electronic equipment and storage medium
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
Xu et al. Estimating similarity of rich internet pages using visual information
Moreira et al. Deepex: A robust weak supervision system for knowledge base augmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination