CN108595708A - Knowledge-graph-based method for classifying abnormal-information text - Google Patents
- Publication number
- CN108595708A (application CN201810443976.4A)
- Authority
- CN
- China
- Prior art keywords
- vector
- entity
- text
- knowledge
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a knowledge-graph-based method for classifying abnormal-information text. A domain knowledge graph is built first, and entity recognition and entity linking are constructed on top of it. A text feature representation vector $v_{text}$ and an entity feature representation vector $v_{ent}$ are then built. Finally, the two vectors are concatenated into a new text representation vector $v_{merge}$ that incorporates knowledge features, and classification training is performed on this new vector to obtain the final classification result.
Description
Technical field
The present invention relates to a classification method, and more particularly to a knowledge-graph-based method for classifying abnormal-information text.
Background art
With the development of the internet and the continuous growth of online information, people rely on the network more and more. As information sharing and service promotion on the network keep increasing, the security of web content has become a prominent problem. A highly accurate and highly extensible method for recognizing abnormal information is therefore urgently needed to safeguard network security for society and for individuals.
In the prior art, abnormal-information detection falls into two main classes of methods. The first uses keyword filtering or hand-built models of abnormal information, in which a manually compiled keyword filter list is matched against the text. The second is text classification based on statistics and machine learning, such as support vector machines, the K-nearest-neighbor algorithm, and decision tree algorithms. None of these methods performs satisfactorily: their application scenarios are limited, and they rarely balance accuracy against extensibility. Keyword filtering depends on a manually compiled filter list, so it is mechanical and extends poorly. New words keep emerging on the network, a hand-made keyword list cannot cover all abnormal information, and the approach cannot understand or screen harmful information through semantic analysis. Current content-based information filtering models rely on a large number of hand-written rules, but harmful network content takes many forms that such rules cannot enumerate exhaustively. Data mining techniques and neural network models from machine learning have also been applied to abnormal-information recognition, but they ignore the domain prior knowledge involved in the text. Most methods start only from the surface features of the text, modeling its semantics through word frequencies or word semantic vectors, and can exploit only shallow features such as co-occurrence. They struggle to capture the deeper semantic information contained in the text, such as the commonality and inclusion relations among the things the text mentions, or commonsense prior knowledge that the text leaves unstated.
At present, the knowledge graph has become an important tool for semantic interlinking in big-data analysis and for converting multi-source heterogeneous internet data into descriptions of concrete things in the objective world. Building a knowledge graph provides an effective approach to unified data description, effective integration, association discovery, and knowledge reasoning. Knowledge graph visualization describes knowledge resources and their carriers, and mines, analyzes, constructs, and draws knowledge and the connections within it. With the emergence and development of large-scale knowledge bases such as WordNet and DBPedia, a large amount of knowledge has become openly accessible, and knowledge features obtained from knowledge bases are increasingly used in natural language processing tasks. Following the success of neural-network-based language models in vectorizing text features through word embedding, comparable results have been achieved in representing knowledge features, for example the series of knowledge-base entity and relation embedding methods from TransE to TransR. However, these prior-art knowledge representation learning methods are mostly used for problems internal to knowledge bases, such as relation inference and link prediction, and mostly model knowledge information in isolation. They have not been applied to recognizing abnormal-information text.
Summary of the invention
The present invention proposes a knowledge-graph-based method for classifying abnormal-information text. A domain knowledge graph is built first, and entity recognition and entity linking models are constructed on top of it. A text feature representation vector $v_{text}$ and an entity feature representation vector $v_{ent}$ are then built. Finally, the two vectors are concatenated into a new text representation vector $v_{merge}$ that incorporates knowledge features, and classification training is performed on this new vector to obtain the final classification result.
The invention detects abnormal information in short text by combining knowledge-graph-based entity recognition and linking with short-text classification over joint text and knowledge-graph features. It introduces an external knowledge base to assist deep semantic mining and feature representation of the text. The rich extended information inside the knowledge base, such as entity relations, categories, and attributes, supports the extraction of deep semantic relations from the text. Knowledge-graph-based entity disambiguation and linking resolves word ambiguity, and the knowledge base's mappings between full names, abbreviations, and aliases handle referring expressions in the text. Finally, the knowledge-base information of the linked entities is added to model training as auxiliary features, which improves the reliability of abnormal-text classification.
Brief description of the drawings
Fig. 1 is a diagram of the domain entity relation system of one embodiment of the invention;
Fig. 2 is a flowchart of attribute fusion and disambiguation in one embodiment of the invention;
Fig. 3 is the architecture of the entity recognition model of the invention;
Fig. 4 is a flowchart of classification based on joint text and knowledge-graph features according to the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the invention and do not limit it. In addition, the technical features of the embodiments described below may be combined with one another as long as they do not conflict.
Fig. 1 shows an embodiment of abnormal-information text detection for the political and tax-related domains, which requires building a domain knowledge graph and establishing a domain entity library. The graph is built by extracting political and economic data from news portals, microblogs, WeChat official accounts, and forums, supplemented with semi-structured data from the main Chinese encyclopedia websites on the internet (Baidu Baike, Hudong Baike, and Chinese Wikipedia). Network data is multi-sourced: it comes from channels such as news portals, microblogs, WeChat official accounts, and forums, and each platform has its own style of expression and content structure, so the multi-source data must be processed and fused. First, a rule-based crawling tool extracts structured data from web pages, and simple rules are designed to clean the raw data (filtering special characters, converting between simplified and traditional Chinese, and so on) and to normalize it (unifying the formats of times, dates, and the like). Next, the entries related to the crawled political and economic news and microblog posts are obtained from the Chinese encyclopedia websites and used as entities. Starting from the original entry tags in the encyclopedia data, a simple K-means clustering algorithm assigns a category to each entity, forming a classification system. The descriptions of the different aspects of an entry in the encyclopedia make up the entity's attributes, and the hyperlinks in an entry's description are used to establish associations between entities, connecting isolated entities into a graph. At this point a preliminary knowledge graph has been obtained from multi-source heterogeneous data, providing the basis for subsequent work.
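The cleaning and normalization rules mentioned above are not spelled out in the source. The following is a minimal sketch under stated assumptions: the helper names `clean_text` and `normalize_date`, the set of allowed characters, and `YYYY-MM-DD` as the unified date format are all hypothetical. Conversion between simplified and traditional Chinese is omitted because it needs a conversion table or an external library.

```python
import re

# Assumed rule: keep CJK characters, ASCII letters/digits, and basic punctuation.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。！？、：；,.!?:;\-/ ]")
# Assumed date patterns: 2018年5月10日, 2018/5/10, 2018-5-10, 2018.5.10
_DATE = re.compile(r"(\d{4})\s*[年/\-.]\s*(\d{1,2})\s*[月/\-.]\s*(\d{1,2})\s*日?")

def normalize_date(text: str) -> str:
    """Rewrite every recognized date as YYYY-MM-DD."""
    def repl(match):
        year, month, day = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return _DATE.sub(repl, text)

def clean_text(text: str) -> str:
    """Normalize dates, collapse whitespace, and drop disallowed characters."""
    text = normalize_date(text)
    text = re.sub(r"\s+", " ", text)      # newlines and tabs become one space
    text = _DISALLOWED.sub("", text)      # filter special characters
    return re.sub(r" {2,}", " ", text).strip()
```

A real pipeline would likely need source-specific rules per platform; this sketch only shows the shape of a rule-based pass.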
Because data from different sources differs in form of expression and in quality, knowledge fusion is required. Knowledge fusion comprises two main tasks: entity alignment, and attribute fusion and disambiguation. Entity alignment uses three feature dimensions (entity name, entity category, and entity description) and a semantic similarity algorithm to find the list of entities that should be aligned. For each such entity, the attribute information of all entity entries to be fused is organized into a set and passed through the attribute fusion and disambiguation flow shown in Fig. 2, and the resulting complete entity data is stored in the database. This essentially completes the construction of the knowledge base. The storage medium is the neo4j graph database, and the completed knowledge base is queried by calling the API access interface that neo4j provides.
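The source does not name the semantic similarity algorithm used for entity alignment. As an illustrative sketch only, the three dimensions can be combined into one weighted score. The `Entity` record, the weights, the threshold, and the use of character-level `difflib` and Jaccard similarity in place of a true semantic measure are all assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

@dataclass(frozen=True)
class Entity:
    name: str
    category: str
    description: str

def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity, a crude stand-in for semantic similarity."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def similarity(x: Entity, y: Entity, weights=(0.5, 0.2, 0.3)) -> float:
    """Weighted score over name, category, and description (weights are assumed)."""
    name_sim = SequenceMatcher(None, x.name, y.name).ratio()
    cat_sim = 1.0 if x.category == y.category else 0.0
    desc_sim = jaccard(x.description, y.description)
    w_name, w_cat, w_desc = weights
    return w_name * name_sim + w_cat * cat_sim + w_desc * desc_sim

def align(entities, threshold=0.8):
    """Return the pairs of entities whose score reaches the assumed threshold."""
    return [(x, y) for x, y in combinations(entities, 2)
            if similarity(x, y) >= threshold]
```

Each returned pair marks two entries that would then go through attribute fusion and disambiguation as one entity.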
Extracting the knowledge-graph information associated with a text requires an entity recognition method to label the entity nouns or phrases involved in the text and to link each one to its corresponding entity in the knowledge base. The main task of entity recognition is to identify the named entities mentioned in natural-language text, such as person names, place names, and organization names, and optionally to classify them coarsely. Almost all current approaches treat this as a sequence labeling problem similar to word segmentation. Each word in a sentence is tagged with the "BIO" scheme, where "B" marks the beginning of an entity name, "I" marks the middle or end of an entity name, and "O" marks a word outside any entity name. A machine learning model, such as a conditional random field (CRF) or a recurrent neural network, is then trained on the labeled data set. The invention uses the combined BiLSTM+CRF model shown in Fig. 3. The text is first encoded with a long short-term memory (LSTM) network, with the word vector of each word in the text as the LSTM input. The output is then, for each word, the probability of each label; this output serves as the input of the CRF, whose transition probability matrix is randomly initialized, and an inference algorithm finds the label sequence with the highest probability.
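Decoding the best label sequence from per-word label scores and a transition matrix is commonly done with the Viterbi algorithm. The source only speaks of an inference algorithm, so Viterbi is an assumption here. In the sketch below, hand-written scores stand in for the BiLSTM outputs and the learned CRF transition matrix, and `extract_entities` is a hypothetical helper that reads entity spans off the BIO tags.

```python
LABELS = ["B", "I", "O"]

def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at position t (stand-in for BiLSTM output).
    transitions[i][j]: score of moving from label i to label j (CRF matrix).
    """
    n = len(LABELS)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return [LABELS[j] for j in reversed(path)]

def extract_entities(words, tags):
    """Collect the word spans covered by a B tag followed by I tags."""
    entities, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:
                entities.append(current)
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            if current:
                entities.append(current)
            current = []
    if current:
        entities.append(current)
    return [" ".join(e) for e in entities]
```

A large negative transition score from "O" to "I" is one way to keep the decoder from producing an "I" tag that does not follow a "B" or "I".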
The main task of entity linking is to find the entity in the knowledge base that corresponds to a named-entity word, which involves disambiguating entities that share a name. For example, in "Zhang San is a great leader", it must be decided whether "Zhang San" links to the entity for the leader Zhang San or to the entity for the biography "Zhang San". When a standard data set is available, the invention builds a probability model through statistical learning to perform disambiguation, identifies the entity with the highest probability, and returns the entity id. When a fully labeled data set is lacking, all candidate entities are first enumerated through a knowledge-base search and then ranked and filtered by rules, using indicators such as entity popularity, the relevance of the entity category to the original text, or the similarity between the entity information and the original text.
Traditional text representation vectorizes text with one-hot vectors or sequences of TF-IDF values. The words obtained by segmenting all texts in the data set are first counted, and a vocabulary is obtained after filtering out low-frequency words and stop words. If the vocabulary size is n, the text is represented by a vector $v_d \in \mathbb{R}^n$ whose i-th component indicates whether the i-th vocabulary word occurs in the text (1 if it occurs, 0 if it does not), or holds that word's TF-IDF value for better results. This representation, however, suffers from excessive dimensionality, sparse data, and a weak ability to encode semantic similarity.
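To make the traditional representation concrete, the sketch below builds a vocabulary and produces both the 0/1 vector and a TF-IDF vector for already segmented texts. The function names are hypothetical, and the TF-IDF variant shown (term frequency times `log(N / df)`) is one common choice, assumed here because the source does not specify one.

```python
import math
from collections import Counter

def build_vocab(docs, min_count=1, stop_words=frozenset()):
    """Vocabulary of words that are frequent enough and are not stop words."""
    counts = Counter(w for doc in docs for w in doc)
    return sorted(w for w, c in counts.items()
                  if c >= min_count and w not in stop_words)

def one_hot(doc, vocab):
    """v_d in R^n: component i is 1 if vocabulary word i occurs in doc, else 0."""
    present = set(doc)
    return [1 if w in present else 0 for w in vocab]

def tf_idf(doc, docs, vocab):
    """Same layout as one_hot, with tf * idf values instead of 0/1."""
    tf = Counter(doc)
    n_docs = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d)          # document frequency
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(tf[w] / len(doc) * idf if doc else 0.0)
    return vec
```

Even in this toy form the vector length equals the vocabulary size, which is the dimensionality and sparsity problem the paragraph above describes.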
To solve these problems, the invention uses word embedding, which represents each individual word as a vector and turns the similarity between words into a measure of cosine distance between vectors. The word vectors are trained with word2vec on news data from the most recent year and on Chinese Wikipedia data.
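The similarity measure named here can be written out directly. In the sketch below, toy 3-dimensional vectors with made-up values stand in for trained word2vec vectors; cosine distance is simply one minus the cosine similarity computed by the function.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy vectors with illustrative values, not trained embeddings.
vectors = {
    "tax": [0.9, 0.1, 0.0],
    "levy": [0.8, 0.2, 0.1],
    "football": [0.0, 0.1, 0.9],
}
```

With these values, "tax" scores closer to "levy" than to "football", which is the behavior a trained embedding is expected to show for related words.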
A text is treated as a sequence of words $(w_1, w_2, w_3, \ldots)$ in order of occurrence. If the word2vec vector of word $w_i$ is $v_{w_i}$, with vector length k, concatenating the word vectors of all words in the text gives the text feature representation vector $v_{text} \in \mathbb{R}^{s \times k}$, where s is the number of words.
The main goal of a knowledge graph is to describe the entities and concepts that exist in the real world and the associations among them. Through "entity-relation-entity" triples, a knowledge graph maps real-world entities and concepts into a semantic network. This effectively addresses the low information-value density of open internet big data and is especially suited to entity-related and semantics-related information retrieval tasks. Entity relations, however, are hard to apply directly in text classification algorithms, so the invention represents the semantic information of the knowledge base as dense, low-dimensional, real-valued vectors by performing representation learning on the entities and relations of the knowledge graph.
The invention represents entities and relations as vectors with the TransE model. In each triple instance (head, relation, tail), the relation is regarded as a translation, by vector addition, from the entity head to the entity tail. The vectors of head, relation, and tail are adjusted continually so that (h + r) comes as close as possible to t, that is, h + r ≈ t.
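The translation idea can be illustrated with a distance function and a single update step. This is a simplified sketch, not the full TransE training procedure: it assumes an L2 distance, shows the margin-based loss between a true triple and a corrupted one, and applies plain gradient steps to one positive triple only, leaving out negative sampling and vector normalization. The function names and the learning rate are hypothetical.

```python
import math

def distance(h, r, t):
    """L2 distance ||h + r - t||; small when the triple fits h + r ≈ t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(pos, neg, margin=1.0):
    """max(0, margin + d(true triple) - d(corrupted triple))."""
    return max(0.0, margin + distance(*pos) - distance(*neg))

def sgd_step(h, r, t, lr=0.1):
    """One gradient step on 0.5 * ||h + r - t||^2 for a true triple."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    h = [hi - lr * d for hi, d in zip(h, diff)]   # gradient w.r.t. h is +diff
    r = [ri - lr * d for ri, d in zip(r, diff)]   # gradient w.r.t. r is +diff
    t = [ti + lr * d for ti, d in zip(t, diff)]   # gradient w.r.t. t is -diff
    return h, r, t
```

Repeating `sgd_step` shrinks the distance of the triple toward zero, which is the "continual adjustment" of head, relation, and tail described above.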
After the TransE algorithm has learned representations for the entities and relations of the knowledge graph, each entity and relation is represented by a k-dimensional vector $v_{e_i}$. The vector representation of the knowledge features then follows. Using the knowledge-graph-based entity recognition and linking method, the text $(w_1, w_2, w_3, \ldots)$ is linked to the entities $(e_1, e_2, \ldots, e_t)$, and concatenating the entity vectors of all these entities gives the entity feature representation vector $v_{ent} \in \mathbb{R}^{t \times k}$.
Concatenating the text feature representation vector with the entity feature representation vector gives the text representation that incorporates knowledge features:
$$v_{merge} = v_{text} \oplus v_{ent}$$
where $\oplus$ denotes concatenation.
The new text representation vector $v_{merge}$ replaces the original vector $v_{text}$, which is based on plain text features alone, in training the target model. This expands the features of the target text, adds support for deep semantic information, and improves the quality and completeness of the model.
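Since the text matrix has s rows of length k and the entity matrix has t rows of length k, the concatenation can be read as stacking the rows. That reading, which yields an (s + t) × k matrix, is an assumption; the helper name `concat_rows` is hypothetical.

```python
def concat_rows(v_text, v_ent):
    """Stack an s x k matrix and a t x k matrix into an (s + t) x k matrix."""
    rows = [list(row) for row in v_text] + [list(row) for row in v_ent]
    if rows:
        k = len(rows[0])
        if any(len(row) != k for row in rows):
            raise ValueError("all vectors must share the same length k")
    return rows
```

A text with no linked entities (t = 0) simply keeps its original rows, so the classifier input stays well defined.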
Fig. 4 shows the classification model of the invention, which is trained on $v_{merge}$. A CNN deep learning model is used for classification training: the $v_{merge}$ vectors are assembled into the representation matrix of the text and input to the CNN layer, and the result is finally input to a fully connected network classifier for model training to obtain the final classification result. This ensures that the model captures the deep semantic information of the text and improves classification quality and reliability.
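The source gives no kernel sizes, filter counts, or pooling scheme for the CNN. The forward pass below is a sketch that assumes the layout commonly used for text CNNs: kernels that span the full vector length k, ReLU, max-over-time pooling, and a fully connected softmax layer. All weights are untrained placeholders and the function names are hypothetical.

```python
import math

def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide an h x k kernel over the rows, apply ReLU, and keep the maximum."""
    h, k = len(kernel), len(kernel[0])
    best = 0.0   # starting at 0 folds ReLU in: negative activations never win
    for start in range(len(matrix) - h + 1):
        act = bias + sum(kernel[i][j] * matrix[start + i][j]
                         for i in range(h) for j in range(k))
        best = max(best, act)
    return best

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(matrix, kernels, dense_w, dense_b):
    """CNN layer -> pooled feature vector -> fully connected softmax layer."""
    features = [conv_max_pool(matrix, kernel) for kernel in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)
```

Training would fit the kernels and the dense weights by backpropagation, which in practice is left to a deep learning framework rather than written by hand.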
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments of the invention.
Claims (6)
1. A knowledge-graph-based method for classifying abnormal-information text, characterized in that a domain knowledge graph is built first; entity recognition and entity linking based on the domain knowledge graph are constructed; a text feature representation vector $v_{text}$ and an entity feature representation vector $v_{ent}$ are then built; finally, the text feature representation vector and the entity feature representation vector are concatenated into a new text representation vector $v_{merge}$ that incorporates knowledge features, and classification training is performed on the new text representation vector to obtain the final classification result.
2. The method of claim 1, characterized in that building the domain knowledge graph comprises processing and fusing multi-source data extracted from different platforms, establishing entity categories and associations between entities, and then performing knowledge fusion; the knowledge fusion comprises two steps, namely entity alignment, and attribute fusion and disambiguation; the entity alignment step uses three feature dimensions, namely entity name, entity category, and entity description, to find the list of entities that should be aligned by means of a semantic similarity algorithm, and organizes the attribute information of all entity entries to be fused for the same entity into a set.
3. The method of claim 1, characterized in that the entity recognition based on the knowledge graph is constructed using a combined BiLSTM+CRF model: the text is first encoded with the LSTM algorithm, with the word vector of each word in the text as the LSTM input; the output, which is the probability of each label for each word, serves as the input of the CRF; the transition probability matrix is randomly initialized, and the label sequence with the highest probability is found by an inference algorithm; the entity linking based on the knowledge graph is constructed by building a probability model on a standard data set through statistical learning, performing disambiguation, identifying the entity with the highest probability, and returning the entity id.
4. The method of claim 1, characterized in that the text feature representation vector is built using word embedding, which represents each individual word as a vector, converts the similarity between words into a measure of cosine distance between vectors, learns text features through a neural network, and at the same time reduces the word vector dimension; if the word2vec vector of word $w_i$ is $v_{w_i}$, with vector length k, the word vectors of all words of the text are concatenated to obtain the text feature representation vector $v_{text} = v_{w_1} \oplus v_{w_2} \oplus \cdots \oplus v_{w_s}$, where s is the number of words and $v_{text} \in \mathbb{R}^{s \times k}$.
5. The method of claim 4, characterized in that the entity feature representation vector is built by performing representation learning on the entities and relations of the knowledge graph with the TransE algorithm, each entity and relation being represented by a k-dimensional vector $v_{e_i}$; the text $(w_1, w_2, w_3, \ldots)$ is linked to the entities $(e_1, e_2, \ldots, e_t)$, and the entity vectors of all these entities are concatenated to obtain the entity feature representation vector $v_{ent} = v_{e_1} \oplus v_{e_2} \oplus \cdots \oplus v_{e_t}$, where t is the number of entities and $v_{ent} \in \mathbb{R}^{t \times k}$.
6. The method of claim 5, characterized in that the text feature representation vector and the entity feature representation vector are concatenated into the new text representation vector $v_{merge}$ that incorporates knowledge features as $v_{merge} = v_{text} \oplus v_{ent}$; classification training is then performed with a CNN deep learning model, in which the $v_{merge}$ vectors are assembled into the representation matrix of the text and input to the CNN layer, and the result is finally input to a fully connected network classifier for model training to obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810443976.4A CN108595708A (en) | 2018-05-10 | 2018-05-10 | Knowledge-graph-based method for classifying abnormal-information text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108595708A (en) | 2018-09-28 |
Family
ID=63637073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810443976.4A Pending CN108595708A (en) | Knowledge-graph-based method for classifying abnormal-information text | 2018-05-10 | 2018-05-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595708A (en) |
Cited By (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543041A (en) * | 2018-11-30 | 2019-03-29 | 安徽听见科技有限公司 | A kind of generation method and device of language model scores |
CN109582802A (en) * | 2018-11-30 | 2019-04-05 | 国信优易数据有限公司 | A kind of entity embedding grammar, device, medium and equipment |
CN109614615A (en) * | 2018-12-04 | 2019-04-12 | 联想(北京)有限公司 | Methodology for Entities Matching, device and electronic equipment |
CN109657238A (en) * | 2018-12-10 | 2019-04-19 | 宁波深擎信息科技有限公司 | Context identification complementing method, system, terminal and the medium of knowledge based map |
CN109684394A (en) * | 2018-12-13 | 2019-04-26 | 北京百度网讯科技有限公司 | Document creation method, device, equipment and storage medium |
CN109726253A (en) * | 2018-12-21 | 2019-05-07 | 义橙网络科技(上海)有限公司 | Construction method, device, equipment and the medium of talent's map and talent's portrait |
CN109977419A (en) * | 2019-04-09 | 2019-07-05 | 福建奇点时空数字科技有限公司 | A kind of knowledge mapping building system |
CN110046260A (en) * | 2019-04-16 | 2019-07-23 | 广州大学 | A kind of darknet topic discovery method and system of knowledge based map |
CN110069779A (en) * | 2019-04-18 | 2019-07-30 | 腾讯科技(深圳)有限公司 | The symptom entity recognition method and relevant apparatus of medical text |
CN110188147A (en) * | 2019-05-22 | 2019-08-30 | 厦门无常师教育科技有限公司 | The document entity relationship of knowledge based map finds method and system |
CN110245228A (en) * | 2019-04-29 | 2019-09-17 | 阿里巴巴集团控股有限公司 | The method and apparatus for determining text categories |
CN110263324A (en) * | 2019-05-16 | 2019-09-20 | 华为技术有限公司 | Text handling method, model training method and device |
CN110263178A (en) * | 2019-06-03 | 2019-09-20 | 南京航空航天大学 | A kind of mapping method of WordNet to Neo4J, Semantic detection method and semantic computation expansion interface generation method |
CN110275928A (en) * | 2019-06-24 | 2019-09-24 | 复旦大学 | Iterative entity relation extraction method |
CN110297908A (en) * | 2019-07-01 | 2019-10-01 | 中国医学科学院医学信息研究所 | Diagnosis and treatment program prediction method and device |
CN110390324A (en) * | 2019-07-27 | 2019-10-29 | 苏州过来人科技有限公司 | A kind of resume printed page analysis algorithm merging vision and text feature |
CN110399261A (en) * | 2019-06-13 | 2019-11-01 | 中国科学院信息工程研究所 | A kind of system alarm clustering method based on co-occurrence figure |
CN110490251A (en) * | 2019-03-08 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Prediction disaggregated model acquisition methods and device, storage medium based on artificial intelligence |
CN110516073A (en) * | 2019-08-30 | 2019-11-29 | 北京百度网讯科技有限公司 | A kind of file classification method, device, equipment and medium |
CN110633366A (en) * | 2019-07-31 | 2019-12-31 | 国家计算机网络与信息安全管理中心 | Short text classification method, device and storage medium |
CN110750647A (en) * | 2019-10-17 | 2020-02-04 | 北京华宇信息技术有限公司 | Construction method of ELP model of multi-source heterogeneous information data |
CN110825882A (en) * | 2019-10-09 | 2020-02-21 | 西安交通大学 | Knowledge graph-based information system management method |
CN110910243A (en) * | 2019-09-26 | 2020-03-24 | 山东佳联电子商务有限公司 | Property right transaction method based on reconfigurable big data knowledge map technology |
CN110955764A (en) * | 2019-11-19 | 2020-04-03 | 百度在线网络技术(北京)有限公司 | Scene knowledge graph generation method, man-machine conversation method and related equipment |
CN110955780A (en) * | 2019-10-12 | 2020-04-03 | 中国人民解放军国防科技大学 | Entity alignment method for knowledge graph |
CN110990533A (en) * | 2019-11-29 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Method and device for determining standard text corresponding to query text |
CN111028952A (en) * | 2019-11-27 | 2020-04-17 | 云知声智能科技股份有限公司 | Method and device for constructing Chinese medical implication knowledge graph |
CN111144574A (en) * | 2018-11-06 | 2020-05-12 | 北京嘀嘀无限科技发展有限公司 | Artificial intelligence system and method for training learner model using instructor model |
CN111191047A (en) * | 2019-12-31 | 2020-05-22 | 武汉理工大学 | Knowledge graph construction method for human-computer cooperation disassembly task |
CN111191031A (en) * | 2019-12-24 | 2020-05-22 | 上海大学 | Entity relation classification method of unstructured text based on WordNet and IDF |
CN111209738A (en) * | 2019-12-31 | 2020-05-29 | 浙江大学 | Multi-task named entity recognition method combining text classification |
CN111414393A (en) * | 2020-03-26 | 2020-07-14 | 湖南科创信息技术股份有限公司 | Semantic similar case retrieval method and equipment based on medical knowledge graph |
CN111563166A (en) * | 2020-05-28 | 2020-08-21 | 浙江学海教育科技有限公司 | Pre-training model method for mathematical problem classification |
CN111737489A (en) * | 2020-06-17 | 2020-10-02 | 广联达科技股份有限公司 | Building information retrieval method, device, equipment and readable storage medium |
CN111985242A (en) * | 2019-05-22 | 2020-11-24 | 中国信息安全测评中心 | Text labeling method and device |
CN112084331A (en) * | 2020-08-27 | 2020-12-15 | 清华大学 | Text processing method, text processing device, model training method, model training device, computer equipment and storage medium |
CN112182249A (en) * | 2020-10-23 | 2021-01-05 | 四川大学 | Automatic classification method and device for aviation safety report |
CN112417448A (en) * | 2020-11-15 | 2021-02-26 | 复旦大学 | Anti-aging enhancement method for malicious software detection model based on API (application programming interface) relational graph |
CN112559737A (en) * | 2020-11-20 | 2021-03-26 | 和美(深圳)信息技术股份有限公司 | Node classification method and system of knowledge graph |
CN112597298A (en) * | 2020-10-14 | 2021-04-02 | 上海勃池信息技术有限公司 | Deep learning text classification method fusing knowledge maps |
CN112632994A (en) * | 2020-12-03 | 2021-04-09 | 大箴(杭州)科技有限公司 | Method, device and equipment for determining basic attribute characteristics based on text information |
CN112801706A (en) * | 2021-02-04 | 2021-05-14 | 北京云上曲率科技有限公司 | Game user behavior data mining method and system |
CN112906361A (en) * | 2021-02-09 | 2021-06-04 | 上海明略人工智能(集团)有限公司 | Text data labeling method and device, electronic equipment and storage medium |
CN113094715A (en) * | 2021-04-20 | 2021-07-09 | 国家计算机网络与信息安全管理中心 | Network security dynamic early warning system based on knowledge graph |
CN113254615A (en) * | 2021-05-31 | 2021-08-13 | 中国移动通信集团陕西有限公司 | Text processing method, device, equipment and medium |
CN113449104A (en) * | 2021-06-22 | 2021-09-28 | 上海明略人工智能(集团)有限公司 | Label enhancement model construction method and system, electronic equipment and storage medium |
CN113590802A (en) * | 2021-09-27 | 2021-11-02 | 北京明略软件系统有限公司 | Session content abnormity detection method and device, electronic equipment and storage medium |
CN113641766A (en) * | 2021-07-15 | 2021-11-12 | 北京三快在线科技有限公司 | Relationship identification method and device, storage medium and electronic equipment |
CN113722509A (en) * | 2021-09-07 | 2021-11-30 | 中国人民解放军32801部队 | Knowledge graph data fusion method based on entity attribute similarity |
WO2021259002A1 (en) * | 2020-06-23 | 2021-12-30 | 平安科技(深圳)有限公司 | Decision tree-based method and apparatus for outputting abnormal data sources, and computer device |
CN113963357A (en) * | 2021-12-16 | 2022-01-21 | 北京大学 | Knowledge graph-based sensitive text detection method and system |
CN114064901A (en) * | 2021-11-26 | 2022-02-18 | 重庆邮电大学 | Book comment text classification method based on knowledge graph word meaning disambiguation |
CN114548103A (en) * | 2020-11-25 | 2022-05-27 | 马上消费金融股份有限公司 | Training method of named entity recognition model and recognition method of named entity |
CN117040926A (en) * | 2023-10-08 | 2023-11-10 | 北京网藤科技有限公司 | Industrial control network security feature analysis method and system applying knowledge graph |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526798A (en) * | 2017-08-18 | 2017-12-29 | 武汉红茶数据技术有限公司 | Neural network-based joint entity recognition and normalization method and model |
CN107644014A (en) * | 2017-09-25 | 2018-01-30 | 南京安链数据科技有限公司 | Named entity recognition method based on bidirectional LSTM and CRF |
CN107992480A (en) * | 2017-12-25 | 2018-05-04 | 东软集团股份有限公司 | Method and apparatus for implementing entity disambiguation, storage medium, and program product |
- 2018-05-10: CN application CN201810443976.4A (patent/CN108595708A/en), status: Pending
Non-Patent Citations (2)
Title |
---|
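JIN WANG et al.: "Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification", Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence * |
XU Zenglin (徐增林) et al.: "Review on Knowledge Graph Techniques" (知识图谱技术综述), Journal of University of Electronic Science and Technology of China (《电子科技大学学报》) * |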
Cited By (82)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10872300B2 (en) | 2018-11-06 | 2020-12-22 | Beijing Didi Infinity Technology And Development Co., Ltd. | Artificial intelligent systems and methods for using a structurally simpler learner model to mimic behaviors of a structurally more complicated reference model |
CN111144574A (en) * | 2018-11-06 | 2020-05-12 | 北京嘀嘀无限科技发展有限公司 | Artificial intelligence system and method for training learner model using instructor model |
WO2020093356A1 (en) * | 2018-11-06 | 2020-05-14 | Beijing Didi Infinity Technology And Development Co., Ltd. | Artificial intelligent systems and methods for using structurally simpler learner model to mimic behaviors of structurally more complicated reference model |
CN111144574B (en) * | 2018-11-06 | 2023-03-24 | 北京嘀嘀无限科技发展有限公司 | Artificial intelligence system and method for training learner model using instructor model |
CN109582802A (en) * | 2018-11-30 | 2019-04-05 | 国信优易数据有限公司 | Entity embedding method, device, medium and equipment |
CN109543041A (en) * | 2018-11-30 | 2019-03-29 | 安徽听见科技有限公司 | Method and device for generating language model scores |
CN109582802B (en) * | 2018-11-30 | 2020-11-03 | 国信优易数据股份有限公司 | Entity embedding method, device, medium and equipment |
CN109614615A (en) * | 2018-12-04 | 2019-04-12 | 联想(北京)有限公司 | Entity matching method and device and electronic equipment |
CN109614615B (en) * | 2018-12-04 | 2022-04-22 | 联想(北京)有限公司 | Entity matching method and device and electronic equipment |
CN109657238A (en) * | 2018-12-10 | 2019-04-19 | 宁波深擎信息科技有限公司 | Knowledge graph-based context identification completion method, system, terminal and medium |
CN109657238B (en) * | 2018-12-10 | 2023-10-13 | 宁波深擎信息科技有限公司 | Knowledge graph-based context identification completion method, system, terminal and medium |
CN109684394A (en) * | 2018-12-13 | 2019-04-26 | 北京百度网讯科技有限公司 | Document creation method, device, equipment and storage medium |
CN109726253A (en) * | 2018-12-21 | 2019-05-07 | 义橙网络科技(上海)有限公司 | Method, device, equipment and medium for constructing a talent graph and talent portrait |
CN110490251B (en) * | 2019-03-08 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based prediction classification model obtaining method and device and storage medium |
CN110490251A (en) * | 2019-03-08 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based prediction classification model obtaining method and device and storage medium |
CN109977419A (en) * | 2019-04-09 | 2019-07-05 | 福建奇点时空数字科技有限公司 | Knowledge graph construction system |
CN110046260A (en) * | 2019-04-16 | 2019-07-23 | 广州大学 | Knowledge graph-based darknet topic discovery method and system |
CN110069779B (en) * | 2019-04-18 | 2023-01-10 | 腾讯科技(深圳)有限公司 | Symptom entity identification method of medical text and related device |
CN110069779A (en) * | 2019-04-18 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Symptom entity identification method of medical text and related device |
CN110245228A (en) * | 2019-04-29 | 2019-09-17 | 阿里巴巴集团控股有限公司 | Method and apparatus for determining text categories |
US20220147715A1 (en) * | 2019-05-16 | 2022-05-12 | Huawei Technologies Co., Ltd. | Text processing method, model training method, and apparatus |
CN110263324B (en) * | 2019-05-16 | 2021-02-12 | 华为技术有限公司 | Text processing method, model training method and device |
CN110263324A (en) * | 2019-05-16 | 2019-09-20 | 华为技术有限公司 | Text processing method, model training method and device |
CN110188147A (en) * | 2019-05-22 | 2019-08-30 | 厦门无常师教育科技有限公司 | Knowledge graph-based document entity relationship discovery method and system |
CN111985242A (en) * | 2019-05-22 | 2020-11-24 | 中国信息安全测评中心 | Text labeling method and device |
CN110263178B (en) * | 2019-06-03 | 2023-05-12 | 南京航空航天大学 | WordNet-to-Neo4J mapping method, semantic detection method and semantic calculation expansion interface generation method |
CN110263178A (en) * | 2019-06-03 | 2019-09-20 | 南京航空航天大学 | WordNet-to-Neo4J mapping method, semantic detection method and semantic calculation expansion interface generation method |
CN110399261A (en) * | 2019-06-13 | 2019-11-01 | 中国科学院信息工程研究所 | System alarm clustering method based on co-occurrence graph |
CN110275928B (en) * | 2019-06-24 | 2022-11-22 | 复旦大学 | Iterative entity relation extraction method |
CN110275928A (en) * | 2019-06-24 | 2019-09-24 | 复旦大学 | Iterative entity relation extraction method |
CN110297908A (en) * | 2019-07-01 | 2019-10-01 | 中国医学科学院医学信息研究所 | Diagnosis and treatment program prediction method and device |
CN110390324A (en) * | 2019-07-27 | 2019-10-29 | 苏州过来人科技有限公司 | Resume layout analysis algorithm fusing visual and text features |
CN110633366B (en) * | 2019-07-31 | 2022-12-16 | 国家计算机网络与信息安全管理中心 | Short text classification method, device and storage medium |
CN110633366A (en) * | 2019-07-31 | 2019-12-31 | 国家计算机网络与信息安全管理中心 | Short text classification method, device and storage medium |
CN110516073A (en) * | 2019-08-30 | 2019-11-29 | 北京百度网讯科技有限公司 | Text classification method, device, equipment and medium |
CN110910243A (en) * | 2019-09-26 | 2020-03-24 | 山东佳联电子商务有限公司 | Property right transaction method based on reconfigurable big data knowledge graph technology |
CN110825882A (en) * | 2019-10-09 | 2020-02-21 | 西安交通大学 | Knowledge graph-based information system management method |
CN110825882B (en) * | 2019-10-09 | 2022-03-01 | 西安交通大学 | Knowledge graph-based information system management method |
CN110955780B (en) * | 2019-10-12 | 2022-10-14 | 中国人民解放军国防科技大学 | Entity alignment method for knowledge graph |
CN110955780A (en) * | 2019-10-12 | 2020-04-03 | 中国人民解放军国防科技大学 | Entity alignment method for knowledge graph |
CN110750647A (en) * | 2019-10-17 | 2020-02-04 | 北京华宇信息技术有限公司 | Construction method of ELP model of multi-source heterogeneous information data |
CN110750647B (en) * | 2019-10-17 | 2020-07-31 | 北京华宇信息技术有限公司 | Method for constructing ELP model of multi-source heterogeneous information data |
CN110955764B (en) * | 2019-11-19 | 2021-04-06 | 百度在线网络技术(北京)有限公司 | Scene knowledge graph generation method, man-machine conversation method and related equipment |
CN110955764A (en) * | 2019-11-19 | 2020-04-03 | 百度在线网络技术(北京)有限公司 | Scene knowledge graph generation method, man-machine conversation method and related equipment |
CN111028952A (en) * | 2019-11-27 | 2020-04-17 | 云知声智能科技股份有限公司 | Method and device for constructing Chinese medical implication knowledge graph |
CN111028952B (en) * | 2019-11-27 | 2023-08-04 | 云知声智能科技股份有限公司 | Method and device for constructing Chinese medical implication knowledge graph |
CN110990533A (en) * | 2019-11-29 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Method and device for determining standard text corresponding to query text |
CN110990533B (en) * | 2019-11-29 | 2023-08-25 | 支付宝(杭州)信息技术有限公司 | Method and device for determining standard text corresponding to query text |
CN111191031A (en) * | 2019-12-24 | 2020-05-22 | 上海大学 | Entity relation classification method of unstructured text based on WordNet and IDF |
CN111191047A (en) * | 2019-12-31 | 2020-05-22 | 武汉理工大学 | Knowledge graph construction method for human-computer cooperation disassembly task |
CN111209738A (en) * | 2019-12-31 | 2020-05-29 | 浙江大学 | Multi-task named entity recognition method combining text classification |
CN111414393A (en) * | 2020-03-26 | 2020-07-14 | 湖南科创信息技术股份有限公司 | Semantic similar case retrieval method and equipment based on medical knowledge graph |
CN111563166B (en) * | 2020-05-28 | 2024-02-13 | 浙江学海教育科技有限公司 | Pre-training model method for classifying mathematical problems |
CN111563166A (en) * | 2020-05-28 | 2020-08-21 | 浙江学海教育科技有限公司 | Pre-training model method for mathematical problem classification |
CN111737489A (en) * | 2020-06-17 | 2020-10-02 | 广联达科技股份有限公司 | Building information retrieval method, device, equipment and readable storage medium |
WO2021259002A1 (en) * | 2020-06-23 | 2021-12-30 | 平安科技(深圳)有限公司 | Decision tree-based method and apparatus for outputting abnormal data sources, and computer device |
CN112084331A (en) * | 2020-08-27 | 2020-12-15 | 清华大学 | Text processing method, text processing device, model training method, model training device, computer equipment and storage medium |
CN112597298A (en) * | 2020-10-14 | 2021-04-02 | 上海勃池信息技术有限公司 | Deep learning text classification method fusing knowledge graphs |
CN112182249A (en) * | 2020-10-23 | 2021-01-05 | 四川大学 | Automatic classification method and device for aviation safety report |
CN112417448B (en) * | 2020-11-15 | 2022-03-18 | 复旦大学 | Anti-aging enhancement method for malicious software detection model based on API (application programming interface) relational graph |
CN112417448A (en) * | 2020-11-15 | 2021-02-26 | 复旦大学 | Anti-aging enhancement method for malicious software detection model based on API (application programming interface) relational graph |
CN112559737A (en) * | 2020-11-20 | 2021-03-26 | 和美(深圳)信息技术股份有限公司 | Node classification method and system of knowledge graph |
CN114548103B (en) * | 2020-11-25 | 2024-03-29 | 马上消费金融股份有限公司 | Named entity recognition model training method and named entity recognition method |
CN114548103A (en) * | 2020-11-25 | 2022-05-27 | 马上消费金融股份有限公司 | Training method of named entity recognition model and recognition method of named entity |
CN112632994B (en) * | 2020-12-03 | 2023-09-01 | 大箴(杭州)科技有限公司 | Method, device and equipment for determining basic attribute characteristics based on text information |
CN112632994A (en) * | 2020-12-03 | 2021-04-09 | 大箴(杭州)科技有限公司 | Method, device and equipment for determining basic attribute characteristics based on text information |
CN112801706B (en) * | 2021-02-04 | 2024-02-02 | 北京云上曲率科技有限公司 | Game user behavior data mining method and system |
CN112801706A (en) * | 2021-02-04 | 2021-05-14 | 北京云上曲率科技有限公司 | Game user behavior data mining method and system |
CN112906361A (en) * | 2021-02-09 | 2021-06-04 | 上海明略人工智能(集团)有限公司 | Text data labeling method and device, electronic equipment and storage medium |
CN113094715A (en) * | 2021-04-20 | 2021-07-09 | 国家计算机网络与信息安全管理中心 | Network security dynamic early warning system based on knowledge graph |
CN113254615A (en) * | 2021-05-31 | 2021-08-13 | 中国移动通信集团陕西有限公司 | Text processing method, device, equipment and medium |
CN113449104A (en) * | 2021-06-22 | 2021-09-28 | 上海明略人工智能(集团)有限公司 | Label enhancement model construction method and system, electronic equipment and storage medium |
CN113641766A (en) * | 2021-07-15 | 2021-11-12 | 北京三快在线科技有限公司 | Relationship identification method and device, storage medium and electronic equipment |
CN113722509B (en) * | 2021-09-07 | 2022-03-01 | 中国人民解放军32801部队 | Knowledge graph data fusion method based on entity attribute similarity |
CN113722509A (en) * | 2021-09-07 | 2021-11-30 | 中国人民解放军32801部队 | Knowledge graph data fusion method based on entity attribute similarity |
CN113590802A (en) * | 2021-09-27 | 2021-11-02 | 北京明略软件系统有限公司 | Session content abnormity detection method and device, electronic equipment and storage medium |
CN114064901A (en) * | 2021-11-26 | 2022-02-18 | 重庆邮电大学 | Book comment text classification method based on knowledge graph word meaning disambiguation |
CN114064901B (en) * | 2021-11-26 | 2022-08-26 | 重庆邮电大学 | Book comment text classification method based on knowledge graph word meaning disambiguation |
CN113963357A (en) * | 2021-12-16 | 2022-01-21 | 北京大学 | Knowledge graph-based sensitive text detection method and system |
CN113963357B (en) * | 2021-12-16 | 2022-03-11 | 北京大学 | Knowledge graph-based sensitive text detection method and system |
CN117040926A (en) * | 2023-10-08 | 2023-11-10 | 北京网藤科技有限公司 | Industrial control network security feature analysis method and system applying knowledge graph |
CN117040926B (en) * | 2023-10-08 | 2024-01-26 | 北京网藤科技有限公司 | Industrial control network security feature analysis method and system applying knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108595708A (en) | Knowledge graph-based abnormal information text classification method | |
CN108628828B (en) | Combined extraction method based on self-attention viewpoint and holder thereof | |
CN112069811B (en) | Electronic text event extraction method with multi-task interaction enhancement | |
CN110532328B (en) | Text concept graph construction method | |
CN110909164A (en) | Text enhancement semantic classification method and system based on convolutional neural network | |
CN114444516B (en) | Cantonese rumor detection method based on deep semantic perception graph convolutional network | |
CN113157859B (en) | Event detection method based on upper concept information | |
CN112541337B (en) | Document template automatic generation method and system based on recurrent neural network language model | |
CN110457585B (en) | Negative text pushing method, device and system and computer equipment | |
CN114881043B (en) | Deep learning model-based legal document semantic similarity evaluation method and system | |
CN113569050A (en) | Method and device for automatically constructing government affair field knowledge graph based on deep learning | |
Kumar et al. | Hybrid fusion based approach for multimodal emotion recognition with insufficient labeled data | |
CN112733547A (en) | Chinese question semantic understanding method by utilizing semantic dependency analysis | |
CN112069312A (en) | Text classification method based on entity recognition and electronic device | |
CN113011126A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN115952794A (en) | Chinese-Thai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph | |
Celikyilmaz et al. | A graph-based semi-supervised learning for question-answering | |
CN117765450B (en) | Video language understanding method, device, equipment and readable storage medium | |
Samih et al. | Enhanced sentiment analysis based on improved word embeddings and XGboost. | |
CN115730232A (en) | Topic-correlation-based heterogeneous graph neural network cross-language text classification method | |
Mahmud et al. | Deep learning based sentiment analysis from Bangla text using glove word embedding along with convolutional neural network | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
CN113761128A (en) | Event key information extraction method combining domain synonym dictionary and pattern matching | |
Cai et al. | Multi‐level deep correlative networks for multi‐modal sentiment analysis | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180928 |