CN109582953A - Method, device and storage medium for scoring the evidential support of information - Google Patents
Method, device and storage medium for scoring the evidential support of information
- Publication number
- CN109582953A CN109582953A CN201811302326.4A CN201811302326A CN109582953A CN 109582953 A CN109582953 A CN 109582953A CN 201811302326 A CN201811302326 A CN 201811302326A CN 109582953 A CN109582953 A CN 109582953A
- Authority
- CN
- China
- Prior art keywords
- information
- semantic
- similarity matrix
- evidential support
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method, device and storage medium for scoring the evidential support of information. The method comprises: performing deep semantic vector encoding on every information item in an information library; computing, from the deep semantic vector of each item, the pairwise similarity of all items to obtain a semantic similarity matrix; constructing a semantic network from the semantic similarity matrix; and scoring, according to a preset random walk model, the evidential support of the information item corresponding to each node of the semantic network. The starting point of the invention is to evaluate the reliability of the viewpoints expressed in information: each item is encoded as a deep semantic vector, pairwise similarities are computed to build a semantic network, and an evidential-support score can then be computed for every item. The invention achieves high accuracy and effectively reduces labour cost.
Description
Technical field
The present invention relates to the technical fields of data mining and recommender systems, and in particular to a method, device and storage medium for scoring the evidential support of information.
Background art
Traditionally, information acquisition was an active process: a user would browse a portal site for the latest news, or search for content of interest through a search engine. In recent years, with the development of computer networks and artificial intelligence, the way people obtain information has changed considerably: waterfall feeds and intelligently pushed content are presented directly to the user, who in many cases receives this information passively. During this shift from active to passive acquisition, alongside the benign development of the technology, there has also been an explosion and proliferation of information; some false information and even rumours spread rapidly, so that positive information (such as health information) is adversely affected by negative information.
Existing rumour-identification projects focus on analysing the content of information: through inspection by professionals or crowdsourced study on the network, exaggerated or unreasonable content is identified in order to infer whether an item is a rumour. However, both professional inspection and network crowdsourcing have considerable limitations and consume a large amount of labour. Since no efficient automatic rumour-recognition method currently exists, network crowdsourcing has in practice become the only option for rumour-refuting platforms. It relies on social participation on the Internet, pooling collective wisdom to jointly label rumour content and judging the reliability of information from the statistics of those labels; but it places high demands on the quality of the participants and the level of social participation, and is not suited to identifying large volumes of information in a network environment.
With the wide application of deep learning, researchers have begun to consider identifying rumours with deep learning models. The basic idea is still to start from the content of the information itself: a large number of rumour and non-rumour samples are labelled, and a classifier distinguishing the two is built with a deep network, so that the reliability of the content can be judged directly. Deep learning models, however, have the following problems. First, although they have achieved good results in the image and video fields, in the natural language field, and especially for information whose reliability ordinary people cannot discriminate, it is difficult to find a suitable model that meets the practical requirements. Second, the interpretability of deep learning models still needs further study: in practical applications the output is the result of a large amount of complex computation, the final result is often hard to control, and the quality of the output cannot be verified directly by evidence.
Summary of the invention
The main purpose of the present invention is to provide a method, device and storage medium for scoring the evidential support of information, so as to solve the problems of high labour cost and low accuracy in existing methods for identifying the reliability of information.
In view of the above technical problems, the present invention adopts the following technical solutions.
The present invention provides a method for scoring the evidential support of information, comprising: performing deep semantic vector encoding on every information item in an information library; computing, from the deep semantic vector of each item, the pairwise similarity of all items to obtain a semantic similarity matrix; constructing a semantic network from the semantic similarity matrix; and scoring, according to a preset random walk model, the evidential support of the information item corresponding to each node in the semantic network.
Wherein performing deep semantic vector encoding on every information item in the information library comprises: crawling common terms from preset websites and adding them to a preset word-segmentation tool; segmenting every item in the information library with the segmentation tool to obtain a set of tokens; training a preset distributed word-vector model on the tokens according to a preset distributed word-vector representation method, so as to obtain the distributed word vector corresponding to each token; and encoding each item in the information library as a deep semantic vector from the distributed word vectors of its tokens.
Wherein constructing the semantic network from the semantic similarity matrix comprises: applying principal component analysis to the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing, from the semantic similarity matrix and the sparse semantic similarity matrix, a connected weighted undirected simple graph as the semantic network.
Wherein constructing the connected weighted undirected simple graph from the semantic similarity matrix and the sparse semantic similarity matrix comprises: building a weighted undirected simple graph from the sparse semantic similarity matrix; determining the disconnected subgraphs contained in the weighted undirected simple graph; looking up, in the semantic similarity matrix, the similarities of node pairs spanning the disconnected subgraphs; and, in the weighted undirected simple graph, connecting the node pair of maximum similarity, using that maximum similarity as the weight of the connection, so as to form a connected weighted undirected simple graph.
Wherein the method further comprises: obtaining a reliability score for the information library from the evidential-support scores of the items corresponding to the nodes of the semantic network.
The present invention also provides a device for scoring the evidential support of information, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, the computer program, when executed by the processor, performing the steps of: performing deep semantic vector encoding on every information item in an information library; computing, from the deep semantic vector of each item, the pairwise similarity of all items to obtain a semantic similarity matrix; constructing a semantic network from the semantic similarity matrix; and scoring, according to a preset random walk model, the evidential support of the item corresponding to each node in the semantic network.
Wherein the processor further executes the computer program stored in the memory to perform the steps of: crawling common terms from preset websites and adding them to a preset word-segmentation tool; segmenting every item in the information library with the segmentation tool to obtain a set of tokens; training a preset distributed word-vector model on the tokens according to a preset distributed word-vector representation method to obtain the distributed word vector corresponding to each token; and encoding each item in the information library as a deep semantic vector from those word vectors.
Wherein the processor further executes the computer program stored in the memory to perform the steps of: applying principal component analysis to the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing, from the semantic similarity matrix and the sparse semantic similarity matrix, a connected weighted undirected simple graph as the semantic network.
Wherein the processor further executes the computer program stored in the memory to perform the step of: obtaining a reliability score for the information library from the evidential-support scores of the items corresponding to the nodes of the semantic network.
The present invention further provides a storage medium on which an evidential-support scoring program for information is stored, the program, when executed by a processor, implementing the steps of the above method for scoring the evidential support of information.
The beneficial effects of the present invention are as follows:
The starting point of the invention is to evaluate the reliability of the viewpoints expressed in information. Each item is encoded as a deep semantic vector, pairwise similarities are computed to build a semantic network, and an evidential-support score can then be computed for every item; the invention achieves high accuracy and effectively reduces labour cost. Furthermore, the invention uses other items that share an item's viewpoint to support that viewpoint: if few other items in the library support the viewpoint, or other items even hold viewpoints incompatible with it, the reliability of the item will be very low; conversely, if a large number of other items carry evidence corroborating the same viewpoint, the reliability of the item will be very high.
Brief description of the drawings
The drawings described herein are provided to facilitate a further understanding of the present invention and constitute part of this application; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of a method for scoring the evidential support of information according to embodiment one of the present invention;
Fig. 2 is a flowchart of the steps of deep semantic vector encoding according to embodiment two of the present invention;
Fig. 3 is a flowchart of the steps of semantic network construction according to embodiment three of the present invention;
Fig. 4 is a structural diagram of a device for scoring the evidential support of information according to embodiment five of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in further detail below in conjunction with the drawings and specific embodiments.
Embodiment one
Embodiment one of the present invention provides a method for scoring the evidential support of information. Fig. 1 is a flowchart of the method according to embodiment one.
Step S110: perform deep semantic vector encoding on every information item in the information library.
Deep semantic vector encoding means extracting, by deep learning, a vector representation of the item in a semantic context space. Deep learning can exploit the context in which a word occurs to model the word's semantics better, and vector encoding turns the item into a computable quantity that a computer can process.
Step S120: compute, from the deep semantic vector of each item, the pairwise similarity of all items, and obtain the semantic similarity matrix.
The semantic similarity matrix contains the similarity of every pair of items in the information library.
Step S130: construct the semantic network from the semantic similarity matrix.
Each node of the semantic network is an information item in the library; each edge between two nodes carries a weight whose value is the similarity of the two nodes.
Step S140: score, according to a preset random walk model, the evidential support of the information item corresponding to each node of the semantic network.
Applied to a network, a random walk model describes a random process formed by a sequence of chance moves along paths. Starting from a start node, the walk jumps step by step according to the structure of the (semantic) network and preset transition probabilities; as the number of iterations grows, the transition probabilities converge to a stationary distribution. A random walk model describes the intrinsic properties of the network structure well and finds the central nodes that play a significant role in it.
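The patent does not pin the walk down to one formula; as a hedged sketch, a PageRank-style weighted random walk on the semantic network can be written in plain Python (the damping factor, tolerance and toy graph below are illustrative assumptions, not values from the patent):

```python
def random_walk_scores(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style random walk on a weighted undirected graph.

    weights: dict mapping node -> {neighbour: edge weight}.
    Returns the stationary visit probabilities (they sum to 1), which
    serve here as evidential-support scores.
    """
    nodes = list(weights)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {}
        for v in nodes:
            # mass flowing into v from each neighbour u, proportional
            # to the edge weight w(u, v) among u's outgoing weights
            inflow = 0.0
            for u, nbrs in weights.items():
                if v in nbrs:
                    total = sum(nbrs.values())
                    if total > 0:
                        inflow += score[u] * nbrs[v] / total
            new[v] = (1 - damping) / n + damping * inflow
        if max(abs(new[v] - score[v]) for v in nodes) < tol:
            score = new
            break
        score = new
    return score

# Toy semantic network: items a and b strongly support each other,
# item c is only weakly attached, so c should score lower.
g = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1},
}
scores = random_walk_scores(g)
```

The stationary distribution rewards nodes that many similar (mutually supporting) items point to, which matches the patent's use of the walk to find central, well-supported items.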
In this embodiment, after the evidential-support score of every item has been obtained, a reliability score of the information library can additionally be derived from the evidential-support scores of the items corresponding to the nodes of the semantic network; in other words, the reliability score of the library is obtained from the scores of the items it contains.
The higher the evidential-support score of an item, the higher its reliability; the lower the score, the lower its reliability. Similarly, the higher the evidential-support score of an information library, the higher the library's reliability, and the lower the score, the lower its reliability.
In this embodiment, items can be ranked by evidential-support score and those with high scores provided to the user. Further, based on the score of each information library, the items with the highest evidential-support scores can be selected from the highest-scoring library and presented to the user.
The starting point of this embodiment is to evaluate the reliability of the viewpoints in information: items are encoded as deep semantic vectors, pairwise similarities are computed to construct the semantic network, and the evidential-support score of each item can then be calculated.
During evaluation, this embodiment requires other items in the library that share an item's viewpoint to support it. If few other items support the viewpoint, or other items even hold viewpoints incompatible with it, the item's evidential-support score (reliability) will be very low; conversely, if a large number of other items carry evidence corroborating the same viewpoint, the score (reliability) will be very high.
Embodiments two to four below describe the steps of embodiment one in further detail, taking the health field as the running example.
Embodiment two
This embodiment describes the step of deep semantic vector encoding in further detail.
Fig. 2 is a flowchart of the steps of deep semantic vector encoding according to embodiment two of the present invention.
Step S210: crawl common terms from preset websites and add them to a preset word-segmentation tool.
Common terms are technical terms, professional jargon, common disease names, and other words that appear with high frequency on the preset websites.
Preset websites include, for example, "A+ Medical Encyclopedia", "39 Health Net", the "Seek Medical Advice" portal and "Baidu Medical Encyclopedia".
Word-segmentation tools include, for example, jieba segmentation, NLPIR, LTP, THULAC and IK-Analyzer.
Crawling the entries of the preset websites yields common terms that expand the dictionary of the segmentation tool and give a more ideal segmentation result. For example, "allergic rhinitis" is the common noun for a kind of rhinitis, yet most segmentation tools cut it into the two words "allergic" and "rhinitis"; after such cutting, the complete meaning of the proper disease name can no longer be expressed, which would harm the subsequent semantic analysis. Health websites can therefore be designated and their entries on diseases and symptoms crawled to obtain common terms.
When selecting health websites, the criteria are as follows: (1) the site has "disease encyclopedia" and "symptom encyclopedia" sections, with linked pages describing diseases and symptoms in detail; (2) the site ranks relatively high in multiple search engines once results clearly marked as advertising links are filtered out, and it has a comparatively clear site structure.
The common terms are loaded into the segmentation tool as a user dictionary; the tool can then be used to normalise symbols, remove stop words, and segment each health item in the health information library.
Step S220: segment every item in the information library with the segmentation tool to obtain a set of tokens.
Each health item in the health information library is segmented separately; the resulting tokens form the health information data set.
Step S230: according to a preset distributed word-vector representation method, train a preset distributed word-vector model on the tokens and obtain the distributed word vector corresponding to each token.
In this embodiment, the distributed word-vector representation method can be a word-embedding based distributed vector representation, used to encode (i.e. represent as vectors) the tokens in the health information data set.
The distributed word-vector model can be a word2vec model or a GloVe model. A word2vec model is a typical three-layer feed-forward network consisting of an input layer, a hidden (projection) layer and an output layer; its inputs and outputs are constructed from the contexts of words in the information library, so that the contextual semantic relations of words can be found. The dimensionality can be pre-defined, for example 250 dimensions to represent the contextual relations of all words; each dimension is a compound of several senses, which is called a distributed semantic representation. The input and output vectors of the word2vec model are one-hot encodings of dictionary positions; for example, if the word "health" has index 500 in the dictionary, position 500 is 1 and all other positions are 0. Word2vec has two classes of training method whose definitions of input and output are exactly opposite: the continuous bag-of-words (CBOW) model predicts a word from its context words, while skip-gram predicts the context words from the word itself. The network structures and optimisation methods of the two differ slightly, but both aim to obtain a comparatively compact (dense) semantic representation of words.
In many natural language processing tasks, distributed word vectors have become the cornerstone of quantitative natural language computation because they handle the contextual semantics of words well. The health information data set is therefore used as the training data set of the word2vec model: the token sequences of the data set are fed into the model, suitable parameters are set (such as the dimensionality of the distributed representation, the context window size, the number of training epochs and the training method), and the model then outputs the distributed word vector corresponding to each token.
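The opposite input/output definitions of CBOW and skip-gram can be made concrete by showing how training pairs are constructed from a token sequence; the window size and example sentence are illustrative assumptions:

```python
def training_pairs(tokens, window=2, mode="cbow"):
    """Build (input, target) pairs for the two word2vec training modes.

    cbow:      input = list of context tokens, target = centre token
    skip-gram: input = centre token, target = each context token
    """
    pairs = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "cbow":
            pairs.append((context, centre))
        else:  # skip-gram
            pairs.extend((centre, c) for c in context)
    return pairs

sent = ["spring", "health", "diet", "advice"]
cbow = training_pairs(sent, window=1, mode="cbow")
sg = training_pairs(sent, window=1, mode="skipgram")
```

A library such as gensim performs this pair construction internally; the sketch only shows why the two modes are described as mirror images of each other.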
Step S240: encode each item in the information library as a deep semantic vector from the distributed word vectors of its tokens.
In the word-embedding based distributed representation, the contextual semantics of tokens are additive, so the deep semantic vector of each item can be obtained as a weighted average of the vectors of its tokens.
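The weighted average can be sketched as follows; the pre-trained word vectors and the uniform default weighting are illustrative assumptions, since the patent leaves the weighting scheme open:

```python
def deep_semantic_vector(tokens, word_vectors, weights=None):
    """Encode an item as the weighted average of its tokens' vectors.
    Tokens missing from the vocabulary are skipped."""
    dims = len(next(iter(word_vectors.values())))
    if weights is None:
        weights = {t: 1.0 for t in tokens}  # uniform weighting assumed
    total, acc = 0.0, [0.0] * dims
    for t in tokens:
        if t in word_vectors:
            w = weights.get(t, 1.0)
            total += w
            acc = [a + w * x for a, x in zip(acc, word_vectors[t])]
    return [a / total for a in acc] if total else acc

# Toy 2-dimensional vectors standing in for trained word2vec output
vecs = {"rhinitis": [1.0, 0.0], "treatment": [0.0, 1.0]}
doc = deep_semantic_vector(["rhinitis", "treatment"], vecs)
```

In practice the weights could be, for example, TF-IDF values, which would emphasise the tokens most characteristic of the item.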
Embodiment three
This embodiment describes the construction of the semantic network in further detail.
Fig. 3 is a flowchart of the steps of semantic network construction according to embodiment three of the present invention.
Step S310: compute, from the deep semantic vector of each item, the pairwise similarity of all items, and obtain the semantic similarity matrix.
The semantic similarity matrix contains the similarity of every pair of items in the information library.
In this embodiment, the purpose of the similarity computation is to find similar evidential support. For example, item A expresses viewpoint a and item B expresses viewpoint b; if a and b have similar semantics, then A and B mutually serve as evidential support, and the strength of that support can be defined as the semantic similarity S(a, b): the higher the similarity, the stronger the support; the lower the similarity, the weaker the support. In what follows, the deep semantic vectors of A and B are denoted va and vb respectively.
Although the semantics of the word-embedding based distributed representation are additive, in this embodiment the similarity of two items is measured not only by the direction similarity Spos(a, b) but also by an amplitude similarity Sstr(a, b); the two jointly measure the similarity of the two items.
The direction similarity Spos(a, b) can use cosine similarity, defined as
Spos(a, b) = (va · vb) / (‖va‖ ‖vb‖)
where ‖va‖ denotes the norm of the vector va and ‖vb‖ the norm of vb.
The amplitude similarity Sstr(a, b) is defined as follows:
The similarity of items A and B can then be defined as the weighted sum of the two similarities:
S(a, b) = λ Spos(a, b) + (1 − λ) Sstr(a, b)
where the parameter λ (0.5 < λ < 1) is a preset value that adjusts the weights of the direction similarity and the amplitude similarity. In this embodiment, the direction similarity reflects the orientation consistency of the expressed viewpoints in the semantic space, while the amplitude similarity reflects the consistency of their strength; direction usually matters more than strength, hence the codomain S(a, b) ∈ (−λ, 1].
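A sketch of the combined similarity: the cosine form of Spos follows the text, while the min/max norm ratio used for Sstr is an assumption chosen only so that Sstr lies in (0, 1], consistent with the stated codomain (the patent gives the formula itself only as a figure):

```python
import math

def similarity(va, vb, lam=0.7):
    """S(a, b) = lam * S_pos + (1 - lam) * S_str, with 0.5 < lam < 1."""
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    s_pos = dot / (na * nb)            # direction: cosine similarity
    s_str = min(na, nb) / max(na, nb)  # amplitude: ASSUMED norm ratio
    return lam * s_pos + (1 - lam) * s_str

same_dir = similarity([1.0, 0.0], [2.0, 0.0])   # same viewpoint, stronger
opposite = similarity([1.0, 0.0], [-1.0, 0.0])  # incompatible viewpoints
```

With λ = 0.7, the same-direction pair scores 0.85 while the opposite pair scores −0.4, illustrating how incompatible viewpoints fall toward the lower end (−λ) of the codomain.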
After the similarity of every pair of items in the information library has been computed, the semantic similarity matrix can be constructed from the resulting similarities.
Step S320: apply principal component analysis to the semantic similarity matrix and construct the sparse semantic similarity matrix.
Because the deep semantic vectors of items usually have high dimensionality, the probability that two items are semantically completely orthogonal (similarity exactly 0) is tiny; the semantic similarity matrix is therefore dense. The density arises on the one hand because the word-embedding based distributed representation spreads every sense across the whole vector, and on the other hand because the information library contains high-frequency noise with little semantic association to the purport of the items.
In order to eliminate the influence of semantic high-frequency noises, principal component analysis can be carried out to dense semantic similarity matrix
(principal components analysis, abbreviation PCA) mathematically carries out singular value decomposition (Singular to it
Value decomposition, abbreviation SVD), it reconstructs to obtain a more sparse expression later again.It is obtained after reconstruct
Semantic similarity matrix is sparse semantic similarity matrix, is an approximation of original semantic similarity matrix, in addition to
The influence of some high-frequency noises is eliminated, also can be reduced the calculation amount of subsequent operation, so that subsequent Random Walk Algorithm energy
It is enough more robust.
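A sketch of this PCA/SVD step, under the assumption that "reconstruction" means keeping the top-k singular values and then zeroing entries below a threshold; the rank k and the threshold are illustrative parameters, not values fixed by the text:

```python
import numpy as np

def sparsify(S, k=10, threshold=0.1):
    """Approximate the dense similarity matrix S by a rank-k SVD
    reconstruction, then zero out small entries to obtain a sparser
    similarity matrix. k and threshold are illustrative choices.
    """
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    S_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]      # rank-k reconstruction
    return np.where(np.abs(S_k) >= threshold, S_k, 0.0)
```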
Step S330: according to the semantic similarity matrix and the sparse semantic similarity matrix, construct a simply connected weighted undirected simple graph as the semantic network.
Step 1: construct a weighted undirected simple graph according to the sparse semantic similarity matrix.
A weighted undirected simple graph is a graph in which each pair of vertices is joined by at most one edge, no vertex has an edge to itself (i.e., there are no self-loops), and every edge carries a weight.
The weighted undirected simple graph is constructed using the sparse semantic similarity matrix as its adjacency matrix. This graph is in fact a semantic context network: each information item corresponds to one node in the graph.
Step 2: determine the multiple disconnected subgraphs contained in the weighted undirected simple graph.
A disconnected subgraph is a subgraph that is not connected to any other subgraph.
Because principal component analysis removes the semantic context links between many nodes, the weighted undirected simple graph may fail to be a simply connected network. For the needs of subsequent analysis, the mutually disconnected sub-networks (disconnected subgraphs) in the graph must be found and bridges built between them, so that the semantic context of the whole network can be joined together.
Step 3: query, in the semantic similarity matrix, the similarities of the node pairs between the disconnected subgraphs.
A node pair consists of two nodes in two disconnected subgraphs: one node lies in one disconnected subgraph, and the other node lies in the other disconnected subgraph.
In order to disturb the original semantic context as little as possible, as few disconnected subgraphs as possible should be connected, while as much of the semantic context between the disconnected subgraphs as possible should be retained.
Step 4: in the weighted undirected simple graph, connect the node pair with the maximum similarity, using that maximum similarity as the weight of the connection, so as to form a simply connected weighted undirected simple graph.
Between every two disconnected subgraphs the following is performed: a first node is determined in the first disconnected subgraph and a second node in the second disconnected subgraph, and the similarity of the first node and the second node is looked up in the semantic similarity matrix. The first disconnected subgraph contains multiple first nodes and the second disconnected subgraph contains multiple second nodes; the similarity of each first node with each second node is queried, the obtained similarities are sorted, and the maximum similarity is determined. The first node and the second node corresponding to the maximum similarity are then connected, with the maximum similarity as the weight of the connection, so that the first disconnected subgraph and the second disconnected subgraph become connected.
Embodiment IV
This embodiment further describes how information items are given an evidential support score.
This embodiment runs a random-walk model on the semantic network to compute the evidential support score of each node in the semantic network.
For example, suppose there are N items in a health-information repository and the semantic similarity matrix is M, where the similarity of health items i and j is sij. The evidential support score of item i at iteration step t is denoted ri(t), and the initial evidential support score ri(0) of item i is obtained from the semantic similarity matrix.
In the random-walk model, the evidential support score of node i follows from the scores of the other nodes at the previous step: one part is obtained from adjacent nodes, and another part is contributed evenly at random by the other nodes. The score node i obtains from the other nodes at step t + 1 is computed by an iterative formula in which P is a preset value: if two nodes in the semantic network are connected, P denotes the probability of walking from a node to an adjacent node, and 1 − P accordingly denotes the probability of randomly selecting another node, adjacent or not; in this embodiment, preferably 0.5 ≤ P ≤ 1. W denotes the semantic network; i and j are adjacent nodes in the semantic network, and k ranges over the other nodes (k ≠ i and k ≠ j); wij is the weight of the connection between items i and j, i.e., the similarity of items i and j; wkj is the weight of the connection between items k and j, i.e., the similarity of items k and j; and sik is the similarity of i and k.
With this initial condition and iterative formula, the evidential support score of every node can be obtained.
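The initial condition and the iterative formula appear in the filing only as images, so the sketch below implements one plausible reading, stated as an assumption: the initial score of item i is the mean of its similarities to all items, and at each step a node receives a fraction P of score from its neighbours in proportion to edge weights, plus 1 − P of the average score of all nodes:

```python
import numpy as np

def support_scores(W, S, P=0.85, iters=50):
    """Iterative random-walk scoring on the semantic network W.

    ASSUMED formulas (the originals are reproduced only as images):
      r_i(0)  = (1/N) * sum_j s_ij        (from the similarity matrix S)
      r(t+1)  = P * T r(t) + (1 - P) * mean(r(t))
    where T is the column-normalised weight matrix of W.
    """
    N = W.shape[0]
    r = S.sum(axis=1) / N                          # initial scores
    col = W.sum(axis=0)
    T = np.divide(W, col, out=np.zeros_like(W), where=col != 0)
    for _ in range(iters):
        r = P * (T @ r) + (1.0 - P) * r.mean()
    return r
```

Because T is column-stochastic, the iteration behaves like a weighted PageRank with uniform teleport and converges to a stable scoring.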
From the evidential support scores of the items corresponding to the nodes in the semantic network, a reliability score of the repository is obtained. The reliability score of the whole repository may be defined as the average of the evidential support scores of all items.
This embodiment combines the content analysis of an item itself with the contextual information of the repository where the item resides, so that an item has circumstantial evidence and reaches evidential self-consistency within the context of the repository; conversely, if incompatible or even contradictory results appear, the reliability of the item within the repository is greatly diminished.
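The repository-level reliability described above is simply the mean of the per-item support scores:

```python
import numpy as np

def repository_reliability(scores):
    """Reliability score of the whole repository: the average of the
    evidential support scores of all its items."""
    return float(np.mean(scores))
```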
Embodiment V
This embodiment provides an evidential support scoring device for information. Fig. 4 shows the structure of the evidential support scoring device for information according to Embodiment V of the present invention.
In this embodiment, the evidential support scoring device 400 for information includes, but is not limited to: a processor 410 and a memory 420.
The processor 410 is configured to execute the evidential support scoring program for information stored in the memory 420, so as to implement the evidential support scoring method for information described in Embodiments I to IV.
Specifically, the processor 410 executes the evidential support scoring program for information stored in the memory 420 to perform the following steps: performing deep semantic vector encoding on each information item in an information repository; calculating, from the deep semantic vector of each item, the pairwise similarities of all items to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring, according to a preset random-walk model, the evidential support of the information item corresponding to each node in the semantic network.
Here, performing deep semantic vector encoding on each item in the information repository comprises: crawling everyday words from a preset website and adding the everyday words to a preset word-segmentation tool; using the segmentation tool to segment each item in the repository, obtaining multiple tokens; training a preset distributed word-vector model with the multiple tokens according to a preset distributed word-vector representation method, to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item in the repository according to the distributed word vector of each token.
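A toy sketch of this encoding pipeline, under stated assumptions: whitespace splitting stands in for the word-segmentation tool, a fixed embedding table stands in for the trained distributed word-vector model, and the deep semantic vector of an item is taken as the average of its token vectors (one common composition; the text does not fix the composition function):

```python
import numpy as np

def encode_item(text, embeddings, dim=4):
    """Deep semantic vector of one information item: the average of the
    distributed word vectors of its tokens (the averaging is an assumption).
    Whitespace split stands in for a real word-segmentation tool."""
    tokens = text.split()
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy embedding table standing in for a trained distributed word-vector model.
embeddings = {
    "vitamin": np.array([1.0, 0.0, 0.0, 0.0]),
    "cures":   np.array([0.0, 1.0, 0.0, 0.0]),
    "colds":   np.array([0.0, 0.0, 1.0, 0.0]),
}
```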
Here, constructing a semantic network according to the semantic similarity matrix comprises: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing, according to the semantic similarity matrix and the sparse semantic similarity matrix, a simply connected weighted undirected simple graph as the semantic network.
Here, constructing a simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises: constructing a weighted undirected simple graph according to the sparse semantic similarity matrix; determining the multiple disconnected subgraphs contained in the weighted undirected simple graph; querying, in the semantic similarity matrix, the similarities of the node pairs between the disconnected subgraphs; and connecting, in the weighted undirected simple graph, the node pair with the maximum similarity, using the maximum similarity as the weight of the connection, to form a simply connected weighted undirected simple graph.
Further, a reliability score of the information repository is obtained according to the evidential support scores of the items corresponding to the nodes in the semantic network.
Embodiment VI
An embodiment of the present invention further provides a storage medium. The storage medium here stores one or more programs. The storage medium may include volatile memory, such as a random access memory; it may also include non-volatile memory, such as a read-only memory, a flash memory, a hard disk, or a solid-state disk; and it may also include a combination of the above kinds of memory.
When the one or more programs in the storage medium are executed by one or more processors, the evidential support scoring method for information described above is implemented.
Specifically, the processor is configured to execute the evidential support scoring program for information stored in the memory, to perform the following steps: performing deep semantic vector encoding on each information item in an information repository; calculating, from the deep semantic vector of each item, the pairwise similarities of all items to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring, according to a preset random-walk model, the evidential support of the information item corresponding to each node in the semantic network.
Here, performing deep semantic vector encoding on each item in the information repository comprises: crawling everyday words from a preset website and adding the everyday words to a preset word-segmentation tool; using the segmentation tool to segment each item in the repository, obtaining multiple tokens; training a preset distributed word-vector model with the multiple tokens according to a preset distributed word-vector representation method, to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item in the repository according to the distributed word vector of each token.
Here, constructing a semantic network according to the semantic similarity matrix comprises: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing, according to the semantic similarity matrix and the sparse semantic similarity matrix, a simply connected weighted undirected simple graph as the semantic network.
Here, constructing a simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises: constructing a weighted undirected simple graph according to the sparse semantic similarity matrix; determining the multiple disconnected subgraphs contained in the weighted undirected simple graph; querying, in the semantic similarity matrix, the similarities of the node pairs between the disconnected subgraphs; and connecting, in the weighted undirected simple graph, the node pair with the maximum similarity, using the maximum similarity as the weight of the connection, to form a simply connected weighted undirected simple graph.
Further, a reliability score of the information repository is obtained according to the evidential support scores of the items corresponding to the nodes in the semantic network.
The above is merely an embodiment of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.
Claims (10)
1. An evidential support scoring method for information, characterized by comprising:
performing deep semantic vector encoding on each information item in an information repository;
calculating, from the deep semantic vector of each item, the pairwise similarities of all items to obtain a semantic similarity matrix;
constructing a semantic network according to the semantic similarity matrix;
scoring, according to a preset random-walk model, the evidential support of the information item corresponding to each node in the semantic network.
2. The method according to claim 1, characterized in that performing deep semantic vector encoding on each item in the information repository comprises:
crawling everyday words from a preset website, and adding the everyday words to a preset word-segmentation tool;
using the segmentation tool to segment each item in the repository, obtaining multiple tokens;
training a preset distributed word-vector model with the multiple tokens according to a preset distributed word-vector representation method, to obtain the distributed word vector corresponding to each token;
performing deep semantic vector encoding on each item in the repository according to the distributed word vector of each token.
3. The method according to claim 1, characterized in that constructing a semantic network according to the semantic similarity matrix comprises:
performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix;
constructing, according to the semantic similarity matrix and the sparse semantic similarity matrix, a simply connected weighted undirected simple graph as the semantic network.
4. The method according to claim 3, characterized in that constructing a simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises:
constructing a weighted undirected simple graph according to the sparse semantic similarity matrix;
determining the multiple disconnected subgraphs contained in the weighted undirected simple graph;
querying, in the semantic similarity matrix, the similarities of the node pairs between the disconnected subgraphs;
connecting, in the weighted undirected simple graph, the node pair with the maximum similarity, using the maximum similarity as the weight of the connection, to form a simply connected weighted undirected simple graph.
5. The method according to claim 1, characterized in that the method further comprises:
obtaining a reliability score of the information repository according to the evidential support scores of the items corresponding to the nodes in the semantic network.
6. An evidential support scoring device for information, characterized in that the device comprises: a memory, a processor, and a computer program stored on the memory and runnable on the processor, the computer program, when executed by the processor, performing the following steps:
performing deep semantic vector encoding on each information item in an information repository;
calculating, from the deep semantic vector of each item, the pairwise similarities of all items to obtain a semantic similarity matrix;
constructing a semantic network according to the semantic similarity matrix;
scoring, according to a preset random-walk model, the evidential support of the information item corresponding to each node in the semantic network.
7. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following steps:
crawling everyday words from a preset website, and adding the everyday words to a preset word-segmentation tool;
using the segmentation tool to segment each item in the repository, obtaining multiple tokens;
training a preset distributed word-vector model with the multiple tokens according to a preset distributed word-vector representation method, to obtain the distributed word vector corresponding to each token;
performing deep semantic vector encoding on each item in the repository according to the distributed word vector of each token.
8. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following steps:
performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix;
constructing, according to the semantic similarity matrix and the sparse semantic similarity matrix, a simply connected weighted undirected simple graph as the semantic network.
9. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following step:
obtaining a reliability score of the information repository according to the evidential support scores of the items corresponding to the nodes in the semantic network.
10. A storage medium, characterized in that an evidential support scoring program for information is stored on the storage medium, and when the evidential support scoring program for information is executed by a processor, the steps of the evidential support scoring method for information according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811302326.4A CN109582953B (en) | 2018-11-02 | 2018-11-02 | Data support scoring method and equipment for information and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109582953A true CN109582953A (en) | 2019-04-05 |
CN109582953B CN109582953B (en) | 2023-04-07 |
Family
ID=65921410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811302326.4A Active CN109582953B (en) | 2018-11-02 | 2018-11-02 | Data support scoring method and equipment for information and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109582953B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027320A (en) * | 2019-11-15 | 2020-04-17 | 北京三快在线科技有限公司 | Text similarity calculation method and device, electronic equipment and readable storage medium |
CN112100221A (en) * | 2019-06-17 | 2020-12-18 | 腾讯科技(北京)有限公司 | Information recommendation method and device, recommendation server and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130110496A1 (en) * | 2011-10-28 | 2013-05-02 | Sap Ag | Calculating Term Similarity Using A Meta-Model Semantic Network |
CN103544255A (en) * | 2013-10-15 | 2014-01-29 | 常州大学 | Text semantic relativity based network public opinion information analysis method |
CN104408115A (en) * | 2014-11-25 | 2015-03-11 | 三星电子(中国)研发中心 | Semantic link based recommendation method and device for heterogeneous resource of TV platform |
CN105808648A (en) * | 2016-02-25 | 2016-07-27 | 焦点科技股份有限公司 | R language program based personalized recommendation method |
CN105824797A (en) * | 2015-01-04 | 2016-08-03 | 华为技术有限公司 | Method, device and system evaluating semantic similarity |
CN105893362A (en) * | 2014-09-26 | 2016-08-24 | 北大方正集团有限公司 | A method for acquiring knowledge point semantic vectors and a method and a system for determining correlative knowledge points |
CN107193805A (en) * | 2017-06-06 | 2017-09-22 | 北京百度网讯科技有限公司 | Article Valuation Method, device and storage medium based on artificial intelligence |
CN107526850A (en) * | 2017-10-12 | 2017-12-29 | 燕山大学 | Social networks friend recommendation method based on multiple personality feature mixed architecture |
CN108399163A (en) * | 2018-03-21 | 2018-08-14 | 北京理工大学 | Bluebeard compound polymerize the text similarity measure with word combination semantic feature |
Non-Patent Citations (5)
Title |
---|
ZHANG Yangsen et al., "Sentiment analysis method for microblogs based on a dual attention model", Journal of Tsinghua University (Science and Technology) * |
LI Xiaohong et al., "A short-text clustering algorithm based on spectral segmentation", Computer Engineering * |
LI Lu et al., "A survey of fake review detection research", Chinese Journal of Computers * |
WANG Yang et al., "A rating prediction algorithm combining semantic similarity and matrix factorization", Journal of Computer Applications * |
GUO Hongqi et al., "A sentence similarity calculation method based on multi-prototype word vector representation", Intelligent Computer and Applications * |
Also Published As
Publication number | Publication date |
---|---|
CN109582953B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Interdonato et al. | Multilayer network simplification: approaches, models and methods | |
JP5904559B2 (en) | Scenario generation device and computer program therefor | |
Dominguez-Sal et al. | A discussion on the design of graph database benchmarks | |
WO2015093540A1 (en) | Phrase pair gathering device and computer program therefor | |
Edwards et al. | Identifying wildlife observations on twitter | |
WO2015093539A1 (en) | Complex predicate template gathering device, and computer program therefor | |
López-Cruz et al. | Bayesian network modeling of the consensus between experts: An application to neuron classification | |
Cécillon et al. | Graph embeddings for abusive language detection | |
Pal et al. | Deep learning for network analysis: problems, approaches and challenges | |
Bao et al. | Discovering interesting co-location patterns interactively using ontologies | |
He et al. | Neurally-guided semantic navigation in knowledge graph | |
CN109582953A (en) | A kind of speech of information is according to support methods of marking, equipment and storage medium | |
Huang et al. | Knowledge sharing and reuse in digital forensics | |
Gupta et al. | Fake News Analysis and Graph Classification on a COVID-19 Twitter Dataset | |
CN114218445A (en) | Anomaly detection method based on dynamic heterogeneous information network representation of metagraph | |
Wei et al. | DF-Miner: Domain-specific facet mining by leveraging the hyperlink structure of Wikipedia | |
Bi et al. | Judicial knowledge-enhanced magnitude-aware reasoning for numerical legal judgment prediction | |
Invernici et al. | Exploring the evolution of research topics during the COVID-19 pandemic | |
CN109558586A (en) | A kind of speech of information is according to from card methods of marking, equipment and storage medium | |
CN111209745B (en) | Information reliability evaluation method, equipment and storage medium | |
Křenková et al. | Similarity search with the distance density model | |
CN115759110A (en) | Malicious information detection method, device and system based on multi-feature fusion | |
Chen | English translation template retrieval based on semantic distance ontology knowledge recognition algorithm | |
Janik et al. | Explaining Link Predictions in Knowledge Graph Embedding Models with Influential Examples | |
Fang et al. | Transpath: Representation learning for heterogeneous information networks via translation mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||