CN109582953A - Evidence support scoring method, device and storage medium for information - Google Patents

Evidence support scoring method, device and storage medium for information Download PDF

Info

Publication number
CN109582953A
CN109582953A
Authority
CN
China
Prior art keywords
information
semantic
similarity matrix
speech
support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811302326.4A
Other languages
Chinese (zh)
Other versions
CN109582953B (en)
Inventor
罗冠
游强
胡卫明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201811302326.4A priority Critical patent/CN109582953B/en
Publication of CN109582953A publication Critical patent/CN109582953A/en
Application granted granted Critical
Publication of CN109582953B publication Critical patent/CN109582953B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an evidence support scoring method, device and storage medium for information. The method comprises: performing deep semantic vector encoding on each item of information in an information library; calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model. The starting point of the invention is to evaluate the reliability of the viewpoints expressed in information items: each item is given a deep semantic vector encoding, the pairwise similarities of the items are computed to construct the semantic network, and the evidence support score of each item can then be calculated. The invention achieves high accuracy and can effectively reduce labour cost.

Description

Evidence support scoring method, device and storage medium for information
Technical field
The present invention relates to the technical fields of data mining and recommender systems, and more particularly to an evidence support scoring method, device and storage medium for information.
Background art
Traditional information acquisition is often active: for example, a user actively browses portal websites to obtain the latest news, or actively searches for information of interest through a search engine. In recent years, with the development of computer networks and artificial intelligence technology, the way in which people obtain information has changed considerably; waterfall-flow feeds and intelligently pushed information of all kinds are displayed directly in front of users, and in many cases users passively receive this information. During the shift from active to passive information acquisition, besides the benign development of technology, there has also been an explosion and flooding of information: some false information and even rumours spread rapidly, so that positive information (such as health information) suffers the negative effects of negative information.
In rumour identification work, attention is focused on analysing the content of information: through review by professionals or through network crowdsourced learning, exaggerated or unreasonable content in an item is identified in order to infer whether the item is a rumour. However, both review by professionals and network crowdsourced learning have considerable limitations and consume a large amount of labour. Since there is currently no efficient rumour identification method, network crowdsourced learning has in practice become the only choice for rumour-refuting platforms. Network crowdsourced learning relies on social participation on the Internet and plays to collective wisdom: rumour content is labelled jointly, and the reliability of information is judged by statistics over the labels. However, network crowdsourced learning places high demands on the quality of the network participants and on the level of social participation, and is not suitable for identifying large amounts of information in a network environment.
With the wide application of deep learning technology, researchers have begun to consider using deep learning models to identify rumours. The basic idea is still to start from the content of the information itself: a large number of rumour and non-rumour samples are labelled, and a classifier distinguishing the two is constructed with a deep learning network, so that the reliability of information content can be judged directly. However, deep learning models have the following problems. First, although deep learning models have achieved good results in the image and video fields, in the natural language field, and especially in evaluating information that ordinary people cannot tell apart, it is difficult to find a suitable deep learning model that meets the actual requirements. Second, the interpretability of deep learning models still needs further study; in practical applications the output of a deep learning model is the result of a large number of complex computations, the final result is often difficult to control, and the quality of the output cannot be directly verified with evidence.
Summary of the invention
The main purpose of the present invention is to provide an evidence support scoring method, device and storage medium for information, so as to solve the problems that existing methods for identifying the reliability of information have high labour cost and low accuracy.
In view of the above technical problems, the present invention adopts the following technical solutions:
The present invention provides an evidence support scoring method for information, comprising: performing deep semantic vector encoding on each item of information in an information library; calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
Wherein performing deep semantic vector encoding on each item of information in the information library comprises: crawling common words from preset websites and adding the common words to a preset word segmentation tool; using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens; training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
Wherein constructing the semantic network according to the semantic similarity matrix comprises: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
Wherein constructing the simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises: constructing a weighted undirected simple graph according to the sparse semantic similarity matrix; determining the disconnected subgraphs contained in the weighted undirected simple graph; querying the semantic similarity matrix for the similarity of the node pairs between each pair of disconnected subgraphs; and, in the weighted undirected simple graph, connecting the node pair with the maximum similarity and using that maximum similarity as the weight of the connection, so as to form a simply connected weighted undirected simple graph.
Wherein the method further comprises: obtaining a reliability score of the information library according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
The present invention also provides an evidence support scoring device for information, the evidence support scoring device for information comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, performing the following steps: performing deep semantic vector encoding on each item of information in an information library; calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
Wherein the processor is further configured to execute the computer program stored in the memory to perform the following steps: crawling common words from preset websites and adding the common words to a preset word segmentation tool; using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens; training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
Wherein the processor is further configured to execute the computer program stored in the memory to perform the following steps: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
Wherein the processor is further configured to execute the computer program stored in the memory to perform the following step: obtaining a reliability score of the information library according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
The present invention further provides a storage medium on which an evidence support scoring program for information is stored; when the evidence support scoring program for information is executed by a processor, the steps of the above evidence support scoring method for information are implemented.
The beneficial effects of the present invention are as follows:
The starting point of the present invention is to evaluate the reliability of the viewpoints expressed in information items. Each item is given a deep semantic vector encoding, the pairwise similarities of the items are computed to construct a semantic network, and the evidence support score of each item can then be calculated; the invention achieves high accuracy and can effectively reduce labour cost. Further, the present invention uses other information items that hold the same viewpoint as a given item to support that item's viewpoint: if few other items in the library support the viewpoint, or other items even hold viewpoints incompatible with it, the reliability of the item will be very low; conversely, if many other items carry evidence confirming the same viewpoint as the item under consideration, its reliability will be very high.
Brief description of the drawings
The drawings described herein are provided to give a further understanding of the present invention and constitute part of this application; the illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a flowchart of the evidence support scoring method for information according to Embodiment one of the present invention;
Fig. 2 is a flowchart of the steps of deep semantic vector encoding according to Embodiment two of the present invention;
Fig. 3 is a flowchart of the steps of semantic network construction according to Embodiment three of the present invention;
Fig. 4 is a structural diagram of the evidence support scoring device for information according to Embodiment five of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the drawings and specific embodiments.
Embodiment one
Embodiment one of the present invention provides an evidence support scoring method for information. As shown in Fig. 1, which is a flowchart of the evidence support scoring method for information according to Embodiment one of the present invention.
Step S110: perform deep semantic vector encoding on each item of information in the information library.
Deep semantic vector encoding refers to extracting, with deep learning techniques, the vector representation of an information item in a semantic context space. Through deep learning, the context of the information in which a word appears can be used to model the word's semantics better, and vector encoding converts the information into computable quantities that are convenient for a computer to process.
Step S120: according to the deep semantic vector of each information item, calculate the pairwise similarity of all information items to obtain a semantic similarity matrix.
The semantic similarity matrix contains the similarity of every pair of information items in the information library.
Step S130: construct a semantic network according to the semantic similarity matrix.
The nodes of the semantic network are the information items in the information library; the connection between any two nodes carries a weight, and the value of the weight is the similarity of the two nodes.
Step S140: according to a preset random walk model, score the evidence support of the information item corresponding to each node in the semantic network.
A random walk model applied to a network is a stochastic process model describing a series of random moves that form a path according to transition probabilities. Starting from a start node, the random walk jumps to the next node according to the structure of the network (here, the semantic network) and preset transition probabilities; as the number of iteration steps increases, the transition probabilities tend towards a stable distribution. A random walk model describes the intrinsic properties of a network structure well and can find the central nodes that play a significant role in the network.
In this embodiment, after the evidence support score of each information item in the library has been obtained, a reliability score of the information library can further be obtained from the evidence support scores of the information items corresponding to the nodes of the semantic network; in other words, the reliability score of the information library is obtained from the evidence support scores of all the items in the library.
The higher the evidence support score of an information item, the higher its reliability; the lower the evidence support score, the lower its reliability. Similarly, the higher the evidence support score of an information library, the higher the reliability of the library, and the lower the score, the lower the reliability.
In this embodiment, the information items can be ranked by their evidence support scores, and the items with high evidence support scores are provided to users. Further, according to the evidence support score of each information library, the item with the highest evidence support score in the library with the highest score can be selected and provided to users for viewing.
The starting point of this embodiment is to evaluate the reliability of the viewpoints expressed in information items: each item is given a deep semantic vector encoding, the pairwise similarities of the items are computed to construct a semantic network, and the evidence support score of each item can then be calculated.
In the evaluation process, this embodiment requires other items in the library that hold the same viewpoint as a given item to support that item's viewpoint. If few other items in the library support the viewpoint, or other items even hold viewpoints incompatible with it, the evidence support score (reliability) of the item will be very low; conversely, if many other items carry evidence confirming the same viewpoint as the item under consideration, its evidence support score (reliability) will be very high.
The steps of Embodiment one are further described below through Embodiments two to four, which are explained in detail using the health field as an example.
Embodiment two
This embodiment further describes the step of deep semantic vector encoding.
Fig. 2 is a flowchart of the steps of deep semantic vector encoding according to Embodiment two of the present invention.
Step S210: crawl common words from preset websites and add the common words to a preset word segmentation tool.
Common words refer to technical terms, jargon, common names, or words that appear with relatively high frequency on the preset websites.
The preset websites are, for example: "A+ Medical Encyclopedia", "39 Health Net", "Seeking Medical Advice and Medicine Net", and "Baidu Medical Encyclopedia".
The word segmentation tool is, for example: jieba segmentation, NLPIR, LTP, THULAC, or IK-Analyzer.
Common words are obtained by crawling the entries of the preset websites; they expand the dictionary of the segmentation tool so that it produces more reasonable segmentations. For example, "allergic rhinitis" is a common disease noun for a type of rhinitis, yet most segmentation tools cut it into the two words "allergic" and "rhinitis"; after such a cut, the meaning of the specific disease can no longer be expressed completely, which has a considerable adverse effect on the subsequent semantic analysis. Health websites can therefore be specified, and the entries about diseases and symptoms on these websites are crawled to obtain the common words.
When selecting health websites, the selection criteria are as follows: (1) the website has both a "disease encyclopedia" section and a "symptom encyclopedia" section, with linked pages that describe diseases and symptoms in detail; (2) in the results of several search engines, after results clearly marked as advertisement links are filtered out, the website ranks relatively high and has a relatively clear site structure.
The common words are loaded into the segmentation tool as a user dictionary, so that the segmentation tool can be used to remove symbols, remove stop words and segment each health information item in the health information library.
Step S220: use the segmentation tool to segment each information item in the information library, obtaining multiple tokens.
Word segmentation is performed on each health information item in the health information library to obtain multiple tokens, which form the health information dataset.
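As an illustrative sketch only (not part of the original patent text), the user-dictionary loading, symbol removal, stop-word removal and segmentation described above could be carried out with the jieba tool roughly as follows; the file names, the stop-word list and the `health_items` variable are assumptions made for the example.

```python
import re
import jieba

# Assumed inputs: one crawled common word per line, one stop word per line.
jieba.load_userdict("health_terms.txt")   # adds domain terms such as "allergic rhinitis" as single tokens
stopwords = set(open("stopwords.txt", encoding="utf-8").read().split())

def tokenize(text):
    """Remove symbols and stop words, then segment one information item into tokens."""
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)          # strip punctuation and symbols
    return [t for t in jieba.lcut(text) if t.strip() and t not in stopwords]

# health_items: list of raw information texts from the health information library (assumed to exist).
corpus = [tokenize(item) for item in health_items]
```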
Step S230: according to a preset distributed word vector representation method, train a preset distributed word vector model with the multiple tokens to obtain the distributed word vector corresponding to each token.
In this embodiment, the distributed word vector representation method can be a distributed vector representation based on word embedding. The word-embedding-based distributed vector representation is used to encode (express as vectors) the tokens in the health information dataset.
The distributed word vector model can be a word2vec model or a GloVe model. The word2vec model is a typical three-layer feed-forward network consisting of an input layer, a hidden (projection) layer and an output layer; its inputs and outputs are constructed from the contexts of words in the information library, so that the contextual semantic relations of words can be found. The dimensionality can be predefined, for example 250 dimensions can be used to represent the context relations of all words; each dimension is a mixture of several meanings, which is called a distributed semantic representation. The input and output vectors of the word2vec model are one-hot encodings of dictionary positions; for example, if the word "health" has index 500 in the dictionary, then position 500 is 1 and all other positions are 0. The word2vec model has two classes of training methods whose definitions of input and output are exactly opposite: one, called the continuous bag-of-words (CBOW) model, predicts a word from its context words, while the other, called Skip-gram, predicts the context words from the word itself. The network structures and optimisation methods of the two training methods differ slightly, but both are intended to obtain a comparatively compact (dense) semantic representation of words.
In many natural language processing tasks, the distributed word vector has become the cornerstone of quantitative natural language computation, because it handles the contextual semantics of words well and quantifies the words themselves. The health information dataset is therefore used as the training set of the word2vec model: the token sequences of the health information dataset are fed into the word2vec model, and with suitable parameters (such as the distributed dimensionality of the words, the context window size, the number of iteration cycles and the training method) the model outputs the distributed word vector corresponding to each token.
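A minimal training sketch, assuming the gensim library and the `corpus` token lists from the previous step; the 250-dimensional setting follows the example in the text, while the window size, minimum count, number of epochs and the Skip-gram choice are illustrative assumptions.

```python
from gensim.models import Word2Vec

# corpus: list of token lists produced by the segmentation step.
w2v = Word2Vec(
    sentences=corpus,
    vector_size=250,   # distributed dimensionality (250 dims as in the example; the parameter is `size` in gensim < 4.0)
    window=5,          # context window size (illustrative)
    sg=1,              # 1 = Skip-gram (predict context from word), 0 = CBOW (predict word from context)
    min_count=2,       # ignore very rare tokens (illustrative)
    epochs=10,         # iteration cycles (illustrative)
)

vec = w2v.wv["健康"]   # distributed word vector of a token, provided it is in the vocabulary
```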
Step S240: perform deep semantic vector encoding on each information item in the information library according to the distributed word vector corresponding to each token.
In the word-embedding-based distributed word vector representation, the contextual semantics of the tokens are additive, so the deep semantic vector of each information item can be obtained as a weighted average of the vectors of its tokens.
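A sketch of the weighted averaging, continuing the assumptions above. The text only states that a weighted average is taken; the concrete weighting scheme is not specified, so uniform weights are used here as a placeholder (TF-IDF-style weights would be a natural alternative).

```python
import numpy as np

def item_vector(tokens, model, weights=None):
    """Deep semantic vector of one information item: weighted average of its token vectors.
    The weighting scheme is not given in the text; uniform weights serve as a placeholder."""
    vecs, ws = [], []
    for i, tok in enumerate(tokens):
        if tok in model.wv:
            vecs.append(model.wv[tok])
            ws.append(1.0 if weights is None else weights[i])
    if not vecs:
        return np.zeros(model.vector_size)
    return np.average(np.array(vecs), axis=0, weights=ws)

# One deep semantic vector per information item, shape (N, 250).
item_vectors = np.array([item_vector(tokens, w2v) for tokens in corpus])
```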
Embodiment three
This embodiment further describes the construction of the semantic network.
Fig. 3 is a flowchart of the steps of semantic network construction according to Embodiment three of the present invention.
Step S310: according to the deep semantic vector of each information item, calculate the pairwise similarity of all information items to obtain the semantic similarity matrix.
The semantic similarity matrix contains the similarity of every pair of information items in the information library.
In this embodiment, the purpose of the similarity computation is to find similar evidence support. For example, item A expresses viewpoint a and item B expresses viewpoint b. If viewpoint a and viewpoint b have similar semantics, then items A and B are evidence supporting each other, and the strength of the evidence support can be defined as the semantic similarity S(a, b) of a and b: the higher the similarity, the stronger the evidence support; the lower the similarity, the weaker the evidence support. In this process the deep semantic vectors of items A and B are denoted v_a and v_b respectively.
Although the semantics of the word-embedding-based distributed word vector representation are additive, in this embodiment not only the direction similarity S_pos(a, b) is used; the amplitude similarity S_str(a, b) is added as well, and the two together measure the similarity of the two information items.
The direction similarity S_pos(a, b) can use the cosine similarity, defined as:
S_pos(a, b) = (v_a · v_b) / (‖v_a‖ · ‖v_b‖)
where ‖v_a‖ denotes the modulus of the vector v_a and ‖v_b‖ denotes the modulus of the vector v_b.
The amplitude similarity S_str(a, b) is defined in terms of the magnitudes ‖v_a‖ and ‖v_b‖ of the two deep semantic vectors.
In this way, the similarity of information items A and B can be defined as the weighted sum of the two similarities above:
S(a, b) = λ · S_pos(a, b) + (1 − λ) · S_str(a, b)
where the parameter λ (0.5 < λ < 1) is a preset value used to adjust the weights of the direction similarity and the amplitude similarity. In this embodiment, the direction similarity reflects the consistency of the expressed viewpoints in the orientation of the semantic space, while the amplitude similarity reflects the consistency of their strength in the semantic space; direction is usually more important than strength, so in this embodiment the codomain is S(a, b) ∈ (−λ, 1].
After the similarity of every pair of information items in the library has been calculated, the semantic similarity matrix can be constructed from the obtained similarities.
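A sketch of the pairwise similarity computation under the assumptions above. The direction similarity is the cosine similarity just given; the exact amplitude similarity formula is not reproduced in this text, so the magnitude-ratio form below, which lies in (0, 1] and keeps S(a, b) within (−λ, 1], is an assumption, as is the value λ = 0.7.

```python
import numpy as np

def similarity_matrix(item_vectors, lam=0.7):
    """Semantic similarity matrix: S[a, b] = lam * S_pos(a, b) + (1 - lam) * S_str(a, b), 0.5 < lam < 1."""
    n = len(item_vectors)
    norms = np.linalg.norm(item_vectors, axis=1) + 1e-12
    S = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            s_pos = item_vectors[a] @ item_vectors[b] / (norms[a] * norms[b])   # cosine (direction)
            s_str = min(norms[a], norms[b]) / max(norms[a], norms[b])           # magnitude ratio (assumed form)
            S[a, b] = S[b, a] = lam * s_pos + (1 - lam) * s_str
    return S

M = similarity_matrix(item_vectors)   # dense semantic similarity matrix
```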
Step S320: perform principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix.
Because the deep semantic vectors of information items usually have high dimensionality, the probability that two items are completely semantically orthogonal, i.e. that their similarity is exactly 0, is extremely small; this means the semantic similarity matrix is a dense matrix. The density arises on the one hand because the word-embedding-based distributed word vector representation spreads each meaning over many dimensions, and on the other hand because the library contains high-frequency noise that has little semantic relation to the gist of the items.
To eliminate the influence of semantic high-frequency noise, principal component analysis (PCA) can be performed on the dense semantic similarity matrix: mathematically, a singular value decomposition (SVD) is applied to it, after which the matrix is reconstructed to obtain a sparser representation. The semantic similarity matrix obtained after reconstruction is the sparse semantic similarity matrix, an approximation of the original semantic similarity matrix. Besides removing the influence of some high-frequency noise, it also reduces the computational cost of the subsequent operations, so that the following random walk algorithm can be more robust.
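One way to read this reconstruction step, as a sketch: keep the leading singular components and zero out the weak entries. The number of retained components and the sparsification threshold are not specified in the text and are illustrative assumptions here.

```python
import numpy as np

def sparsify(M, k=50, threshold=0.3):
    """Low-rank SVD reconstruction of the semantic similarity matrix, then thresholding to sparsify.
    k (retained components) and threshold are illustrative parameters."""
    U, s, Vt = np.linalg.svd(M)
    M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]              # principal-component reconstruction
    M_sparse = np.where(M_approx >= threshold, M_approx, 0.0)     # drop weak (high-frequency-noise) links
    np.fill_diagonal(M_sparse, 0.0)                               # no self-loops in the simple graph
    return M_sparse

M_sparse = sparsify(M)   # sparse semantic similarity matrix
```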
Step S330: construct a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
Step 1: construct a weighted undirected simple graph according to the sparse semantic similarity matrix.
A weighted undirected simple graph is a graph in which each pair of vertices is joined by at most one edge, no vertex has an edge to itself (i.e. there are no self-loops), and every edge carries a weight.
The sparse semantic similarity matrix is used as the adjacency matrix to construct the weighted undirected simple graph. This graph is in fact a semantic context network, and each information item corresponds to a node in it.
Step 2: determine the disconnected subgraphs contained in the weighted undirected simple graph.
A disconnected subgraph is a subgraph that is not connected to any other subgraph.
Because the principal component analysis removes the semantic context connections between many nodes, the weighted undirected simple graph may no longer be a simply connected network. For the needs of the subsequent analysis, the mutually disconnected sub-networks (disconnected subgraphs) in the graph must be found and bridges must be built between them, so that the semantic context of the whole network can be connected.
Step 3: query the semantic similarity matrix for the similarity of the node pairs between each pair of disconnected subgraphs.
A node pair consists of two nodes in two disconnected subgraphs: one node lies in one of the disconnected subgraphs and the other node lies in the other.
In order to disturb the original semantic context as little as possible, as few disconnected subgraphs as possible should be connected, while as much as possible of the semantic context between the disconnected subgraphs should be included.
Step 4: in the weighted undirected simple graph, connect the node pair with the maximum similarity and use that maximum similarity as the weight of the connection, forming a simply connected weighted undirected simple graph.
Between every two disconnected subgraphs the following is performed: a first node is determined in the first disconnected subgraph, a second node is determined in the second disconnected subgraph, and the similarity of the first node and the second node is queried in the semantic similarity matrix. The first disconnected subgraph contains multiple first nodes and the second disconnected subgraph contains multiple second nodes; the similarity of each first node with each second node is queried, the obtained similarities are sorted and the maximum similarity is determined, the first node and second node corresponding to the maximum similarity are connected, and that maximum similarity is used as the weight of the connection, so that the first disconnected subgraph and the second disconnected subgraph become connected.
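A sketch of Steps 1 to 4 using the networkx library: the weighted graph is built from the sparse matrix, and its connected components are then bridged with the single highest-similarity node pair looked up in the original dense matrix. Merging the components greedily into the first one is an assumption about the order of bridging, which the text leaves open.

```python
import numpy as np
import networkx as nx

def build_semantic_network(M_sparse, M_dense):
    """Weighted undirected simple graph from the sparse matrix, made simply connected by
    adding maximum-similarity bridges whose weights come from the dense similarity matrix."""
    G = nx.from_numpy_array(M_sparse)                 # nodes = information items, edge weights = similarities
    components = [list(c) for c in nx.connected_components(G)]
    while len(components) > 1:
        base = components[0]
        best = (-np.inf, None, None, None)            # (similarity, node u, node v, component index)
        for idx, comp in enumerate(components[1:], start=1):
            for u in base:
                for v in comp:
                    if M_dense[u, v] > best[0]:
                        best = (M_dense[u, v], u, v, idx)
        sim, u, v, idx = best
        G.add_edge(u, v, weight=sim)                  # bridge with the maximum-similarity node pair
        base.extend(components.pop(idx))              # the two subgraphs are now connected
    return G

G = build_semantic_network(M_sparse, M)               # simply connected semantic network
```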
Embodiment four
This embodiment further describes how the evidence support of information items is scored.
In this embodiment the random walk model is run on the semantic network to complete the evidence support scoring of each node in the semantic network.
For example: the health information library contains N information items, the semantic similarity matrix is M, and the similarity of health items i and j is s_ij. The evidence support score of item i at iteration step t is denoted here as r_i^(t), and the initial evidence support score r_i^(0) of item i is obtained from the semantic similarity matrix.
In the random walk model, the evidence support score of node i follows from the support contributed by the scores of the other nodes in the previous step: one part is obtained from its adjacent nodes, and another part is contributed by a random average over the other nodes. The iterative formula for the score that node i obtains from the other nodes at step t + 1 uses the following quantities. P is a preset value: if two nodes of the semantic network are connected, P is the probability of travelling from a node to an adjacent node, and correspondingly 1 − P is the probability of randomly selecting one of the other nodes, adjacent or not; in this embodiment, preferably 0.5 ≤ P ≤ 1. W denotes the semantic network, i and j are adjacent nodes in the semantic network, and k denotes the other nodes (k ≠ i and k ≠ j); w_ij is the weight of the connection between items i and j, i.e. the similarity of items i and j; w_kj is the weight of the connection between items k and j, i.e. the similarity of items k and j; and s_ik is the similarity of i and k.
With this initial condition and iterative formula, the evidence support score of each node can be obtained.
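The exact initial and iterative formulas are not reproduced in this text; the sketch below therefore only illustrates the general scheme described above (initial scores taken from the semantic similarity matrix, a fraction P propagated along weighted edges, and a fraction 1 − P contributed by a similarity-weighted average over the other nodes) and should not be read as the patented formula. The normalisations, the value P = 0.85 and the fixed number of iterations are assumptions.

```python
import numpy as np

def evidence_support_scores(G, M, P=0.85, steps=100):
    """Illustrative random-walk-style iteration (not the exact patented formula):
    each node's score is updated from its neighbours' weighted scores (fraction P) plus a
    similarity-weighted average over the other nodes (fraction 1 - P)."""
    n = M.shape[0]
    r = M.sum(axis=1) / M.sum()                          # initial scores from the similarity matrix (assumed normalisation)
    for _ in range(steps):
        r_new = np.zeros(n)
        for i in range(n):
            neigh = 0.0
            for j in G.neighbors(i):
                w_ij = G[i][j]["weight"]
                w_j = sum(G[j][k]["weight"] for k in G.neighbors(j))   # node j's total edge weight
                neigh += (w_ij / max(w_j, 1e-12)) * r[j]
            rand = (M[i] @ r) / max(M[i].sum(), 1e-12)   # random contribution weighted by the similarities s_ik
            r_new[i] = P * neigh + (1 - P) * rand
        r = r_new / max(r_new.sum(), 1e-12)              # keep the scores normalised across nodes
    return r

scores = evidence_support_scores(G, M)
library_reliability = scores.mean()                      # average score as the library's reliability
```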
A reliability score of the information library is obtained according to the evidence support scores of the information items corresponding to the nodes of the semantic network. The reliability score of the whole information library can be defined as the average of the evidence support scores of its items, R = (1/N) · Σ_i r_i, where r_i is the converged evidence support score of item i.
This embodiment combines the content analysis of an information item itself with the contextual information of the library in which it resides: an item is required to have corroborating evidence and to be evidentially self-consistent within the context of the library; if, on the contrary, incompatible or even contradictory results appear, the reliability of the item within the library is greatly reduced.
Embodiment five
This embodiment provides an evidence support scoring device for information. Fig. 4 is a structural diagram of the evidence support scoring device for information according to Embodiment five of the present invention.
In this embodiment, the evidence support scoring device 400 for information includes, but is not limited to: a processor 410 and a memory 420.
The processor 410 is configured to execute the evidence support scoring program for information stored in the memory 420, so as to implement the evidence support scoring method for information described in Embodiments one to four.
Specifically, the processor 410 is configured to execute the evidence support scoring program for information stored in the memory 420 to perform the following steps: performing deep semantic vector encoding on each item of information in an information library; calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
Wherein performing deep semantic vector encoding on each item of information in the information library comprises: crawling common words from preset websites and adding the common words to a preset word segmentation tool; using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens; training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
Wherein constructing the semantic network according to the semantic similarity matrix comprises: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
Wherein constructing the simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises: constructing a weighted undirected simple graph according to the sparse semantic similarity matrix; determining the disconnected subgraphs contained in the weighted undirected simple graph; querying the semantic similarity matrix for the similarity of the node pairs between each pair of disconnected subgraphs; and, in the weighted undirected simple graph, connecting the node pair with the maximum similarity and using that maximum similarity as the weight of the connection, so as to form a simply connected weighted undirected simple graph.
Wherein a reliability score of the information library is obtained according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
Embodiment six
The embodiment of the invention also provides a storage medium. The storage medium here stores one or more programs. The storage medium may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk or a solid-state disk; and the memory may also include a combination of the above kinds of memory.
When the one or more programs on the storage medium are executed by one or more processors, the above evidence support scoring method for information is implemented.
Specifically, the processor is configured to execute the evidence support scoring program for information stored in the memory to implement the following steps: performing deep semantic vector encoding on each item of information in an information library; calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix; constructing a semantic network according to the semantic similarity matrix; and scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
Wherein performing deep semantic vector encoding on each item of information in the information library comprises: crawling common words from preset websites and adding the common words to a preset word segmentation tool; using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens; training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token; and performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
Wherein constructing the semantic network according to the semantic similarity matrix comprises: performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix; and constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
Wherein constructing the simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises: constructing a weighted undirected simple graph according to the sparse semantic similarity matrix; determining the disconnected subgraphs contained in the weighted undirected simple graph; querying the semantic similarity matrix for the similarity of the node pairs between each pair of disconnected subgraphs; and, in the weighted undirected simple graph, connecting the node pair with the maximum similarity and using that maximum similarity as the weight of the connection, so as to form a simply connected weighted undirected simple graph.
Wherein a reliability score of the information library is obtained according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
The above is only an embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.

Claims (10)

1. An evidence support scoring method for information, characterized by comprising:
performing deep semantic vector encoding on each item of information in an information library;
calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix;
constructing a semantic network according to the semantic similarity matrix;
scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
2. The method according to claim 1, characterized in that performing deep semantic vector encoding on each item of information in the information library comprises:
crawling common words from preset websites and adding the common words to a preset word segmentation tool;
using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens;
training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token;
performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
3. The method according to claim 1, characterized in that constructing the semantic network according to the semantic similarity matrix comprises:
performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix;
constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
4. The method according to claim 3, characterized in that constructing the simply connected weighted undirected simple graph according to the semantic similarity matrix and the sparse semantic similarity matrix comprises:
constructing a weighted undirected simple graph according to the sparse semantic similarity matrix;
determining the disconnected subgraphs contained in the weighted undirected simple graph;
querying the semantic similarity matrix for the similarity of the node pairs between each pair of disconnected subgraphs;
in the weighted undirected simple graph, connecting the node pair with the maximum similarity and using the maximum similarity as the weight of the connection, to form a simply connected weighted undirected simple graph.
5. The method according to claim 1, characterized in that the method further comprises:
obtaining a reliability score of the information library according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
6. An evidence support scoring device for information, characterized in that the evidence support scoring device for information comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, performing the following steps:
performing deep semantic vector encoding on each item of information in an information library;
calculating the pairwise similarity of all information items according to the deep semantic vector of each item to obtain a semantic similarity matrix;
constructing a semantic network according to the semantic similarity matrix;
scoring the evidence support of the information item corresponding to each node in the semantic network according to a preset random walk model.
7. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following steps:
crawling common words from preset websites and adding the common words to a preset word segmentation tool;
using the word segmentation tool to segment each item of information in the information library to obtain multiple tokens;
training a preset distributed word vector model with the multiple tokens according to a preset distributed word vector representation method to obtain the distributed word vector corresponding to each token;
performing deep semantic vector encoding on each item of information in the information library according to the distributed word vector corresponding to each token.
8. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following steps:
performing principal component analysis on the semantic similarity matrix to construct a sparse semantic similarity matrix;
constructing a simply connected weighted undirected simple graph as the semantic network according to the semantic similarity matrix and the sparse semantic similarity matrix.
9. The device according to claim 6, characterized in that the processor is further configured to execute the computer program stored in the memory to perform the following step:
obtaining a reliability score of the information library according to the evidence support scores of the information items corresponding to the nodes in the semantic network.
10. A storage medium, characterized in that an evidence support scoring program for information is stored on the storage medium, and when the evidence support scoring program for information is executed by a processor, the steps of the evidence support scoring method for information according to any one of claims 1 to 5 are implemented.
CN201811302326.4A 2018-11-02 2018-11-02 Data support scoring method and equipment for information and storage medium Active CN109582953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811302326.4A CN109582953B (en) 2018-11-02 2018-11-02 Data support scoring method and equipment for information and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811302326.4A CN109582953B (en) 2018-11-02 2018-11-02 Data support scoring method and equipment for information and storage medium

Publications (2)

Publication Number Publication Date
CN109582953A true CN109582953A (en) 2019-04-05
CN109582953B CN109582953B (en) 2023-04-07

Family

ID=65921410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811302326.4A Active CN109582953B (en) 2018-11-02 2018-11-02 Data support scoring method and equipment for information and storage medium

Country Status (1)

Country Link
CN (1) CN109582953B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027320A (en) * 2019-11-15 2020-04-17 北京三快在线科技有限公司 Text similarity calculation method and device, electronic equipment and readable storage medium
CN112100221A (en) * 2019-06-17 2020-12-18 腾讯科技(北京)有限公司 Information recommendation method and device, recommendation server and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130110496A1 (en) * 2011-10-28 2013-05-02 Sap Ag Calculating Term Similarity Using A Meta-Model Semantic Network
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN104408115A (en) * 2014-11-25 2015-03-11 三星电子(中国)研发中心 Semantic link based recommendation method and device for heterogeneous resource of TV platform
CN105808648A (en) * 2016-02-25 2016-07-27 焦点科技股份有限公司 R language program based personalized recommendation method
CN105824797A (en) * 2015-01-04 2016-08-03 华为技术有限公司 Method, device and system evaluating semantic similarity
CN105893362A (en) * 2014-09-26 2016-08-24 北大方正集团有限公司 A method for acquiring knowledge point semantic vectors and a method and a system for determining correlative knowledge points
CN107193805A (en) * 2017-06-06 2017-09-22 北京百度网讯科技有限公司 Article Valuation Method, device and storage medium based on artificial intelligence
CN107526850A (en) * 2017-10-12 2017-12-29 燕山大学 Social networks friend recommendation method based on multiple personality feature mixed architecture
CN108399163A (en) * 2018-03-21 2018-08-14 北京理工大学 Bluebeard compound polymerize the text similarity measure with word combination semantic feature

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130110496A1 (en) * 2011-10-28 2013-05-02 Sap Ag Calculating Term Similarity Using A Meta-Model Semantic Network
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN105893362A (en) * 2014-09-26 2016-08-24 北大方正集团有限公司 A method for acquiring knowledge point semantic vectors and a method and a system for determining correlative knowledge points
CN104408115A (en) * 2014-11-25 2015-03-11 三星电子(中国)研发中心 Semantic link based recommendation method and device for heterogeneous resource of TV platform
CN105824797A (en) * 2015-01-04 2016-08-03 华为技术有限公司 Method, device and system evaluating semantic similarity
CN105808648A (en) * 2016-02-25 2016-07-27 焦点科技股份有限公司 R language program based personalized recommendation method
CN107193805A (en) * 2017-06-06 2017-09-22 北京百度网讯科技有限公司 Article Valuation Method, device and storage medium based on artificial intelligence
CN107526850A (en) * 2017-10-12 2017-12-29 燕山大学 Social networks friend recommendation method based on multiple personality feature mixed architecture
CN108399163A (en) * 2018-03-21 2018-08-14 北京理工大学 Bluebeard compound polymerize the text similarity measure with word combination semantic feature

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
张仰森等: "基于双重注意力模型的微博情感分析方法", 《清华大学学报(自然科学版)》 *
李晓红等: "一种基于谱分割的短文本聚类算法", 《计算机工程》 *
李璐?等: "虚假评论检测研究综述", 《计算机学报》 *
王阳等: "融合语义相似度与矩阵分解的评分预测算法", 《计算机应用》 *
郭鸿奇等: "一种基于词语多原型向量表示的句子相似度计算方法", 《智能计算机与应用》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100221A (en) * 2019-06-17 2020-12-18 腾讯科技(北京)有限公司 Information recommendation method and device, recommendation server and storage medium
CN112100221B (en) * 2019-06-17 2024-02-13 深圳市雅阅科技有限公司 Information recommendation method and device, recommendation server and storage medium
CN111027320A (en) * 2019-11-15 2020-04-17 北京三快在线科技有限公司 Text similarity calculation method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN109582953B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Interdonato et al. Multilayer network simplification: approaches, models and methods
JP5904559B2 (en) Scenario generation device and computer program therefor
Dominguez-Sal et al. A discussion on the design of graph database benchmarks
WO2015093540A1 (en) Phrase pair gathering device and computer program therefor
Edwards et al. Identifying wildlife observations on twitter
WO2015093539A1 (en) Complex predicate template gathering device, and computer program therefor
López-Cruz et al. Bayesian network modeling of the consensus between experts: An application to neuron classification
Cécillon et al. Graph embeddings for abusive language detection
Pal et al. Deep learning for network analysis: problems, approaches and challenges
Bao et al. Discovering interesting co-location patterns interactively using ontologies
He et al. Neurally-guided semantic navigation in knowledge graph
CN109582953A (en) A kind of speech of information is according to support methods of marking, equipment and storage medium
Huang et al. Knowledge sharing and reuse in digital forensics
Gupta et al. Fake News Analysis and Graph Classification on a COVID-19 Twitter Dataset
CN114218445A (en) Anomaly detection method based on dynamic heterogeneous information network representation of metagraph
Wei et al. DF-Miner: Domain-specific facet mining by leveraging the hyperlink structure of Wikipedia
Bi et al. Judicial knowledge-enhanced magnitude-aware reasoning for numerical legal judgment prediction
Invernici et al. Exploring the evolution of research topics during the COVID-19 pandemic
CN109558586A (en) A kind of speech of information is according to from card methods of marking, equipment and storage medium
CN111209745B (en) Information reliability evaluation method, equipment and storage medium
Křenková et al. Similarity search with the distance density model
CN115759110A (en) Malicious information detection method, device and system based on multi-feature fusion
Chen English translation template retrieval based on semantic distance ontology knowledge recognition algorithm
Janik et al. Explaining Link Predictions in Knowledge Graph Embedding Models with Influential Examples
Fang et al. Transpath: Representation learning for heterogeneous information networks via translation mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant