CN106570162A - Canard identification method and device based on artificial intelligence - Google Patents

Canard identification method and device based on artificial intelligence

Info

Publication number
CN106570162A
Authority
CN
China
Prior art keywords
text
identified
vector
rumour
term vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610974822.9A
Other languages
Chinese (zh)
Other versions
CN106570162B (en)
Inventor
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610974822.9A
Publication of CN106570162A
Application granted
Publication of CN106570162B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a canard identification method and device based on artificial intelligence, wherein the method comprises the following steps of: obtaining a text to be identified; on the basis of a BOW (Bag of Words) model, generating a word vector corresponding to the text to be identified; converting the word vector into a vector, the length of which is 2, by utilizing a projection matrix model; and taking the vector, the length of which is 2, as the input, and calculating the probability that the text to be identified is a canard through a regression function SOFTMAX. By means of the canard identification method and device provided by the invention, network canards in internet information can be identified rapidly; and the network canard identification rate and timeliness can be increased.

Description

Rumour identification method and device based on artificial intelligence
Technical field
The present invention relates to the field of Internet technology, and in particular to a rumour identification method and device based on artificial intelligence.
Background
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. The development of artificial intelligence has also driven progress in other related technologies, such as network rumour identification.
A network rumour is a statement without factual basis that is spread through network media (such as e-mail, chat software, social networking sites and online forums). Such rumours mainly concern emergencies, public affairs, celebrities and senior officials, and content that subverts tradition or departs from orthodoxy. The spread of network rumours easily disrupts normal social order and has an adverse effect on society.
With the continuous development of Internet technology, Internet information spreads faster and faster, and network rumours spread faster with it. How to effectively identify network rumours in Internet information has therefore become an urgent problem in the field of Internet technology.
Existing network rumour identification methods usually judge whether a piece of network information is a rumour according to a preset keyword list. When Internet information contains a word that matches a word in the keyword list, the information is considered to be a rumour. Because these methods rely on preset keywords, their identification rate is low, and because the keyword list lags behind events, their timeliness is poor.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the present invention is to propose a rumour identification method based on artificial intelligence, which can quickly identify network rumours in Internet information and improve the identification rate and timeliness of network rumour identification.
A second object of the present invention is to propose a rumour identification device based on artificial intelligence.
A third object of the present invention is to propose a terminal.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a rumour identification method based on artificial intelligence, comprising: obtaining a text to be identified; generating a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model; converting the term vector into a vector of length 2 using a projection matrix model; and, taking the vector of length 2 as input, calculating the probability that the text to be identified is a rumour through the regression function SOFTMAX.
In the rumour identification method based on artificial intelligence proposed by the embodiment of the first aspect of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a rumour identification device based on artificial intelligence, comprising: an acquisition module for obtaining a text to be identified; a generation module for generating a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model; a conversion module for converting the term vector into a vector of length 2 using a projection matrix model; and a calculation module for taking the vector of length 2 as input and calculating the probability that the text to be identified is a rumour through the regression function SOFTMAX.
In the rumour identification device based on artificial intelligence proposed by the embodiment of the second aspect of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes a terminal, comprising: a processor; and a memory for storing instructions executable by the processor. The processor is configured to perform the following steps:
Obtain a text to be identified;
Generate a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model;
Convert the term vector into a vector of length 2 using a projection matrix model;
Taking the vector of length 2 as input, calculate the probability that the text to be identified is a rumour through the regression function SOFTMAX.
In the terminal proposed by the embodiment of the third aspect of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium for storing one or more programs. When the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a rumour identification method based on artificial intelligence, the method comprising:
Obtain a text to be identified;
Generate a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model;
Convert the term vector into a vector of length 2 using a projection matrix model;
Taking the vector of length 2 as input, calculate the probability that the text to be identified is a rumour through the regression function SOFTMAX.
With the non-transitory computer-readable storage medium proposed by the embodiment of the fourth aspect of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product. When the instructions in the computer program product are executed by a processor, a rumour identification method based on artificial intelligence is performed, the method comprising:
Obtain a text to be identified;
Generate a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model;
Convert the term vector into a vector of length 2 using a projection matrix model;
Taking the vector of length 2 as input, calculate the probability that the text to be identified is a rumour through the regression function SOFTMAX.
With the computer program product proposed by the embodiment of the fifth aspect of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a rumour identification method based on artificial intelligence according to one embodiment of the present invention;
Fig. 2 is a schematic flowchart of generating a term vector corresponding to the text to be identified based on the BOW model;
Fig. 3 is an example diagram illustrating the present embodiment, taking web document content as the text to be identified;
Fig. 4 is a schematic flowchart of a rumour identification method based on artificial intelligence according to another embodiment of the present invention;
Fig. 5 is a schematic flowchart of training the parameters of the projection matrix model;
Fig. 6 is a schematic flowchart of a rumour identification method based on artificial intelligence according to yet another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a rumour identification device based on artificial intelligence according to one embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a rumour identification device based on artificial intelligence according to another embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a rumour identification device based on artificial intelligence according to yet another embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a rumour identification device based on artificial intelligence according to still another embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a rumour identification device based on artificial intelligence according to a further embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents that fall within the spirit and scope of the appended claims.
Fig. 1 is a schematic flowchart of a rumour identification method based on artificial intelligence according to one embodiment of the present invention.
As shown in Fig. 1, the rumour identification method based on artificial intelligence of this embodiment comprises:
S11: Obtain a text to be identified.
In this embodiment, to judge whether a piece of Internet information is a rumour, the Internet information is first obtained over the Internet as the text to be identified.
The Internet information may be a relatively long web document, the title of a news item, and so on.
S12: Generate a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model.
In this embodiment, once the text to be identified has been obtained, the term vector corresponding to it can be generated according to the bag-of-words (Bag Of Words, BOW) model.
The BOW model is a document representation commonly used in the field of information retrieval. In information retrieval, the BOW model ignores the word order, grammar and syntax of a document and regards the document merely as a collection of words. The occurrence of each word in the document is treated as independent, that is, it does not depend on whether other words occur.
Suppose a large document collection contains a number of documents. All words in all documents are extracted to form a dictionary containing Q words. Using the BOW model, each document can be represented as a Q-dimensional vector, where Q is a positive integer. The i-th element of the vector represents the number of times the i-th word in the dictionary occurs in the document, where i is a positive integer.
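The Q-dimensional count representation described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the sample documents are invented, whitespace splitting stands in for a real word segmenter, and indices start at 0 rather than 1.

```python
from collections import Counter

def build_dictionary(documents):
    """Collect every distinct word across the documents, in first-seen order."""
    dictionary = {}
    for doc in documents:
        for word in doc.split():  # assumption: whitespace tokenisation
            dictionary.setdefault(word, len(dictionary))
    return dictionary

def bow_vector(document, dictionary):
    """Return a Q-dimensional vector; element i counts occurrences of word i."""
    counts = Counter(document.split())
    vector = [0] * len(dictionary)
    for word, index in dictionary.items():
        vector[index] = counts[word]
    return vector

# Invented two-document collection used only for illustration.
docs = ["the moon is made of cheese", "the moon orbits the earth"]
dictionary = build_dictionary(docs)   # Q = 8 distinct words
vec = bow_vector(docs[1], dictionary)
```

Because only counts are kept, two documents containing the same words in a different order produce the same vector, which is the order-independence the BOW assumption implies.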
Specifically, as shown in Fig. 2, generating the term vector corresponding to the text to be identified based on the BOW model may comprise the following steps:
S121: Segment the text to be identified into multiple segmented words.
In this embodiment, to obtain the term vector corresponding to the text to be identified based on the BOW model, word segmentation is first performed on the text using related techniques after the text is obtained, so that the text is split into multiple segmented words.
S122: Obtain the segment vectors corresponding to the segmented words.
In this embodiment, after word segmentation has been performed on the text to be identified, the segment vector corresponding to each segmented word is obtained. The segment vector of each segmented word can be obtained by looking it up in a dictionary.
Specifically, suppose there is a dictionary containing N words and the term vector size (embedding size) of each word is M. The dictionary can then be represented as an N*M term vector matrix, where N and M are positive integers and M is usually set between 50 and 1000. For a given word, suppose the word corresponds to row k of the term vector matrix, where k is a positive integer; the term vector of the word can then be obtained by looking up that row of the matrix.
Take web document content, denoted content, as the text to be identified. Using related techniques, content is first segmented into multiple segmented words, denoted w_1, w_2, ..., w_n. Then, according to the term vector matrix described above, the term vector corresponding to each of the n segmented words is obtained by dictionary lookup and denoted emb(w_1), emb(w_2), ..., emb(w_n).
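The row lookup described above can be sketched with a toy N*M matrix. The words, the matrix values and the names `word_to_row`, `embedding_matrix` and `lookup` are all hypothetical, chosen only to illustrate the idea; rows are numbered from 0 here, whereas the text numbers row k from 1.

```python
# Hypothetical toy dictionary: N = 4 words, embedding size M = 3.
word_to_row = {"moon": 0, "cheese": 1, "earth": 2, "orbit": 3}
embedding_matrix = [
    [0.1, 0.2, 0.3],    # row for "moon"
    [0.0, -0.5, 0.4],   # row for "cheese"
    [0.7, 0.1, -0.2],   # row for "earth"
    [0.3, 0.3, 0.3],    # row for "orbit"
]

def lookup(word):
    """Return the term vector stored in the word's row of the N x M matrix."""
    k = word_to_row[word]
    return embedding_matrix[k]

# emb(w_1), ..., emb(w_n) for an already segmented text.
segmented = ["moon", "orbit", "earth"]
term_vectors = [lookup(w) for w in segmented]
```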
It should be noted that a term vector is a way of digitising the words of a language, that is, of representing a word as a vector. The simplest term vector representation is the one-hot representation, which represents each word as a very long vector whose dimension equals the size of the vocabulary. Exactly one component of the vector is "1" and all other components are "0"; the position of the "1" corresponds to the position of the word in the vocabulary. For example, "mike" is represented as [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ...]. Counting from 0, "mike" is recorded as 8, which indicates that the word occupies position 8 in the vocabulary. Another term vector representation is the distributed representation. Its basic idea is to map each word of a language, through training, to a vector of fixed length. All of these vectors together form a term vector space in which each vector is a point. By introducing the notion of "distance" in this space, the grammatical and semantic similarity between words can be judged according to the distance between them. The present invention may represent term vectors with either method, and no limitation is imposed in this respect.
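The contrast between the two representations can be illustrated with a short sketch. The dense vectors below are invented rather than trained, and cosine similarity is used as one possible choice of "distance"; the text does not name a specific measure.

```python
import math

def one_hot(index, vocabulary_size):
    """Vector with a single 1 at the word's vocabulary position (0-based)."""
    vector = [0] * vocabulary_size
    vector[index] = 1
    return vector

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "mike" at position 8 of a 16-word vocabulary, as in the example above.
mike_one_hot = one_hot(8, 16)

# Hypothetical distributed vectors (hand-picked, not the result of training).
dense = {
    "mike": [0.9, 0.1, 0.3],
    "microphone": [0.8, 0.2, 0.35],
    "banana": [-0.4, 0.9, 0.0],
}
related = cosine_similarity(dense["mike"], dense["microphone"])
unrelated = cosine_similarity(dense["mike"], dense["banana"])
```

Any two distinct one-hot vectors have a cosine similarity of 0, so one-hot vectors carry no similarity information, while distributed vectors can place related words close together.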
S123: Using the BOW model, operate on the segment vectors to generate the term vector corresponding to the text to be identified.
In this embodiment, once the segment vectors have been obtained, the BOW model can be used to operate on them to generate the term vector corresponding to the text to be identified.
Specifically, the BOW model performs a simple summation over the segment vectors of the segmented words, that is, the segment vectors are added element by element, and the result is the term vector corresponding to the text to be identified.
Therefore, the term vector rep(content) corresponding to the content described above can be expressed as:
rep(content) = sum(emb(w_1), emb(w_2), ..., emb(w_n))
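The element-wise summation in the formula above can be sketched as follows. The embedding values and the function name `bow_sum` are hypothetical and serve only to make the arithmetic visible.

```python
def bow_sum(term_vectors):
    """Add equal-length term vectors element by element."""
    return [sum(components) for components in zip(*term_vectors)]

# Hypothetical embeddings with M = 3.
emb = {
    "moon": [1.0, 0.5, -2.0],
    "cheese": [0.0, 0.25, 1.0],
}
segmented = ["moon", "cheese", "moon"]           # w_1, w_2, w_3
rep = bow_sum([emb[w] for w in segmented])       # rep(content)
```

The result has the same length M as each input vector regardless of how many segmented words the text contains, which is what allows texts of different lengths to feed the same projection matrices.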
S13: Convert the term vector into a vector of length 2 using a projection matrix model.
In this embodiment, once the term vector corresponding to the text to be identified has been obtained, the projection matrix model can be used to convert it into a vector of length 2.
Specifically, converting the term vector into a vector of length 2 using the projection matrix model comprises: performing a projection operation on the term vector using the projection matrix model to generate a matrix corresponding to the term vector; and operating on that matrix with a nonlinear transformation function to generate the vector of length 2. The nonlinear transformation function is one of a sigmoid function, a tangent function and an activation function.
It should be noted that the projection operation and the operation with the nonlinear transformation function are not limited to a single pass. The vector of length 2 may be obtained through multiple rounds of these operations, and the present invention is not limited in this respect.
Again take web document content, denoted content, as the text to be identified, and let rep(content) be the term vector of content obtained above with the BOW model. rep(content) is a vector of length N. The projection matrix model performs a projection operation on rep(content), that is, rep(content) is multiplied by an N*M matrix, giving a vector of length M. A nonlinear transformation function such as the sigmoid function is then applied to this vector, and the result is still a vector of length M. A further projection operation is performed on the vector obtained after the nonlinear operation by multiplying it by an M*2 matrix, which gives the vector of length 2. Here M and N are positive integers.
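The two projections with a sigmoid in between can be sketched as below. The matrix values are hand-picked for illustration, whereas in the method they would come from training; the names `matvec` and `project_to_two` are hypothetical. No bias terms are included, since the text does not mention any.

```python
import math

def matvec(vector, matrix):
    """Multiply a row vector of length R by an R x C matrix; result has length C."""
    columns = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(columns)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def project_to_two(rep, first_projection, second_projection):
    """N-vector -> (N x M) projection -> sigmoid -> (M x 2) projection."""
    hidden = [sigmoid(h) for h in matvec(rep, first_projection)]
    return matvec(hidden, second_projection)

# Hypothetical sizes: N = 3, M = 2.
rep = [2.0, 1.25, -3.0]
w1 = [[0.5, -0.5], [1.0, 0.0], [0.25, 0.25]]   # N x M
w2 = [[1.0, -1.0], [0.5, 2.0]]                 # M x 2
logits = project_to_two(rep, w1, w2)           # vector of length 2
```

Stacking further projection and nonlinearity pairs before the final M*2 projection would correspond to the multiple rounds of operations that the text permits.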
S14: Taking the vector of length 2 as input, calculate the probability that the text to be identified is a rumour through the regression function SOFTMAX.
In this embodiment, after the term vector corresponding to the text to be identified has been converted into a vector of length 2, that vector is used as input and the probability that the text to be identified is a rumour is calculated by the regression function SOFTMAX.
SOFTMAX is a function that can solve multi-class classification problems. In essence, it maps an arbitrary K-dimensional real vector to another K-dimensional real vector in which each element lies in the interval (0, 1) and all elements sum to 1, where K is a positive integer. Therefore, when a vector of length 2 is input, the output of the SOFTMAX function is also a vector of length 2 whose two elements each lie between 0 and 1 and sum to 1, so they can represent, respectively, the probability that the text to be identified is a rumour and the probability that it is not.
For example, the output of the SOFTMAX function can be expressed as y = [y_1, y_2, ..., y_k], where k is a positive integer denoting the length of the output vector. In this embodiment, k = 2. Suppose the first element of the length-2 output vector represents the probability that the text to be identified is not a rumour and the second element represents the probability that it is a rumour. If, for a given text to be identified, the output vector of the SOFTMAX function is y = [0.8, 0.2], that is, the probability that it is not a rumour is 0.8 and the probability that it is a rumour is 0.2, the text is not a rumour. If the output vector is y = [0.26, 0.74], that is, the probability that it is not a rumour is 0.26 and the probability that it is a rumour is 0.74, the text is a rumour.
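A minimal sketch of the SOFTMAX step for k = 2 follows. Subtracting the maximum before exponentiating is a common numerical-stability measure added here as an assumption, and deciding by comparing the two probabilities mirrors the worked examples above rather than a rule stated in the text.

```python
import math

def softmax(values):
    """Map a real vector to positive values that sum to 1."""
    top = max(values)                         # shift for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def is_rumour(length_two_vector):
    """Element 0: probability of not a rumour; element 1: probability of a rumour."""
    not_rumour_p, rumour_p = softmax(length_two_vector)
    return rumour_p > not_rumour_p

probs = softmax([0.0, math.log(3.0)])   # approximately [0.25, 0.75]
```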
The rumour identification method based on artificial intelligence of this embodiment can be deployed and used in servers such as content production servers (for example, Mhkc) and content forwarding servers (for example, "bing must be answered") to judge whether Internet information is a rumour.
The present embodiment is described in detail below, taking web document content as the text to be identified. As shown in Fig. 3, in this embodiment, existing official report samples and rumour samples are first obtained from an Internet database and labelled as positive examples and negative examples respectively. They are used as training samples and trained with a gradient-based model to obtain the parameters of the projection matrix model. After a piece of web document content is obtained from the Internet, its term vector representation is generated based on the BOW model. Using the trained parameters of the projection matrix model, this term vector representation is converted into a vector of length 2, which serves as the input of the SOFTMAX function. Finally, whether the web document content is a rumour is judged according to the output of the SOFTMAX function.
In the rumour identification method based on artificial intelligence proposed by the embodiment of the present invention, a text to be identified is obtained, a term vector corresponding to the text is generated based on the bag-of-words model, the term vector is converted into a vector of length 2 using the projection matrix model, and that vector is used as the input of a regression function that calculates the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, and the identification rate and timeliness of network rumour identification are improved.
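The inference path of steps S11 to S14 can be put together in one small sketch. Everything concrete here is an assumption made for illustration: the vocabulary, the hand-set embedding and projection values (which would be trained in practice), whitespace segmentation, and the choice to skip words that are missing from the dictionary.

```python
import math

def rumour_probability(text, emb, w1, w2):
    """BOW sum -> projection -> sigmoid -> projection -> softmax; returns P(rumour)."""
    size = len(next(iter(emb.values())))
    rep = [0.0] * size
    for word in text.split():                 # assumption: whitespace segmentation
        if word in emb:                       # assumption: unknown words are skipped
            rep = [r + e for r, e in zip(rep, emb[word])]
    hidden = [
        1.0 / (1.0 + math.exp(-sum(r * row[j] for r, row in zip(rep, w1))))
        for j in range(len(w1[0]))
    ]
    logits = [sum(h * row[j] for h, row in zip(hidden, w2)) for j in range(2)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    return exps[1] / sum(exps)                # index 1 is taken to mean "rumour"

# Hypothetical hand-set parameters, not trained values.
emb = {
    "miracle": [2.0, 0.0], "cure": [1.0, 0.0],
    "official": [0.0, 2.0], "statement": [0.0, 1.0],
}
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[-2.0, 2.0], [2.0, -2.0]]

p_suspicious = rumour_probability("miracle cure", emb, w1, w2)
p_official = rumour_probability("official statement", emb, w1, w2)
```

With these illustrative parameters the first text scores above 0.5 and the second below it. A text with no known words yields exactly 0.5, which suggests that a real deployment would need a policy for out-of-vocabulary input.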
Fig. 4 is a schematic flowchart of a rumour identification method based on artificial intelligence according to another embodiment of the present invention.
As shown in Fig. 4, on the basis of the above embodiment, before the term vector is converted into a vector of length 2 using the projection matrix model, the method may further comprise the following step:
S15: Train the parameters of the projection matrix model.
In this embodiment, in order to convert the term vector of the text to be identified into a vector of length 2 using the projection matrix model, the parameters of the projection matrix model must first be trained.
It should be noted that training the parameters of the projection matrix model need not be carried out after step S12; it may be carried out at any time before step S13 is executed, and the present invention is not limited in this respect.
Specifically, as shown in Fig. 5, training the parameters of the projection matrix model may comprise the following steps:
S151: Obtain sample data, the sample data comprising official report samples and rumour samples.
In this embodiment, sample data can be obtained from an Internet database as the training data required for training the parameters of the projection matrix model. The sample data comprises official report samples and rumour samples.
S152: Take the official report samples as positive examples and the rumour samples as negative examples, train to generate the parameters, and optimise the parameters using a gradient-based model.
In this embodiment, after the sample data is obtained, the official report samples are labelled as positive examples and the rumour samples are labelled as negative examples. Once labelling is complete, the labelled sample data is used as training data to generate the parameters of the projection matrix model, and the generated parameters are then optimised using a gradient-based model.
Specifically, the parameters of the projection matrix model can be generated by training on the sample data with either a pair-wise training method based on sample pairs or a point-wise training method based on single samples. There are also various gradient-based methods for optimising the parameters, such as the stochastic gradient descent (Stochastic Gradient Descent, SGD) algorithm, the momentum (Momentum) algorithm, the adaptive gradient (Adaptive Gradient, AdaGrad) algorithm and the back propagation (Back Propagation, BP) algorithm.
Take the SGD optimisation algorithm as an example. The idea of SGD is to iteratively update the parameters of the projection matrix model by computing the gradient (the partial derivatives with respect to the parameters) on a group of sample data. In each iteration, the gradient of the parameters obtained from the previous iteration is multiplied by the learning rate, that is, the step size, and the result is applied as an update to the parameters. After many iterations, the difference between the values produced with the final parameters and the actual values, as measured by the negative log loss function, converges.
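The iterative update described above can be sketched on a deliberately reduced model: a single M*2 projection followed by softmax, trained point-wise with SGD on the negative log loss. The toy features, the labels (1 for rumour, 0 for official report), the learning rate and the epoch count are all assumptions for illustration and not values from the patent.

```python
import math
import random

def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features, weights):
    """Project a feature vector with an M x 2 matrix, then apply softmax."""
    logits = [sum(x * row[j] for x, row in zip(features, weights)) for j in range(2)]
    return softmax(logits)

def negative_log_loss(samples, weights):
    """Mean of -log(probability assigned to the true label)."""
    return sum(-math.log(predict(f, weights)[y]) for f, y in samples) / len(samples)

def sgd_train(samples, dim, learning_rate=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in range(dim)]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                      # visit samples in random order
        for features, label in data:
            probs = predict(features, weights)
            for i, x in enumerate(features):
                for j in range(2):
                    # Gradient of the negative log loss w.r.t. weights[i][j].
                    gradient = x * (probs[j] - (1.0 if j == label else 0.0))
                    weights[i][j] -= learning_rate * gradient
    return weights

# Toy labelled data: label 1 = rumour sample, label 0 = official report sample.
samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
loss_before = negative_log_loss(samples, [[0.0, 0.0], [0.0, 0.0]])
trained = sgd_train(samples, dim=2)
loss_after = negative_log_loss(samples, trained)
```

Training the full two-projection model would additionally require propagating the gradient back through the sigmoid layer, which is where the back propagation algorithm mentioned above comes in.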
It should be noted that the parameters of the projection matrix model may be generated by training on the sample data with the pair-wise method or the point-wise method, or with other training methods. In addition, other loss functions may be used as the optimisation objective, such as the 0-1 loss function, the quadratic loss function and the absolute loss function. The present invention places no restriction on the training method, the optimisation method or the optimisation objective function used for the parameters.
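For reference, the loss functions named above can be written for a single sample as in the sketch below. The function names and argument conventions are assumptions made for illustration.

```python
import math

def negative_log_loss(probability_of_true_label):
    """-log of the probability the model assigns to the correct label."""
    return -math.log(probability_of_true_label)

def zero_one_loss(predicted_label, true_label):
    """0 when the prediction is correct, 1 otherwise."""
    return 0 if predicted_label == true_label else 1

def quadratic_loss(predicted_value, true_value):
    """Squared difference between prediction and target."""
    return (predicted_value - true_value) ** 2

def absolute_loss(predicted_value, true_value):
    """Absolute difference between prediction and target."""
    return abs(predicted_value - true_value)
```

One practical consideration, noted here as general background rather than as a statement from the patent: the 0-1 loss is piecewise constant, so its gradient is zero almost everywhere and gradient-based optimisers typically work with a smooth surrogate such as the negative log loss.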
In the rumour recognition method based on artificial intelligence proposed by the embodiments of the present invention, official report samples and rumour samples are obtained as sample data and used as positive examples and negative examples, respectively, to train the parameters of the projection matrix model, and the parameters are optimized with a gradient-based model. This makes the output of the projection matrix model more accurate and further improves the accuracy of network rumour identification.
Fig. 6 is a schematic flowchart of the rumour recognition method based on artificial intelligence proposed by yet another embodiment of the present invention.
As shown in Fig. 6, on the basis of the above embodiments, after the probability that the text to be identified is a rumour is calculated, the method may further include the following step:
S16: Process the text to be identified accordingly, based on the probability.
In this embodiment, once the probability that the text to be identified is a rumour has been calculated, it can be judged whether the text is a rumour, and the text is processed accordingly based on the result of that judgment. For example, if the text to be identified is determined to be a rumour, measures such as closing the account or marking the text conspicuously may be applied; if the text is determined not to be a rumour, it is displayed directly.
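Step S16 may be sketched as follows, purely by way of example. The embodiments do not specify how the probability is turned into a judgment; the threshold of 0.5, the function name `handle_text`, and the action labels are assumptions made for illustration.

```python
def handle_text(text, rumour_probability, threshold=0.5):
    """Decide how to process a text given its rumour probability.

    The threshold is an assumed cut-off, not a value taken from the embodiments.
    """
    if not 0.0 <= rumour_probability <= 1.0:
        raise ValueError("rumour_probability must lie in [0, 1]")
    if rumour_probability >= threshold:
        # Judged to be a rumour: mark it conspicuously instead of showing it as-is.
        return {"text": text, "is_rumour": True, "action": "flag"}
    # Judged not to be a rumour: display the text directly.
    return {"text": text, "is_rumour": False, "action": "display"}
```

For example, `handle_text("some post", 0.9)` yields the `"flag"` action, while `handle_text("some post", 0.1)` yields `"display"`.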
In the rumour recognition method based on artificial intelligence proposed by the embodiments of the present invention, after the probability that the text to be identified is a rumour is calculated, the text is processed accordingly based on that probability. This helps users identify network rumours and protects them from harm caused by rumour information.
In order to implement the above embodiments, the present invention further proposes a rumour identification device based on artificial intelligence. Fig. 7 is a schematic structural diagram of the rumour identification device based on artificial intelligence proposed by one embodiment of the present invention.
As shown in Fig. 7, the rumour identification device based on artificial intelligence of this embodiment includes an acquisition module 710, a generation module 720, a conversion module 730, and a computing module 740, wherein:
The acquisition module 710 is configured to obtain the text to be identified.
The generation module 720 is configured to generate the term vector corresponding to the text to be identified based on the bag-of-words (BOW) model.
Specifically, as shown in Fig. 8, the generation module 720 includes:
A segmentation unit 721, configured to segment the text to be identified into multiple word segments.
A first acquisition unit 722, configured to obtain the word-segment vectors corresponding to the multiple word segments.
An arithmetic unit 723, configured to perform an operation on the word-segment vectors using the BOW model, so as to generate the term vector corresponding to the text to be identified.
The conversion module 730 is configured to convert the term vector into a vector of length 2 using the projection matrix model.
Specifically, the conversion module 730 is configured to:
Project the term vector using the projection matrix model to generate the matrix corresponding to the term vector;
Apply a nonlinear transformation function to the matrix to generate the vector of length 2.
The nonlinear transformation function includes one of a sigmoid function, a tangent function, and an activation function.
The computing module 740 is configured to take the vector of length 2 as input and calculate, through the softmax regression function, the probability that the text to be identified is a rumour.
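The cooperation of the four modules may be illustrated by the following simplified sketch. It rests on several assumptions that are not part of the embodiments: a whitespace splitter stands in for a real word segmenter, the word-vector table `WORD_VECTORS` and the projection matrix `W` are hand-picked rather than trained, the BOW operation is taken to be a sum of the word vectors, the projection and the nonlinear transformation are collapsed into a single step, `tanh` is chosen as the nonlinear function, and index 1 of the softmax output is taken to mean "rumour".

```python
import math

# Hypothetical word-vector table; a real system would load trained vectors.
WORD_VECTORS = {
    "shocking": [0.9, 0.1, 0.0],
    "secret":   [0.8, 0.0, 0.2],
    "official": [0.0, 0.9, 0.1],
    "report":   [0.1, 0.8, 0.1],
}
DIM = 3

def segment(text):
    """Stand-in segmenter: lowercases the text and splits it on whitespace."""
    return text.lower().split()

def bow_vector(words):
    """Sum the vectors of the known words, ignoring word order (bag of words)."""
    total = [0.0] * DIM
    for word in words:
        for i, value in enumerate(WORD_VECTORS.get(word, [0.0] * DIM)):
            total[i] += value
    return total

def project(W, x):
    """Multiply the 2 x DIM matrix W by x and apply tanh, giving a length-2 vector."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def rumour_probability(text, W):
    """Run the pipeline and return the probability assigned to 'rumour' (index 1)."""
    return softmax(project(W, bow_vector(segment(text))))[1]

# Hand-picked, untrained projection matrix, for illustration only.
W = [[-1.0, 1.0, 0.0],   # row 0: score for "not rumour"
     [1.0, -1.0, 0.0]]   # row 1: score for "rumour"
```

With these assumed values, `rumour_probability("Shocking secret", W)` is above 0.5 and `rumour_probability("official report", W)` is below 0.5; a text made up only of unknown words maps to the zero vector and therefore to a probability of exactly 0.5.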
It should be noted that the foregoing explanation of the embodiments of the rumour recognition method based on artificial intelligence also applies to the rumour identification device based on artificial intelligence of this embodiment; the implementation principle is similar and is not repeated here.
The rumour identification device based on artificial intelligence proposed by the embodiments of the present invention obtains the text to be identified, generates the term vector corresponding to the text based on the bag-of-words model, converts the term vector into a vector of length 2 using the projection matrix model, and takes that vector as input to a regression function to calculate the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, improving both the recognition rate and the timeliness of network rumour identification.
Fig. 9 is a schematic structural diagram of the rumour identification device based on artificial intelligence proposed by another embodiment of the present invention. As shown in Fig. 9, on the basis of Fig. 7, the rumour identification device based on artificial intelligence proposed by this embodiment may further include:
A training module 750, configured to train the parameters of the projection matrix model.
Specifically, as shown in Fig. 10, the training module 750 includes:
A second acquisition unit 751, configured to obtain sample data, the sample data including official report samples and rumour samples.
A training unit 752, configured to train and generate the parameters using the official report samples as positive examples and the rumour samples as negative examples, and to optimize the parameters with a gradient-based model.
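How the sample data might be arranged for the point-wise and pair-wise training methods mentioned in the method embodiments can be sketched as follows. The function names, the `"positive"` and `"negative"` label strings, and the choice of pairing every official report with every rumour are assumptions made for illustration; the embodiments do not prescribe a data layout.

```python
from itertools import product

def pointwise_samples(official_reports, rumours):
    """Point-wise layout: one (text, label) item per single sample."""
    positives = [(text, "positive") for text in official_reports]
    negatives = [(text, "negative") for text in rumours]
    return positives + negatives

def pairwise_samples(official_reports, rumours):
    """Pair-wise layout: each item is a (positive example, negative example) pair."""
    return list(product(official_reports, rumours))

# Placeholder texts standing in for real sample data.
reports = ["official report A", "official report B"]
rumour_texts = ["rumour X"]

single = pointwise_samples(reports, rumour_texts)
pairs = pairwise_samples(reports, rumour_texts)
```

Here `single` holds three labelled items and `pairs` holds two (positive, negative) pairs; either layout could then be fed to a gradient-based optimizer.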
It should be noted that the foregoing explanation of the embodiments of the rumour recognition method based on artificial intelligence also applies to the rumour identification device based on artificial intelligence of this embodiment; the implementation principle is similar and is not repeated here.
In the rumour identification device based on artificial intelligence proposed by the embodiments of the present invention, official report samples and rumour samples are obtained as sample data and used as positive examples and negative examples, respectively, to train the parameters of the projection matrix model, and the parameters are optimized with a gradient-based model. This makes the output of the projection matrix model more accurate and further improves the accuracy of network rumour identification.
Fig. 11 is a schematic structural diagram of the rumour identification device based on artificial intelligence proposed by yet another embodiment of the present invention. As shown in Fig. 11, on the basis of Fig. 7, the rumour identification device based on artificial intelligence proposed by this embodiment may further include:
A processing module 760, configured to process the text to be identified accordingly, based on the probability, after the probability that the text to be identified is a rumour is calculated.
In this embodiment, once the probability that the text to be identified is a rumour has been calculated, it can be judged whether the text is a rumour, and the text is processed accordingly based on the result of that judgment. For example, if the text to be identified is determined to be a rumour, measures such as closing the account or marking the text conspicuously may be applied; if the text is determined not to be a rumour, it is displayed directly.
It should be noted that the foregoing explanation of the embodiments of the rumour recognition method based on artificial intelligence also applies to the rumour identification device based on artificial intelligence of this embodiment; the implementation principle is similar and is not repeated here.
In the rumour identification device based on artificial intelligence proposed by the embodiments of the present invention, after the probability that the text to be identified is a rumour is calculated, the text is processed accordingly based on that probability. This helps users identify network rumours and protects them from harm caused by rumour information.
In order to implement the above embodiments, the present invention further proposes a terminal, including a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to perform the following steps:
S11’: Obtain the text to be identified.
S12’: Generate the term vector corresponding to the text to be identified based on the bag-of-words (BOW) model.
S13’: Convert the term vector into a vector of length 2 using the projection matrix model.
S14’: Take the vector of length 2 as input and calculate, through the softmax regression function, the probability that the text to be identified is a rumour.
It should be noted that the foregoing explanation of the embodiments of the rumour recognition method based on artificial intelligence also applies to the terminal of this embodiment; the implementation principle is similar and is not repeated here.
The terminal proposed by the embodiments of the present invention obtains the text to be identified, generates the term vector corresponding to the text based on the bag-of-words model, converts the term vector into a vector of length 2 using the projection matrix model, and takes that vector as input to a regression function to calculate the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, improving both the recognition rate and the timeliness of network rumour identification.
In order to implement the above embodiments, the present invention further proposes a non-transitory computer-readable storage medium for storing one or more programs; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform the rumour recognition method based on artificial intelligence proposed by the embodiments of the first aspect of the present invention.
The non-transitory computer-readable storage medium proposed by the embodiments of the present invention obtains the text to be identified, generates the term vector corresponding to the text based on the bag-of-words model, converts the term vector into a vector of length 2 using the projection matrix model, and takes that vector as input to a regression function to calculate the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, improving both the recognition rate and the timeliness of network rumour identification.
In order to implement the above embodiments, the present invention further proposes a computer program; when the instructions in the computer program are executed by a processor, the rumour recognition method based on artificial intelligence proposed by the embodiments of the first aspect of the present invention is performed.
The computer program proposed by the embodiments of the present invention obtains the text to be identified, generates the term vector corresponding to the text based on the bag-of-words model, converts the term vector into a vector of length 2 using the projection matrix model, and takes that vector as input to a regression function to calculate the probability that the text is a rumour. Network rumours in Internet information can thereby be identified quickly, improving both the recognition rate and the timeliness of network rumour identification.
It should be noted that, in the description of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise stated, "multiple" means two or more.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will appreciate that all or part of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, replacements, and variations to the above embodiments within the scope of the present invention.

Claims (14)

1. A rumour recognition method based on artificial intelligence, characterized by comprising:
Obtaining a text to be identified;
Generating a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model;
Converting the term vector into a vector of length 2 using a projection matrix model;
Taking the vector of length 2 as input, and calculating, through a softmax regression function, the probability that the text to be identified is a rumour.
2. The method of claim 1, characterized in that generating the term vector corresponding to the text to be identified based on the bag-of-words (BOW) model comprises:
Segmenting the text to be identified into multiple word segments;
Obtaining the word-segment vectors corresponding to the multiple word segments;
Performing an operation on the word-segment vectors using the BOW model, so as to generate the term vector corresponding to the text to be identified.
3. The method of claim 1, characterized in that converting the term vector into the vector of length 2 using the projection matrix model comprises:
Projecting the term vector using the projection matrix model to generate the matrix corresponding to the term vector;
Applying a nonlinear transformation function to the matrix to generate the vector of length 2.
4. The method of claim 3, characterized in that the nonlinear transformation function includes one of a sigmoid function, a tangent function, and an activation function.
5. The method of claim 1, characterized by further comprising:
Training the parameters of the projection matrix model.
6. The method of claim 5, characterized in that training the parameters of the projection matrix model comprises:
Obtaining sample data, the sample data including official report samples and rumour samples;
Training and generating the parameters using the official report samples as positive examples and the rumour samples as negative examples, and optimizing the parameters with a gradient-based model.
7. The method of claim 1, characterized by further comprising:
After the probability that the text to be identified is a rumour is calculated, processing the text to be identified accordingly, based on the probability.
8. A rumour identification device based on artificial intelligence, characterized by comprising:
An acquisition module, configured to obtain a text to be identified;
A generation module, configured to generate a term vector corresponding to the text to be identified based on a bag-of-words (BOW) model;
A conversion module, configured to convert the term vector into a vector of length 2 using a projection matrix model;
A computing module, configured to take the vector of length 2 as input and calculate, through a softmax regression function, the probability that the text to be identified is a rumour.
9. The device of claim 8, characterized in that the generation module comprises:
A segmentation unit, configured to segment the text to be identified into multiple word segments;
A first acquisition unit, configured to obtain the word-segment vectors corresponding to the multiple word segments;
An arithmetic unit, configured to perform an operation on the word-segment vectors using the BOW model, so as to generate the term vector corresponding to the text to be identified.
10. The device of claim 8, characterized in that the conversion module is configured to:
Project the term vector using the projection matrix model to generate the matrix corresponding to the term vector;
Apply a nonlinear transformation function to the matrix to generate the vector of length 2.
11. The device of claim 10, characterized in that the nonlinear transformation function includes one of a sigmoid function, a tangent function, and an activation function.
12. The device of claim 8, characterized by further comprising:
A training module, configured to train the parameters of the projection matrix model.
13. The device of claim 12, characterized in that the training module comprises:
A second acquisition unit, configured to obtain sample data, the sample data including official report samples and rumour samples;
A training unit, configured to train and generate the parameters using the official report samples as positive examples and the rumour samples as negative examples, and to optimize the parameters with a gradient-based model.
14. The device of claim 8, characterized by further comprising:
A processing module, configured to process the text to be identified accordingly, based on the probability, after the probability that the text to be identified is a rumour is calculated.
CN201610974822.9A 2016-11-04 2016-11-04 Artificial intelligence-based rumor recognition method and device Active CN106570162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610974822.9A CN106570162B (en) 2016-11-04 2016-11-04 Artificial intelligence-based rumor recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610974822.9A CN106570162B (en) 2016-11-04 2016-11-04 Artificial intelligence-based rumor recognition method and device

Publications (2)

Publication Number Publication Date
CN106570162A true CN106570162A (en) 2017-04-19
CN106570162B CN106570162B (en) 2020-07-28

Family

ID=58539915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610974822.9A Active CN106570162B (en) 2016-11-04 2016-11-04 Artificial intelligence-based rumor recognition method and device

Country Status (1)

Country Link
CN (1) CN106570162B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
CN104679739A (en) * 2013-11-27 2015-06-03 江苏华御信息技术有限公司 Method for controlling spreading of unreal information
US20150170046A1 (en) * 2013-12-17 2015-06-18 International Business Machines Corporation Analysis of evaluations from internet media
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
US20150309992A1 (en) * 2014-04-18 2015-10-29 Itoric, Llc Automated comprehension of natural language via constraint-based processing
CN105045857A (en) * 2015-07-09 2015-11-11 中国科学院计算技术研究所 Social network rumor recognition method and system
CN105354305A (en) * 2015-11-05 2016-02-24 北京邮电大学 Online-rumor identification method and apparatus
CN105787101A (en) * 2016-03-18 2016-07-20 联想(北京)有限公司 Information processing method and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
毛二松 等: ""基于深层特征和集成分类器的微博谣言检测研究"", 《计算机应用研究》 *
程志 等: ""网络地震谣言监测系统的设计和应用"", 《华南地震》 *
贺刚 等: ""微博谣言识别研究"", 《图书情报工作》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753646A (en) * 2017-11-01 2019-05-14 深圳市腾讯计算机系统有限公司 A kind of article attribute recognition approach and electronic equipment
CN109753646B (en) * 2017-11-01 2022-10-21 深圳市腾讯计算机系统有限公司 Article attribute identification method and electronic equipment
CN108491480A (en) * 2018-03-12 2018-09-04 义语智能科技(上海)有限公司 Rumour detection method and equipment
CN108491480B (en) * 2018-03-12 2021-05-11 义语智能科技(上海)有限公司 Rumor detection method and apparatus
CN108614855A (en) * 2018-03-19 2018-10-02 众安信息技术服务有限公司 A kind of rumour recognition methods
CN109388696A (en) * 2018-09-30 2019-02-26 北京字节跳动网络技术有限公司 Delete method, apparatus, storage medium and the electronic equipment of rumour article
CN111610913A (en) * 2020-04-24 2020-09-01 维沃移动通信有限公司 Message identification method and device and electronic equipment
CN111610913B (en) * 2020-04-24 2021-08-24 维沃移动通信有限公司 Message identification method and device and electronic equipment
CN112487176A (en) * 2020-11-26 2021-03-12 北京智源人工智能研究院 Social robot detection method, system, storage medium and electronic device

Also Published As

Publication number Publication date
CN106570162B (en) 2020-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant