CN117743581A

CN117743581A - Intervention method for agricultural product quality safety network rumors

Info

Publication number: CN117743581A
Application number: CN202311807450.7A
Authority: CN
Inventors: 田儒雅; 梁晓贺; 杨璐; 吴蕾; 孙巍
Original assignee: Agricultural Information Institute of CAAS
Current assignee: Agricultural Information Institute of CAAS
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-03-22
Anticipated expiration: 2043-12-26
Also published as: CN117743581B

Abstract

The invention relates to an intervention method for agricultural product quality safety network rumors, which can effectively detect and intervene in online rumors about agricultural product quality safety. The method mainly comprises five steps. First, information about agricultural product quality safety is acquired from social media, news media and other public information sources through big data technology. Second, the acquired information is screened and classified using natural language processing technology. Third, a machine learning algorithm is used to analyze in depth the resources classified as negative information and determine whether they are rumors. Fourth, an intervention mechanism is started for the detected rumors: real and accurate agricultural product quality information is pushed to affected consumers, and the rumor information is reported to the relevant departments. Fifth, the effect of the intervention is evaluated and monitoring continues. With this method, the accuracy of agricultural product quality safety information can be improved, and the fairness and normal operation of the agricultural product market can be ensured.

Description

Intervention method for agricultural product quality safety network rumors

Technical Field

The invention belongs to the field of big data, and particularly relates to an intervention method for agricultural product quality safety network rumors.

Background

In recent years, with the rapid development of the internet and information communication technology, networks have become an important platform for people to acquire and spread information. The information on networks is very diverse and includes a great deal of information about the quality safety of agricultural products. However, a large amount of false information, i.e. rumors, is interspersed within it, which seriously undermines the public's scientific understanding of agricultural product quality and their ability to make correct decisions. In particular, fabricated and misleading rumors about the quality safety of agricultural products often cause panic among consumers, hinder the normal circulation and trading of agricultural products, and in turn have profound effects on agricultural production and the agricultural product market.

Therefore, how to effectively identify and intervene in these rumors is an urgent issue. Traditional rumor intervention mainly relies on manual approaches, such as setting up a dedicated reporting platform and collecting and processing the reports it receives. However, this approach is time-consuming and labor-intensive, and struggles to cope with a network environment in which the volume of information is large and updates are rapid.

At present, research on network rumor intervention in China is concentrated at the qualitative stage, takes case analysis as its main method, and lacks a scientific, systematic and quantitative intervention method.

Disclosure of Invention

The invention mainly solves the technical problems of effectively collecting information on agricultural product quality safety rumors, accurately identifying rumor content, effectively refuting the rumors through a specific intervention mechanism, rebuilding the positive brand image of agricultural products and counteracting the adverse effects of the rumors. In addition, the invention aims to evaluate the intervention effect and provide feedback, so that the intervention strategy can be adjusted and optimized in a targeted manner and a long-term, continuous guarantee can be provided for the quality safety of agricultural products. This series of solutions can curb the propagation of agricultural product quality safety rumors, help to improve public confidence in agricultural product quality, stabilize the agricultural product market and promote the sustainable and healthy development of agriculture.

In order to achieve the above purpose, the present invention is realized by adopting the following technical scheme: the intervention method comprises the following steps:

Step one: data acquisition, namely crawling data from social media, news media and other public information sources through big data technology to acquire all information resources about agricultural product quality safety;

Step two: information screening, namely screening and classifying the acquired information using natural language processing technology, and determining whether each item is positive information, negative information or neutral information;

Step three: rumor detection, namely performing deep analysis on the resources classified as negative information using a machine learning algorithm to determine whether they contain false information, namely rumors;

Step four: an intervention mechanism that is activated upon detection of a rumor; real and accurate agricultural product quality information is quickly pushed to affected consumers, misunderstanding of the agricultural product quality is corrected, and the rumor information is reported to the relevant departments;

Step five: effect feedback and continuous monitoring, namely capturing and analyzing the changes in information flow after intervention, evaluating the rumor intervention effect and optimizing it; data acquisition and rumor detection are performed continuously, so that intervention can be carried out in time when a new rumor is found.

Further, the step one data acquisition comprises:

S101, setting a crawler program: the targets are major social media, network news platforms and public resources containing agricultural product information, such as forums, blogs and question-answering websites;

S102, keyword setting: setting keywords related to the quality and safety of agricultural products, such as 'quality of agricultural products', 'safety of agricultural products' and 'pollution of agricultural products', so as to collect related information in a targeted manner;

S103, information acquisition: information is collected from the set websites through the crawler program, including the original text, the posting time and the posting location, so as to retain the original information to the greatest extent.
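The keyword-directed collection in S102 and S103 can be sketched as follows. This is a minimal illustration only: it assumes the pages have already been fetched, and the `Record` type, the `KEYWORDS` list and the `collect` function are hypothetical names that do not come from the patent. It shows how the three recorded fields (original text, posting time, posting location) might be retained while discarding records that mention none of the keywords.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the patent names these three only as examples.
KEYWORDS = [
    "quality of agricultural products",
    "safety of agricultural products",
    "pollution of agricultural products",
]


@dataclass
class Record:
    text: str        # original text
    posted_at: str   # posting time
    location: str    # posting location


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if the text mentions at least one keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def collect(records, keywords=KEYWORDS):
    """Keep only the records whose text mentions at least one keyword."""
    return [r for r in records if matches_keywords(r.text, keywords)]
```

A real crawler would add fetching, rate limiting and deduplication in front of this filter; those parts are deliberately left out of the sketch.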

Further, the step two information screening comprises:

S201, text preprocessing: performing text preprocessing on the crawled original information, including word segmentation, stop-word removal, punctuation removal and removal of irrelevant characters;

S202, format conversion: converting the preprocessed data into a format suitable for modeling, namely converting the segmented sentences into vector form using the Word2Vec method;

S203, establishing a model: in this step, a sentiment analysis model from NLP is used to identify semantics and sentiment tendencies;

S204, classifying information: the information related to agricultural products is classified by the constructed model into three types: positive information, negative information and neutral information.
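The preprocessing in S201 can be sketched as follows. This is an assumption-laden illustration: it tokenizes on whitespace, which suits English text, whereas Chinese text would need a dedicated word segmenter; the `STOP_WORDS` set and the `preprocess` function are hypothetical and not taken from the patent.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation and other non-word characters, split, drop stop words."""
    # Replace every character that is neither a word character nor whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = cleaned.split()
    return [token for token in tokens if token not in stop_words]
```

The resulting token list is the input that S202 would convert into vectors.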

Further, the step three rumor detection comprises:

S301, extracting features: extracting features from the resources classified as negative information; the features comprise the text length, the number of negative words in the text, the number of emotionally charged words in the text and the reputation of the author; these features are denoted as the feature vector X;

S302, logistic regression model setting: the logistic regression model is set as:

$$P(Y=1 \mid X) = \frac{1}{1 + \exp(-(\omega \cdot X + b))}$$

where P(Y=1|X) represents the probability that the text is a rumor, Y=1 denotes a rumor, Y=0 denotes a non-rumor, ω is the weight vector, b is a constant term, and ω·X is the inner product of the weight vector and the feature vector;

S303, training the model: using a labeled data set that contains rumor and non-rumor texts together with their features, the parameters ω and b are optimized with algorithms such as maximum likelihood estimation and gradient descent;

S304, rumor detection: for new negative information, the feature vector X is extracted and fed into the trained model to calculate the probability P(Y=1|X) that the text is a rumor; if the probability is greater than a set threshold (such as 0.5), the text is predicted to be a rumor; otherwise, it is predicted to be a non-rumor.
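Steps S302 to S304 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it maximizes the likelihood by batch gradient descent, the learning rate and epoch count are arbitrary, features are assumed to be scaled to small magnitudes, and the function names `train`, `predict_proba` and `is_rumor` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """P(Y=1|X) for one feature vector x, weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(X, y, lr=0.1, epochs=2000):
    """Fit w and b by batch gradient descent on the mean negative log-likelihood."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of the loss w.r.t. the logit
            for k in range(n_features):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def is_rumor(x, w, b, threshold=0.5):
    """Apply the decision rule of S304: rumor if P(Y=1|X) exceeds the threshold."""
    return predict_proba(x, w, b) > threshold
```

Each feature vector here would hold, in order, a scaled text length, the negative-word count, the emotionally charged word count and the author reputation; the ordering and scaling are illustrative choices.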

Further, the step four intervention mechanism is as follows: once the rumor detection in step three determines that a rumor is present, the intervention mechanism is started;

S401, matching rumor propagation patterns using a batch pattern matching algorithm;

S402, using candidate identification to find the users in the matched node paths who may be affected by the rumor;

S403, after a rumor message is determined, pushing rumor-refuting information to the corresponding candidate users, so as to reduce the propagation of the rumor.

Further, the threshold is 0.5.

The invention has the beneficial effects that:

Through big data technology and natural language processing technology, information related to agricultural product quality safety can be collected and screened effectively, and rumors can be discovered and identified in time, which improves the efficiency and accuracy of rumor detection. The rumor intervention mechanism can promptly correct consumers' misunderstanding of agricultural product quality, counteract the adverse influence of rumors and improve consumers' confidence in agricultural product quality, which helps to stabilize the agricultural product market.

The mechanism of effect feedback and continuous monitoring ensures that intervention can be carried out in time when new agricultural product quality safety rumors are found, curbs the continued spread of rumors, and lessens their influence on the market and on consumer sentiment.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of data acquisition according to the present invention;

FIG. 3 is a flow chart of the information screening of the present invention;

FIG. 4 is a flow chart of rumor detection according to the present invention;

FIG. 5 is a flow chart of rumor intervention according to the present invention.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Exemplary embodiments of the present invention are illustrated in the accompanying drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

As shown in FIG. 1, the intervention method for agricultural product quality safety network rumors comprises the following steps:

Step one: data acquisition, namely crawling data from social media, news media and other public information sources through big data technology to acquire all information resources about agricultural product quality safety.

As shown in FIG. 2, the step one data acquisition includes:

S101, setting a crawler program: a dedicated crawler program is written, targeting major social media, web news platforms and other public resources that may contain agricultural product information, such as forums, blogs and question-answering websites.

S102, keyword setting: keywords related to the quality and safety of agricultural products, such as 'quality of agricultural products', 'safety of agricultural products' and 'pollution of agricultural products', are set so as to collect related information in a targeted manner.

S103, information acquisition: information is collected from the set websites through the crawler program, including the original text, the posting time and the posting location, so as to retain the original information to the greatest extent.

As shown in FIG. 3, the step two information screening includes:

S201, text preprocessing: text preprocessing is carried out on the crawled original information, including word segmentation, stop-word removal (removing words that carry no substantive meaning), punctuation removal and removal of irrelevant characters.

S202, format conversion: the preprocessed data is converted into a format suitable for modeling; the Word2Vec method is used to convert the segmented sentences into vector form.

For the preprocessed text data, the Word2Vec method specifically comprises the following steps:

First, a sliding window (e.g., a window of size 5) is slid over the text; each position yields a center word and the context words within the window (the words in the window other than the center word).

For the Skip-gram model, the input is a center word and the output is a context word.

The objective of the Skip-gram model is:

$$L(\theta) = \sum_{x \in \mathrm{corpus}} \sum_{y \in \mathrm{context}(x)} \log p(y \mid x) \tag{1}$$

where θ represents the parameters of the model, i.e. the vectors in the matrices, and p represents a conditional probability.

Two matrices are used: an input matrix and an output matrix, and each word has a corresponding vector representation in both matrices. During training, the vector parameters in both matrices are updated according to the current task (e.g., predicting the center word or predicting the context words).

After training, the Word2Vec vector of each word is its corresponding vector in the input matrix or the output matrix.
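The sliding-window step that produces the (center word, context word) training pairs summed over in equation (1) can be sketched as follows. The sketch assumes that a window of size 5 means the center word plus up to two words on each side; the patent does not state this, and the function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, window=5):
    """Return (center, context) pairs for a window centered on each token.

    A window of size `window` is taken to cover the center word plus
    window // 2 words on each side, truncated at the text boundaries.
    """
    half = window // 2
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - half)
        hi = min(len(tokens), i + half + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

Each pair (x, y) contributes one term log p(y|x) to the objective; training the two matrices on these pairs is normally delegated to an existing Word2Vec implementation.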

S203, establishing a model: this step uses a sentiment analysis model from NLP, with the emphasis on recognizing semantics and sentiment tendencies.

A naive Bayes classifier is used. Under its assumption that the features are conditionally independent given the class, the probability the model assigns to a sentiment class for a text can be calculated as:

$$P(C \mid F_1, \ldots, F_n) = \frac{P(C)\, P(F_1 \mid C) \cdots P(F_n \mid C)}{P(F_1, \ldots, F_n)} \tag{2}$$

where C is a particular class, either positive, negative or neutral; F1 to Fn are features of the input text (typically words); P(C|F1,...,Fn) represents the probability that the text is classified as C given these features; P(C) represents the probability that any text is classified as C; and P(Fi|C) represents the probability that a text contains the feature Fi given that it is classified as C.

After the model is built, the text that has undergone preprocessing and format conversion is input into the model, the probabilities of the positive, negative and neutral classes are calculated, and the text is finally assigned to the class with the highest probability.

For example, for the text "A", the model computes P(Positive|A) = 0.4, P(Negative|A) = 0.3 and P(Neutral|A) = 0.3; since the positive probability is the largest, the text "A" is determined to be positive information.
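Equation (2) and the highest-probability rule can be sketched as a small word-count naive Bayes classifier. Several choices here are assumptions rather than details from the patent: Laplace (add-one) smoothing, computation in log space, and the hypothetical function names `train_nb`, `posterior` and `classify`.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (tokens, label). Returns priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def posterior(tokens, priors, word_counts, vocab):
    """Return P(C | F1..Fn) for every class C, following equation (2)."""
    log_scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)  # log P(C)
        for t in tokens:
            # log P(Fi | C) with add-one smoothing (an assumption of this sketch)
            score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        log_scores[c] = score
    # Normalizing over the classes plays the role of dividing by P(F1..Fn).
    m = max(log_scores.values())
    unnormalized = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


def classify(tokens, priors, word_counts, vocab):
    """Assign the class with the highest posterior probability."""
    post = posterior(tokens, priors, word_counts, vocab)
    return max(post, key=post.get)
```

Texts labeled "negative" by `classify` are the ones that would be passed on to rumor detection in step three.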

As shown in FIG. 4, the step three rumor detection includes:

$$P(Y=1 \mid X) = \frac{1}{1 + \exp(-(\omega \cdot X + b))} \tag{3}$$

Step four: an intervention mechanism that is activated upon detection of a rumor; real and accurate agricultural product quality information is quickly pushed to affected consumers, misunderstanding of the agricultural product quality is corrected, and rumor information is reported to related departments.

As shown in FIG. 5, S401 matches rumor propagation patterns using a batch pattern matching algorithm.

The algorithm creates an index for the nodes in the regular tree, then traverses the regular tree and expands the partial decomposition step by step based on the index, and finally obtains all the matching node paths of the pattern.

Formal definition of batch pattern matching

Given a graph G and a regular tree RT that represents a rumor propagation pattern P_C, matching the pattern P_C on the graph G means finding the node paths in G that conform to P_C. The algorithm first creates an index for each node in the regular tree; the index stores the candidate matches, in the large graph, of the current node's label. It then traverses the regular tree level by level to expand the partial solutions step by step, and finally obtains all matching node paths of the whole pattern; the set of these matching node paths is denoted P_N. The matching algorithm does not need to traverse the whole large graph; the traversal is instead confined to the regular tree, whose size is about one thousandth of that of the large graph, which improves the efficiency of the algorithm. The formal definition is as follows:

Problem input: an information dissemination network G and a regular tree RT.

Problem output: the node path set P_N.

With the help of the index, the algorithm then traverses the regular tree level by level to obtain a partial solution S(n_i) for each node n_i; S(n_i) consists of all node paths that match the regular path L(n_0)…L(n_i). When the computation for the leaf-node level is complete, the paths stored in the partial solutions of the leaf nodes are the node paths that match the pattern P_C. The set of node paths matching the pattern P_C is P_N.

The partial solution S(n_i) is built as follows: the matching algorithm traverses the regular tree RT from top to bottom and progressively expands the partial solutions.

The specific steps are as follows:

First, the partial solution of each node n_i in RT is initialized.

Then RT is traversed from top to bottom; each node n_i is visited and the MATLAB subroutine SubMatch(n_i, n_j) is called. The subroutine uses the index I(n_j) to expand the partial solution S(n_i) and obtain S(n_j), where n_j is a child node of n_i, until the whole regular tree RT has been visited.
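The top-down traversal above can be sketched as follows. This is a simplified interpretation, not the patented algorithm: the patent does not specify the data structures, so the sketch assumes that each regular-tree node carries a single edge label, that the graph G is a list of (source, target, label) edges, and that the index I(n) of a tree node is the list of graph edges carrying its label. The names `build_index`, `sub_match` and `batch_match` are hypothetical, and the sketch is written in Python rather than MATLAB.

```python
from collections import defaultdict, deque


def build_index(tree, edges):
    """I(n): candidate graph edges whose label equals the label of tree node n."""
    by_label = defaultdict(list)
    for u, v, label in edges:
        by_label[label].append((u, v))
    return {n: by_label[label] for n, (label, _children) in tree.items()}


def sub_match(partial, candidates):
    """Expand S(n_i) into S(n_j): extend each path by one candidate edge from I(n_j)."""
    expanded = []
    for path in partial:
        for u, v in candidates:
            if path[-1] == u and v not in path:  # edge must continue the path, no revisits
                expanded.append(path + [v])
    return expanded


def batch_match(tree, root, edges):
    """Traverse the regular tree level by level and return the node paths at its leaves."""
    index = build_index(tree, edges)
    # Initialize the root's partial solution with its own candidate edges.
    solutions = {root: [[u, v] for u, v in index[root]]}
    matched_paths = []
    queue = deque([root])
    while queue:
        n_i = queue.popleft()
        _label, children = tree[n_i]
        if not children:  # leaf node: its partial solution holds complete matches
            matched_paths.extend(solutions[n_i])
        for n_j in children:
            solutions[n_j] = sub_match(solutions[n_i], index[n_j])
            queue.append(n_j)
    return matched_paths
```

In this sketch the tree is a dictionary mapping each node name to a (label, children) pair, and the returned list plays the role of P_N. Only the index entries are consulted during expansion, which mirrors the stated design goal of never scanning the whole graph per tree node.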

After the node path set P_N of the matching pattern is obtained, the nodes in the node paths need to be identified to determine whether the users they represent are likely to be potential propagators who are more susceptible to rumors.

First, consider each node path np = v_0 … v_i … v_n (0 ≤ i ≤ n) in P_N. The identification rules for each node are as follows. (1) Since v_0 is the node selected by the rumor propagation pattern (i.e., the first node in the path), it should have some similarity to a positive node, which denotes a user who has forwarded or commented on the rumor microblog; the user represented by v_0 is therefore considered likely to have forwarded or commented on such rumor microblogs in the past, or to do so in the future. (2) Since the rumor path np is a node path that starts with a positive node and is not covered by any negative node, it should be specific to rumor propagation.

Thus, the other nodes on the path should also have some correlation with rumor propagation. To strengthen this correlation, assume that np is matched by the sub-pattern p = a_1 … a_j … a_m (1 ≤ j ≤ m), and consider some node v_i other than the first node. If, for each a_j in the pattern p (1 ≤ j ≤ m), there is an edge e = (v, v_i), where v ∈ V, such that a_j ∈ L(e), then v_i is considered likely to be affected by the rumor and is treated as a potential propagator of the rumor.

Finally, based on the set C_P of information propagation pairs obtained from the forwarding and commenting records (i.e., the users' interests), the algorithm identifies candidate users, i.e. users who may have forwarded or commented on rumor microblogs in the past or may do so later, and denotes these nodes by the candidate node set C_u.

According to the information category to which the set C_P of information propagation pairs belongs, rumor-refuting information of the corresponding category is pushed to the users represented by the nodes in the candidate node set. For example, if a set C_P of information propagation pairs was obtained by processing agricultural product quality information, then once a rumor about agricultural product quality is officially identified, the algorithm pushes the corresponding rumor-refuting information to the candidate users and blocks the propagation of the rumor in time, thereby reducing its negative effects.

Step five: effect feedback and continuous monitoring. Real-time monitoring covers, but is not limited to, changes in public information streams such as social media and network news platforms, and records consumers' emotional responses, comments and changes in consumption behavior after agricultural product quality has been affected by rumors.

The changes in the information flow are observed and measured, and the data is analyzed, for example by calculating the rate of change of negative comments after intervention and the change in consumption behavior. From these data, the effect of the rumor intervention is assessed quantitatively.
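One of the quantities mentioned above, the rate of change of negative comments after intervention, can be computed as in the following sketch. The definition used (relative change in the share of negative comments) is an assumption, since the patent does not give a formula, and the function names are hypothetical.

```python
def negative_rate(labels):
    """Share of comments labeled 'negative' in a list of sentiment labels."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "negative") / len(labels)


def relative_change(before, after):
    """Relative change in the negative-comment share from before to after intervention.

    A negative result means the share of negative comments fell.
    Returns 0.0 when there were no negative comments before the intervention.
    """
    r_before = negative_rate(before)
    r_after = negative_rate(after)
    if r_before == 0:
        return 0.0
    return (r_after - r_before) / r_before
```

The sentiment labels would come from the step two classifier applied to comments collected in equal-length windows before and after the intervention.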

Based on the evaluation results, the strengths and weaknesses of the intervention are analyzed, and the intervention strategy is adjusted and optimized according to this feedback, for example by strengthening the pushing of information to specific groups or adjusting the information content to make it more convincing.

Data acquisition and rumor detection are performed continuously. Big data and natural language processing technologies remain in use to keep watch on public information such as social media and news media; once a new rumor related to the quality and safety of agricultural products is found, the intervention mechanism is started in time, so that the rumor is prevented from spreading further and widening its influence.

Example 1

Firstly, a crawler program is set up. The target websites mainly comprise major social media and news media such as microblogs, WeChat official accounts and headline news, as well as agricultural forums and question-answering websites that contain agricultural product information. The keywords are set to terms related to pork quality safety, such as 'pork pollution' and 'swine fever'.

After the program was set up, we began crawling the websites and gathered about 2000 pieces of raw information about pork contamination. The raw information includes the original text, the posting time, the posting location, etc.

After the raw information is obtained, text preprocessing is carried out using natural language processing technology, including word segmentation and the removal of stop words, punctuation marks and irrelevant characters. We then convert the preprocessed data into a vector format suitable for modeling, and use the sentiment analysis model to classify the information into three categories: positive, negative and neutral. This processing yields about 1500 pieces of negative information.

For this negative information, we use machine learning methods for deep analysis. Features such as text length, number of negative words, number of emotionally charged words and reputation of the publisher are extracted and a logistic regression model is trained, and about 200 pieces of information are finally found to be rumors about pork quality safety.

After discovering the rumors, we start the intervention mechanism and set up corresponding response strategies, including publishing real and scientific pork quality information, providing science popularization to correct mistaken ideas, and reporting the rumors.

After the intervention, we evaluate its effect by continuously monitoring and analyzing network information, and make adjustments and optimizations.

For example, we found that negative information about pork quality was more readily accepted on a certain social platform, so we increased the intensity of information pushing on that platform and further optimized the pushed content to better suit the reading habits of consumers.

Through the above steps, the spread of rumors about pork quality safety was successfully controlled, consumers' misunderstanding of pork quality was corrected, and their confidence was strengthened.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM).

It should be understood that the detailed description of the technical solution of the present invention, given by way of preferred embodiments, is illustrative and not restrictive. Modifications of the technical solutions described in the embodiments or equivalent substitutions of some technical features thereof may be performed by those skilled in the art on the basis of the present description; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An intervention method for agricultural product quality safety network rumors, characterized in that the intervention method comprises the following steps:

data acquisition, namely crawling data from social media, news media and other public information sources through big data technology to acquire all information resources about agricultural product quality safety;

information screening, namely screening and classifying the acquired information using natural language processing technology, and determining whether each item is positive information, negative information or neutral information;

rumor detection, namely performing deep analysis on the resources classified as negative information using a machine learning algorithm to determine whether they contain false information, namely rumors;

an intervention mechanism, namely starting the intervention mechanism upon detection of a rumor; real and accurate agricultural product quality information is quickly pushed to affected consumers, misunderstanding of the agricultural product quality is corrected, and the rumor information is reported to the relevant departments;

effect feedback and continuous monitoring, namely capturing and analyzing the changes in information flow after intervention, evaluating the rumor intervention effect and optimizing it; data acquisition and rumor detection are performed continuously, and intervention is carried out in time when a new rumor is found.

2. The intervention method for agricultural product quality safety network rumors as claimed in claim 1, wherein: the data acquisition comprises

3. The intervention method for agricultural product quality safety network rumors as claimed in claim 1, wherein: the information screening comprises the following steps:

4. The intervention method for agricultural product quality safety network rumors as claimed in claim 1, wherein: the rumor detection includes:

$$P(Y=1 \mid X) = \frac{1}{1 + \exp(-(\omega \cdot X + b))}$$

S304, rumor detection: for new negative information, the feature vector X is extracted and fed into the trained model to calculate the probability P(Y=1|X) that the text is a rumor; if the probability is greater than a set threshold, the text is predicted to be a rumor; otherwise, it is predicted to be a non-rumor.

5. The intervention method for agricultural product quality safety network rumors as claimed in claim 1, wherein: the intervention mechanism is as follows: once the rumor detection step determines that a rumor is present, the intervention mechanism is started;

S402, using candidate identification to find the users in the matched node paths who may be affected by the rumor; S403, after a rumor message is determined, pushing rumor-refuting information to the corresponding candidate users, so as to reduce the propagation of the rumor.

6. The intervention method for agricultural product quality safety network rumors as claimed in claim 4, wherein: the threshold is 0.5.