CN113592338B - Food quality management safety risk pre-screening model - Google Patents

Info

Publication number
CN113592338B
Authority
CN
China
Prior art keywords
word
food
representing
text
quality management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110909891.2A
Other languages
Chinese (zh)
Other versions
CN113592338A (en)
Inventor
左恩光
陈晨
吕小毅
陈程
严紫薇
吴伟
李敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Aiqi Side Testing Technology Co ltd
Xinjiang University
Original Assignee
Xinjiang Aiqi Side Testing Technology Co ltd
Xinjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang Aiqi Side Testing Technology Co ltd and Xinjiang University
Priority to CN202110909891.2A
Publication of CN113592338A
Application granted
Publication of CN113592338B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a food quality management safety risk pre-screening model. The model comprises the following steps: (1) acquiring and preprocessing text data; (2) encoding and vectorizing the preprocessed text data; and (3) judging the degree of food safety hazard through an attention scoring mechanism in supervised deep learning. The model is a novel food text mining technique based on a correlation attention mechanism: it calculates an association score between each word and food safety hazards from the mutual information between each word in consumer comments and the unsafe label, and combines this with the attention scores in supervised deep learning to further uncover potential interactions between consumers and hazardous foods, so that potential food safety problems can be screened rapidly.

Description

Food quality management safety risk pre-screening model
Technical Field
The invention relates in particular to a food quality management safety risk pre-screening model.
Background
With the vigorous development of the Internet economy, China's food service business model has changed dramatically. By March 2020, the number of online takeaway users in China had reached 397.8 million, the usage rate had reached 44.0%, and the market penetration of the takeaway industry had reached 13.0%. By selecting the food they like in a mobile app, consumers can have it delivered to a specified location on time. The rise of the takeaway market brings convenience to consumers but also brings substantial potential food safety hazards.
In view of the above, the invention provides a food quality management safety risk pre-screening model that can effectively screen samples with food safety risks, save a large amount of time in food risk analysis, and markedly improve screening efficiency and coverage.
Disclosure of Invention
The invention aims to provide a food quality management safety risk pre-screening model that can effectively screen samples with food safety risks.
To achieve this purpose, the technical scheme adopted is as follows:
A food quality management safety risk pre-screening model, comprising:
(1) Text data acquisition and preprocessing;
(2) Encoding and vectorization of the preprocessed text data;
(3) Judging the degree of food safety hazard through an attention scoring mechanism in supervised deep learning.
Further, the data acquisition in step (1) is: text data are acquired and classified as having a potential safety hazard or having no potential safety hazard.
Further, the preprocessing in step (1) is: normalizing the text data classified by potential safety hazard.
Still further, the normalization is: removing symbols, removing stop words, and word segmentation.
Further, the text encoding vectorization in step (2) is: vectorizing the normalized data through a text encoder.
Further, step (3) is: after adaptively learning latent features from the vectorized data, attending to the keywords in each sentence that help classification, combining the confusion matrix, and finally fusing the relative importance of each word in the sentence with the importance of the word to food safety threats to obtain the food safety hazard score.
Still further, in step (3), the similarity between each word and the other words in the sentence is calculated, so that the important words that help classification are attended to; the calculation formula is as follows:
Still further, the calculation formula of the confusion matrix is as follows:
where the result is the score, associated with the safety hazard, of the i-th word in the document.
Still further, in step (3), the calculation formula for fusing the relative importance of each word in the sentence with the importance of the word to food safety threats is as follows:
Compared with the prior art, the invention has the following beneficial effects:
1. The invention integrates a text-mining-based deep learning method into current food safety supervision procedures, which benefits stakeholders, including consumers and companies, and also enables supervisory authorities to quickly identify the main risk factors.
2. The FSPS monitoring system provided by the invention can rapidly screen massive volumes of online text in real time, uncover foods with potential safety hazards, and reduce the detection costs of current supervisory bodies. The model also generalizes: tests on data from other industries show that it is broadly applicable (classification accuracy on hotel reviews and on the SMP2019 dataset reaches 85.85% and 78.03%, respectively).
3. The model was validated by having a group of experienced food safety professionals evaluate the high-risk words it generates. The results show that the expert scores are highly consistent with the model's predictions. The invention demonstrates the effectiveness of text mining for food safety pre-screening (food safety classification accuracy reaches 96.95% and recall reaches 93.06%), shows that the proposed pre-screening model can markedly improve the efficiency of expert risk analysis, and provides rigorous experimental verification.
Drawings
FIG. 1 is a schematic diagram of the food quality management safety risk pre-screening system;
FIG. 2 is the confusion matrix for sample risk analysis.
Detailed Description
To further explain how the food quality management safety risk pre-screening model of the invention achieves its intended purpose, its specific implementation, structure, features, and effects are described below with reference to preferred embodiments. In the following description, different references to "an embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
The food quality management safety risk pre-screening model of the invention is described in further detail below with reference to specific examples:
Food contamination and food poisoning pose a significant safety risk to consumers worldwide. In this era of information explosion, the potential of social media data is attracting growing attention from government regulators, food companies, and consumers. The invention takes text data from online media as its research object and proposes a novel food text mining technique, based on a correlation attention mechanism, for rapidly screening potential food safety problems. First, the association score between each word and food safety hazards is calculated from the mutual information between each word in consumer comments and the unsafe label; then the potential interactions between consumers and hazardous foods are further uncovered by combining the attention scores in supervised deep learning. Compared with current mainstream text mining methods, the invention clearly outperforms the benchmark model on the food safety dataset, with accuracy reaching 96.60%. An invited group of food safety experts carried out further food risk analysis on the model's predictions, and the results show that the tool can markedly reduce the time needed to screen risky foods. The invention provides a rapid and cost-effective method for food safety screening and helps reduce potential dietary hazards to consumers.
The technical scheme of the invention is as follows:
A food quality management safety risk pre-screening model, comprising:
(1) Text data acquisition and preprocessing;
(2) Encoding and vectorization of the preprocessed text data;
(3) Judging the degree of food safety hazard through an attention scoring mechanism in supervised deep learning.
Preferably, the data acquisition in step (1) is: text data are acquired and classified as having a potential safety hazard or having no potential safety hazard.
Preferably, the preprocessing in step (1) is: normalizing the text data classified by potential safety hazard.
Further preferably, the normalization is: removing symbols, removing stop words, and word segmentation.
Preferably, the text encoding vectorization in step (2) is: vectorizing the normalized data through a text encoder.
Preferably, step (3) is: after adaptively learning latent features from the vectorized data, attending to the keywords in each sentence that help classification, combining the confusion matrix, and finally fusing the relative importance of each word in the sentence with the importance of the word to food safety threats to obtain the food safety hazard score.
Further preferably, in step (3), the similarity between each word and the other words in the sentence is calculated, so that the important words that help classification are attended to; the calculation formula is as follows:
Further preferably, the calculation formula of the confusion matrix is:
where the result is the score, associated with the safety hazard, of the i-th word in the document.
Further preferably, in step (3), the calculation formula for fusing the relative importance of each word in the sentence with the importance of the word to food safety threats is as follows:
Example 1.
1. Related work
(1) Text data mining
Text mining refers to the process of parsing input text to obtain valuable information. Text classification is a common form of text mining that assigns texts to the correct categories according to their attributes. In this embodiment, for example, the aim is to distinguish texts with food safety hazards from texts without. Supervised deep learning is a popular approach to text classification that requires each text in the dataset to be labelled in advance with its category. The texts are then divided into a training set, used to build a high-performance predictive model; a validation set, used to check how well the model fits the data; and a test set, which the model does not see during training and which is used to verify the model's performance on unseen data.
Sentiment analysis is a prominent text classification problem, used to calculate the polarity and intensity of sentiment in text. For example, the phrase "The meal is very good!" is positive, whereas "The meal is very bad!" is negative. Mummalaneni et al. state that online reviews describing safety hazards are typically statements of fact without explicit sentiment words (e.g., good, bad). Conventional sentiment classification methods may therefore struggle to detect potential safety hazards. For example, Abrahams et al. found that the most significant keywords associated with automotive defects, such as "airbag", were non-sentiment keywords that were relevant only in a specific industry context, and that sentiment polarity was not always positively associated with product defects. David et al. discuss the role of sentiment analysis in mining online reviews for food safety hazards, and their experimental results show that the classification accuracy of sentiment analysis is not high. The attention mechanism is a popular method for focusing on parts of a text; its basic idea is to assign attention weights to the text, concentrate on the relevant content, and increase the contribution of that content. The invention uses mutual information and an attention mechanism to adaptively obtain food safety keywords.
(2) On-line monitoring of food safety
Food safety risk is commonly understood from two perspectives: likelihood and severity. Much past work has approached food safety from the severity perspective, for example by detecting biological agents with particularly severe effects. In recent years, the food industry has combined knowledge discovery in databases (KDD) with text mining techniques to monitor and evaluate the likelihood of contamination threats. Thakur et al. used a text classification algorithm to detect disease outbreak trends and to determine the relationship between food category and outbreak location. The resulting knowledge can help inform decision makers about sound food processing, preparation, and consumption practices. In view of this, new predictive tools are needed for rapidly classifying reports of unsafe food products.
2. Food safety pre-screening system
As shown in FIG. 1, the invention proposes a food safety risk pre-screening system that uses online reviews to quickly check whether a food poses a potential safety hazard. The system can be divided into two steps: the food quality management safety risk pre-screening model and response measures. The first step is end-to-end text mining, meaning that the designed algorithm takes text as its only input and predicts whether a food safety risk is involved. To further verify the model's output, the second step brings in a panel of highly experienced food safety specialists, who manually review the high-risk words generated by the model and its predictions and report to government authorities so that responsive action can be taken. It is worth pointing out that, although the pre-screening system can greatly improve the speed and efficiency of current food safety monitoring, it cannot serve as the only method of monitoring food safety; the shops it flags as risky still need further inspection by food supervision departments.
The model is the first part of the food quality safety risk pre-screening system. It adaptively learns and extracts latent features of the input text data and screens out texts that indicate potential safety threats for further processing. The model consists of three main parts: data acquisition and preprocessing, text encoding vectorization, and the confusion attention mechanism. Specifically:
(1) Data processing
Each process performed in the data processing stage is described in detail below. First, the original data of this embodiment come from the 2019 CCF Big Data & Computing Intelligence Contest (CCF BDCI).
The content of online food reviews does not only describe the food itself; part of it describes matters unrelated to food safety, such as delivery service, merchant attitude, and taste. The raw dataset classifies the data into two categories: food-safety-related and food-safety-unrelated. However, even a text originally labelled as food-safety-related cannot be assumed to indicate a potential food safety hazard (it may, for instance, be a positive remark). To focus on reviews that point to potential food safety problems, the data originally labelled as food-safety-related were manually relabelled in this embodiment according to the following criteria:
a. Potential safety hazard: the food is considered likely to have unhealthy consequences for the consumer. Under the review criteria established here, this covers cases in which the consumer may need medical attention after eating the product, such as stomach ache, vomiting, or dizziness, and cases such as insects in the food or mouldy food.
b. No potential safety hazard: data that do not meet criterion a are classified as having no potential safety hazard. In addition, the data originally labelled as food-safety-unrelated were rechecked and classified as having no potential safety hazard.
The manually corrected dataset contains 10,000 document samples, of which 1,511 (15.11%) have a potential safety hazard; detailed statistics are shown in Table 1. The training set is used to train the model, the validation set to check how well the model fits the data, and the test set to test the model's predictive ability on unseen data.
TABLE 1
Data normalization is performed next and consists of three steps: (1) All symbols, including digits, punctuation, and letters, are removed, leaving only Chinese characters. (2) Stop words are removed. (3) Word segmentation. Chinese word segmentation is a basic step in Chinese text processing: long texts such as sentences, paragraphs, and articles are broken down into word-level units to facilitate subsequent processing and analysis. Unlike English, Chinese sentences have no explicit word boundaries, so word segmentation is usually required in Chinese natural language processing, and its quality directly affects downstream modules such as part-of-speech tagging and syntactic parsing. This embodiment uses the jieba word segmentation tool.
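The three normalization steps can be sketched as follows. This is a minimal illustration, not the embodiment's code: the stop-word set is hypothetical (the text does not list one), the default segmenter simply splits into single characters so the sketch runs without third-party packages, and stop words are filtered after segmentation because filtering needs tokens. Passing `segment=jieba.lcut` would use the jieba tool named in the text.

```python
import re

# Hypothetical stop-word list for illustration; the patent does not give one.
STOP_WORDS = {"的", "了", "很"}

# Matches every run of characters outside the basic CJK ideograph range.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def normalize(text, segment=list, stop_words=STOP_WORDS):
    """Keep Chinese characters only, segment, then drop stop words.

    `segment` is any callable mapping a string to a list of tokens.
    The default (`list`) is a character-level fallback, an assumption
    made here so the sketch has no external dependency.
    """
    chinese_only = _NON_CHINESE.sub("", text)
    tokens = segment(chinese_only)
    return [token for token in tokens if token not in stop_words]


tokens = normalize("肉很臭了!!! 123abc")
print(tokens)
```

With the character-level fallback, the digits, letters, and punctuation are stripped first, and the stop words in the hypothetical list are then filtered out of the remaining characters.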
(2) Text encoding
Text cannot be fed directly into the model for training; it must first be vectorized. A very common pipeline in NLP is token -> word ID -> word embedding. The token -> word ID step is implemented with a dictionary: the set of distinct tokens obtained after segmenting the whole corpus in the previous step, structured as {token: index}, so that each token maps to a unique integer. The word ID -> word embedding step requires loading pre-trained word vectors; this embodiment uses the Tencent AI Lab set of 800,000 pre-trained word vectors. After text encoding, each sentence is represented as an embedding matrix of dimension d×n, where d = 300 is the word-vector dimension (each word is represented by a 300-dimensional vector) and n is the sentence length.
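A minimal sketch of the token -> word ID -> word embedding pipeline is given below. The function names and the toy corpus are illustrative, and random vectors stand in for the pre-trained vectors, which are not loaded here. The sentence is stored as n vectors of dimension d, that is, the transpose of the d×n layout described in the text.

```python
import random


def build_vocab(tokenized_corpus):
    """Map each distinct token to a unique integer index ({token: index})."""
    vocab = {}
    for sentence in tokenized_corpus:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab, embeddings):
    """token -> word ID -> embedding; returns n vectors of dimension d."""
    ids = [vocab[token] for token in sentence]
    return [embeddings[i] for i in ids]


D = 300  # word-vector dimension stated in the text
corpus = [["肉", "变质"], ["肚子", "痛", "肉"]]  # toy segmented corpus
vocab = build_vocab(corpus)

# Placeholder vectors (assumption): the embodiment loads pre-trained
# word vectors instead of sampling random ones.
rng = random.Random(0)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in vocab]

matrix = encode(corpus[1], vocab, embeddings)
print(vocab, len(matrix), len(matrix[0]))
```

Tokens that are missing from the dictionary are not handled in this sketch; a real pipeline would typically reserve an index for unknown tokens and one for padding.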
(3) Confusion attention mechanism:
The word embedding matrix of each sentence is input into an artificial neural network for feature extraction. Keywords are meaningful only in a specific industry context, and the key sentiment words tend to differ between sentiment recognition domains. On this basis, the invention proposes a novel attention mechanism based on correlation that can adaptively learn keywords in the food safety domain.
The initial state of the RAM is a sequence of hidden-layer states of the artificial neural network. The important words in a sentence that help classification are attended to by computing the similarity between each word and the other words in the sentence. In short, the process consists of multiple scaled dot-product attention functions with independent parameters. The calculation is given by formulas (1)-(2):
In formula (1), the similarity (correlation) between the hidden state of the current word and that of each other word is computed to give a weight coefficient for each pair of words; after softmax normalization, this yields the final attention weight. In formula (2), the parameter matrices are those of the z-th head, and the scaling term in the denominator keeps the inner product in the numerator from becoming too large.
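The formula images for (1)-(2) do not survive in this text. As a point of reference only, standard multi-head scaled dot-product attention, which matches the verbal description, can be written as below. The symbols $h_i$, $W_Q^{z}$, $W_K^{z}$, $e_{ij}^{z}$, $\alpha_{ij}^{z}$, and $d$ are assumed notation rather than the patent's own, and the patent's exact form may differ.

```latex
% Similarity of word i to word j under head z, scaled by sqrt(d)
e_{ij}^{z} = \frac{\left(W_Q^{z} h_i\right)^{\top} \left(W_K^{z} h_j\right)}{\sqrt{d}}

% Softmax over the n words of the sentence gives the attention weight
\alpha_{ij}^{z} = \operatorname{softmax}_j\!\left(e_{ij}^{z}\right)
               = \frac{\exp\!\left(e_{ij}^{z}\right)}{\sum_{k=1}^{n} \exp\!\left(e_{ik}^{z}\right)}
```

Here $h_i$ is the hidden state of word $i$, $n$ is the sentence length, and $d$ is the dimension of the projected vectors.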
At this point, the relatively important words in each document can be captured, but the amount of information available from a single text is limited, and the important words are not necessarily related to food safety hazards. The contribution of words that help identify food safety hazards therefore needs to be amplified. For example, in the review "It looks exquisite, but the meat has spoiled, and I got a stomach ache after eating it!", the word "exquisite" receives a high attention score, but "spoiled" and "stomach ache" deserve more attention. This embodiment therefore uses statistics from the whole corpus to calculate a relevance score between each word and the label to make up for this shortcoming.
The confusion matrix is a performance evaluation tool for machine learning classification problems that measures the degree of association between true and predicted values. This embodiment designs a word-label association confusion matrix to calculate the relevance score between each word and "food safety hazard": the higher a word's relevance score, the more likely the word is to indicate a food safety hazard. The calculation is given by formulas (3)-(4):
In formula (3), the result is the score, associated with the safety hazard, of the i-th word in the document. TH_i is the number of documents in the dataset that contain word i and are labelled unsafe; FS_i is the number of documents that do not contain word i and are labelled safe; TS_i is the number of documents that contain word i and are labelled safe; and FH_i is the number of documents that do not contain word i and are labelled unsafe. In formula (4), z-score normalization is applied to avoid the influence of outliers and extreme values, using the sample mean and variance.
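The four counts and the z-score step can be sketched as follows. Formula (3) itself is not reproduced in this text, so the raw score used here, a smoothed pointwise mutual information between "word present" and "label unsafe", is a stand-in chosen only because the description mentions mutual information; it is an assumption, not the patent's formula. The toy documents and the smoothing constant `eps` are likewise hypothetical.

```python
import math
from statistics import mean, pstdev


def association_counts(docs, labels, word):
    """Return (TH, TS, FH, FS) for one word.

    docs: list of token lists; labels[k] is True when document k is unsafe.
    """
    th = ts = fh = fs = 0
    for tokens, unsafe in zip(docs, labels):
        present = word in tokens
        if present and unsafe:
            th += 1      # contains word, labelled unsafe
        elif present:
            ts += 1      # contains word, labelled safe
        elif unsafe:
            fh += 1      # lacks word, labelled unsafe
        else:
            fs += 1      # lacks word, labelled safe
    return th, ts, fh, fs


def pmi_score(th, ts, fh, fs, eps=0.5):
    """Stand-in raw score (assumption): smoothed PMI of word and unsafe label."""
    n = th + ts + fh + fs + 4 * eps
    p_joint = (th + eps) / n
    p_word = (th + ts + 2 * eps) / n
    p_unsafe = (th + fh + 2 * eps) / n
    return math.log(p_joint / (p_word * p_unsafe))


def z_normalize(scores):
    """z-score normalization; population standard deviation is used here."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {word: 0.0 for word in scores}
    return {word: (s - mu) / sigma for word, s in scores.items()}


# Toy segmented corpus with hypothetical labels (True = unsafe).
docs = [["肉", "变质"], ["味道", "好"], ["肉", "好"], ["变质", "拉肚子"]]
labels = [True, False, False, True]
raw = {w: pmi_score(*association_counts(docs, labels, w)) for w in ["变质", "肉", "好"]}
scores = z_normalize(raw)
print(scores)
```

In this toy corpus the word that appears only in unsafe documents ranks highest after normalization, the word that appears equally in both classes sits in the middle, and the word that appears only in safe documents ranks lowest.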
Next, the relative importance of each word in the sentence is fused with the importance of the word to food safety threats to obtain the final RAM score. The fusion and normalization are given by formula (5):
Finally, as shown in formula (6), the attention results of the Z heads are concatenated, and the updated hidden-layer vector h_i of word i is obtained through a nonlinear transformation. In formula (6), the concatenation operation joins the outputs of the heads, σ is the activation function, each head z has its own parameter matrix, and the input is the initial vector of word i.
In this way, the resulting attention score takes into account both the relative importance of words within the sentence and the word-label association features.
3. Response measures
As shown in FIG. 1, this is the second part of the food quality safety risk pre-screening system: the model's mining results for the review text are used to further evaluate food safety risk.
(1) Expert analysis and planning
First, the scores from formula (3) are sorted to yield a list of food-safety-related terms, as shown in Table 2. The expert group then carries out attribution analysis in combination with the model's predictions and determines the next response plan. Once the experts have completed their analysis, they can use it to propose candidate plans and notify the government market regulators, providing a basis for the government to formulate and adjust food safety response measures.
(2) Response measures
Corresponding measures are formulated according to the expert group's analysis.
4. Analysis
(1) Evaluation index
Three levels of indicators are used to evaluate the performance of the model. The first-level indicators are TP, TN, FP, and FN, which denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The second-level indicators are precision and recall, which evaluate the model along two different dimensions. They are calculated as shown in formulas (7)-(8):
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7}$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{8}$$
Precision is the proportion of samples predicted to be food safety hazards whose labels are in fact safety hazards. Recall is the proportion of all samples labelled as safety hazards that are successfully screened out.
Formulas (9)-(10) give the overall evaluation indicators: F1, which combines precision and recall, and accuracy:
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{9}$$
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$
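As a worked illustration of precision, recall, F1, and accuracy, the short sketch below computes them from the four first-level counts using the standard definitions. The counts are hypothetical and are not taken from the patent's tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard precision, recall, F1 and accuracy from confusion counts.

    No guard against zero denominators is included in this sketch.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(m)
```

With a hazard class as rare as the one described here, accuracy alone can look high even when many hazards are missed, which is why recall and F1 are reported alongside it.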
(2) Test model
Experiments were conducted to verify the validity of the model. Table 2 lists the top 20 food safety hazard terms generated with the character-based and the Chinese-word-based dictionaries, together with their relevance scores (weights), as determined by the word-label association confusion matrix scoring algorithm. A higher score means the term is more likely to indicate a potential safety hazard. Many of the terms relate to symptoms of food poisoning, such as "diarrhea" and "nausea". Although these symptoms may indicate a safety hazard, they are excluded from many sentiment analysis dictionaries because they are not sentiment words. Other terms reflect consumers' discussion of the condition of the food itself, a trend particularly evident in the character dictionary, with terms such as "stink", "rancid", and "worm". This shows that symptoms after consumption and the condition of the food when eaten are key predictors of food safety concerns.
Next, the popular deep learning model TextRNN was used as the benchmark model for experimental comparison. The basic parameters were: batch size 64, initial learning rate 0.001, dropout rate 0.5, hidden layer size 128, and 30 training epochs, with training stopped early if performance did not improve over 1,000 consecutive training batches.
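The stated hyperparameters and the early-stopping rule can be sketched as below. The class and field names are illustrative, and monitoring validation loss is an assumption: the text says only that training stops when there is no improvement over 1,000 consecutive batches.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters as listed in the text; field names are illustrative."""
    batch_size: int = 64
    learning_rate: float = 1e-3
    dropout: float = 0.5
    hidden_size: int = 128
    epochs: int = 30
    patience_batches: int = 1000  # batches without improvement before stopping


class EarlyStopper:
    """Stop when the monitored loss has not improved for `patience` batches."""

    def __init__(self, patience_batches):
        self.patience = patience_batches
        self.best_loss = float("inf")
        self.last_improved = 0

    def should_stop(self, batch_index, val_loss):
        # Validation loss as the monitored quantity is an assumption.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.last_improved = batch_index
        return batch_index - self.last_improved > self.patience


config = TrainConfig()
stopper = EarlyStopper(config.patience_batches)
```

In a training loop, `stopper.should_stop(batch_index, val_loss)` would be called after each evaluation, and the loop would exit as soon as it returns `True`.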
The experimental results are shown in Table 3. The classification accuracy of the models based on the character dictionary and the Chinese-word dictionary is 96.10% and 96.95%, respectively, an improvement of 2.09% and 1.65% over the baseline model TextRNN.
Second, comparison shows that the model based on the Chinese-word dictionary performs better than the one based on the character dictionary, probably because the word is the smallest unit that expresses a complete meaning, so a character dictionary is more prone to ambiguity. For example, "不错" ("not bad") splits into "不" + "错" and "不行" ("no good") into "不" + "行"; a high weight on the character "不" would give the two words similar scores. As can be seen from Table 3, the F1 value, which combines recall and precision, rose by 0.51 and 0.91, respectively.
Table 2 Food safety hazard terms
Table 3 Main results
(3) Expert analysis
In order to verify the technical solution of the present invention at the food safety level, the present embodiment calculates a risk score for each food in the food review data. Since the word segmentation-based term dictionary performs best, word-segmentation terms are used for the food risk analysis.
Table 4 Expert scoring statistics
Three food safety experts from the Urufion product quality supervision and inspection institute were invited to analyze the prediction results of the method of this embodiment. Each expert was asked to analyze 100 reviews in the test set predicted to have potential safety hazards and 100 baseline reviews predicted to have none. The expert panel's scores are shown in Table 4: the average score for the products flagged as hazardous was 0.907, whereas the baseline samples had an average score of 0.02, indicating that the expert analysis results were almost completely consistent with the model predictions. A significance test gave a p-value below 0.01, indicating that the experts considered the hazardous products predicted by the model to be significantly different from the baseline products.
To further verify the rapid screening effect of the model of this embodiment, an additional 400 test samples were collected, 200 with and 200 without food safety hazards. The expert panel performed risk analysis on the 175 samples that the model predicted to be unsafe, using risk grades of 0 (very low), 1 (low), 2 (medium), 3 (high), or 4 (very high). For this analysis, each expert received the food name, the merchant, and the food-related reviews in order to consider the food comprehensively. The three experts' scores were then summarized and put to a vote to obtain the final risk grade of the food. A baseline group was added for comparison, in which the expert panel directly performed risk assessment on all 400 samples, and the rapid screening effect of the model of the invention was verified by comparing screening times. The experimental results showed that, compared with the baseline group, the time for Expert A was reduced from 20min09s to 10min05s, for Expert B from 43min20s to 21min, and for Expert C from 32min15s to 18min23s, an average reduction of 48.17% per person. The confusion matrix of each person's final prediction results is shown in Fig. 2.
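The reported average time saving can be checked with a few lines of arithmetic. The sketch below uses only the screening times given in the text; the helper `to_seconds` and the variable names are illustrative. The mean of the three per-expert reductions comes out at roughly 48.2%, in line with the reported 48.17%.

```python
def to_seconds(minutes, seconds=0):
    """Convert a min/s duration to seconds."""
    return minutes * 60 + seconds


# (baseline time, time with model pre-screening), as reported in the text.
screening_times = {
    "Expert A": (to_seconds(20, 9), to_seconds(10, 5)),
    "Expert B": (to_seconds(43, 20), to_seconds(21)),
    "Expert C": (to_seconds(32, 15), to_seconds(18, 23)),
}

# Relative reduction per expert, then the unweighted mean across experts.
reductions = {
    name: (before - after) / before
    for name, (before, after) in screening_times.items()
}
mean_reduction = sum(reductions.values()) / len(reductions)

for name, value in reductions.items():
    print(f"{name}: {value:.2%}")
print(f"mean reduction: {mean_reduction:.2%}")
```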
As can be seen from panels a-f of Fig. 2, the model of the present invention has a positive effect on the expert panel's evaluation, and it is valuable for further investigation of product quality and for assessing risk to future consumers. In addition, given the need to process massive amounts of monitoring data, the ability to filter relevant reviews more quickly with the algorithm presented in the present invention is highly valuable.
(4) Other data verification
In order to further verify whether the technical solution of the invention can be extended to other fields, hotel review data and microblog texts were also investigated. The hotel review data and the microblog text data comprise 9998 and 19028 text documents, respectively; the specific data are shown in Table 5.
Table 5 Data statistics
The hotel review data provide a basis for preventing food-borne disease risk among tourists. SMP2019 is a microblog implicit emotion dataset, which is more challenging because it lacks explicit emotion words.
Table 6 Other data validation
As shown in Table 6: 1) compared with the traditional attention mechanism, the comprehensive evaluation indices of the invention, F1 score and accuracy, are improved markedly, by +1.91%, +2.20%, +2.26%, and +1.36% on the different data sets, which verifies the effectiveness of fusing the word segmentation-label relevance score with the attention mechanism. 2) For the SMP2019 dataset, which has no explicit emotion words, the character dictionary achieved 78.03%, markedly better than the 74.67% of the word dictionary, probably because, without clearly discriminative emotion words, the word-tag relevance scores are averaged out. The experimental results show that the technical solution of the invention can be effectively extended to other fields and plays a positive role in tasks such as risk prevention for tourist hotels and emotion analysis.
In order to improve current detection efficiency, the invention adopts cutting-edge text mining technology and uses online take-away review platforms as a potential information source for post-market supervision of the food industry. The invention provides a novel food safety pre-screening system (FSPS) that pre-screens problem foods and enterprises through consumer feedback, addressing the difficulty that the volume of information is massive and foods cannot feasibly be inspected one by one. To this end, the embodiment of the invention collects review data for online and offline shops, whose content is consumers' experience of the food, and manually classifies it into two categories according to whether a food safety hazard is present. The invention uses the mutual information between character frequency/word frequency and labels in consumer reviews to identify distinctive food safety-related words and phrases, and combines them with a supervised attention mechanism to mine potential interactions between consumers and dangerous foods. The experimental results show that the hierarchical relevance attention mechanism provided by the invention can effectively screen out samples with potential food safety hazards, save the expert panel a large amount of time in food risk analysis, and markedly improve the efficiency and coverage of screening. The invention can screen foods with potential safety hazards for market supervision departments, and regulators can use the detection results of the invention to decide whether products need further inspection for contamination.
Besides regulators, enterprises can use the model of the invention to remedy products or to adjust production processes, suppliers, or raw materials so as to reduce risk; the model can also provide a basis for food recalls by shops, and consumers can use it to raise their safety awareness when eating. The invention therefore has high application value.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any simple modification, equivalent change, or adaptation of the above embodiment made according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (5)

1. The food quality management security risk pre-screening model is characterized by comprising the following steps of:
(1) Text data acquisition and preprocessing; the text data is an online comment text;
(2) Vectorization of the text data codes after pretreatment;
(3) Judging the degree of food safety hazard through an attention scoring mechanism in supervised deep learning; the operation steps are as follows:
(1) firstly, attention is paid to the important words in a sentence that are helpful to classification by calculating the similarity between each word and the other words in the sentence, wherein the calculation formula is as follows:
in the formula, the symbols denote, respectively: the attention weight, that is, the weight coefficient of each segmented word with respect to another segmented word; the initial vector of segmented word i; the initial vector of segmented word j; the initial vector of a particular one of the N segmented words; and the parameter matrix of the z-th head; d represents the word vector dimension;
(2) the relevance score between each word and the food safety hazard is calculated through the word-tag association confusion matrix, as shown by the following formula:
in the formula, the score denotes the relevance between the i-th segmented word in the document and a safety hazard; TH_i represents the number of documents in the data set that contain segmented word i and are labeled unsafe; FS_i represents the number of documents in the data set that do not contain segmented word i and are labeled safe; TS_i represents the number of documents in the data set that contain segmented word i and are labeled safe; FH_i represents the number of documents in the data set that do not contain segmented word i and are labeled unsafe; N represents the number of segmented words; and the remaining two symbols represent the mean and the variance of the samples, respectively;
(3) the relative importance of each word in the sentence is fused with the importance of the word to the food safety threat to obtain the final RAM score; the fusion and normalization are shown in the following formula:
in the formula, the symbols denote, respectively: the relative importance of each word in the sentence; the importance of the word to the food safety threat; and the relative importance of a word among the N segmented words of the sentence;
(4) the Z-head attention results are spliced, and the updated hidden layer vector h_i of segmented word i is obtained through a nonlinear transformation; the generated attention score takes into account both the relative features and the label relationship features, and the calculation formula is as follows:
wherein the splicing operator denotes the splicing (concatenation) operation, σ is the activation function, and the remaining symbols denote the parameter matrix of the z-th head and the initial vector of segmented word i.
2. The food quality management security risk pre-screening model of claim 1, wherein,
the data acquisition in the step (1) is as follows: text data are acquired and classified according to potential safety hazards and no potential safety hazards.
3. The food quality management security risk pre-screening model of claim 1, wherein,
the pretreatment in the step (1) is as follows: text data of the type with potential safety hazards is standardized.
4. The food quality management security risk pre-screening model of claim 3, wherein,
the normalization is as follows: removing symbols, removing stop words, and performing word segmentation.
5. The food quality management security risk pre-screening model of claim 1, wherein,
the text coding vectorization in the step (2) is as follows: and vectorizing the standardized data through a text encoder.
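The scoring and fusion steps of claim 1 can be sketched as follows. This is a sketch under stated assumptions and not the claimed formulas: the four counts TH, FS, TS and FH follow the definitions in the claim, but the raw score (a phi coefficient of the 2x2 table), the standardization by mean and standard deviation, and the fusion rule (per-word product followed by softmax) are assumed stand-ins. All function names and the toy corpus are hypothetical, and a uniform attention vector replaces the multi-head attention of step (1).

```python
import math
from statistics import mean, pstdev


def association_counts(docs, labels, word):
    """Return (TH, FS, TS, FH) for one word; label 1 = unsafe, 0 = safe."""
    th = fs = ts = fh = 0
    for tokens, label in zip(docs, labels):
        present = word in tokens
        if present and label == 1:
            th += 1  # contains the word, labeled unsafe
        elif not present and label == 0:
            fs += 1  # lacks the word, labeled safe
        elif present:
            ts += 1  # contains the word, labeled safe
        else:
            fh += 1  # lacks the word, labeled unsafe
    return th, fs, ts, fh


def raw_score(th, fs, ts, fh):
    """ASSUMPTION: phi coefficient of the 2x2 table, not the claimed formula."""
    denom = math.sqrt((th + ts) * (th + fh) * (fs + ts) * (fs + fh))
    return 0.0 if denom == 0 else (th * fs - ts * fh) / denom


def standardize(scores):
    """ASSUMPTION: z-score using the sample mean and population std deviation."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {w: 0.0 if sd == 0 else (s - mu) / sd for w, s in scores.items()}


def softmax(xs):
    peak = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(attention, relevance):
    """ASSUMPTION: multiply the two importances per word, then softmax."""
    return softmax([a * r for a, r in zip(attention, relevance)])


# Hypothetical toy corpus of segmented reviews.
docs = [["diarrhea", "after", "noodles"], ["tasty", "noodles"],
        ["nausea", "diarrhea"], ["fast", "delivery"]]
labels = [1, 0, 1, 0]
vocab = sorted({w for tokens in docs for w in tokens})
scores = standardize(
    {w: raw_score(*association_counts(docs, labels, w)) for w in vocab}
)

sentence = docs[0]
uniform_attention = [1 / len(sentence)] * len(sentence)  # stand-in for step (1)
ram = fuse(uniform_attention, [scores[w] for w in sentence])
print(dict(zip(sentence, ram)))
```

In this toy corpus "diarrhea" occurs only in unsafe reviews, so it receives the largest fused weight in the first review, which is the behavior the relevance score is meant to produce.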
CN202110909891.2A 2021-08-09 2021-08-09 Food quality management safety risk pre-screening model Active CN113592338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909891.2A CN113592338B (en) 2021-08-09 2021-08-09 Food quality management safety risk pre-screening model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110909891.2A CN113592338B (en) 2021-08-09 2021-08-09 Food quality management safety risk pre-screening model

Publications (2)

Publication Number Publication Date
CN113592338A CN113592338A (en) 2021-11-02
CN113592338B true CN113592338B (en) 2023-09-12

Family

ID=78256458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909891.2A Active CN113592338B (en) 2021-08-09 2021-08-09 Food quality management safety risk pre-screening model

Country Status (1)

Country Link
CN (1) CN113592338B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014092230A1 (en) * 2012-12-13 2014-06-19 대한민국 (식품의약품안전청장) System and method for inspecting imported food-based harm prediction
CN108876100A (en) * 2018-04-28 2018-11-23 北京化工大学 Neural network food safety risk prediction model based on ISM and AHP
CN110457562A (en) * 2019-08-15 2019-11-15 中国农业大学 A kind of food safety affair classification method and device based on neural network model
CN110516138A (en) * 2019-08-31 2019-11-29 武汉理工大学 A kind of food safety affair early warning system threatening information bank based on multi-source self refresh
CN110688557A (en) * 2019-09-23 2020-01-14 中国农业大学 Food safety event-oriented early warning method
CN112766359A (en) * 2021-01-14 2021-05-07 北京工商大学 Word double-dimensional microblog rumor recognition method for food safety public sentiment
CN113191926A (en) * 2021-04-12 2021-07-30 北京工商大学 Grain and oil crop supply chain hazard identification method and system based on deep ensemble learning network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130089838A1 (en) * 2011-10-06 2013-04-11 Lisa Jeanne Adkins Food safety and risk analyzer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
上下文感知的树递归神经网络下隐式情感分析;陈秋嫦;《计算机工程与应用》;167-175 *

Also Published As

Publication number Publication date
CN113592338A (en) 2021-11-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant