CN107886240A - A rule-based method for identifying quality risks of cross-border e-commerce commodities - Google Patents

A rule-based method for identifying quality risks of cross-border e-commerce commodities

Info

Publication number
CN107886240A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711099313.7A
Other languages
Chinese (zh)
Other versions
CN107886240B (en
Inventor
何军良
宋博
马奕葳
王煜
杨振生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201711099313.7A
Publication of CN107886240A
Application granted
Publication of CN107886240B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of cross-border e-commerce and discloses a method for automatically identifying quality risks of cross-border e-commerce commodities. The method comprises commodity risk knowledge acquisition, automatic commodity classification, commodity risk identification, and a commodity risk information visualization module. It can process massive amounts of cross-border commodity information quickly and promptly, find the commodities that do not meet China's quality requirements, and present statistics on risky commodities in visual form. The invention can help consumers choose cross-border commodities of safer quality and can assist the relevant government departments in supervising cross-border e-commerce platforms.

Description

A rule-based method for identifying quality risks of cross-border e-commerce commodities
Technical field
The present invention relates to the field of cross-border e-commerce, and in particular to the automatic identification of quality risks of cross-border e-commerce commodities.
Background
Cross-border e-commerce is a new mode of international trade in which trading parties in different countries or regions conclude transactions and settle payments through an e-commerce platform, and the goods are cleared through customs and delivered by logistics such as postal mail or express delivery. With the rapid development of cross-border e-commerce, large numbers of cross-border express items and postal parcels are delivered directly to consumers, which can easily harm China's economic security, its ecological environment, and consumers' physical and mental health. Because cross-border e-commerce is characterized by frequent transactions, small quantities of goods per transaction, and a low threshold for business registration, it is difficult for the government to inspect cross-border commodities comprehensively. At present only random inspection at a rate of two per thousand is possible, which not only fails to guarantee the quality and safety of commodities but also places enormous pressure on the supervising departments.
Methods and cases that directly and automatically identify quality risks of cross-border e-commerce commodities are still rare. Existing business risk analysis mostly targets credit risk, and commodity quality evaluation based on mining word-of-mouth and public opinion information is an after-the-fact assessment: it can only find risks that consumers have already discovered and cannot detect risks when a merchant has listed a non-compliant commodity that has not yet been sold. A rule-based system is an artificial intelligence method that expresses human knowledge as rules a computer can understand, such as IF-THEN production rules. The computer reads in known facts, chains the rules in the rule base into a line of logical reasoning, and finally obtains the solution to the problem. National policies and regulations are themselves a set of rules to be observed, in which the circumstances a clause applies to form the precondition of the IF part and the conclusion of the clause forms the result of the THEN part, so they are well suited to processing by a rule-based intelligent system.
Summary of the invention
The present invention addresses the problem of risk detection and identification for massive numbers of cross-border commodities. It uses rule-based reasoning (RBR) to model, acquire, and reason over the national quality standards and regulations related to cross-border commodities, so that a computer can emulate a human expert in analysing the category and parameters of a cross-border commodity and in judging whether a risk exists and of what kind. Because cross-border commodity information on the Internet lacks a standard format and contains a great deal of noise, natural language processing techniques such as Chinese word segmentation, keyword extraction, and semantic matching are used to improve the precision and efficiency of quality risk detection. By implementing the method on a computer system, the massive numbers of cross-border commodities sold on the Internet can be checked in real time and risky commodities can be identified within a very short time after they are listed, so that consumers and government regulators can obtain risk information in time and respond effectively.
The method for identifying quality risks of cross-border e-commerce commodities proposed by the present invention comprises the following steps:
Step S1: Knowledge acquisition. Laws, regulations, and national standards related to cross-border e-commerce are converted into rule-based knowledge.
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntax: classification rules, parent-class rules, formula (ingredient) rules, and prohibition rules. The syntax of a classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
Parent-class rules have the form:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
Formula rules have the form:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("is greater than" | "is less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("is within" | "is outside") number "-" number "THEN commodity is risky"
Prohibition rules have the form:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards.
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools standard.
Step S2: Parse the commodity title.
Step S2 comprises the following steps:
S21: Segment the commodity title into words.
Step S21 proceeds as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list.
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word.
S22: Assign weights to the words of the commodity title.
Step S22 proceeds as follows:
Step S221: Build a keyword graph G = (V, E), where V is the node set, made up of the segmentation results generated by S21. The edge set E is then constructed from the co-occurrence of words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title.
Step S222: Use the TextRank algorithm to compute the weight WS(V_i) of node V_i according to the following formula:
$$WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
where d is the damping coefficient, set to 0.85, representing the probability that a given point in the keyword graph points to any other point; w_ij is the weight of the edge between any two points V_i and V_j in the keyword graph, with all edge weights set to 1; and, for a given point V_i, In(V_i) is the set of points that point to V_i and Out(V_i) is the set of points that V_i points to.
Step S223: Assign arbitrary initial weights to the points in the keyword graph and iterate until the weights converge. The iteration is considered converged when the weight of every point in the keyword graph changes by less than 0.0001 between two successive iterations, and the weight of each word at that point is output.
Step S3: Classify commodities according to the commodity types defined in national standards and regulations.
Step S3 comprises the following steps:
S31: Establish the entity concept Entity and give it six attributes: name, value, related-entity list, type list, activated-rule list, and risk level.
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest WS(V_i) in the title, and add them to the Entity's type list as possible types of the commodity. In addition, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well.
S33: Add the Entity corresponding to the commodity to the Drools inference engine. If a classification rule described in S11 is triggered, the commodity type is identified. If the commodity triggers no classification rule, its type is determined as follows:
Calculate the mutual information MI between words w and w':
where p(w, w') is the proportion of sentences in the corpus that contain both word w and word w', and p(w) is the proportion of sentences that contain word w.
Define the word relatedness:
where l is the word length measured in characters and S is the set of all sentences in the corpus. When two words are identical, their relatedness is calculated according to case I; when two words contain different characters, their relatedness is calculated according to case II.
The relatedness R(T, C) between a commodity title T and a category definition C is calculated as follows:
where f(w, d) is the number of times word w occurs in document d, and D is the set of all documents.
By calculating the relatedness between a commodity title and every type-definition document, the type corresponding to the document with the highest relatedness is taken as the type of the commodity.
Step S4: Identify commodity quality risks.
Step S4 comprises the following steps:
S41: Read from the database n items of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread.
S42: Process the n items according to step S21. When performing step S22, look up the title word weights directly in the word weight list obtained by running step S22 in advance, to speed up processing. Running step S22 in advance means fetching from the database a large amount of commodity information covering as many commodity types as possible, performing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table, and loading it into memory. At fixed intervals the system reruns step S22 on a number of the latest commodity information items to obtain a new "word-weight" list.
S43: Pass the n items processed by step S2 to step S3. If a commodity triggers a classification rule, the inference engine Drools automatically infers the risk result from the formula rules. If a commodity triggers no classification rule, then after step S33 is performed a judgement is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is chosen as the commodity category for subsequent formula-rule reasoning; if the maximum similarity is less than θ, the commodity is not classified and is only checked against the prohibition rules to determine whether it is prohibited from entry.
S44: If the related-entity list of a commodity is empty, take the 5 words with the largest WS(V_i) in the commodity's keyword sequence to form set A. For each historical commodity in the same category, likewise take the 5 words with the largest WS(V_i) to form set B, and calculate the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related-entity list of the input commodity as its formula information, and the input commodity, now carrying formula information, is fed back into the risk inference engine for reasoning.
Step S5: Visualize the commodity risk identification results.
Step S5 comprises the following steps:
S51: Count risky commodities by the time the commodity was first acquired, brand, place of origin, sales platform, category, and shop name.
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories, and shops as line charts and bar charts.
S53: For a selected time period, draw pie charts of the share of the total number of risky commodities accounted for by different brands, places of origin, sales platforms, categories, and shops.
S54: Based on the place of origin of cross-border commodities, mark on a world map the countries from which commodities are imported, draw a circle centred on each country's capital with a radius given by the number of risky commodities found for that country, and display it with dynamic effects on a web page.
The present invention has the following effects and advantages:
The present invention uses rule-based reasoning to identify quality risks of cross-border e-commerce commodities. In an Internet environment where information is massive, rapidly updated, and full of noise, it can extract the key information of cross-border commodities fairly accurately and then analyse and infer their quality risks. It can determine the specific type of commodity risk according to the commodity quality standards and import and export regulations formulated by the state, and it can compile statistics on risk information along multiple data dimensions and present them as charts. Its application can improve the timeliness and degree of automation of cross-border commodity risk detection and help supervising departments work more efficiently.
Brief description of the drawings
Fig. 1 is a block diagram of the rule-based risk identification system.
Fig. 2 is a schematic diagram of the commodity risk reasoning process.
Detailed description of the embodiments
A preferred embodiment of the present invention is described below with reference to the accompanying drawings. As shown in Fig. 1, the core of the invention is rule-based risk identification. Its input has three sources: risk definition files covering national standards, policies, and regulations; cross-border e-commerce platforms; and a semantic dictionary. The risk definition files define which risk a commodity has under which circumstances, the cross-border e-commerce platforms provide the detailed descriptions of cross-border commodities, and the semantic dictionary defines the domain-related words and the relations between them. The risk information visualization output module compiles statistics on the identified commodity risks and displays them visually. The invention is further described with reference to Fig. 2:
Step S1: Knowledge acquisition. Laws, regulations, and national standards related to cross-border e-commerce are converted into rule-based knowledge, which constitutes the source of the rules in the rule base in Fig. 2.
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntax: classification rules, parent-class rules, formula (ingredient) rules, and prohibition rules. The syntax of a classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
Parent-class rules have the form:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
Formula rules have the form:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("is greater than" | "is less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("is within" | "is outside") number "-" number "THEN commodity is risky"
Prohibition rules have the form:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards.
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools standard.
Example:
The classification rule for the commodity type "milk-based infant formula" in national standard GB 10765-2010 can be written as:
IF commodity information contains keyword pre stage | stage 1 | stage one, milk powder and does not contain keyword soy THEN commodity belongs to category milk-based infant formula
The requirement in national standard GB 10765-2010 that the fat content of "milk-based infant formula" commodities be 1.05-1.4 g/100 kJ can be written as the rule:
IF commodity category is milk-based infant formula and commodity fat is outside 1.05-1.4 THEN commodity is risky
Users can write rules directly in a text editor such as Notepad and save them in the path specified by the system, or fill in rules in the graphical interface provided by the system.
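The two example rules above can be sketched in Python to show how their conditions evaluate. This is a minimal illustration, not the Drools code that step S12 generates: the class names `ClassificationRule` and `RangeRule`, the English keywords, and the sample title are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClassificationRule:
    # Each inner list is one `argument`: alternatives joined by "|" in the rule text.
    required: List[List[str]]
    excluded: List[str] = field(default_factory=list)
    category: str = ""

    def matches(self, text: str) -> bool:
        # Every argument needs at least one of its alternatives in the text ...
        has_all = all(any(k in text for k in alts) for alts in self.required)
        # ... and no excluded keyword may appear.
        return has_all and not any(k in text for k in self.excluded)


@dataclass
class RangeRule:
    category: str
    parameter: str
    low: float
    high: float

    def is_risky(self, category: str, values: Dict[str, float]) -> bool:
        # "is outside low-high": risky when the value leaves the closed interval.
        if category != self.category or self.parameter not in values:
            return False
        return not (self.low <= values[self.parameter] <= self.high)


infant_formula = ClassificationRule(
    required=[["pre stage", "stage 1", "stage one"], ["milk powder"]],
    excluded=["soy"],
    category="milk-based infant formula",
)
fat_rule = RangeRule("milk-based infant formula", "fat", 1.05, 1.4)

title = "Brand X stage 1 infant milk powder 800g"  # hypothetical listing title
print(infant_formula.matches(title))
print(fat_rule.is_risky(infant_formula.category, {"fat": 1.6}))
```

Keeping each `argument` as a list of alternatives mirrors the grammar directly: commas separate conditions that must all hold, and "|" separates interchangeable keywords.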
Step S2: Parse the commodity title.
Step S2 comprises the following steps:
S21: Segment the commodity title into words.
Step S21 proceeds as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list.
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word.
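Steps S211 and S212 amount to dictionary lookup followed by removal of contained words. The sketch below assumes a small in-memory set as a stand-in for HowNet and uses an English title for readability; the function name `segment_title` and the final ordering by position are additions for illustration only.

```python
def segment_title(title, lexicon):
    # S211: collect every lexicon word that occurs in the title.
    candidates = [w for w in lexicon if w and w in title]
    # S212: drop any candidate that is contained in a different, longer candidate.
    kept = [
        w for w in candidates
        if not any(w != other and w in other for other in candidates)
    ]
    # Order by first position in the title (a convenience not stated in the text).
    return sorted(set(kept), key=title.find)


# Toy stand-in for the HowNet vocabulary.
lexicon = {"milk powder", "milk", "infant", "formula", "infant formula"}
print(segment_title("imported infant formula milk powder", lexicon))
# ['infant formula', 'milk powder']
```

Removing contained words keeps the longest dictionary match, so "infant formula" survives while "infant" and "formula" are discarded.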
S22: Assign weights to the words of the commodity title.
Step S22 proceeds as follows:
Step S221: Build a keyword graph G = (V, E), where V is the node set, made up of the segmentation results generated by S21. The edge set E is then constructed from the co-occurrence of words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title.
Step S222: Use the TextRank algorithm to compute the weight WS(V_i) of node V_i according to the following formula:
$$WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
where d is the damping coefficient, set to 0.85, representing the probability that a given point in the keyword graph points to any other point; w_ij is the weight of the edge between any two points V_i and V_j in the keyword graph, with all edge weights set to 1; and, for a given point V_i, In(V_i) is the set of points that point to V_i and Out(V_i) is the set of points that V_i points to.
Step S223: Assign arbitrary initial weights to the points in the keyword graph and iterate until the weights converge. The iteration is considered converged when the weight of every point in the keyword graph changes by less than 0.0001 between two successive iterations, and the weight of each word at that point is output.
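Steps S221 to S223 can be sketched as below. The sketch treats the co-occurrence graph as undirected, so In(V_i) and Out(V_i) are both the neighbour set, and with unit edge weights the fraction in the formula reduces to 1 / degree(V_j). The function names, the iteration cap `max_iter`, and the sample titles are assumptions for illustration.

```python
from itertools import combinations


def build_cooccurrence_graph(segmented_titles):
    # S221: two words are neighbours if they appear together in at least one title.
    neighbours = {}
    for words in segmented_titles:
        for w in words:
            neighbours.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def textrank(neighbours, d=0.85, tol=1e-4, max_iter=200):
    ws = {v: 1.0 for v in neighbours}  # S223: arbitrary initial weights
    for _ in range(max_iter):
        # S222 with all edge weights equal to 1.
        new = {
            v: (1 - d) + d * sum(ws[u] / len(neighbours[u]) for u in neighbours[v])
            for v in neighbours
        }
        converged = all(abs(new[v] - ws[v]) < tol for v in neighbours)
        ws = new
        if converged:
            break
    return ws


titles = [
    ["infant formula", "milk powder", "imported"],
    ["milk powder", "organic"],
]
weights = textrank(build_cooccurrence_graph(titles))
print(max(weights, key=weights.get))  # milk powder
```

In this toy graph "milk powder" co-occurs with the most words, so it receives the largest weight.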
Step S3: Classify commodities according to the commodity types defined in national standards and regulations.
Step S3 comprises the following steps:
S31: Establish the entity concept Entity and give it six attributes: name, value, related-entity list, type list, activated-rule list, and risk level.
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest WS(V_i) in the title, and add them to the Entity's type list as possible types of the commodity. In addition, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well. This corresponds to the "commodity type extension" step in Fig. 2.
S33: Add the Entity corresponding to the commodity to the Drools inference engine. If a classification rule described in S11 is triggered, the commodity type is identified. If the commodity triggers no classification rule, its type is determined as follows:
Calculate the mutual information MI between words w and w':
where p(w, w') is the proportion of sentences in the corpus that contain both word w and word w', and p(w) is the proportion of sentences that contain word w.
Define the word relatedness:
where l is the word length measured in characters and S is the set of all sentences in the corpus. When two words are identical, their relatedness is calculated according to case I; when two words contain different characters, their relatedness is calculated according to case II.
The relatedness R(T, C) between a commodity title T and a category definition C is calculated as follows:
where f(w, d) is the number of times word w occurs in document d, and D is the set of all documents; here D is taken to be the set made up of all national standards, policies, specifications, and the most recent 200,000 commodity text records. By calculating the relatedness between a commodity title and every type-definition document, the type corresponding to the document with the highest relatedness is taken as the type of the commodity.
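The sentence-level probabilities used in S33 can be illustrated with a short sketch. It assumes the standard pointwise form MI(w, w') = log(p(w, w') / (p(w) p(w'))), which is an assumption about the expression rather than a statement of it, and it does not cover the word relatedness or R(T, C). The helper names and the toy corpus are hypothetical.

```python
import math


def sentence_probability(sentences, *words):
    # Share of sentences that contain every one of the given words.
    hits = sum(1 for s in sentences if all(w in s for w in words))
    return hits / len(sentences)


def mutual_information(sentences, w1, w2):
    # Assumed pointwise form: log(p(w1, w2) / (p(w1) * p(w2))).
    p_joint = sentence_probability(sentences, w1, w2)
    if p_joint == 0:
        return float("-inf")  # the two words never share a sentence
    p1 = sentence_probability(sentences, w1)
    p2 = sentence_probability(sentences, w2)
    return math.log(p_joint / (p1 * p2))


# Toy corpus: each sentence is the set of words it contains.
corpus = [
    {"milk", "powder"},
    {"milk", "infant"},
    {"toy"},
    {"milk", "powder", "infant"},
]
print(mutual_information(corpus, "milk", "powder"))
```

A positive value means the two words share sentences more often than independence would predict.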
Step S4: Identify commodity quality risks.
Step S4 comprises the following steps:
S41: Read from the database n items of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread.
S42: Process the n items according to step S21. When performing step S22, look up the title word weights directly in the word weight list obtained by running step S22 in advance, to speed up processing. Running step S22 in advance means fetching from the database a large amount of commodity information covering as many commodity types as possible, performing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table, and loading it into memory. Every day the system reruns step S22 on the latest 200,000 commodity information items to obtain a new "word-weight" list.
S43: Pass the n items processed by step S2 to step S3. If a commodity triggers a classification rule, the inference engine Drools automatically infers the risk result from the formula rules. If a commodity triggers no classification rule, then after step S33 is performed a judgement is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is chosen as the commodity category for subsequent formula-rule reasoning; if the maximum similarity is less than θ, the commodity is not classified and is only checked against the prohibition rules to determine whether it is prohibited from entry.
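The category decision in S43 can be summarised in a few lines. This is a sketch under assumptions: `choose_category` is a hypothetical helper, the relatedness scores are taken as already computed by S33, and a score exactly equal to θ is treated as unclassified because the text does not specify that case.

```python
THETA = 0.5  # threshold from step S43


def choose_category(rule_category, relatedness_by_category, theta=THETA):
    # A category assigned by a fired classification rule takes precedence.
    if rule_category is not None:
        return rule_category
    if not relatedness_by_category:
        return None
    best = max(relatedness_by_category, key=relatedness_by_category.get)
    # Accept the fallback category only if its score exceeds theta.
    # None means "unclassified": only prohibition rules are then applied.
    return best if relatedness_by_category[best] > theta else None


print(choose_category(None, {"toys": 0.7, "food": 0.2}))  # toys
print(choose_category(None, {"toys": 0.4}))               # None
```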
S44: If the related-entity list of a commodity is empty, take the 5 words with the largest WS(V_i) in the commodity's keyword sequence to form set A. For each historical commodity in the same category, likewise take the 5 words with the largest WS(V_i) to form set B, and calculate the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related-entity list of the input commodity as its formula information, and the input commodity, now carrying formula information, is fed back into the risk inference engine for reasoning.
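Step S44 can be sketched as a nearest-neighbour lookup over keyword sets. The sketch assumes that historical commodities with empty formula information are skipped when searching for the most similar one, which is one reading of the text; `borrow_formula` and the data layout are hypothetical.

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def borrow_formula(input_keywords, history, threshold=0.5):
    # history: (top-5 keywords, formula dict) pairs for commodities of the same category.
    best_score, best_formula = 0.0, None
    for keywords, formula in history:
        if not formula:
            continue  # assumption: only non-empty formula information can be borrowed
        score = jaccard(input_keywords, keywords)
        if score > best_score:
            best_score, best_formula = score, formula
    if best_formula is not None and best_score > threshold:
        return dict(best_formula)  # copy, so the historical record stays untouched
    return None


history = [
    (["a", "b", "c", "d", "x"], {"fat": 1.6}),
    (["a", "b", "c", "y", "z"], {"fat": 1.2}),
]
print(borrow_formula(["a", "b", "c", "d", "e"], history))  # {'fat': 1.6}
```

With five keywords per commodity, a similarity above 0.5 requires at least four shared keywords (4/6), since three shared keywords give only 3/7.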
Step S5: Visualize the commodity risk identification results.
Step S5 comprises the following steps:
S51: Count risky commodities by the time the commodity was first acquired, brand, place of origin, sales platform, category, and shop name.
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories, and shops as line charts and bar charts.
S53: For a selected time period, draw pie charts of the share of the total number of risky commodities accounted for by different brands, places of origin, sales platforms, categories, and shops.
S54: Based on the place of origin of cross-border commodities, mark on a world map the countries from which commodities are imported, draw a circle centred on each country's capital with a radius given by the number of risky commodities found for that country, and display it with dynamic effects on a web page.
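The counting behind S51 and S53 can be sketched with the standard library; the charts themselves are left out. The record fields (`origin`, `brand`, `risky`) and the helper names are hypothetical and stand in for whatever the risk identification step stores.

```python
from collections import Counter


def count_risky(records, dimension):
    # S51: number of risky commodities per value of one dimension.
    return Counter(r[dimension] for r in records if r["risky"])


def shares(counts):
    # S53: share of each value in the total number of risky commodities.
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


records = [
    {"origin": "DE", "brand": "A", "risky": True},
    {"origin": "DE", "brand": "B", "risky": True},
    {"origin": "JP", "brand": "C", "risky": True},
    {"origin": "JP", "brand": "C", "risky": False},
]
by_origin = count_risky(records, "origin")
print(by_origin)          # Counter({'DE': 2, 'JP': 1})
print(shares(by_origin))
```

The same counts per country could drive the circle radii in S54, and counts grouped by time would feed the line and bar charts in S52.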

Claims (1)

1. A method for automatically identifying quality risks of cross-border e-commerce commodities, characterized by comprising the following steps:
Step S1: Knowledge acquisition. Laws, regulations, and national standards related to cross-border e-commerce are converted into rule-based knowledge.
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntax: classification rules, parent-class rules, formula (ingredient) rules, and prohibition rules. The syntax of a classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
Parent-class rules have the form:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
Formula rules have the form:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("is greater than" | "is less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("is within" | "is outside") number "-" number "THEN commodity is risky"
Prohibition rules have the form:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards.
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools standard.
Step S2: Parse the commodity title;
Step S2 comprises the following steps:
S21: Segment the commodity title into words;
Step S21 is as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list;
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word;
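Steps S211 and S212 can be sketched as follows. The `dictionary` argument stands in for the HowNet word list, and the English title is only for readability (the method targets Chinese titles); the final ordering by position in the title is an addition for this example.

```python
def segment_title(title, dictionary):
    """Dictionary-based segmentation following S211 and S212."""
    # S211: collect every dictionary word that occurs in the title.
    candidates = [w for w in dictionary if w and w in title]
    # S212: drop any word that is contained in another candidate.
    kept = [w for w in candidates
            if not any(w != other and w in other for other in candidates)]
    # Order by first position in the title (not part of the claim).
    return sorted(set(kept), key=title.find)


words = segment_title(
    "infant milk powder",
    ["milk", "milk powder", "infant", "powder", "tea"],
)
print(words)
```

Here "milk" and "powder" are discarded because both are contained in the longer candidate "milk powder", which is the longest-match behavior that S212 describes.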
S22: Assign weights to the words of the commodity title;
Step S22 is as follows:
Step S221: Build a keyword graph G = (V, E), where V is the node set, made up of the word segmentation results generated in S21, and the edge set E is constructed from the co-occurrence of words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title;
Step S222: Use the TextRank algorithm to calculate the weight WS(V_i) of node V_i according to the following formula:
$$WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
where d is the damping coefficient, set to 0.85, representing the probability that a given point in the keyword graph points to any other point; w_ij is the weight of the edge between any two points V_i and V_j in the keyword graph, and all edge weights are set to 1; for a given point V_i, In(V_i) is the set of points that point to V_i, and Out(V_i) is the set of points that V_i points to;
Step S223: Assign arbitrary initial weights to the points in the keyword graph and iterate until the weights converge; the iteration is considered to have converged when the difference in the weight of every point between two successive iterations is less than 0.0001, and the weight of each word at that moment is output;
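A minimal sketch of S221 to S223, assuming an undirected co-occurrence graph with unit edge weights, so that In(V_i) and Out(V_i) are both the neighbor set of V_i. The function names, the initial weight of 1.0, and the `max_iter` guard are choices made for this example.

```python
from itertools import combinations


def build_graph(segmented_titles):
    """S221: connect two words when they co-occur in the same title."""
    neighbors = {}
    for words in segmented_titles:
        for w in words:
            neighbors.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def textrank(neighbors, d=0.85, tol=1e-4, max_iter=1000):
    """S222/S223: iterate WS until every weight changes by less than tol."""
    ws = {v: 1.0 for v in neighbors}  # arbitrary initial weights
    for _ in range(max_iter):
        new_ws = {}
        for vi in neighbors:
            # With unit weights, w_ij / sum_k w_jk is 1 / degree(V_j).
            total = sum(ws[vj] / len(neighbors[vj]) for vj in neighbors[vi])
            new_ws[vi] = (1 - d) + d * total
        converged = all(abs(new_ws[v] - ws[v]) < tol for v in ws)
        ws = new_ws
        if converged:
            break
    return ws


titles = [["infant", "milk powder"],
          ["milk powder", "organic"],
          ["milk powder", "goat"]]
weights = textrank(build_graph(titles))
print(max(weights, key=weights.get))
```

In this toy corpus "milk powder" co-occurs with every other word, so it receives the largest weight.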
Step S3: Classify the commodities according to the commodity types defined in the national standards and regulations;
Step S3 comprises the following steps:
S31: Define the entity concept Entity and give it six attributes: name, value, related entities list, type list, activated rule list, and degree of risk;
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest WS(V_i) in the commodity title, and add them to the type list of the Entity as the possible types of the commodity; in addition, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well;
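The Entity of S31 and the type-list construction of S32 might look like the sketch below. The field names, the `nouns` set (the claim does not say how nouns are detected), and the `parent_of` dictionary standing in for a HowNet lookup are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    # The six attributes listed in S31 (names are illustrative).
    name: str
    value: Optional[float] = None
    related_entities: list = field(default_factory=list)
    types: list = field(default_factory=list)
    activated_rules: list = field(default_factory=list)
    risk_level: Optional[str] = None


def make_entity(title, weights, nouns, parent_of, top_k=3):
    """S32: seed the type list with the top-weighted nouns and their parents."""
    ranked = sorted((w for w in weights if w in nouns),
                    key=weights.get, reverse=True)[:top_k]
    entity = Entity(name=title)
    entity.types.extend(ranked)
    for word in ranked:
        parent = parent_of.get(word)  # hypothetical stand-in for HowNet
        if parent and parent not in entity.types:
            entity.types.append(parent)
    return entity


entity = make_entity(
    "imported organic infant milk powder can",
    weights={"milk powder": 1.9, "infant": 0.7, "organic": 0.6,
             "imported": 0.5, "can": 0.4},
    nouns={"milk powder", "infant", "can"},
    parent_of={"milk powder": "dairy product", "infant": "human"},
)
print(entity.types)
```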
S33: Add the Entity corresponding to the commodity to the Drools inference engine; if a classification rule described in S11 is triggered, the commodity type is identified; if the commodity does not trigger any classification rule, its type is determined as follows:
Calculate the mutual information MI between words w and w′:
$$MI(w, w') = \log \frac{p(w, w')}{p(w)\, p(w')}$$
where p(w, w′) is the proportion of sentences in the corpus that contain both word w and word w′, and p(w) is the proportion of sentences that contain word w;
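The mutual information above can be estimated from sentence counts as in this sketch. Representing each sentence as a set of words, using the natural logarithm, and returning negative infinity when the two words never co-occur are assumptions of the example; the claim does not fix the log base or the zero-count case.

```python
import math


def mutual_information(w1, w2, sentences):
    """MI(w, w') from sentence-level occurrence proportions."""
    n = len(sentences)
    p1 = sum(w1 in s for s in sentences) / n
    p2 = sum(w2 in s for s in sentences) / n
    p12 = sum(w1 in s and w2 in s for s in sentences) / n
    if p12 == 0:
        return float("-inf")  # the words never co-occur
    return math.log(p12 / (p1 * p2))


corpus = [{"milk", "powder"}, {"milk", "tea"},
          {"tea", "leaf"}, {"milk", "powder", "infant"}]
mi = mutual_information("milk", "powder", corpus)
print(round(mi, 4))
```

With p(milk) = 3/4, p(powder) = 2/4 and p(milk, powder) = 2/4, the result is log(4/3).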
Define the word degree of correlation RW(w, w′):
where l is the word length measured in characters, and S is the set of all sentences in the corpus; when the two words are identical, their degree of correlation is calculated according to case I, and when the two words contain different characters, their degree of correlation is calculated according to case II. The degree of correlation R(T, C) between a commodity title T and a category definition C is calculated as follows:
$$R(T, C) = \frac{1}{2} \left( \frac{\sum_{w \in T} \max_{w' \in C} RW(w, w')\, WS(w)}{\sum_{w \in T} WS(w)} + \frac{\sum_{w' \in C} \max_{w \in T} RW(w, w')\, TFIDF(w', C)}{\sum_{w' \in C} TFIDF(w', C)} \right)$$
where, in the definition of TFIDF, f(w, d) is the number of times word w occurs in document d, and D is the set of all documents;
By calculating the degree of correlation between a commodity title and every type definition document, the type corresponding to the document with the highest degree of correlation is determined to be the type of the commodity;
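The following sketch evaluates R(T, C) and picks the best-matching type. Because the definitions of RW and TFIDF are not reproduced in this text, both are passed in from outside: `rw` is a caller-supplied function and `tfidf` a precomputed dictionary. The exact-match `rw` used in the demonstration is a placeholder, not the patent's definition.

```python
def title_class_relatedness(title_words, class_words, rw, ws, tfidf):
    """R(T, C): average of a title-side and a class-side weighted match."""
    title_side = (
        sum(max(rw(w, c) for c in class_words) * ws[w] for w in title_words)
        / sum(ws[w] for w in title_words)
    )
    class_side = (
        sum(max(rw(w, c) for w in title_words) * tfidf[c] for c in class_words)
        / sum(tfidf[c] for c in class_words)
    )
    return 0.5 * (title_side + class_side)


def classify(title_words, class_docs, rw, ws, tfidf_by_class):
    """Return (best type, score) over all type definition documents."""
    scores = {
        name: title_class_relatedness(title_words, words, rw, ws,
                                      tfidf_by_class[name])
        for name, words in class_docs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def exact(w, c):
    # Placeholder word relatedness: 1 for identical words, else 0.
    return 1.0 if w == c else 0.0


best, score = classify(
    ["infant", "milk powder"],
    {"dairy": ["milk powder", "dairy"], "tea": ["tea", "leaf"]},
    exact,
    ws={"infant": 1.0, "milk powder": 3.0},
    tfidf_by_class={"dairy": {"milk powder": 2.0, "dairy": 2.0},
                    "tea": {"tea": 1.0, "leaf": 1.0}},
)
print(best, score)
```

For the "dairy" document the title side is 3/4 and the class side is 2/4, giving R = 0.625.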
Step S4: Identify commodity quality risks;
Step S4 comprises the following steps:
S41: Read from the database n items of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread;
S42: Process the n items of commodity information according to step S21; when performing step S22, look up the title word weights directly in the word weight list obtained by running step S22 in advance, in order to speed up system processing. Running step S22 in advance means prefetching from the database a large amount of commodity information covering as many commodity types as possible, performing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table, and reading it into memory; at fixed intervals the system runs step S22 on the latest several items of commodity information to obtain a new "word-weight" list;
S43: Send the n items of commodity information processed by step S2 to step S3; if a commodity triggers a classification rule, the Drools inference engine automatically derives the risk result according to the formula rules; if a commodity does not trigger any classification rule, then after step S33 is performed a judgment is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is selected as the commodity category for the subsequent formula rule reasoning; if the maximum similarity is less than θ, the commodity is not classified, and only the block rules are used to judge whether the commodity is prohibited from entry;
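The threshold decision in S43 reduces to a few lines, sketched here. The `scores` dictionary (category name to similarity) is assumed to come from step S33; treating a score exactly equal to θ as "not classified" is an assumption, since the claim only covers the greater-than and less-than cases.

```python
def choose_category(scores, theta=0.5):
    """Return the best category if its similarity exceeds theta, else None."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # None means: skip formula rules and check only the block rules.
    return best if scores[best] > theta else None


print(choose_category({"dairy": 0.625, "tea": 0.1}))
print(choose_category({"dairy": 0.4, "tea": 0.1}))
```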
S44: If the related entities list of a commodity is empty, take the 5 words with the largest WS(V_i) in the keyword sequence of the commodity to form set A; for each historical commodity that belongs to the same category as the commodity, likewise take the 5 words with the largest WS(V_i) to form set B, and calculate the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related entities list of the input commodity as the formula information of the input commodity, and the input commodity, now carrying formula information, is sent back to the risk inference engine for reasoning;
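A sketch of the formula borrowing in S44, under one reading of the claim: only historical commodities that already have non-empty formula information are considered as donors. The data shapes (a word-to-weight dictionary per commodity and a formula dictionary) and the function names are hypothetical.

```python
def top_words(weights, k=5):
    """The k words with the largest WS value, as a set."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def borrow_formula(item_weights, history, threshold=0.5):
    """history: (weights, formula) pairs for same-category commodities."""
    a = top_words(item_weights)
    best_sim, best_formula = 0.0, None
    for weights, formula in history:
        if not formula:  # skip commodities without formula information
            continue
        sim = jaccard(a, top_words(weights))
        if sim > best_sim:
            best_sim, best_formula = sim, formula
    return best_formula if best_sim > threshold else None


item = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
close = ({"a": 5, "b": 4, "c": 3, "d": 2, "x": 1}, {"lead": 0.2})
far = ({"a": 5, "b": 4, "y": 3, "z": 2, "q": 1}, {"lead": 0.9})
print(borrow_formula(item, [far, close]))
```

The close match shares 4 of 6 distinct top words (similarity 2/3), so its formula is borrowed; the far match shares 2 of 8 (similarity 1/4) and would be rejected on its own.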
Step S5: Visualize the commodity risk identification results;
Step S5 comprises the following steps:
S51: First, count the risky commodities according to their time, brand, place of origin, sales platform, category, and shop name;
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories, and shops, as line charts and bar charts;
S53: For a selected time period, draw pie charts showing the share of the total number of risky commodities accounted for by the risky commodities of different brands, places of origin, sales platforms, categories, and shops;
S54: Based on the place of origin of the cross-border commodities, mark on a world map each country from which commodities are imported, draw a circle centered on that country's capital whose radius is the number of risky commodities found for that country, and display the result with dynamic effects on a web page.
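The counting behind S51, S53, and S54 can be sketched with the standard library alone; the actual charting and web display are left out. The record fields (`risky`, `origin`, `brand`), the `capitals` lookup with approximate coordinates, and the `scale` factor are hypothetical names introduced for this example.

```python
from collections import Counter


def risk_counts_by(records, key):
    """S51: number of risky commodities per value of one attribute."""
    return Counter(r[key] for r in records if r["risky"])


def shares(counts):
    """S53: proportion of total risky commodities per group (pie chart data)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def map_circles(records, capitals, scale=1.0):
    """S54: one circle per origin country, radius from the risky count."""
    counts = risk_counts_by(records, "origin")
    return {c: {"center": capitals[c], "radius": n * scale}
            for c, n in counts.items() if c in capitals}


records = [
    {"origin": "DE", "brand": "A", "risky": True},
    {"origin": "DE", "brand": "B", "risky": True},
    {"origin": "JP", "brand": "A", "risky": True},
    {"origin": "JP", "brand": "C", "risky": False},
]
capitals = {"DE": (52.52, 13.40), "JP": (35.68, 139.69)}  # approximate
circles = map_circles(records, capitals)
print(circles["DE"]["radius"], shares(risk_counts_by(records, "brand")))
```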
CN201711099313.7A 2017-11-09 2017-11-09 Rule-based cross-border e-commerce commodity quality risk identification method Active CN107886240B (en)


Publications (2)

Publication Number Publication Date
CN107886240A true CN107886240A (en) 2018-04-06
CN107886240B CN107886240B (en) 2021-09-28

Family

ID=61779879

Country Status (1)

Country Link
CN (1) CN107886240B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189015A (en) * 2019-05-24 2019-08-30 复旦大学 Risk evaluating system towards entry and exit commodity
CN111241288A (en) * 2020-01-17 2020-06-05 烟台海颐软件股份有限公司 Emergency sensing system of large centralized power customer service center and construction method
CN112101774A (en) * 2020-09-11 2020-12-18 复旦大学 Cross-border commodity-oriented associated risk identification system
CN112365165A (en) * 2020-11-13 2021-02-12 广东卓志跨境电商供应链服务有限公司 Cross-border e-commerce wind control management method and system
CN112365166A (en) * 2020-11-13 2021-02-12 广东卓志跨境电商供应链服务有限公司 Cross-border e-commerce commodity filing risk control method and related device
CN114185869A (en) * 2021-12-03 2022-03-15 四川新网银行股份有限公司 Data model auditing method based on data standard

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782998A (en) * 2009-01-20 2010-07-21 复旦大学 Intelligent judging method for illegal on-line product information and system
CN102663025A (en) * 2012-03-22 2012-09-12 浙江盘石信息技术有限公司 Illegal online commodity detection method
CN104063523A (en) * 2014-07-21 2014-09-24 焦点科技股份有限公司 E-commerce search scoring and ranking method and system
CN104321794A (en) * 2013-05-02 2015-01-28 邓白氏公司 A system and method using multi-dimensional rating to determine an entity's future commercial viability
CN104794625A (en) * 2015-04-28 2015-07-22 酷悠悠科技(深圳)有限公司 Operation method and system of cross-border e-commerce website
CN105427050A (en) * 2015-12-02 2016-03-23 常州大学 Trust model based food quality evaluation method
CN105677622A (en) * 2016-03-11 2016-06-15 郑州师范学院 Automatic big data analysis report generating system
CN105812394A (en) * 2016-05-24 2016-07-27 王四春 Novel application of cloud computing to cross-border electronic commerce
CN105844478A (en) * 2016-03-17 2016-08-10 深圳市检验检疫科学研究院 Product quality sampling method used for cross-border electronic commerce
CN105893350A (en) * 2016-03-31 2016-08-24 重庆大学 Evaluating method and system for text comment quality in electronic commerce
CN106886934A (en) * 2016-12-30 2017-06-23 北京三快在线科技有限公司 Method, system and apparatus for determining merchant categories

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王卓伦等: "网络零售物流企业信用评价指标体系与风险预警 ", 《福建电脑》 *

Also Published As

Publication number Publication date
CN107886240B (en) 2021-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant