CN107886240A - A rule-based method for identifying quality risks of cross-border e-commerce commodities - Google Patents
- Publication number
- CN107886240A CN201711099313.7A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of cross-border e-commerce and discloses a method for automatically identifying quality risks of cross-border e-commerce commodities. The method comprises modules for commodity risk knowledge acquisition, automatic commodity classification, commodity risk identification, and commodity risk information visualization. It can process massive amounts of cross-border commodity information quickly and in a timely manner, find the commodities that do not meet China's quality requirements, and present statistics on risky commodities in visual form. The invention can help consumers select cross-border commodities of safer quality and can assist the relevant government departments in supervising cross-border e-commerce platforms.
Description
Technical field
The present invention relates to the field of cross-border e-commerce, and in particular to the automatic identification of quality risks of cross-border e-commerce commodities.
Background technology
Cross-border e-commerce is a new mode of international trade in which trading parties in different countries or regions conclude transactions and settle payments through an e-commerce platform, and the goods clear customs and are delivered through logistics channels such as mail or express delivery. With the rapid development of cross-border e-commerce, large numbers of cross-border express parcels and mail packages are delivered directly to consumers, which can easily harm China's economic security, ecological environment, and consumers' physical and mental health. Because cross-border e-commerce is characterized by frequent transactions, small quantities of goods per transaction, and a low threshold for business registration, it is difficult for the government to inspect cross-border commodities comprehensively. At present, spot checks can only be carried out at a ratio of two per thousand, which not only makes it difficult to guarantee the quality and safety of commodities but also puts enormous pressure on the supervision work of the relevant departments.
Methods and cases that directly and automatically identify quality risks of cross-border e-commerce commodities are still hard to find. Existing commercial risk analysis cases mostly target the field of credit risk, while commodity quality evaluation based on mining word-of-mouth and public opinion information is a form of after-the-fact review: it can only find risks that consumers have already discovered, and cannot be applied to detecting risk when a merchant has listed a non-compliant commodity that has not yet been sold. A rule-based system is an artificial intelligence approach that expresses human knowledge as rules a computer can understand, such as IF-THEN production rules. The computer then reads in known facts, combines the rules in the rule base into a chain of logical inference, and finally arrives at a solution to the problem. National policies and regulations are inherently a set of rules that must be observed: the circumstances in which a legal clause applies form the precondition of the IF part, and the conclusion of the clause forms the result of the THEN part. They are therefore well suited to processing by a rule-based intelligent system.
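The IF-THEN inference chain described above can be illustrated with a few lines of Python. This is only a minimal sketch, assuming that facts are plain strings and that each rule is a (premises, conclusion) pair; the function name `forward_chain` and the rule contents are hypothetical and are not taken from any regulation or from the Drools engine used later in the method.

```python
def forward_chain(facts, rules):
    """Repeatedly fire IF-THEN rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all facts of its IF part are already known.
            if set(premises) <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, for illustration only.
RULES = [
    (["title contains milk powder", "title contains stage 1"], "type is infant formula"),
    (["type is infant formula"], "type is infant food"),
    (["type is infant food", "fat outside allowed range"], "commodity is risky"),
]

derived = forward_chain(
    ["title contains milk powder", "title contains stage 1", "fat outside allowed range"],
    RULES,
)
```

Each pass adds the conclusions whose preconditions are satisfied, so the three rules above fire one after another and form the kind of reasoning chain the paragraph describes.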
Summary of the invention
The present invention addresses the problem of risk detection and identification for massive numbers of cross-border commodities. It uses rule-based reasoning (RBR) to model, acquire, and reason over the national quality standards and regulations related to cross-border commodities, so that a computer can simulate a human expert in analyzing the category and parameters of a cross-border commodity and in judging whether a risk exists and what kind of risk it is. Because cross-border commodity information on the internet lacks a standard format and contains a large amount of noise, natural language processing techniques such as Chinese word segmentation, keyword extraction, and semantic matching are used to improve the precision and efficiency of cross-border commodity quality risk detection. By implementing this method on a computer system, the massive numbers of cross-border commodities sold on the internet can be checked in real time, so that risky commodities are identified within a very short time after being listed, and consumers and government regulators can grasp cross-border commodity risk information in time and respond effectively.
The method for identifying quality risks of cross-border e-commerce commodities proposed by the present invention comprises the following steps:
Step S1: Knowledge acquisition: convert the laws, regulations, and national standards related to cross-border e-commerce into rule-type knowledge;
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntactic structures, namely the classification rule, the parent-class rule, the formula rule, and the prohibition rule. The syntax of the classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
The syntax of the parent-class rule is:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
The syntax of the formula rule is:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("greater than" | "less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("within" | "outside") number "-" number "THEN commodity is risky"
The syntax of the prohibition rule is:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
Where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards;
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools specification;
Step S2: Parse the commodity title;
Step S2 comprises the following steps:
S21: Segment the commodity title into words;
Step S21 is as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list;
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word;
S22: Assign weights to the words of the commodity title;
Step S22 is as follows:
Step S221: Build a keyword graph $G = (V, E)$, where $V$ is the node set, consisting of the word segmentation results generated by S21. The edges $E$ between any two nodes are then constructed from the co-occurrence relation of the words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title;
Step S222: Use the TextRank algorithm to compute the weight $WS(V_i)$ of node $V_i$ according to the following formula:
$$WS(V_i) = (1-d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
where $d$ is the damping coefficient, set to 0.85, which represents the probability that a given node in the keyword graph points to any other node; $w_{ij}$ is the weight of the edge between any two nodes $V_i$ and $V_j$ in the keyword graph, with all edge weights set to 1; for a given node $V_i$, $In(V_i)$ is the set of nodes pointing to $V_i$, and $Out(V_i)$ is the set of nodes that $V_i$ points to;
Step S223: Assign arbitrary initial weights to the nodes in the keyword graph and iterate until the weights converge. The iteration is considered converged when, for every node in the keyword graph, the difference in weight between two successive iterations is less than 0.0001; the weight of each word at that point is then output;
Step S3: Classify commodities according to the commodity types defined in national standards and regulations;
Step S3 comprises the following steps:
S31: Establish the entity concept Entity and give it six attributes: name, value, related entity list, type list, activated rule list, and risk level;
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest $WS(V_i)$ in the commodity title, and add them to the type list of the Entity as possible types of the commodity; at the same time, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well;
S33: Add the Entity corresponding to the commodity to the Drools inference engine. If a classification rule described in S11 is triggered, the commodity type is determined; if the commodity triggers no classification rule, its type is determined as follows:
Compute the mutual information MI between words w and w':
where p(w, w') is the proportion of sentences in the corpus that contain both word w and word w', and p(w) is the proportion of sentences that contain word w;
Define the word relevance:
where l is the length of a word measured in characters and S is the set of all sentences in the corpus; when two words are identical, their relevance is computed according to case I, and when two words contain different characters, their relevance is computed according to case II;
The relevance R(T, C) between a commodity title T and a category definition C is computed as follows:
where F(w, d) is the number of times word w occurs in document d, and D is the set of all documents;
By computing the relevance between a commodity title and every type definition document, the type corresponding to the document with the highest relevance is determined to be the type of the commodity;
Step S4: Identify commodity quality risks;
Step S4 comprises the following steps:
S41: Read from the database n records of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread;
S42: Process the n commodity records according to step S21. When performing step S22, look up the title word weights directly in the word weight list obtained by running step S22 in advance, in order to speed up processing. Running step S22 in advance means prefetching from the database a large amount of commodity information covering as many commodity types as possible, performing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table, and loading it into memory. At fixed intervals, the system runs step S22 on the latest commodity information to obtain a new "word-weight" list;
S43: Send the n commodity records processed by step S2 to step S3. If a commodity triggers a classification rule, the inference engine Drools automatically derives the risk result from the formula rules. If a commodity cannot trigger any classification rule, then after step S33 is performed a judgment is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is selected as the commodity category for subsequent reasoning with the formula rules; if the maximum similarity is less than θ, the commodity is not classified, and the prohibition rules alone are used to judge whether the commodity is prohibited from entry;
S44: If the related entity list of a commodity is empty, take the 5 words with the largest $WS(V_i)$ in the commodity's keyword sequence to form set A; for each historical commodity belonging to the same category as the commodity, likewise take the 5 words with the largest $WS(V_i)$ to form set B, and compute the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related entity list of the input commodity as its formula information, and the input commodity thus given formula information is sent back to the risk inference engine for reasoning;
Step S5: Visualize the commodity risk identification results;
Step S5 comprises the following steps:
S51: Count risky commodities by first acquisition time, brand, place of origin, sales platform, category, and shop name;
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories, and shops as line charts and histograms;
S53: For a selected time period, draw pie charts of the share of the risky commodities of each brand, place of origin, sales platform, category, and shop in the total number of risky commodities;
S54: Based on the place of origin of cross-border commodities, mark the country each product is imported from on a world map, draw a circle centered on that country's capital with a radius given by the number of risky commodities found for that country, and display it with dynamic effects on a web page.
The present invention has the following effects and advantages:
The invention uses a rule-based reasoning method to identify quality risks of cross-border e-commerce commodities. In an internet environment where information is massive, rapidly updated, and full of noise, it can extract the key information of cross-border commodities relatively accurately and then analyze and infer their quality risks. It can determine the specific type of commodity risk according to the commodity quality standards and import and export regulations formulated by the state, and can aggregate and chart risk information along multiple data dimensions. Its application can improve the timeliness and degree of automation of cross-border commodity risk detection and help supervisory departments work more efficiently.
Brief description of the drawings
Fig. 1 is a block diagram of the rule-based risk identification system
Fig. 2 is a schematic diagram of the commodity risk reasoning process
Embodiment
A preferred embodiment of the present invention is described below with reference to the accompanying drawings. As shown in Fig. 1, the core of the invention is rule-based risk identification. Its input comes from three sources: risk definition files covering national standards, policies, and regulations; cross-border e-commerce platforms; and a semantic dictionary. The risk definition files define which risk a commodity carries under which conditions, the cross-border e-commerce platforms provide detailed descriptive information about cross-border commodities, and the semantic dictionary defines the domain-related words and the relations between words. The risk information visualization output module is responsible for aggregating and visually displaying the identified commodity risks. The invention is further elaborated below with reference to Fig. 2:
Step S1: Knowledge acquisition: convert the laws, regulations, and national standards related to cross-border e-commerce into rule-type knowledge, which constitutes the source of the rules in the rule base in Fig. 2;
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntactic structures, namely the classification rule, the parent-class rule, the formula rule, and the prohibition rule. The syntax of the classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
The syntax of the parent-class rule is:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
The syntax of the formula rule is:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("greater than" | "less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("within" | "outside") number "-" number "THEN commodity is risky"
The syntax of the prohibition rule is:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
Where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards;
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools specification;
Example:
The classification rule for the commodity type "milk-based infant formula" in national standard GB 10765-2010 can be written as:
IF commodity information contains keyword pre-stage | stage 1 | stage one, milk powder and does not contain keyword bean THEN commodity belongs to category milk-based infant formula
The rule that the fat content of commodities of the "milk-based infant formula" category must be within 1.05-1.4 g/100 kJ, as required by national standard GB 10765-2010, can be written as:
IF commodity category is milk-based infant formula and commodity fat outside 1.05-1.4 THEN commodity is risky
Users can write rules directly in Notepad and save them in the path specified by the system, or fill in rules in the graphical interface provided by the system;
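To show how the two example rules would be evaluated, they can be mirrored as plain data structures. The sketch below is an illustration only: it does not generate Drools code as step S12 does, and the class names, field names, and English keywords are assumptions standing in for the rule text.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationRule:
    # Each inner list is one `argument`: alternatives joined by "|" in the rule text.
    include: list
    category: str
    exclude: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        # Every argument needs at least one of its alternatives in the text...
        has_all = all(any(k in text for k in alts) for alts in self.include)
        # ...and none of the excluded keywords may appear.
        has_none = not any(k in text for k in self.exclude)
        return has_all and has_none


@dataclass
class RangeRule:
    category: str
    ingredient: str
    low: float
    high: float

    def is_risky(self, category: str, values: dict) -> bool:
        # Mirrors the "outside low-high" form: risky when the value leaves the range.
        if category != self.category or self.ingredient not in values:
            return False
        return not (self.low <= values[self.ingredient] <= self.high)


formula_rule = ClassificationRule(
    include=[["pre-stage", "stage 1", "stage one"], ["milk powder"]],
    exclude=["bean"],
    category="milk-based infant formula",
)
fat_rule = RangeRule("milk-based infant formula", "fat", 1.05, 1.4)
```

A title such as "stage 1 infant milk powder" satisfies both keyword groups and contains no excluded keyword, so it would be assigned the category; a fat value of 1.6 for that category would then be flagged by `fat_rule`.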
Step S2: Parse the commodity title;
Step S2 comprises the following steps:
S21: Segment the commodity title into words;
Step S21 is as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list;
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word;
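Steps S211 and S212 amount to a dictionary lookup followed by removal of words that are substrings of longer matches. A minimal sketch, assuming the dictionary is available as a plain list of strings (the toy vocabulary below is a stand-in for HowNet, and the function name is hypothetical):

```python
def segment_title(title, dictionary):
    """Dictionary-based segmentation following steps S211 and S212."""
    # S211: keep every dictionary word that occurs in the title.
    candidates = [w for w in dictionary if w in title]
    # S212: drop any word that is contained in another candidate.
    return [
        w for w in candidates
        if not any(w != other and w in other for other in candidates)
    ]


# A tiny stand-in vocabulary; the method itself traverses HowNet.
TOY_DICTIONARY = ["milk", "powder", "milk powder", "infant", "formula"]
words = segment_title("infant formula milk powder", TOY_DICTIONARY)
```

Here "milk" and "powder" are both found in the title but are removed because the longer entry "milk powder" contains them.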
S22: Assign weights to the words of the commodity title;
Step S22 is as follows:
Step S221: Build a keyword graph $G = (V, E)$, where $V$ is the node set, consisting of the word segmentation results generated by S21. The edges $E$ between any two nodes are then constructed from the co-occurrence relation of the words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title;
Step S222: Use the TextRank algorithm to compute the weight $WS(V_i)$ of node $V_i$ according to the following formula:
$$WS(V_i) = (1-d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
where $d$ is the damping coefficient, set to 0.85, which represents the probability that a given node in the keyword graph points to any other node; $w_{ij}$ is the weight of the edge between any two nodes $V_i$ and $V_j$ in the keyword graph, with all edge weights set to 1; for a given node $V_i$, $In(V_i)$ is the set of nodes pointing to $V_i$, and $Out(V_i)$ is the set of nodes that $V_i$ points to;
Step S223: Assign arbitrary initial weights to the nodes in the keyword graph and iterate until the weights converge. The iteration is considered converged when, for every node in the keyword graph, the difference in weight between two successive iterations is less than 0.0001; the weight of each word at that point is then output;
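Steps S221 to S223 can be sketched as follows. This is an illustrative implementation under the stated settings (undirected co-occurrence edges, all edge weights equal to 1, d = 0.85, tolerance 0.0001); the function names, the `max_iter` safeguard, and the sample titles are assumptions made for the example.

```python
from itertools import combinations


def build_graph(titles):
    """S221: link two words when they co-occur in the same segmented title."""
    neighbours = {}
    for words in titles:
        for w in words:
            neighbours.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def textrank(neighbours, d=0.85, tol=1e-4, max_iter=1000):
    """S222-S223: iterate WS until every weight changes by less than tol."""
    ws = {v: 1.0 for v in neighbours}  # arbitrary initial weights
    for _ in range(max_iter):
        new_ws = {}
        for vi in neighbours:
            # With all edge weights equal to 1, the inner sum is the degree of V_j.
            total = sum(ws[vj] / len(neighbours[vj]) for vj in neighbours[vi])
            new_ws[vi] = (1 - d) + d * total
        converged = all(abs(new_ws[v] - ws[v]) < tol for v in ws)
        ws = new_ws
        if converged:
            break
    return ws


titles = [["infant", "formula", "milk powder"], ["milk powder", "goat"]]
weights = textrank(build_graph(titles))
```

In this toy graph "milk powder" co-occurs with every other word, so it ends up with the largest weight, which is the behavior the method relies on when it later picks the top-weighted nouns of a title.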
Step S3: Classify commodities according to the commodity types defined in national standards and regulations;
Step S3 comprises the following steps:
S31: Establish the entity concept Entity and give it six attributes: name, value, related entity list, type list, activated rule list, and risk level;
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest $WS(V_i)$ in the commodity title, and add them to the type list of the Entity as possible types of the commodity; at the same time, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well. This corresponds to the "commodity type extension" step in Fig. 2;
S33: Add the Entity corresponding to the commodity to the Drools inference engine. If a classification rule described in S11 is triggered, the commodity type is determined; if the commodity triggers no classification rule, its type is determined as follows:
Compute the mutual information MI between words w and w':
where p(w, w') is the proportion of sentences in the corpus that contain both word w and word w', and p(w) is the proportion of sentences that contain word w;
Define the word relevance:
where l is the length of a word measured in characters and S is the set of all sentences in the corpus; when two words are identical, their relevance is computed according to case I, and when two words contain different characters, their relevance is computed according to case II;
The relevance R(T, C) between a commodity title T and a category definition C is computed as follows:
where F(w, d) is the number of times word w occurs in document d, and D is the set of all documents; here D is taken to be the set consisting of all national standards, policies, specifications, and the most recent 200,000 commodity text records;
By computing the relevance between a commodity title and every type definition document, the type corresponding to the document with the highest relevance is determined to be the type of the commodity;
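The sentence-level probabilities used in step S33 can be estimated by simple counting. The sketch below is an illustration only: the MI formula itself does not appear in the text above, so the code assumes the common pointwise form log(p(w, w') / (p(w) p(w'))), and the corpus, function names, and the handling of words that never co-occur are all hypothetical.

```python
import math


def sentence_ratio(sentences, *words):
    """Share of sentences that contain every one of the given words."""
    hits = sum(1 for s in sentences if all(w in s for w in words))
    return hits / len(sentences)


def mutual_information(sentences, w1, w2):
    """Pointwise mutual information (assumed form) from sentence-level ratios."""
    p_joint = sentence_ratio(sentences, w1, w2)
    p1 = sentence_ratio(sentences, w1)
    p2 = sentence_ratio(sentences, w2)
    if p_joint == 0:
        return float("-inf")  # assumption: the two words never co-occur
    return math.log(p_joint / (p1 * p2))


# Toy corpus of already-segmented sentences (hypothetical data).
corpus = [
    {"infant", "milk powder"},
    {"infant", "milk powder", "formula"},
    {"adult", "vitamin"},
    {"adult", "milk powder"},
]
mi_related = mutual_information(corpus, "infant", "formula")
mi_unrelated = mutual_information(corpus, "infant", "adult")
```

Words that tend to appear in the same sentences receive a higher score than words that do not, which is the signal the word relevance and the title-to-category relevance R(T, C) build on.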
Step S4: Identify commodity quality risks;
Step S4 comprises the following steps:
S41: Read from the database n records of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread;
S42: Process the n commodity records according to step S21. When performing step S22, look up the title word weights directly in the word weight list obtained by running step S22 in advance, in order to speed up processing. Running step S22 in advance means prefetching from the database a large amount of commodity information covering as many commodity types as possible, performing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table, and loading it into memory. Every day, the system runs step S22 on the latest 200,000 commodity records to obtain a new "word-weight" list;
S43: Send the n commodity records processed by step S2 to step S3. If a commodity triggers a classification rule, the inference engine Drools automatically derives the risk result from the formula rules. If a commodity cannot trigger any classification rule, then after step S33 is performed a judgment is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is selected as the commodity category for subsequent reasoning with the formula rules; if the maximum similarity is less than θ, the commodity is not classified, and the prohibition rules alone are used to judge whether the commodity is prohibited from entry;
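The fallback decision in S43 is a plain threshold test. A minimal sketch, assuming the similarities from step S33 are available as a dictionary from category name to score; the function names are hypothetical, and treating a score exactly equal to θ as "not classified" is an assumption, since the text only covers the greater-than and less-than cases.

```python
def choose_category(similarities, theta=0.5):
    """Return the best category when its similarity exceeds theta, else None."""
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] > theta else None


def next_rules(category):
    # Unclassified commodities are only checked against the prohibition rules.
    return "formula rules" if category is not None else "prohibition rules only"


decision = next_rules(choose_category({"infant formula": 0.72, "cosmetics": 0.31}))
```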
S44: If the related entity list of a commodity is empty, take the 5 words with the largest $WS(V_i)$ in the commodity's keyword sequence to form set A; for each historical commodity belonging to the same category as the commodity, likewise take the 5 words with the largest $WS(V_i)$ to form set B, and compute the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related entity list of the input commodity as its formula information, and the input commodity thus given formula information is sent back to the risk inference engine for reasoning;
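Step S44 can be sketched as a nearest-neighbour lookup over historical commodities of the same category. The record layout (`weights`, `formula`), the function names, and the choice to skip historical commodities without formula information before taking the maximum are assumptions made for this example.

```python
def top_words(weights, k=5):
    """The k words with the largest WS value."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def borrow_formula(item_weights, history, threshold=0.5):
    """Return the formula info of the most similar same-category history item."""
    a = top_words(item_weights)
    best_score, best_formula = 0.0, None
    for past in history:
        if not past["formula"]:
            continue  # only non-empty formula information can be injected
        score = jaccard(a, top_words(past["weights"]))
        if score > best_score:
            best_score, best_formula = score, past["formula"]
    return best_formula if best_score > threshold else None


item = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
history = [
    {"weights": {"a": 1, "b": 1, "c": 1, "d": 1, "x": 1}, "formula": {"fat": 1.2}},
    {"weights": {"x": 1, "y": 1}, "formula": {"fat": 9.0}},
]
borrowed = borrow_formula(item, history)
```

In this example the first historical commodity shares four of its five top words with the input commodity, giving a similarity of 4/6, so its formula information would be reused.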
Step S5: Visualize the commodity risk identification results;
Step S5 comprises the following steps:
S51: Count risky commodities by first acquisition time, brand, place of origin, sales platform, category, and shop name;
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories, and shops as line charts and histograms;
S53: For a selected time period, draw pie charts of the share of the risky commodities of each brand, place of origin, sales platform, category, and shop in the total number of risky commodities;
S54: Based on the place of origin of cross-border commodities, mark the country each product is imported from on a world map, draw a circle centered on that country's capital with a radius given by the number of risky commodities found for that country, and display it with dynamic effects on a web page.
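The statistics behind steps S51 and S53 reduce to grouped counts and shares, which a charting layer can then draw. A minimal sketch using only the standard library; the record fields and sample values are hypothetical, and no plotting is attempted here.

```python
from collections import Counter


def count_by(records, dimension):
    """S51: number of risky commodities per value of one dimension."""
    return Counter(r[dimension] for r in records if r["risky"])


def shares(counts):
    """S53: each group's share of the total number of risky commodities."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


# Hypothetical identification results.
records = [
    {"brand": "A", "origin": "NL", "risky": True},
    {"brand": "A", "origin": "DE", "risky": True},
    {"brand": "B", "origin": "NL", "risky": False},
    {"brand": "B", "origin": "NL", "risky": True},
]
by_origin = count_by(records, "origin")
brand_shares = shares(count_by(records, "brand"))
```

The per-origin counts are the quantity that S54 uses as the circle radius for each country, and the shares are what the pie charts in S53 display.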
Claims (1)
1. A method for automatically identifying quality risks of cross-border e-commerce commodities, characterized in that it comprises the following steps:
Step S1: Knowledge acquisition: convert the laws, regulations, and national standards related to cross-border e-commerce into rule-type knowledge;
Step S1 comprises the following steps:
S11: Define four kinds of risk rules and their corresponding syntactic structures, namely the classification rule, the parent-class rule, the formula rule, and the prohibition rule. The syntax of the classification rule, defined in BNF, is:
CLASSIFICATION_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity belongs to category" keyword
The syntax of the parent-class rule is:
FATHER_CLASS ::= "IF commodity type is" keyword "THEN commodity also belongs to type" keyword
The syntax of the formula rule is:
INGREDIENT_RULE_LIMIT ::= "IF commodity category is" keyword "and commodity" keyword ("greater than" | "less than") number "THEN commodity is risky"
INGREDIENT_RULE_RANGE ::= "IF commodity category is" keyword "and commodity" keyword ("within" | "outside") number "-" number "THEN commodity is risky"
The syntax of the prohibition rule is:
FORBIDDEN_RULE ::= "IF commodity information contains keyword" argument {"," argument} ["and does not contain keyword" keyword {"," keyword}] "THEN commodity is prohibited from entry"
Where:
argument ::= keyword {"|" keyword}
keyword and number are a character string and a number respectively, filled in by the user according to the clauses of the regulations and standards;
S12: Parse the rule text entered by the user and translate it into computer code that conforms to the Drools specification;
Step S2: Parse the commodity title;
Step S2 comprises the following steps:
S21: Segment the commodity title into words;
Step S21 is as follows:
Step S211: Traverse the words in the semantic dictionary HowNet; if a word appears in the commodity title, add it to a temporary list;
Step S212: Traverse the words in the temporary list; if a word is contained in another word in the list, delete the contained word;
S22: Assign weights to the words of the commodity title;
Step S22 is as follows:
Step S221: Build a keyword graph $G = (V, E)$, where $V$ is the node set, consisting of the word segmentation results generated by S21. The edges $E$ between any two nodes are then constructed from the co-occurrence relation of the words in commodity titles: an edge exists between two nodes only when their corresponding words co-occur in the same commodity title;
Step S222: Use the TextRank algorithm to compute the weight $WS(V_i)$ of node $V_i$ according to the following formula:
$$WS(V_i) = (1-d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
Wherein d is the damping coefficient, set to 0.85, representing the probability of moving from a given point in the keyword graph to any other point; w_ij is the weight of the edge between any two points V_i and V_j in the keyword graph, and all edge weights are set to 1; for a given point V_i, In(V_i) is the set of points that point to V_i, and Out(V_i) is the set of points that V_i points to;
Step S223: Assign arbitrary initial weights to the points in the keyword graph and iterate until the weights converge; the iteration is considered to have converged when the difference in the weight of every point in the keyword graph between two successive iterations is less than 0.0001, and the weight of each word at that moment is output;
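Steps S221 to S223 can be sketched in Python as follows. This is a minimal illustration under the stated settings (d = 0.85, all edge weights 1, convergence threshold 0.0001). The function name `textrank`, the initial weight of 1.0 and the `max_iter` safety limit are assumptions of the sketch, not part of the claimed method.

```python
from itertools import combinations
from typing import Dict, List


def textrank(titles: List[List[str]], d: float = 0.85,
             tol: float = 1e-4, max_iter: int = 1000) -> Dict[str, float]:
    # S221: undirected co-occurrence graph; words sharing a title are linked.
    neighbors: Dict[str, set] = {}
    for words in titles:
        for w in words:
            neighbors.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # S223: arbitrary initial weights (1.0 here), then iterate to convergence.
    ws = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        new_ws = {}
        for vi, adj in neighbors.items():
            # With unit edge weights, w_ij / sum_k w_jk equals 1 / degree(V_j).
            rank = sum(ws[vj] / len(neighbors[vj]) for vj in adj)
            new_ws[vi] = (1 - d) + d * rank
        converged = all(abs(new_ws[w] - ws[w]) < tol for w in ws)
        ws = new_ws
        if converged:
            break
    return ws
```

Because the graph is undirected, In(V_i) and Out(V_i) coincide with the neighbor set of V_i in this sketch. A word that co-occurs with many different words ends up with a higher weight than a word that appears with only one neighbor.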
Step S3: Classify commodities according to the commodity types defined in national standards and regulations;
Step S3 comprises the following steps:
S31: Establish the entity concept Entity and give it six attributes: name, value, related-entity list, type list, activated-rule list and risk level;
S32: For a commodity title to be classified, create the corresponding Entity, extract the three nouns with the largest WS(V_i) in the commodity title and add them to the type list of the Entity as possible types of the commodity; meanwhile, obtain the parent concepts of these three nouns from the semantic dictionary HowNet and add them to the type list as well;
S33: Add the Entity corresponding to the commodity to the Drools inference engine; if a classification rule described in S11 is triggered, the commodity type is identified; if the commodity triggers no classification rule, its type is determined as follows:
Calculate the mutual information MI between words w and w':
$$MI(w, w') = \log \frac{p(w, w')}{p(w)\, p(w')}$$
Wherein p(w, w') is the proportion of sentences in the corpus that contain both word w and word w', and p(w) is the proportion of sentences that contain word w;
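The mutual information above can be sketched in Python as follows. The sketch assumes the corpus is already segmented into sentences, each represented as a set of words. The natural logarithm and the return value of negative infinity for words that never co-occur are choices made for the sketch, since the text does not specify them.

```python
import math
from typing import List, Set


def mutual_information(w1: str, w2: str, sentences: List[Set[str]]) -> float:
    n = len(sentences)
    if n == 0:
        raise ValueError("corpus must contain at least one sentence")
    # Sentence-level proportions p(w), p(w') and p(w, w').
    p1 = sum(1 for s in sentences if w1 in s) / n
    p2 = sum(1 for s in sentences if w2 in s) / n
    p12 = sum(1 for s in sentences if w1 in s and w2 in s) / n
    if p12 == 0:
        # log(0) is undefined; -inf is an assumption of this sketch.
        return float("-inf")
    return math.log(p12 / (p1 * p2))
```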
Define the word relevance RW(w, w')
Wherein l is the word length measured in characters and S is the set of all sentences in the corpus; when the two words are identical, their relevance is calculated according to case I, and when the two words contain different characters, their relevance is calculated according to case II;
The relevance R(T, C) between a commodity title T and a category definition C is calculated as follows:
$$R(T, C) = \frac{1}{2}\left( \frac{\sum_{w \in T} \max_{w' \in C} RW(w, w')\, WS(w)}{\sum_{w \in T} WS(w)} + \frac{\sum_{w' \in C} \max_{w \in T} RW(w, w')\, TFIDF(w', C)}{\sum_{w' \in C} TFIDF(w', C)} \right)$$
Wherein F(w, d) is the number of times word w occurs in document d, and D is the set of all documents;
By calculating the relevance between a commodity title and all type-definition documents, the type corresponding to the document with the highest relevance is determined to be the type of the commodity;
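The computation of R(T, C) and the selection of the most relevant type can be sketched in Python as follows. The word relevance RW is passed in as a function because its definition is not reproduced in this text, and the TFIDF values are assumed to be precomputed. The names `title_category_relevance` and `classify` are hypothetical.

```python
from typing import Callable, Dict, Tuple


def title_category_relevance(ws: Dict[str, float], tfidf: Dict[str, float],
                             rw: Callable[[str, str], float]) -> float:
    # ws maps each title word w in T to its TextRank weight WS(w).
    # tfidf maps each word w' of the category definition C to TFIDF(w', C).
    title_part = sum(max(rw(w, c) for c in tfidf) * ws[w]
                     for w in ws) / sum(ws.values())
    category_part = sum(max(rw(w, c) for w in ws) * tfidf[c]
                        for c in tfidf) / sum(tfidf.values())
    return 0.5 * (title_part + category_part)


def classify(ws: Dict[str, float], categories: Dict[str, Dict[str, float]],
             rw: Callable[[str, str], float]) -> Tuple[str, float]:
    # Score every category definition and return the most relevant one.
    scores = {name: title_category_relevance(ws, tfidf, rw)
              for name, tfidf in categories.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For experimentation, a stand-in such as `lambda a, b: 1.0 if a == b else 0.0` can be supplied for `rw`; that exact-match function is only a placeholder and is not the relevance measure defined in the method.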
Step S4: Identify commodity quality risk;
Step S4 comprises the following steps:
S41: Read from the database n items of cross-border commodity information that have not yet undergone risk identification, with n set to 50 per thread;
S42: Process the n items of commodity information according to step S21; when performing step S22, look up the title word weights directly in the word-weight list obtained by running step S22 in advance, so as to speed up processing. Running step S22 in advance means fetching from the database a large amount of commodity information covering as many commodity types as possible, executing S21 and S22 on it to obtain word weights, storing this "word-weight" information in a table and loading it into memory; at fixed intervals, the system reruns step S22 on the latest several items of commodity information and obtains a new "word-weight" list;
S43: Send the n items of commodity information processed by step S2 to step S3; if a commodity triggers a classification rule, the inference engine Drools automatically infers the risk result according to the formula rules; if a commodity triggers no classification rule, then after step S33 is performed, a judgment is made against the threshold θ = 0.5: if the maximum similarity between the commodity and all categories is greater than θ, the category with the maximum similarity is selected as the commodity category and the subsequent formula-rule reasoning is performed; if the maximum similarity is less than θ, the commodity is not classified, and only the block rules are used to judge whether the commodity is prohibited from entry;
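The threshold decision in S43 can be sketched in Python as follows. The function name `choose_category` is hypothetical. The text does not say what happens when the maximum similarity equals θ exactly; the sketch treats that case as unclassified, which is an assumption.

```python
from typing import Dict, Optional

THETA = 0.5  # threshold value stated in step S43


def choose_category(similarities: Dict[str, float],
                    theta: float = THETA) -> Optional[str]:
    # No candidate categories means the commodity stays unclassified.
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    # Only a similarity strictly greater than theta selects the category;
    # otherwise the caller falls back to the block rules alone.
    return best if similarities[best] > theta else None
```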
S44: If the related-entity list of a commodity is empty, take the 5 words with the largest WS(V_i) in the keyword sequence of the commodity to form set A; for each historical commodity belonging to the same category as this commodity, likewise take the 5 words with the largest WS(V_i) to form set B, and calculate the Jaccard similarity between A and B:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
When the maximum Jaccard similarity between the input commodity and the historical commodities is greater than 0.5, the non-empty formula information of the most similar commodity is injected into the related-entity list of the input commodity as its formula information, and the input commodity, now carrying formula information, is sent to the risk inference engine again for reasoning;
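The matching in S44 can be sketched in Python as follows. The sketch assumes that `history` maps the identifier of each same-category historical commodity with non-empty formula information to its top-5 word set. The function names and that data layout are hypothetical.

```python
from typing import Dict, Optional, Set


def top_words(ws: Dict[str, float], k: int = 5) -> Set[str]:
    # The k words with the largest TextRank weight WS.
    return set(sorted(ws, key=ws.get, reverse=True)[:k])


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    # Two empty sets are given similarity 0.0 in this sketch.
    return len(a & b) / len(union) if union else 0.0


def most_similar(a: Set[str], history: Dict[str, Set[str]],
                 threshold: float = 0.5) -> Optional[str]:
    best_id, best_score = None, 0.0
    for item_id, b in history.items():
        score = jaccard(a, b)
        if score > best_score:
            best_id, best_score = item_id, score
    # Formula information is borrowed only above the 0.5 threshold.
    return best_id if best_score > threshold else None
```

The identifier returned by `most_similar` would then be used to look up the formula information to inject into the input commodity before it is sent back to the inference engine.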
Step S5: Visualize the commodity risk identification results;
Step S5 comprises the following steps:
S51: First count the risky commodities by commodity acquisition time, brand, place of origin, sales platform, category and shop name;
S52: With time on the horizontal axis and the number of risky commodities on the vertical axis, plot the commodities of different brands, places of origin, sales platforms, categories and shops as line charts and histograms;
S53: For a selected time period, draw pie charts of the share of the risky commodities of each brand, place of origin, sales platform, category and shop in the total number of risky commodities;
S54: Based on the place of origin of cross-border commodities, mark on a world map the countries from which the commodities are imported; for each such country, draw a circle centered on its capital whose radius is the number of risky commodities found for that country, and display it with dynamic effects on a web page.
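The counting in S51 and the shares used for the pie charts in S53 can be sketched in Python as follows. The record layout (a dict with a boolean "risky" field plus attribute fields such as "brand") and the function names are assumptions of the sketch; the plotting itself is not shown.

```python
from collections import Counter
from typing import Dict, Iterable


def count_risky(items: Iterable[dict], key: str) -> Counter:
    # S51: count risky commodities per value of one attribute (e.g. brand).
    return Counter(item[key] for item in items if item.get("risky"))


def proportions(counts: Counter) -> Dict[str, float]:
    # S53: share of each group in the total number of risky commodities.
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```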
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711099313.7A CN107886240B (en) | 2017-11-09 | 2017-11-09 | Rule-based cross-border e-commerce commodity quality risk identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107886240A (en) | 2018-04-06 |
CN107886240B (en) | 2021-09-28 |
Family
ID=61779879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711099313.7A Active CN107886240B (en) | 2017-11-09 | 2017-11-09 | Rule-based cross-border e-commerce commodity quality risk identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107886240B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110189015A (en) * | 2019-05-24 | 2019-08-30 | 复旦大学 | Risk evaluating system towards entry and exit commodity |
CN111241288A (en) * | 2020-01-17 | 2020-06-05 | 烟台海颐软件股份有限公司 | Emergency sensing system of large centralized power customer service center and construction method |
CN112101774A (en) * | 2020-09-11 | 2020-12-18 | 复旦大学 | Cross-border commodity-oriented associated risk identification system |
CN112365165A (en) * | 2020-11-13 | 2021-02-12 | 广东卓志跨境电商供应链服务有限公司 | Cross-border e-commerce wind control management method and system |
CN112365166A (en) * | 2020-11-13 | 2021-02-12 | 广东卓志跨境电商供应链服务有限公司 | Cross-border e-commerce commodity filing risk control method and related device |
CN114185869A (en) * | 2021-12-03 | 2022-03-15 | 四川新网银行股份有限公司 | Data model auditing method based on data standard |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101782998A (en) * | 2009-01-20 | 2010-07-21 | 复旦大学 | Intelligent judging method for illegal on-line product information and system |
CN102663025A (en) * | 2012-03-22 | 2012-09-12 | 浙江盘石信息技术有限公司 | Illegal online commodity detection method |
CN104063523A (en) * | 2014-07-21 | 2014-09-24 | 焦点科技股份有限公司 | E-commerce search scoring and ranking method and system |
CN104321794A (en) * | 2013-05-02 | 2015-01-28 | 邓白氏公司 | A system and method using multi-dimensional rating to determine an entity's future commercial viability |
CN104794625A (en) * | 2015-04-28 | 2015-07-22 | 酷悠悠科技(深圳)有限公司 | Operation method and system of cross-border e-commerce website |
CN105427050A (en) * | 2015-12-02 | 2016-03-23 | 常州大学 | Trust model based food quality evaluation method |
CN105677622A (en) * | 2016-03-11 | 2016-06-15 | 郑州师范学院 | Automatic big data analysis report generating system |
CN105812394A (en) * | 2016-05-24 | 2016-07-27 | 王四春 | Novel application of cloud computing to cross-border electronic commerce |
CN105844478A (en) * | 2016-03-17 | 2016-08-10 | 深圳市检验检疫科学研究院 | Product quality sampling method used for cross-border electronic commerce |
CN105893350A (en) * | 2016-03-31 | 2016-08-24 | 重庆大学 | Evaluating method and system for text comment quality in electronic commerce |
CN106886934A (en) * | 2016-12-30 | 2017-06-23 | 北京三快在线科技有限公司 | Method, system and apparatus for determining merchant categories |
Non-Patent Citations (1)
Title |
---|
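Wang Zhuolun et al.: "Credit evaluation index system and risk early warning for online retail logistics enterprises", Fujian Computer (《福建电脑》) * |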
Also Published As
Publication number | Publication date |
---|---|
CN107886240B (en) | 2021-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||