Summary of the Invention
This application provides a method and a device for generating classification expressions, so as to improve the efficiency and accuracy of corpus screening.
In a first aspect, this application provides a method for generating classification expressions. The method includes: obtaining at least two categories, each of which includes multiple corpus entries; mining each category algorithmically, based on the corpus entries it includes, to generate a frequent pattern set corresponding to that category, where each frequent pattern set includes at least one frequent pattern, each frequent pattern includes at least one of a concept or an element, and the concepts and elements are obtained by parsing the corpus entries; comparing the concepts and/or elements of the frequent patterns (that is, the kinds of members that make up each frequent pattern) across the frequent pattern sets of all categories, excluding identical frequent patterns, retaining the frequent patterns that differ from the other frequent patterns in at least one concept or element, and generating a candidate frequent pattern set that includes at least one candidate frequent pattern; and combining the concepts and/or elements in the at least one candidate frequent pattern through logical operations to generate the classification expression of each category.
In the method of this aspect, each category is mined algorithmically to form a frequent pattern set, and all generated frequent pattern sets are compared and screened to form classification expressions. The method removes frequent patterns that are repeated across categories and generates, for any corpus entry, the classification expression of the category to which it belongs. Because the process is automatic, it avoids manual screening and mining and improves the efficiency and accuracy of corpus screening.
With reference to the first aspect, in one implementation of the first aspect, each corpus entry corresponds to one itemset and the algorithm includes the Apriori algorithm. Mining each category to generate its frequent pattern set then includes: obtaining the itemsets corresponding to the corpus entries under each category; and processing those itemsets with the Apriori algorithm to generate multiple frequent pattern sets, one for each category.
With reference to the first aspect, in another implementation of the first aspect, generating the multiple frequent pattern sets includes: identifying the frequent patterns that have two or more items and contain only one of the two kinds, concept or element; excluding these frequent patterns, that is, those with two or more items consisting only of concepts or only of elements; retaining the one-item frequent patterns and the frequent patterns with two or more items that contain both concepts and elements; and generating the multiple frequent pattern sets from the retained frequent patterns.
This implementation removes the frequent patterns with two or more items that contain only concepts or only elements, so that the generated classification expressions include both concepts and elements. The screened corpus entries can therefore be assigned to the corresponding categories more accurately, which makes it easier for business personnel to count and organize the large number of corpus entries in a database.
With reference to the first aspect, in yet another implementation of the first aspect, combining the concepts and/or elements in the at least one candidate frequent pattern to generate the classification expression of each category includes: treating each concept or element in a frequent pattern as one item and counting the number of items in the candidate frequent pattern; determining whether the candidate frequent pattern consists of two or more concepts and/or elements; if so, combining all the concepts and/or elements in the candidate frequent pattern with logical AND to generate the classification expression; and if the candidate frequent pattern consists of a single concept or element, combining this one-item candidate frequent pattern with an already built classification expression through logical NOT to generate the classification expression.
In this implementation, different classification expressions, such as expressions using logical AND and logical NOT, are generated from the screened concepts and/or elements together with logical connectives, which improves the accuracy of corpus classification.
With reference to the first aspect, in yet another implementation of the first aspect, the frequent patterns being compared include the classification expressions previously generated under each category. The step of comparing the concepts and/or elements of the frequent patterns and generating the candidate frequent pattern set includes: comparing whether the frequent patterns mined by the algorithm are identical to one another, and comparing whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; and, if a frequent pattern is not identical to any of them, taking it as a candidate frequent pattern and generating the candidate frequent pattern set.
In this implementation, each newly generated frequent pattern is compared with the previously generated ones; identical patterns are rejected and different ones are retained. This saves storage space and makes it easy to collect all classification expressions under a category. Compared with manual screening and deduplication, it improves classification efficiency and saves time.
With reference to the first aspect, in yet another implementation of the first aspect: if the frequent pattern has two items and logical AND is used, the classification expression is written as c_X+e_Y or e_X+c_Y; if the frequent pattern has two items and logical NOT is used, the classification expression is written as c_X-e_Y or e_X-c_Y; if the frequent pattern has three items and logical AND is used, the classification expression is written as c_X+e_Y+c_Z; if the frequent pattern has three items and both logical AND and logical NOT are used, the classification expression is written as c_X+e_Y-c_Z. In these expressions, c denotes a concept and e denotes an element; X, Y and Z denote concept names or element names, each obtained by normalizing the various ways of describing the same concept; "+" denotes logical AND and "-" denotes logical NOT. The classification expression may also include other operators, for example "|", meaning that only one of the conditions on its left and right needs to be satisfied, or "()", which changes the matching priority. This application places no limitation on this.
In a second aspect, this application further provides a device for generating classification expressions, used to implement the classification expression generation method of the first aspect. Specifically, the device includes an acquiring unit and a processing unit, and may further include a sending unit, a storage unit, and the like.
The acquiring unit is configured to obtain at least two categories, each of which includes multiple corpus entries.
The processing unit is configured to mine each category algorithmically, based on the corpus entries it includes, to generate a frequent pattern set corresponding to that category, where each frequent pattern set includes at least one frequent pattern, each frequent pattern includes at least one of a concept or an element, and the concepts and elements are obtained by parsing the corpus entries.
The processing unit is further configured to compare the concepts and/or elements of the frequent patterns across the frequent pattern sets of all categories, exclude identical frequent patterns, retain the frequent patterns that differ from the other frequent patterns in at least one concept or element, and generate a candidate frequent pattern set that includes at least one candidate frequent pattern; and to combine the concepts and/or elements in the at least one candidate frequent pattern through logical operations to generate the classification expression of each category.
With reference to the second aspect, in one implementation of the second aspect, each corpus entry corresponds to one itemset and the algorithm includes the Apriori algorithm. The acquiring unit is specifically configured to obtain the itemsets corresponding to the corpus entries under each category. The processing unit is specifically configured to process those itemsets with the Apriori algorithm to generate multiple frequent pattern sets, one for each category.
With reference to the second aspect, in another implementation of the second aspect, the processing unit is specifically configured to identify the frequent patterns that have two or more items and contain only one of the two kinds, concept or element; exclude these frequent patterns, that is, those with two or more items consisting only of concepts or only of elements; retain the one-item frequent patterns and the frequent patterns with two or more items that contain both concepts and elements; and generate the multiple frequent pattern sets from the retained frequent patterns.
With reference to the second aspect, in yet another implementation of the second aspect, the processing unit is specifically configured to treat each concept or element in a frequent pattern as one item and count the number of items in the candidate frequent pattern; determine whether the candidate frequent pattern consists of two or more concepts and/or elements; if so, combine all the concepts and/or elements in the candidate frequent pattern with logical AND to generate the classification expression; and if the candidate frequent pattern consists of a single concept or element, combine this one-item candidate frequent pattern with an already built classification expression through logical NOT to generate the classification expression.
With reference to the second aspect, in yet another implementation of the second aspect, the frequent patterns being compared include the classification expressions previously generated under each category. The processing unit is specifically configured to compare whether the frequent patterns mined by the algorithm are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; and, if a frequent pattern is not identical to any of them, to take it as a candidate frequent pattern and generate the candidate frequent pattern set.
In a third aspect, this application further provides a classifier. The classifier includes components such as a transceiver, a processor and a memory, and is used to implement the classification expression generation method of the first aspect. The processor in the classifier may do so by executing programs or instructions stored in the memory.
In a fourth aspect, this application further provides a computer storage medium. The medium may store a program that, when executed, implements some or all of the steps of the classification expression generation method provided by this application.
In the method and device provided by this application, a mining algorithm generates, from the corpus entries under each category, the frequent pattern set corresponding to that category; the frequent patterns in the sets are compared and screened to generate a candidate frequent pattern set; and the concepts and/or elements in the candidate frequent pattern set are combined through logical operations to generate classification expressions. This achieves automatic mining and classification of the corpus and solves the problem that manually screening the corpus and writing classification expressions is time-consuming, laborious and inefficient.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments clearer, the technical solutions in the embodiments are described in further detail below with reference to the accompanying drawings.
Before the technical solutions of the embodiments are described, the application scenarios and related concepts of the embodiments are first introduced.
This application can be applied in the field of artificial intelligence to screen and partition the large number of corpus entries stored in a database and to generate the classification expressions of the corresponding categories. This makes it easier for technical staff to sort and count the corpus, saves the time spent on manual screening, improves the accuracy and efficiency of corpus partitioning, and helps to build, extend and optimize classification models.
The technical solutions described herein can be implemented on a platform. The platform may be a concept-based device for analyzing and mining unstructured text big data, which analyzes and mines the input corpus (in text form) according to the mining rules in a semantic model. It extracts the concepts and elements of the corpus to be mined and generates classification expressions from them, so that a corpus entry can be assigned to a specific category by means of a classification expression, which assists in building and optimizing classification models.
The corpus includes the spoken or written content of calling customers as recorded by the platform or by an intelligent robot, and is expressed and displayed as text. For example, one corpus entry is "I want to apply for a credit card from XX Bank.", and another is "The credit card I got from your bank earlier has the wrong type of benefits, so I want to switch to a different credit card." The corpus also includes user reviews, such as "I applied for a credit card with your bank the other day. It was approved, but I still have not received the card in the mail. The application process is too slow, and I need the card urgently."
Because corpus entries differ in content, they must be classified by some feature before thousands of them can be counted and summarized. Specifically, a corpus entry can first be split into several concepts and elements.
A concept, which may be denoted by "c", refers to the word-sense information of the vocabulary in a text and to the semantic associations between words. One concept can represent a group of words or a whole sentence. A concept is a description of an object, an abstract expression that reflects the essential attributes of that object, such as time, place, mood or evaluation. Examples include linguistic concepts such as "negation" and "question"; time concepts such as "day" and "2017"; place concepts such as "Beijing" and "local"; and action-description concepts such as "unfinished". Concepts are generally unrelated to any specific business; they are general language concepts that can be reused across fields. Correspondingly, a concept tree (Conception Tree) organizes general, business-independent complex concepts in a tree. Each node of the tree is a concept name, and a concept value is a word or a pattern, that is, the language expression corresponding to the concept. Each tree can be regarded as a semantic model.
An element value can be of two types: text and pattern. A text value is a text string made up of words, phrases and the like. A pattern value is a text string written as a regular expression; it can match a passage of text or a combination of several words and supports simple operations on word distance and position, as shown in Table 1 below.
Element name | Element value | Element type
Permanent amount | Forever, {0,3} amount | Pattern
Permanent amount | Volume forever | Text
Permanent amount | Permanent amount | Text
Table 1
An element, which may be denoted by "e", is generally an entity, attribute or the like related to a specific business. Examples include business-related entities such as "Industrial and Commercial Bank" and "Peony Card"; business attributes such as "amount" and "minimum charge"; and business actions such as "card opening" and "settlement". Elements are domain-specific and cannot be reused across fields. Correspondingly, an element tree (Element Tree) organizes business-related concepts in a tree; these are the objects, tools, attributes and so on that are frequently used in the business. Each node of the tree is an element name, and one element name can correspond to multiple element values. An element value is the language expression corresponding to the element and is either a word or a pattern; see Table 1.
Concept name | Concept value | Concept type
Environment difference is commented | Haze | Text
Environment difference is commented | Air, {0,7} are seriously polluted | Pattern
Environment difference is commented | Environment, {0,7} is not good enough | Pattern
Table 2
Text mining means obtaining valuable information from corpus text.
Model optimization refers to analyzing a large number of corpus entries in batches, computing the accuracy and recall rates, updating the corpus for iterative optimization, and continuously maintaining the model, the modeling strategy and the program. Model optimization is part of building classification expressions: before building them, business personnel previously had to screen a large number of corpus entries manually and split them to produce combinations of concepts and elements, whereas the optimization method mines such combinations automatically.
The method provided by this application is described below. Fig. 1 is a flowchart of a classification expression generation method provided by an embodiment of this application. The method includes the following steps:
Step 101: Obtain at least two categories, each of which includes multiple corpus entries. Each corpus entry corresponds to one itemset, and each itemset consists of the concepts and/or elements obtained by parsing that corpus entry.
An itemset is a set of items; itemsets include one-item, two-item and three-item itemsets, and so on. Each itemset is characterized by the kinds of its members and by the number of its members. The kinds of members are concept and element; one concept or one element counts as one item, and each itemset consists of at least one item. If the relative support of an itemset meets a predefined minimum support threshold, the itemset is called a frequent itemset. The members of a frequent itemset are concepts and/or elements, and the number of concepts and/or elements in a frequent itemset is its item count.
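As an illustration of the support test described above, the following minimal sketch computes the relative support of an itemset as the fraction of a category's itemsets that contain it. The function names, the English stand-in item names (such as c_set) and the 0.5 threshold are assumptions made for this example only.

```python
def relative_support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """True if the relative support reaches the minimum support threshold."""
    return relative_support(itemset, transactions) >= min_support


# Hypothetical itemsets of four corpus entries in one category.
transactions = [
    {"c_set", "e_transaction_password"},
    {"c_set", "e_transaction_password", "e_open_card"},
    {"c_activate", "e_transaction_password"},
    {"e_open_card"},
]

pair = {"c_set", "e_transaction_password"}
print(relative_support(pair, transactions))
print(is_frequent({"c_activate"}, transactions, min_support=0.5))
```

Here the pair occurs in two of the four itemsets, so it meets a threshold of 0.5, while c_activate on its own does not.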
Converting a corpus entry into an itemset includes: splitting or normalizing the corpus entry, extracting the concepts and/or elements it contains, and forming from them the itemset corresponding to that corpus entry. Optionally, an itemset can be written in parentheses "()", for example (set, transaction password).
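The conversion from a corpus entry to an itemset could be sketched as a lookup against a small lexicon of text-type values. The lexicon contents, the names LEXICON and to_itemset, and the use of plain substring matching are all assumptions for illustration; a real parser would need proper tokenization, since a substring test would also find "set" inside "reset".

```python
# Hypothetical lexicon: concept/element name -> text-type values.
LEXICON = {
    "c_set": ["set", "set up"],
    "e_transaction_password": ["transaction password", "payment password"],
    "e_open_card": ["open a card", "open card"],
}


def to_itemset(text, lexicon=LEXICON):
    """Return the names whose values occur in the text (case-insensitive)."""
    lowered = text.lower()
    return frozenset(
        name
        for name, values in lexicon.items()
        if any(value in lowered for value in values)
    )


print(sorted(to_itemset("I would like to set my payment password.")))
```

Because "payment password" and "transaction password" map to the same element name, the two wordings are normalized to one item.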
Step 102: Mine each category algorithmically, based on the corpus entries it includes, to generate the frequent pattern set corresponding to that category. Each frequent pattern set includes at least one frequent pattern, each frequent pattern includes at least one of a concept or an element, and the concepts and elements are obtained by parsing the corpus entries.
A frequent pattern is a pattern that occurs frequently in a data set; such patterns include itemsets, subsequences, substructures and the like.
One way to generate frequent patterns is the Apriori algorithm, a frequent-itemset algorithm for mining association rules. Its core idea is to mine frequent itemsets in two stages: candidate set generation and downward-closure checking.
Specifically, generating frequent pattern sets with the Apriori algorithm includes: obtaining the itemsets corresponding to the corpus entries under any one category; and processing those itemsets with the Apriori algorithm to generate the frequent patterns under that category, which form its frequent pattern set. The frequent pattern sets of the other categories are generated with the Apriori algorithm in the same way. Each category corresponds to one frequent pattern set.
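The per-category mining step can be sketched with a compact textbook Apriori implementation. This is an illustrative version under stated assumptions, not the implementation used by the platform: the function names, the sample itemsets and the 0.5 minimum support are all hypothetical.

```python
from itertools import combinations


def apriori(itemsets, min_support):
    """Return {frozenset: relative support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in itemsets]
    n = len(transactions)
    if n == 0:
        return {}

    def support(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: support(c) for c in singles}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        previous = set(current)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in previous for b in previous if len(a | b) == k}
        # Downward closure: every (k-1)-subset must itself be frequent.
        pruned = {
            c for c in joined
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        current = {}
        for c in pruned:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def mine_by_category(itemsets_by_category, min_support):
    """One frequent pattern set per category."""
    return {
        category: set(apriori(itemsets, min_support))
        for category, itemsets in itemsets_by_category.items()
    }


# Hypothetical itemsets for one category.
sample = [
    {"c_set", "e_transaction_password"},
    {"c_set", "e_transaction_password", "e_open_card"},
    {"c_activate", "e_transaction_password"},
    {"e_open_card", "e_transaction_password"},
]
for pattern, s in sorted(apriori(sample, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), s)
```

With this sample, the three-item candidate is pruned before its support is counted, because one of its two-item subsets is not frequent.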
Optionally, generating the frequent pattern set under each category further includes excluding the frequent patterns with two or more items that consist entirely of concepts or entirely of elements. Specifically: identify the frequent patterns that have two or more items and contain only one of the two kinds, concept or element; exclude those that consist only of concepts and those that consist only of elements; retain the one-item frequent patterns and the frequent patterns with two or more items that contain both concepts and elements; and generate the frequent pattern set from the retained frequent patterns.
This implementation removes the frequent patterns with two or more items that contain only concepts or only elements, so that the generated classification expressions include both concepts and elements. The screened corpus entries can therefore be assigned to the corresponding categories more accurately, which makes it easier for business personnel to count and organize the large number of corpus entries in a database.
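The optional filter described above could look like the following sketch. It assumes, purely for illustration, that every item name carries a "c_" or "e_" prefix identifying it as a concept or an element; the function names and sample patterns are hypothetical.

```python
def kind(item):
    """Return "c" for a concept or "e" for an element, read from the prefix."""
    return item.split("_", 1)[0]


def keep_pattern(pattern):
    """Keep one-item patterns and patterns that mix concepts and elements."""
    if len(pattern) == 1:
        return True
    return {kind(item) for item in pattern} == {"c", "e"}


def filter_patterns(patterns):
    return {frozenset(p) for p in patterns if keep_pattern(p)}


patterns = [
    {"c_set"},                                  # one item: kept
    {"c_set", "e_transaction_password"},        # mixed: kept
    {"e_open_card", "e_transaction_password"},  # elements only: dropped
    {"c_forget", "c_modify"},                   # concepts only: dropped
]
for p in sorted(filter_patterns(patterns), key=sorted):
    print(sorted(p))
```

Since the filter is optional, a pattern such as {e_open_card, e_transaction_password} can still appear in embodiments that do not apply it.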
Step 103: Compare the concepts and/or elements of the frequent patterns across the frequent pattern sets of all categories, exclude identical frequent patterns, retain the frequent patterns that differ from the other frequent patterns in at least one concept or element, and generate a candidate frequent pattern set that includes at least one candidate frequent pattern.
The comparison involves two layers of screening. In the first layer, the frequent patterns under the same category are compared with one another; frequent patterns that have the same concepts and/or elements and the same number of items are excluded as duplicates, and the distinct frequent patterns are retained. In the second layer, the frequent patterns are cross-compared between categories, that is, each frequent pattern is compared with those under the other categories; identical ones are rejected and different ones are retained.
Specifically, the step of comparing the concepts and/or elements of the frequent patterns and generating the candidate frequent pattern set includes: comparing whether the frequent patterns mined by the algorithm are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; and, if a frequent pattern is not identical to any of them, taking it as a candidate frequent pattern and generating the candidate frequent pattern set.
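A minimal sketch of the comparison step follows. It assumes that a pattern shared by two categories is dropped from both, and that each previously generated expression is supplied as the set of names it combines; both points, like the function and category names, are assumptions for illustration.

```python
def candidate_patterns(patterns_by_category, existing_by_category=None):
    """Drop duplicates, cross-category matches and already-covered patterns."""
    existing_by_category = existing_by_category or {}
    # Layer 1: storing patterns as frozensets removes duplicates per category.
    unique = {
        cat: {frozenset(p) for p in pats}
        for cat, pats in patterns_by_category.items()
    }
    candidates = {}
    for cat, pats in unique.items():
        # Layer 2: collect the patterns that occur in any other category.
        others = set()
        for other_cat, other_pats in unique.items():
            if other_cat != cat:
                others |= other_pats
        # Combinations already used by stored expressions of this category.
        existing = {frozenset(p) for p in existing_by_category.get(cat, ())}
        candidates[cat] = pats - others - existing
    return candidates


mined = {
    "set_password": [
        {"e_open_card", "c_set", "e_transaction_password"},
        {"c_set", "e_transaction_password"},
        {"c_activate", "e_transaction_password"},
        {"e_open_card", "e_transaction_password"},
        {"c_activate"},
        {"e_open_card"},
    ],
    "reset_password": [
        {"c_forget", "e_transaction_password"},
        {"c_reset", "e_transaction_password"},
        {"c_set", "e_transaction_password"},
        {"c_forget", "c_modify", "e_transaction_password"},
        {"c_forget"},
    ],
}
stored = {"set_password": [{"e_open_card", "c_set", "e_transaction_password"}]}
result = candidate_patterns(mined, stored)
for cat in result:
    print(cat, sorted(sorted(p) for p in result[cat]))
```

In this run, {c_set, e_transaction_password} is removed from both categories because it appears in each, and the three-item pattern is removed from "set_password" because a stored expression already covers it.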
Step 104: Combine the concepts and/or elements in the at least one candidate frequent pattern through logical operations to generate the classification expression of each category.
A classification expression, also called an ontology expression, is the business rule corresponding to each category or ontology class. It gives a standardized description of that business rule and is generally composed of resources (such as concepts and elements) and operators. The operators include basic operators for logical combination, such as logical AND "+", logical OR "|", logical NOT "-" and priority brackets "()". Logical AND "+" means that the conditions on its left and right must both be satisfied, that is, both must occur in the corpus entry. Logical OR "|" means that it is enough for one of the conditions on its left and right to be satisfied. Logical NOT "-" means exclusion, that is, the condition on its right must not occur. Brackets "()" change the matching priority. "#" denotes a sentence restriction, that is, the restricted conditions must occur in the same sentence.
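To make the operator semantics concrete, the sketch below evaluates a flat expression that uses only "+" and "-" against the itemset of a corpus entry. It deliberately leaves out "|", "()" and "#", and it assumes that names contain only letters, digits and underscores; the function name matches is hypothetical.

```python
import re

OPERATOR = re.compile(r"([+-])")


def matches(expression, itemset):
    """Evaluate a flat expression such as 'e_a+e_b-c_x' against an itemset."""
    # Splitting on a captured group keeps the operators: term, op, term, ...
    parts = [part.strip() for part in OPERATOR.split(expression)]
    required = {parts[0]}
    excluded = set()
    for op, term in zip(parts[1::2], parts[2::2]):
        if op == "+":
            required.add(term)   # logical AND: the term must occur
        else:
            excluded.add(term)   # logical NOT: the term must not occur
    items = set(itemset)
    return required <= items and not (excluded & items)


expr = "e_open_card+e_transaction_password-c_forget"
print(matches(expr, {"e_open_card", "e_transaction_password"}))
print(matches(expr, {"e_open_card", "e_transaction_password", "c_forget"}))
```

The first itemset satisfies both required names and contains no excluded name; the second contains the excluded concept and is rejected.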
As shown in Fig. 2, step 104 specifically includes:
Step 201: Treating each concept or element in a frequent pattern as one item, count the number of items in the candidate frequent pattern.
Step 202: Determine whether the candidate frequent pattern consists of two or more concepts and/or elements.
Step 203: If the candidate frequent pattern consists of two or more concepts and/or elements, combine all the concepts and/or elements in the candidate frequent pattern with logical AND to generate the classification expression.
For example, if the candidate frequent pattern is {e_open_card, e_transaction_password}, the classification expression built for its category is e_open_card+e_transaction_password, where "+" denotes logical AND.
Step 204: If the candidate frequent pattern consists of a single concept or element, combine this one-item candidate frequent pattern with an already built classification expression through logical NOT to generate the classification expression.
For example, if the candidate frequent patterns are {e_open_card, e_transaction_password} and {c_forget}, the one-item frequent pattern {c_forget} is excluded with logical NOT "-", and the generated classification expression is e_open_card+e_transaction_password-c_forget.
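Steps 201 to 204 could be sketched as follows. The text does not say which built expression a one-item pattern is attached to, so this sketch assumes that every one-item candidate is negated onto every conjunction, and it orders names alphabetically within each part; both choices and the function name are assumptions.

```python
def build_expressions(candidates):
    """Join multi-item patterns with '+', then append one-item ones with '-'."""
    # Steps 201 and 202: split candidates by their number of items.
    multi = sorted(sorted(p) for p in candidates if len(p) >= 2)
    single = sorted(next(iter(p)) for p in candidates if len(p) == 1)
    expressions = []
    for items in multi:
        expression = "+".join(items)      # Step 203: logical AND
        for item in single:
            expression += "-" + item      # Step 204: logical NOT
        expressions.append(expression)
    return expressions


candidates = [{"e_open_card", "e_transaction_password"}, {"c_forget"}]
print(build_expressions(candidates))
```

For the two candidate patterns of the example above, this yields the single expression e_open_card+e_transaction_password-c_forget.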
Optionally, depending on the number of items in the frequent pattern and on the operators used, the classification expression can take the following forms: (1) if the frequent pattern has two items and logical AND is used, the classification expression is written as c_X+e_Y or e_X+c_Y; (2) if the frequent pattern has two items and logical NOT is used, the classification expression is written as c_X-e_Y or e_X-c_Y; (3) if the frequent pattern has three items and logical AND is used, the classification expression is written as c_X+e_Y+c_Z; (4) if the frequent pattern has three items and both logical AND and logical NOT are used, the classification expression is written as c_X+e_Y-c_Z. In these expressions, c denotes a concept and e denotes an element; X, Y and Z denote concept names or element names, each obtained by normalizing the various ways of describing the same concept; "+" denotes logical AND and "-" denotes logical NOT.
In the method of this embodiment, each category is mined algorithmically to form a frequent pattern set, and all generated frequent pattern sets are compared and screened to form classification expressions. The method removes frequent patterns that are repeated across categories and generates, for any corpus entry, the classification expression of the category to which it belongs. Because the process is automatic, it avoids manual screening and mining, improves the efficiency and accuracy of corpus screening, and helps to build, extend and optimize classification models.
In a specific embodiment, the method of generating classification expressions is illustrated below using a bank's card application business as an example.
For example, the first category is "set password", and generating classification expressions for the first category includes:
Step 101: Obtain a corpus entry under the category "set password", for example "... I would like to set my transaction password." Identify the concept values and element values in this corpus entry: the concept value "set(once|individual)" under the concept "set" (written "c_set") matches "set" in the corpus entry, and the element value "(transaction|consumption|credit|cash withdrawal|payment){0,3}password" under the element "transaction password" (written "e_transaction_password") matches "transaction password" in the corpus entry. Each concept can have multiple concept values and each element can have multiple element values, which allows different expressions of the same concept or element to be normalized. For instance, the element value above also matches expressions such as "consumption password", "cash withdrawal password" and "credit password", all of which are represented by the single element "transaction password". The two items "set" and "transaction password" are thus identified in this corpus entry, and together they form its itemset (c_set, e_transaction_password).
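The pattern-type element value in this example could be matched as sketched below. The sketch assumes that the "{0,3}" notation stands for a gap of up to three arbitrary characters between the two parts, and translates it into the regular expression ".{0,3}"; that reading, the English pattern text and the function names are assumptions rather than the platform's actual syntax.

```python
import re


def compile_value(value):
    """Translate the gap notation 'A{m,n}B' into the regex 'A.{m,n}B'."""
    return re.compile(re.sub(r"\{(\d+),(\d+)\}", r".{\1,\2}", value))


# Hypothetical pattern-type values for one element.
ELEMENT_VALUES = {
    "e_transaction_password": [
        "(transaction|consumption|credit|payment){0,3}password",
    ],
}


def find_names(text, values_by_name):
    """Return the names that have at least one value matching the text."""
    return {
        name
        for name, values in values_by_name.items()
        if any(compile_value(value).search(text) for value in values)
    }


print(find_names("I want to set my transaction password", ELEMENT_VALUES))
print(find_names("Where do I change the credit card password?", ELEMENT_VALUES))
```

The first sentence matches because only one character separates "transaction" and "password"; in the second, " card " is longer than the three-character gap, so nothing is found.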
Other language materials can similarly draw corresponding item collection.
Step 102: After all item sets under the first classification have been obtained, the Apriori algorithm is used to mine the frequent patterns of this classification, that is, to determine which items (one, two, or three at a time) frequently occur together. Suppose the frequent patterns mined for the "set password" classification include: {e_open card, c_set, e_transaction password}, {c_set, e_transaction password}, {c_activate, e_transaction password}, {e_open card, e_transaction password}, {c_activate}, and {e_open card}. These frequent patterns form the first frequent pattern set.
Similarly, the frequent pattern set of the second classification is obtained by the same steps. Suppose the second classification is "reset password". Mining the corpus entries under "reset password" yields the following frequent patterns: {c_forget, e_transaction password}, {c_reset, e_transaction password}, {c_set, e_transaction password}, {c_forget, c_modify, e_transaction password}, and {c_forget}. These frequent patterns form the second frequent pattern set.
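The mining in step 102 can be sketched with a small, self-contained Apriori implementation. The function name `apriori`, the `min_support` threshold, the `max_size` limit of three items, and the sample item sets are all assumptions made for illustration; the source does not specify a support threshold.

```python
from itertools import combinations

def apriori(item_sets, min_support, max_size=3):
    """Return {pattern: count} for patterns of up to max_size items
    that occur in at least min_support item sets."""
    transactions = [frozenset(s) for s in item_sets]
    # Level 1: every single item is a candidate.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent patterns into candidates one item larger, then
        # prune candidates that have an infrequent subset.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent

TP = "e_transaction password"
sample = [
    {"c_set", TP},
    {"c_set", TP, "e_open card"},
    {"c_activate", TP},
    {"c_activate", TP},
    {"e_open card", TP},
]
mined = apriori(sample, min_support=2)
for pattern, count in sorted(mined.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), count)
```

The pruning step relies on the fact that every subset of a frequent pattern must itself be frequent, so a three-item candidate is counted only if all of its two-item subsets survived the previous level.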
Step 103: Compare the frequent pattern sets under the first classification and the second classification, exclude the frequent patterns that are identical, retain the frequent patterns in which at least one concept or element differs from the other frequent patterns, and generate the candidate frequent pattern set.
Specifically: first, for frequent patterns made up of two or three items, the frequent patterns of each classification are compared with the combinations of elements and/or concepts in the expressions that already exist under the current classification, and identical frequent patterns are excluded. For example, if the classification expression e_open card + c_set + e_transaction password is already stored under the "set password" classification, then {e_open card, c_set, e_transaction password} is deleted from the frequent pattern set of this classification, and the remaining frequent patterns are:
{c_set, e_transaction password};
{c_activate, e_transaction password};
{e_open card, e_transaction password};
{c_activate};
{e_open card}.
Then the frequent pattern set under the first classification is compared with the frequent pattern set under the second classification, and the frequent patterns that are identical in both are excluded from the result sets. The comparison shows that {c_set, e_transaction password} is shared by the two classifications, so it is deleted from the result set of each classification. In the embodiments of this application, "identical" frequent patterns are those that have the same number of item members and the same concept values or element values.
The frequent patterns remaining after the comparison are the candidate frequent patterns under each classification, as shown in Table 3 below.
| Candidate frequent patterns under the first classification | Candidate frequent patterns under the second classification |
|---|---|
| {c_activate, e_transaction password} | {c_forget, e_transaction password} |
| {e_open card, e_transaction password} | {c_reset, e_transaction password} |
| {c_activate} | {c_forget, c_modify, e_transaction password} |
| {e_open card} | {c_forget} |
Table 3
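The screening in step 103 amounts to two set differences per classification, which can be sketched as follows. The function name `candidate_patterns`, the helper `P`, and the dictionary layout are assumptions made for illustration; the pattern data are the ones given in this example.

```python
def P(*items):
    """Shorthand for building one frequent pattern."""
    return frozenset(items)

def candidate_patterns(mined, existing):
    """mined: {classification: set of patterns};
    existing: {classification: patterns already covered by stored expressions}.
    Returns {classification: candidate patterns}."""
    result = {}
    for name, patterns in mined.items():
        # Patterns mined under any other classification.
        others = set().union(*(p for n, p in mined.items() if n != name))
        # Drop patterns already stored, then patterns shared with other classes.
        result[name] = patterns - existing.get(name, set()) - others
    return result

TP = "e_transaction password"
mined = {
    "set password": {
        P("e_open card", "c_set", TP), P("c_set", TP), P("c_activate", TP),
        P("e_open card", TP), P("c_activate"), P("e_open card"),
    },
    "reset password": {
        P("c_forget", TP), P("c_reset", TP), P("c_set", TP),
        P("c_forget", "c_modify", TP), P("c_forget"),
    },
}
existing = {"set password": {P("e_open card", "c_set", TP)}}
candidates = candidate_patterns(mined, existing)
for name, patterns in candidates.items():
    print(name, sorted(sorted(p) for p in patterns))
```

Representing each pattern as a `frozenset` makes the "identical" test of the text (same number of members, same members) a plain equality check.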
Step 104: Combine the concepts and/or elements in the at least one candidate frequent pattern by logical operations to generate the classification expressions of each classification.
The specific principles include: if a candidate frequent pattern contains two or more concept values and element values, the classification expression is generated with a logical AND operation. For example, from the candidate frequent pattern {e_open card, e_transaction password}, the classification expression built for the "set password" classification is: e_open card + e_transaction password.
If a candidate frequent pattern contains a single item (a unary pattern), that item is excluded with a logical NOT operation. For example, once the classification expression e_open card + e_transaction password has been built under the "set password" classification, a logical NOT operation can be added to the expression to better distinguish it from the "reset password" classification, generating the classification expression: e_open card + e_transaction password - c_forget.
Identical items are excluded and exclusive items are retained. The classification expressions of the first classification and the second classification generated according to the above principles are shown in Table 4.
| Classification expressions under the "set password" classification | Classification expressions under the "reset password" classification |
|---|---|
| c_activate + e_transaction password | c_forget + e_transaction password |
| e_open card + e_transaction password | c_reset + e_transaction password |
| c_activate + e_transaction password - c_forget | c_forget + c_modify + e_transaction password |
| e_open card + e_transaction password - c_forget | c_forget + e_transaction password - c_activate |
| | c_reset + e_transaction password - c_activate |
| | c_forget + c_modify + e_transaction password - c_activate |
| | c_forget + e_transaction password - e_open card |
| | c_reset + e_transaction password - e_open card |
| | c_forget + c_modify + e_transaction password - e_open card |
| | c_forget + e_transaction password - c_activate - e_open card |
| | c_reset + e_transaction password - c_activate - e_open card |
| | c_forget + c_modify + e_transaction password - c_activate - e_open card |
Table 4
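One way to reproduce the expressions of Table 4 from the candidates of Table 3 is sketched below. It assumes a particular reading of the table: every candidate pattern with two or more items becomes a logical AND expression, and each such expression is additionally combined with every non-empty subset of the unary candidates of the other classification as logical NOT terms. The function name `build_expressions` is hypothetical.

```python
from itertools import combinations

def build_expressions(own, other):
    """own / other: candidate patterns (frozensets of item names) of this
    classification and of the other classification."""
    # Logical AND over every pattern with two or more items.
    bases = sorted(" + ".join(sorted(p)) for p in own if len(p) >= 2)
    # Unary candidates of the other classification become NOT terms.
    exclusions = sorted(next(iter(p)) for p in other if len(p) == 1)
    expressions = list(bases)
    for r in range(1, len(exclusions) + 1):
        for combo in combinations(exclusions, r):
            suffix = "".join(f" - {item}" for item in combo)
            expressions.extend(base + suffix for base in bases)
    return expressions

TP = "e_transaction password"
first = {frozenset({"c_activate", TP}), frozenset({"e_open card", TP}),
         frozenset({"c_activate"}), frozenset({"e_open card"})}
second = {frozenset({"c_forget", TP}), frozenset({"c_reset", TP}),
          frozenset({"c_forget", "c_modify", TP}), frozenset({"c_forget"})}

set_password = build_expressions(first, second)
reset_password = build_expressions(second, first)
print(len(set_password), len(reset_password))
for expression in set_password:
    print(expression)
```

Under this reading, the "set password" column has two AND expressions plus two variants that exclude c_forget, and the "reset password" column has three AND expressions plus three variants for each of the three non-empty subsets of {c_activate, e_open card}.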
In this embodiment, association rules are combined with a mining algorithm to mine frequent item sets of three different dimensions (unary, binary, and ternary) from the training corpus under the same classification. Because the mined item sets are to be recommended as items of classification expressions, and in order to reflect the link between business and concepts, conditions are set during frequent item set mining: the items of a binary frequent item set must be a combination of an element and a concept, and the items of a ternary frequent item set must include at least one element and one concept. Frequent item sets that do not meet these conditions are deleted.
The mined frequent item sets form two kinds of sets. One is the set of frequent item sets mined under each classification, which helps extend the classification rule expressions of the corresponding classification. The other is the set of exclusionary frequent item sets: if a frequent item set is an item of a classification expression under the first classification rule, it is not necessarily an item of a classification expression under the second classification rule. In this case, a logical NOT operation is performed while generating the second classification expression so as to exclude that item.
This method combines a concept-rule classification model with the Apriori algorithm to automatically mine concepts or concept combinations that are closely associated with the business, and thereby assists in building, extending, and optimizing the concept-rule classification model. It both ensures the accuracy of the classifier and reduces manual effort, improving efficiency.
Referring to Fig. 3, an embodiment of this application provides a classification expression generating device, which is used to implement the classification expression generation method described in the above embodiments with reference to Fig. 1 or Fig. 2. Further, the device may be arranged on a platform, in a classifier, or in an intelligent robot, which is not limited in this application.
As shown in Fig. 3, the device includes an acquiring unit 310, a processing unit 320, and a sending unit 330. It may also include other functional units or modules, such as a storage unit.
Further, the acquiring unit 310 is configured to obtain at least two classifications, each classification including multiple corpus entries.
The processing unit 320 is configured to mine each classification with an algorithm, according to the corpus entries included in that classification, to generate the frequent pattern set corresponding to the classification. Each frequent pattern set includes at least one frequent pattern, each frequent pattern includes at least one of a concept or an element, and the concepts or elements are obtained by parsing each corpus entry.
The processing unit 320 is further configured to compare the concepts and/or elements of each frequent pattern in the frequent pattern sets under all classifications, exclude identical frequent patterns, retain the frequent patterns in which at least one concept or element differs from the other frequent patterns, and generate a candidate frequent pattern set that includes at least one candidate frequent pattern; and to combine the concepts and/or elements in the at least one candidate frequent pattern by logical operations to generate the classification expressions of each classification.
Optionally, in a specific implementation of this embodiment, each corpus entry corresponds to one item set, and the algorithm includes the Apriori algorithm. The acquiring unit 320 is specifically configured to obtain the multiple item sets corresponding to the multiple corpus entries under each classification, and to generate multiple frequent pattern sets from the multiple item sets by means of the Apriori algorithm, each classification corresponding to one frequent pattern set.
Optionally, in a specific implementation of this embodiment, the processing unit 320 is specifically configured to identify the frequent patterns that are binary or higher and contain only one of concepts and elements; exclude these binary-or-higher frequent patterns composed only of concepts or only of elements; retain the unary frequent patterns and the binary-or-higher frequent patterns that contain both concepts and elements; and generate the multiple frequent pattern sets from the retained frequent patterns.
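The retention rule in this implementation can be sketched as a simple predicate. It assumes that concept names carry the prefix "c_" and element names the prefix "e_", as in the worked example; the function name `keep_pattern` is hypothetical.

```python
def keep_pattern(pattern):
    """Keep unary patterns, and larger patterns only when they contain
    at least one concept (c_ prefix) and at least one element (e_ prefix)."""
    if len(pattern) == 1:
        return True
    has_concept = any(item.startswith("c_") for item in pattern)
    has_element = any(item.startswith("e_") for item in pattern)
    return has_concept and has_element

patterns = [
    frozenset({"c_forget"}),
    frozenset({"c_set", "e_transaction password"}),
    frozenset({"c_forget", "c_modify"}),
    frozenset({"c_forget", "c_modify", "e_transaction password"}),
]
retained = [p for p in patterns if keep_pattern(p)]
print([sorted(p) for p in retained])
```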
Optionally, in a specific implementation of this embodiment, the processing unit 320 is specifically configured to treat one concept or one element in each frequent pattern as one unit (unary), and count the number of members included in the candidate frequent pattern; determine whether the candidate frequent pattern consists of two or more concepts and/or elements; if so, generate the classification expression by applying a logical AND operation to all concepts and/or elements in the candidate frequent pattern; and if the candidate frequent pattern consists of a single concept or element, generate the classification expression by applying a logical NOT operation between the unary candidate frequent pattern and an already built classification expression.
Optionally, in a specific implementation of this embodiment, the processing unit 320 is further specifically configured to compare whether the frequent patterns mined by the algorithm are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; if they are not identical, the frequent pattern is taken as a candidate frequent pattern, and the candidate frequent pattern set is generated.
Optionally, in a specific implementation of this embodiment: if the frequent pattern is binary and a logical AND operation is used, the classification expression is written as c_X+e_Y or e_X+c_Y; if the frequent pattern is binary and a logical NOT operation is used, the classification expression is written as c_X-e_Y or e_X-c_Y; if the frequent pattern is ternary and a logical AND operation is used, the classification expression is written as c_X+e_Y+c_Z; if the frequent pattern is ternary and both logical AND and logical NOT operations are used, the classification expression is written as c_X+e_Y-c_Z. In each of these classification expressions, c denotes a concept, e denotes an element, X, Y, and Z denote concept names or element names, "+" denotes the logical AND operation, and "-" denotes the logical NOT operation.
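To illustrate how an expression in this notation could be applied to a corpus entry, the sketch below evaluates an expression against the item set parsed from the entry. The function name `matches` is hypothetical, and the parser assumes that concept and element names themselves contain no "+" or "-" characters.

```python
import re

def matches(expression, items):
    """Evaluate an expression such as 'c_X+e_Y-c_Z' against a set of item names.
    '+' terms (and the first term) must be present; '-' terms must be absent."""
    terms = re.findall(r"([+-]?)\s*([^+-]+)", expression)
    for sign, name in terms:
        present = name.strip() in items
        if sign == "-" and present:
            return False  # an excluded item occurs in the entry
        if sign != "-" and not present:
            return False  # a required item is missing from the entry
    return True

print(matches("c_X+e_Y-c_Z", {"c_X", "e_Y"}))
print(matches("c_X+e_Y-c_Z", {"c_X", "e_Y", "c_Z"}))
```

The first call satisfies both required terms without containing the excluded one; the second contains the excluded item c_Z and is therefore rejected.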
The sending unit 330 is configured to output the generated classification expressions to the user or to display them on the platform.
The device provided by this embodiment uses a mining algorithm to generate, from the corpus entries under each classification, the frequent pattern set corresponding to that classification; compares and screens the frequent patterns in each set to generate the candidate frequent pattern set; and finally combines the concepts and/or elements in the candidate frequent pattern set by logical operations to generate classification expressions. This achieves automatic mining and classification of the corpus and solves the problem that manually screening corpus entries and generating classification expressions is time-consuming, laborious, and inefficient.
Referring to Fig. 4, an embodiment of this application further provides a classifier for carrying the device shown in Fig. 3. The classifier includes components such as a transceiver 410, a processor 420, and a memory 430, and is used to implement the classification expression generation method described above. Further, the method may be implemented by the processor 420 of the classifier executing programs or instructions stored in the memory 430.
The transceiver 410 may be used to receive or send data and information such as corpus entries and text, and may receive or send information under the control of the processor 420.
The processor 420 is the control center of the classifier. It connects the various parts of the whole classifier through various interfaces and lines, and implements the function of generating classification expressions by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory 430. Further, the processor 420 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip. The hardware chip may be an integrated circuit, a programmable logic device (PLD), or the like, which is not specifically limited in this application.
Memory 430 can include volatile memory (volatile memory), such as random access memory
(random access memory, RAM);Nonvolatile memory (non-volatile memory) can also be included, such as
Flash memory (flash memory), hard disk (hard disk drive, HDD) or solid state hard disc (solid-state
Drive, SSD);The memory can also include the combination of the memory of mentioned kind.
In the embodiments of the present invention, with reference to Fig. 3 of the above embodiment, the functions of the acquiring unit 310 and the sending unit 330 may be implemented by the transceiver 410 of the classifier, or by the transceiver 410 under the control of the processor 420; the functions of the processing unit 320 may be implemented by the processor 420; and the memory 430 is used to store corpus entries, text, algorithms, frequent item sets, classification expressions, and the like.
In addition, the present invention further provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps of the classification expression generation method provided by the present invention. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art can clearly understand that the techniques in the embodiments of the present invention can be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
For identical or similar parts among the embodiments in this specification, reference may be made to one another. In particular, because the above embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description in the method embodiments.
The embodiments of the present invention described above do not limit the scope of protection of the present invention.