CN107679209A - Classification expression generation method and device - Google Patents

Classification expression generation method and device

Info

Publication number
CN107679209A
CN107679209A (application number CN201710961839.5A)
Authority
CN
China
Prior art keywords
classification
frequent pattern
frequent
concept
element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710961839.5A
Other languages
Chinese (zh)
Other versions
CN107679209B (en)
Inventor
李德彦
晋耀红
郝思洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science and Technology (Beijing) Co., Ltd.
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Taiyue Software Co Ltd filed Critical Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201710961839.5A priority Critical patent/CN107679209B/en
Publication of CN107679209A publication Critical patent/CN107679209A/en
Application granted granted Critical
Publication of CN107679209B publication Critical patent/CN107679209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a classification expression generation method and device. The method includes: obtaining at least two classifications, each containing multiple corpus texts; performing algorithmic mining on each classification according to its corpus texts to generate a frequent pattern set corresponding to that classification, each frequent pattern set containing at least one frequent pattern; comparing the concepts and/or elements of the frequent patterns across the frequent pattern sets of all classifications, excluding identical frequent patterns, retaining frequent patterns whose concepts or elements differ from the others in at least one respect, and generating a candidate frequent pattern set; and generating the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations. The method can exclude frequent patterns repeated across classifications and automatically generate classification expressions, avoiding manual screening and mining and improving the efficiency and accuracy of corpus screening.

Description

Classification expression generation method and device
Technical field
The present application relates to the field of text mining technology, and belongs to mining a large volume of corpus texts and generating the classification expressions of the different classifications they fall under; it relates in particular to a classification expression generation method and device.
Background technology
In social big data, 80% of the data is unstructured, and processing unstructured data is one of the major challenges facing big data. One such challenge is the maintenance burden caused by rapid changes in business classification and planning: business classifications are numerous and change quickly, so the language rules of all related classifications must be re-organized, and every classification change involves a heavy, inefficient maintenance workload.
For classifications or corpora of highly similar, business-specific short texts, for example a bank management system classifying the call reasons of customer-service work orders, the texts are very short, any given feature occurs rarely, and there are many overlapping features between the classifications to which different texts or corpus entries belong. Statistical mining algorithms such as TF-IDF and KNN therefore have difficulty assigning reasonable feature weights, which leads to low classification accuracy for these texts or corpus entries.
In practice, to make the classification of such highly similar, business-specific short texts accurate enough for practical use, features must be screened manually from a large number of corpus texts to generate classification expressions, a process that is time-consuming and labor-intensive.
Summary of the invention
The present application provides a classification expression generation method and device to improve the efficiency and accuracy of corpus screening.
In a first aspect, the present application provides a classification expression generation method. The method includes: obtaining at least two classifications, each containing multiple corpus texts; performing algorithmic mining on each classification according to its corpus texts to generate a frequent pattern set corresponding to that classification, each frequent pattern set containing at least one frequent pattern, each frequent pattern containing at least one of a concept or an element, the concepts and elements being obtainable by parsing the corpus texts; comparing, across the frequent pattern sets of all classifications, the concepts and/or elements of each frequent pattern (that is, the types of the members making up each frequent pattern), excluding identical frequent patterns, retaining frequent patterns that differ from the others in at least one concept or element, and generating a candidate frequent pattern set containing at least one candidate frequent pattern; and generating the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
With the method provided in this aspect, frequent pattern sets are formed by algorithmic mining on each classification, and all the generated frequent pattern sets are compared and screened to form classification expressions. The method can exclude frequent patterns repeated across classifications and generate, for any corpus text, the classification expression of the classification it belongs to. It is an automatic process that avoids manual screening and mining and improves the efficiency and accuracy of corpus screening.
With reference to the first aspect, in one implementation of the first aspect, each corpus text corresponds to an itemset and the algorithm includes the Apriori algorithm. Performing algorithmic mining on each classification according to its corpus texts to generate the frequent pattern set corresponding to that classification includes: obtaining the multiple itemsets corresponding to the multiple corpus texts under each classification; and generating multiple frequent pattern sets from the multiple itemsets through the Apriori algorithm, each classification corresponding to one frequent pattern set.
With reference to the first aspect, in another implementation of the first aspect, generating the multiple frequent pattern sets includes: screening the frequent patterns of two or more items that contain only one of concepts and elements; excluding those two-or-more-item frequent patterns composed solely of concepts or solely of elements; retaining unary (single-item) frequent patterns and two-or-more-item frequent patterns that contain both concepts and elements; and generating the multiple frequent pattern sets from the retained frequent patterns.
This implementation eliminates the frequent patterns of two or more items that contain only concepts or only elements, so that the generated classification expressions contain both concepts and elements. The screened corpus texts can thus be assigned to the corresponding classification more accurately, making it easier for business personnel to count and organize the large number of corpus texts in the database.
With reference to the first aspect, in another implementation of the first aspect, generating the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations includes: taking a single concept or a single element in a frequent pattern as a unary item, counting the number of members contained in the candidate frequent pattern; determining whether the candidate frequent pattern is composed of two or more concepts and/or elements; if so, generating the classification expression by applying a logical AND operation to all the concepts and/or elements in the candidate frequent pattern; and, if the candidate frequent pattern is composed of a single concept or element, generating the classification expression by applying a logical NOT operation between the unary candidate frequent pattern and the classification expression already built.
In this implementation, the screened concepts and/or elements and logical connectives are used to generate different classification expressions, such as classification expressions with logical AND and logical NOT, which improves the accuracy of corpus partitioning.
With reference to the first aspect, in another implementation of the first aspect, the frequent patterns being compared include the classification expressions previously generated under each classification. The step of comparing the concepts and/or elements of each frequent pattern and generating the candidate frequent pattern set includes: comparing whether the frequent patterns mined by the algorithm are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; and, if not, taking that frequent pattern as a candidate frequent pattern and generating the candidate frequent pattern set.
In this implementation, each generated frequent pattern is compared with those generated previously; identical ones are rejected and different ones retained. This saves storage space, makes it easy to count all the classification expressions under a classification, improves classification efficiency compared with repeated manual screening, and saves classification time.
With reference to the first aspect, in another implementation of the first aspect: if a frequent pattern is binary (two items) and the logical AND operation is applied, the classification expression is expressed as c_X+e_Y or e_X+c_Y; if a frequent pattern is binary and the logical NOT operation is applied, the classification expression is expressed as c_X-e_Y or e_X-c_Y; if a frequent pattern is ternary (three items) and the logical AND operation is applied, the classification expression is expressed as c_X+e_Y+c_Z; if a frequent pattern is ternary and both the logical AND and the logical NOT operations are applied, the classification expression is expressed as c_X+e_Y-c_Z. In each of the above classification expressions, c denotes a concept, e denotes an element, and X, Y and Z denote concept names or element names, generated by normalizing multiple descriptions under the same concept; "+" denotes the logical AND operation and "-" denotes the logical NOT operation. In addition, the classification expression may also include other operators, for example "|", meaning that only one of the left and right conditions needs to be met, or "()", meaning that the priority of matching can be changed; the present application does not limit this.
In a second aspect, the present application also provides a classification expression generating device for implementing the classification expression generation method described in the first aspect. Specifically, the device includes an acquiring unit and a processing unit, and may further include a sending unit, a storage unit, and so on.
The acquiring unit is configured to obtain at least two classifications, each containing multiple corpus texts.
The processing unit is configured to perform algorithmic mining on each classification according to its corpus texts to generate the frequent pattern set corresponding to that classification, each frequent pattern set containing at least one frequent pattern, each frequent pattern containing at least one of a concept or an element, the concepts and elements being obtained by parsing the corpus texts.
The processing unit is further configured to compare, across the frequent pattern sets of all classifications, the concepts and/or elements of each frequent pattern, exclude identical frequent patterns, retain frequent patterns that differ from the others in at least one concept or element, and generate a candidate frequent pattern set containing at least one candidate frequent pattern; and to generate the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
With reference to the second aspect, in one implementation of the second aspect, each corpus text corresponds to an itemset and the algorithm includes the Apriori algorithm. The acquiring unit is specifically configured to obtain the multiple itemsets corresponding to the multiple corpus texts under each classification. The processing unit is specifically configured to generate, according to the corpus texts of each classification, multiple frequent pattern sets from the multiple itemsets through the Apriori algorithm, each classification corresponding to one frequent pattern set.
With reference to the second aspect, in another implementation of the second aspect, the processing unit is specifically configured to screen the frequent patterns of two or more items that contain only one of concepts and elements; exclude those two-or-more-item frequent patterns composed solely of concepts or solely of elements; retain unary frequent patterns and two-or-more-item frequent patterns that contain both concepts and elements; and generate the multiple frequent pattern sets from the retained frequent patterns.
With reference to the second aspect, in another implementation of the second aspect, the processing unit is specifically configured to take a single concept or a single element in each frequent pattern as a unary item, count the number of members contained in the candidate frequent pattern, and determine whether the candidate frequent pattern is composed of two or more concepts and/or elements; if so, to generate the classification expression by applying a logical AND operation to all the concepts and/or elements in the candidate frequent pattern; and, if the candidate frequent pattern is composed of a single concept or element, to generate the classification expression by applying a logical NOT operation between the unary candidate frequent pattern and the classification expression already built.
With reference to the second aspect, in another implementation of the second aspect, the frequent patterns being compared include the classification expressions previously generated under each classification. The processing unit is specifically configured to compare whether the frequent patterns mined by the algorithm are identical to one another and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression, and, if not, to take that frequent pattern as a candidate frequent pattern and generate the candidate frequent pattern set.
In a third aspect, the present application also provides a classifier. The classifier includes components such as a transceiver, a processor and a memory, and is used to implement the classification expression generation method described in the first aspect; further, the processor in the classifier may implement the method by executing programs or instructions stored in the memory.
In a fourth aspect, the present application also provides a computer storage medium. The computer storage medium may store a program which, when executed, can implement some or all of the steps of the classification expression generation method provided by the present application.
With the method and device provided by the present application, a mining algorithm is used to generate, from the corpus texts under each classification, the frequent pattern set corresponding to that classification; the frequent patterns in the sets are compared and screened to generate candidate frequent pattern sets; and finally the concepts and/or elements in these candidate frequent pattern sets are combined through logical operations to generate the classification expressions. This realizes automatic mining and classification of corpus texts and solves the problem that manually screening corpus texts and generating classification expressions is time-consuming, labor-intensive and inefficient.
Brief description of the drawings
To describe the technical solutions of the present application more clearly, the accompanying drawings needed in the embodiments are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a classification expression generation method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of another classification expression generation method provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a classification expression generating device provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a classifier provided by an embodiment of the present application.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments clearer and easier to understand, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Before the technical solutions of the embodiments of the present invention are described, the application scenarios and related concepts of the embodiments are first explained and introduced.
The present application can be applied to the field of artificial intelligence to screen and partition the large number of corpus texts stored in a database and to generate the classification expressions of the corresponding classifications, so as to help technical staff sort and count corpus texts, save the time spent on manually screening them, improve the accuracy and efficiency of corpus partitioning, and assist in building, extending and optimizing classification models.
The technical solution described in this application can be implemented by a platform, which may be a concept-based big-data analysis and mining device for unstructured text. According to the mining rules in a semantic model, the platform analyses and mines the input corpus texts (in text form), extracts the concepts and elements of the corpus texts to be mined, and generates classification expressions from these concepts and elements, so that a corpus text can be assigned, through the classification expressions, to a specific classification, assisting in establishing and optimizing the classification model.
Here, a corpus text is the spoken or written content of a calling customer recorded by the platform or by an intelligent robot, and can be expressed and displayed in text form. For example, one corpus text is "I want to apply for an XX bank credit card.", and another is "The preferential category on the credit card I applied for at your bank before is wrong, so I want to change to another credit card." Corpus texts also include user feedback, such as "I applied for one of your bank's credit cards the other day; the review was approved, but I still have not received the mailed card. Card issuance is too slow and I am anxious to use it."
Because different corpus texts have different content, in order to count and summarize thousands of corpus texts, different corpus texts must be classified according to some kind of feature. Specifically, a corpus text can first be split into several concepts and elements.
Further, a concept, denoted "c", refers to the word-sense information of the vocabulary in a text and the semantic relatedness between vocabulary items. A "concept" can represent a group of words or a whole sentence. A concept is a description of an object and reflects an abstract expression of the essential attributes of the described object, for example time, place, mood or evaluation. Examples include linguistic concepts such as "negation" and "question"; time concepts such as "day" and "2017"; place concepts such as "Beijing" and "local"; and action-descriptive concepts such as "unfinished". Concepts are generally unrelated to any specific business; they are common language concepts and can be reused across domains. Correspondingly, a concept tree (English: Conception Tree) organizes business-independent, commonly used complex concepts in a tree structure. A node of the tree is a concept name, and a concept value is a word or a pattern, the language expression corresponding to the concept. Each tree can be regarded as a semantic model.
An element value supports two types: text type and pattern type. A text-type value is a text string composed of words, phrases and the like; a pattern-type value is a text string expressed in pattern form, which can be a piece of text or a combination of several words and supports simple operations on vocabulary such as distance and position, as shown in Table 1 below.
Element name | Element value | Element type
Permanent quota | permanent{0,3}quota | Pattern
Permanent quota | quota permanent | Text
Permanent quota | permanent quota | Text
Table 1
An element, denoted "e", is typically an entity, attribute or the like related to a specific business, for example business-related entities such as "Industrial and Commercial Bank" and "Peony Card"; business attributes such as "quota" and "minimum charge"; and business actions such as "card activation" and "settlement". Elements are domain-specific and cannot be reused across domains. Correspondingly, an element tree (English: Element Tree) organizes business-related concepts, that is, the objects, tools, attributes and the like commonly used in the business, in a tree structure. A node of the tree is an element name, and one element name can correspond to multiple element values. An element value is the language expression corresponding to the element, a word or a pattern, as shown in Table 1 above; Table 2 below gives the corresponding examples for concept values.
Concept name | Concept value | Concept type
Negative environment review | haze | Text
Negative environment review | air{0,7}seriously polluted | Pattern
Negative environment review | environment{0,7}not good | Pattern
Table 2
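To illustrate the pattern-type values above, the following Python sketch (an assumption about the pattern syntax, not part of the patent) compiles a value such as "environment{0,7}not good" into a regular expression, treating "{m,n}" as a gap of m to n arbitrary characters between the surrounding literals.

```python
import re

def compile_pattern_value(pattern_value: str):
    """Compile a pattern-type value, e.g. 'environment{0,7}not good', into a
    regex; '{m,n}' is read here as a gap of m..n arbitrary characters."""
    parts = re.split(r"\{(\d+),(\d+)\}", pattern_value)
    regex, i = "", 0
    while i < len(parts):
        regex += re.escape(parts[i])          # literal fragment
        if i + 2 < len(parts):                # a gap specification follows
            regex += ".{%s,%s}" % (parts[i + 1], parts[i + 2])
        i += 3
    return re.compile(regex)

# compile_pattern_value("environment{0,7}not good").search("the environment is not good")
# -> a match, because the gap ' is ' is at most 7 characters long
```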
Text mining is the extraction of valuable information from corpus texts.
Model optimization refers to operations such as analysing large volumes of corpus texts in batch, measuring accuracy and recall, iteratively updating the corpus, and continuously improving the model, the modeling strategy and the program. Model optimization is part of building classification expressions: before building them, business personnel have to manually screen a large number of corpus texts and split them into combinations of concepts and elements, whereas the optimization method of the present application mines these combinations automatically.
The method provided by the present application is introduced below. As shown in Fig. 1, which is a flowchart of a classification expression generation method provided by an embodiment of the present application, the method includes the following steps:
Step 101: Obtain at least two classifications, each containing multiple corpus texts, where each corpus text corresponds to one itemset and each itemset is composed of the concepts and/or elements obtained by parsing that corpus text.
An itemset is a set of items and may be a 1-itemset (unary), 2-itemset (binary), 3-itemset (ternary) and so on. Each itemset has member types and a member count, where the member types include concepts and elements; a single concept or a single element is called a unary item, and each itemset is composed of at least one member. If the relative support of an itemset meets a predefined minimum support threshold, the itemset is called a frequent itemset. The types in a frequent itemset are concepts and/or elements, and the number of concepts and/or elements in the frequent itemset is its member count.
Converting a corpus text into an itemset includes: splitting or normalizing the corpus text, extracting the concepts and/or elements it contains, and composing these concepts and/or elements into the itemset corresponding to that corpus text. Optionally, each itemset can be written with parentheses, for example (set, trading password).
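As a rough illustration of this conversion step, the following sketch matches a text against the value patterns of each concept and element and collects the names that match; the dictionary layout and the names c_set and e_trading_password are assumptions made for the example, not definitions from the patent.

```python
import re

def text_to_itemset(text, value_dict):
    """value_dict maps a concept/element name (e.g. 'c_set', 'e_trading_password')
    to a list of compiled regex patterns for its values; returns the itemset of
    the names whose values match the corpus text."""
    return frozenset(name for name, patterns in value_dict.items()
                     if any(p.search(text) for p in patterns))

values = {
    "c_set": [re.compile(r"\bset\b")],
    "e_trading_password": [re.compile(r"(trading|consumption|credit) ?password")],
}
print(text_to_itemset("I want to set a trading password", values))
# frozenset({'c_set', 'e_trading_password'})
```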
Step 102: Perform algorithmic mining on each classification according to its corpus texts to generate the frequent pattern set corresponding to that classification. Each frequent pattern set contains at least one frequent pattern, each frequent pattern contains at least one of a concept or an element, and the concepts and elements can be obtained by parsing the corpus texts.
A frequent pattern (English: frequent pattern) is a pattern that occurs frequently in a data set, such as an itemset, a subsequence or a substructure.
One way to generate frequent patterns is to use the Apriori algorithm, a frequent-itemset algorithm for mining association rules. Its core idea is to mine frequent itemsets in two stages: candidate generation based on downward closure, and support-based pruning of the candidates.
Further, generating a frequent pattern set with the Apriori algorithm includes: obtaining the multiple itemsets corresponding to the multiple corpus texts under any one classification; and generating the frequent pattern set under that classification from these itemsets through the Apriori algorithm. The frequent pattern sets of the other classifications are generated in the same way, each classification corresponding to one frequent pattern set.
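A minimal, self-contained sketch of this mining step is shown below; it is an illustration of the standard Apriori procedure on the itemsets of one classification, with an absolute support threshold, and not the patent's own implementation.

```python
def apriori(transactions, min_support=2, max_size=3):
    """transactions: list of frozensets, one itemset per corpus text under a
    classification. Returns a dict mapping each frequent pattern (frozenset of
    size 1..max_size) to its support count (>= min_support)."""
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in level})
        # candidate generation: only unions of frequent k-itemsets are kept,
        # relying on the downward-closure property
        candidates = {a | b for a in level for b in level if len(a | b) == size + 1}
        size += 1
    return frequent

# apriori([frozenset({"c_set", "e_trading_password"}),
#          frozenset({"c_set", "e_trading_password", "e_open_card"})])
# -> {frozenset({'c_set'}): 2, frozenset({'e_trading_password'}): 2,
#     frozenset({'c_set', 'e_trading_password'}): 2}
```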
Optionally, when generating the frequent pattern set under each classification, the method further includes excluding the frequent patterns of two or more items that are composed entirely of concepts or entirely of elements. The specific process includes: screening the frequent patterns of two or more items that contain only one of concepts and elements; excluding the two-or-more-item frequent patterns composed solely of concepts or solely of elements; retaining unary frequent patterns and two-or-more-item frequent patterns that contain both concepts and elements; and generating the frequent pattern set from the retained frequent patterns.
This implementation eliminates the frequent patterns of two or more items that contain only concepts or only elements, so that the generated classification expressions contain both concepts and elements. The screened corpus texts can thus be assigned to the corresponding classification more accurately, making it easier for business personnel to count and organize the large number of corpus texts in the database.
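The screening rule just described could look roughly like the following sketch, which assumes the "c_"/"e_" naming convention used in the examples later in this description to tell concepts and elements apart.

```python
def keep_mixed_patterns(frequent_patterns):
    """Keep unary patterns and multi-item patterns that mix at least one
    concept ('c_' prefix) with at least one element ('e_' prefix); drop
    multi-item patterns made up of concepts only or elements only."""
    kept = []
    for pattern in frequent_patterns:
        if len(pattern) == 1:
            kept.append(pattern)
            continue
        has_concept = any(m.startswith("c_") for m in pattern)
        has_element = any(m.startswith("e_") for m in pattern)
        if has_concept and has_element:
            kept.append(pattern)
    return kept
```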
Step 103: Compare, across the frequent pattern sets of all classifications, the concepts and/or elements of each frequent pattern, exclude identical frequent patterns, retain frequent patterns that differ from the others in at least one concept or element, and generate the candidate frequent pattern set, which contains at least one candidate frequent pattern.
The "comparison" here involves two levels of screening. The first level compares the frequent patterns within the same classification: frequent patterns that have the same concepts and/or elements and the same member count are excluded, and those that differ are retained. The second level is a cross-comparison of the frequent patterns under the different classifications: each frequent pattern under one classification is compared with those under the other classifications, identical ones are rejected and different ones retained.
Specifically, comparing the concepts and/or elements of each frequent pattern and generating the candidate frequent pattern set includes: comparing whether the frequent patterns mined by the algorithm are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression; if not, taking that frequent pattern as a candidate frequent pattern and generating the candidate frequent pattern set.
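These two comparisons might be sketched as follows; the dictionary shapes are assumptions made for the example, and the previously generated expressions are represented simply as frozensets of their concept/element names.

```python
def candidate_patterns(pattern_sets, known_expressions=None):
    """pattern_sets: classification name -> set of frequent patterns (frozensets).
    known_expressions: classification name -> combinations (frozensets) already
    used in previously generated expressions. A pattern becomes a candidate only
    if it is not shared with another classification and does not repeat a known
    combination."""
    known_expressions = known_expressions or {}
    candidates = {}
    for cls, patterns in pattern_sets.items():
        shared = set()
        for other_cls, other_patterns in pattern_sets.items():
            if other_cls != cls:
                shared |= set(other_patterns)
        known = set(known_expressions.get(cls, []))
        candidates[cls] = [p for p in patterns if p not in shared and p not in known]
    return candidates
```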
Step 104: Generate the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
A classification expression, also called an ontology expression, is the business rule corresponding to each classification or ontology class and is a standardized description of that rule; it is generally composed of resources (such as concepts and elements) and operators. The operators include primary operators, namely logical combination operators such as logical AND "+", logical OR "|", logical NOT "-" and bracket priority "()". Further, logical AND "+" means that the conditions on both sides must be met simultaneously, i.e. both occur in the corpus text; logical OR "|" means that only one of the left and right conditions needs to be met; logical NOT "-" means exclusion, i.e. the condition on the right must not occur; brackets "()" change the priority of matching; and "#" denotes a sentence constraint, i.e. the constrained conditions must occur within the same sentence.
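For illustration only, the following sketch evaluates an expression built from "+" and "-" against the itemset of a corpus text; it assumes whitespace-separated operators and leaves out "|", "()" and "#".

```python
def matches(expression, itemset):
    """True if every '+' term of the expression is in the itemset and every
    '-' term is absent, e.g. 'e_open_card + e_trading_password - c_forget'."""
    sign = "+"
    for token in expression.split():
        if token in ("+", "-"):
            sign = token
        elif (sign == "+") != (token in itemset):
            return False
    return True

print(matches("e_open_card + e_trading_password - c_forget",
              {"e_open_card", "e_trading_password"}))              # True
print(matches("e_open_card + e_trading_password - c_forget",
              {"e_open_card", "e_trading_password", "c_forget"}))  # False
```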
As shown in Fig. 2 specifically, step 104 includes:
Step 201:If the concept or a key element in each frequent mode are unitary, the candidate is counted The number of member included in frequent mode.
Step 202:Judge whether candidate's frequent mode is by concept more than binary or binary and/or key element group Into.
Step 203:If be made up of in candidate's frequent mode concept more than binary or binary and/or key element, The computing that all concepts and/or key element in candidate's frequent mode are carried out to logical AND generates the classification expression formula.
For example, if candidate's frequent mode is { e_ opens card, e_ trading passwords }, then its classification expression formula under classifying is established For:E_ opens card+e_ trading passwords, wherein, "+" represents logic and operation.
Step 204:It is if be made up of in candidate's frequent mode a metanotion or key element, the candidate of unitary is frequent The computing that pattern carries out logic NOT with built classification expression formula generates the classification expression formula.
If for example, there are { e_ opens card, e_ trading passwords } and { c_ forgettings } in candidate's frequent mode, then by the frequency of the unitary Numerous pattern { c_ forgettings } is excluded with logic NOT "-", then the classification expression formula generated is:E_ opens card+e_ trading passwords-c_ forgettings.
Optionally, the classification expression formula can generate following according to the number and operator of the member included in frequent mode Different patterns:(1) if the frequent mode comprising binary, and logically with computing, then it is described classification expression formula be expressed as:c_ X+e_Y or e_X+c_Y;(2) if the frequent mode comprising binary, and logically inverse, then the classification expression formula represent For:C_X-e_Y or e_X-c_Y;(3) if the frequent mode comprising ternary, and logically with computing, then the classification expression formula It is expressed as:c_X+e_Y+c_Z;(4) if the frequent mode comprising ternary, and logically with and logical not operation, then described point Class expression formula is expressed as:c_X+e_Y-c_Z.Wherein, in above-mentioned each classification expression formula, c represents concept, and e represents key element, X, Y and Z Concept name or key element name are represented, is that "+" represents logic and operation, "-" by the way that a variety of descriptions normalization of identical concept is formed Represent logical not operation.
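A compact sketch of steps 201-204 for one classification is given below. The treatment of the unary items follows the worked example later in this description, where the unary candidates of the competing classification (for instance c_forget for "set password") are appended with "-"; this scoping is an assumption drawn from that example.

```python
def build_expressions(candidates, exclusive_unary=()):
    """candidates: the candidate frequent patterns (frozensets) of one
    classification. exclusive_unary: unary items to exclude with '-', e.g. the
    unary candidates of a competing classification. Multi-item patterns are
    joined with '+', and each expression is also emitted with the exclusive
    items appended via '-'."""
    expressions = []
    for pattern in candidates:
        if len(pattern) >= 2:
            base = " + ".join(sorted(pattern))
            expressions.append(base)
            for item in exclusive_unary:
                expressions.append(base + " - " + item)
    return expressions

# build_expressions([frozenset({"e_open_card", "e_trading_password"})],
#                   exclusive_unary=["c_forget"])
# -> ['e_open_card + e_trading_password',
#     'e_open_card + e_trading_password - c_forget']
```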
With the method provided in this embodiment, frequent pattern sets are formed by algorithmic mining on each classification, and all the generated frequent pattern sets are compared and screened to form classification expressions. The method can exclude frequent patterns repeated across classifications and generate, for any corpus text, the classification expression of the classification it belongs to. It is an automatic process that avoids manual screening and mining, improves the efficiency and accuracy of corpus screening, and assists in building, extending and optimizing classification models.
In a specific embodiment, taking a bank's card-application business as an example, the classification expression generation method of the present application is described below.
For example, the first classification is "set password". The process of generating a classification expression for the first classification includes:
Step 101: Obtain a corpus text under the "set password" classification, for example "... I want to set a trading password." The concept values and element values of this corpus text are identified: the concept value "set(once|a)" under the concept "set" (denoted "c_set") matches "set" in the text, and the element value "(trading|consumption|credit|cash withdrawal|payment){0,3}password" under the element "trading password" (denoted "e_trading_password") matches "trading password" in the text. Each concept may contain multiple concept values and each element may contain multiple element values, which makes it possible to normalize the different ways of expressing the same concept or element. For instance, the element value "(trading|consumption|credit|cash withdrawal|payment){0,3}password" can also match expressions such as "consumption password", "cash-withdrawal password" and "credit password", all of which can be represented by the single element "trading password". The two items "set" and "trading password" identified in this corpus text thus form the itemset corresponding to it: (c_set, e_trading_password).
The itemsets corresponding to the other corpus texts are obtained in the same way.
Step 102: After obtaining all the itemsets under the first classification, mine the frequent patterns of this classification with the Apriori algorithm, i.e. determine which items (singles, pairs or triples) frequently occur together. Suppose the frequent patterns mined for the "set password" classification are: {e_open_card, c_set, e_trading_password}, {c_set, e_trading_password}, {c_activate, e_trading_password}, {e_open_card, e_trading_password}, {c_activate}, {e_open_card}. These frequent patterns form the first frequent pattern set.
Similarly, the frequent pattern set of the second classification is obtained according to the above steps. Suppose the second classification is "reset password"; algorithmic mining of the corpus texts under "reset password" yields the frequent patterns {c_forget, e_trading_password}, {c_reset, e_trading_password}, {c_set, e_trading_password}, {c_forget, c_modify, e_trading_password} and {c_forget}. These frequent patterns form the second frequent pattern set.
Step 103: Compare the frequent pattern sets under the first and second classifications, exclude the identical frequent patterns, retain frequent patterns that differ from the others in at least one concept or element, and generate the candidate frequent pattern sets.
Specifically: first, for the frequent patterns composed of two or three items, compare the frequent patterns of each classification with the elements and/or concepts of the expressions that already exist under the current classification, and exclude identical frequent patterns. For example, if the classification expression e_open_card + c_set + e_trading_password has already been stored under the "set password" classification, then {e_open_card, c_set, e_trading_password} is deleted from the frequent pattern set under the first classification, and the remaining frequent patterns are:
{c_set, e_trading_password};
{c_activate, e_trading_password};
{e_open_card, e_trading_password};
{c_activate};
{e_open_card}.
Then, compare the frequent pattern set under the first classification with the frequent pattern set under the second classification, and exclude the identical frequent patterns from the result sets. The comparison shows that {c_set, e_trading_password} is shared by the two classifications, so this frequent pattern is deleted from the result set of each classification. In the embodiments of the present application, "identical" means frequent patterns that have the same member count and the same concept values or element values.
The frequent patterns remaining after the comparison are the candidate frequent patterns under each classification, as shown in Table 3 below.
Candidate frequent patterns under the first classification | Candidate frequent patterns under the second classification
{c_activate, e_trading_password} | {c_forget, e_trading_password}
{e_open_card, e_trading_password} | {c_reset, e_trading_password}
{c_activate} | {c_forget, c_modify, e_trading_password}
{e_open_card} | {c_forget}
Table 3
Step 104: Generate the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
The specific principles are as follows. If a candidate frequent pattern contains two or more concept values and element values, the classification expression is generated with the logical AND operation. For example, the candidate frequent pattern {e_open_card, e_trading_password} is built into the classification expression of the "set password" classification: e_open_card + e_trading_password.
If a candidate frequent pattern is a unary item, it is excluded with the logical NOT operation. For example, under the "set password" classification, given the completed classification expression e_open_card + e_trading_password, a logical NOT operation can be applied within the expression in order to better distinguish this classification from the "reset password" class, generating the classification expression: e_open_card + e_trading_password - c_forget.
Identical items are excluded and exclusive items are retained. The ontology expressions of the first and second classifications generated according to the above principles are shown in Table 4.
Classification expressions under the "set password" classification | Classification expressions under the "reset password" classification
c_activate + e_trading_password | c_forget + e_trading_password
e_open_card + e_trading_password | c_reset + e_trading_password
c_activate + e_trading_password - c_forget | c_forget + c_modify + e_trading_password
e_open_card + e_trading_password - c_forget | c_forget + e_trading_password - c_activate
  | c_reset + e_trading_password - c_activate
  | c_forget + c_modify + e_trading_password - c_activate
  | c_forget + e_trading_password - e_open_card
  | c_reset + e_trading_password - e_open_card
  | c_forget + c_modify + e_trading_password - e_open_card
  | c_forget + e_trading_password - c_activate - e_open_card
  | c_reset + e_trading_password - c_activate - e_open_card
  | c_forget + c_modify + e_trading_password - c_activate - e_open_card
Table 4
In this embodiment, association rules and the mining algorithm are combined to mine frequent itemsets of three different dimensions (unary, binary and ternary) from the training corpus under the same classification. Because the items are to be recommended as items of classification expressions, and in order to reflect the connection between the business and the concepts, conditions are imposed during frequent-itemset mining: the items of a binary frequent itemset must be an element combined with a concept, and a ternary frequent itemset must contain at least one element and one concept. Frequent itemsets that do not meet these conditions are deleted.
The mined frequent itemsets form two kinds of collections. One is the frequent itemsets mined under each classification, which assist in extending the classification rule expressions of the corresponding classification. The other is the collection of exclusive frequent itemsets; in other words, a frequent itemset that is an item of a classification expression under the first classification rule is not necessarily an item of the classification expression under the second classification rule. In that case, a logical NOT operation is performed during the generation of the second classification expression to exclude it from that classification expression.
This method combines the concept-rule classification model with the Apriori algorithm to automatically mine concepts or concept combinations that are strongly related to the business, assisting in building, extending and optimizing the concept-rule classification model. It both guarantees the accuracy of the classifier and reduces manual effort, improving efficiency.
Referring to Fig. 3, an embodiment of the present application provides a classification expression generating device for implementing the classification expression generation method described in the embodiments of Fig. 1 or Fig. 2. Further, the device may be provided in a platform, in a classifier or in an intelligent robot; the present application does not limit this.
As shown in Fig. 3, the device includes an acquiring unit 310, a processing unit 320 and a sending unit 330, and may further include other functional units or modules such as a storage unit.
Further, the acquiring unit 310 is configured to obtain at least two classifications, each containing multiple corpus texts.
The processing unit 320 is configured to perform algorithmic mining on each classification according to its corpus texts to generate the frequent pattern set corresponding to that classification, each frequent pattern set containing at least one frequent pattern, each frequent pattern containing at least one of a concept or an element, the concepts and elements being obtained by parsing the corpus texts.
The processing unit 320 is further configured to compare, across the frequent pattern sets of all classifications, the concepts and/or elements of each frequent pattern, exclude identical frequent patterns, retain frequent patterns that differ from the others in at least one concept or element, and generate a candidate frequent pattern set containing at least one candidate frequent pattern; and to generate the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
Optionally, in one implementation of this embodiment, each corpus text corresponds to an itemset and the algorithm includes the Apriori algorithm. The acquiring unit 310 is specifically configured to obtain the multiple itemsets corresponding to the multiple corpus texts under each classification, and the processing unit 320 is configured to generate multiple frequent pattern sets from the multiple itemsets through the Apriori algorithm, each classification corresponding to one frequent pattern set.
Optionally, in one implementation of this embodiment, the processing unit 320 is specifically configured to screen the frequent patterns of two or more items that contain only one of concepts and elements; exclude the two-or-more-item frequent patterns composed solely of concepts or solely of elements; retain unary frequent patterns and two-or-more-item frequent patterns that contain both concepts and elements; and generate the multiple frequent pattern sets from the retained frequent patterns.
Optionally, in one implementation of this embodiment, the processing unit 320 is specifically configured to take a single concept or a single element in each frequent pattern as a unary item, count the number of members contained in the candidate frequent pattern, and determine whether the candidate frequent pattern is composed of two or more concepts and/or elements; if so, to generate the classification expression by applying a logical AND operation to all the concepts and/or elements in the candidate frequent pattern; and, if the candidate frequent pattern is composed of a single concept or element, to generate the classification expression by applying a logical NOT operation between the unary candidate frequent pattern and the classification expression already built.
Optionally, in one implementation of this embodiment, the processing unit 320 is further configured to compare whether the frequent patterns mined by the algorithm are identical to one another and whether each mined frequent pattern is identical to the combination of concepts and/or elements in a previously generated classification expression, and, if not, to take that frequent pattern as a candidate frequent pattern and generate the candidate frequent pattern set.
Optionally, in one implementation of this embodiment: if a frequent pattern is binary and the logical AND operation is applied, the classification expression is expressed as c_X+e_Y or e_X+c_Y; if a frequent pattern is binary and the logical NOT operation is applied, the classification expression is expressed as c_X-e_Y or e_X-c_Y; if a frequent pattern is ternary and the logical AND operation is applied, the classification expression is expressed as c_X+e_Y+c_Z; if a frequent pattern is ternary and both the logical AND and the logical NOT operations are applied, the classification expression is expressed as c_X+e_Y-c_Z. In each of the above classification expressions, c denotes a concept, e denotes an element, X, Y and Z denote concept names or element names, "+" denotes the logical AND operation and "-" denotes the logical NOT operation.
The sending unit 330 is configured to output the generated classification expressions to the user or display them on the platform.
With the device provided by this embodiment, a mining algorithm is used to generate, from the corpus texts under each classification, the frequent pattern set corresponding to that classification; the frequent patterns in the sets are compared and screened to generate candidate frequent pattern sets; and finally the concepts and/or elements in these candidate frequent pattern sets are combined through logical operations to generate the classification expressions. This realizes automatic mining and classification of corpus texts and solves the problem that manually screening corpus texts and generating classification expressions is time-consuming, labor-intensive and inefficient.
Referring to Fig. 4, an embodiment of the present application further provides a classifier for carrying the device shown in Fig. 3. The classifier includes components such as a transceiver 410, a processor 420 and a memory 430 for implementing the classification expression generation method described above; further, the processor 420 in the classifier may implement the method by executing programs or instructions stored in the memory 430.
The transceiver 410 can be used to receive or send data and information such as corpus texts and other text; the transceiver 410 receives or sends information under the control of the processor 420.
The processor 420 is the control center of the classifier. It connects all parts of the classifier through various interfaces and lines, and realizes the function of generating classification expressions by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory 430. Further, the processor 420 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip, such as an integrated circuit or a programmable logic device; the present application does not specifically limit this.
The memory 430 may include volatile memory, such as random access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); and it may also include a combination of the above kinds of memory.
In the embodiments of the present invention, with reference to Fig. 3 of the above embodiment, the functions to be realized by the acquiring unit 310 and the sending unit 330 may be realized by the transceiver 410 of the classifier, or by the transceiver 410 under the control of the processor 420; the functions to be realized by the processing unit 320 may be realized by the processor 420; and the memory 430 is used to store corpus texts, other text, the algorithm, frequent itemsets, classification expressions and so on.
In addition, the present invention also provides a computer storage medium which can store a program. When executed, the program may include some or all of the steps of the classification expression generation method provided by the present invention. The storage medium may be a magnetic disk, an optical disc, a read-only memory (English: read-only memory, ROM), a random access memory (English: random access memory, RAM), or the like.
Those skilled in the art can clearly understand that the technology in the embodiments of the present invention can be realized by software plus a necessary general hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, or the part that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments, or in some parts of the embodiments, of the present invention.
Identical or similar parts of the embodiments in this specification may be referred to one another. In particular, since the device embodiments are basically similar to the method embodiments, their description is relatively brief; for the related parts, refer to the description in the method embodiments.
The embodiments of the invention described above are not intended to limit the scope of the present invention.

Claims (10)

1. A classification expression generation method, characterized in that the method includes:
obtaining at least two classifications, each classification containing multiple corpus texts;
performing algorithmic mining on each classification according to its corpus texts to generate a frequent pattern set corresponding to that classification, each frequent pattern set containing at least one frequent pattern, each frequent pattern containing at least one of a concept or an element, the concepts and elements being obtained by parsing the corpus texts;
comparing, across the frequent pattern sets of all classifications, the concepts and/or elements of each frequent pattern, excluding identical frequent patterns, retaining frequent patterns that differ from the others in at least one concept or element, and generating a candidate frequent pattern set containing at least one candidate frequent pattern;
generating the classification expression of each classification by combining the concepts and/or elements in at least one candidate frequent pattern through logical operations.
2. The method according to claim 1, characterized in that each language material corresponds to an item set, and the algorithm comprises the Apriori algorithm;
the performing algorithmic mining on each classification according to the language materials included in the classification to generate the frequent pattern set corresponding to the classification comprises:
obtaining multiple item sets corresponding to the multiple language materials under each classification;
generating multiple frequent pattern sets from the multiple item sets through the Apriori algorithm, each classification corresponding to one frequent pattern set.
3. The method according to claim 2, characterized in that the generating multiple frequent pattern sets comprises:
screening out the frequent patterns that are binary or higher-order and contain only one kind of member, that is, only concepts or only key elements;
excluding the binary or higher-order frequent patterns composed only of concepts or only of key elements, retaining the unary frequent patterns and the binary or higher-order frequent patterns containing both concepts and key elements, and generating the multiple frequent pattern sets from the retained frequent patterns.
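For illustration only: a sketch of the screening rule of claim 3, under the same assumption that items carry a c_ or e_ prefix marking concepts and key elements; the prefixes and function names are illustrative, not defined by the claims.

    def is_concept(item):
        return item.startswith("c_")

    def is_key_element(item):
        return item.startswith("e_")

    def screen_patterns(frequent_patterns):
        # Keep unary patterns and binary-or-higher patterns that mix concepts and
        # key elements; drop binary-or-higher patterns made only of concepts or
        # only of key elements.
        kept = set()
        for pattern in frequent_patterns:
            if len(pattern) == 1:
                kept.add(pattern)
                continue
            if any(is_concept(i) for i in pattern) and any(is_key_element(i) for i in pattern):
                kept.add(pattern)
        return kept

    pats = {frozenset({"c_loan"}),
            frozenset({"c_loan", "c_credit"}),      # dropped: concepts only
            frozenset({"c_loan", "e_interest"})}    # kept: mixes a concept and a key element
    print(screen_patterns(pats))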
4. The method according to any one of claims 1-3, characterized in that
the combining the concepts and/or key elements in the at least one candidate frequent pattern to generate the classification expression for each classification comprises:
taking each concept or key element in a frequent pattern as one member, counting the number of members contained in the candidate frequent pattern;
determining whether the candidate frequent pattern is composed of two or more concepts and/or key elements;
if so, performing a logical AND operation on all the concepts and/or key elements in the candidate frequent pattern to generate the classification expression;
if the candidate frequent pattern is composed of a single concept or a single key element, performing a logical NOT operation between the unary candidate frequent pattern and the already generated classification expression to generate the classification expression.
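For illustration only: a sketch of the combination step of claim 4, written with the notation of claim 6 ("+" for logical AND, "-" for logical NOT); the function name and the way alternative clauses are joined are assumptions.

    def generate_expression(candidate_patterns):
        # Multi-member candidates are AND-joined; unary candidates are attached to
        # the already generated expression with a logical NOT.
        expression = ""
        multi = [p for p in candidate_patterns if len(p) >= 2]
        unary = [p for p in candidate_patterns if len(p) == 1]
        for pattern in multi:
            clause = "+".join(sorted(pattern))
            expression = clause if not expression else expression + " | " + clause
        for pattern in unary:
            (member,) = pattern
            expression = expression + "-" + member if expression else member
        return expression

    print(generate_expression([frozenset({"c_loan", "e_interest"}), frozenset({"e_advert"})]))
    # -> 'c_loan+e_interest-e_advert'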
5. The method according to claim 1, characterized in that the compared frequent patterns include the classification expressions previously generated under each classification;
the step of comparing the concepts and/or key elements of each frequent pattern and generating the candidate frequent pattern set comprises:
comparing whether the frequent patterns obtained by algorithmic mining are identical to one another, and whether each mined frequent pattern is identical to the combination of concepts and/or key elements in the previously generated classification expressions;
if not identical, taking the frequent pattern as a candidate frequent pattern, and generating the candidate frequent pattern set.
6. The method according to any one of claims 1-5, characterized in that:
if the frequent pattern contains two members and the logical AND operation is used, the classification expression is expressed as: c_X+e_Y or e_X+c_Y;
if the frequent pattern contains two members and the logical NOT operation is used, the classification expression is expressed as: c_X-e_Y or e_X-c_Y;
if the frequent pattern contains three members and the logical AND operation is used, the classification expression is expressed as: c_X+e_Y+c_Z;
if the frequent pattern contains three members and both the logical AND and logical NOT operations are used, the classification expression is expressed as: c_X+e_Y-c_Z;
wherein, in each of the above classification expressions, c denotes a concept, e denotes a key element, X, Y and Z denote concept names or key element names, "+" denotes the logical AND operation, and "-" denotes the logical NOT operation.
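For illustration only: the four expression shapes listed in claim 6, rendered with hypothetical concept and key element names.

    concept, element, concept2 = "c_finance", "e_interest", "c_policy"
    print(concept + "+" + element)                   # binary, logical AND
    print(concept + "-" + element)                   # binary, logical NOT
    print(concept + "+" + element + "+" + concept2)  # ternary, logical AND
    print(concept + "+" + element + "-" + concept2)  # ternary, logical AND plus logical NOT
    # -> c_finance+e_interest, c_finance-e_interest, c_finance+e_interest+c_policy, c_finance+e_interest-c_policy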
7. A classification expression generation apparatus, characterized in that the apparatus comprises:
an acquiring unit, configured to obtain at least two classifications, each classification comprising multiple language materials;
a processing unit, configured to perform algorithmic mining on each classification according to the language materials included in the classification, to generate a frequent pattern set corresponding to the classification, wherein each frequent pattern set comprises at least one frequent pattern, each frequent pattern comprises at least one of a concept or a key element, and the concept or key element is obtained by parsing each language material;
wherein the processing unit is further configured to compare the concepts and/or key elements of each frequent pattern across the frequent pattern sets of all classifications, exclude identical frequent patterns, retain the frequent patterns in which at least one concept or key element differs from the other frequent patterns, and generate a candidate frequent pattern set, the candidate frequent pattern set comprising at least one candidate frequent pattern; and to combine, by logical operation, the concepts and/or key elements in the at least one candidate frequent pattern to generate a classification expression for each classification.
8. The apparatus according to claim 7, characterized in that each language material corresponds to an item set, and the algorithm comprises the Apriori algorithm;
the acquiring unit is specifically configured to obtain multiple item sets corresponding to the multiple language materials under each classification;
the processing unit is specifically configured to generate multiple frequent pattern sets from the multiple item sets through the Apriori algorithm, each classification corresponding to one frequent pattern set.
9. The apparatus according to claim 8, characterized in that
the processing unit is specifically configured to screen out the frequent patterns that are binary or higher-order and contain only one kind of member, that is, only concepts or only key elements; exclude the binary or higher-order frequent patterns composed only of concepts or only of key elements; retain the unary frequent patterns and the binary or higher-order frequent patterns containing both concepts and key elements; and generate the multiple frequent pattern sets from the retained frequent patterns.
10. The apparatus according to any one of claims 7-9, characterized in that
the processing unit is specifically configured to: take each concept or key element in a frequent pattern as one member, and count the number of members contained in the candidate frequent pattern; determine whether the candidate frequent pattern is composed of two or more concepts and/or key elements; if so, perform a logical AND operation on all the concepts and/or key elements in the candidate frequent pattern to generate the classification expression; and if the candidate frequent pattern is composed of a single concept or a single key element, perform a logical NOT operation between the unary candidate frequent pattern and the already generated classification expression to generate the classification expression.
CN201710961839.5A 2017-10-16 2017-10-16 Classification expression generation method and device Active CN107679209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710961839.5A CN107679209B (en) 2017-10-16 2017-10-16 Classification expression generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710961839.5A CN107679209B (en) 2017-10-16 2017-10-16 Classification expression generation method and device

Publications (2)

Publication Number Publication Date
CN107679209A true CN107679209A (en) 2018-02-09
CN107679209B CN107679209B (en) 2020-10-20

Family

ID=61141096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710961839.5A Active CN107679209B (en) 2017-10-16 2017-10-16 Classification expression generation method and device

Country Status (1)

Country Link
CN (1) CN107679209B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549723A (en) * 2018-04-28 2018-09-18 北京神州泰岳软件股份有限公司 A kind of text concept sorting technique, device and server
CN109886318A (en) * 2019-01-29 2019-06-14 北京明略软件系统有限公司 A kind of information processing method, device and computer readable storage medium
CN110069634A (en) * 2019-04-24 2019-07-30 北京泰迪熊移动科技有限公司 A kind of method, apparatus and computer readable storage medium generating classification model
CN113268529A (en) * 2021-07-21 2021-08-17 广东粤港澳大湾区硬科技创新研究院 Optimization method and device based on satellite time sequence incidence relation algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020256A (en) * 2012-12-21 2013-04-03 电子科技大学 Association rule mining method of large-scale data
CN105022733A (en) * 2014-04-18 2015-11-04 中科鼎富(北京)科技发展有限公司 DINFO-OEC text analysis mining method and device thereof
WO2016192108A1 (en) * 2015-06-05 2016-12-08 王浩屹 Space generating method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020256A (en) * 2012-12-21 2013-04-03 电子科技大学 Association rule mining method of large-scale data
CN105022733A (en) * 2014-04-18 2015-11-04 中科鼎富(北京)科技发展有限公司 DINFO-OEC text analysis mining method and device thereof
WO2016192108A1 (en) * 2015-06-05 2016-12-08 王浩屹 Space generating method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN Xiaoyun et al.: "Frequent Pattern Text Classification Based on Classification Rule Trees", Journal of Software *
HUANG Jin: "Application of Clustering and Classification Techniques in Bioinformatics", China Master's Theses Full-text Database (Basic Sciences) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549723A (en) * 2018-04-28 2018-09-18 北京神州泰岳软件股份有限公司 A kind of text concept sorting technique, device and server
CN108549723B (en) * 2018-04-28 2022-04-05 北京神州泰岳软件股份有限公司 Text concept classification method and device and server
CN109886318A (en) * 2019-01-29 2019-06-14 北京明略软件系统有限公司 A kind of information processing method, device and computer readable storage medium
CN109886318B (en) * 2019-01-29 2021-04-30 北京明略软件系统有限公司 Information processing method and device and computer readable storage medium
CN110069634A (en) * 2019-04-24 2019-07-30 北京泰迪熊移动科技有限公司 A kind of method, apparatus and computer readable storage medium generating classification model
CN113268529A (en) * 2021-07-21 2021-08-17 广东粤港澳大湾区硬科技创新研究院 Optimization method and device based on satellite time sequence incidence relation algorithm

Also Published As

Publication number Publication date
CN107679209B (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN107609121B (en) News text classification method based on LDA and word2vec algorithm
US11714831B2 (en) Data processing and classification
CN108764984A (en) A kind of power consumer portrait construction method and system based on big data
CN106202518A (en) Based on CHI and the short text classification method of sub-category association rule algorithm
CN107679209A (en) Expression formula generation method of classifying and device
CN108416375B (en) Work order classification method and device
US11580119B2 (en) System and method for automatic persona generation using small text components
CN109284626A (en) Random forests algorithm towards difference secret protection
CN108897842A (en) Computer readable storage medium and computer system
US20200272651A1 (en) Heuristic dimension reduction in metadata modeling
CN104424360A (en) Method and system for accessing a set of data tables in a source database
CN111177322A (en) Ontology model construction method of domain knowledge graph
CN103678436A (en) Information processing system and information processing method
CN108647800A (en) A kind of online social network user missing attribute forecast method based on node insertion
CN108304382A (en) Mass analysis method based on manufacturing process text data digging and system
CN109726253A (en) Construction method, device, equipment and the medium of talent's map and talent's portrait
Hammond et al. Cloud based predictive analytics: text classification, recommender systems and decision support
Kotak et al. Enhancing the data mining tool WEKA
Vaish et al. Machine learning techniques for sentiment analysis of hotel reviews
WO2016106944A1 (en) Method for creating virtual human on mapreduce platform
Kanti Kumar et al. Application of graph mining algorithms for the analysis of web data
Bao et al. Predicting paper acceptance via interpretable decision sets
CN110245234A (en) A kind of multi-source data sample correlating method based on ontology and semantic similarity
CN107871055A (en) A kind of data analysing method and device
CN112328653B (en) Data identification method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190905

Address after: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant after: China Science and Technology (Beijing) Co., Ltd.

Address before: Room 601, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: Beijing Shenzhou Taiyue Software Co., Ltd.

TA01 Transfer of patent application right
CB02 Change of applicant information

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Dingfu Intelligent Technology Co., Ltd

Address before: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant