CN110264318A - Data processing method, device, electronic equipment and storage medium - Google Patents
Data processing method, device, electronic equipment and storage medium
- Publication number
- CN110264318A CN201910563737.7A
- Authority
- CN
- China
- Prior art keywords
- product
- keyword
- sample
- text description
- significance level
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/12—Hotels or restaurants
Abstract
Embodiments of the present disclosure disclose a data processing method, an apparatus, an electronic device, and a storage medium. The method comprises: obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs; extracting keywords from the text description; determining the importance of the keywords; and training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This improves the accuracy of product category identification, and different products with similar text descriptions can still be distinguished by the model.
Description
Technical field
The present disclosure relates to the field of computer technology, and in particular to a data processing method, an apparatus, an electronic device, and a storage medium.
Background
With the development of Internet technology, more and more products are offered on online operating platforms. To better express the similarities and differences among products, an online operating platform usually generates representation data for each product, which supports scenarios such as classifying and identifying products during retrieval. However, product categories are numerous: the same product may carry a variety of different text descriptions, such as different product names, while different products may share identical or similar text descriptions. Product representation data is usually produced by manually selecting keywords, and because people summarize and abstract differently, the error is large. Annotating product representation data is therefore time-consuming and laborious, and its accuracy is low.
Summary of the invention
Embodiments of the present disclosure provide a data processing method, an apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method comprises:
Obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
Extracting keywords from the text description;
Determining the importance of the keywords;
Training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
With reference to the first aspect, in a first implementation of the first aspect, obtaining the sample data comprises:
Obtaining text descriptions of multiple sample products under preset categories;
Deduplicating the text descriptions of the multiple sample products.
With reference to the first aspect and/or the first implementation of the first aspect, in a second implementation of the first aspect, deduplicating the text descriptions of the multiple sample products comprises:
Uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
With reference to the first aspect, the first implementation of the first aspect, and/or the second implementation of the first aspect, in a third implementation of the first aspect, extracting the keywords from the text description comprises:
Segmenting the text description into word segments;
Determining as keywords the word segments whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the first aspect, the first implementation of the first aspect, the second implementation of the first aspect, and/or the third implementation of the first aspect, in a fourth implementation of the first aspect, determining as keywords the word segments whose correlation with the category to which the sample product belongs is higher than the preset threshold comprises:
Determining the correlation between the word segments and the category to which the sample product belongs using a chi-square test of independence.
With reference to the first aspect, the first implementation of the first aspect, the second implementation of the first aspect, the third implementation of the first aspect, and/or the fourth implementation of the first aspect, in a fifth implementation of the first aspect, determining the importance of the keywords comprises:
Determining the TF-IDF value of a keyword as the importance of the keyword.
With reference to the first aspect, the first implementation of the first aspect, the second implementation of the first aspect, the third implementation of the first aspect, the fourth implementation of the first aspect, and/or the fifth implementation of the first aspect, in a sixth implementation of the first aspect, determining the TF-IDF value of the keyword as the importance of the keyword comprises:
Determining the TF-IDF value of the keyword under the category to which the sample product belongs;
When the keyword corresponds to multiple TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance of the keyword.
With reference to the first aspect, the first implementation of the first aspect, the second implementation of the first aspect, the third implementation of the first aspect, the fourth implementation of the first aspect, the fifth implementation of the first aspect, and/or the sixth implementation of the first aspect, in a seventh implementation of the first aspect, determining the importance of the keywords comprises:
When the correlations between all word segments corresponding to the sample product and the category to which the sample product belongs are below the preset threshold, using a default value as the importance of the keyword corresponding to the sample product.
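The selection rule of the sixth implementation and the fallback of the seventh implementation can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the placeholder key `<default>`, and the default value `0.0` are all hypothetical, since the disclosure does not fix a concrete default value.

```python
from typing import Dict, List

# Assumed default importance; the disclosure only says "a default value".
DEFAULT_IMPORTANCE = 0.0


def resolve_importance(scores_by_category: Dict[str, float]) -> float:
    """Return the smallest score when a keyword has scores under several categories."""
    return min(scores_by_category.values(), default=DEFAULT_IMPORTANCE)


def product_importances(segments: List[str],
                        keyword_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Map each keyword found among the segments to its resolved importance.

    keyword_scores maps keyword -> {category: TF-IDF value}. Segments that are
    missing from keyword_scores are treated as having failed the threshold.
    """
    found = {seg: resolve_importance(keyword_scores[seg])
             for seg in segments if seg in keyword_scores}
    if not found:
        # No segment qualified as a keyword: fall back to the default value.
        return {"<default>": DEFAULT_IMPORTANCE}
    return found
```

With a hypothetical score table in which "tomato" has values under two categories, the smaller value is kept, and a product whose segments are all non-keywords receives only the default entry.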
In a second aspect, an embodiment of the present disclosure provides a product identification method.
Specifically, the product identification method comprises:
Obtaining a text description of a product to be identified;
Extracting keywords from the text description;
Determining the importance of the keywords;
Inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is obtained by training with the method of the first aspect.
With reference to the second aspect, in a first implementation of the second aspect, extracting the keywords from the text description comprises:
Segmenting the text description into word segments;
Matching the word segments against a keyword set to determine whether each word segment is a keyword.
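Keyword extraction at identification time, as described in the second aspect, can be sketched as below. The sketch rests on assumptions: `segment` is a whitespace-based stand-in for a real word segmenter (Chinese text would need a dedicated one), and `keyword_tfidf` is a hypothetical lookup table assumed to have been built during training.

```python
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Stand-in segmenter: lowercase the text and split on whitespace."""
    return text.lower().split()


def extract_features(text: str, keyword_tfidf: Dict[str, float]) -> Dict[str, float]:
    """Keep only the segments found in the keyword set, with their stored importance."""
    return {tok: keyword_tfidf[tok] for tok in segment(text) if tok in keyword_tfidf}
```

The returned mapping is what would be converted into the input of the trained product identification model.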
In a third aspect, an embodiment of the present disclosure provides a data processing apparatus.
Specifically, the data processing apparatus comprises:
A first obtaining module, configured to obtain sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
A first extraction module, configured to extract keywords from the text description;
A first determining module, configured to determine the importance of the keywords;
A training module, configured to train a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
In a fourth aspect, an embodiment of the present disclosure provides a product identification apparatus.
Specifically, the product identification apparatus comprises:
A second obtaining module, configured to obtain a text description of a product to be identified;
A second extraction module, configured to extract keywords from the text description;
A second determining module, configured to determine the importance of the keywords;
An identification module, configured to input the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is obtained by training with the apparatus of the third aspect.
The functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the data processing apparatus and/or the product identification apparatus includes a memory and a processor. The memory is used to store one or more computer instructions that support the data processing apparatus and/or the product identification apparatus in executing the above data processing method and/or product identification method, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus and/or the product identification apparatus may further include a communication interface for communicating with other devices or communication networks.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
Obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
Extracting keywords from the text description;
Determining the importance of the keywords;
Training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
With reference to the fifth aspect, in a first implementation of the fifth aspect, obtaining the sample data comprises:
Obtaining text descriptions of multiple sample products under preset categories;
Deduplicating the text descriptions of the multiple sample products.
With reference to the fifth aspect and/or the first implementation of the fifth aspect, in a second implementation of the fifth aspect, deduplicating the text descriptions of the multiple sample products comprises:
Uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
With reference to the fifth aspect, the first implementation of the fifth aspect, and/or the second implementation of the fifth aspect, in a third implementation of the fifth aspect, extracting the keywords from the text description comprises:
Segmenting the text description into word segments;
Determining as keywords the word segments whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the fifth aspect, the first implementation of the fifth aspect, the second implementation of the fifth aspect, and/or the third implementation of the fifth aspect, in a fourth implementation of the fifth aspect, determining as keywords the word segments whose correlation with the category to which the sample product belongs is higher than the preset threshold comprises:
Determining the correlation between the word segments and the category to which the sample product belongs using a chi-square test of independence.
With reference to the fifth aspect, the first implementation of the fifth aspect, the second implementation of the fifth aspect, the third implementation of the fifth aspect, and/or the fourth implementation of the fifth aspect, in a fifth implementation of the fifth aspect, determining the importance of the keywords comprises:
Determining the TF-IDF value of a keyword as the importance of the keyword.
With reference to the fifth aspect, the first implementation of the fifth aspect, the second implementation of the fifth aspect, the third implementation of the fifth aspect, the fourth implementation of the fifth aspect, and/or the fifth implementation of the fifth aspect, in a sixth implementation of the fifth aspect, determining the TF-IDF value of the keyword as the importance of the keyword comprises:
Determining the TF-IDF value of the keyword under the category to which the sample product belongs;
When the keyword corresponds to multiple TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance of the keyword.
With reference to the fifth aspect, the first implementation of the fifth aspect, the second implementation of the fifth aspect, the third implementation of the fifth aspect, the fourth implementation of the fifth aspect, the fifth implementation of the fifth aspect, and/or the sixth implementation of the fifth aspect, in a seventh implementation of the fifth aspect, determining the importance of the keywords comprises:
When the correlations between all word segments corresponding to the sample product and the category to which the sample product belongs are below the preset threshold, using a default value as the importance of the keyword corresponding to the sample product.
In a sixth aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
Obtaining a text description of a product to be identified;
Extracting keywords from the text description;
Determining the importance of the keywords;
Inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is obtained by training with the electronic device of the fifth aspect.
With reference to the sixth aspect, in a first implementation of the sixth aspect, extracting the keywords from the text description comprises:
Segmenting the text description into word segments;
Matching the word segments against a keyword set to determine whether each word segment is a keyword.
In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing computer instructions used by the data processing apparatus and/or the product identification apparatus, including computer instructions for executing any of the methods described above.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the data processing method of the embodiments of the present disclosure, the text description of a sample product and the category to which the product belongs are obtained, keywords are extracted from the text description, and the importance of the extracted keywords under the category to which the sample product belongs is determined. A product identification model is then trained on feature data that includes the importance of the keywords, together with the category to which the product belongs. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This improves the accuracy of product category identification, and even different products with similar text descriptions can be distinguished by the model.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of step S101 according to the embodiment shown in Fig. 1;
Fig. 3 shows a flowchart of step S102 according to the embodiment shown in Fig. 1;
Fig. 4 shows a flowchart of the part that determines keyword importance according to the embodiment shown in Fig. 1;
Fig. 5 shows a flowchart of a product identification method according to an embodiment of the present disclosure;
Fig. 6 shows a flowchart of step S502 according to the embodiment shown in Fig. 5;
Fig. 7 shows a structural block diagram of a data processing apparatus according to an embodiment of the present disclosure;
Fig. 8 shows a structural block diagram of the first obtaining module 701 according to the embodiment shown in Fig. 7;
Fig. 9 shows a structural block diagram of the first extraction module 702 according to the embodiment shown in Fig. 7;
Fig. 10 shows a structural block diagram of the part that determines keyword importance according to an embodiment of the present disclosure;
Fig. 11 shows a structural block diagram of a product identification apparatus according to an embodiment of the present disclosure;
Fig. 12 shows a structural block diagram of the second extraction module 1102 according to the embodiment shown in Fig. 11;
Fig. 13 is a schematic structural diagram of an electronic device suitable for implementing the data processing method according to an embodiment of the present disclosure.
Detailed description of the embodiments
Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof exist or are added.
It should also be noted that, provided there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing method comprises the following steps:
In step S101, sample data is obtained, wherein the sample data includes a text description of a sample product and the category of the sample product;
In step S102, keywords are extracted from the text description;
In step S103, the importance of the keywords under the category is determined;
In step S104, a product identification model is trained using feature data of the sample product and the category, wherein the feature data includes the importance, under the category, of the keywords corresponding to the sample product.
In this embodiment, a sample product may be a product currently offered on an online platform, such as a dish on a food-delivery ordering platform, or clothing, daily necessities, or household goods on an e-commerce platform. The text description of a sample product may include, but is not limited to, written descriptions of attributes such as product name, material, manufacturing process, function, size, and quantity. For example, the text description of a dish on a food-delivery ordering platform may include the dish name, its ingredients, its cooking method, and so on.
In some embodiments, the text description of a sample product may also include a text description of the operator to which the sample product belongs. In general, the products managed by one operator fall into closely related categories, and some operators manage products of only one category. Therefore, when training the product identification model, operator data is also used as input data, so that the model can learn from the operator data the features that influence the product category, which further improves its recognition accuracy.
The data of the operator to which a sample product belongs may include, but is not limited to, the operator's name, its main business scope, and the broad class to which the products it manages belong (such as the cuisine in the catering industry).
The category of a sample product may be determined from existing data of the online platform, or it may be labeled manually. For example, sample data may be collected from the platform's existing products. Since an online platform usually has its own product classification, the sample data required for training can be obtained by collecting the product-related text descriptions under each category.
For each piece of sample data obtained, one or more keywords may be extracted from the text description of the sample product, and the importance of each keyword is then determined. The importance indicates how large a role the keyword plays in product identification: a keyword that plays an important role in product identification has high importance, and a keyword that does not has low importance.
The importance of a keyword may be determined in advance by counting how many times the keyword appears in the text descriptions of all sample products under the same category. For example, if a keyword appears many times in the text descriptions of all sample products under the same category, its importance may be considered high; if it appears rarely, its importance may be considered low.
The product identification model may be an xgboost model, a GBDT model, a neural network model, or the like. One sample product may correspond to multiple keywords, each with its own importance. When training the product identification model, the importance values may be converted into vector form, and the vectors corresponding to the keywords are combined to form the input data of the model. In each training iteration, the feature data in a piece of sample data is used as the input of the product identification model. After the model's output is obtained, the output is compared with the category to which the sample product in that sample data belongs, and the model parameters are updated so that the output moves closer to that category. Through training on a large amount of sample data, the model parameters are continually updated, and after training the product identification model can give a more accurate output for its input data.
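The training loop described above can be illustrated with a small, self-contained sketch. It is not the disclosed model: a plain softmax (multinomial logistic) classifier written in pure Python stands in for the xgboost, GBDT, or neural network models named in the text, and the vocabulary, categories, importance values, and hyperparameters are all made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def to_vector(importances: Dict[str, float], vocabulary: List[str]) -> List[float]:
    """Lay out keyword importances in a fixed order; absent keywords get 0.0."""
    return [importances.get(word, 0.0) for word in vocabulary]


def scores_for(weights: List[List[float]], features: List[float]) -> List[float]:
    """One linear score per category."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples: List[Tuple[List[float], str]], categories: List[str],
          epochs: int = 200, learning_rate: float = 0.5) -> List[List[float]]:
    """Fit one weight row per category by stochastic gradient descent."""
    weights = [[0.0] * len(samples[0][0]) for _ in categories]
    for _ in range(epochs):
        for features, label in samples:
            probabilities = softmax(scores_for(weights, features))
            for k, row in enumerate(weights):
                # Gap between the model output and the labelled category.
                error = probabilities[k] - (1.0 if categories[k] == label else 0.0)
                for i, x in enumerate(features):
                    row[i] -= learning_rate * error * x
    return weights


def predict(weights: List[List[float]], categories: List[str],
            features: List[float]) -> str:
    scores = scores_for(weights, features)
    return categories[scores.index(max(scores))]


# Hypothetical toy data: keyword importances per sample product.
vocabulary = ["tomato", "egg", "noodle", "beef"]
categories = ["home-style", "noodles"]
samples = [
    (to_vector({"tomato": 0.5, "egg": 0.4}, vocabulary), "home-style"),
    (to_vector({"noodle": 0.6, "beef": 0.3}, vocabulary), "noodles"),
]
weights = train(samples, categories)
```

Each pass compares the model output with the labelled category and nudges the parameters toward it, which mirrors the iteration described in the paragraph above. A production system would more likely rely on one of the model families the text names.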
In the data processing method of the embodiments of the present disclosure, the text description of a sample product and the category to which the product belongs are obtained, keywords are extracted from the text description, and the importance of the extracted keywords is determined. A product identification model is then trained on feature data that includes the importance of the keywords, together with the category to which the product belongs. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This improves the accuracy of product identification, and different products with similar text descriptions can be distinguished by the model.
In an optional implementation of this embodiment, as shown in Fig. 2, step S101, that is, the step of obtaining the sample data, further comprises the following steps:
In step S201, text descriptions of multiple sample products under preset categories are obtained;
In step S202, the text descriptions of the multiple sample products are deduplicated.
In this optional implementation, when sample data is collected, multiple preset categories may be taken from the classification data that the online platform already has, and the text descriptions of multiple sample products are obtained under each preset category. The text description may include, but is not limited to, written descriptions of attributes such as product name, material, manufacturing process, function, size, and quantity. To avoid collecting duplicate sample products, the text descriptions may be deduplicated.
In an optional implementation of this embodiment, step S202, that is, the step of deduplicating the text descriptions of the multiple sample products, further comprises the following step:
Uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
In this optional implementation, there may be multiple sample products under the same preset category whose text descriptions differ even though they are the same product. For example, on a food-delivery ordering platform, the menus uploaded by some merchants list "Tomato omelette" while the menus uploaded by others list "tomato scrambled eggs". The two are essentially the same product under different names, so both can be uniformly mapped to the same product name. It can be understood that other content in the text description may also be mapped in the same unified way.
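The unified mapping used for deduplication can be sketched as a lookup from known variants to one canonical description. The alias table below is hypothetical and would in practice have to be curated or learned; the disclosure does not say how the mapping is built.

```python
from typing import Dict, List

# Hypothetical alias table: variant description -> canonical description.
ALIASES: Dict[str, str] = {"tomato scrambled eggs": "tomato omelette"}


def canonicalize(description: str, alias_map: Dict[str, str]) -> str:
    """Normalize case and whitespace, then map known variants to one canonical form."""
    key = " ".join(description.lower().split())
    return alias_map.get(key, key)


def deduplicate(descriptions: List[str], alias_map: Dict[str, str]) -> List[str]:
    """Return canonical descriptions in first-seen order, without repeats."""
    seen = set()
    unique = []
    for description in descriptions:
        canonical = canonicalize(description, alias_map)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique
```

Applied to the example from the text, the two dish names collapse into a single entry while unrelated descriptions are kept.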
In an optional implementation of this embodiment, as shown in Fig. 3, step S102, that is, the step of extracting the keywords from the text description, further comprises the following steps:
In step S301, the text description is segmented into word segments;
In step S302, the word segments whose correlation with the category to which the sample product belongs is higher than a preset threshold are determined as the keywords.
In this optional implementation, when keywords are extracted from a text description, the text description is first segmented, and the keywords are then determined according to the correlation between the word segments and the category to which the sample product belongs. For example, on a food-delivery ordering platform, the word segment "stir-fry" in the dish name "tomato stir-fry egg" is not important for identifying the category of the dish; that is, the word "stir-fry" has low correlation with identifying the dish, so it can be rejected and not used as a keyword. The preset threshold may be set according to the actual situation and is not limited here. Extracting keywords from the segmentation result and rejecting word segments with low correlation prevents the feature dimension from becoming too large in the subsequent training of the product identification model, which would lower training efficiency.
In an optional implementation of the present embodiment, the step S302 that is, will be with the affiliated class of the sample product
The step of participle that other correlation is higher than preset threshold is determined as the keyword, further includes steps of
The correlation of the participle and the sample product generic is determined using card side's independence test method.
In this optional implementation, the chi-square independence test can determine the association and dependence between two categorical variables. Therefore, in the embodiments of the present disclosure, after the text descriptions of sample products under different preset categories have been collected and segmented into words, the chi-square independence test can be used, for each preset category, to determine the correlation between that preset category and the word segments obtained from the text descriptions of the collected sample products, and the word segments whose correlation is higher than the preset threshold are determined as keywords.
In the embodiments of the present disclosure, after a large amount of sample data has been collected, the chi-square independence test is applied to the text descriptions of the sample products to extract the keywords under the different preset categories and form a keyword set. The chi-square independence test is prior art and is not described in detail here.
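The keyword selection described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes each text description has already been segmented into a set of word segments, computes the standard chi-square statistic on a 2x2 contingency table (segment present or absent, sample inside or outside the category), and keeps the segments whose score exceeds the threshold. The function names, the sample data and the extra check that keeps only positively associated segments are assumptions made for this example.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 contingency table.

    n11: descriptions inside the category that contain the segment
    n10: descriptions outside the category that contain the segment
    n01: descriptions inside the category without the segment
    n00: descriptions outside the category without the segment
    """
    total = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return total * (n11 * n00 - n10 * n01) ** 2 / denom


def select_keywords(samples, category, threshold):
    """samples: list of (set_of_segments, category) pairs.

    Returns {segment: score} for segments whose chi-square score against
    `category` is above `threshold`.
    """
    vocabulary = set()
    for tokens, _ in samples:
        vocabulary.update(tokens)
    keywords = {}
    for token in vocabulary:
        n11 = n10 = n01 = n00 = 0
        for tokens, sample_category in samples:
            present = token in tokens
            in_category = sample_category == category
            if present and in_category:
                n11 += 1
            elif present:
                n10 += 1
            elif in_category:
                n01 += 1
            else:
                n00 += 1
        # Assumption: keep only segments that occur MORE often inside the
        # category, since chi-square alone also scores negative association.
        positively_associated = n11 * n00 > n10 * n01
        score = chi_square(n11, n10, n01, n00)
        if positively_associated and score > threshold:
            keywords[token] = score
    return keywords


# Hypothetical toy data: "stir-fry" occurs equally in both categories.
samples = [
    ({"tomato", "stir-fry"}, "home cooking"),
    ({"tomato", "egg"}, "home cooking"),
    ({"lamb", "stir-fry"}, "barbecue"),
    ({"lamb", "skewer"}, "barbecue"),
]
home_keywords = select_keywords(samples, "home cooking", threshold=2.0)
```

In this toy data "stir-fry" is spread evenly over both categories, so its score is zero and it is rejected, which mirrors the "stir-fry" example in the text.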
In an optional implementation of the present embodiment, step S103, that is, the step of determining the degree of importance of the keyword, further includes the following step:
determining the TF-IDF value of the keyword as the degree of importance of the keyword.
In this optional implementation, TF-IDF (term frequency-inverse document frequency) is a common weighting technique used in information retrieval and data mining; TF stands for term frequency and IDF stands for inverse document frequency. The TF-IDF value of a keyword can be interpreted as follows: if the current keyword occurs with a high frequency (TF) in the text descriptions of all sample products under the current preset category but seldom occurs in the text descriptions of the sample products under other preset categories, the keyword can be considered to have good category-discriminating ability and to be suitable for classification. The keyword can therefore be considered more important for the current preset category, and the TF-IDF value can be used to measure the importance of the keyword.
The TF-IDF value of a keyword can be obtained in advance by collecting the text descriptions of all sample products under the existing preset categories of the online platform, extracting keywords from these text descriptions, and determining the TF-IDF value of each keyword in these text descriptions. As described above, in the keyword set formed from the sample data, each keyword can also be associated with its TF-IDF value. In this way, during online identification, the keywords of the product to be identified and their TF-IDF values can be obtained directly from the keyword set and the corresponding TF-IDF values.
In the present embodiment, the TF value of a keyword can be obtained by dividing the number of times the keyword appears in the text descriptions of all sample products under a preset category by the number of text descriptions (that is, the number of sample products under that preset category). The IDF of the keyword can be determined from the number of preset categories to which the sample products whose text descriptions contain the keyword belong, together with the total number of preset categories. The calculation formula is IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if keyword A appears in the text descriptions of sample products under preset category 1, preset category 2 and preset category 3, and there are 5 preset categories in total, then the keyword appears under three preset categories, so its IDF = log(5/3).
The TF-IDF value of a keyword is the product of the TF value and the IDF value of the keyword.
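The TF and IDF definitions given in this embodiment can be written out as a short sketch. It assumes the corpus is a dictionary mapping each preset category to a list of already segmented text descriptions, and it assumes a natural logarithm, since the text does not state the base of the log. The function names and the toy corpus are illustrative only.

```python
import math


def tf(keyword, descriptions):
    """Occurrences of `keyword` in all descriptions of one preset category,
    divided by the number of descriptions (sample products) in it."""
    if not descriptions:
        return 0.0
    occurrences = sum(tokens.count(keyword) for tokens in descriptions)
    return occurrences / len(descriptions)


def idf(keyword, corpus):
    """IDF = log(n / m): n preset categories in total, m of which contain
    the keyword. Natural log is an assumption of this sketch."""
    n = len(corpus)
    m = sum(
        1
        for descriptions in corpus.values()
        if any(keyword in tokens for tokens in descriptions)
    )
    if m == 0:
        raise ValueError("keyword does not occur in any preset category")
    return math.log(n / m)


def tf_idf(keyword, category, corpus):
    """TF-IDF of `keyword` under `category`: product of TF and IDF."""
    return tf(keyword, corpus[category]) * idf(keyword, corpus)


# Keyword "A" appears under 3 of 5 preset categories, as in the example.
corpus = {
    "category 1": [["A", "x"]],
    "category 2": [["A"]],
    "category 3": [["A", "A"]],
    "category 4": [["y"]],
    "category 5": [["z"]],
}
idf_a = idf("A", corpus)  # log(5/3)
```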
For example, in the sample data collected on a takeout ordering platform, the TF-IDF values of the keywords under each preset category are as follows:
["scrambled eggs", "braised in soy sauce", "pork braised in brown sauce", "daily life of a family", "cold and dressed with sauce", "braised aubergines", "Kung Pao Chicken", "fourth", "sweet and sour", "fish-flavoured shredded pork", "agaric", "potato", "shredded pork and eggs with dried mushroom", "bean curd", "long bean", "tomato", "tenderloin", "spelling", "dish", "small"] home cooking [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.05, 0.06]
["beer", "rice wine", "Beijing", "Yanjing Brewery", "snowflake", "wheat", "Harbin", "listening", "ends of the earth", "Magma", "Qingdao", "eggnog", "bravely rushing", "Belgium", "salubrious", "king", "white", "sweet osmanthus", "small", "sweet wine"] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
["barbecue", "crackling", "roasting", "stir-fry", "crusty pancake", "sausage", "shish kebab", "Orleans", "Brazil", "muscle", "Cumin", "red building", "honeydew", "salad", "New Orleans", "chicken", "muscle string", "the meat clip Mo", "principal filter", "Turkey"] barbecue [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.08, 0.06, 0.05]
["Huang braised", "chicken", "rice", "special peppery", "skin of beancurd", "needle mushroom", "Wu's note", "abalone sauce", "", "chicken meal", "ten", "agaric", "chicken chicken", "small point of chicken", "potato block", "not generation", "earth pot", "dry pot", "big peppery", "gives beans skin"] Huang braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
["teppanyaki", "skewer", "palpus", "peppery degree", "big chicken cutlet", "selection", "squid", "old foster-mother", "fried shredded pancake", "Iron plate", "fish", "eggplant", "juice", "bean curd", "sesame seed cake", "egg"] teppanyaki [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
["ball", "beef dumplings", "a small ball", "burger", "Chinese cabbage", "stewed", "soup", "ball", "piss", "winter melon", "casserole", "burning", "octopus", "Deep-fried meatballs", "Chaozhou", "hand is beaten", "four happinesses", "element", "vermicelli", "meat ball"] meatballs [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
["sweets", "diplomat", "double poems", "Macaron", "gift box", "afternoon tea", "auspicious ±", "raspberry", "cloth bright", "Buddhist nun", "butter", "volume"] sweets [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
Here, in each entry, the first part lists the keywords extracted under a dish category, such as "scrambled eggs" and "pork braised in brown sauce"; the middle part is the name of the dish category, such as "home cooking"; and the last part lists the TF-IDF values corresponding to these keywords, such as "0.34, 0.33".
For example, for the dish "Kung Pao chicken rice bowl", the keywords and feature data shown in the following table can be extracted from the related text description:
Dish name: Kung Pao chicken rice bowl; restaurant: Jin Baiwan; cuisine: Beijing cuisine; main business: snacks
Keyword | Kung Pao | diced chicken | ... | snack | ...
TF-IDF value | 0.3 | 0.5 | ... | 1 | ...
In the table, the first row lists the keywords, and the second row lists the feature data corresponding to the dish "Kung Pao chicken rice bowl", that is, the TF-IDF value corresponding to each keyword.
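One way to turn such a table into model input is sketched below. It assumes a fixed ordering of the keyword set and fills the positions of keywords that do not occur in the product's text description with 0.0; both choices, the function name and the extra keyword "beer" in the example are assumptions of this sketch rather than details stated in the text.

```python
def build_feature_vector(tokens, keyword_order, importance):
    """Return one value per keyword in `keyword_order`.

    tokens: word segments of one product's text description
    keyword_order: list fixing the position of every keyword in the set
    importance: dict mapping keyword -> TF-IDF value
    Keywords absent from `tokens` get 0.0 (an assumption of this sketch).
    """
    present = set(tokens)
    return [importance[k] if k in present else 0.0 for k in keyword_order]


# Values taken from the "Kung Pao chicken rice bowl" table above.
keyword_order = ["Kung Pao", "diced chicken", "beer", "snack"]
importance = {"Kung Pao": 0.3, "diced chicken": 0.5, "beer": 1.68, "snack": 1.0}
tokens = ["Kung Pao", "diced chicken", "rice bowl", "snack"]
vector = build_feature_vector(tokens, keyword_order, importance)
```

A fixed keyword order keeps every product's vector the same length, which the tree-based and neural models mentioned elsewhere in the description generally require.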
In an optional implementation of the present embodiment, as shown in Fig. 4, the step of determining the TF-IDF value of the keyword as the degree of importance of the keyword further includes the following steps:
In step S401, the TF-IDF value of the keyword under the category of the sample product is determined;
In step S402, when the keyword corresponds to multiple TF-IDF values under different categories, the smallest TF-IDF value is selected as the degree of importance of the keyword.
In this optional implementation, if the same keyword appears under multiple preset categories, a TF-IDF value of the keyword can be calculated for each of these preset categories; for the sake of uniformity, the smallest of these TF-IDF values can be chosen as the degree of importance of the keyword.
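Steps S401 and S402 amount to taking a minimum over categories, as in the following sketch. The input layout (a dictionary of per-category keyword scores) and the function name are assumptions; the numbers are taken from the example lists above, where "bean curd", "chicken" and "small" each appear under two categories.

```python
def unify_importance(per_category):
    """per_category: dict mapping category -> {keyword: TF-IDF value}.

    Returns {keyword: smallest TF-IDF value over all categories in which
    the keyword appears}.
    """
    unified = {}
    for scores in per_category.values():
        for keyword, value in scores.items():
            if keyword not in unified or value < unified[keyword]:
                unified[keyword] = value
    return unified


per_category = {
    "home cooking": {"bean curd": 0.07, "small": 0.06},
    "teppanyaki": {"bean curd": 0.25},
    "wine": {"small": 0.11},
    "barbecue": {"chicken": 0.08},
    "Huang braised chicken rice": {"chicken": 1.26},
}
unified = unify_importance(per_category)
```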
In an optional implementation of the present embodiment, step S103, that is, the step of determining the degree of importance of the keyword, further includes the following step:
when the correlations between all word segments corresponding to the sample product and the category of the sample product are below the preset threshold, using a default value as the degree of importance of the keyword corresponding to the sample product.
In this optional implementation, when the keywords in a text description are extracted, word segments that have a low correlation with the category of the sample product are rejected. If, in the text description of a sample product, the correlations between all word segments and the category of the sample product are below the preset threshold, the degree of importance of the keyword in the feature data of that sample product can be set to a default value. The default value can be chosen according to the actual situation and is not limited here.
Fig. 5 shows a flowchart of a product identification method according to an embodiment of the present disclosure. As shown in Fig. 5, the product identification method includes the following steps:
In step S501, the text description of the product to be identified is obtained;
In step S502, the keywords of the text description are extracted;
In step S503, the degree of importance of the keywords is determined;
In step S504, the degree of importance of the keywords is input into a pre-trained product identification model to identify the product to be identified, where the product identification model is trained using the data processing method described above.
In the present embodiment, the product to be identified may be a product currently involved on an online platform, such as a dish on a takeout ordering platform, or clothes, daily necessities or household items on an e-commerce platform. The text description of the product to be identified includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, effect, size and quantity. For example, the text description of a dish on a takeout ordering platform may include the dish name, the ingredients of the dish, the cooking method and so on.
In some embodiments, the text description of the product to be identified may also include a text description of the operator to which the product belongs. In general, the categories of the products managed by one operator are relatively similar, and some operators manage products of only one category. Therefore, when the product identification model is trained, the operator's data is also used as input data, so that the product identification model learns from the operator's data the features that can influence the product category, which further improves the identification accuracy of the product identification model.
The data of the operator to which the product to be identified belongs may include, but is not limited to, the operator's name, its main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
For the product to be identified, one or more keywords can be extracted from its text description, and the degree of importance of the one or more keywords is then determined. The degree of importance indicates how large a role a keyword plays in product identification: if the keyword can play an important role in product identification, its degree of importance is higher; if the keyword cannot play an important role in product identification, its degree of importance is lower.
The product identification model is obtained by training with the data processing method described above, so for specific details of the product identification model, reference may be made to the above description of the data processing method, which is not repeated here.
In an optional implementation of the present embodiment, as shown in Fig. 6, step S502, that is, the step of extracting the keywords in the text description, further includes the following steps:
In step S601, the text description is segmented into words;
In step S602, the word segments are matched against the keyword set to determine whether each word segment is a keyword.
In this optional implementation, as described for the data processing method above, during training the keywords in the text descriptions of all collected sample products can be extracted to form the keyword set corresponding to the sample products, and the degree of importance of these keywords is determined in a subsequent step. Therefore, after training of the product identification model is complete, the keyword set can be retained. After the word segments in the text description of the product to be identified are obtained, they are matched against the keyword set; a word segment that matches successfully can be determined as a keyword of the product to be identified, and the degree of importance of these keywords can then be determined directly.
For the determination of the keywords and of their degree of importance, reference may be made to the above description of the data processing method, which is not repeated here.
It should be noted that when the text description of the product to be identified contains no keyword matching the above keyword set, the degree of importance of the keyword corresponding to the product to be identified can be set to a default value.
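Steps S601 and S602, together with the default-value fallback, can be sketched as a dictionary lookup. The sketch assumes the text description has already been segmented (the segmentation tool is not named in the text), and the default value of 0.0 and the placeholder key "<no-keyword>" are hypothetical choices made only for this example.

```python
DEFAULT_IMPORTANCE = 0.0  # hypothetical default; the text leaves it open


def match_keywords(tokens, keyword_importance, default=DEFAULT_IMPORTANCE):
    """Match word segments against the retained keyword set.

    tokens: word segments of the product to be identified
    keyword_importance: dict mapping keyword -> degree of importance,
        retained from training
    Returns {keyword: importance} for the matched segments. If nothing
    matches, returns a single placeholder entry carrying the default value.
    """
    matched = {t: keyword_importance[t] for t in tokens if t in keyword_importance}
    if not matched:
        return {"<no-keyword>": default}
    return matched


keyword_importance = {"Kung Pao": 0.3, "diced chicken": 0.5, "snack": 1.0}
hit = match_keywords(["Kung Pao", "diced chicken", "rice bowl"], keyword_importance)
miss = match_keywords(["mineral water"], keyword_importance)
```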
The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
Fig. 7 shows a structural block diagram of a data processing apparatus according to an embodiment of the present disclosure. The apparatus can be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Fig. 7, the data processing apparatus includes:
a first obtaining module 701, configured to obtain sample data, where the sample data includes the text description of a sample product and the category of the sample product;
a first extraction module 702, configured to extract the keywords in the text description;
a first determining module 703, configured to determine the degree of importance of the keywords;
a training module 704, configured to train a product identification model using the feature data of the sample product and the category of the sample product, where the feature data includes the degree of importance of the keywords corresponding to the sample product.
In the present embodiment, the sample product may be a product currently involved on an online platform, such as a dish on a takeout ordering platform, or clothes, daily necessities or household items on an e-commerce platform. The text description of the sample product includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, effect, size and quantity. For example, the text description of a dish on a takeout ordering platform may include the dish name, the ingredients of the dish, the cooking method and so on.
In some embodiments, the text description of the sample product may also include a text description of the operator to which the sample product belongs. In general, the categories of the products managed by one operator are relatively similar, and some operators manage products of only one category. Therefore, when the product identification model is trained, the operator's data is also used as input data, so that the product identification model learns from the operator's data the features that can influence the product category, which further improves the identification accuracy of the product identification model.
The data of the operator to which the sample product belongs may include, but is not limited to, the operator's name, its main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
The category of a sample product can be determined from the existing data of the online platform, or it can be labeled manually. For example, sample data can be collected from the existing products of the online platform; since the online platform usually has its own classification of products, the sample data required for this training can be obtained by collecting the text descriptions of the products under each category.
For each piece of sample data obtained, one or more keywords can be extracted from the text description of the sample product, and the degree of importance of the one or more keywords is then determined. The degree of importance indicates how large a role a keyword plays in product identification: if the keyword can play an important role in product identification, its degree of importance is higher; if the keyword cannot play an important role in product identification, its degree of importance is lower.
The degree of importance of a keyword can be determined in advance by counting the number of times the keyword occurs in the text descriptions of all sample products under the same category. For example, if a keyword occurs many times in the text descriptions of all sample products under the same category, its degree of importance can be considered higher; if it occurs fewer times, its degree of importance can be considered lower.
The product identification model may be an XGBoost model, a GBDT model, a neural network model, or the like. A sample product can correspond to multiple keywords, and each keyword corresponds to a degree of importance. When the product identification model is trained, the degrees of importance can be converted into vector form, and the vectors corresponding to the multiple keywords are combined to form the input data of the model. In each training iteration, the feature data in one piece of sample data is used as the input of the product identification model; after the output of the product identification model is obtained, the output is compared with the category of the sample product in that sample data, and the model parameters of the product identification model are then updated so that the output of the model moves closer to the category of the sample product. Through training on a large amount of sample data, the model parameters are continually updated, and after training the product identification model can give a more accurate output for its input data.
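The iterate, compare and update loop described above can be illustrated with a deliberately small stand-in model. The sketch below is not XGBoost or GBDT; it swaps in a single-layer softmax classifier without bias terms, trained by stochastic gradient descent, purely to show how importance vectors and category labels drive parameter updates. The learning rate, epoch count, function names and toy feature vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    """Linear score of feature vector x for every category."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(features, labels, num_classes, epochs=50, lr=0.5):
    """features: list of importance vectors; labels: category indices."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, x))
            for c in range(num_classes):
                # Compare the output with the true category and step the
                # parameters so the output moves toward that category.
                error = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * error * x[i]
    return weights


def predict(weights, x):
    """Index of the category with the highest score."""
    scores = class_scores(weights, x)
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy vectors: position 0 is a home-cooking keyword, position 1 a wine keyword.
features = [[0.34, 0.0], [0.28, 0.0], [0.0, 1.68], [0.0, 0.42]]
labels = [0, 0, 1, 1]  # 0 = home cooking, 1 = wine
weights = train(features, labels, num_classes=2)
```

In practice a gradient-boosted tree library or a deeper network would take the place of `train`, with the same vectors and labels as input.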
In the data processing apparatus of the embodiments of the present disclosure, the text description of a sample product and the category of the product are obtained, the keywords of the text description are extracted, the degree of importance of the extracted keywords is determined, and the product identification model is then trained on feature data that includes the degree of importance of the keywords, together with the product category. A product identification model trained in this way can learn, from the text description of a product, how strongly the keywords in the description influence product identification under the product's category. This improves the accuracy of product identification, so that different products with similar text descriptions can be distinguished by the product identification model.
In an optional implementation of the present embodiment, as shown in Fig. 8, the first obtaining module 701 includes:
a first obtaining submodule 801, configured to obtain the text descriptions of multiple sample products under preset categories;
a deduplication submodule 802, configured to deduplicate the text descriptions of the multiple sample products.
In this optional implementation, when sample data is collected, the multiple preset categories for which the online platform already has classification data can be used, and the text descriptions of multiple sample products are obtained under each preset category. The text description includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, effect, size and quantity. To avoid collecting duplicate sample products, the text descriptions can be deduplicated.
In an optional implementation of the present embodiment, the deduplication submodule 802 includes:
a mapping submodule, configured to map multiple different text descriptions corresponding to the same sample product uniformly to the same text description.
In this optional implementation, there may be multiple sample products under the same preset category whose text descriptions differ although they are the same product. For example, on a takeout ordering platform, the menus uploaded by some merchants contain "Tomato omelette" while the menus uploaded by other merchants contain "tomato scrambled eggs"; the two are essentially the same product under different names, so both can be mapped uniformly to the same product name. It will be understood that other content in the text descriptions can also be mapped uniformly.
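The unified mapping and deduplication can be sketched with an alias table. How the alias table is built is not described in the text, so the `canonical_names` dictionary below is a hypothetical, hand-written input, and the function name is illustrative.

```python
def deduplicate(names, canonical_names):
    """Map every product name to its canonical form and drop repeats.

    names: product names collected under one preset category
    canonical_names: dict mapping an alias -> canonical name (hypothetical,
        assumed to be prepared in advance)
    The first occurrence of each canonical name is kept, in input order.
    """
    seen = set()
    unique = []
    for name in names:
        canonical = canonical_names.get(name, name)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


canonical_names = {"Tomato omelette": "tomato scrambled eggs"}
collected = ["Tomato omelette", "tomato scrambled eggs", "Kung Pao Chicken"]
unique_names = deduplicate(collected, canonical_names)
```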
In an optional implementation of the present embodiment, as shown in Fig. 9, the first extraction module 702 includes:
a first word segmentation submodule 901, configured to segment the text description into words;
a first determining submodule 902, configured to determine the word segments whose correlation with the category of the sample product is higher than the preset threshold as the keywords.
In this optional implementation, when the keywords in a text description are extracted, the text description is first segmented into words, and the keywords can then be determined according to the correlation between the word segments and the category of the sample product. For example, on a takeout ordering platform, "stir-fry", one of the word segments of "tomato stir-fried egg", is not important for identifying the category of the dish; that is, the word "stir-fry" has a low correlation with the identification of the dish, so it can be rejected and not used as a keyword. The preset threshold may be set according to actual conditions and is not limited here. Extracting keywords from the word segmentation result of the text description allows word segments with low correlation to be rejected, which avoids the problem of low training efficiency caused by an excessively large feature dimension when the product identification model is subsequently trained.
In an optional implementation of the present embodiment, the first determining submodule 902 includes:
a second determining submodule, configured to determine the correlation between the word segments and the category of the sample product using the chi-square independence test.
In this optional implementation, the chi-square independence test can determine the association and dependence between two categorical variables. Therefore, in the embodiments of the present disclosure, after the text descriptions of sample products under different preset categories have been collected and segmented into words, the chi-square independence test can be used, for each preset category, to determine the correlation between that preset category and the word segments obtained from the text descriptions of the collected sample products, and the word segments whose correlation is higher than the preset threshold are determined as keywords.
In the embodiments of the present disclosure, after a large amount of sample data has been collected, the chi-square independence test is applied to the text descriptions of the sample products to extract the keywords under the different preset categories and form a keyword set. The chi-square independence test is prior art and is not described in detail here.
In an optional implementation of the present embodiment, the first determining module 703 includes:
a third determining submodule, configured to determine the TF-IDF value of the keyword as the degree of importance of the keyword.
In this optional implementation, TF-IDF (term frequency-inverse document frequency) is a common weighting technique used in information retrieval and data mining; TF stands for term frequency and IDF stands for inverse document frequency. The TF-IDF value of a keyword can be interpreted as follows: if the current keyword occurs with a high frequency (TF) in the text descriptions of all sample products under the current preset category but seldom occurs in the text descriptions of the sample products under other preset categories, the keyword can be considered to have good category-discriminating ability and to be suitable for classification. The keyword can therefore be considered more important for the current preset category, and the TF-IDF value can be used to measure the importance of the keyword.
The TF-IDF value of a keyword can be obtained in advance by collecting the text descriptions of all sample products under the existing preset categories of the online platform, extracting keywords from these text descriptions, and determining the TF-IDF value of each keyword in these text descriptions. As described above, in the keyword set formed from the sample data, each keyword can also be associated with its TF-IDF value. In this way, during online identification, the keywords of the product to be identified and their TF-IDF values can be obtained directly from the keyword set and the corresponding TF-IDF values.
In the present embodiment, the TF value of a keyword can be obtained by dividing the number of times the keyword appears in the text descriptions of all sample products under a preset category by the number of text descriptions (that is, the number of sample products under that preset category). The IDF of the keyword can be determined from the number of preset categories to which the sample products whose text descriptions contain the keyword belong, together with the total number of preset categories. The calculation formula is IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if keyword A appears in the text descriptions of sample products under preset category 1, preset category 2 and preset category 3, and there are 5 preset categories in total, then the keyword appears under three preset categories, so its IDF = log(5/3).
The TF-IDF value of a keyword is the product of the TF value and the IDF value of the keyword.
For example, in the sample data collected on a takeout ordering platform, the TF-IDF values of the keywords under each preset category are as follows:
["scrambled eggs", "braised in soy sauce", "pork braised in brown sauce", "daily life of a family", "cold and dressed with sauce", "braised aubergines", "Kung Pao Chicken", "fourth", "sweet and sour", "fish-flavoured shredded pork", "agaric", "potato", "shredded pork and eggs with dried mushroom", "bean curd", "long bean", "tomato", "tenderloin", "spelling", "dish", "small"] home cooking [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.05, 0.06]
["beer", "rice wine", "Beijing", "Yanjing Brewery", "snowflake", "wheat", "Harbin", "listening", "ends of the earth", "Magma", "Qingdao", "eggnog", "bravely rushing", "Belgium", "salubrious", "king", "white", "sweet osmanthus", "small", "sweet wine"] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
["barbecue", "crackling", "roasting", "stir-fry", "crusty pancake", "sausage", "shish kebab", "Orleans", "Brazil", "muscle", "Cumin", "red building", "honeydew", "salad", "New Orleans", "chicken", "muscle string", "the meat clip Mo", "principal filter", "Turkey"] barbecue [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.08, 0.06, 0.05]
["Huang braised", "chicken", "rice", "special peppery", "skin of beancurd", "needle mushroom", "Wu's note", "abalone sauce", "", "chicken meal", "ten", "agaric", "chicken chicken", "small point of chicken", "potato block", "not generation", "earth pot", "dry pot", "big peppery", "gives beans skin"] Huang braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
["teppanyaki", "skewer", "palpus", "peppery degree", "big chicken cutlet", "selection", "squid", "old foster-mother", "fried shredded pancake", "Iron plate", "fish", "eggplant", "juice", "bean curd", "sesame seed cake", "egg"] teppanyaki [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
["ball", "beef dumplings", "a small ball", "burger", "Chinese cabbage", "stewed", "soup", "ball", "piss", "winter melon", "casserole", "burning", "octopus", "Deep-fried meatballs", "Chaozhou", "hand is beaten", "four happinesses", "element", "vermicelli", "meat ball"] meatballs [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
["sweets", "diplomat", "double poems", "Macaron", "gift box", "afternoon tea", "auspicious ±", "raspberry", "cloth bright", "Buddhist nun", "butter", "volume"] sweets [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
Here, in each entry, the first part lists the keywords extracted under a dish category, such as "scrambled eggs" and "pork braised in brown sauce"; the middle part is the name of the dish category, such as "home cooking"; and the last part lists the TF-IDF values corresponding to these keywords, such as "0.34, 0.33".
For example, for the dish "Kung Pao chicken rice bowl", the keywords and feature data shown in the following table can be extracted from the related text description:
Dish name: Kung Pao chicken rice bowl; restaurant: Jin Baiwan; cuisine: Beijing cuisine; main business: snacks
Keyword | Kung Pao | diced chicken | ... | snack | ...
TF-IDF value | 0.3 | 0.5 | ... | 1 | ...
In the table, the first row lists the keywords, and the second row lists the feature data corresponding to the dish "Kung Pao chicken rice bowl", that is, the TF-IDF value corresponding to each keyword.
In an optional implementation of the present embodiment, as shown in Fig. 10, the third determining submodule includes:
a fourth determining submodule 1001, configured to determine the TF-IDF value of the keyword under the category of the sample product;
a selection submodule 1002, configured to select, when the keyword corresponds to multiple TF-IDF values under different categories, the smallest TF-IDF value as the degree of importance of the keyword.
In this optional implementation, if the same keyword appears under multiple preset categories, a TF-IDF value of the keyword can be calculated for each of these preset categories; for the sake of uniformity, the smallest of these TF-IDF values can be chosen as the degree of importance of the keyword.
In an optional implementation of this embodiment, the first determining module 703 comprises:
a fifth determining submodule, configured to use a default value as the significance level of the keyword corresponding to the sample product when the correlation between every word segment of the sample product and the category to which the sample product belongs is below a preset threshold.
In this optional implementation, when keywords are extracted from the text description, word segments that have a low correlation with the category of the sample product can be eliminated. If, in the text description of a sample product, the correlation between every word segment and the category of the sample product is below the preset threshold, the significance level of the keyword in the characteristic data of that sample product can be set to a default value. The default value can be chosen according to the actual situation and is not limited here.
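The fallback described above can be sketched as follows. The function name, the dictionary inputs, and the concrete default of 0.0 are assumptions for illustration; the text deliberately leaves the default value open.

```python
DEFAULT_SIGNIFICANCE = 0.0  # hypothetical; the text does not fix this value


def significance_for_sample(relevance, significance, threshold,
                            default=DEFAULT_SIGNIFICANCE):
    """Return (keywords, values) for one sample product.

    relevance:    word segment -> correlation with the product's category
    significance: word segment -> significance level (e.g. a TD-IDF value)
    """
    # Keep only segments whose correlation is higher than the threshold.
    keywords = [seg for seg, rel in relevance.items() if rel > threshold]
    if not keywords:
        # Every segment was filtered out: fall back to the default value.
        return [], [default]
    return keywords, [significance[k] for k in keywords]


kept = significance_for_sample(
    {"gong bao": 0.9, "with": 0.1}, {"gong bao": 0.3, "with": 0.05}, 0.5
)
empty = significance_for_sample(
    {"with": 0.1, "on": 0.2}, {"with": 0.05, "on": 0.02}, 0.5
)
```

In the second call no segment passes the threshold, so the sample still receives a feature value instead of being dropped.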
Figure 11 shows a structural block diagram of a product identification device according to an embodiment of the present disclosure. The device can be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Figure 11, the product identification device comprises:
a second obtaining module 1101, configured to obtain the text description of a product to be identified;
a second extraction module 1102, configured to extract keywords from the text description;
a second determining module 1103, configured to determine the significance level of the keywords;
an identification module 1104, configured to input the significance level of the keywords into a pre-trained product identification model so as to identify the product to be identified, wherein the product identification model is trained using the data processing device described above.
In this embodiment, the product to be identified can be a product on an online platform, such as a dish on a food-ordering and delivery platform, or clothing, daily necessities, or household items on an e-commerce platform. The text description of the product to be identified includes, but is not limited to, a textual description of attributes such as product name, product material, manufacturing process, effect, size, and quantity. For example, the text description of a dish on a food-ordering and delivery platform may include the dish name, the ingredients of the dish, the cooking method, and so on.
In some embodiments, the text description of the product to be identified can also include a text description of the operator to which the product belongs. Under normal conditions, the products managed by one operator belong to similar categories, and some operators even manage products of only one category. Therefore, when the product identification model is trained, the operator's data is also used as input data, so that the model learns from the operator data features that can influence the product category, which further improves the recognition accuracy of the product identification model.
The data of the operator to which the product to be identified belongs can include, but is not limited to, the name of the operator, its main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
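One way to combine product and operator data into a single text description is sketched below. The class names, field names, and the "label: value" layout are assumptions for illustration, loosely modeled on the dish example given earlier; the dish name is shortened.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str            # operator name, e.g. the restaurant
    main_business: str   # main business scope
    broad_category: str  # broad category of its products, e.g. the cuisine


@dataclass
class Product:
    name: str
    operator: Operator


def build_text_description(product: Product) -> str:
    """Join product and operator fields into one text description."""
    op = product.operator
    fields = [
        ("dish name", product.name),
        ("restaurant", op.name),
        ("cuisine", op.broad_category),
        ("main business", op.main_business),
    ]
    return ", ".join(f"{label}: {value}" for label, value in fields)


description = build_text_description(
    Product("Kung Pao chicken rice",
            Operator("Jin Baiwan", "snack", "Beijing cuisine"))
)
```

Keeping the operator fields in the same string means the later word segmentation and keyword extraction steps can treat them like any other part of the description.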
For the product to be identified, one or more keywords can be extracted from its text description, and the significance level of each keyword is then determined. The significance level indicates how large a role the keyword plays in product identification: if the keyword plays an important role in product identification, its significance level is high; if it cannot play an important role, its significance level is low.
The product identification model is obtained through training by the data processing device described above, so for specific details of the product identification model, refer to the above description of the data processing device; they are not repeated here.
In an optional implementation of this embodiment, as shown in Figure 12, the second extraction module 1102 comprises:
a second word segmentation submodule 1201, configured to segment the text description into word segments;
a matching submodule 1202, configured to match the word segments against a keyword set to determine whether each word segment is a keyword.
In this optional implementation, as described for the data processing device above, during training the keywords in the corresponding text descriptions can be extracted for all collected sample products to form a keyword set for the sample products, and the significance levels of these keywords are determined in subsequent steps. Therefore, after the product identification model has been trained, the keyword set can be retained. Once the word segments in the text description of a product to be identified have been obtained, they are matched against the keyword set. A word segment that matches is determined to be a keyword of the product to be identified, and the significance level of that keyword can then be determined directly.
For the determination of keywords and of their significance levels, refer to the above description of the data processing device; it is not repeated here.
It should be noted that when the text description of the product to be identified contains no keyword that matches the keyword set, the significance level of the keyword corresponding to the product to be identified can be set to a default value.
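The matching step and its fallback can be sketched as follows. This is an illustration under stated assumptions: whitespace splitting stands in for a real word segmenter, the fixed-order vector with 0.0 for unmatched keywords is a hypothetical feature layout, and `identify_features` is not a name from the source.

```python
def identify_features(text, keyword_significance, default=0.0,
                      tokenize=str.split):
    """Turn a text description into features over the retained keyword set.

    keyword_significance: keyword -> significance level kept from training.
    tokenize: stand-in for a real word segmenter (whitespace split here).
    """
    vocabulary = sorted(keyword_significance)  # fixed column order
    matched = set(tokenize(text)) & set(vocabulary)
    if not matched:
        # No segment matches the keyword set: use the default value.
        return [default] * len(vocabulary)
    # Matched keywords take their stored significance; others get 0.0
    # (an assumption of this sketch).
    return [keyword_significance[w] if w in matched else 0.0
            for w in vocabulary]


retained = {"chicken": 0.5, "kungpao": 0.3, "snack": 1.0}
features = identify_features("kungpao chicken rice", retained)
fallback = identify_features("plain congee", retained, default=-1.0)
```

The resulting list would then be passed to the trained product identification model; that call is omitted because the source does not specify the model's interface.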
An embodiment of the present disclosure further provides an electronic device which, as shown in Figure 13, includes a processor 1301 and a memory 1302 communicatively connected to the processor 1301. The memory 1302 stores instructions executable by the processor 1301, and the instructions are executed by the processor 1301 to implement:
obtaining sample data, wherein the sample data includes the text description of a sample product and the category to which the sample product belongs;
extracting keywords from the text description;
determining the significance level of the keywords;
training a product identification model using the characteristic data of the sample product and the category to which the sample product belongs, wherein the characteristic data includes the significance level of the keywords corresponding to the sample product.
Wherein obtaining the sample data comprises:
obtaining the text descriptions of multiple sample products under preset categories;
performing deduplication on the text descriptions of the multiple sample products.
Wherein performing deduplication on the text descriptions of the multiple sample products comprises:
uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
Wherein extracting the keywords from the text description comprises:
segmenting the text description into word segments;
determining the word segments whose correlation with the category to which the sample product belongs is higher than a preset threshold as the keywords.
Wherein determining the word segments whose correlation with the category to which the sample product belongs is higher than the preset threshold as the keywords comprises:
determining the correlation between the word segments and the category to which the sample product belongs using a chi-square independence test.
Wherein determining the significance level of the keywords comprises:
determining the TD-IDF value of a keyword as the significance level of the keyword.
Wherein determining the TD-IDF value of the keyword as the significance level of the keyword comprises:
determining the TD-IDF value of the keyword under the category to which the sample product belongs;
when the keyword has multiple TD-IDF values under different categories, selecting the smallest TD-IDF value as the significance level of the keyword.
Wherein determining the significance level of the keywords comprises:
when the correlation between every word segment of the sample product and the category to which the sample product belongs is below the preset threshold, using a default value as the significance level of the keyword corresponding to the sample product.
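The chi-square independence test named in the steps above can be sketched for a single word segment and category as follows. This uses the standard statistic for a 2x2 contingency table; the input layout (a list of segment sets with category labels) and the function name are assumptions, and the source does not state which exact variant of the test it uses.

```python
def chi_square(samples, term, category):
    """Chi-square statistic between a word segment and a category.

    samples: iterable of (set_of_segments, category_label) pairs.
    """
    a = b = c = d = 0
    for segments, label in samples:
        has_term = term in segments
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, in category
        elif has_term:
            b += 1  # term present, other category
        elif in_category:
            c += 1  # term absent, in category
        else:
            d += 1  # term absent, other category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: treat as no evidence of dependence
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical, already segmented sample data.
samples = [
    ({"diced", "chicken"}, "rice dishes"),
    ({"diced", "rice"}, "rice dishes"),
    ({"noodle"}, "noodles"),
    ({"beef", "noodle"}, "noodles"),
]
score = chi_square(samples, "noodle", "noodles")
```

A segment would be kept as a keyword when its score exceeds the preset threshold. Note that the statistic measures dependence in either direction, so a segment that never occurs in a category can score as high as one that always does.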
This embodiment further provides an electronic device including a memory and a processor, wherein
the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining the text description of a product to be identified;
extracting keywords from the text description;
determining the significance level of the keywords;
inputting the significance level of the keywords into a pre-trained product identification model so as to identify the product to be identified, wherein the product identification model is trained using the electronic device shown in Figure 13.
Wherein extracting the keywords from the text description comprises:
segmenting the text description into word segments;
matching the word segments against a keyword set to determine whether each word segment is a keyword.
Specifically, the processor 1301 and the memory 1302 can be connected by a bus or in other ways; connection by a bus is taken as an example in Figure 13. The memory 1302, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 1302, the processor 1301 executes the various functional applications and data processing of the device, that is, implements the above method in the embodiments of the present disclosure.
The memory 1302 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by functions; the data storage area can store historical data of shipping network transport and the like. In addition, the memory 1302 may include high-speed random access memory and may also include non-volatile memory, such as a disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the electronic device optionally includes a communication component 1303, and the memory 1302 optionally includes memory located remotely from the processor 1301; such remote memory can be connected to an external device through the communication component 1303. Examples of the above network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 1302 and, when executed by the one or more processors 1301, perform the above method in the embodiments of the present disclosure.
The above product can perform the method provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present disclosure.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to the various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks can occur in an order different from that marked in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure can be implemented in software or in hardware. The described units or modules can also be provided in a processor, and under certain conditions the names of these units or modules do not constitute a limitation on the units or modules themselves.
In another aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium can be the one included in the device described in the above embodiments, or it can exist separately without being assembled into a device. The computer-readable storage medium stores one or more programs, and the programs are used by one or more processors to perform the methods described in the present disclosure.
The above description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Claims (10)
1. A data processing method, characterized by comprising:
obtaining sample data, wherein the sample data includes the text description of a sample product and the category to which the sample product belongs;
extracting keywords from the text description;
determining the significance level of the keywords;
training a product identification model using the characteristic data of the sample product and the category to which the sample product belongs, wherein the characteristic data includes the significance level of the keywords corresponding to the sample product.
2. The method according to claim 1, characterized in that obtaining the sample data comprises:
obtaining the text descriptions of multiple sample products under preset categories;
performing deduplication on the text descriptions of the multiple sample products.
3. The method according to claim 1, characterized in that performing deduplication on the text descriptions of the multiple sample products comprises:
uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
4. The method according to any one of claims 1-3, characterized in that extracting the keywords from the text description comprises:
segmenting the text description into word segments;
determining the word segments whose correlation with the category to which the sample product belongs is higher than a preset threshold as the keywords.
5. A product identification method, characterized by comprising:
obtaining the text description of a product to be identified;
extracting keywords from the text description;
determining the significance level of the keywords;
inputting the significance level of the keywords into a pre-trained product identification model so as to identify the product to be identified, wherein the product identification model is trained using the method according to any one of claims 1-4.
6. A data processing device, characterized by comprising:
a first obtaining module, configured to obtain sample data, wherein the sample data includes the text description of a sample product and the category to which the sample product belongs;
a first extraction module, configured to extract keywords from the text description;
a first determining module, configured to determine the significance level of the keywords;
a training module, configured to train a product identification model using the characteristic data of the sample product and the category to which the sample product belongs, wherein the characteristic data includes the significance level of the keywords corresponding to the sample product.
7. A product identification device, characterized by comprising:
a second obtaining module, configured to obtain the text description of a product to be identified;
a second extraction module, configured to extract keywords from the text description;
a second determining module, configured to determine the significance level of the keywords;
an identification module, configured to input the significance level of the keywords into a pre-trained product identification model so as to identify the product to be identified, wherein the product identification model is trained using the device according to claim 6.
8. An electronic device, characterized by comprising a memory and a processor, wherein
the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining sample data, wherein the sample data includes the text description of a sample product and the category to which the sample product belongs;
extracting keywords from the text description;
determining the significance level of the keywords;
training a product identification model using the characteristic data of the sample product and the category to which the sample product belongs, wherein the characteristic data includes the significance level of the keywords corresponding to the sample product.
9. An electronic device, characterized by comprising a memory and a processor, wherein
the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining the text description of a product to be identified;
extracting keywords from the text description;
determining the significance level of the keywords;
inputting the significance level of the keywords into a pre-trained product identification model so as to identify the product to be identified, wherein the product identification model is trained using the electronic device according to claim 8.
10. A computer-readable storage medium storing computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910563737.7A CN110264318A (en) | 2019-06-26 | 2019-06-26 | Data processing method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110264318A true CN110264318A (en) | 2019-09-20 |
Family
ID=67921955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910563737.7A Pending CN110264318A (en) | 2019-06-26 | 2019-06-26 | Data processing method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110264318A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190920 |