CN110264318A - Data processing method, device, electronic equipment and storage medium - Google Patents

Data processing method, device, electronic equipment and storage medium

Info

Publication number
CN110264318A
Authority
CN
China
Prior art keywords
product
keyword
sample
text description
significance level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910563737.7A
Other languages
Chinese (zh)
Inventor
赵呈路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rajax Network Technology Co Ltd
Lazhasi Network Technology Shanghai Co Ltd
Original Assignee
Lazhasi Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazhasi Network Technology Shanghai Co Ltd
Priority to CN201910563737.7A
Publication of CN110264318A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants

Abstract

Embodiments of the present disclosure disclose a data processing method, device, electronic equipment and storage medium. The method comprises: obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs; extracting keywords from the text description; determining the importance of the keywords; and training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This can improve the accuracy of product category identification, and different products with similar text descriptions can be told apart by the product identification model.

Description

Data processing method, device, electronic equipment and storage medium
Technical field
The present disclosure relates to the field of computer technology, and in particular to a data processing method, device, electronic equipment and storage medium.
Background
With the development of Internet technology, more and more products appear on online operating platforms. To express the commonalities and differences between products, an online operating platform usually generates representation data for each product, which makes it easier to classify and identify products in scenarios such as retrieval. However, product categories are numerous: the same product may have many different text descriptions, such as product names, while different products may have identical or similar text descriptions. Product representation data is usually produced by manually screening keywords, and because different people abstract and summarize differently, the error is large. Annotating product representation data is therefore time-consuming and laborious, and its accuracy is low.
Summary of the invention
Embodiments of the present disclosure provide a data processing method, device, electronic equipment and storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method comprises:
obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
extracting keywords from the text description;
determining the importance of the keywords;
training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
With reference to the first aspect, in a first implementation of the first aspect, obtaining the sample data comprises:
obtaining text descriptions of multiple sample products under preset categories;
deduplicating the text descriptions of the multiple sample products.
With reference to the first aspect and/or its first implementation, in a second implementation of the first aspect, deduplicating the text descriptions of the multiple sample products comprises:
mapping the multiple different text descriptions of the same sample product to one identical text description.
With reference to the first aspect and/or either of its first and second implementations, in a third implementation of the first aspect, extracting the keywords from the text description comprises:
segmenting the text description into tokens;
determining, as the keywords, the tokens whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the first aspect and/or any of its first to third implementations, in a fourth implementation of the first aspect, determining, as the keywords, the tokens whose correlation with the category to which the sample product belongs is higher than the preset threshold comprises:
determining the correlation between each token and the category to which the sample product belongs using the chi-square test of independence.
With reference to the first aspect and/or any of its first to fourth implementations, in a fifth implementation of the first aspect, determining the importance of the keywords comprises:
determining the TF-IDF value of a keyword as the importance of that keyword.
With reference to the first aspect and/or any of its first to fifth implementations, in a sixth implementation of the first aspect, determining the TF-IDF value of the keyword as the importance of the keyword comprises:
determining the TF-IDF value of the keyword under the category to which the sample product belongs;
when the keyword has multiple TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance of the keyword.
With reference to the first aspect and/or any of its first to sixth implementations, in a seventh implementation of the first aspect, determining the importance of the keywords comprises:
when the correlations between all tokens of the sample product and the category to which the sample product belongs are below the preset threshold, using a default value as the importance of the keyword corresponding to the sample product.
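The two selection rules above (take the smallest TF-IDF value when a keyword appears under several categories, and fall back to a default value when a product yields no keyword) could be sketched as follows. This is only an illustration: the function names, the nested-dictionary layout and the default of 0.0 are assumptions, since the text does not specify them.

```python
DEFAULT_IMPORTANCE = 0.0  # assumed default; the source does not state a value


def select_importance(keyword, tfidf_by_category, default=DEFAULT_IMPORTANCE):
    """Return the smallest TF-IDF value the keyword has across categories."""
    values = [scores[keyword] for scores in tfidf_by_category.values()
              if keyword in scores]
    return min(values) if values else default


def product_importances(keywords, tfidf_by_category, default=DEFAULT_IMPORTANCE):
    """Importance values for one product; a single default if it has no keywords."""
    if not keywords:
        return [default]
    return [select_importance(k, tfidf_by_category, default) for k in keywords]


# Hypothetical TF-IDF table: {category: {keyword: tf-idf value}}
tfidf = {
    "noodles": {"beef": 0.2, "noodle": 0.9},
    "stir_fry": {"beef": 0.5},
}
print(select_importance("beef", tfidf))   # 0.2
print(product_importances([], tfidf))     # [0.0]
```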
In a second aspect, an embodiment of the present disclosure provides a product identification method.
Specifically, the product identification method comprises:
obtaining a text description of a product to be identified;
extracting keywords from the text description;
determining the importance of the keywords;
inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is trained using the method of the first aspect.
With reference to the second aspect, in a first implementation of the second aspect, extracting the keywords from the text description comprises:
segmenting the text description into tokens;
matching each token against a keyword set to determine whether the token is a keyword.
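As a rough illustration of the matching step at identification time, the sketch below filters the tokens of a description against a keyword set. The whitespace tokenizer is a stand-in chosen to keep the example self-contained; Chinese text would need a real word segmenter, and the function and variable names are hypothetical.

```python
def extract_keywords(description, keyword_set, tokenize=str.split):
    """Segment a description and keep only tokens found in the keyword set."""
    return [token for token in tokenize(description) if token in keyword_set]


# Hypothetical keyword set built during training
keyword_set = {"tomato", "egg", "noodle"}
print(extract_keywords("tomato fried egg", keyword_set))  # ['tomato', 'egg']
```

The resulting keywords can then be looked up in a stored table of importance values before being passed to the trained model.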
In a third aspect, an embodiment of the present disclosure provides a data processing device.
Specifically, the data processing device comprises:
a first obtaining module, configured to obtain sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
a first extraction module, configured to extract keywords from the text description;
a first determining module, configured to determine the importance of the keywords;
a training module, configured to train a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
In a fourth aspect, an embodiment of the present disclosure provides a product identification device.
Specifically, the product identification device comprises:
a second obtaining module, configured to obtain a text description of a product to be identified;
a second extraction module, configured to extract keywords from the text description;
a second determining module, configured to determine the importance of the keywords;
an identification module, configured to input the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is trained using the device of the third aspect.
The functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the data processing device and/or the product identification device includes a memory and a processor. The memory stores one or more computer instructions that support the data processing device and/or the product identification device in executing the above data processing method and/or product identification method, and the processor is configured to execute the computer instructions stored in the memory. The data processing device and/or the product identification device may further include a communication interface for communicating with other devices or a communication network.
In a fifth aspect, an embodiment of the present disclosure provides electronic equipment including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining sample data, wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
extracting keywords from the text description;
determining the importance of the keywords;
training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance of the keywords corresponding to the sample product.
With reference to the fifth aspect, in a first implementation of the fifth aspect, obtaining the sample data comprises:
obtaining text descriptions of multiple sample products under preset categories;
deduplicating the text descriptions of the multiple sample products.
With reference to the fifth aspect and/or its first implementation, in a second implementation of the fifth aspect, deduplicating the text descriptions of the multiple sample products comprises:
mapping the multiple different text descriptions of the same sample product to one identical text description.
With reference to the fifth aspect and/or either of its first and second implementations, in a third implementation of the fifth aspect, extracting the keywords from the text description comprises:
segmenting the text description into tokens;
determining, as the keywords, the tokens whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the fifth aspect and/or any of its first to third implementations, in a fourth implementation of the fifth aspect, determining, as the keywords, the tokens whose correlation with the category to which the sample product belongs is higher than the preset threshold comprises:
determining the correlation between each token and the category to which the sample product belongs using the chi-square test of independence.
With reference to the fifth aspect and/or any of its first to fourth implementations, in a fifth implementation of the fifth aspect, determining the importance of the keywords comprises:
determining the TF-IDF value of a keyword as the importance of that keyword.
With reference to the fifth aspect and/or any of its first to fifth implementations, in a sixth implementation of the fifth aspect, determining the TF-IDF value of the keyword as the importance of the keyword comprises:
determining the TF-IDF value of the keyword under the category to which the sample product belongs;
when the keyword has multiple TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance of the keyword.
With reference to the fifth aspect and/or any of its first to sixth implementations, in a seventh implementation of the fifth aspect, determining the importance of the keywords comprises:
when the correlations between all tokens of the sample product and the category to which the sample product belongs are below the preset threshold, using a default value as the importance of the keyword corresponding to the sample product.
In a sixth aspect, an embodiment of the present disclosure provides electronic equipment including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining a text description of a product to be identified;
extracting keywords from the text description;
determining the importance of the keywords;
inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is trained using the electronic equipment of the fifth aspect.
With reference to the sixth aspect, in a first implementation of the sixth aspect, extracting the keywords from the text description comprises:
segmenting the text description into tokens;
matching each token against a keyword set to determine whether the token is a keyword.
In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing the computer instructions used by the data processing device and/or the product identification device, including the computer instructions for executing any of the above methods.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
The data processing method of the embodiments of the present disclosure obtains the text description of a sample product and the category to which the product belongs, extracts keywords from the text description, determines the importance of the extracted keywords under the category to which the sample product belongs, and then trains a product identification model on feature data that includes the keyword importance together with the category to which the product belongs. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This can improve the accuracy of product category identification, and even different products with similar text descriptions can be told apart by the product identification model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1 shows a flow chart of a data processing method according to an embodiment of the present disclosure;
Fig. 2 shows a flow chart of step S101 of the embodiment shown in Fig. 1;
Fig. 3 shows a flow chart of step S102 of the embodiment shown in Fig. 1;
Fig. 4 shows a flow chart of the part that determines keyword importance in the embodiment shown in Fig. 1;
Fig. 5 shows a flow chart of a product identification method according to an embodiment of the present disclosure;
Fig. 6 shows a flow chart of step S502 of the embodiment shown in Fig. 5;
Fig. 7 shows a structural block diagram of a data processing device according to an embodiment of the present disclosure;
Fig. 8 shows a structural block diagram of the first obtaining module 701 of the embodiment shown in Fig. 7;
Fig. 9 shows a structural block diagram of the first extraction module 702 of the embodiment shown in Fig. 7;
Fig. 10 shows a structural block diagram of the part that determines keyword importance according to an embodiment of the present disclosure;
Fig. 11 shows a structural block diagram of a product identification device according to an embodiment of the present disclosure;
Fig. 12 shows a structural block diagram of the second extraction module 1102 of the embodiment shown in Fig. 11;
Fig. 13 is a schematic structural diagram of electronic equipment suitable for implementing the data processing method according to an embodiment of the present disclosure.
Detailed description
Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts or combinations thereof exist or are added.
It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features in those embodiments may be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing method comprises the following steps:
In step S101, sample data is obtained, wherein the sample data includes a text description of a sample product and the category of the sample product;
In step S102, keywords are extracted from the text description;
In step S103, the importance of the keywords under the category is determined;
In step S104, a product identification model is trained using feature data of the sample product and the category, wherein the feature data includes the importance, under the category, of the keywords corresponding to the sample product.
In this embodiment, a sample product may be any product currently offered on an online platform, such as a dish on a takeaway ordering platform, or clothes, daily necessities and household items on an e-commerce platform. The text description of a sample product may include, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size and quantity. For example, the text description of a dish on a takeaway ordering platform may include the dish name, its ingredients and its cooking method.
In some embodiments, the text description of a sample product may also include a text description of the operator to which the sample product belongs. The products managed by one operator usually fall into closely related categories, and some operators manage products of only one category. Therefore, when the product identification model is trained, the operator's data is also used as input, so that the model can learn from the operator data the features that influence the product category, which further improves the recognition accuracy of the product identification model.
The data of the operator to which a sample product belongs may include, but is not limited to, the operator's name, its main business scope, and the broad category to which its products belong (such as the cuisine in the catering industry).
The category of a sample product can be determined from the existing data of the online platform, or labeled manually. For example, sample data can be collected from the existing products of the online platform. Because an online platform usually has its own classification of products, the sample data required for training can be obtained by collecting the text descriptions of the products under each category.
For each piece of sample data obtained, one or more keywords can be extracted from the text description of the sample product, and the importance of each keyword is then determined. The importance indicates how large a role the keyword plays in product identification: a keyword that plays an important role in product identification has a higher importance, and a keyword that does not has a lower importance.
The importance of a keyword can be determined in advance by counting how often the keyword occurs in the text descriptions of all sample products under the same category. For example, if a keyword occurs many times in the text descriptions of all sample products under the same category, its importance can be considered high; if it occurs only a few times, its importance can be considered low.
The product identification model may be an xgboost model, a GBDT model, a neural network model, or the like. One sample product may correspond to multiple keywords, each with its own importance. When the product identification model is trained, each importance can be converted into vector form, and the vectors of the multiple keywords are combined to form the input data of the model. In each training iteration, the feature data of one piece of sample data is fed to the product identification model; the output of the model is compared with the category of the sample product in that sample data, and the model parameters are updated so that the output moves closer to the category to which the sample product belongs. Through training on a large amount of sample data, the model parameters are updated continually, and after training the product identification model can give a reasonably accurate output for its input data.
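One way to turn per-keyword importance values into model input is a fixed-length vector with one slot per keyword in the keyword set. The sketch below assumes that layout; the text does not prescribe one, and the names used here are hypothetical.

```python
def build_feature_vector(product_keywords, importance, vocabulary):
    """Place each keyword's importance at its fixed position; other slots stay 0."""
    index = {keyword: i for i, keyword in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for keyword in product_keywords:
        if keyword in index:
            vector[index[keyword]] = importance.get(keyword, 0.0)
    return vector


# Hypothetical keyword set (in a fixed order) and importance table
vocabulary = ["beef", "egg", "tomato"]
importance = {"tomato": 0.7, "egg": 0.4}
print(build_feature_vector(["tomato", "egg"], importance, vocabulary))
# [0.0, 0.4, 0.7]
```

Vectors of this shape, paired with category labels, are the kind of tabular input that gradient-boosted tree models or neural networks accept.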
The data processing method of the embodiments of the present disclosure obtains the text description of a sample product and the category to which the product belongs, extracts keywords from the text description, determines the importance of the extracted keywords, and then trains a product identification model on feature data that includes the keyword importance together with the category to which the product belongs. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences product identification under the product's category. This can improve the accuracy of product identification, and different products with similar text descriptions can be told apart by the product identification model.
In an optional implementation of this embodiment, as shown in Fig. 2, step S101, i.e. the step of obtaining sample data, further comprises the following steps:
In step S201, text descriptions of multiple sample products under preset categories are obtained;
In step S202, the text descriptions of the multiple sample products are deduplicated.
In this optional implementation, when sample data is collected, the preset categories can be taken from the existing classification data of the online platform, and the text descriptions of multiple sample products are obtained under each preset category. A text description may include, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size and quantity. To avoid collecting duplicate sample products, the text descriptions can be deduplicated.
In an optional implementation of this embodiment, step S202, i.e. the step of deduplicating the text descriptions of the multiple sample products, further comprises the following step:
The multiple different text descriptions of the same sample product are mapped to one identical text description.
In this optional implementation, the same preset category may contain multiple sample products whose text descriptions differ even though they are the same product. For example, on a takeaway ordering platform, the menu uploaded by one merchant may list "Tomato omelette" while the menu uploaded by another lists "tomato scrambled eggs". The two are essentially the same product under different names, so both can be mapped to the same product name. It will be understood that other content in the text description can be mapped in the same way.
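The unified mapping could be sketched with a simple alias table, as below. The table contents, the lower-casing step and the function names are illustrative assumptions; the text does not say how the mapping is built.

```python
# Hypothetical alias table: variant description -> canonical description
ALIASES = {"tomato omelette": "tomato scrambled eggs"}


def normalize(description):
    """Map a description to its canonical form (lower-cased, aliases resolved)."""
    key = description.strip().lower()
    return ALIASES.get(key, key)


def deduplicate(descriptions):
    """Keep the first occurrence of each canonical description, in order."""
    seen = set()
    unique = []
    for description in descriptions:
        canonical = normalize(description)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


print(deduplicate(["Tomato omelette", "tomato scrambled eggs", "Beef noodles"]))
# ['tomato scrambled eggs', 'beef noodles']
```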
In an optional implementation of this embodiment, as shown in Fig. 3, step S102, i.e. the step of extracting the keywords from the text description, further comprises the following steps:
In step S301, the text description is segmented into tokens;
In step S302, the tokens whose correlation with the category to which the sample product belongs is higher than a preset threshold are determined as the keywords.
In this optional implementation, when keywords are extracted from a text description, the description is first segmented, and the keywords are then chosen according to the correlation between each token and the category to which the sample product belongs. For example, on a takeaway ordering platform, the token "fried" in "tomato fried egg" is not important for identifying the category of the dish; that is, the correlation between the word "fried" and the identification of the dish is low, so it can be discarded rather than used as a keyword. The preset threshold can be set according to the actual situation and is not limited here. Extracting keywords from the segmentation result removes tokens with low correlation, which avoids an excessive feature dimension when the product identification model is later trained and the low training efficiency that would result.
In an optional implementation of the present embodiment, the step S302 that is, will be with the affiliated class of the sample product The step of participle that other correlation is higher than preset threshold is determined as the keyword, further includes steps of
The correlation of the participle and the sample product generic is determined using card side's independence test method.
In this optional implementation, the chi-square test of independence can determine the association and dependence between two categorical variables. Therefore, in the embodiments of the present disclosure, after the text descriptions of sample products under different preset categories have been collected and segmented, the chi-square test of independence may be used, for each preset category, to determine the correlation between that preset category and the word segments obtained from the text descriptions of the collected sample products, and the word segments whose correlation is higher than the preset threshold are determined as keywords.
In the embodiments of the present disclosure, after a large amount of sample data has been collected, the chi-square test of independence is applied to the text descriptions of the sample products to extract the keywords under the different preset categories and form a keyword set. The chi-square test of independence is prior art and is not described in detail here.
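The keyword selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the 2x2 contingency-table formulation, the requirement that a word segment be positively associated with the category, and the sample data are all assumptions made for the example.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 contingency table.

    n11: descriptions in the category that contain the word segment
    n10: descriptions outside the category that contain it
    n01: descriptions in the category that do not contain it
    n00: descriptions outside the category that do not contain it
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denominator == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


def select_keywords(samples, category, threshold):
    """samples: list of (set of word segments, category name) pairs."""
    vocabulary = set().union(*(tokens for tokens, _ in samples))
    keywords = set()
    for token in vocabulary:
        n11 = n10 = n01 = n00 = 0
        for tokens, label in samples:
            if token in tokens and label == category:
                n11 += 1
            elif token in tokens:
                n10 += 1
            elif label == category:
                n01 += 1
            else:
                n00 += 1
        # Assumption: keep only segments that occur more often inside the
        # category than outside it, since the statistic itself is symmetric.
        positively_associated = n11 * n00 > n10 * n01
        if positively_associated and chi_square(n11, n10, n01, n00) > threshold:
            keywords.add(token)
    return keywords


# Hypothetical, already segmented sample data.
samples = [
    ({"tomato", "stir-fry", "egg"}, "home cooking"),
    ({"egg", "stir-fry"}, "home cooking"),
    ({"beer", "stir-fry"}, "wine"),
    ({"beer"}, "wine"),
]
home_cooking_keywords = select_keywords(samples, "home cooking", threshold=3.0)
```

In this toy data "stir-fry" appears under both categories, so its statistic stays below the threshold and it is rejected, which mirrors the "stir-fry" example in the text.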
In an optional implementation of this embodiment, step S103, that is, the step of determining the importance of the keyword, further includes the following step:
The TF-IDF value of the keyword is determined as the importance of the keyword.
In this optional implementation, TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining, where TF means term frequency and IDF means inverse document frequency. The TF-IDF value of a keyword can be understood as follows: if the current keyword appears with a high frequency (TF) in the text descriptions of all sample products under the current preset category and rarely appears in the text descriptions of the sample products under other preset categories, the keyword can be considered to have good category-discriminating ability and to be suitable for classification. The keyword can therefore be considered important for the current preset category, and its TF-IDF value can be used to measure its importance.
The TF-IDF value of a keyword may be determined in advance by collecting the text descriptions of all sample products under the preset categories that already exist on the online platform, extracting keywords from those text descriptions, and determining the TF-IDF value of each keyword in those text descriptions. As described above, each keyword in the keyword set formed from the sample data may also be associated with its TF-IDF value, so that during online identification the keywords and TF-IDF values corresponding to the product to be identified can be obtained directly from the keyword set and the associated TF-IDF values.
In this embodiment, the TF value of a keyword may be obtained by dividing the number of times the keyword appears in the text descriptions of all sample products under a preset category by the number of those text descriptions (that is, the number of all sample products under the preset category). The IDF of the keyword may be determined from the number of preset categories to which the sample products whose text descriptions contain the keyword belong and the total number of preset categories, using the formula IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if keyword A appears in the text descriptions of sample products under preset category 1, preset category 2, and preset category 3, and there are 5 preset categories in total, then the keyword appears under three preset categories, so its IDF = log(5/3).
The TF-IDF value of a keyword is the product of its TF value and its IDF value.
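The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the data layout, and the sample data are hypothetical, and the natural logarithm is assumed because the text does not state the base of the logarithm.

```python
import math
from collections import Counter, defaultdict


def tf_idf_by_category(samples):
    """samples: list of (list of keywords, category name) pairs.

    Returns {category: {keyword: TF-IDF value}}.
    """
    occurrences = defaultdict(Counter)  # category -> keyword -> occurrence count
    description_count = Counter()       # category -> number of text descriptions
    for tokens, category in samples:
        description_count[category] += 1
        occurrences[category].update(tokens)

    n = len(description_count)          # total number of preset categories
    categories_containing = Counter()   # keyword -> m
    for counts in occurrences.values():
        categories_containing.update(counts.keys())

    result = {}
    for category, counts in occurrences.items():
        result[category] = {
            keyword: (count / description_count[category])
            * math.log(n / categories_containing[keyword])  # natural log assumed
            for keyword, count in counts.items()
        }
    return result


# Hypothetical sample data with two preset categories.
samples = [
    (["beer", "small"], "wine"),
    (["beer"], "wine"),
    (["small", "egg"], "home cooking"),
]
scores = tf_idf_by_category(samples)
```

Here "beer" occurs twice in the two "wine" descriptions (TF = 1) and in only one of the two categories (IDF = log 2), while "small" occurs in both categories, so its IDF and TF-IDF value are zero.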
For example, the TF-IDF values of the keywords under each preset category in the sample data collected from a takeout ordering platform are as follows:
["scrambled eggs", "braised in soy sauce", "pork braised in brown sauce", "daily life of a family", "cold and dressed with sauce", "braised aubergines", "Kung Pao Chicken", "fourth", "sugar vinegar", "fish-flavoured shredded pork", "agaric", "potato", "shredded pork and eggs with dried mushroom", "bean curd", "long bean", "tomato", "tenderloin", "spelling", "dish", "It is small"] home cooking [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.05, 0.06]
["beer", "rice wine", "Beijing", "Yanjing Brewery", "snowflake", "wheat", "Harbin", "listening", "ends of the earth", "magma", "Qingdao", "eggnog", "bravely rushing", "Belgium", "salubrious", "king", "white", "sweet osmanthus", "small", "sweet wine"] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
["barbecue", "crackling", "roasting", "stir-fry", "crusty pancake", "sausage", "shishkabab", "Orleans", "Brazil", "muscle", "cumin", "red building", "honeydew", "salad", "New Orleans", "chicken", "muscle string", "the meat clip Mo", "principal filter", "Turkey"] roasting meat [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.08, 0.06, 0.05]
["Huang is stewing", "chicken", "rice", "special peppery", "skin of beancurd", "needle mushroom", "Wu's note", "abalone sauce", " ", "chicken meal", "ten", "agaric", "chicken chicken", "small point of chicken", "potato block", "not generation", "earth pot", "dry pot", "big peppery", "gives beans skin"] Huang braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
["teppanyaki", "skewer", "palpus", "peppery degree", "big chicken cutlet", "selection", "squid", "old foster-mother", "fried shredded pancake", "iron plate", "fish", "eggplant", "juice", "bean curd", "sesame seed cake", "egg"] teppanyaki [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
["ball", "beef dumplings", "a small ball", "burger", "Chinese cabbage", "stewed", "soup", "ball", "piss", "winter melon", "casserole", "burning", "octopus", "deep-fried meatballs", "Chaozhou", "hand is beaten", "four happinesses", "element", "vermicelli", "meat ball"] balls [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
["sweets", "diplomat", "double poems", "Macaron", "gift box", "afternoon tea", "auspicious ±", "raspberry", "cloth It is bright", "Buddhist nun", "butter", "volume"] sweets [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
In each entry, the first part lists the keywords extracted under a dish category, such as "scrambled eggs" and "pork braised in brown sauce"; the middle part is the name of the dish category, such as "home cooking"; and the last part lists the TF-IDF values corresponding to those keywords, such as "0.34, 0.33".
For example, for the dish "Kung Pao chicken rice served with meat and vegetables on top", the keywords and feature data shown in the following table can be extracted from the related text description:
Dish name: Kung Pao chicken rice served with meat and vegetables on top; restaurant: Jin Baiwan; cuisine: Beijing cuisine; main business: snack
Gong Bao | diced chicken | ... | snack | ...
0.3 | 0.5 | ... | 1 | ...
In the table, the first row contains the keywords, and the second row contains the feature data corresponding to the dish "Kung Pao chicken rice served with meat and vegetables on top", that is, the TF-IDF value corresponding to each keyword.
In an optional implementation of this embodiment, as shown in FIG. 4, the step of determining the TF-IDF value of the keyword as the importance of the keyword further includes the following steps:
In step S401, the TF-IDF value of the keyword under the category of the sample product is determined;
In step S402, when the keyword corresponds to multiple TF-IDF values under different categories, the smallest TF-IDF value is selected as the importance of the keyword.
In this optional implementation, if the same keyword appears under multiple preset categories, a TF-IDF value of the keyword can be calculated for each of those preset categories; for the sake of uniformity, the smallest of these TF-IDF values may be chosen as the importance of the keyword.
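Selecting the smallest value across categories can be sketched as follows. The function name and the nested-dictionary layout are assumptions made for illustration; the numbers are taken from the "chicken" entries of the example lists above.

```python
def keyword_importance(tf_idf_by_category):
    """tf_idf_by_category: {category: {keyword: TF-IDF value}}.

    Returns {keyword: smallest TF-IDF value over all categories}.
    """
    importance = {}
    for scores in tf_idf_by_category.values():
        for keyword, value in scores.items():
            # Keep the smallest value seen so far for this keyword.
            importance[keyword] = min(value, importance.get(keyword, value))
    return importance


example = {
    "roasting meat": {"barbecue": 1.57, "chicken": 0.08},
    "Huang braised chicken rice": {"chicken": 1.26, "rice": 0.4},
}
importance = keyword_importance(example)
```

With these values the importance of "chicken" is 0.08, the smaller of its two per-category values, while keywords that occur under a single category keep their only value.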
In an optional implementation of this embodiment, step S103, that is, the step of determining the importance of the keyword, further includes the following step:
When the correlations between all word segments corresponding to the sample product and the category of the sample product are lower than the preset threshold, a default value is used as the importance of the keyword corresponding to the sample product.
In this optional implementation, when keywords are extracted from the text description, word segments having a low correlation with the category of the sample product may be eliminated. If, in the text description of a sample product, the correlations between all word segments and the category of the sample product are lower than the preset threshold, the importance of the keyword in the feature data of that sample product may be set to a default value. The size of the default value may be determined according to actual conditions and is not limited here.
FIG. 5 shows a flowchart of a product identification method according to an embodiment of the present disclosure. As shown in FIG. 5, the product identification method includes the following steps:
In step S501, the text description of the product to be identified is obtained;
In step S502, the keywords of the text description are extracted;
In step S503, the importance of the keywords is determined;
In step S504, the importance of the keywords is input into a pre-trained product identification model to identify the product to be identified, wherein the product identification model is trained using the above data processing method.
In this embodiment, the product to be identified may be a product currently involved on an online platform, such as a dish on a takeout ordering platform, or clothing, daily necessities, or household items on an e-commerce platform. The text description of the product to be identified includes, but is not limited to, textual descriptions of attributes such as product name, product materials, production process, effect, size, and quantity. For example, the text description of a dish on a takeout ordering platform may include the dish name, the ingredients of the dish, the cooking method, and so on.
In some embodiments, the text description of the product to be identified may also include a text description of the operator to which the product belongs. Usually, the categories of the products handled by one operator are fairly similar, and some operators handle only one category of product. Therefore, when the product identification model is trained, the operator data is also used as input data, so that the product identification model learns from the operator data the features that can affect the product category, which further improves the identification accuracy of the product identification model.
The data of the operator to which the product to be identified belongs may include, but is not limited to, the operator's name, the main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
For the product to be identified, one or more keywords may be extracted from its text description, and the importance of the one or more keywords is then determined. The importance indicates how large a role the keyword plays in product identification: if the keyword can play an important role in product identification, its importance is higher; if it cannot, its importance is lower.
The product identification model is obtained through training by the above data processing method, so for the specific details of the model, refer to the above description of the data processing method, which is not repeated here.
In an optional implementation of this embodiment, as shown in FIG. 6, step S502, that is, the step of extracting the keywords in the text description, further includes the following steps:
In step S601, the text description is segmented;
In step S602, the word segments are matched against the keyword set to determine whether each word segment is a keyword.
In this optional implementation, as described for the above data processing method, during training the keywords in the corresponding text descriptions may be extracted for all collected sample products to form a keyword set corresponding to the sample products, and the importance of these keywords is further determined in subsequent steps. Therefore, after training of the product identification model is complete, the keyword set may be retained. After the word segments in the text description of the product to be identified are obtained, they are matched against the keyword set; a word segment that matches successfully is determined as a keyword corresponding to the product to be identified, and the importance of such keywords can also be determined directly.
For the determination of the keywords and of their importance, refer to the above description of the data processing method, which is not repeated here.
It should be noted that when no keyword matching the above keyword set exists in the text description of the product to be identified, the importance of the keyword corresponding to the product to be identified may be set to a default value.
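Steps S601 and S602, together with the default value, can be sketched as follows. This is only one possible reading: the function name, the fixed feature layout (one dimension per keyword in the retained keyword set, in sorted order), the use of 0.0 for keywords that are absent, and the choice to fill every dimension with the default value when nothing matches are all assumptions. The input is assumed to be already segmented.

```python
def extract_features(word_segments, importance_by_keyword, default_value=0.01):
    """Match word segments against the keyword set and build a feature vector.

    importance_by_keyword: the retained keyword set with the importance
    determined during training, as {keyword: importance}.
    """
    keywords = sorted(importance_by_keyword)  # fixed feature order (assumption)
    present = set(word_segments)
    if not present.intersection(keywords):
        # No word segment matched the keyword set: fall back to the default.
        return [default_value] * len(keywords)
    return [
        importance_by_keyword[keyword] if keyword in present else 0.0
        for keyword in keywords
    ]


# Hypothetical keyword set; the values are taken from the example lists above.
importance_by_keyword = {"beer": 1.68, "barbecue": 1.57, "rice wine": 0.42}
matched = extract_features(["Qingdao", "beer"], importance_by_keyword)
unmatched = extract_features(["unknown"], importance_by_keyword)
```

The resulting vector is what would be passed to the trained product identification model in step S504.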
The following are apparatus embodiments of the present disclosure, which may be used to carry out the method embodiments of the present disclosure.
FIG. 7 shows a structural block diagram of a data processing apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in FIG. 7, the data processing apparatus includes:
A first obtaining module 701, configured to obtain sample data, wherein the sample data includes a text description of a sample product and the category of the sample product;
A first extraction module 702, configured to extract the keywords in the text description;
A first determining module 703, configured to determine the importance of the keywords;
A training module 704, configured to train a product identification model using feature data of the sample product and the category of the sample product, wherein the feature data includes the importance of the keywords corresponding to the sample product.
In this embodiment, the sample product may be a product currently involved on an online platform, such as a dish on a takeout ordering platform, or clothing, daily necessities, or household items on an e-commerce platform. The text description of the sample product includes, but is not limited to, textual descriptions of attributes such as product name, product materials, production process, effect, size, and quantity. For example, the text description of a dish on a takeout ordering platform may include the dish name, the ingredients of the dish, the cooking method, and so on.
In some embodiments, the text description of the sample product may also include a text description of the operator to which the sample product belongs. Usually, the categories of the products handled by one operator are fairly similar, and some operators handle only one category of product. Therefore, when the product identification model is trained, the operator data is also used as input data, so that the product identification model learns from the operator data the features that can affect the product category, which further improves the identification accuracy of the product identification model.
The data of the operator to which the sample product belongs may include, but is not limited to, the operator's name, the main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
The category of a sample product may be determined from the existing data of the online platform, or it may be labeled manually. For example, sample data may be collected from the existing products of an online platform; since an online platform usually has its own classification of products, the sample data required for training can be obtained by collecting the text descriptions related to the products under each category.
For each piece of sample data obtained, one or more keywords may be extracted from the text description of the sample product, and the importance of the one or more keywords is then determined. The importance indicates how large a role the keyword plays in product identification: if the keyword can play an important role in product identification, its importance is higher; if it cannot, its importance is lower.
The importance of a keyword may be determined in advance by counting the number of times the keyword appears in the text descriptions of all sample products under the same category. For example, if a keyword appears many times in the text descriptions of all sample products under the same category, its importance can be considered higher; if it appears fewer times, its importance can be considered lower.
The product identification model may be an xgboost model, a GBDT model, a neural network model, or the like. One sample product may correspond to multiple keywords, each with a corresponding importance. When the product identification model is trained, the importance values may be converted into vector form, and the vectors corresponding to the multiple keywords are combined to form the input data of the model. In each training iteration, the feature data in one piece of sample data is used as the input of the product identification model; after the output of the model is obtained, the output is compared with the category of the sample product in that sample data, and the model parameters of the product identification model are then updated so that the output of the model moves closer to the category of the sample product. Through training on a large amount of sample data, the model parameters are continually updated, and after training the product identification model can give a more accurate output for the input data.
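The iterative update described above can be sketched with a small softmax classifier. This is a stand-in chosen so that the example has no dependencies; it is not the xgboost, GBDT, or neural network model named in the text, and the function names, learning rate, number of epochs, and toy feature vectors are assumptions.

```python
import math


def train_softmax(features, labels, n_classes, epochs=50, learning_rate=0.5):
    """Train one weight row per category with plain gradient descent."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # Model output: a probability for each category.
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            probabilities = [e / total for e in exps]
            # Compare the output with the true category and update parameters.
            for k, row in enumerate(weights):
                error = probabilities[k] - (1.0 if k == y else 0.0)
                for j, xi in enumerate(x):
                    row[j] -= learning_rate * error * xi
    return weights


def predict(weights, x):
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


# Hypothetical feature vectors: dimension 0 is the importance of "beer",
# dimension 1 the importance of "barbecue"; label 0 is "wine", 1 is "roasting meat".
features = [[1.68, 0.0], [0.0, 1.57], [0.42, 0.0], [0.0, 0.21]]
labels = [0, 1, 0, 1]
weights = train_softmax(features, labels, n_classes=2)
```

Each pass compares the model output with the labeled category and moves the parameters so that the output gets closer to that category, which is the loop the paragraph describes.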
In the data processing apparatus of the embodiments of the present disclosure, the text description of a sample product and the category of the product are obtained, the keywords of the text description are extracted, the importance of the extracted keywords is determined, and the product identification model is then trained on feature data that includes the importance of the keywords together with the category of the product. A product identification model trained in this way can learn from the text description of a product how strongly the keywords in the text description affect product identification under the product's category, which can improve the accuracy of product identification, so that different products having similar text descriptions can be identified by the product identification model.
In an optional implementation of this embodiment, as shown in FIG. 8, the first obtaining module 701 includes:
A first obtaining submodule 801, configured to obtain the text descriptions of multiple sample products under preset categories;
A deduplication submodule 802, configured to deduplicate the text descriptions of the multiple sample products.
In this optional implementation, when sample data is collected, the text descriptions of multiple sample products may be obtained under each of the multiple preset categories for which the online platform already has classification data. The text description includes, but is not limited to, textual descriptions of attributes such as product name, product materials, production process, effect, size, and quantity. To avoid collecting duplicate sample products, the text descriptions may be deduplicated.
In an optional implementation of this embodiment, the deduplication submodule 802 includes:
A mapping submodule, configured to uniformly map multiple different text descriptions corresponding to the same sample product to the same text description.
In this optional implementation, there may be multiple sample products under the same preset category whose text descriptions differ but which are in fact the same product. For example, on a takeout ordering platform, the menus uploaded by some merchants include "Tomato omelette", while the menus uploaded by other merchants include "tomato scrambled eggs"; the two are essentially the same product under different names, so they can be uniformly mapped to the same product name. Of course, it can be understood that other content in the text description may also be uniformly mapped.
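The mapping and deduplication described above can be sketched as follows. The function name and the alias table are hypothetical; how such a table would be built is not specified in the text.

```python
def deduplicate(descriptions, alias_map):
    """Map alternative names to one canonical name, then drop repeats.

    alias_map: {alternative name: canonical name} (hypothetical lookup table).
    The order of first appearance is preserved.
    """
    seen = set()
    unique = []
    for description in descriptions:
        canonical = alias_map.get(description, description)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


alias_map = {"Tomato omelette": "tomato scrambled eggs"}
menu = ["Tomato omelette", "tomato scrambled eggs", "pork braised in brown sauce"]
unique_names = deduplicate(menu, alias_map)
```

Both names of the tomato dish collapse into a single entry, so the same product is counted only once under its preset category.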
In an optional implementation of this embodiment, as shown in FIG. 9, the first extraction module 702 includes:
A first segmentation submodule 901, configured to segment the text description;
A first determining submodule 902, configured to determine word segments whose correlation with the category of the sample product is higher than a preset threshold as the keywords.
In this optional implementation, when keywords are extracted from the text description, the text description may first be segmented, and the keywords may then be determined according to the correlation between each word segment and the category of the sample product. For example, on a takeout ordering platform, the word segment "stir-fry" in "tomato stir-fried eggs" is not important for identifying the category of the dish; that is, the word "stir-fry" has a low correlation with the identification of the dish, so it can be rejected rather than used as a keyword. The preset threshold may be set according to actual conditions and is not limited here. Extracting keywords from the segmentation result of the text description and rejecting word segments with low correlation avoids an excessively large feature dimension when the product identification model is subsequently trained, which would otherwise lower training efficiency.
In an optional implementation of this embodiment, the first determining submodule 902 includes:
A second determining submodule, configured to determine the correlation between the word segment and the category of the sample product using a chi-square test of independence.
In this optional implementation, the chi-square test of independence can determine the association and dependence between two categorical variables. Therefore, in the embodiments of the present disclosure, after the text descriptions of sample products under different preset categories have been collected and segmented, the chi-square test of independence may be used, for each preset category, to determine the correlation between that preset category and the word segments obtained from the text descriptions of the collected sample products, and the word segments whose correlation is higher than the preset threshold are determined as keywords.
In the embodiments of the present disclosure, after a large amount of sample data has been collected, the chi-square test of independence is applied to the text descriptions of the sample products to extract the keywords under the different preset categories and form a keyword set. The chi-square test of independence is prior art and is not described in detail here.
In an optional implementation of this embodiment, the first determining module 703 includes:
A third determining submodule, configured to determine the TF-IDF value of the keyword as the importance of the keyword.
In this optional implementation, TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining, where TF means term frequency and IDF means inverse document frequency. The TF-IDF value of a keyword can be understood as follows: if the current keyword appears with a high frequency (TF) in the text descriptions of all sample products under the current preset category and rarely appears in the text descriptions of the sample products under other preset categories, the keyword can be considered to have good category-discriminating ability and to be suitable for classification. The keyword can therefore be considered important for the current preset category, and its TF-IDF value can be used to measure its importance.
The TF-IDF value of a keyword may be determined in advance by collecting the text descriptions of all sample products under the preset categories that already exist on the online platform, extracting keywords from those text descriptions, and determining the TF-IDF value of each keyword in those text descriptions. As described above, each keyword in the keyword set formed from the sample data may also be associated with its TF-IDF value, so that during online identification the keywords and TF-IDF values corresponding to the product to be identified can be obtained directly from the keyword set and the associated TF-IDF values.
In this embodiment, the TF value of a keyword may be obtained by dividing the number of times the keyword appears in the text descriptions of all sample products under a preset category by the number of those text descriptions (that is, the number of all sample products under the preset category). The IDF of the keyword may be determined from the number of preset categories to which the sample products whose text descriptions contain the keyword belong and the total number of preset categories, using the formula IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if keyword A appears in the text descriptions of sample products under preset category 1, preset category 2, and preset category 3, and there are 5 preset categories in total, then the keyword appears under three preset categories, so its IDF = log(5/3).
The TF-IDF value of a keyword is the product of its TF value and its IDF value.
For example, the TF-IDF values of the keywords under each preset category in the sample data collected from a takeout ordering platform are as follows:
["scrambled eggs", "braised in soy sauce", "pork braised in brown sauce", "daily life of a family", "cold and dressed with sauce", "braised aubergines", "Kung Pao Chicken", "fourth", "sugar vinegar", "fish-flavoured shredded pork", "agaric", "potato", "shredded pork and eggs with dried mushroom", "bean curd", "long bean", "tomato", "tenderloin", "spelling", "dish", "It is small"] home cooking [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.05, 0.06]
["beer", "rice wine", "Beijing", "Yanjing Brewery", "snowflake", "wheat", "Harbin", "listening", "ends of the earth", "magma", "Qingdao", "eggnog", "bravely rushing", "Belgium", "salubrious", "king", "white", "sweet osmanthus", "small", "sweet wine"] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
["barbecue", "crackling", "roasting", "stir-fry", "crusty pancake", "sausage", "shishkabab", "Orleans", "Brazil", "muscle", "cumin", "red building", "honeydew", "salad", "New Orleans", "chicken", "muscle string", "the meat clip Mo", "principal filter", "Turkey"] roasting meat [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.08, 0.06, 0.05]
["Huang is stewing", "chicken", "rice", "special peppery", "skin of beancurd", "needle mushroom", "Wu's note", "abalone sauce", " ", "chicken meal", "ten", "agaric", "chicken chicken", "small point of chicken", "potato block", "not generation", "earth pot", "dry pot", "big peppery", "gives beans skin"] Huang braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
["teppanyaki", "skewer", "palpus", "peppery degree", "big chicken cutlet", "selection", "squid", "old foster-mother", "fried shredded pancake", "iron plate", "fish", "eggplant", "juice", "bean curd", "sesame seed cake", "egg"] teppanyaki [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
["ball", "beef dumplings", "a small ball", "burger", "Chinese cabbage", "stewed", "soup", "ball", "piss", "winter melon", "casserole", "burning", "octopus", "deep-fried meatballs", "Chaozhou", "hand is beaten", "four happinesses", "element", "vermicelli", "meat ball"] balls [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
["sweets", "diplomat", "double poems", "Macaron", "gift box", "afternoon tea", "auspicious ±", "raspberry", "cloth It is bright", "Buddhist nun", "butter", "volume"] sweets [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
In each entry, the first part lists the keywords extracted under a dish category, such as "scrambled eggs" and "pork braised in brown sauce"; the middle part is the name of the dish category, such as "home cooking"; and the last part lists the TF-IDF values corresponding to those keywords, such as "0.34, 0.33".
For example, for the dish "Kung Pao chicken rice served with meat and vegetables on top", the keywords and feature data shown in the following table can be extracted from the related text description:
Dish name: Kung Pao chicken rice served with meat and vegetables on top; restaurant: Jin Baiwan; cuisine: Beijing cuisine; main business: snack
Gong Bao | diced chicken | ... | snack | ...
0.3 | 0.5 | ... | 1 | ...
In the table, the first row contains the keywords, and the second row contains the feature data corresponding to the dish "Kung Pao chicken rice served with meat and vegetables on top", that is, the TF-IDF value corresponding to each keyword.
In an optional implementation of this embodiment, as shown in FIG. 10, the third determining submodule includes:
A fourth determining submodule 1001, configured to determine the TF-IDF value of the keyword under the category of the sample product;
A selection submodule 1002, configured to select the smallest TF-IDF value as the importance of the keyword when the keyword corresponds to multiple TF-IDF values under different categories.
In this optional implementation, if the same keyword appears under multiple preset categories, a TF-IDF value of the keyword can be calculated for each of those preset categories; for the sake of uniformity, the smallest of these TF-IDF values may be chosen as the importance of the keyword.
In an optional implementation of this embodiment, the first determining module 703 includes:
A fifth determining submodule, configured to, when the correlation between every segmented word corresponding to the sample product and the category to which the sample product belongs is below a preset threshold, use a default value as the importance of the keyword corresponding to the sample product.
In this optional implementation, when keywords are extracted from the text description, segmented words with low correlation to the category to which the sample product belongs can be eliminated. If, in the text description of a sample product, the correlation between every segmented word and the category to which the sample product belongs is below the preset threshold, the importance of the keyword in the feature data of that sample product can be set to a default value. The size of the default value can be chosen according to the actual situation and is not limited here.
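One possible reading of this fallback is sketched below. The disclosure does not specify how the feature data is laid out, so the sketch assumes a fixed-length vector aligned with a keyword vocabulary; the function name `sample_features`, the threshold, and the default value are all hypothetical.

```python
def sample_features(token_relevance, importance, vocabulary, threshold, default):
    """Build the feature vector of one sample product.

    token_relevance: {segmented word: correlation with the product's category}
    importance:      {keyword: importance value, e.g. a TF-IDF value}
    vocabulary:      ordered list of all keywords (assumed vector layout)
    """
    # Keep only segmented words whose correlation exceeds the threshold.
    kept = {
        token
        for token, relevance in token_relevance.items()
        if relevance > threshold and token in importance
    }
    if not kept:
        # No segmented word is relevant enough: fall back to the default value.
        return [default] * len(vocabulary)
    return [importance[word] if word in kept else 0.0 for word in vocabulary]


vocabulary = ["kung pao", "diced chicken", "snack"]
importance = {"kung pao": 0.3, "diced chicken": 0.5, "snack": 1.0}

relevant = {"kung pao": 6.2, "diced chicken": 4.8, "rice": 0.4}
irrelevant = {"rice": 0.4, "large": 0.1}

features_a = sample_features(relevant, importance, vocabulary, 3.84, 0.05)
features_b = sample_features(irrelevant, importance, vocabulary, 3.84, 0.05)
```

In this layout the default value fills every position, which keeps the sample usable for training even though none of its segmented words qualified as a keyword.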
Figure 11 shows a structural block diagram of a product identification device according to an embodiment of the present disclosure. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 11, the product identification device includes:
A second obtaining module 1101, configured to obtain the text description of a product to be identified;
A second extraction module 1102, configured to extract keywords from the text description;
A second determining module 1103, configured to determine the importance of the keywords;
An identification module 1104, configured to input the importance of the keywords into a pre-trained product identification model in order to identify the product to be identified; wherein the product identification model is obtained through training by the data processing device described above.
In this embodiment, the product to be identified can be a product currently involved on an online platform, such as a dish on a food-delivery ordering platform, or clothing, daily necessities, or household items on an e-commerce platform. The text description of the product to be identified includes, but is not limited to, a written description of attributes such as product name, product material, manufacturing process, function, size, and quantity. For example, the text description of a dish on a food-delivery ordering platform may include the dish name, the ingredients of the dish, and the cooking method.
In some embodiments, the text description of the product to be identified can also include a text description of the operator to which the product belongs. Usually, the product categories that one operator deals in are fairly close to one another, and some operators deal in only one category of product. Therefore, when the product identification model is trained, operator data is also used as input data, so that the model learns from the operator data the features that can influence the product category, which further improves the recognition accuracy of the product identification model.
The data of the operator to which the product to be identified belongs can include, but is not limited to, the operator's name, its main business scope, and the broad category to which the operator's products belong (such as the cuisine in the catering industry).
For a product to be identified, one or more keywords can be extracted from its text description, and the importance of each keyword can then be determined. The importance indicates how large a role the keyword plays in product identification: if the keyword can play an important role in product identification, its importance is higher; if it cannot, its importance is lower.
The product identification model is obtained through training by the data processing device described above, so the specific details of the model can be found in the above description of the data processing device and are not repeated here.
In an optional implementation of this embodiment, as shown in Figure 12, the second extraction module 1102 includes:
A second word segmentation submodule 1201, configured to segment the text description into words;
A matching submodule 1202, configured to match each segmented word against a keyword set and determine whether the segmented word is a keyword.
In this optional implementation, as described for the data processing device above, during training the keywords in the corresponding text descriptions can be extracted for all collected sample products to form a keyword set corresponding to the sample products, and the importance of these keywords is determined in subsequent steps. Therefore, the keyword set can be retained after training of the product identification model is complete. After the segmented words in the text description of the product to be identified are obtained, they are matched against the keyword set; a segmented word that matches is determined to be a keyword of the product to be identified, and the importance of such a keyword can be read off directly.
For how keywords and their importance are determined, refer to the above description of the data processing device; the details are not repeated here.
It should be noted that when the text description of the product to be identified contains no keyword that matches the keyword set, the importance of the keyword corresponding to the product to be identified can be set to a default value.
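The matching step at identification time can be sketched as follows. This is a hedged illustration rather than the disclosed implementation: whitespace splitting stands in for a real word segmenter, the feature layout (a vector over the sorted keyword set) is an assumption, and `build_features`, `recognize`, and the toy model are hypothetical names.

```python
def build_features(description, keyword_importance, default=0.0, tokenize=str.split):
    """Match segmented words against the retained keyword set.

    keyword_importance is the keyword set kept from training, mapped to the
    importance of each keyword. Returns a vector over the sorted keyword set.
    """
    vocabulary = sorted(keyword_importance)
    tokens = set(tokenize(description))
    matched = tokens & set(vocabulary)
    if not matched:
        # No segmented word matches the keyword set: use the default value.
        return [default] * len(vocabulary)
    return [keyword_importance[w] if w in matched else 0.0 for w in vocabulary]


def recognize(description, keyword_importance, model, default=0.0):
    """Feed the keyword importance values into a trained model (any callable)."""
    return model(build_features(description, keyword_importance, default))


keyword_importance = {"kung": 0.3, "chicken": 0.5, "snack": 1.0}


def toy_model(features):
    # Stand-in for a trained product identification model.
    return "snack" if sum(features) > 0.5 else "unknown"


matched_features = build_features("kung pao chicken rice", keyword_importance)
fallback_features = build_features("iced tea", keyword_importance, default=0.05)
label = recognize("kung pao chicken rice", keyword_importance, toy_model)
```

Because the keyword set and its importance values are retained from training, no TF-IDF computation is needed when a new product is identified; a dictionary lookup is enough.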
An embodiment of the present disclosure further provides an electronic device which, as shown in Figure 13, includes a processor 1301 and a memory 1302 communicatively connected to the processor 1301; wherein the memory 1302 stores instructions executable by the processor 1301, and the instructions are executed by the processor 1301 to implement:
Obtaining sample data; wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
Extracting keywords from the text description;
Determining the importance of the keywords;
Training a product identification model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the importance of the keywords corresponding to the sample product.
Wherein obtaining sample data includes:
Obtaining text descriptions of multiple sample products under preset categories;
Performing deduplication on the text descriptions of the multiple sample products.
Wherein performing deduplication on the text descriptions of the multiple sample products includes:
Uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
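The deduplication step can be sketched as follows, assuming that a mapping from variant descriptions to a canonical description is already available. How such a mapping is built is not stated here, so the `canonical_names` table and the function name `deduplicate` are hypothetical.

```python
def deduplicate(descriptions, canonical_names):
    """Map variant descriptions of the same product to one canonical
    description, then drop repeats while preserving the original order."""
    seen = set()
    result = []
    for description in descriptions:
        canonical = canonical_names.get(description, description)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


# Hypothetical mapping: two spellings of the same dish.
canonical_names = {
    "scrambled eggs with tomato": "tomato scrambled eggs",
    "tomato and egg stir-fry": "tomato scrambled eggs",
}
raw = [
    "tomato scrambled eggs",
    "scrambled eggs with tomato",
    "kung pao chicken",
    "tomato and egg stir-fry",
]
unique = deduplicate(raw, canonical_names)
```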
Wherein extracting keywords from the text description includes:
Segmenting the text description into words;
Determining, as the keywords, the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold.
Wherein determining, as the keywords, the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold includes:
Determining the correlation between the segmented words and the category to which the sample product belongs using the chi-square independence test.
Wherein determining the importance of the keywords includes:
Determining the TF-IDF value of a keyword as the importance of the keyword.
Wherein determining the TF-IDF value of a keyword as the importance of the keyword includes:
Determining the TF-IDF value of the keyword under the category to which the sample product belongs;
When the keyword corresponds to multiple TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance of the keyword.
Wherein determining the importance of the keywords includes:
When the correlation between every segmented word corresponding to the sample product and the category to which the sample product belongs is below the preset threshold, using a default value as the importance of the keyword corresponding to the sample product.
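The chi-square independence test named in the steps above can be illustrated with the standard 2x2 contingency-table statistic. This is a sketch under assumptions: each sample is represented as a set of segmented words plus its category, and the function name `chi_square` and the sample data are hypothetical.

```python
def chi_square(samples, term, category):
    """Chi-square statistic for the association between a term and a category.

    samples is a list of (set of segmented words, category) pairs.
    a: in category, contains term      b: other category, contains term
    c: in category, lacks term         d: other category, lacks term
    """
    a = b = c = d = 0
    for tokens, label in samples:
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column of the table is empty; no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


samples = [
    ({"kung pao", "chicken"}, "snack"),
    ({"chicken"}, "snack"),
    ({"noodles"}, "staple"),
    ({"rice"}, "staple"),
]
chicken_score = chi_square(samples, "chicken", "snack")
kung_pao_score = chi_square(samples, "kung pao", "snack")
```

A segmented word would then be kept as a keyword when its statistic exceeds the preset threshold. Note that the statistic is also large for words that are strongly associated with the absence of a category, so a practical system may need to check the direction of the association as well.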
This embodiment further provides an electronic device including a memory and a processor; wherein
The memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the following method steps: obtaining the text description of a product to be identified;
Extracting keywords from the text description;
Determining the importance of the keywords;
Inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified; wherein the product identification model is obtained through training by the electronic device shown in Figure 13.
Wherein extracting keywords from the text description includes:
Segmenting the text description into words;
Matching each segmented word against a keyword set to determine whether the segmented word is a keyword.
Specifically, the processor 1301 and the memory 1302 can be connected by a bus or in other ways; Figure 13 takes connection by a bus as an example. As a non-volatile computer-readable storage medium, the memory 1302 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 1302, the processor 1301 executes the various functional applications and data processing of the device, that is, implements the above methods in the embodiments of the present disclosure.
The memory 1302 may include a program storage area and a data storage area, wherein the program storage area can store the operating system and the application programs required by functions, and the data storage area can store historical data of shipping network transport and the like. In addition, the memory 1302 may include high-speed random access memory and may also include non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the electronic device optionally includes a communication component 1303, and the memory 1302 optionally includes memory located remotely from the processor 1301; such remote memory can be connected to an external device through the communication component 1303. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
One or more modules are stored in the memory 1302 and, when executed by the one or more processors 1301, perform the above methods in the embodiments of the present disclosure.
The above product can perform the methods provided by the embodiments of the present disclosure and has the functional modules and beneficial effects corresponding to performing those methods. For technical details not described in detail in this embodiment, refer to the methods provided by the embodiments of the present disclosure.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes can occur in an order different from that noted in the drawings. For example, two boxes shown in succession can in fact be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure can be implemented in software or in hardware. The described units or modules can also be provided in a processor, and under certain circumstances the names of these units or modules do not limit the units or modules themselves.
In another aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium can be the one included in the device described in the above embodiments, or it can exist separately without being assembled into a device. The computer-readable storage medium stores one or more programs, and the programs are used by one or more processors to execute the methods described in the present disclosure.
The above description covers only the preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims (10)

1. A data processing method, characterized by comprising:
Obtaining sample data; wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
Extracting keywords from the text description;
Determining the importance of the keywords;
Training a product identification model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the importance of the keywords corresponding to the sample product.
2. The method according to claim 1, characterized in that obtaining sample data comprises:
Obtaining text descriptions of multiple sample products under preset categories;
Performing deduplication on the text descriptions of the multiple sample products.
3. The method according to claim 1, characterized in that performing deduplication on the text descriptions of the multiple sample products comprises:
Uniformly mapping multiple different text descriptions that correspond to the same sample product to one identical text description.
4. The method according to any one of claims 1-3, characterized in that extracting keywords from the text description comprises:
Segmenting the text description into words;
Determining, as the keywords, the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold.
5. A product identification method, characterized by comprising:
Obtaining the text description of a product to be identified;
Extracting keywords from the text description;
Determining the importance of the keywords;
Inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified; wherein the product identification model is obtained through training using the method according to any one of claims 1-4.
6. A data processing device, characterized by comprising:
A first obtaining module, configured to obtain sample data; wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
A first extraction module, configured to extract keywords from the text description;
A first determining module, configured to determine the importance of the keywords;
A training module, configured to train a product identification model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the importance of the keywords corresponding to the sample product.
7. A product identification device, characterized by comprising:
A second obtaining module, configured to obtain the text description of a product to be identified;
A second extraction module, configured to extract keywords from the text description;
A second determining module, configured to determine the importance of the keywords;
An identification module, configured to input the importance of the keywords into a pre-trained product identification model to identify the product to be identified; wherein the product identification model is obtained through training by the device according to claim 6.
8. An electronic device, characterized by comprising a memory and a processor; wherein
The memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the following method steps:
Obtaining sample data; wherein the sample data includes a text description of a sample product and the category to which the sample product belongs;
Extracting keywords from the text description;
Determining the importance of the keywords;
Training a product identification model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the importance of the keywords corresponding to the sample product.
9. An electronic device, characterized by comprising a memory and a processor; wherein
The memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the following method steps:
Obtaining the text description of a product to be identified;
Extracting keywords from the text description;
Determining the importance of the keywords;
Inputting the importance of the keywords into a pre-trained product identification model to identify the product to be identified; wherein the product identification model is obtained through training by the electronic device according to claim 8.
10. A computer-readable storage medium on which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1-5.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910563737.7A CN110264318A (en) 2019-06-26 2019-06-26 Data processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910563737.7A CN110264318A (en) 2019-06-26 2019-06-26 Data processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110264318A true CN110264318A (en) 2019-09-20

Family

ID=67921955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910563737.7A Pending CN110264318A (en) 2019-06-26 2019-06-26 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110264318A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160239865A1 (en) * 2013-10-28 2016-08-18 Tencent Technology (Shenzhen) Company Limited Method and device for advertisement classification
CN104951430A (en) * 2014-03-27 2015-09-30 携程计算机技术(上海)有限公司 Product feature tag extraction method and device
CN106294355A (en) * 2015-05-14 2017-01-04 阿里巴巴集团控股有限公司 A kind of determination method and apparatus of business object attribute
CN106095996A (en) * 2016-06-22 2016-11-09 量子云未来(北京)信息科技有限公司 Method for text classification
CN106156372A (en) * 2016-08-31 2016-11-23 北京北信源软件股份有限公司 The sorting technique of a kind of internet site and device
US20190005121A1 (en) * 2017-06-29 2019-01-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for pushing information
CN107609160A (en) * 2017-09-26 2018-01-19 联想(北京)有限公司 A kind of file classification method and device
CN108595418A (en) * 2018-04-03 2018-09-28 上海透云物联网科技有限公司 A kind of commodity classification method and system
CN109388712A (en) * 2018-09-21 2019-02-26 平安科技(深圳)有限公司 A kind of trade classification method and terminal device based on machine learning
CN109522544A (en) * 2018-09-27 2019-03-26 厦门快商通信息技术有限公司 Sentence vector calculation, file classification method and system based on Chi-square Test
CN109614475A (en) * 2018-12-07 2019-04-12 广东工业大学 A kind of product feature based on deep learning determines method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837867A (en) * 2019-11-08 2020-02-25 深圳市深视创新科技有限公司 Method for automatically distinguishing similar and heterogeneous products based on deep learning
CN110941719A (en) * 2019-12-02 2020-03-31 中国银行股份有限公司 Data classification method, test method, device and storage medium
CN110941719B (en) * 2019-12-02 2023-12-19 中国银行股份有限公司 Data classification method, testing method, device and storage medium
CN111190635A (en) * 2020-01-03 2020-05-22 拉扎斯网络科技(上海)有限公司 Method, device and equipment for determining characteristic data of application program and storage medium
CN111190635B (en) * 2020-01-03 2021-10-29 拉扎斯网络科技(上海)有限公司 Method, device and equipment for determining characteristic data of application program and storage medium
CN111429184A (en) * 2020-03-27 2020-07-17 北京睿科伦智能科技有限公司 User portrait extraction method based on text information
CN111522945A (en) * 2020-04-10 2020-08-11 南通大学 Poetry style analysis method based on chi-square test
CN113657113A (en) * 2021-08-24 2021-11-16 北京字跳网络技术有限公司 Text processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190920
