CN111882224A - Method and device for classifying consumption scenes - Google Patents
- Publication number
- CN111882224A (application number CN202010754949.6A)
- Authority
- CN
- China
- Prior art keywords
- address
- data information
- text
- scene
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06315—Needs-based resource requirements planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- Finance (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Quality & Reliability (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- Remote Sensing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for classifying consumption scenes. The method comprises the following steps: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's order; and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and determining the consumption scene category to which that delivery address belongs. The scheme requires no change to the service flow: the user only fills in the delivery address as in the normal service flow, and the wine consumption scene can be analyzed without the user being aware of it, so that the operator can improve the service flow, analyze user requirements and adjust the operation strategy.
Description
Technical Field
The invention relates to the field of computers, in particular to a method and a device for classifying consumption scenes.
Background
In the past, it was very difficult for goods and services to reach the people who needed them. The common practice was to advertise widely, but the people reached were not necessarily those who needed the products and services, and poorly targeted distribution was a huge waste of social resources.
Meanwhile, providers of goods and services need to distribute them to various regions. How to distribute them without waste, and in a way that lets the people who need them enjoy them in time, is a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve the prior-art problem that products cannot be distributed in time to the users who need them, the invention provides a method and a device for classifying consumption scenes.
In a first aspect, the present invention provides a method for classifying a consumption scenario, the method comprising:
acquiring classified data information under each category; the data information includes: scene name, scene type, province, city and address;
performing data preprocessing on the data information;
storing the preprocessed data information in a database;
acquiring the delivery address of the current user's order;
and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and determining the consumption scene category to which the delivery address of the current user's order belongs.
Further, acquiring the classified data information under each category comprises:
acquiring, within a preset time, the data information corresponding to each category from platform information websites.
Further, performing data preprocessing on the data information comprises:
performing data structuring and data cleaning on the data information to obtain processed data information;
and calling a map interface to add longitude and latitude data to the address information in the processed data information.
Further, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order comprises:
matching the scene type of the delivery address by using regular expressions and the longest common substring algorithm;
if the matching fails, selecting texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold as training samples;
splitting the address text in the training samples and extracting words from it by using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
Further, matching the scene type of the delivery address by using regular expressions and the longest common substring algorithm comprises:
performing basic classification on the delivery address by using regular expressions to determine the scene type; or
acquiring address information within a preset distance from a standard library according to the longitude and latitude data of the delivery address, and performing text matching between the delivery address and that address information by using the longest common substring algorithm to determine the scene type.
Further, the method further comprises:
if text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order fails, inputting the text that failed to match, as a target word vector, into the trained xgboost model to predict the consumption scene category.
Further, splitting and extracting the address text in the training samples by using TF-IDF comprises:
performing word segmentation on the address text in the training samples;
and extracting, by using TF-IDF, the words whose weight for each scene type meets a preset weight threshold.
In a second aspect, the present invention provides an apparatus for classifying a consumption scenario, the apparatus comprising:
a data information acquisition module, configured to acquire classified data information under each category; the data information includes: scene name, scene type, province, city and address;
a preprocessing module, configured to perform data preprocessing on the data information;
a storage module, configured to store the preprocessed data information in a database;
a delivery address acquisition module, configured to acquire the delivery address of the current user's order;
and a text matching module, configured to perform text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and to determine the consumption scene category to which the delivery address of the current user's order belongs.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for classifying a consumption scenario provided in the first aspect when executing the program.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of classifying a consumption scenario as provided in the first aspect.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data calculation, the invention makes good use of the strengths of each technique and lets them compensate for one another's weaknesses, ensuring that scene classification can be realized stably. The external experience obtained is combined with internal data through text matching, and the limited precision of text matching is compensated by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption scenes of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target service groups, provides an effective data basis for reaching consumers, and achieves a win-win for consumers and the brand.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for classifying a consumption scenario according to an embodiment of the present invention;
Fig. 2 is a block diagram of an apparatus for classifying consumption scenarios according to an embodiment of the present invention;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
With the development of the internet and the later popularization of the mobile internet, people have become used to being online at any time. They can shop, socialize and so on conveniently and quickly on various platforms, leaving footprints on the internet as they do so. A wine producer, as a product provider, also wants to know where the people who need its products and services are. By sorting and classifying this information and matching it against the characteristics of the services and products provided, the embodiment of the invention can identify where those people are, so that the provider can reach them accurately, reduce waste and improve cooperative efficiency. More specifically, as shown in fig. 1, an embodiment of the present invention provides a method for classifying a consumption scenario, where the method includes:
step S101, acquiring classified data information under each category; the data information includes: scene name, scene type, province, city and address;
step S102, performing data preprocessing on the data information;
step S103, storing the preprocessed data information in a database;
step S104, acquiring the delivery address of the current user's order;
and step S105, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and determining the consumption scene category to which the delivery address of the current user's order belongs.
Specifically, the embodiment of the present invention preferably classifies the consumption scenes of wine products. Data information is first crawled by a crawler: the main wine consumption scene addresses are divided into three categories (offices, residences and restaurants), the main platform information websites of each category are selected (including platforms such as 58.com, Meituan and Dianping), crawler tasks are established one by one, and data is crawled periodically (or within a preset time) by an online server. The data includes: scene name, scene type, province, city and detailed address. The crawled data continuously brings in external experience: each crawl yields new data, samples are accumulated continuously through the text matching algorithm, and the accuracy of the text analysis algorithm is ultimately improved.
The crawled data is structured and stored in an online database, and preliminary data cleaning is performed by a scheduled process. The cleaning covers duplicate data, garbled characters and special symbols. A common map interface, such as Baidu Maps or Gaode Maps (Amap), is then called to obtain the longitude and latitude of each address, which are stored in the database.
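The cleaning and storage steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the set of characters treated as valid, the use of the (city, address) pair as the duplicate key, and the SQLite table layout are all hypothetical. The map-interface call is omitted because the text does not specify it.

```python
import re
import sqlite3

# Assumption: a cleaned field keeps CJK ideographs, ASCII letters, digits,
# and a few separators; everything else counts as a special symbol.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\-#() ]")


def clean_text(value: str) -> str:
    """Drop the Unicode replacement character and special symbols, then collapse spaces."""
    value = value.replace("\ufffd", "")
    value = _DISALLOWED.sub("", value)
    return re.sub(r" +", " ", value).strip()


def clean_records(records):
    """Clean every field and drop records that repeat an earlier (city, address) pair."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {key: clean_text(val) for key, val in rec.items()}
        key = (row["city"], row["address"])
        if not row["address"] or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def store_records(conn: sqlite3.Connection, rows) -> None:
    """Store cleaned rows in a hypothetical `scene` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scene "
        "(scene_name TEXT, scene_type TEXT, province TEXT, city TEXT, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO scene VALUES "
        "(:scene_name, :scene_type, :province, :city, :address)",
        rows,
    )
    conn.commit()
```

An in-memory database (`sqlite3.connect(":memory:")`) is enough to try the sketch; a production system would use whatever online database the deployment already has.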
The delivery address of the target user's order is acquired, text matching is performed between the address information of the preprocessed data information stored in the database and that delivery address, and the consumption scene category to which the delivery address belongs is determined according to the text matching result.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data calculation, the embodiment of the invention makes good use of the strengths of each technique and lets them compensate for one another's weaknesses, ensuring that scene classification can be realized stably. The external experience obtained is combined with internal data through text matching, and the limited precision of text matching is compensated by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption scenes of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target service groups, provides an effective data basis for reaching consumers, and achieves a win-win for consumers and the brand.
Based on the content of the above embodiments, as an alternative embodiment, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order comprises the following steps:
matching the scene type of the delivery address by using regular expressions and the longest common substring algorithm;
if the matching fails, selecting texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold as training samples;
splitting the address text in the training samples and extracting words from it by using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
Specifically, addresses that are obviously classifiable are classified with regular expressions: if an address contains words that carry obvious scene information (the main scene in the embodiment of the invention is the wine consumption scene), the address is assigned to that scene. For example, addresses containing words such as hot pot, skewers or KTV can be assigned to the catering scene type.
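A rule-based classifier of this kind might look like the sketch below. The keyword lists and the names `SCENE_RULES` and `classify_by_rule` are illustrative assumptions; only hot pot, skewers and KTV are mentioned in the text, and the office and residence keywords are invented for the example.

```python
import re

# Hypothetical keyword rules, checked in order; the first matching rule wins.
SCENE_RULES = [
    ("restaurant", re.compile(r"火锅|串串|烧烤|酒楼|饭店|餐厅|ktv", re.IGNORECASE)),
    ("office", re.compile(r"写字楼|大厦|科技园|公司|办公")),
    ("residence", re.compile(r"小区|公寓|花园|\d+栋|\d+单元")),
]


def classify_by_rule(address: str):
    """Return the first scene type whose pattern occurs in the address, else None."""
    for scene_type, pattern in SCENE_RULES:
        if pattern.search(address):
            return scene_type
    return None
```

Returning `None` rather than a default category keeps a failed match distinguishable, so that such addresses can be passed on to the next matching stage.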
To match the scene type of a delivery address with the longest common substring algorithm, address information within a preset distance (for example, within 1 km) is first retrieved from a standard library according to the longitude and latitude of the delivery address. The similarity between the delivery address and each of these addresses is then calculated one by one with the longest common substring algorithm, and the address with the highest similarity is used to judge the wine consumption scene. The longest common substring algorithm first finds the longest common substring of the two texts, and then divides the length of that substring by the length of the longer of the two texts to obtain the address similarity.
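The similarity measure described above, combined with a distance filter, can be sketched as follows. The haversine formula is one common way to compute the distance between two coordinates and is an assumption here, as are the function names and the dictionary layout of the standard-library entries (`address`, `scene_type`, `lat`, `lon`).

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def address_similarity(a: str, b: str) -> float:
    """Common-substring length divided by the length of the longer text."""
    longer = max(len(a), len(b))
    return longest_common_substring_len(a, b) / longer if longer else 0.0


def match_scene(address, lat, lon, standard_library, radius_km=1.0):
    """Return (scene_type, similarity) of the most similar nearby entry, or (None, 0.0)."""
    best_type, best_score = None, 0.0
    for entry in standard_library:
        if haversine_km(lat, lon, entry["lat"], entry["lon"]) > radius_km:
            continue
        score = address_similarity(address, entry["address"])
        if score > best_score:
            best_type, best_score = entry["scene_type"], score
    return best_type, best_score
```

The similarity lies between 0 and 1; multiplying it by 100 gives a score on the 0 to 100 scale that a threshold such as 70 points could be applied to, although the text does not state how its scores are scaled.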
If some addresses fail to match when regular expressions and the longest common substring algorithm are used to perform text matching on the delivery addresses, texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold, for example texts with a high score (more than 70 points), are selected as training samples. The address text in the training samples is then split and extracted by using TF-IDF, the extracted text is input into an xgboost model for training, and a training result is output. To train the xgboost model, important parameters are selected to construct a grid for searching, and the optimal model parameters are obtained; the parameters include the learning rate, the maximum depth, the number of estimators and the minimum leaf-node sample weight. The model is then trained again with the optimal parameters obtained, verified on the test set, and saved.
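The sample selection and grid search can be sketched in plain Python as below. The grid values are illustrative assumptions, and `evaluate` is a stand-in callable: in practice it would train an xgboost model with the given parameters and return a validation score, which this sketch does not do. The four keys follow the parameter names of xgboost's scikit-learn-style estimators (`learning_rate`, `max_depth`, `n_estimators`, `min_child_weight`), which correspond to the four parameters listed in the text.

```python
from itertools import product

# Hypothetical search grid over the four parameters named in the text.
PARAM_GRID = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 300],
    "min_child_weight": [1, 3],
}


def select_training_samples(scored, threshold=70):
    """Keep (address, scene_type) pairs whose match score exceeds the threshold."""
    return [(addr, scene) for addr, scene, score in scored if score > threshold]


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # stand-in for "train a model and validate it"
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

An exhaustive grid grows multiplicatively with each parameter (36 combinations here), so the grid is usually kept small or searched with cross-validation tooling that can run in parallel.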
Based on the content of the above embodiments, as an alternative embodiment, splitting and extracting the address text in the training samples by using TF-IDF comprises the following steps:
performing word segmentation on the address text in the training samples;
and extracting, by using TF-IDF, the words whose weight for each scene type meets a preset weight threshold.
Specifically, word segmentation is performed on the address text in the selected training samples, and stop words are removed from the segmentation results. Stop words are, to a certain extent, filter words, and can be roughly divided into two categories: 1. Words that are used very widely, even excessively, such as "I", "is" and "what" in English, or "I" and "just" in Chinese, which appear in almost every document. 2. Words that appear frequently in text but carry little meaning of their own, including auxiliary words, adverbs, prepositions and conjunctions, which only take effect within a complete sentence. Appropriately reducing the frequency of stop words can effectively help improve keyword density.
The results with stop words removed are grouped by scene type to form a new result set. TF-IDF is then used to assign weights to the words in the address text for each scene type, and the top 500 words are selected to form a word vector. The samples are reconstructed from the word vector and a training set is generated. The TF-IDF method counts the frequency of a word in one text and compares it with the frequency of the word across all texts, so as to find the words that best describe a category. Words that are descriptive of a category are words that are relatively representative of it: for the catering scene category, for example, words such as hot pot, soup base and beverage tend to be extracted and given higher weights.
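The weighting and vectorization can be sketched as follows, under several assumptions: the address text is already segmented into tokens (a Chinese word segmenter would be needed in practice), the pooled tokens of each scene type are treated as one document, the unsmoothed formula tf × log(N / df) is used as one common TF-IDF variant, and the top words are taken per scene type and merged into one vocabulary. The stop-word list and function names are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"的", "号", "路"}  # illustrative stop words only


def top_terms_by_scene(tokens_by_scene, top_k=500):
    """Score words per scene type with TF-IDF and keep the top_k words of each type."""
    counts = {
        scene: Counter(tok for tok in tokens if tok not in STOP_WORDS)
        for scene, tokens in tokens_by_scene.items()
    }
    n_scenes = len(counts)
    # Document frequency: in how many scene types each word occurs.
    doc_freq = Counter(word for counter in counts.values() for word in counter)
    top_terms = {}
    for scene, counter in counts.items():
        total = sum(counter.values()) or 1
        scores = {
            word: (freq / total) * math.log(n_scenes / doc_freq[word])
            for word, freq in counter.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        top_terms[scene] = ranked[:top_k]
    return top_terms


def build_vocabulary(top_terms):
    """Merge the per-scene word lists into one sorted vocabulary."""
    return sorted({word for words in top_terms.values() for word in words})


def vectorize(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tokenized address."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

With this variant, a word that occurs in every scene type gets a weight of zero, which matches the intent of keeping only words that are descriptive of a category.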
Based on the content of the above embodiments, as an alternative embodiment, the method further comprises the following step:
if text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order fails, inputting the text that failed to match, as a target word vector, into the trained xgboost model to predict the consumption scene category.
Specifically, if address texts still fail to match after the text address matching processing of the foregoing embodiments, the texts that failed to match are converted into target word vectors and input into the trained xgboost model to predict the consumption scene category.
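The overall fallback order (rules first, then standard-library matching, then the model) can be sketched as a small cascade. The three stages are passed in as callables so the sketch stays independent of any particular implementation; the function name, the `min_similarity` cut-off and the returned stage label are assumptions made for illustration.

```python
def classify_address(address, rule_classifier, library_matcher, model_predictor,
                     min_similarity=0.7):
    """Classify an address by trying each stage in turn.

    rule_classifier(address)  -> scene type or None
    library_matcher(address)  -> (scene type or None, similarity in [0, 1])
    model_predictor(address)  -> scene type (always answers)
    Returns (scene_type, stage), where stage records which stage decided.
    """
    scene = rule_classifier(address)
    if scene is not None:
        return scene, "rule"
    scene, similarity = library_matcher(address)
    if scene is not None and similarity >= min_similarity:
        return scene, "library"
    # Both text-matching stages failed, so fall back to the trained model.
    return model_predictor(address), "model"
```

Recording which stage produced the label makes it easy to measure how often the model fallback is actually needed.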
According to a further aspect of the present invention, an apparatus for classifying a consumption scenario is provided. Referring to fig. 2, fig. 2 is a block diagram of an apparatus for classifying a consumption scenario provided by an embodiment of the present invention. The apparatus is used to perform the classification of consumption scenes described in the foregoing embodiments. Therefore, the description and definitions of the method for classifying a consumption scenario in the foregoing embodiments may be used to understand each execution module in this embodiment.
The device includes:
the data information acquisition module 201, configured to acquire classified data information under each category; the data information includes: scene name, scene type, province, city and address;
the preprocessing module 202, configured to perform data preprocessing on the data information;
the storage module 203, configured to store the preprocessed data information in a database;
the delivery address acquisition module 204, configured to acquire the delivery address of the current user's order;
the text matching module 205, configured to perform text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and to determine the consumption scene category to which the delivery address of the current user's order belongs.
Specifically, the specific process of each module in the apparatus of this embodiment to implement its function may refer to the related description in the corresponding method embodiment, and is not described herein again.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data calculation, the embodiment of the invention makes good use of the strengths of each technique and lets them compensate for one another's weaknesses, ensuring that scene classification can be realized stably. The external experience obtained is combined with internal data through text matching, and the limited precision of text matching is compensated by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption scenes of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target service groups, provides an effective data basis for reaching consumers, and achieves a win-win for consumers and the brand.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 3, the electronic device includes: a processor 301, a memory 302, and a bus 303;
the processor 301 and the memory 302 communicate with each other through the bus 303; the processor 301 is configured to call the program instructions in the memory 302 to execute the method for classifying a consumption scenario provided by the foregoing embodiments, for example including: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's order; and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and determining the consumption scene category to which the delivery address of the current user's order belongs.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of classifying a consumption scenario, for example including: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's order; and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's order, and determining the consumption scene category to which the delivery address of the current user's order belongs.
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, specific embodiments are used herein to explain the principle and implementation of the present invention, and the above description of the embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A method of classifying a consumption scenario, the method comprising:
acquiring the classified data information under each category, the data information including: scene name, scene type, province, city and address;
preprocessing the data information;
storing the preprocessed data information in a database;
acquiring the receiving address of a current user's purchase;
and performing text matching between the address information of the preprocessed data information stored in the database and the receiving address of the user's purchase, and determining the consumption scene category to which the receiving address of the current user's purchase belongs.
2. The method according to claim 1, wherein acquiring the classified data information under each category comprises:
acquiring, within a preset time, the data information corresponding to each category of a platform information website.
3. The method of claim 1, wherein pre-processing the data information comprises:
carrying out data structuring and data cleaning on the data information to obtain processed data information;
and calling a map interface to add longitude and latitude data to the address information in the processed data information.
4. The method of claim 1, wherein performing text matching between the address information of the preprocessed data information stored in the database and the receiving address of the user's purchase comprises:
matching the scene type of the receiving address by using regular expressions and a longest common substring algorithm;
if the matching fails, selecting, as training samples, texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold;
splitting and extracting the address text in the training samples by using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
5. The method of claim 4, wherein matching the scene type of the receiving address by using regular expressions and a longest common substring algorithm comprises:
performing basic classification on the receiving address by using a regular expression to determine the scene type; or
acquiring address information within a preset distance from a standard library, and performing text matching between the longitude and latitude data of the receiving address and the address information within the preset distance by using a longest common substring algorithm, to determine the scene type.
6. The method of claim 1, further comprising:
if the text matching between the address information of the preprocessed data information stored in the database and the receiving address of the user's purchase fails, inputting the text that failed to match, as a target word vector, into an xgboost model for training, and predicting the consumption scene category.
7. The method of claim 4, wherein the splitting and extracting address text in the training samples using TF-IDF comprises:
performing word segmentation on the address text in the training samples by using TF-IDF;
and extracting, according to the weight that each scene type gives to the words in the address text, the words that meet a preset weight threshold.
8. An apparatus for classifying a consumption scenario, the apparatus comprising:
a data information acquisition module, configured to acquire the classified data information under each category, the data information including: scene name, scene type, province, city and address;
a preprocessing module, configured to preprocess the data information;
a storage module, configured to store the preprocessed data information in a database;
a receiving address acquisition module, configured to acquire the receiving address of a current user's purchase;
and a text matching module, configured to perform text matching between the address information of the preprocessed data information stored in the database and the receiving address of the user's purchase, and to determine the consumption scene category to which the receiving address of the current user's purchase belongs.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the method of classifying consumption scenarios according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of classifying consumption scenarios according to any one of claims 1 to 7.
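Claims 4 and 7 describe using TF-IDF to split address texts and keep the words whose weight meets a preset threshold for each scene type. A minimal sketch of that idea follows, under stated assumptions: the `tokenize` and `tfidf_keywords` names, the choice to treat all addresses of one scene type as a single document, the `min_weight` default, and the sample data are hypothetical. The whitespace-style tokenizer is a stand-in, and the subsequent xgboost training step is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case alphanumeric split; a stand-in for a real word segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(samples, min_weight=0.05):
    """samples: iterable of (address_text, scene_type) pairs.

    Treats all addresses of one scene type as a single document and returns
    {scene_type: {word: weight}} for words whose TF-IDF weight >= min_weight.
    """
    counts = defaultdict(Counter)
    for text, scene_type in samples:
        counts[scene_type].update(tokenize(text))

    n_docs = len(counts)
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())

    keywords = {}
    for scene_type, counter in counts.items():
        total = sum(counter.values())
        weights = {}
        for word, n in counter.items():
            tf = n / total
            # idf is 0 for words that occur under every scene type
            idf = math.log(n_docs / doc_freq[word])
            if tf * idf >= min_weight:
                weights[word] = tf * idf
        keywords[scene_type] = weights
    return keywords


samples = [
    ("Building 2 North Campus Dormitory", "school"),
    ("Room 101 Teaching Building East Campus", "school"),
    ("Building 5 Garden Community", "residential"),
    ("Unit 3 Riverside Community", "residential"),
]
for scene_type, words in tfidf_keywords(samples, min_weight=0.1).items():
    print(scene_type, sorted(words))
```

Words that appear under every scene type, such as "building" in the sample data, receive a weight of zero and are dropped, which is the property that makes the remaining words useful as features for a classifier.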
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010754949.6A CN111882224A (en) | 2020-07-30 | 2020-07-30 | Method and device for classifying consumption scenes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010754949.6A CN111882224A (en) | 2020-07-30 | 2020-07-30 | Method and device for classifying consumption scenes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111882224A (en) | 2020-11-03 |
Family
ID=73204796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010754949.6A Pending CN111882224A (en) | 2020-07-30 | 2020-07-30 | Method and device for classifying consumption scenes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111882224A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150154537A1 (en) * | 2013-11-29 | 2015-06-04 | International Business Machines Corporation | Categorizing a use scenario of a product |
CN106296318A (en) * | 2015-04-26 | 2017-01-04 | 上海墨盾电脑科技有限公司 | A kind of ecommerce scene process method and system |
CN109816134A (en) * | 2017-11-22 | 2019-05-28 | 北京京东尚科信息技术有限公司 | Shipping address prediction technique, device and storage medium |
CN110197188A (en) * | 2018-02-26 | 2019-09-03 | 北京京东尚科信息技术有限公司 | Method, system, equipment and the storage medium of business scenario prediction, classification |
CN110765280A (en) * | 2019-10-22 | 2020-02-07 | 京东数字科技控股有限公司 | Address recognition method and device |
CN111274382A (en) * | 2018-11-20 | 2020-06-12 | 北京京东尚科信息技术有限公司 | Text classification method, device, equipment and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113537387A (en) * | 2021-08-04 | 2021-10-22 | 北京思特奇信息技术股份有限公司 | Model design method and device for Internet online operation activities and computer equipment |
CN113779331A (en) * | 2021-09-26 | 2021-12-10 | 京东城市(北京)数字科技有限公司 | Address alias identification method and device, electronic equipment and computer storage medium |
CN113779331B (en) * | 2021-09-26 | 2024-02-06 | 京东城市(北京)数字科技有限公司 | Address alias identification method and device, electronic equipment and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||