CN111882224A - Method and device for classifying consumption scenes - Google Patents

Method and device for classifying consumption scenes

Info

Publication number
CN111882224A
Authority
CN
China
Prior art keywords
address
data information
text
scene
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010754949.6A
Other languages
Chinese (zh)
Inventor
周陶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunplus Information Technology Chengdu Co ltd
Original Assignee
Sunplus Information Technology Chengdu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunplus Information Technology Chengdu Co ltd filed Critical Sunplus Information Technology Chengdu Co ltd
Priority to CN202010754949.6A priority Critical patent/CN111882224A/en
Publication of CN111882224A publication Critical patent/CN111882224A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315Needs-based resource requirements planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/243Natural language query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for classifying consumption scenes. The method comprises: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's purchase; and performing text matching between the address information of the preprocessed data stored in the database and that delivery address to determine the consumption scene category to which the delivery address belongs. The scheme requires no change to the business process: the user only fills in the delivery address as in the normal ordering flow, and the analysis of liquor consumption scenes takes place without the user noticing, so that the operator can improve the business process, analyze user needs and adjust its operating strategy.

Description

Method and device for classifying consumption scenes
Technical Field
The invention relates to the field of computers, in particular to a method and a device for classifying consumption scenes.
Background
In the past, it was very difficult for producers of goods and providers of services to find the people who needed them. The common practice was to advertise widely, but the people reached were not necessarily those who needed the product or service, and poorly targeted distribution was a huge waste of social resources.
Meanwhile, providers of goods and services need to distribute them across regions. How to distribute them without waste, so that the people who need them receive them in time, is a problem that urgently needs to be solved.
Disclosure of Invention
To solve the prior-art problem that products cannot be delivered in time to the users who need them, the invention provides a method and a device for classifying consumption scenes.
In a first aspect, the present invention provides a method for classifying a consumption scene, the method comprising:
acquiring classified data information under each category, the data information including: scene name, scene type, province, city and address;
performing data preprocessing on the data information;
storing the preprocessed data information in a database;
acquiring the delivery address of the current user's purchase;
and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
Further, acquiring the classified data information under each category comprises:
acquiring, within a preset time, the data information corresponding to each category from platform information websites.
Further, performing data preprocessing on the data information comprises:
performing data structuring and data cleaning on the data information to obtain processed data information;
and calling a map interface to add longitude and latitude data to the address information in the processed data information.
Further, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase comprises:
matching the scene category of the delivery address by using regular expressions and the longest common substring algorithm;
if the matching fails, selecting texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold as training samples;
splitting and extracting the address text in the training samples by using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
Further, matching the scene category of the delivery address by using regular expressions and the longest common substring algorithm comprises:
performing a basic classification of the delivery address by using regular expressions to determine the scene category; or
acquiring, from a standard library, address information within a preset distance of the longitude and latitude of the delivery address, and performing text matching between the delivery address and that address information by using the longest common substring algorithm to determine the scene category.
Further, the method further comprises:
if the text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase fails, inputting the text that failed to match into the xgboost model as a target word vector, and predicting the consumption scene category.
Further, splitting and extracting the address text in the training samples by using TF-IDF comprises:
performing word segmentation on the address text in the training samples;
and extracting the words that meet a preset weight threshold according to the TF-IDF weight given to the words in the address text for each scene type.
In a second aspect, the present invention provides an apparatus for classifying a consumption scene, the apparatus comprising:
a data information acquisition module, configured to acquire classified data information under each category, the data information including: scene name, scene type, province, city and address;
a preprocessing module, configured to perform data preprocessing on the data information;
a storage module, configured to store the preprocessed data information in a database;
a delivery address acquisition module, configured to acquire the delivery address of the current user's purchase;
and a text matching module, configured to perform text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and to determine the consumption scene category to which that delivery address belongs.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for classifying a consumption scenario provided in the first aspect when executing the program.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of classifying a consumption scenario as provided in the first aspect.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data computation, the invention draws on the strengths of each technique so that they compensate for one another's weaknesses, ensuring that scene classification can be realized and is stable. External knowledge obtained from outside sources is combined with internal data through text matching, and the limited precision of text matching is compensated for by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption venues of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target customer groups, provides an effective data basis for reaching consumers, and achieves a win-win outcome for consumers and the brand.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for classifying a consumption scenario according to an embodiment of the present invention;
fig. 2 is a block diagram of an apparatus for classifying consumption scenarios according to an embodiment of the present invention;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
With the development of the internet and the later spread of the mobile internet, most people have become used to being online at all times. They can shop and socialize quickly and conveniently on various platforms, leaving footprints on the internet as they do so. A producer of liquor brands, as a product provider, also wants to know where the people who need its products are. By sorting and classifying this information and matching it against the characteristics of the services and products offered, the embodiment of the invention identifies where those people are, so that the provider can reach them accurately, reduce waste and improve coordination efficiency. More specifically, as shown in fig. 1, an embodiment of the present invention provides a method for classifying a consumption scene, the method including:
step S101, acquiring classified data information under each category, the data information including: scene name, scene type, province, city and address;
step S102, performing data preprocessing on the data information;
step S103, storing the preprocessed data information in a database;
step S104, acquiring the delivery address of the current user's purchase;
and step S105, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
Specifically, the embodiment of the present invention preferably classifies the consumption scenes of liquor products. A crawler first collects data information: the main liquor consumption scene addresses are divided into three categories (offices, residences and restaurants); the main platform information websites for each category are selected (including platforms such as 58.com, Meituan and Dianping); crawler tasks are set up one by one; and data is crawled periodically (or within a preset time) by an online server. The data includes: scene name, scene type, province, city and detailed address. The crawled data continuously brings in external knowledge: each crawl yields new data, samples accumulate continuously through the text matching algorithm, and the accuracy of the text analysis algorithm ultimately improves.
The crawled data is structured and stored in an online database, and preliminary data cleaning is performed by a scheduled process. The cleaning includes removing duplicate data, garbled characters and special symbols. A common map interface, such as Baidu Maps or Amap (Gaode), is then called to obtain the longitude and latitude of each address, which are stored in the database.
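The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the field names in `FIELDS`, the set of characters treated as "special symbols", and the helper names `clean_text` and `clean_records` are all assumptions made for the example. The geocoding call is omitted because the text does not specify the map interface's API.

```python
import re
import unicodedata

# Hypothetical record layout: the five fields named in the text.
FIELDS = ("scene_name", "scene_type", "province", "city", "address")

# Assumption: keep CJK ideographs, ASCII letters and digits, and a few
# separators that are common in addresses; everything else is a "special symbol".
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9#\-()]")


def clean_text(value: str) -> str:
    """Normalize full-width variants, drop the Unicode replacement character
    (a typical trace of garbled encoding) and strip special symbols."""
    value = unicodedata.normalize("NFKC", value)
    value = value.replace("\ufffd", "")
    return _DISALLOWED.sub("", value)


def clean_records(records):
    """Return cleaned records with exact duplicates removed, order preserved."""
    seen = set()
    cleaned = []
    for record in records:
        row = {field: clean_text(record.get(field, "")) for field in FIELDS}
        if not row["address"]:
            continue  # no usable address left, so nothing to match against later
        key = tuple(row[field] for field in FIELDS)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Deduplicating after normalization, rather than before, means that two crawled rows differing only in stray symbols or full-width digits collapse into one record.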
The delivery address of the target user's purchase is then acquired and text-matched against the address information of the preprocessed data stored in the database, and the consumption scene category to which that delivery address belongs is determined from the text matching result.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data computation, the embodiment of the invention draws on the strengths of each technique so that they compensate for one another's weaknesses, ensuring that scene classification can be realized and is stable. External knowledge obtained from outside sources is combined with internal data through text matching, and the limited precision of text matching is compensated for by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption venues of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target customer groups, provides an effective data basis for reaching consumers, and achieves a win-win outcome for consumers and the brand.
Based on the content of the above embodiments, as an alternative embodiment, performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase comprises the following steps:
matching the scene category of the delivery address by using regular expressions and the longest common substring algorithm;
if the matching fails, selecting texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold as training samples;
splitting and extracting the address text in the training samples by using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
Specifically, addresses with an obvious category are classified using regular expressions: if an address contains words that carry clear scene information (the main scene in the embodiment of the invention is liquor consumption), it is assigned to that scene. For example, addresses containing words such as hot pot, skewers or KTV can be assigned to the catering scene type.
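A rule-based classifier of this kind might look like the sketch below. Only the restaurant keywords (hot pot, skewers, KTV) come from the text; the office and residence keywords, the category labels and the function name `classify_by_regex` are hypothetical and would need to be tuned on real addresses.

```python
import re

# Hypothetical keyword rules, checked in insertion order (first match wins).
# Only the restaurant keywords are taken from the description.
SCENE_PATTERNS = {
    "restaurant": re.compile(r"火锅|串串|ktv", re.IGNORECASE),
    "office": re.compile(r"写字楼|大厦|办公"),
    "residence": re.compile(r"小区|公寓|单元"),
}


def classify_by_regex(address: str):
    """Return the first scene whose pattern occurs in the address, else None."""
    for scene, pattern in SCENE_PATTERNS.items():
        if pattern.search(address):
            return scene
    return None  # no obvious keyword: leave the address for the later stages
```

Returning `None` rather than a default category keeps ambiguous addresses available for the similarity matching and model stages.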
When the longest common substring algorithm is used to match the scene category of a delivery address, address information within a preset distance (for example, within 1 km) is first retrieved from the standard library according to the longitude and latitude of the delivery address. The similarity between the delivery address and each of these addresses is then calculated one by one with the longest common substring algorithm, and the address with the highest similarity is used to determine the liquor consumption scene. The algorithm first finds the longest common substring of the two texts and then takes the length of that substring divided by the length of the longer of the two texts as the address similarity.
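The similarity defined above, longest common substring length divided by the length of the longer text, can be illustrated with a short sketch. The haversine great-circle distance used for the 1 km filter is an assumption (the text does not say how distance is computed), as are the entry keys `lat`, `lon`, `address` and the function names.

```python
import math


def longest_common_substring_length(a: str, b: str) -> int:
    """Dynamic programming over character pairs, keeping one previous row."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def address_similarity(a: str, b: str) -> float:
    """Common substring length divided by the length of the longer text."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0
    return longest_common_substring_length(a, b) / longer


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in kilometres (assumed distance measure)."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def best_match(address, lat, lon, standard_library, radius_km=1.0):
    """Return (entry, score) for the most similar nearby entry, or (None, 0.0)."""
    best, best_score = None, 0.0
    for entry in standard_library:
        if haversine_km(lat, lon, entry["lat"], entry["lon"]) > radius_km:
            continue  # outside the preset distance
        score = address_similarity(address, entry["address"])
        if score > best_score:
            best, best_score = entry, score
    return best, best_score
```

Dividing by the longer text penalizes a short address that happens to be fully contained in a much longer one, so a bare street name does not score as a perfect match against a full building address.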
If some addresses fail to match when the regular expressions and the longest common substring algorithm are applied to the delivery addresses, texts whose matching score under these two methods exceeds a preset threshold (for example, a score above 70) are selected as training samples. The address texts in the training samples are then split and extracted using TF-IDF, and the extracted text is input into an xgboost model for training, which outputs a training result. To train the xgboost model, important parameters are selected to build a grid for searching and the optimal model parameters are obtained; the parameters include the learning rate, the maximum depth, the number of estimators and the minimum leaf node sample weight. The model is then retrained with the optimal parameters, validated on the test set and saved.
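The sample selection and parameter grid described above can be sketched without committing to a particular training library. In the sketch below, the 0 to 100 score scale, the grid values and the `evaluate` callback are assumptions; the four keys mirror the parameters named in the text, using the names they carry in xgboost's scikit-learn style interface (`learning_rate`, `max_depth`, `n_estimators`, `min_child_weight`). In practice `evaluate` would train a model with the given parameters and return a validation score.

```python
from itertools import product


def select_training_samples(scored, threshold=70.0):
    """Keep (address, scene) pairs whose matching score exceeds the threshold.

    `scored` is an iterable of (address, scene, score) triples; the 0-100
    score scale is an assumption based on the "above 70" example.
    """
    return [(address, scene) for address, scene, score in scored if score > threshold]


# Illustrative grid values only; the text names the parameters, not their ranges.
PARAM_GRID = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6],
    "n_estimators": [100, 300],
    "min_child_weight": [1, 5],
}


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return (best_params, best_score).

    `evaluate` is a caller-supplied function mapping a parameter dict to a
    score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Keeping the evaluation behind a callback separates the search loop from the model, so the same loop works whether the score comes from a held-out split or from cross-validation.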
Based on the content of the above embodiments, as an alternative embodiment, splitting and extracting the address text in the training samples by using TF-IDF comprises the following steps:
performing word segmentation on the address text in the training samples;
and extracting the words that meet a preset weight threshold according to the TF-IDF weight given to the words in the address text for each scene type.
Specifically, the address texts in the selected training samples are segmented into words, and stop words are removed from the segmentation results. Stop words act, to some extent, as filter words and can be roughly divided into two categories: 1. Words that are used extremely widely, even excessively, such as the English "I", "is" and "what" or the Chinese equivalents of "I" and "just", which appear in almost every document. 2. Words that appear frequently in text but carry little meaning of their own, such as auxiliary words, adverbs, prepositions and conjunctions, which only take effect within a complete sentence. Appropriately reducing the frequency of stop words effectively helps to increase keyword density.
The results with stop words removed are grouped by scene type into a new result set. TF-IDF is used to assign weights to the words in the address texts for each scene type, and the top 500 words are selected to form a word vector. The samples are then reconstructed from the word vector to generate a training set. The TF-IDF method counts how often a word occurs in a text and compares this with how often it occurs across all texts, thereby finding the words that best describe a category. A word that describes a category is one that is relatively representative of it; for the catering scene category, for example, words such as hot pot, soup base and beverage tend to be extracted and given higher weights.
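A rough sketch of the per-category weighting follows. It assumes the addresses have already been segmented into tokens by some word segmenter, treats each scene type as one document, and uses a smoothed IDF of the form log((1 + N) / (1 + df)) + 1; the text fixes none of these details, and the stop-word list and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"的", "和", "在"}


def tfidf_by_category(tokens_by_category):
    """Map each scene type to {word: weight}, treating a category as one document."""
    counts = {
        category: Counter(t for t in tokens if t not in STOP_WORDS)
        for category, tokens in tokens_by_category.items()
    }
    n_categories = len(counts)
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())  # number of categories containing each word
    weights = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weights[category] = {
            word: (freq / total) * (math.log((1 + n_categories) / (1 + doc_freq[word])) + 1)
            for word, freq in counter.items()
        }
    return weights


def build_vocabulary(weights, top_k=500):
    """Rank words by their highest weight in any category and keep the top_k."""
    best = {}
    for category_weights in weights.values():
        for word, weight in category_weights.items():
            best[word] = max(weight, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def vectorize(tokens, vocabulary):
    """Turn one segmented address into a count vector over the vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

Words shared by every category, such as a generic word for "road", receive the lowest IDF and therefore fall toward the bottom of the ranking, which is the behaviour the description attributes to TF-IDF.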
Based on the content of the above embodiments, as an alternative embodiment, the method further comprises the following step:
if the text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase fails, inputting the text that failed to match into the xgboost model as a target word vector, and predicting the consumption scene category.
Specifically, if some address texts still fail to match after the text matching described in the foregoing embodiments, those texts are converted into target word vectors and input into the xgboost model, which predicts the consumption scene category.
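Taken together, the stages described in these embodiments suggest a fallback chain, sketched below under stated assumptions. The ordering (regular expressions first, then similarity matching, then the model), the `min_similarity` cut-off and the three callables are all hypothetical; the text only says that addresses which fail text matching are passed to the model.

```python
def classify_address(address, rule_matcher, similarity_matcher, model_predict,
                     min_similarity=0.7):
    """Return (scene, stage) using the first stage that yields a result.

    rule_matcher(address)       -> scene or None
    similarity_matcher(address) -> (scene or None, similarity score)
    model_predict(address)      -> scene (always returns a prediction)
    """
    scene = rule_matcher(address)
    if scene is not None:
        return scene, "regex"
    scene, score = similarity_matcher(address)
    if scene is not None and score >= min_similarity:
        return scene, "lcs"
    # Both text-matching stages failed, so fall back to the trained model.
    return model_predict(address), "model"
```

Returning the stage name alongside the category makes it easy to measure how many addresses each stage resolves.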
According to a further aspect of the present invention, an apparatus for classifying a consumption scene is provided. Referring to fig. 2, fig. 2 is a block diagram of the apparatus provided by an embodiment of the present invention. The apparatus performs the consumption scene classification described in the foregoing embodiments, so the descriptions and definitions given for the method in those embodiments apply to each execution module of the apparatus.
The device includes:
the data information acquiring module 201, configured to acquire classified data information under each category, the data information including: scene name, scene type, province, city and address;
the preprocessing module 202, configured to perform data preprocessing on the data information;
the storage module 203, configured to store the preprocessed data information in a database;
the delivery address obtaining module 204, configured to obtain the delivery address of the current user's purchase;
the text matching module 205, configured to perform text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and to determine the consumption scene category to which that delivery address belongs.
Specifically, the specific process of each module in the apparatus of this embodiment to implement its function may refer to the related description in the corresponding method embodiment, and is not described herein again.
By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data computation, the embodiment of the invention draws on the strengths of each technique so that they compensate for one another's weaknesses, ensuring that scene classification can be realized and is stable. External knowledge obtained from outside sources is combined with internal data through text matching, and the limited precision of text matching is compensated for by the text analysis algorithm. Meanwhile, a big data analysis method compares the consumption venues of different regions in combination with information such as people's consumption behavior, which helps a brand understand the consumption scenes of its target customer groups, provides an effective data basis for reaching consumers, and achieves a win-win outcome for consumers and the brand.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 3, the electronic device includes: a processor 301, a memory 302, and a bus 303;
the processor 301 and the memory 302 communicate with each other through the bus 303; the processor 301 is configured to call the program instructions in the memory 302 to execute the method for classifying a consumption scene provided by the foregoing embodiments, for example: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's purchase; and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for classifying a consumption scene, for example: acquiring classified data information under each category, the data information including scene name, scene type, province, city and address; performing data preprocessing on the data information; storing the preprocessed data information in a database; acquiring the delivery address of the current user's purchase; and performing text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, specific embodiments have been used herein to explain the principle and implementation of the present invention, and the above description of the embodiments is intended only to aid understanding of the method and core idea of the present invention. A person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of classifying a consumption scenario, the method comprising:
acquiring data information under each category after classification; the data information includes: scene name, scene type, province, city and address;
carrying out data preprocessing on the data information;
storing the preprocessed data information into a database;
acquiring the delivery address of the current user's purchase;
and text-matching the address information of the preprocessed data information stored in the database against the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
2. The method according to claim 1, wherein acquiring the data information under each category after classification comprises:
acquiring, within a preset time, the data information corresponding to each category of a platform information website.
3. The method of claim 1, wherein pre-processing the data information comprises:
carrying out data structuring and data cleaning on the data information to obtain processed data information;
and calling a map interface to add longitude and latitude data to the address information in the processed data information.
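As a rough illustration of the preprocessing recited in claim 3, the sketch below structures raw rows into named fields, cleans them, drops duplicates, and attaches coordinates. The claim does not name a map service, so `geocode` is a caller-supplied function; `fake_geocode`, the field names, and the coordinates it returns are placeholders invented for this example.

```python
FIELDS = ("scene_name", "scene_type", "province", "city", "address")


def structure(raw_row):
    """Data structuring: map a raw sequence of five values onto named fields."""
    return dict(zip(FIELDS, raw_row))


def clean(record):
    """Data cleaning: collapse whitespace; reject records without an address."""
    cleaned = {key: " ".join(str(value).split()) for key, value in record.items()}
    return cleaned if cleaned.get("address") else None


def preprocess(raw_rows, geocode):
    """Structure, clean, de-duplicate, and geocode the raw rows."""
    seen = set()
    result = []
    for row in raw_rows:
        record = clean(structure(row))
        if record is None:
            continue
        key = (record["city"], record["address"])
        if key in seen:
            continue
        seen.add(key)
        # geocode stands in for the map interface; it returns (lat, lng) or None.
        coords = geocode(record["province"], record["city"], record["address"])
        record["lat"], record["lng"] = coords if coords else (None, None)
        result.append(record)
    return result


def fake_geocode(province, city, address):
    """Placeholder for a real map service call; coordinates are made up."""
    table = {("Chengdu", "1 Example Road"): (30.0, 104.0)}
    return table.get((city, address))
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular map provider and makes it easy to test offline.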
4. The method of claim 1, wherein text-matching the address information of the preprocessed data information stored in the database against the delivery address of the user's purchase comprises:
matching the scene type of the delivery address using regular expressions and a longest common substring algorithm;
if the matching fails, selecting, as training samples, texts whose matching score under the regular expressions and the longest common substring algorithm is greater than a preset threshold;
splitting and extracting the address text in the training samples using TF-IDF;
and inputting the extracted text into an xgboost model for training, and outputting a training result.
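The matching score and sample selection in claim 4 could look roughly like the sketch below. The claim does not define how the score is computed, so normalizing the longest-common-substring length by the shorter string is an assumption, as are the function names and the reading that texts scoring above the threshold are kept, with the label of their best-scoring reference, as training samples.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def match_score(address, reference):
    """Assumed score: LCS length divided by the shorter string's length."""
    shorter = min(len(address), len(reference))
    if shorter == 0:
        return 0.0
    return longest_common_substring(address, reference) / shorter


def select_training_samples(addresses, references, threshold):
    """Keep (address, scene_type) pairs whose best score exceeds the threshold.

    references is a list of (reference_text, scene_type) pairs.
    """
    samples = []
    for address in addresses:
        best_type, best_score = None, 0.0
        for reference_text, scene_type in references:
            score = match_score(address, reference_text)
            if score > best_score:
                best_type, best_score = scene_type, score
        if best_score > threshold:
            samples.append((address, best_type))
    return samples
```

The selected pairs would then feed the TF-IDF extraction and xgboost training steps that the claim goes on to recite.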
5. The method of claim 4, wherein matching the scene type of the delivery address using regular expressions and a longest common substring algorithm comprises:
performing a basic classification of the delivery address using regular expressions to determine the scene type; or
acquiring address information within a preset distance from a standard library, and text-matching the longitude and latitude data of the delivery address against the address information within the preset distance using the longest common substring algorithm to determine the scene type.
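The two alternatives in claim 5 might be sketched as below. The regular-expression patterns, the haversine distance filter, the 1 km radius, and the minimum overlap length are illustrative assumptions; the claim specifies only "a preset distance" and a longest common substring comparison. The substring length is taken from the standard library's `difflib.SequenceMatcher.find_longest_match`.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical keyword patterns for the basic regular-expression classification.
SCENE_PATTERNS = [
    ("school", re.compile(r"university|college|school", re.IGNORECASE)),
    ("hospital", re.compile(r"hospital|clinic", re.IGNORECASE)),
]


def classify_by_regex(address):
    """First alternative: return the first scene type whose pattern matches."""
    for scene_type, pattern in SCENE_PATTERNS:
        if pattern.search(address):
            return scene_type
    return None


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def lcs_length(a, b):
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def classify_by_nearby(address, lat, lng, standard_library, max_km=1.0, min_len=4):
    """Second alternative: compare only against entries within max_km.

    standard_library is an iterable of dicts with the keys
    "address", "scene_type", "lat" and "lng".
    """
    best_type, best_len = None, 0
    for entry in standard_library:
        if haversine_km(lat, lng, entry["lat"], entry["lng"]) > max_km:
            continue
        size = lcs_length(address.lower(), entry["address"].lower())
        if size > best_len:
            best_type, best_len = entry["scene_type"], size
    return best_type if best_len >= min_len else None
```

Filtering by distance first keeps the text comparison confined to a small candidate set, which is presumably why the claim pairs the coordinates with the substring match.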
6. The method of claim 1, further comprising:
if text matching between the address information of the preprocessed data information stored in the database and the delivery address of the user's purchase fails, inputting the text that failed to match, as a target word vector, into an xgboost model for training, and predicting the consumption scene category.
7. The method of claim 4, wherein the splitting and extracting address text in the training samples using TF-IDF comprises:
performing word segmentation processing on the address text in the training sample by using TF-IDF;
and according to the weight given to the words in the address text by each scene type, extracting the words meeting a preset weight threshold value.
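One way to picture the weighting in claim 7 is to pool the address text of each scene type into a single document and compute plain TF-IDF per scene type, as sketched below. Whitespace tokenization stands in for word segmentation (Chinese addresses would need a dedicated segmenter), and the unsmoothed formula, function names, and threshold are assumptions rather than details from the claim.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_by_scene(samples):
    """Compute a TF-IDF weight for every word within every scene type.

    samples is a list of (address_text, scene_type) pairs. All addresses of
    one scene type are pooled into one document. Returns
    {scene_type: {word: weight}}.
    """
    counts = defaultdict(Counter)
    for text, scene_type in samples:
        counts[scene_type].update(tokenize(text))

    # Document frequency: in how many scene types each word appears.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    n_scenes = len(counts)
    weights = {}
    for scene_type, word_counts in counts.items():
        total = sum(word_counts.values())
        weights[scene_type] = {
            word: (n / total) * math.log(n_scenes / doc_freq[word])
            for word, n in word_counts.items()
        }
    return weights


def extract_keywords(weights, threshold):
    """Keep, per scene type, the words whose weight meets the threshold."""
    return {
        scene_type: sorted(w for w, value in word_weights.items() if value >= threshold)
        for scene_type, word_weights in weights.items()
    }
```

With this formula a word that appears under every scene type receives weight zero, so only words that help distinguish one scene type from another survive the threshold.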
8. An apparatus for classifying a consumption scenario, the apparatus comprising:
the data information acquisition module is used for acquiring the classified data information under each category; the data information includes: scene name, scene type, province, city and address;
the preprocessing module is used for preprocessing the data information;
the storage module is used for storing the preprocessed data information into a database;
the delivery address acquisition module is used for acquiring the delivery address of the current user's purchase;
and the text matching module is used for text-matching the address information of the preprocessed data information stored in the database against the delivery address of the user's purchase, and determining the consumption scene category to which that delivery address belongs.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the method of classifying consumption scenarios according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of classifying consumption scenarios according to any one of claims 1 to 7.
CN202010754949.6A 2020-07-30 2020-07-30 Method and device for classifying consumption scenes Pending CN111882224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010754949.6A CN111882224A (en) 2020-07-30 2020-07-30 Method and device for classifying consumption scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010754949.6A CN111882224A (en) 2020-07-30 2020-07-30 Method and device for classifying consumption scenes

Publications (1)

Publication Number Publication Date
CN111882224A (en) 2020-11-03

Family

ID=73204796

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202010754949.6A Pending CN111882224A (en) 2020-07-30 2020-07-30 Method and device for classifying consumption scenes

Country Status (1)

Country Link
CN (1) CN111882224A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150154537A1 (en) * 2013-11-29 2015-06-04 International Business Machines Corporation Categorizing a use scenario of a product
CN106296318A (en) * 2015-04-26 2017-01-04 上海墨盾电脑科技有限公司 A kind of ecommerce scene process method and system
CN109816134A (en) * 2017-11-22 2019-05-28 北京京东尚科信息技术有限公司 Shipping address prediction technique, device and storage medium
CN110197188A (en) * 2018-02-26 2019-09-03 北京京东尚科信息技术有限公司 Method, system, equipment and the storage medium of business scenario prediction, classification
CN110765280A (en) * 2019-10-22 2020-02-07 京东数字科技控股有限公司 Address recognition method and device
CN111274382A (en) * 2018-11-20 2020-06-12 北京京东尚科信息技术有限公司 Text classification method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537387A (en) * 2021-08-04 2021-10-22 北京思特奇信息技术股份有限公司 Model design method and device for Internet online operation activities and computer equipment
CN113779331A (en) * 2021-09-26 2021-12-10 京东城市(北京)数字科技有限公司 Address alias identification method and device, electronic equipment and computer storage medium
CN113779331B (en) * 2021-09-26 2024-02-06 京东城市(北京)数字科技有限公司 Address alias identification method and device, electronic equipment and computer storage medium

Similar Documents

Publication Title
CN105247507B (en) Method, system and storage medium for the influence power score for determining brand
CN106682169B (en) Application label mining method and device, application searching method and server
CN107862022B (en) Culture resource recommendation system
CN107424043A (en) A kind of Products Show method and device, electronic equipment
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN102760128A (en) Telecommunication field package recommending method based on intelligent customer service robot interaction
CN106682170B (en) Application search method and device
CN103336766A (en) Short text garbage identification and modeling method and device
CN103324666A (en) Topic tracing method and device based on micro-blog data
CN110134845A (en) Project public sentiment monitoring method, device, computer equipment and storage medium
CN105069077A (en) Search method and device
KR20190128246A (en) Searching methods and apparatus and non-transitory computer-readable storage media
CN109978020A (en) A kind of social networks account vest identity identification method based on multidimensional characteristic
CN110362662A (en) Data processing method, device and computer readable storage medium
CN111882224A (en) Method and device for classifying consumption scenes
CN113204953A (en) Text matching method and device based on semantic recognition and device readable storage medium
CN108121741B (en) Website quality evaluation method and device
CN111666513A (en) Page processing method and device, electronic equipment and readable storage medium
CN107153697A (en) Product search method and device in a kind of commodity transaction website
CN116823410B (en) Data processing method, object processing method, recommending method and computing device
CN108959289B (en) Website category acquisition method and device
CN107665442B (en) Method and device for acquiring target user
CN116226494B (en) Crawler system and method for information search
CN110209804B (en) Target corpus determining method and device, storage medium and electronic device
CN115329078B (en) Text data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination