CN111882224A

CN111882224A - Method and device for classifying consumption scenes

Info

Publication number: CN111882224A
Application number: CN202010754949.6A
Authority: CN
Inventors: 周陶
Original assignee: Sunplus Information Technology Chengdu Co ltd
Current assignee: Sunplus Information Technology Chengdu Co ltd
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2020-11-03

Abstract

The invention provides a method and a device for classifying consumption scenes. The method comprises: acquiring the classified data information under each category, the data information including scene name, scene type, province, city and address; preprocessing the data information; storing the preprocessed data information in a database; acquiring the receiving address of the current user's purchase; and performing text matching between the address information of the preprocessed data stored in the database and that receiving address, to determine the consumption scene category to which the receiving address belongs. The scheme requires no change to the service flow: the user simply fills in a receiving address as in the normal service flow, and the analysis of liquor-consumption scenes takes place without the user noticing, so that the operator can improve the service flow, analyze user needs and adjust its operating strategy.

Description

Method and device for classifying consumption scenes

Technical Field

The invention relates to the field of computers, in particular to a method and a device for classifying consumption scenes.

Background

In the past, it was very difficult for producers of goods and providers of services to find the people who needed them. The common practice was to advertise widely, but the people reached were not necessarily those who needed the product or service, and poor distribution meant a huge waste of social resources.

Meanwhile, providers of goods and services need to distribute them across regions. How to distribute them without waste, so that the people who need them can receive them in time, is a problem that urgently needs to be solved.

Disclosure of Invention

In order to solve the problem that the produced products cannot be distributed to the demand users in time in the prior art, the invention provides a method and a device for classifying consumption scenes.

In a first aspect, the present invention provides a method for classifying a consumption scenario, the method comprising:

acquiring data information under each category after classification; the data information includes: scene name, scene type, province, city and address;

carrying out data preprocessing on the data information;

storing the preprocessed data information into a database;

acquiring a receiving address consumed by a current user;

and performing text matching on the address information of the preprocessed data information stored in the database and the receiving address consumed by the user, and determining the consumption scene category to which the receiving address consumed by the current user belongs.

Further, the acquiring of the data information under each category after classification includes:

and acquiring data information corresponding to each category of the platform information website within preset time.

Further, the data preprocessing the data information comprises:

carrying out data structuring and data cleaning on the data information to obtain processed data information;

and calling a map interface to add longitude and latitude data to the address information in the processed data information.

Further, the text matching of the address information of the preprocessed data information stored in the database and the delivery address consumed by the user comprises:

matching the scene types of the receiving addresses by using regular expressions and the longest common substring algorithm;

if the matching fails, selecting texts whose matching score under the regular expressions and the longest common substring algorithm is larger than a preset threshold value as training samples;

splitting and extracting the address text in the training sample by using TF-IDF;

and inputting the extracted text into an xgboost model for training, and outputting a training result.

Further, the matching of the scene category of the receiving address by using regular expressions and the longest common substring algorithm comprises the following steps:

carrying out basic classification on the receiving address by using a regular expression to determine the scene type; or

acquiring address information within a preset distance from a standard library according to the longitude and latitude data of the receiving address, and performing text matching between the receiving address and the address information within the preset distance by using the longest common substring algorithm to determine the scene type.

Further, the method further comprises:

and if text matching between the address information of the preprocessed data information stored in the database and the receiving address consumed by the user fails, inputting the text that failed to match, as a target word vector, into the xgboost model to predict the consumption scene category.

Further, the splitting and extracting of the address text in the training sample by using the TF-IDF comprises:

performing word segmentation processing on the address text in the training sample by using TF-IDF;

and according to the weight given to the words in the address text by each scene type, extracting the words meeting a preset weight threshold value.

In a second aspect, the present invention provides an apparatus for classifying a consumption scenario, the apparatus comprising:

the data information acquisition module is used for acquiring the classified data information under each category; the data information includes: scene name, scene type, province, city and address;

the preprocessing module is used for preprocessing data information;

the storage module is used for storing the preprocessed data information into a database;

the receiving address acquisition module is used for acquiring a receiving address consumed by a current user;

and the text matching module is used for performing text matching on the address information of the preprocessed data information stored in the database and the goods receiving address consumed by the user, and determining the consumption scene category to which the goods receiving address consumed by the current user belongs.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for classifying a consumption scenario provided in the first aspect when executing the program.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of classifying a consumption scenario as provided in the first aspect.

By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data computation, the invention draws on the strengths of each technique and offsets their respective weaknesses, ensuring that scene classification is feasible and stable. External experience is acquired and combined with internal data through text matching, and the limited precision of text matching is compensated by the text analysis algorithm. Meanwhile, big data analysis compares consumption scenes across regions in combination with information such as people's consumption behavior, which helps the brand understand the consumption scenes of its target customer groups, provides an effective data basis for reaching consumers, and benefits both consumers and the brand.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a method for classifying a consumption scenario according to an embodiment of the present invention;

Fig. 2 is a block diagram of an apparatus for classifying consumption scenarios according to an embodiment of the present invention;

Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

With the development of the internet and the later spread of the mobile internet, most people have become used to being online at all times. They can shop, socialize and so on conveniently and quickly on various platforms, leaving footprints on the internet as they do so. A producer of liquor products also wants to know where the people who need its products and services are. By sorting and classifying this information and matching it against the characteristics of the services and products offered, the embodiment of the invention can determine where those people are, so that the provider can reach them accurately, reduce waste and improve the efficiency of cooperation. More specifically, as shown in Fig. 1, an embodiment of the present invention provides a method for classifying a consumption scenario, where the method includes:

step S101, acquiring data information under each category after classification; the data information includes: scene name, scene type, province, city and address;

step S102, data preprocessing is carried out on data information;

step S103, storing the preprocessed data information into a database;

step S104, acquiring a receiving address consumed by the current user;

and step S105, performing text matching on the address information of the preprocessed data information stored in the database and the goods receiving address consumed by the user, and determining the consumption scene category to which the goods receiving address consumed by the current user belongs.

Specifically, the embodiment of the present invention preferably classifies the consumption scenes of liquor products. Data information is first collected with a web crawler: the main liquor-consumption scene addresses are divided into three categories (offices, residences and restaurants), the main information platforms for each category are selected (for example 58, Meituan and Dianping), crawler tasks are set up one by one, and data is crawled periodically (or within a preset time) by an online server. The data includes: scene name, scene type, province, city and detailed address. The crawler continuously draws in external experience: each run yields new data, the text matching algorithm keeps accumulating samples, and the accuracy of the text analysis algorithm improves as a result.

The crawled data is structured and stored in an online database, and preliminary data cleaning is performed by a scheduled process. The cleaning covers duplicate records, garbled characters and special symbols. A common map interface, such as Baidu Maps or Amap (Gaode), is then called to obtain the longitude and latitude of each address, which are stored in the database.
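The cleaning step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names in `FIELDS`, the set of characters treated as legitimate in `_ALLOWED`, and the choice to drop rows whose address becomes empty are all assumptions made for the example. The map-interface call is omitted because the text does not specify it.

```python
import re
import unicodedata

# Hypothetical record layout: the five fields named in the text.
FIELDS = ("scene_name", "scene_type", "province", "city", "address")

# Keep CJK ideographs, ASCII letters and digits, and a few separators (assumed set).
_ALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\-#()]")


def clean_text(value: str) -> str:
    """Normalize character width, then drop garbled and special characters."""
    value = unicodedata.normalize("NFKC", value)  # full-width forms -> ASCII forms
    value = value.replace("\ufffd", "")  # Unicode replacement character (decode debris)
    return _ALLOWED.sub("", value)


def clean_records(records):
    """Clean every field and drop exact duplicates, preserving input order."""
    seen = set()
    cleaned = []
    for rec in records:
        row = tuple(clean_text(rec.get(field, "")) for field in FIELDS)
        if not row[-1]:  # skip rows whose address is empty after cleaning
            continue
        if row in seen:  # duplicate of an earlier cleaned row
            continue
        seen.add(row)
        cleaned.append(dict(zip(FIELDS, row)))
    return cleaned
```

Deduplicating after normalization, rather than before, lets two crawled rows that differ only in full-width digits or stray symbols collapse into one record.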

The receiving address of the target user's purchase is then acquired, text matching is performed between the address information of the preprocessed data information stored in the database and the receiving address consumed by the user, and the consumption scene category to which the receiving address consumed by the current user belongs is determined from the text matching result.

By acquiring the data information under each category and combining text matching, a text analysis algorithm and big data computation, the embodiment of the invention draws on the strengths of each technique and offsets their respective weaknesses, ensuring that scene classification is feasible and stable. External experience is acquired and combined with internal data through text matching, and the limited precision of text matching is compensated by the text analysis algorithm. Meanwhile, big data analysis compares consumption scenes across regions in combination with information such as people's consumption behavior, which helps the brand understand the consumption scenes of its target customer groups, provides an effective data basis for reaching consumers, and benefits both consumers and the brand.

Based on the content of the above embodiments, as an alternative embodiment: the step of performing text matching on the address information of the preprocessed data information stored in the database and the delivery address consumed by the user comprises the following steps:

Specifically, addresses with an obvious category are classified with regular expressions: if an address contains a word that carries clear scene information (the main scene in the embodiment of the invention is liquor consumption), the address is assigned to that scene. For example, addresses containing words such as hot pot, skewers or KTV can be assigned to the restaurant scene type.
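A minimal sketch of this rule-based step is given below. Only the hot pot (火锅), skewers (串) and KTV keywords come from the text; every other keyword, the scene-type labels and the rule order in `SCENE_PATTERNS` are hypothetical and would need to be tuned on real addresses.

```python
import re

# Hypothetical keyword rules; only 火锅, 串 and KTV are taken from the text.
SCENE_PATTERNS = {
    "restaurant": re.compile(r"火锅|串|ktv|饭店|餐厅", re.IGNORECASE),
    "office": re.compile(r"写字楼|大厦|公司|办公"),
    "residence": re.compile(r"小区|公寓|花园|\d+栋\d+单元"),
}


def classify_by_regex(address: str):
    """Return the first scene type whose pattern matches, or None if none does."""
    for scene_type, pattern in SCENE_PATTERNS.items():
        if pattern.search(address):
            return scene_type
    return None
```

Returning `None` rather than a default label keeps unmatched addresses available for the later matching stages.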

Scene category matching with the longest common substring algorithm proceeds as follows. Using the longitude and latitude of the receiving address, address information within a preset distance (for example, within 1 km) is retrieved from the standard library. The similarity between the receiving address and each of these addresses is then computed one by one with the longest common substring algorithm, and the address with the highest similarity is used to determine the liquor-consumption scene. The algorithm first finds the longest common substring of the two texts, then takes the length of that substring divided by the length of the longer of the two texts as the address similarity.
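The similarity measure and the distance filter can be sketched as follows. The similarity formula (common substring length divided by the length of the longer text) is taken from the text. The haversine distance, the 6371 km Earth radius and the `lat`, `lon`, `address` and `scene_type` keys of the standard-library entries are assumptions made for illustration, since the text does not say how the preset distance is computed.

```python
import math


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def address_similarity(a: str, b: str) -> float:
    """Common-substring length divided by the length of the longer text."""
    longer = max(len(a), len(b))
    return longest_common_substring_len(a, b) / longer if longer else 0.0


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def match_scene(address, lat, lon, standard_library, radius_km=1.0):
    """Return (scene_type, similarity) of the most similar nearby entry, or (None, 0.0)."""
    best = (None, 0.0)
    for entry in standard_library:
        if haversine_km(lat, lon, entry["lat"], entry["lon"]) > radius_km:
            continue  # outside the preset distance
        score = address_similarity(address, entry["address"])
        if score > best[1]:
            best = (entry["scene_type"], score)
    return best
```

Filtering by distance first keeps the quadratic-time substring comparison limited to a handful of nearby candidates instead of the whole standard library.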

If some addresses fail to match when the regular expressions and the longest common substring algorithm are applied to the receiving addresses, texts whose matching score under the regular expressions and the longest common substring algorithm exceeds a preset threshold are selected as training samples, for example texts with a high score (above 70 points). The address text in the training samples is then split and extracted using TF-IDF, and the extracted text is input into an xgboost model for training, which outputs a training result. To train the xgboost model, important parameters are selected to build a grid for searching and the optimal model parameters are obtained; the parameters include the learning rate, the maximum depth, the number of estimators and the minimum leaf node sample weight. The model is then retrained with the optimal parameters, verified on the test set and saved.
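The sample selection and grid search can be sketched without committing to a particular training setup. In the sketch below, the parameter names follow xgboost's scikit-learn wrapper (`learning_rate`, `max_depth`, `n_estimators`, `min_child_weight`), but the candidate values in `PARAM_GRID` are assumptions, and `evaluate` is a hypothetical callable that would, in practice, fit a model such as `xgboost.XGBClassifier(**params)` and return its validation score.

```python
from itertools import product

# Parameter names follow xgboost's scikit-learn wrapper; the values are assumptions.
PARAM_GRID = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6],
    "n_estimators": [100, 300],
    "min_child_weight": [1, 5],
}


def select_training_samples(scored, threshold=70):
    """Keep (address, scene_type) pairs whose matching score exceeds the threshold."""
    return [(addr, scene) for addr, scene, score in scored if score > threshold]


def iter_grid(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, evaluate):
    """Return (best_params, best_score); evaluate(params) returns a validation score."""
    best_params, best_score = None, float("-inf")
    for params in iter_grid(grid):
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Once the best combination is found, the model would be retrained with it, checked on the test set and saved, as the paragraph above describes.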

Based on the content of the above embodiments, as an alternative embodiment: the method for splitting and extracting the address texts in the training samples by using the TF-IDF comprises the following steps:

Specifically, the address text in the selected training samples is segmented into words, and stop words are removed from the segmentation results. Stop words are, to some extent, filter words, and can broadly be divided into two categories: 1. Words that are used extremely widely, even excessively, such as "i", "is" and "what" in English or the Chinese equivalents of "I" and "just", which appear in almost every document. 2. Words that appear frequently in text but carry little meaning of their own, such as auxiliary words, adverbs, prepositions and conjunctions, which have no clear meaning by themselves and only play a role within a complete sentence. Appropriately reducing the frequency of stop words effectively helps to increase keyword density.

The results with stop words removed are grouped by scene type into a new result set, TF-IDF is used to assign weights to the words in the address text for each scene type, and the top 500 words are selected to form a word vector. The samples are reconstructed from the word vector and a training set is generated. TF-IDF counts the frequency of a word in a text and compares it with the word's frequency across all texts, in order to find words that are relatively descriptive of a category. Here, a word that is descriptive of a category is one that is representative of that category type; for the restaurant scene category, for example, words such as hot pot, hot pot base and beverage tend to be extracted and are given higher weights.
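One way to read this weighting step is sketched below, under stated assumptions: each scene type's pooled tokens are treated as a single document, the unsmoothed textbook form of TF-IDF is used, and the vocabulary is taken as the top-ranked words across all scene types. The text does not fix any of these choices, and the input is assumed to be already segmented with stop words removed.

```python
import math
from collections import Counter


def tfidf_by_scene(tokens_by_scene):
    """Weight each word per scene type, treating a scene's pooled tokens as one document.

    tf  = count of the word in the scene / total tokens in the scene
    idf = log(number of scenes / number of scenes containing the word)
    """
    n_scenes = len(tokens_by_scene)
    doc_freq = Counter()
    for tokens in tokens_by_scene.values():
        doc_freq.update(set(tokens))

    weights = {}
    for scene, tokens in tokens_by_scene.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[scene] = {
            word: (count / total) * math.log(n_scenes / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


def build_vocabulary(weights, top_k=500):
    """Pick the top_k words ranked by their highest weight across scene types."""
    best = {}
    for scene_weights in weights.values():
        for word, weight in scene_weights.items():
            best[word] = max(best.get(word, 0.0), weight)
    ranked = sorted(best, key=lambda word: (-best[word], word))
    return ranked[:top_k]


def vectorize(tokens, vocabulary):
    """Count-based vector of one address over the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

A word that occurs in every scene type, such as a district name, receives an IDF of zero and therefore drops to the bottom of the ranking, which matches the stated goal of keeping words that are descriptive of one category.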

Based on the content of the above embodiments, as an alternative embodiment: the method further comprises the following steps:

Specifically, if some address texts still fail to match after the text address matching of the foregoing embodiments, the unmatched text is taken as the target word vector and input into the xgboost model to predict the consumption scene category.
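The overall fallback order (regular expressions, then longest common substring matching, then the trained model) could be wired together as in the sketch below. The three callables and the `threshold` value are hypothetical placeholders for the stages described in the foregoing embodiments; the text does not state what similarity counts as a successful match.

```python
def classify_address(address, regex_rule, substring_rule, model_predict, threshold=0.7):
    """Try each stage in order and report which one produced the label.

    regex_rule(address)      -> scene type or None
    substring_rule(address)  -> (scene type or None, similarity in [0, 1])
    model_predict(address)   -> scene type
    """
    scene = regex_rule(address)
    if scene is not None:
        return scene, "regex"

    scene, similarity = substring_rule(address)
    if scene is not None and similarity >= threshold:
        return scene, "substring"

    # Both text-matching stages failed: fall back to the trained model.
    return model_predict(address), "model"
```

Returning the stage name alongside the label makes it easy to measure how many addresses reach the model, which is the share that text matching alone could not resolve.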

According to a further aspect of the present invention, an apparatus for classifying a consumption scenario is provided. Referring to Fig. 2, Fig. 2 is a block diagram of an apparatus for classifying a consumption scenario provided by an embodiment of the present invention. The apparatus is used to carry out the classification of consumption scenes provided in the foregoing embodiments of the invention. Therefore, the description and definition of the method for classifying a consumption scenario provided in the foregoing embodiments may be used for understanding each execution module in this embodiment.

The device includes:

the data information acquiring module 201 is configured to acquire data information under each classified category; the data information includes: scene name, scene type, province, city and address;

the preprocessing module 202 is configured to perform data preprocessing on the data information;

the storage module 203 is used for storing the preprocessed data information into a database;

a receiving address obtaining module 204, configured to obtain a receiving address consumed by the current user;

the text matching module 205 is configured to perform text matching on the address information of the preprocessed data information stored in the database and the shipping address consumed by the user, and determine a consumption scenario category to which the shipping address consumed by the current user belongs.

Specifically, the specific process of each module in the apparatus of this embodiment to implement its function may refer to the related description in the corresponding method embodiment, and is not described herein again.

Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 3, the electronic device includes: a processor 301, a memory 302, and a bus 303;

the processor 301 and the memory 302 communicate with each other through the bus 303; the processor 301 is configured to call the program instructions in the memory 302 to execute the method for classifying a consumption scenario provided by the foregoing embodiments, for example: acquiring data information under each category after classification; the data information includes: scene name, scene type, province, city and address; carrying out data preprocessing on the data information; storing the preprocessed data information into a database; acquiring a receiving address consumed by a current user; and performing text matching on the address information of the preprocessed data information stored in the database and the receiving address consumed by the user, and determining the consumption scene category to which the receiving address consumed by the current user belongs.

Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of classifying a consumption scenario, for example: acquiring data information under each category after classification; the data information includes: scene name, scene type, province, city and address; carrying out data preprocessing on the data information; storing the preprocessed data information into a database; acquiring a receiving address consumed by a current user; and performing text matching on the address information of the preprocessed data information stored in the database and the receiving address consumed by the user, and determining the consumption scene category to which the receiving address consumed by the current user belongs.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, the principle and the implementation of the present invention are explained by applying the specific embodiments in the present invention, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method of classifying a consumption scenario, the method comprising:

carrying out data preprocessing on the data information;

storing the preprocessed data information into a database;

acquiring a receiving address consumed by a current user;

2. The method according to claim 1, wherein the obtaining data information under each classified category comprises:

3. The method of claim 1, wherein pre-processing the data information comprises:

4. The method of claim 1, wherein the text matching address information of the preprocessed data information stored in the database with the shipping address consumed by the user comprises:

matching scene types of the receiving addresses by using regular expressions and a longest common substring algorithm;

5. The method of claim 4, wherein the matching of the scene category of the receiving address using regular expressions and a longest common substring algorithm comprises:

carrying out basic classification on the receiving address by using a regular expression to determine a scene type; or

6. The method of claim 1, further comprising:

7. The method of claim 4, wherein the splitting and extracting address text in the training samples using TF-IDF comprises:

8. An apparatus for classifying a consumption scenario, the apparatus comprising:

the preprocessing module is used for preprocessing the data information;

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the method of classifying consumption scenarios according to any of claims 1 to 7.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of classifying consumption scenarios according to any one of claims 1 to 7.