CA3060822A1 - Label information acquisition method and apparatus, electronic device and computer readable medium - Google Patents


Info

Publication number
CA3060822A1
Authority
CA
Canada
Prior art keywords
performance
addresses
user
information
classification results
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3060822A
Other languages
French (fr)
Inventor
Jiacheng Ni
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
NI, JIACHENG
10353744 Canada Ltd
Application filed by NI, JIACHENG and 10353744 Canada Ltd
Publication of CA3060822A1
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a label information obtaining method and apparatus, an electronic device and a computer readable medium and pertains to the field of Internet technology. The label information obtaining method comprises: classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user. By classifying the performance addresses of a user and analyzing them in combination with the user's performance behaviors, the method obtains label information of the user, such as occupation, housing value, living habits and other relatively strong financial attributes, and evaluates the consumption power of the user without obtaining sensitive information of the user.

Description

LABEL INFORMATION ACQUISITION METHOD AND APPARATUS, ELECTRONIC DEVICE AND COMPUTER READABLE MEDIUM
Technical Field
[0001] The present disclosure generally relates to the field of Internet technology, specifically to a label information acquisition method and apparatus, an electronic device and a computer readable medium.
Background
[0002] In the conventional financial sector, a user's income level, consumption power, loan repayment capacity and other information are generally obtained from bank statements, housing provident fund records, social insurance records, individual income tax certificates, property ownership certificates and employment certificates, combined with the information filled in during the application. However, an online financial platform cannot directly obtain information about a user's occupation or property value.
[0003] Therefore, the technical solutions in the prior art still leave room for improvement.
[0004] The foregoing information disclosed in the background section is only intended to deepen understanding of the background of the present disclosure, so it may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary
[0005] The present disclosure provides a label information obtaining method and apparatus, an electronic device and a computer readable medium to solve at least one of the foregoing problems.
[0006] Other features and advantages of the present disclosure will be evident through the following detailed description, or partially learnt through practice of the present disclosure.
[0007] According to one aspect of the present disclosure, a label information obtaining method is provided, comprising:
[0008] classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
[0009] In an embodiment of the present disclosure, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user comprises:
[0010] conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses;
[0011] analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
[0012] In an embodiment of the present disclosure, classifying the performance addresses to obtain classification results of the performance addresses comprises:
[0013] conducting classified labeling of historical performance addresses based on word vectors to obtain mapping relations between performance addresses and categories, wherein the historical performance addresses are addresses where a plurality of users in a platform perform historical orders;
[0014] obtaining the classification results specific to the performance addresses in combination with the mapping relations between performance addresses and categories.
[0015] In an embodiment of the present disclosure, conducting classified labeling of historical performance addresses based on word vectors comprises:
[0016] extracting trunk information from the historical performance addresses; conducting word segmentation of the trunk information by word segmentation technique to obtain a plurality of address segments; converting the plurality of address segments into word vectors; clustering the word vectors; conducting corresponding classified labeling of classification results of trunk information of the performance addresses according to clustering results.
[0017] In an embodiment of the present disclosure, classifying the performance addresses to obtain classification results of the performance addresses comprises:
[0018] conducting Softmax training of historical performance addresses based on word features to obtain a text classification model, wherein the historical performance addresses are addresses where a plurality of users in a platform perform historical orders;
[0019] inputting the performance addresses to the text classification model and outputting the classification results.
[0020] In an embodiment of the present disclosure, conducting Softmax training of historical performance addresses based on word features comprises:
[0021] configuring a corresponding rule for address text and classification; matching the historical performance addresses by multi-pattern matching, and outputting corresponding classification results according to the corresponding rule if the historical performance addresses are matched with the address text; and segmenting the performance addresses, combining the obtained segments in multiple ways and conducting Softmax training based on features of a single segment or a plurality of segments to obtain the text classification model.
[0022] In an embodiment of the present disclosure, before conducting attribute analysis of the classification results in combination with an industry information database, the method further comprises:
[0023] pre-processing industry classification information; establishing an industry information database according to preprocessed industry classification information;
[0024] wherein the industry information database contains a plurality of pieces of information, each of which comprises:
[0025] a label; classification results; attribute information;
[0026] The classification results include one or more of housing, hospital, hotel, office building and leisure and entertainment venue; the attribute information includes one or more of housing price, hospital type, hotel star rating, office building grade and leisure and entertainment venue grade.
[0027] In an embodiment of the present disclosure, conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information comprises:
[0028] obtaining corresponding attribute information of the performance addresses through forward maximum matching of information in the classification results and the industry information database.
[0029] In an embodiment of the present disclosure, before analyzing according to performance behaviors of a user specific to the performance addresses in combination with the classification results, the method further comprises:
[0030] obtaining performance behaviors of the user specific to the performance addresses;
wherein the performance behaviors include at least one of the following: workday performance times, non-workday performance times, time span of performance, and labeling of performance addresses by the user.
[0031] In an embodiment of the present disclosure, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user comprises:
[0032] if the workday performance times of the user are greater than or equal to a first threshold value and the classification result is hospital, then the obtained label information of the user is the occupation "medical staff".
[0033] In an embodiment of the present disclosure, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user comprises:
[0034] if the non-workday performance times of the user are greater than or equal to a second threshold value, the classification result is housing, and the housing price in the attribute information is greater than or equal to a third threshold value, then the obtained label information of the user is "high-end living quarter".
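The two example rules above can be sketched as plain threshold checks. The disclosure gives no values for the first, second and third threshold values, so the constants below are hypothetical placeholders, and the `AddressStats` record and `derive_labels` function are illustrative names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: the disclosure does not specify their values.
FIRST_THRESHOLD = 20       # workday performance times at a hospital address
SECOND_THRESHOLD = 8       # non-workday performance times at a housing address
THIRD_THRESHOLD = 80_000   # housing price per square metre


@dataclass
class AddressStats:
    category: str                          # classification result of the address
    workday_times: int                     # workday performance times
    non_workday_times: int                 # non-workday performance times
    housing_price: Optional[float] = None  # attribute information, if any


def derive_labels(stats: List[AddressStats]) -> List[str]:
    """Apply the two example rules to each of a user's performance addresses."""
    labels = []
    for s in stats:
        if s.category == "hospital" and s.workday_times >= FIRST_THRESHOLD:
            labels.append("occupation: medical staff")
        if (s.category == "housing"
                and s.non_workday_times >= SECOND_THRESHOLD
                and s.housing_price is not None
                and s.housing_price >= THIRD_THRESHOLD):
            labels.append("high-end living quarter")
    return labels


print(derive_labels([AddressStats("hospital", 22, 1),
                     AddressStats("housing", 10, 12, 100_000)]))
# ['occupation: medical staff', 'high-end living quarter']
```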
[0035] According to another aspect of the present disclosure, a label information obtaining apparatus is provided, comprising: an address classification module, configured to classify performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and a label analysis module, configured to analyze according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
[0036] According to another aspect of the present disclosure, an electronic device is provided, comprising a processor and a memory storing instructions which, when executed by the processor, cause the processor to perform the steps of the foregoing method.
[0037] According to another aspect of the present disclosure, a computer readable medium is provided, storing computer executable instructions which, when executed, perform the steps of the foregoing method.
[0038] The label information obtaining method and apparatus, electronic device and computer readable medium provided by embodiments of the present disclosure obtain label information of a user, such as occupation, housing value, living habits and other relatively strong financial attributes, by classifying the performance addresses of the user and analyzing them in combination with the user's performance behaviors, and evaluate the consumption power of the user without obtaining sensitive information of the user.
[0039] It should be understood that the foregoing general description and subsequent detailed description are only exemplary and cannot limit the present disclosure.
Brief Description of The Drawings
[0040] Through detailed description of exemplary embodiments of the present disclosure with reference to accompanying drawings, the foregoing and other objectives, features and advantages of the present disclosure will be evident.
[0041] FIG. 1 is a flow chart of a label information obtaining method provided in an embodiment of the present disclosure.
[0042] FIG. 2 is a flow chart of an alternative label information obtaining method provided in an embodiment of the present disclosure.
[0043] FIG. 3 is a flow chart of classified labeling based on word vectors in an embodiment of the present disclosure.
[0044] FIG. 4 is a flow chart of text classification training based on word features in an embodiment of the present disclosure.
[0045] FIG. 5 is a flow chart of classifying performance addresses of a user in an embodiment of the present disclosure.
[0046] FIG. 6 is a schematic diagram of a label information obtaining apparatus in another embodiment of the present disclosure.
[0047] FIG. 7 is a schematic diagram of an alternative label information obtaining apparatus provided in another embodiment of the present disclosure.
[0048] FIG. 8 is a structural schematic diagram of an electronic device provided by an embodiment of the present disclosure and suitable for implementing embodiments of the present application.
Detailed Description
[0049] Exemplary implementations are now described more comprehensively with reference to the accompanying drawings. However, the exemplary implementations can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these implementations are provided so that the present disclosure is comprehensive and complete and fully conveys the concept of the exemplary implementations to those skilled in the art. The accompanying drawings are only schematic diagrams of the present disclosure and are not necessarily drawn to scale. The same reference signs in the drawings denote the same or similar parts, so their repeated description is omitted.
[0050] Further, the described characteristics, structures or features can be combined in one or more implementation manners in any appropriate way. In the following description, many details are provided to fully understand the implementation manners of the present disclosure.
However, those skilled in the art will appreciate that the technical solution of the present disclosure can be practiced with one or more of the specific details omitted, or with other methods, components, apparatuses and steps. In other cases, well-known structures, methods, apparatuses, implementations, materials or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.
[0051] Some block diagrams shown in the accompanying drawings are functional entities and do not have to correspond to physically or logically independent entities.
These functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microprocessor devices.
[0052] In order to make the objectives, technical solutions and advantages of the present invention clearer and more comprehensible, the present invention is further elaborated in combination with specific embodiments and with reference to accompanying drawings.
[0053] In related embodiments of the present invention, some attributes of a user are typically portrayed on a platform according to the user's direct consumption behaviors (such as group buying, take-away, reservations, movies and tickets). For example, according to a customer's behaviors on the platform, such as the movies or tickets the customer has browsed or purchased, the age and preferences of the user can be analyzed and portrayed.
However, mining the financial attributes of a user solely from transactions, browsing and similar behaviors is limited by the platform's business categories, so the financial attributes of the user are inadequately mined.
[0054] To address the foregoing problems, some embodiments of the present disclosure provide a label information obtaining method and apparatus, an electronic device and a computer readable medium, which structure the performance addresses of a user by natural language processing technology and further derive the user's housing value, occupation, living habits and so on from the structured information, thereby extracting relatively strong financial attributes.
[0055] FIG. 1 is a flow chart of a label information obtaining method provided in an embodiment of the present disclosure, comprising the following steps:
[0056] As shown in FIG. 1, at step S110, classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders.
[0057] As shown in FIG. 1, at step S120, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
[0058] FIG. 2 is a flow chart of an alternative label information obtaining method provided in an embodiment of the present disclosure, comprising the following steps:
[0059] As shown in FIG. 2, at step S210, classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders.
[0060] As shown in FIG. 2, at step S220, conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses.
[0061] As shown in FIG. 2, at step S230, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
[0062] Different from the method flow shown in FIG. 1, the flow shown in FIG. 2 further conducts attribute analysis of the classification results in combination with an industry information database, so that the user's labels can be analyzed in more depth according to the performance behaviors of the user combined with the classification results and the attribute information, to obtain label information of the user.
[0063] The label information obtaining method in this exemplary embodiment obtains label information of a user, such as occupation, housing values, living habits and other relatively strong financial attributes through classification and attribute recognition of performance addresses of the user and through analysis in combination with performance behaviors of the user, and evaluates consumption power of the user under the precondition of not obtaining sensitive information of the user.
[0064] Below, the flow shown in FIG. 2 is taken as an example to further describe each step of the label information obtaining method in embodiments of the present disclosure.
[0065] At step S210, classifying performance addresses of a user to obtain classification results of the performance addresses.
[0066] In an embodiment of the present disclosure, the performance addresses are addresses provided by the user for performing orders. For example, the addresses a user writes for order performance when placing O2O (online-to-offline) orders, such as take-away or online taxi-hailing orders, are performance addresses. One take-away order contains one performance address, and one online taxi-hailing order contains two performance addresses (departure place and destination). This embodiment mainly takes the performance address of a take-away order as an example, but the two performance addresses of an online taxi-hailing order are equally applicable.
[0067] In an embodiment of the present disclosure, at step S210, classifying the performance addresses to obtain classification results of the performance addresses can be achieved by the following two methods for offline model training, to be specific:
[0068] Methods for offline model training may include:
[0069] 1) conducting classified labeling of the historical performance addresses based on word vectors to obtain mapping relations between performance addresses and categories;
[0070] obtaining the classification results specific to the performance addresses in combination with the mapping relations between performance addresses and categories; or
[0071] 2) conducting Softmax training of the historical performance addresses based on word features to obtain a text classification model;
[0072] inputting the performance addresses to the text classification model and outputting the classification results.
[0073] The foregoing historical performance addresses are addresses where a plurality of users in a platform perform historical orders.
[0074] FIG. 3 is a flow chart of classified labeling based on word vectors in an embodiment of the present disclosure, comprising the following steps:
[0075] As shown in FIG. 3, at step S301, extracting trunk information from the historical performance addresses.
[0076] In an embodiment of the present disclosure, before this step, the method further comprises filtering all historical performance addresses obtained from the platform. Filtering methods include deduplication and other operations.
[0077] For example, a complete performance address can be as follows:
[0078] Zhaofeng Plaza, Changning Road, Shanghai City (Fangtang Town opposite to Xiao Nan Guo on Floor 4)
[0079] The trunk information in the foregoing performance address is "Zhaofeng Plaza, Changning Road, Shanghai City". At this step, information in the brackets is not considered for the time being.
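A minimal sketch of separating the trunk from the bracketed content, assuming the non-trunk content is a single trailing bracketed note written with ASCII or full-width brackets. The function name `split_trunk` is hypothetical, and nested or multiple bracketed notes are not handled.

```python
import re

# Assumption: non-trunk content is one trailing bracketed note, written with
# ASCII "()" or full-width "（）" brackets.
_BRACKET = re.compile(r"\s*[(（](.*?)[)）]\s*$")


def split_trunk(address: str) -> tuple:
    """Return (trunk, non_trunk); non_trunk is '' when there is no bracketed note."""
    match = _BRACKET.search(address)
    if match is None:
        return address.strip(), ""
    return address[: match.start()].strip(), match.group(1).strip()


trunk, note = split_trunk(
    "Zhaofeng Plaza, Changning Road, Shanghai City "
    "(Fangtang Town opposite to Xiao Nan Guo on Floor 4)"
)
print(trunk)  # Zhaofeng Plaza, Changning Road, Shanghai City
print(note)   # Fangtang Town opposite to Xiao Nan Guo on Floor 4
```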
[0080] As shown in FIG. 3, at step S302, conducting word segmentation of the trunk information by word segmentation technique to obtain a plurality of address segments.
[0081] In an embodiment of the present disclosure, English addresses are segmented using spaces as separators. Many word segmentation techniques exist for Chinese addresses; in general, segmentation based on dictionary matching or on Markov models can be selected as required.
[0082] Still taking the foregoing performance address as an example, the address segments obtained from step S302 are:
[0083] Zhaofeng Plaza, Changning Road, Shanghai City
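The text says spaces separate English address words, yet the example keeps multi-word segments such as "Zhaofeng Plaza" intact. The sketch below reproduces the example by splitting on commas, which is an assumption; `segment_english_address` is a hypothetical name, and Chinese addresses would need a dictionary- or Markov-model-based segmenter instead.

```python
def segment_english_address(trunk: str) -> list:
    """Split an English address trunk into address segments.

    Assumption: commas delimit the components, so multi-word names such as
    "Zhaofeng Plaza" stay together, matching the example segments.
    """
    return [part.strip() for part in trunk.split(",") if part.strip()]


print(segment_english_address("Zhaofeng Plaza, Changning Road, Shanghai City"))
# ['Zhaofeng Plaza', 'Changning Road', 'Shanghai City']
```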
[0084] As shown in FIG. 3, at step S303, converting the plurality of address segments into word vectors.
[0085] In an embodiment of the present disclosure, at this step, the word2vec technique (e.g., the skip-gram model) is adopted to convert address segments into word vectors; this word vector technique trains a neural network model on the context between words.
[0086] Still taking the foregoing performance address as an example, each of the address segments "Shanghai City", "Changning Road" and "Zhaofeng Plaza" obtains a word vector.
Finally, these word vectors are summed to obtain the word vector corresponding to the address text.
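The accumulation step can be sketched as an element-wise sum of segment vectors. The 3-dimensional vectors below are invented toy values standing in for a trained skip-gram model (in practice they might come from, for example, gensim's `Word2Vec` trained with `sg=1`); skipping out-of-vocabulary segments is an assumption of this sketch.

```python
# Toy 3-dimensional vectors standing in for a trained word2vec (skip-gram)
# model; the numbers are invented for illustration only.
EMBEDDINGS = {
    "Shanghai City": [0.1, 0.2, 0.0],
    "Changning Road": [0.0, 0.5, 0.3],
    "Zhaofeng Plaza": [0.4, 0.0, 0.6],
}


def address_vector(segments, embeddings, dim=3):
    """Sum the vectors of the address segments to get one vector per address."""
    total = [0.0] * dim
    for segment in segments:
        vector = embeddings.get(segment)
        if vector is None:
            continue  # out-of-vocabulary segment: skipped in this sketch
        total = [t + v for t, v in zip(total, vector)]
    return total


vec = address_vector(["Zhaofeng Plaza", "Changning Road", "Shanghai City"], EMBEDDINGS)
print([round(x, 6) for x in vec])  # [0.5, 0.7, 0.9]
```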
[0087] As shown in FIG. 3, at step S304, clustering the word vectors.
[0088] In an embodiment of the present disclosure, clustering can be conducted with k-means. For example, performance addresses are clustered into 1000 clusters.
The number of clusters is set as needed: the more clusters there are, the greater the subsequent labeling workload.
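A plain sketch of k-means (Lloyd's algorithm) on toy 2-D points. Real address vectors would have the word2vec dimensionality and far more points, and a library implementation such as scikit-learn's `KMeans` would normally be used; the function below is illustrative only.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    assignments = [0] * len(points)
    for step in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        converged = step > 0 and new_assignments == assignments
        assignments = new_assignments
        if converged:
            break
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assign = kmeans(pts, 2)
print(assign)  # the first three points share one cluster, the last three the other
```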
[0089] As shown in FIG. 3, at step S305, conducting corresponding classified labeling of classification results of trunk information of the performance addresses according to clustering results.
[0090] In an embodiment of the present disclosure, the performance addresses in each cluster are labeled according to the clustering results. For example, in each of the 1000 clusters, the performance addresses closest to the cluster center are manually labeled, thereby labeling the trunks of all performance addresses. For example, Zhaofeng Plaza is labeled as shopping mall, Longemont Hotel as hotel, and Regents Park as housing.
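One reading of this labeling step, sketched under assumptions: the single member nearest each cluster center is shown to an annotator (simulated here by a `manual_label` callback), and its label is copied to every member of that cluster. The function name and the one-representative-per-cluster choice are hypothetical.

```python
import math


def label_clusters(points, centroids, assignments, manual_label):
    """Label each cluster by its member closest to the centroid.

    `manual_label(index)` stands in for a human annotator (an assumption).
    """
    labels = [None] * len(points)
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignments) if a == c]
        if not members:
            continue
        representative = min(members, key=lambda i: math.dist(points[i], centroid))
        label = manual_label(representative)
        for i in members:
            labels[i] = label  # propagate the manual label to the whole cluster
    return labels


pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
annotations = {0: "shopping mall", 2: "housing"}  # hypothetical annotator answers
print(label_clusters(pts, [(0, 0.2), (5, 5.2)], [0, 0, 1, 1], annotations.__getitem__))
# ['shopping mall', 'shopping mall', 'housing', 'housing']
```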
[0091] FIG. 4 is a flow chart of text classification training based on word features, comprising the following steps:
[0092] As shown in FIG. 4, at step S401, configuring a corresponding rule for address text and classification.
[0093] In an embodiment of the present disclosure, an artificial dictionary can be configured.
The artificial dictionary contains corresponding rules between address text and classification, in the format: text -> classification.
[0094] As shown in FIG. 4, at step S402, matching the historical performance addresses by multi-pattern matching, and outputting corresponding classification results according to the corresponding rule if the historical performance addresses are matched with the address text.
[0095] In an embodiment of the present disclosure, multi-pattern matching judges whether the address text contains the antecedent of a rule; checking this inclusion relation against the antecedents of many rules is what is meant by multi-pattern matching. Using the foregoing corresponding rules together with multi-pattern matching can handle some failure cases (bad cases). The antecedent of a rule is address text, and the consequent of the rule is a category, such as hotel.
[0096] For example, a rule in the artificial dictionary is guesthouse -> hotel. Based on this rule, given the text "**guesthouse", since it contains "guesthouse", the category of "**guesthouse" is concluded to be hotel.
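A naive sketch of rule matching, where each antecedent is tested with a substring check. Trying longer antecedents first (so more specific rules win) and lower-casing the text are assumptions, and the second rule is hypothetical. For large dictionaries, an automaton-based multi-pattern algorithm such as Aho-Corasick is the usual choice.

```python
from typing import Dict, Optional

# Rule dictionary: antecedent (address text) -> consequent (category).
# "guesthouse -> hotel" is the example rule; the second entry is hypothetical.
RULES = {
    "guesthouse": "hotel",
    "hospital": "hospital",
}


def match_rules(address: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the category of the first rule whose antecedent occurs in the address."""
    text = address.lower()
    for antecedent in sorted(rules, key=len, reverse=True):
        if antecedent in text:
            return rules[antecedent]
    return None  # no rule hit: leave the address to the trained model


print(match_rules("**guesthouse", RULES))  # hotel
print(match_rules("Regents Park", RULES))  # None
```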
[0097] As shown in FIG. 4, at step S403, segmenting the performance addresses, performing multiple combination of obtained segments and conducting Softmax training based on features of a single segment or a plurality of segments to obtain the text classification model.
[0098] In an embodiment of the present disclosure, for the corpus obtained from manual labeling, performance addresses are segmented and the segments are combined into bigrams (2 segments) and trigrams (3 segments). Based on unigram (1 segment), bigram and trigram features, Softmax training is used to obtain a text classification model.
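A self-contained sketch of Softmax training over unigram, bigram and trigram features: a multinomial logistic regression fitted by plain stochastic gradient descent on cross-entropy loss. The toy corpus, learning rate, epoch count and function names are assumptions, and there is no bias term or regularization.

```python
import math
from collections import defaultdict


def ngram_features(segments):
    """Unigram, bigram and trigram features over consecutive address segments."""
    features = []
    for n in (1, 2, 3):
        for i in range(len(segments) - n + 1):
            features.append("|".join(segments[i : i + n]))
    return features


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(samples, labels, epochs=50, lr=0.5):
    """Multinomial logistic (softmax) regression over n-gram features, trained by SGD."""
    classes = sorted(set(labels))
    weights = defaultdict(lambda: [0.0] * len(classes))  # feature -> weight per class
    for _ in range(epochs):
        for segments, label in zip(samples, labels):
            features = ngram_features(segments)
            scores = [sum(weights[f][c] for f in features) for c in range(len(classes))]
            probs = softmax(scores)
            target = classes.index(label)
            for c, p in enumerate(probs):
                # Gradient of the cross-entropy loss w.r.t. the class score.
                grad = p - (1.0 if c == target else 0.0)
                for f in features:
                    weights[f][c] -= lr * grad
    return classes, dict(weights)


def predict(model, segments):
    classes, weights = model
    features = [f for f in ngram_features(segments) if f in weights]
    scores = [sum(weights[f][c] for f in features) for c in range(len(classes))]
    return classes[max(range(len(classes)), key=lambda c: scores[c])]


# Hypothetical toy corpus of segmented, manually labeled addresses.
samples = [["Regents", "Park"], ["Longemont", "Hotel"],
           ["Zhongshan", "Hospital"], ["Shangri-La", "Hotel"]]
labels = ["housing", "hotel", "hospital", "hotel"]
model = train_softmax(samples, labels)
print(predict(model, ["Grand", "Hotel"]))  # hotel
```

The unseen address "Grand Hotel" is still classified through its known unigram feature "Hotel", which is the point of mixing single-segment and multi-segment features.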
[0099] After offline model training as shown in FIG. 3 and FIG. 4, obtained performance addresses are classified; this is the online prediction process. FIG. 5 is a flow chart of classifying performance addresses of a user, comprising the following steps:
[0100] As shown in FIG. 5, at step S501, extracting trunk information from performance addresses. Classified labeling can be obtained from address labeling based on word vectors, through a flow shown in FIG. 3.
[0101] As shown in FIG. 5, at step S502, predicting using a word feature model: the trunk information of the performance address and the content other than the trunk information (e.g., content in brackets) are predicted in turn to obtain prediction results. For example, prediction is conducted using the flow shown in the foregoing FIG. 4: if a rule in the artificial dictionary is hit, the classification result is returned directly according to that rule; otherwise the trained Softmax model is used to obtain the classification result.
[0102] In the online prediction process, the prediction method based on word vectors and clustering does not consider content other than the trunk information (e.g., content in brackets), because the bracketed content contains many distractors that would affect the clustering results. During online prediction, however, the offline-trained model is applied to the trunk information and the non-trunk information in turn, and the prediction result for the content in the brackets is adopted, because the address in the brackets is more accurate and concrete.
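A minimal sketch of this online prediction order, assuming a single trailing bracketed note: the trunk and the bracketed content are classified in turn (rule hit first, otherwise the trained model, passed in here as any callable), and the bracketed result replaces the trunk result when present. The rule entries and the stand-in model in the usage lines are hypothetical.

```python
import re

_BRACKET = re.compile(r"\s*[(（](.*?)[)）]\s*$")


def classify_text(text, rules, model):
    """A rule hit returns its category directly; otherwise fall back to the model."""
    for antecedent in sorted(rules, key=len, reverse=True):
        if antecedent in text:
            return rules[antecedent]
    return model(text)


def classify_address(address, rules, model):
    """Predict the trunk, then the bracketed content; prefer the latter if present."""
    match = _BRACKET.search(address)
    trunk = address[: match.start()] if match else address
    result = classify_text(trunk.strip(), rules, model)
    if match and match.group(1).strip():
        result = classify_text(match.group(1).strip(), rules, model)
    return result


# Hypothetical rules and a stand-in model for illustration.
rules = {"Fangtang Town": "incorporated business", "Plaza": "shopping mall"}
model = lambda text: "unknown"
print(classify_address("Zhaofeng Plaza, Changning Road, Shanghai City", rules, model))
# shopping mall
print(classify_address("Zhaofeng Plaza, Changning Road, Shanghai City "
                       "(Fangtang Town opposite to Xiao Nan Guo on Floor 4)", rules, model))
# incorporated business
```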
[0103] For example, when only trunk information is considered, the classification result is:
[0104] Zhaofeng Plaza, Changning Road, Shanghai City -> shopping mall
[0105] If the non-trunk information in the brackets is considered, the classification result is:
[0106] Zhaofeng Plaza, Changning Road, Shanghai City (Fangtang Town opposite to Xiao Nan Guo on Floor 4) -> incorporated business
[0107] As Fangtang Town is a mass-innovation space, its corresponding category should be incorporated business rather than shopping mall. Clearly, the latter classification result is more accurate.
[0108] Before prediction at step S502, localizers (locative words) in the performance addresses are processed.
For example, if a performance address contains "opposite to...", the localizer "opposite to" is removed.
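A minimal sketch of localizer removal for English text. Only "opposite to" comes from the example; the rest of the localizer list is a hypothetical placeholder for a curated lexicon, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Hypothetical localizer lexicon; only "opposite to" comes from the example.
LOCALIZERS = ["opposite to", "next to", "near"]

_LOCALIZER_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(LOCALIZERS, key=len, reverse=True))
    + r")\b\s*",
    re.IGNORECASE,
)


def strip_localizers(text: str) -> str:
    """Remove localizer words, then collapse any leftover double spaces."""
    return re.sub(r"\s{2,}", " ", _LOCALIZER_RE.sub("", text)).strip()


print(strip_localizers("Fangtang Town opposite to Xiao Nan Guo on Floor 4"))
# Fangtang Town Xiao Nan Guo on Floor 4
```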
[0109] At step S220, conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses.
[0110] In an embodiment of the present disclosure, before conducting attribute analysis of the classification results in combination with an industry information database, this step further comprises:
[0111] a step of establishing an industry information database, specifically comprising the following steps:
[0112] firstly, pre-processing industry classification information;
secondly, establishing an industry information database according to preprocessed industry classification information.
[0113] The pre-processing obtains, cleans and structures address, industry and other information, by means of external data trading and/or crawling of public data, to obtain information in triple form.
[0114] The established industry information database contains a plurality of pieces of information, and each piece of triple information comprises:
[0115] a label; classification results; attribute information;
[0116] The classification results include one or more of housing, hospital, hotel, office building and leisure and entertainment venue; the attribute information includes one or more of housing price, hospital type, hotel star rating, office building grade and leisure and entertainment venue grade.
[0117] For example, information in an industry information database is as follows:
[0118] Property = Regents Park; housing; housing price = ¥100,000/m²;
[0119] Hotel = Shangri-La; hotel; hotel star rating = 5 stars;
[0120] Hospital = Zhongshan Hospital; hospital; hospital category = Grade-A tertiary hospital
[0121] In an embodiment of the present disclosure, at this step, conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information specifically comprises:
[0122] obtaining corresponding attribute information of the performance addresses through forward maximum matching between the classification results and the information in the industry information database.
[0123] Here, forward maximum matching works as follows. Starting from the beginning of the character string, a sub-string no longer than a preset maximum length is taken and matched against the words in a dictionary. If the match succeeds, the sub-string is taken as one word, and the next round of matching starts from the character that follows it. This continues until the whole string has been processed. If the match fails, one character is removed from the end of the sub-string and matching is attempted again; this operation is repeated.
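The forward maximum matching described above can be sketched in Python as follows. The `forward_max_match` name is an assumption of this sketch. So is the convention that an unmatched single character is emitted as its own token, which is the usual stopping case for this algorithm. The example uses a Chinese address fragment, 中山医院门诊楼 (Zhongshan Hospital outpatient building), which is the kind of text the method targets.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Segment text left to right, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Start with the longest window the length limit allows.
        j = min(i + max_len, len(text))
        # Drop one character from the tail until the window is a dictionary
        # word; a lone character is accepted as-is so the scan always advances.
        while j - i > 1 and text[i:j] not in dictionary:
            j -= 1
        words.append(text[i:j])
        i = j  # the next round of matching starts after this piece
    return words


lexicon = {"中山", "中山医院", "门诊", "门诊楼"}
print(forward_max_match("中山医院门诊楼", lexicon, max_len=4))
# ['中山医院', '门诊楼']
```

Because the longest window is tried first, "中山医院" wins over the shorter dictionary entry "中山".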
[0124] For example, a performance address is:
[0125] Regents Park (306, BLK 12)
[0126] First, the classification result obtained is housing. The database entries are then screened on their second field, the classification result (here housing), to obtain housing-related entities. Next, the entity containing Regents Park is found through forward maximum matching. This yields the attribute corresponding to the performance address, i.e., the housing price information of that entity: the housing price is ¥100,000/m².
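The two steps of this example can be sketched as follows: first screening by classification result, then forward maximum matching over the address. The in-memory `INDUSTRY_DB` list and the `lookup_attributes` function are hypothetical. The sketch scans the address from left to right and returns the attributes of the first entity name it matches, trying the longest window first at each position.

```python
from typing import Optional

# Hypothetical in-memory copy of the example database:
# (entity name, classification result, attribute information)
INDUSTRY_DB = [
    ("Regents Park", "housing", {"housing price": "¥100,000/m²"}),
    ("Shangri-La", "hotel", {"hotel star rating": "5 stars"}),
    ("Zhongshan Hospital", "hospital", {"hospital category": "Grade-A tertiary hospital"}),
]


def lookup_attributes(address: str, category: str, db=INDUSTRY_DB) -> Optional[dict]:
    """Return the attributes of the entity matched in the address, or None."""
    # Screen on the second field of each triple: the classification result.
    candidates = {name: attrs for name, cat, attrs in db if cat == category}
    if not candidates:
        return None
    max_len = max(len(name) for name in candidates)
    i = 0
    while i < len(address):
        # Forward maximum matching: longest window first, shrinking from the tail.
        for j in range(min(i + max_len, len(address)), i, -1):
            if address[i:j] in candidates:
                return candidates[address[i:j]]
        i += 1  # no entity name starts here; advance one character
    return None


print(lookup_attributes("Regents Park (306, BLK 12)", "housing"))
# {'housing price': '¥100,000/m²'}
```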
[0127] At step S230, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
[0128] In an embodiment of the present disclosure, before specific analysis, this step further comprises:
[0129] obtaining performance behaviors of the user specific to the performance addresses;
wherein the performance behaviors include at least one of the following: workday performance times, non-workday performance times, time span of performance, and labeling of performance addresses by the user.
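One possible way to derive these performance behaviors from a user's order history is sketched below. Several details are assumptions of this sketch, not taken from the disclosure: the function name, the `(address, date)` input shape, the optional `user_tags` mapping, and the rule that Monday to Friday are workdays (public holidays are ignored).

```python
from datetime import date
from typing import Iterable, Optional


def performance_behaviors(
    orders: Iterable[tuple[str, date]],
    user_tags: Optional[dict[str, str]] = None,
) -> dict[str, dict]:
    """Summarise one user's orders per performance address."""
    user_tags = user_tags or {}
    per_address: dict[str, list[date]] = {}
    for address, day in orders:
        per_address.setdefault(address, []).append(day)

    summary = {}
    for address, days in per_address.items():
        # Assumption: Monday-Friday count as workdays; public holidays are ignored.
        workday = sum(1 for d in days if d.weekday() < 5)
        summary[address] = {
            "workday_times": workday,
            "non_workday_times": len(days) - workday,
            "span_days": (max(days) - min(days)).days,
            "user_label": user_tags.get(address),  # e.g. "place of work"
        }
    return summary


orders = [
    ("Zhongshan Hospital", date(2018, 11, 5)),  # Monday
    ("Zhongshan Hospital", date(2018, 11, 6)),  # Tuesday
    ("Regents Park", date(2018, 11, 10)),       # Saturday
    ("Regents Park", date(2019, 11, 16)),       # Saturday
]
print(performance_behaviors(orders)["Regents Park"])
# {'workday_times': 0, 'non_workday_times': 2, 'span_days': 371, 'user_label': None}
```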
[0130] In an embodiment of the present disclosure, analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and/or the attribute information to obtain label information of the user comprises:
[0131] if the workday performance times of the user are greater than or equal to a first threshold value and the classification result is hospital, then the obtained label information of the user is the occupation "medical staff"; or, if the non-workday performance times of the user are greater than or equal to a second threshold value, the classification result is housing, and the housing price in the attribute information is greater than or equal to a third threshold value, then the obtained label information of the user is "high-end living quarter".
[0132] The threshold values used in the foregoing mapping can be set as needed. For example, the first threshold value, which applies to the workday performance times, can be set to five. The third threshold value, which applies to the housing price, should be set with each city in mind, since housing prices differ between cities.
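The two mapping rules of [0131], with the thresholds of [0132], can be sketched as follows. Only the first threshold (five) comes from the text. The second threshold, the per-city housing-price values in `third_by_city`, and the `(type, value)` label tuples are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thresholds:
    first: int = 5   # workday performance times (five, as in the example above)
    second: int = 5  # non-workday performance times (hypothetical value)
    # Third threshold: housing price in ¥/m², set per city (hypothetical values).
    third_by_city: dict[str, float] = field(
        default_factory=lambda: {"Shanghai": 80_000.0, "Chengdu": 25_000.0}
    )


def derive_labels(category: str, workday_times: int, non_workday_times: int,
                  housing_price: Optional[float] = None, city: Optional[str] = None,
                  t: Optional[Thresholds] = None) -> list[tuple[str, str]]:
    """Apply the two threshold rules to one performance address."""
    t = t or Thresholds()
    labels = []
    if category == "hospital" and workday_times >= t.first:
        labels.append(("occupation", "medical staff"))
    # Look up the city-specific housing-price threshold, if one is configured.
    third = t.third_by_city.get(city) if city else None
    if (category == "housing" and non_workday_times >= t.second
            and housing_price is not None and third is not None
            and housing_price >= third):
        labels.append(("residence", "high-end living quarter"))
    return labels


print(derive_labels("housing", 0, 6, housing_price=100_000.0, city="Shanghai"))
# [('residence', 'high-end living quarter')]
```

Keeping the thresholds in one object makes it straightforward to tune them per deployment, or per city for the housing price, without touching the rule logic.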
[0133] Analysis and mapping are conducted on the text extracted from the performance addresses given by the user and on the user's performance behaviors at those addresses (workday performance times, holiday performance times, time span of performance, labeling of performance addresses by the user, and other information) to obtain label information of the user.
[0134] Example: if the performance address category = office building and the user has labeled the performance address as a place of work, then the user is inferred to be a white-collar worker, i.e., the label information is white-collar worker.
[0135] If address category = housing, developer = Vanke, housing price = ¥75,635.0/m², the time span of performance at this address is > 1 year, and the holiday performance times at this address are >= 5, then the user is inferred to live in a high-end living quarter, i.e., the label is high-end living quarter.
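The examples in [0134] and [0135] can also be expressed as a small rule table, in which each label is paired with a predicate. Several details are assumptions of this sketch: the `AddressProfile` fields, the `HIGH_END_PRICE` cut-off, and reading "> 1 year" as more than 365 days. The developer field (Vanke) is left out because the text does not say how it affects the rule.

```python
from typing import Callable, NamedTuple, Optional


class AddressProfile(NamedTuple):
    """Facts about one performance address of one user."""
    category: str
    user_tag: Optional[str] = None         # the user's own label, e.g. "place of work"
    housing_price: Optional[float] = None  # ¥/m² from the industry database
    span_days: int = 0                     # time span of performance at this address
    holiday_times: int = 0                 # holiday performance times at this address


# Hypothetical price cut-off; the text leaves the housing threshold to be set per city.
HIGH_END_PRICE = 70_000.0

# Each rule pairs a label with a predicate over a profile.
RULES: list[tuple[str, Callable[[AddressProfile], bool]]] = [
    ("white-collar worker",
     lambda p: p.category == "office building" and p.user_tag == "place of work"),
    ("high-end living quarter",
     lambda p: (p.category == "housing"
                and p.housing_price is not None and p.housing_price >= HIGH_END_PRICE
                and p.span_days > 365 and p.holiday_times >= 5)),
]


def infer_labels(profile: AddressProfile) -> list[str]:
    """Return every label whose rule holds for the profile."""
    return [label for label, rule in RULES if rule(profile)]


print(infer_labels(AddressProfile("housing", housing_price=75_635.0, span_days=400, holiday_times=5)))
# ['high-end living quarter']
```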
[0136] Group fact labels obtained from the structured information of performance addresses have some ability to rank users by credit quota and risk, and can be used as strong financial attributes of the user.
[0137] Based on the foregoing flow, the present disclosure does not obtain the user's information directly. Instead, by classifying the user's performance addresses, it can judge whether the user performs orders at a place of residence (the performance address category is housing), at a place of work (the address category is office building or incorporated business), or at some other place.
Then, on the basis of the classification, the attribute information of the user's performance addresses can be further analyzed by associating the established industry information database with the performance addresses. Combined with the user's performance behaviors (performance frequency, workday performance frequency, holiday performance frequency, time span of performance dates, etc.), this yields label information of the user, such as housing value, occupation and living habits, thereby extracting strong financial attributes of the user.
[0138] To sum up, the label information obtaining method provided by this embodiment obtains label information of a user, such as occupation, housing value, living habits and other relatively strong financial attributes. It does so through classification and attribute recognition of the user's performance addresses combined with analysis of the user's performance behaviors, and thus evaluates the user's consumption power without obtaining sensitive information of the user.
[0139] FIG. 6 is a schematic diagram of a label information obtaining apparatus provided in another embodiment of the present disclosure. As shown in FIG. 6, this apparatus 600 comprises an address classification module 610 and a label analysis module 620.
[0140] The address classification module 610 is configured to classify performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and the label analysis module 620 is configured to analyze according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
[0141] FIG. 7 is a schematic diagram of an alternative label information obtaining apparatus provided in another embodiment of the present disclosure. As shown in FIG. 7, this apparatus 700 comprises: an address classification module 710, an attribute recognition module 720 and a label analysis module 730.
[0142] The address classification module 710 is configured to classify performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses provided by the user to perform orders;
the attribute recognition module 720 is configured to conduct attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses; and the label analysis module 730 is configured to analyze according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
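The module structure of apparatus 700 can be sketched as three pluggable callables that run in order. The class name, the callable signatures and the toy module logic in the usage example are hypothetical. In practice each module would wrap the corresponding method step described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Attributes = Optional[dict]


@dataclass
class LabelInfoApparatus:
    """Apparatus 700 as three pluggable modules run in order."""
    classify: Callable[[str], str]                         # address classification module 710
    recognize: Callable[[str, str], Attributes]            # attribute recognition module 720
    analyze: Callable[[str, Attributes, dict], list[str]]  # label analysis module 730

    def run(self, address: str, behaviors: dict) -> list[str]:
        category = self.classify(address)
        attributes = self.recognize(address, category)
        return self.analyze(category, attributes, behaviors)


# Toy stand-ins for the three modules (hypothetical logic, for illustration only).
apparatus = LabelInfoApparatus(
    classify=lambda a: "hospital" if "Hospital" in a else "housing",
    recognize=lambda a, c: None,
    analyze=lambda c, attrs, b: (
        ["medical staff"] if c == "hospital" and b.get("workday_times", 0) >= 5 else []
    ),
)
print(apparatus.run("Zhongshan Hospital", {"workday_times": 6}))
# ['medical staff']
```

Apparatus 600 corresponds to the same pipeline with the attribute recognition step omitted. In this sketch, that is equivalent to a `recognize` that always returns `None`.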
[0143] For the functions of the modules in this apparatus, please refer to relevant descriptions in the foregoing method embodiments. They are not described again here.
[0144] To sum up, the label information obtaining apparatus in this embodiment obtains label information of a user, such as occupation, housing value, living habits and other relatively strong financial attributes. It does so through classification and attribute recognition of the user's performance addresses combined with analysis of the user's performance behaviors, and thus evaluates the user's consumption power without obtaining sensitive information of the user.
[0145] In another aspect, the present disclosure further provides an electronic device, comprising a processor and a memory, the memory storing operating instructions by which the processor performs the following method:
[0146] classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user. Or
[0147] classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses;
analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
[0148] Reference is now made to FIG. 8, which is a structural schematic diagram of a computer system 800 of an electronic device suitable for implementing embodiments of the present application.
The electronic device shown in FIG. 8 is only an example and should not be construed as limiting the functions or scope of use of embodiments of the present application.
[0149] As shown in FIG. 8, the computer system 800 comprises a central processing unit (CPU) 801, which can execute various appropriate actions and processing according to programs stored in a read only memory (ROM) 802 or programs loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
[0150] The following components are connected to the I/O interface 805: an input unit 806 comprising a keyboard, a mouse, etc.; an output unit 807 comprising, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, etc.; a storage unit 808 comprising a hard disk, etc.; and a communication unit 809 comprising, for example, a LAN card, a modem and other network interface cards. The communication unit 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A detachable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that computer programs read from the detachable medium 811 can be installed into the storage unit 808 as needed.
[0151] In particular, according to embodiments of the present disclosure, the process described above with reference to a flow chart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for implementing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication unit 809, and/or installed from the detachable medium 811.
When this computer program is executed by the CPU 801, the foregoing functions defined in the system of the present application are executed.
[0152] It should be noted that the computer readable medium shown in the present application can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example but without limitation, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer readable storage media may include, without limitation: an electrical connection with one or more wires, a portable computer magnetic disk, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), an optical memory device, a magnetic memory device, or any appropriate combination thereof. In the present application, a computer readable storage medium can be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus or device. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer readable program code. Such a propagated data signal can take various forms, including but not limited to electromagnetic signals, optical signals, or any appropriate combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium; such a medium can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device.
The program code contained on a computer readable medium can be transmitted by any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any appropriate combination thereof.
[0153] The flow charts and block diagrams in the accompanying drawings show possible system architectures, functions and operations of the systems, methods and computer program products according to various embodiments of the present application.
In this regard, each box in a flow chart or block diagram can represent a module, a program segment or a part of code, which comprises one or more executable instructions for implementing a specified logical function.
It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the accompanying drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flow chart, and combinations of boxes in the block diagrams or flow charts, can be implemented by a dedicated hardware-based system that executes the specified functions, or by a combination of dedicated hardware and computer instructions.
[0154] The units described in embodiments of the present application can be implemented in software or in hardware. The described units may also be arranged in a processor; for example, it can be described as: a processor comprising a sending unit, an obtaining unit, a determining unit and a first processing unit. The names of these units do not, under some circumstances, constitute a limitation on the units themselves. For example, the sending unit may also be described as "a unit sending a picture obtaining request to a connected server".
[0155] In another aspect, the present disclosure further provides a computer readable medium, which may be included in a device described in the foregoing embodiments, or may exist separately without being assembled into the device. The foregoing computer readable medium carries one or more programs which, when executed by such a device, cause the device to perform the following method steps:
[0156] classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses provided by the user to perform orders; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user. Or
[0157] classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses;
analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
[0158] It should be clearly understood that the present disclosure describes how to form and use specific examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, based on the teachings of the present disclosure, these principles can be applied in many other implementations.
[0159] Exemplary implementations of the present disclosure are presented and described above. It should be understood that the present disclosure is not limited to the detailed structures, arrangements or implementation methods described here; on the contrary, the present disclosure is intended to cover all modifications and equivalent arrangements included within the spirit and scope of the claims.

Claims (14)

What is claimed is:
1. A label information obtaining method, comprising:
classifying performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
2. The label information obtaining method according to claim 1, wherein analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user comprises:
conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information of the performance addresses; and analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user.
3. The label information obtaining method according to claim 1, wherein classifying the performance addresses to obtain classification results of the performance addresses comprises:
conducting classified labeling of historical performance addresses based on word vectors to obtain mapping relations between performance addresses and categories,
wherein the historical performance addresses are addresses where a plurality of users in a platform perform historical orders; and obtaining the classification results specific to the performance addresses in combination with the mapping relations between performance addresses and categories.
4. The label information obtaining method according to claim 3, wherein conducting classified labeling of historical performance addresses based on word vectors comprises:
extracting trunk information from the historical performance addresses;
conducting word segmentation of the trunk information by word segmentation technique to obtain a plurality of address segments;
converting the plurality of address segments into word vectors;
clustering the word vectors; and conducting corresponding classified labeling of classification results of trunk information of the performance addresses according to clustering results.
5. The label information obtaining method according to claim 1, wherein classifying the performance addresses to obtain classification results of the performance addresses comprises:
conducting Softmax training of historical performance addresses based on word features to obtain a text classification model, wherein the historical performance addresses are addresses where a plurality of users in a platform perform historical orders;
and inputting the performance addresses to the text classification model and outputting the classification results.
6. The label information obtaining method according to claim 5, wherein conducting Softmax training of historical performance addresses based on word features comprises:
configuring a corresponding rule for address text and classification;
matching the historical performance addresses by multi-pattern matching, and outputting corresponding classification results according to the corresponding rule if the historical performance addresses are matched with the address text; and segmenting the performance addresses, performing multiple combination of obtained segments and conducting Softmax training based on features of a single segment or a plurality of segments to obtain the text classification model.
7. The label information obtaining method according to claim 2, wherein before conducting attribute analysis of the classification results in combination with an industry information database, the method further comprises:
pre-processing industry classification information;
establishing an industry information database according to preprocessed industry classification information;
wherein the industry information database contains a plurality of pieces of information, each of which comprises:
a label; classification results; attribute information;
the classification results include: at least one or more of housing, hospital, hotel, office building and leisure and entertainment venue; and the attribute information includes: at least one or more of housing price, hospital type, hotel star rating, office building grade and leisure and entertainment venue grade.
8. The label information obtaining method according to claim 7, wherein conducting attribute analysis of the classification results in combination with an industry information database to obtain attribute information comprises:
obtaining corresponding attribute information of the performance addresses through forward maximum matching of information in the classification results and the industry information database.
9. The label information obtaining method according to claim 2, wherein before analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results, the method further comprises:
obtaining performance behaviors of the user specific to the performance addresses;
wherein the performance behaviors include at least one of the following: workday performance times, non-workday performance times, time span of performance, and labeling of performance addresses by the user.
10. The label information obtaining method according to claim 9, wherein analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user comprises:
if the workday performance times of the user are greater than or equal to a first threshold value and the classification result is hospital, then the obtained label information of the user is the occupation "medical staff".
11. The label information obtaining method according to claim 9, wherein analyzing according to performance behaviors of the user specific to the performance addresses in combination with the classification results and the attribute information to obtain label information of the user comprises:
if the non-workday performance times of the user are greater than or equal to a second threshold value, the classification result is housing, and the housing price in the attribute information is greater than or equal to a third threshold value, then the obtained label information of the user is "high-end living quarter".
12. A label information obtaining apparatus, comprising:
an address classification module, configured to classify performance addresses of a user to obtain classification results of the performance addresses, wherein the performance addresses are addresses where the user performs orders; and a label analysis module, configured to analyze according to performance behaviors of the user specific to the performance addresses in combination with the classification results to obtain label information of the user.
13. An electronic device, comprising:
a processor; and a memory, storing instructions for controlling steps of the method in any of claims 1-11 by the processor.
14. A computer readable medium, storing computer executable instructions, wherein when the executable instructions are executed by a processor, steps of the method in any of claims 1-11 are achieved.
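The rule-matching step of claim 6, in which historical performance addresses are matched against configured address-text rules and the corresponding classification results are output, can be sketched as below. This is a naive scan that checks every keyword in turn. With many rules, an Aho-Corasick automaton is the usual way to match all patterns in a single pass. The `RULES` keywords and categories are hypothetical, and the Softmax training step is not sketched.

```python
# Hypothetical rule table: address keyword -> classification result.
RULES = {
    "Hospital": "hospital",
    "Hotel": "hotel",
    "Tower": "office building",
    "Garden": "housing",
}


def match_rules(address: str, rules: dict[str, str] = RULES) -> list[str]:
    """Return the categories of all rule keywords found in the address,
    ordered by the position of each keyword's first occurrence."""
    hits = sorted((address.find(keyword), keyword) for keyword in rules if keyword in address)
    return [rules[keyword] for _, keyword in hits]


print(match_rules("Shangri-La Hotel, opposite Zhongshan Hospital"))
# ['hotel', 'hospital']
```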
CA3060822A 2018-11-09 2019-11-01 Label information acquistion method and apparatus, electronic device and computer readable medium Pending CA3060822A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811333350.4A CN109492103B (en) 2018-11-09 2018-11-09 Label information acquisition method and device, electronic equipment and computer readable medium
CN201811333350.4 2018-11-09

Publications (1)

Publication Number Publication Date
CA3060822A1 true CA3060822A1 (en) 2020-05-09

Family

ID=65694177


Country Status (2)

Country Link
CN (1) CN109492103B (en)
CA (1) CA3060822A1 (en)


Also Published As

Publication number Publication date
CN109492103A (en) 2019-03-19
CN109492103B (en) 2019-12-17


Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
