CN117745317A - Potential user analysis method and device, electronic equipment and storage medium

Info

Publication number: CN117745317A
Application number: CN202311660491.8A
Authority: CN
Inventors: 陈凯; 岂军; 徐爱华; 于永润; 万洪虹; 郭蓉
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2024-03-22

Abstract

The application provides a potential user analysis method and device, an electronic device, and a storage medium, which relate to the technical field of user behavior analysis and data mining and are used to reduce wasted human resources and improve marketing accuracy and success rate. The method comprises: acquiring user data of a target user group, wherein the target user group is a user group with high traffic demand; analyzing, according to the user data, a plurality of target features of the target user group, wherein the target features reflect the target user group's demand for fiber to the room (FTTR) broadband; determining key features and feature thresholds at least according to the plurality of target features, wherein the key features are those of the plurality of target features whose importance is greater than a preset threshold, and the feature thresholds represent the minimum standard of the key features for users who have subscribed to FTTR broadband; and analyzing potential users of FTTR broadband in the target user group based on the key features and the feature thresholds.

Description

Potential user analysis method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of user behavior analysis and data mining technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for analyzing a potential user.

Background

With intensifying competition in the communication operation market, operators' revenue growth is gradually shifting from reliance on user growth to diversified factors such as retaining existing users and value-added services. Five-star networking fiber to the room (FTTR) broadband can bring users a better network experience, optimize the service development structure, and increase service revenue, and has therefore become a new revenue growth point.

Although FTTR broadband offers higher network rates and better network coverage, its high networking cost has kept the market slow to develop. How to accurately support the development of the FTTR broadband service, so that front-line service personnel can efficiently obtain a list of potential users and timely obtain development information in every dimension, thereby reducing wasted human resources and improving marketing accuracy, has become a problem that communication operators need to solve.

Disclosure of Invention

The application provides a potential user analysis method, a potential user analysis device, electronic equipment and a storage medium, which are used for reducing human resource waste and improving marketing accuracy and success rate.

In a first aspect, the present application provides a potential user analysis method, the method comprising: acquiring user data of a target user group, wherein the target user group is a user group with high traffic demand; analyzing, according to the user data, a plurality of target features of the target user group, wherein the target features reflect the target user group's demand for fiber to the room (FTTR) broadband; determining key features and feature thresholds at least according to the plurality of target features, wherein the key features are those of the plurality of target features whose importance is greater than a preset threshold, and the feature thresholds represent the minimum standard of the key features for users who have subscribed to FTTR broadband; and analyzing potential users of FTTR broadband in the target user group based on the key features and the feature thresholds.
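The final screening step of the first aspect can be illustrated with a minimal sketch. The application does not specify how the feature thresholds are applied, so the sketch assumes that every key feature is numeric, that each threshold is a lower bound, that a user must meet all thresholds, and that users who already subscribe are excluded. The function name, the feature names, and the data layout are hypothetical.

```python
from typing import Dict, List, Set


def find_potential_users(
    users: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
    subscribed: Set[str],
) -> List[str]:
    """Return IDs of non-subscribed users whose key features all meet the thresholds.

    Assumption: each threshold is a lower bound on a numeric key feature.
    A user with a missing key feature is treated as not meeting it.
    """
    potential = []
    for user_id, features in users.items():
        if user_id in subscribed:
            continue  # already an FTTR user, nothing to market
        meets_all = all(
            features.get(name, float("-inf")) >= minimum
            for name, minimum in thresholds.items()
        )
        if meets_all:
            potential.append(user_id)
    return potential


# Hypothetical data for illustration only.
example_users = {
    "u1": {"monthly_traffic_gb": 320.0, "online_devices": 9},
    "u2": {"monthly_traffic_gb": 80.0, "online_devices": 3},
    "u3": {"monthly_traffic_gb": 500.0, "online_devices": 12},
}
example_thresholds = {"monthly_traffic_gb": 200.0, "online_devices": 6}
print(find_potential_users(example_users, example_thresholds, {"u3"}))
```

Requiring all thresholds to be met is the strictest reading of "minimum standard"; a looser variant could require only a subset of the key features.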

The technical solution provided by the application brings at least the following beneficial effects. User data of a target user group with high requirements on broadband quality is acquired; a plurality of target features reflecting the target user group's demand for FTTR broadband are then analyzed from the user data; next, key features whose importance is greater than a preset threshold are determined among the plurality of target features, together with feature thresholds representing the minimum standard of the key features for users who have subscribed to FTTR broadband; in this way, potential users of FTTR broadband in the target user group can be analyzed based on the key features and the feature thresholds. It can be understood that, first, the application narrows the population for potential user analysis to user groups with high requirements on broadband quality, which on the one hand identifies the development market for FTTR broadband and on the other hand narrows the marketing scope of service personnel, so that their workload is reduced while FTTR broadband can be marketed accurately, bringing great convenience to service personnel. Second, the application determines a plurality of target features of the target user group by analyzing the acquired user data, then determines key features from the plurality of target features, and finally calculates the feature thresholds corresponding to the key features from the user data, so that service personnel can gain a deeper understanding of the target user group's expectations for broadband quality and demand for FTTR broadband. Finally, based on the key features and the feature thresholds, service personnel can efficiently obtain a list of potential users and timely obtain development information in every dimension, which on the one hand reduces wasted human resources and improves marketing accuracy, and on the other hand allows targeted marketing strategies and marketing techniques to be formulated from each user's development information in every dimension, thereby improving the marketing success rate.

As a possible implementation, the method further includes: labeling the user data as positive and negative samples to obtain positive sample data and negative sample data, wherein the positive sample data is data of users who have subscribed to FTTR broadband and the negative sample data is data of users who have not subscribed to FTTR broadband; inputting the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature, wherein the target algorithm is an algorithm for predicting feature importance; selecting, based on the importance of each target feature, the target features whose importance is greater than a preset threshold from the plurality of target features as the key features; and determining the feature thresholds from the positive sample data and the key features.
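The labeling and selection steps in this implementation can be sketched as follows. The sketch does not train a model: the importance scores are taken as given input, since the passage only says the target algorithm predicts feature importance. The record layout, the field name `broadband_number`, and both function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def label_samples(
    records: List[Record], subscribed_numbers: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Split user records into positive (FTTR subscribed) and negative samples."""
    positives = [r for r in records if r["broadband_number"] in subscribed_numbers]
    negatives = [r for r in records if r["broadband_number"] not in subscribed_numbers]
    return positives, negatives


def select_key_features(
    importance: Dict[str, float], preset_threshold: float
) -> List[str]:
    """Keep features whose importance is strictly greater than the preset threshold.

    The result is ordered from most to least important.
    """
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > preset_threshold]


# Hypothetical importance scores, standing in for the output of the model.
scores = {"monthly_traffic_gb": 0.45, "online_devices": 0.30, "network_age": 0.05}
print(select_key_features(scores, preset_threshold=0.10))
```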

As a possible implementation manner, the target algorithm includes: a first algorithm and a second algorithm; the first algorithm corresponds to the first weight, and the second algorithm corresponds to the second weight; the method further comprises the following steps: respectively inputting the positive sample data and the negative sample data into a first model comprising a first algorithm and a second model comprising a second algorithm to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model; and obtaining the importance degree of each target feature based on the importance degree and the first weight of each target feature output by the first model and the importance degree and the second weight of each target feature output by the second model.
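One way to read "based on the importance and the first weight ... and the importance and the second weight" is a weighted sum of the two models' scores. The passage does not state the formula, so the weighted sum below is an assumption, as are the function name and the example weights.

```python
from typing import Dict


def fuse_importance(
    first: Dict[str, float],
    second: Dict[str, float],
    first_weight: float,
    second_weight: float,
) -> Dict[str, float]:
    """Combine per-feature importance from two models as a weighted sum.

    Assumption: a feature missing from one model's output contributes 0 there.
    """
    features = set(first) | set(second)
    return {
        name: first_weight * first.get(name, 0.0)
        + second_weight * second.get(name, 0.0)
        for name in features
    }


# Hypothetical outputs of the first and second model.
model_one = {"monthly_traffic_gb": 0.6, "online_devices": 0.4}
model_two = {"monthly_traffic_gb": 0.2, "online_devices": 0.8}
fused = fuse_importance(model_one, model_two, first_weight=0.5, second_weight=0.5)
for name, score in sorted(fused.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

If the two models report importance on different scales, normalizing each model's scores before fusing would keep one model from dominating; that step is omitted here.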

As a possible implementation, the method further includes: determining the data corresponding to the key features from the positive sample data; when the data corresponding to a key feature is discrete, determining the feature threshold from a histogram of that data; and when the data corresponding to a key feature is continuous, binning that data to obtain the feature threshold.
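The two threshold cases can be sketched as below. The passage names the tools (a histogram for discrete data, binning for continuous data) but not the rule that turns them into a threshold. The sketch therefore assumes one plausible rule: take the lowest value (or the lower edge of the lowest equal-width bin) that holds at least `min_share` of the positive samples, so that sparse low outliers do not set the minimum standard. The rule, the parameter `min_share`, and the function names are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def discrete_threshold(values: Sequence[int], min_share: float = 0.1) -> int:
    """Lowest discrete value whose histogram share is at least min_share."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)  # histogram of the discrete feature
    total = len(values)
    for value in sorted(counts):
        if counts[value] / total >= min_share:
            return value
    return min(counts)  # fallback when no value reaches min_share


def continuous_threshold(
    values: Sequence[float], bins: int = 5, min_share: float = 0.1
) -> float:
    """Lower edge of the lowest equal-width bin holding at least min_share."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        return low
    width = (high - low) / bins
    counts: List[int] = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # top edge joins last bin
        counts[index] += 1
    for index, count in enumerate(counts):
        if count / len(values) >= min_share:
            return low + index * width
    return low  # fallback when no bin reaches min_share


# Hypothetical positive-sample data for two key features.
print(discrete_threshold([1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4]))
print(continuous_threshold([0.0, 10.0, 10.5, 11.0, 12.0, 15.0, 16.0, 18.0, 19.0, 20.0, 20.0], bins=4, min_share=0.2))
```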

In a second aspect, the present application provides a potential user analysis device comprising: an acquisition module, configured to acquire user data of a target user group, wherein the target user group is a user group with high traffic demand; a processing module, configured to analyze, according to the user data, a plurality of target features of the target user group, wherein the target features reflect the target user group's demand for fiber to the room (FTTR) broadband; a determining module, configured to determine key features and feature thresholds at least according to the plurality of target features, wherein the key features are those of the plurality of target features whose importance is greater than a preset threshold, and the feature thresholds represent the minimum standard of the key features for users who have subscribed to FTTR broadband; and an analysis module, configured to analyze potential users of FTTR broadband in the target user group based on the key features and the feature thresholds.

As a possible implementation, the determining module is specifically configured to: label the user data as positive and negative samples to obtain positive sample data and negative sample data, wherein the positive sample data is data of users who have subscribed to FTTR broadband and the negative sample data is data of users who have not subscribed to FTTR broadband; input the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature, wherein the target algorithm is an algorithm for predicting feature importance; select, based on the importance of each target feature, the target features whose importance is greater than a preset threshold from the plurality of target features as the key features; and determine the feature thresholds from the positive sample data and the key features.

As one possible implementation, the target algorithm includes: a first algorithm and a second algorithm; the first algorithm corresponds to the first weight, and the second algorithm corresponds to the second weight; the determining module is further used for inputting the positive sample data and the negative sample data into a first model comprising a first algorithm and a second model comprising a second algorithm respectively to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model; and obtaining the importance degree of each target feature based on the importance degree and the first weight of each target feature output by the first model and the importance degree and the second weight of each target feature output by the second model.

As a possible implementation, the determining module is further configured to: determine the data corresponding to the key features from the positive sample data; when the data corresponding to a key feature is discrete, determine the feature threshold from a histogram of that data; and when the data corresponding to a key feature is continuous, bin that data to obtain the feature threshold.

In a third aspect, the present application provides an electronic device comprising a processor and a memory, the processor being coupled to the memory; the memory is configured to store computer instructions that are loaded and executed by the processor to cause the electronic device to implement the potential user analysis method of any implementation of the first aspect.

In a fourth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform any of the potential user analysis methods provided in the first aspect above.

For the description of the second to fourth aspects, reference may be made to the detailed description of the first aspect; likewise, for the beneficial effects of the second to fourth aspects, reference may be made to the analysis of the beneficial effects of the first aspect, which is not repeated here.

Drawings

FIG. 1 is a schematic diagram of a potential user analysis system architecture according to some embodiments;

FIG. 2 is a first flow chart of a potential user analysis method according to some embodiments;

FIG. 3 is a schematic diagram of a missing value processing method according to some embodiments;

FIG. 4 is a second flow chart of a potential user analysis method according to some embodiments;

FIG. 5 is a schematic diagram of a feature importance calculation flow according to some embodiments;

FIG. 6 is a schematic diagram of an importance ranking according to some embodiments;

FIG. 7 is a schematic diagram of a potential user analysis flow according to some embodiments;

FIG. 8 is a schematic diagram of user data collection for a game user group according to some embodiments;

FIG. 9 is a panoramic profile of a game user group according to some embodiments;

FIG. 10 is a first schematic diagram of a potential user analysis device according to some embodiments;

FIG. 11 is a second schematic diagram of a potential user analysis device according to some embodiments.

Detailed Description

The potential user analysis method provided in this application is described in detail below with reference to the accompanying drawings.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone.

The terms "first" and "second" and the like in the description and in the drawings are used for distinguishing between different objects or for distinguishing between different processes of the same object and not for describing a particular sequential order of objects.

Furthermore, references to the terms "comprising" and "having" and any variations thereof in the description of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.

It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the description of the present application, unless otherwise indicated, "a plurality" means two or more.

As described in the Background, with intensifying competition in the communication operation market, operators' revenue growth is gradually shifting from reliance on user growth to diversified factors such as retaining existing users and value-added services. Five-star networking FTTR broadband can not only bring users a better network experience but also optimize the service development structure and increase service revenue, and has therefore become a new revenue growth point.

In view of the above problems, the related art proposes recommendation methods for broadband services, for example:

Example one: a technician first collects user data information carrying point of interest (POI) tags; then cleans the user data information to obtain potential user data information; then performs association analysis on the longitude and latitude information of the potential user data information and the broadband service information; and finally recommends broadband services to the potential users according to the association analysis result. By establishing a geographic-location relationship between users and broadband resources, this method mines potential users in depth according to how resources have been built out, or lays out resources reasonably according to the distribution of users. However, it does not mine users' service information in depth, uses few features, has poor collaborative filtering generalization capability and an obvious head effect, and cannot effectively recommend services to long-tail users.

Example two: a technician first trains on an acquired user behavior sample data set to obtain an initial hyperplane-based classification model; then corrects the initial classification model according to weight coefficients; and finally analyzes the user behavior data to be classified with the corrected classification model to obtain classification results of at least two broadband users. Although this method improves classifier performance, it does not analyze specific service scenarios in detail, does not describe how user features are processed and analyzed, and does not consider service influence factors.

In summary, traditional machine learning classification models predict well on samples that are large in number and evenly distributed. However, because FTTR broadband is an emerging networking product, users who have subscribed to the service account for a relatively small share of broadband users, so using a machine learning method alone can lead to overfitting and poor prediction performance in practice.

To address the above technical problems, an embodiment of the present application provides a potential user recommendation method, which specifically includes the following. User data of a target user group with high requirements on broadband quality is acquired; a plurality of target features reflecting the target user group's demand for FTTR broadband are then analyzed from the user data; next, key features whose importance is greater than a preset threshold are determined among the plurality of target features, together with feature thresholds representing the minimum standard of the key features for users who have subscribed to FTTR broadband; in this way, potential users of FTTR broadband in the target user group can be analyzed based on the key features and the feature thresholds. It can be understood that, first, the application narrows the population for potential user analysis to user groups with high requirements on broadband quality, which on the one hand identifies the development market for FTTR broadband and on the other hand narrows the marketing scope of service personnel, so that their workload is reduced while FTTR broadband can be marketed accurately, bringing great convenience to service personnel. Second, the application determines a plurality of target features of the target user group by analyzing the acquired user data, then determines key features from the plurality of target features, and finally calculates the feature thresholds corresponding to the key features from the user data, so that service personnel can gain a deeper understanding of the target user group's expectations for broadband quality and demand for FTTR broadband. Finally, based on the key features and the feature thresholds, service personnel can efficiently obtain a list of potential users and timely obtain development information in every dimension, which on the one hand reduces wasted human resources and improves marketing accuracy, and on the other hand allows targeted marketing strategies and marketing techniques to be formulated from each user's development information in every dimension, thereby improving the marketing success rate.

The embodiments provided in the present application are specifically described below with reference to the drawings attached to the specification.

Referring to FIG. 1, a schematic diagram of a potential user analysis system architecture is provided in an embodiment of the present application. As shown in FIG. 1, the system architecture includes: a first server 100, a second server 200, a third server 300, and a fourth server 400. The first server 100, the second server 200, and the third server 300 are each communicatively connected to the fourth server 400.

The first server 100 is a network layer node. The network layer is used for recording the broadband internet log information of connected terminal devices, wherein a terminal device is a device that performs human-machine interaction with a user.

By way of example, the terminal device may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, or the like. The embodiments of the application do not limit the specific form of the terminal device. The terminal device can interact with the user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device, and the like.

In some embodiments, the first server 100 is also used to host a system associated with the network layer.

The second server 200 is a service layer node. The service layer is used for storing and managing user service information.

In some embodiments, the second server 200 is also used to host a system for managing user services, such as a customer relationship management (CRM) system.

The third server 300 is a customer service layer node. The customer service layer is used for receiving and processing complaint texts from broadband users.

In some embodiments, the third server 300 is also used to host a system for serving users, such as a customer service system.

The fourth server 400 is a data layer node. The data layer is used for storing and processing data. In this embodiment of the present application, the fourth server 400 is configured to receive the user data uploaded by the first server 100, the second server 200, and the third server 300, and to process the user data.

In some embodiments, the first server 100, the second server 200, the third server 300, and the fourth server 400 may be a single server, or may be a server cluster formed by a plurality of servers. In some implementations, the server cluster may also be a distributed cluster.

In some embodiments, the fourth server 400 is also used to host a system for managing data, such as a database system.

It should be noted that, the system architecture described in the embodiments of the present application is for more clearly describing the technical solution of the embodiments of the present application, and does not constitute a limitation on the technical solution provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the system architecture, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems.

Referring to fig. 2, a potential user analysis method provided in an embodiment of the present application is applied to a fourth server. As shown in fig. 2, the method includes:

S101, acquiring user data of a target user group, wherein the target user group is a user group with high traffic demand.

By way of example, the target user group may be, but is not limited to, the following user groups: a game type user group, a video type user group, a live broadcast type user group, a financial type user group, a medical type user group, a scientific research institution type user group and the like.

The user data refers to various information related to the user, and in the embodiment of the present application, the user data is information data related to the user and the broadband.

In some embodiments, the user data is derived from broadband internet log information recorded by the network layer, user service information recorded by the service layer, and complaint text stored by the customer service layer.

In some embodiments, the user data includes, but is not limited to, the following: tag information data, complaint data, related user information data, related service information data, terminal broadband information data, broadband convergence information data, broadband base information data, broadband usage data, video call information data, broadband access information data, network topology information data, and the like.

In some embodiments, step S101 may be embodied as the following steps a1 to a9.

Step a1: the data layer periodically extracts the broadband internet log information of all broadband users from the network layer acquisition equipment. Here, all broadband users refers to every user who is using or has used broadband services.

In some embodiments, the period may be divided by year, month, or day according to business needs.

By way of example, a period may be 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, one week, two weeks, one month, two months, one year, two years, etc., without being particularly limited.

Step a2: the data layer screens out target broadband internet log information from the broadband internet log information of all broadband users, wherein the target broadband internet log information is the broadband internet log information of the target user group.

Illustratively, when the target user group is a game user group, the data layer screens out the broadband internet log information associated with the game user group from the broadband internet log information of all broadband users.

Step a3: the data layer parses the target broadband internet log information according to professional rules.

The professional rules are detailed, systematic criteria for analyzing and identifying the target broadband internet log information.

By way of example, the professional rules include, but are not limited to, the following rules.

1. User behavior analysis rules. The rules are based on in-depth understanding and analysis of the user's behavior on the website or platform. For example, a user frequently accesses a page or performs an operation for a period of time, which may mean that the user is of particular interest to the page or operation. By parsing the target broadband internet log information, a pattern of the behavior can be identified and associated with a particular user group.

2. User interest identification rules. These rules identify a user's interests based on the content the user browses and searches for on a website or platform. For example, if a user frequently logs in to a gaming website, it may be inferred that the user is interested in games. By parsing the target broadband internet log information, the interests of the user can be identified and associated with a particular user group.

3. User group classification rules. These rules divide users into different groups based on their behavior, interests, and other characteristics. For example, users may be classified as active users and silent users, or as users who are interested in a particular product or service and users who are not. By parsing the target broadband internet log information, these classifications can be identified and associated with a particular user group.
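As an illustration of how such a rule might be applied to parsed log records, the sketch below implements a single interest rule: a user who visits platforms of a given category at least `min_visits` times is assigned to the corresponding user group. The record fields (`broadband_number`, `category`), the visit-count criterion, and the default of three visits are assumptions; the application does not define the rules at this level of detail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def screen_target_users(
    log_records: Iterable[Dict[str, str]],
    category: str = "game",
    min_visits: int = 3,
) -> Set[str]:
    """Return broadband service numbers with at least min_visits to the category."""
    visits: Dict[str, int] = defaultdict(int)
    for record in log_records:
        if record["category"] == category:
            visits[record["broadband_number"]] += 1
    return {number for number, count in visits.items() if count >= min_visits}


# Hypothetical parsed log records.
logs = [
    {"broadband_number": "B1", "category": "game"},
    {"broadband_number": "B1", "category": "game"},
    {"broadband_number": "B1", "category": "game"},
    {"broadband_number": "B2", "category": "game"},
    {"broadband_number": "B2", "category": "video"},
]
print(screen_target_users(logs))
```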

Step a4: the data layer stores the parsing result of the target broadband internet log information, wherein the parsing result includes the broadband service number.

In some embodiments, the parsing result further includes behavior data such as the names of the platforms logged in to, time spent online, number of online sessions, and traffic usage.

Step a5: the data layer acquires the user service information of all broadband users from the service layer.
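A parsing result of this kind could be built by aggregating log records per broadband service number, as in the sketch below. The field names (`platform`, `minutes`, `traffic_mb`) and the shape of the summary are hypothetical.

```python
from typing import Dict, Iterable


def summarize_behavior(log_records: Iterable[dict]) -> Dict[str, dict]:
    """Aggregate platforms, session count, online minutes, and traffic per number."""
    summary: Dict[str, dict] = {}
    for record in log_records:
        entry = summary.setdefault(
            record["broadband_number"],
            {"platforms": set(), "sessions": 0, "minutes": 0.0, "traffic_mb": 0.0},
        )
        entry["platforms"].add(record["platform"])
        entry["sessions"] += 1
        entry["minutes"] += record["minutes"]
        entry["traffic_mb"] += record["traffic_mb"]
    return summary


# Hypothetical target broadband internet log records.
records = [
    {"broadband_number": "B1", "platform": "game-a", "minutes": 90.0, "traffic_mb": 1200.0},
    {"broadband_number": "B1", "platform": "game-b", "minutes": 30.0, "traffic_mb": 300.0},
    {"broadband_number": "B2", "platform": "game-a", "minutes": 15.0, "traffic_mb": 80.0},
]
print(summarize_behavior(records)["B1"]["sessions"])
```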

The user service information refers to information on the various service activities related to a user, including user basic information, billing and accounting information, network usage information, contract information, service information, account information, marketing information, and the like.

In some embodiments, the data layer obtains the user service information of all broadband users from a CRM system of the service layer.

Step a6: according to the broadband service number, the data layer acquires the user basic information and billing and accounting information of the users in the target user group from the user service information of all broadband users, and stores them.

The user basic information refers to basic information about a user, including the user name, certificate type, certificate number, gender, age, broadband service number, home location of the broadband number, network type, network age, broadband access mode, user status, home cell, and the like.

The billing and accounting information refers to information for recording, counting, and managing broadband usage fees, including the broadband service number, bill records, payment records, discount records, points records, and the like.

In some embodiments, the data layer may use the broadband service number as a key to look up the corresponding user basic information and billing and accounting information in the user service information of all broadband users.

Step a7: the data layer acquires the complaint texts of all broadband users within a preset time period from the customer service layer.
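The lookup in step a6 amounts to a keyed join on the broadband service number. A minimal sketch, assuming the user service information is held as a dictionary keyed by broadband service number with hypothetical `basic` and `billing` entries:

```python
from typing import Dict, Iterable


def collect_target_user_info(
    target_numbers: Iterable[str],
    service_info: Dict[str, dict],
) -> Dict[str, dict]:
    """Pick basic and billing information for the target users by service number.

    Numbers that have no service record are skipped rather than raising.
    """
    result: Dict[str, dict] = {}
    for number in target_numbers:
        record = service_info.get(number)
        if record is None:
            continue
        result[number] = {"basic": record["basic"], "billing": record["billing"]}
    return result


# Hypothetical user service information for all broadband users.
all_service_info = {
    "B1": {"basic": {"age": 28}, "billing": {"last_bill": 129.0}, "contract": {}},
    "B2": {"basic": {"age": 41}, "billing": {"last_bill": 99.0}, "contract": {}},
}
print(collect_target_user_info(["B1", "B9"], all_service_info))
```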

Wherein, complaint text includes: broadband service number, complaint content, complaint time, etc.

In some embodiments, the preset time period may be set according to service requirements. For example, the preset time period may be the most recent half year.

Step a8: the data layer extracts keywords of the complaint content from the complaint text by means of a semantic analysis technique, and stores the keywords.

In some embodiments, semantic analysis techniques may analyze from multiple perspectives, including lexical analysis, syntactic analysis, semantic analysis, context analysis, and the like.

Specifically, lexical analysis analyzes the prefixes, suffixes, and other components of words and grasps the characteristics of the vocabulary system as a whole, so that the characteristics of the information input by the user can be analyzed accurately. Syntactic analysis analyzes the words and phrases of the natural language input by the user and identifies the syntactic structure of each sentence, thereby realizing automatic syntactic parsing. Language analysis (pragmatic analysis) is a higher-level linguistic analysis that, relative to semantic analysis, adds analysis of the context, the language background, and the like, and extracts additional information such as imagery and interpersonal relations from the structure of the text. Context analysis mainly analyzes the large number of "gaps" that lie outside the original query language, so that the intended query can be interpreted more accurately.

To facilitate understanding of the semantic analysis technique, the semantic analysis technique is described below in an exemplary form.

Illustratively, assume a complaint whose content is "the broadband network speed is really slow". Through lexical analysis, the complaint content can be split into the words "broadband", "network speed", "really", and "slow". Through semantic analysis, it can be inferred that "broadband" refers to a service providing network connectivity, "network speed" refers to the network transmission speed, "really" expresses emphasis, and "slow" means low speed. Through language analysis, the emotion expressed by the complaint can be determined to be negative, namely dissatisfaction with the broadband network speed. Through context analysis, the entity that the complaint concerns can be identified as the broadband network speed.

In some embodiments, the keywords of the complaint content may include at least one of: slow network speed, network lag, high latency, disconnection, poor signal, and the like.

In some embodiments, keywords of the complaint content are associated with broadband service numbers of the complaint text.

Illustratively, the keywords of the complaint content of broadband service number 1 include slow network speed and network lag, and the keywords of the complaint content of broadband service number 2 include high latency and poor signal.

Step a9: The data layer integrates the analysis result of the broadband internet log information, the user basic information and the billing and accounting information corresponding to the users of the target user group, and the keywords of the complaint content to obtain the user data of the target user group.
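The association between complaint keywords and broadband service numbers can be sketched as follows. This is a minimal illustration only: it uses simple substring matching rather than the semantic analysis technology described above, and the lexicon `KEYWORD_PATTERNS`, the record fields `service_number` and `content`, and the sample complaints are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lexicon: each keyword is triggered by one or more cue
# substrings. A real system would rely on semantic analysis instead.
KEYWORD_PATTERNS = {
    "slow network speed": ("slow",),
    "network lag": ("lag", "stutter"),
    "high latency": ("latency", "delay"),
    "disconnection": ("disconnect", "dropped"),
    "poor signal": ("signal",),
}


def extract_keywords(complaints):
    """Map each broadband service number to the sorted keywords found in its complaints."""
    found = defaultdict(set)
    for record in complaints:
        text = record["content"].lower()
        for keyword, cues in KEYWORD_PATTERNS.items():
            if any(cue in text for cue in cues):
                found[record["service_number"]].add(keyword)
    return {number: sorted(keywords) for number, keywords in found.items()}


# Invented sample complaints for two broadband service numbers.
complaints = [
    {"service_number": "1", "content": "The broadband is really slow and keeps lagging."},
    {"service_number": "2", "content": "High latency and a weak signal in the bedroom."},
]
print(extract_keywords(complaints))
```

Keying the result by broadband service number keeps the keywords ready to be joined with the other data sources later.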

In some embodiments, the data layer uses the broadband service number to associate the analysis result of the broadband internet log information, the user basic information and the billing and accounting information corresponding to the users of the target user group, the keywords of the complaint content, and other information, thereby obtaining the user data, and stores the user data in a data warehouse of the data layer.

It can be understood that the user data is obtained by collecting and processing different information from the network layer, the service layer, and the customer service layer, so the content of the user data is more comprehensive and service personnel can understand the behaviors and demands of the users more deeply.
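A minimal sketch of this association step is given below, assuming each source has already been reduced to a dictionary keyed by broadband service number. The function name `integrate_user_data`, the field names, and the sample values are hypothetical; the embodiments do not specify a storage schema.

```python
def integrate_user_data(log_analysis, basic_info, billing_info, complaint_keywords):
    """Join the four per-user sources on the broadband service number.

    The log analysis result drives the join, on the assumption that the
    target user group is identified from the broadband internet logs.
    """
    user_data = {}
    for number, log_result in log_analysis.items():
        user_data[number] = {
            "log_analysis": log_result,
            "basic_info": basic_info.get(number, {}),
            "billing": billing_info.get(number, {}),
            # Users without complaints simply get an empty keyword list.
            "complaint_keywords": complaint_keywords.get(number, []),
        }
    return user_data


# Invented sample inputs.
log_analysis = {"1": {"online_hours": 120}, "2": {"online_hours": 45}}
basic_info = {"1": {"age": 28}, "2": {"age": 35}, "3": {"age": 50}}
billing_info = {"1": {"monthly_fee": 199}}
complaint_keywords = {"1": ["slow network speed"]}

user_data = integrate_user_data(log_analysis, basic_info, billing_info, complaint_keywords)
print(user_data["2"])
```

Service number 3 is absent from the log analysis result and is therefore left out of the integrated user data in this sketch.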

S102, analyzing a plurality of target features of the target user group according to the user data. The target features are used to reflect the demand of the target user group for fiber-to-the-room (FTTR) broadband.

In some embodiments, the step S102 may be implemented as: the fourth server performs multidimensional analysis on the user data according to the service requirement and screens out, from the user data, the data that is relevant to the service and valuable for analyzing it, thereby determining the target features.

In some embodiments, the step S102 may be further implemented as: the fourth server, in combination with an understanding of the service, performs deep mining and analysis on the features corresponding to the user data and generates new derived features to serve as target features.

For example, the activity level of a user can be calculated from statistical features such as the internet surfing time and the internet surfing times, and the purchasing power of the user can be calculated from the user's bill records.

In some embodiments, the above method further comprises: the fourth server labels the plurality of target features and generates a panoramic portrait of the users of the target user group. For example, the labels of the multiple target features may be combined to characterize the panoramic portrait of the users of the target user group.

The panoramic portrait of the users of the target user group refers to an all-round, multi-angle analysis and description of the target users, covering the users' basic information, behavior characteristics, preferences, demands, social relations, and other aspects.

It can be understood that the user data of the target user group is first screened and a plurality of target features are then determined in combination with an understanding of the service, so that a panoramic portrait of the target user group is generated; this helps service personnel understand the demands and behaviors of the users more conveniently.

S103, determining key features and feature thresholds at least according to the plurality of target features. The key features are the features, among the plurality of target features, whose importance is greater than a preset threshold, and each feature threshold represents the minimum standard of the corresponding key feature for users who subscribe to FTTR broadband.

In some embodiments, before step S103, the method further comprises: processing, by the fourth server, the user data corresponding to the target features, which is specifically implemented as the following steps one to four.

Step one: perform missing-value processing on the user data.

Specifically, the fourth server deletes any target feature whose corresponding user data has a missing rate of not less than 90%, performs derived-variable construction on any target feature whose corresponding user data has a missing rate between 30% and 90%, and performs mean filling on any target feature whose corresponding user data has a missing rate of less than 30%.

Derived-variable construction means generating a new feature from a plurality of correlated features, and generating the user data corresponding to the new feature from the user data corresponding to those correlated features.

In order to facilitate understanding of the above missing value processing, the missing value processing will be explained below in the form of a scenario example.

Scenario one, shown in fig. 3, involves three target features, namely target feature 1, target feature 2, and target feature 3, each of which corresponds to 10 users.

As shown in fig. 3, the missing rate of the user data corresponding to target feature 1 reaches 90%, so target feature 1 is deleted; the missing rate of the user data corresponding to target feature 2 is between 30% and 90%, so derived-variable construction is performed on target feature 2; the missing rate of the user data corresponding to target feature 3 is less than 30%, so the missing user data corresponding to target feature 3 is filled with the mean value, 16.25.
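The three-way rule of step one can be sketched as follows. This is an illustrative sketch under stated assumptions: missing entries are represented as `None`, the function names are hypothetical, and the sample values are invented (fig. 3 is not reproduced here); the values of `feature_3` were merely chosen so that their mean equals the 16.25 mentioned in the scenario.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def handle_missing(features):
    """Return (kept, to_derive).

    kept: features with a missing rate below 30%, mean-filled.
    to_derive: names of features with a missing rate from 30% up to (but
    not including) 90%, which are left for derived-variable construction.
    Features with a missing rate of 90% or more are deleted.
    """
    kept, to_derive = {}, []
    for name, values in features.items():
        rate = missing_rate(values)
        if rate >= 0.9:
            continue  # delete the feature
        if rate >= 0.3:
            to_derive.append(name)
            continue
        present = [v for v in values if v is not None]
        mean = sum(present) / len(present)
        kept[name] = [mean if v is None else v for v in values]
    return kept, to_derive


# Invented data for 10 users per feature.
features = {
    "feature_1": [None] * 9 + [5.0],                                  # 90% missing
    "feature_2": [1.0] * 5 + [None] * 5,                              # 50% missing
    "feature_3": [10, 12, 14, 16, None, 18, 20, 20, 20, None],        # 20% missing
}
kept, to_derive = handle_missing(features)
print(kept["feature_3"])
print(to_derive)
```

How the derived variables are actually constructed depends on which correlated features are available, so the sketch only flags those features rather than transforming them.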

Step two: perform normalization processing on the user data.

Specifically, min-max normalization (Min-Max normalization, also known as deviation normalization) is used to adjust the user data corresponding to different types of target features to a consistent range; by substituting the user data corresponding to a target feature into the normalization formula, the data is generally converted into a value between 0 and 1.

In some embodiments, the normalization process formula is as follows:

new_value = (original_value - min_value) / (max_value - min_value)

where original_value represents the user data corresponding to the target feature, min_value represents the minimum value of the user data corresponding to the target feature, max_value represents the maximum value of the user data corresponding to the target feature, and new_value represents the converted user data.

In order to facilitate understanding of the normalization process described above, the normalization process will be explained below taking the internet surfing times feature as an example.

For the internet surfing times feature, assume that the maximum value of the corresponding user data is 200 and the minimum value is 100. According to the normalization formula, the minimum value 100 is normalized to (100 - 100) / (200 - 100) = 0, and the maximum value 200 is normalized to (200 - 100) / (200 - 100) = 1.

Step three: perform binning processing on the user data.
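The normalization formula can be applied to a whole feature column as in the following sketch. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero; the embodiments do not address that case.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] with (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Internet surfing times with a minimum of 100 and a maximum of 200.
print(min_max_normalize([100, 150, 200]))  # [0.0, 0.5, 1.0]
```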

The binning process refers to a process of converting continuous values into discrete values. In the embodiment of the application, the binning process is used for converting continuous user data corresponding to the target feature into discrete user data.

Illustratively, taking the internet surfing time as an example, the user data corresponding to the internet surfing time is first cut into a plurality of time periods, for example 0-1 hour, 1-2 hours, 2-3 hours, and so on; the user data is then classified into these periods; finally, statistics are calculated for each category, for example the frequency of each category.

Step four: reduce the dimension of the user portrait according to the processed target features and the user data corresponding to the target features.

In some embodiments, the service personnel perform correlation analysis on the user data corresponding to the processed target features and eliminate target features that are highly correlated with one another, thereby reducing the dimension of the user portrait.

The correlation analysis methods include, but are not limited to, the Pearson correlation coefficient, the Spearman rank correlation coefficient, the Kendall rank correlation coefficient, and the like.
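The cut, classify, and count sequence can be sketched with equal-width bins as follows. The function name `bin_hours`, the half-open bin convention, and the sample durations are assumptions made for illustration; the embodiments do not fix a binning rule.

```python
from collections import Counter


def bin_hours(hours, width=1.0):
    """Assign each duration to the half-open bin [k*width, (k+1)*width)
    and count how many users fall into each bin."""
    counts = Counter()
    for h in hours:
        start = int(h // width) * width
        counts[(start, start + width)] += 1
    return dict(sorted(counts.items()))


# Invented internet surfing times, in hours.
print(bin_hours([0.2, 0.7, 1.5, 2.1, 2.9]))
# {(0.0, 1.0): 2, (1.0, 2.0): 1, (2.0, 3.0): 2}
```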

To facilitate understanding of the above embodiments, a scenario example is described below. In scenario one, assume a target feature set of game users that includes three target features: the user internet surfing duration, the user internet traffic, and the broadband user regional distribution quantity.

Calculation shows that the user internet surfing duration has a strong positive correlation with the user internet traffic; that is, the longer a user is online, the more traffic the user consumes. The information carried by these two features is therefore partly redundant, and the strong positive correlation between them may cause overfitting when the target algorithm is subsequently used, so only one of the two target features needs to be retained.

Illustratively, the correlation between the user internet surfing duration and the broadband user regional distribution quantity is weak; that is, the information provided by the two features is relatively independent and not directly related. Therefore, to keep the service analysis comprehensive, both target features are retained.

In some embodiments, the business personnel need to cull the target features until their number is within the number required by the service.

By way of example, assuming that there are 100 target features, 50 of them may be retained according to the service requirement, for example by selecting them through correlation analysis or a similar method.
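The scenario above can be sketched with the Pearson correlation coefficient and a greedy pruning rule. The cutoff of 0.9, the greedy keep-first strategy, and the sample data are assumptions for illustration; the embodiments leave the choice of features to the service personnel. The sketch also assumes that no feature is constant, since a zero variance would make the coefficient undefined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented data for five game users.
features = {
    "online_hours": [1, 2, 3, 4, 5],
    "traffic_gb": [10, 21, 29, 41, 50],   # rises almost linearly with online_hours
    "region_count": [3, 1, 4, 1, 5],      # only weakly related to online_hours
}
print(drop_correlated(features))  # ['online_hours', 'region_count']
```

Because the pruning is greedy, the order of the input features decides which member of a correlated pair survives; here the duration is kept and the traffic is dropped.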

It can be appreciated that the above method effectively preprocesses the user data and selects the features, improves the quality and accuracy of the user data, and provides an effective data basis for determining the key features and the feature threshold values according to at least a plurality of target features.

In some embodiments, as shown in fig. 4, step S103 is embodied as the following steps S1031 to S1034.

S1031, labeling the user data as positive and negative samples to obtain positive sample data and negative sample data.

The positive sample data is the data of users who have subscribed to FTTR broadband, and the negative sample data is the data of users who have not subscribed to FTTR broadband.

Specifically, a label field is added to the user data; the label field is set to 1 for users who have subscribed to FTTR broadband and to 0 for users who have not.

S1032, inputting the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature. The target algorithm is an algorithm for predicting the importance of features.
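The labeling step can be sketched as follows, assuming the user data is keyed by broadband service number and the set of existing FTTR subscribers is known. The names `label_samples`, `fttr_subscribers`, and the sample records are hypothetical.

```python
def label_samples(user_data, fttr_subscribers):
    """Add a label field: 1 for users who already have FTTR broadband, else 0."""
    labeled = []
    for number, record in user_data.items():
        labeled.append({
            **record,
            "service_number": number,
            "label": 1 if number in fttr_subscribers else 0,
        })
    return labeled


# Invented sample data.
user_data = {"1": {"online_hours": 120}, "2": {"online_hours": 45}}
samples = label_samples(user_data, fttr_subscribers={"1"})
positive = [s for s in samples if s["label"] == 1]
negative = [s for s in samples if s["label"] == 0]
print(len(positive), len(negative))  # 1 1
```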

In some embodiments, the target algorithm may be a machine learning algorithm.

By way of example, the target algorithm may include at least one of: random forest, light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), neural networks, and other machine learning algorithms.

In some embodiments, importance refers to the extent to which a target feature affects whether a user handles FTTR broadband.

Illustratively, if the importance of target feature 1 is higher than that of target feature 2, this indicates that target feature 1 has a greater influence than target feature 2 on whether a user subscribes to FTTR broadband. Thus, target feature 1 may be used as an indicator for determining whether a user will subscribe to FTTR broadband.

In some embodiments, as shown in fig. 5, the importance of a target feature may be calculated as follows: the positive sample data and the negative sample data are input into a first model and a second model respectively to obtain the importance output by each model (that is, the first model yields a first importance and the second model yields a second importance), and the importance of the target feature is then calculated by weighted summation.

The above step S1032 is specifically implemented as the following steps b1 to b2, for example.

Step b1: input the positive sample data and the negative sample data into a first model comprising a first algorithm and into a second model comprising a second algorithm, respectively, to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model.

Wherein the first algorithm and the second algorithm are target algorithms for calculating feature importance, and the first algorithm is different from the second algorithm.

It will be appreciated that the importance of the target features may be calculated by a single target algorithm or by a plurality of (that is, two or more) target algorithms.

Step b2: obtain the importance of each target feature based on the importance of each target feature output by the first model and the first weight, and on the importance of each target feature output by the second model and the second weight.

Here, each weight characterizes the accuracy of the corresponding model that includes the target algorithm.

In some embodiments, the first weight is a weight corresponding to the first model, and is used for characterizing accuracy of a result output by the first model; the second weight is the weight corresponding to the second model and is used for representing the accuracy of the result output by the second model. The first weight and the second weight may be obtained based on various statistics and metrics of a corresponding model including the target algorithm during training, which is not limited in the embodiments of the present application.

In some embodiments, the importance of the target feature may be calculated according to an importance weighted sum formula. The importance weighted summation formula is as follows:

$I_n = F_a \times I_{na} + F_b \times I_{nb}$

where $n$ is the number of the target feature, $I_{na}$ characterizes the first importance corresponding to target feature $n$, $I_{nb}$ characterizes the second importance corresponding to target feature $n$, $F_a$ characterizes the first weight corresponding to the first model, $F_b$ characterizes the second weight corresponding to the second model, and $I_n$ characterizes the importance of the target feature numbered $n$.

For example, assume that the first model calculates, from the user data corresponding to target feature 1, a first importance a1 for target feature 1, with first weight F1, and that the second model calculates a second importance b1 for target feature 1, with second weight F2. The importance of target feature 1 can then be calculated from the importances and the corresponding weights of the first model and the second model by the weighted summation formula: $I_1 = a1 \times F1 + b1 \times F2$.

It should be noted that the above-mentioned first algorithm and second algorithm are only examples, and no limitation is made on the number of target algorithms provided in the embodiments of the present application. In other embodiments, the target algorithm for calculating the importance of the target feature may further include a third algorithm, a fourth algorithm, and the like, which is not limited in the embodiments of the present application.
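The weighted summation generalizes directly to any number of models, as the following sketch shows. The function name `combine_importances`, the model names, and all numeric values are invented for illustration; in the embodiments the per-model importances come from the trained models and the weights reflect each model's accuracy.

```python
def combine_importances(importances_by_model, weights):
    """Weighted sum per feature: I_n = sum over models m of F_m * I_nm."""
    features = next(iter(importances_by_model.values())).keys()
    return {
        feature: sum(
            weights[model] * scores[feature]
            for model, scores in importances_by_model.items()
        )
        for feature in features
    }


# Invented per-model importances and weights.
importances_by_model = {
    "first_model": {"feature_1": 0.6, "feature_2": 0.4},
    "second_model": {"feature_1": 0.3, "feature_2": 0.7},
}
weights = {"first_model": 0.5, "second_model": 0.5}

combined = combine_importances(importances_by_model, weights)
print(combined)
```

In this invented example the two models disagree on the ranking of the two features, and the weighted sum resolves the disagreement in favor of feature_2.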

For ease of understanding, the importance calculating method adopted in the embodiment of the present application will be described below by way of example.

For example, assume that the first algorithm, applied to the user data corresponding to target feature 1 and target feature 2, finds that the first importance of target feature 1 is greater than the first importance of target feature 2, whereas the second algorithm, applied to the same user data, finds that the second importance of target feature 1 is smaller than that of target feature 2. It can be seen that different target algorithms yield different importance results for the same target features. Therefore, to improve the accuracy of the calculation result, in the embodiments of the application a plurality of target algorithms may be used to calculate the importance of the target features, and the final importance is then obtained by weighted summation.

S1033, selecting target features with importance degrees larger than a preset threshold value from a plurality of target features based on the importance degrees of the target features, and taking the target features as key features.

In some embodiments, the business personnel sort the target features by importance from high to low, and determine the preset threshold and the key features according to the sorting result and the service requirements.

As shown in fig. 6, for example, the business personnel need, according to the service requirements, to select the 10 target features with the highest importance. These are, respectively, the number of login platforms, the platform type, the broadband service rate, the uplink traffic of the last three months, the downlink traffic of the last three months, whether there has been a complaint about broadband problems, the router rate, the online duration of the last three months, the package periodic fee, and whether the service is converged. If the importance of the whether-converged feature is taken as the preset threshold, these 10 features are the key features.
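The ranking-based selection can be sketched as follows. Following the example above, the sketch takes the importance of the last selected feature as the preset threshold, so the selected features have importance greater than or equal to it; the function name `select_key_features` and the importance values are hypothetical.

```python
def select_key_features(importance, top_k):
    """Rank features by importance (high to low) and keep the first top_k.

    Returns the kept feature names and the importance of the last kept
    feature, which plays the role of the preset threshold in this sketch.
    """
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:top_k]
    threshold = kept[-1][1]
    return [name for name, _ in kept], threshold


# Invented importance values.
importance = {
    "broadband_rate": 0.40,
    "package_fee": 0.10,
    "uplink_traffic": 0.30,
    "router_rate": 0.20,
}
key_features, preset_threshold = select_key_features(importance, top_k=2)
print(key_features, preset_threshold)  # ['broadband_rate', 'uplink_traffic'] 0.3
```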

S1034, determining a characteristic threshold according to the positive sample data and the key characteristics.

In some embodiments, step S1034 is embodied as the following steps c1 to c3.

Step c1: determine the data corresponding to the key features from the positive sample data.

Specifically, the user data corresponding to the key features is screened from the user data of the positive sample data. This user data may be discrete data or continuous data.

Step c2: in the case where the data corresponding to a key feature is discrete data, determine the feature threshold according to a histogram of the data corresponding to the key feature.

For example, taking the internet surfing times as an example, if, among the users who have subscribed to FTTR broadband, users with 150 internet surfing times are the most numerous, or the internet surfing times of most such users are close to 150, the feature threshold of the internet surfing times is 150.

Step c3: in the case where the data corresponding to a key feature is continuous data, perform binning on the data corresponding to the key feature to obtain the feature threshold.

In some embodiments, in the case where the data corresponding to the key feature is continuous data, the continuous data is converted into discrete data by binning, and the feature threshold corresponding to the key feature is then determined according to the method described in step c2.
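Steps c2 and c3 can be sketched as follows. Reading the histogram as "take the most frequent value" follows the internet surfing times example; using the lower edge of the fullest bin as the threshold for continuous data is an assumption of this sketch, as are the function names and the sample values.

```python
from collections import Counter


def discrete_threshold(values):
    """Peak of the histogram: the most frequent value among positive samples."""
    return Counter(values).most_common(1)[0][0]


def continuous_threshold(values, width):
    """Bin continuous values into equal-width bins, then apply the discrete rule.

    Assumption: each bin is represented by its lower edge, so the returned
    threshold is the lower edge of the fullest bin.
    """
    bin_edges = [int(v // width) * width for v in values]
    return discrete_threshold(bin_edges)


# Invented positive-sample data.
surfing_times = [150, 150, 150, 120, 200]
online_hours = [1.2, 2.1, 2.4, 2.8, 3.5]

print(discrete_threshold(surfing_times))        # 150
print(continuous_threshold(online_hours, 1.0))  # 2.0
```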

S104, analyzing potential users of the FTTR broadband in the target user group based on the key characteristics and the characteristic threshold.

In some embodiments, the fourth server screens the users in the target user group according to the key features and the feature thresholds. Specifically, among the users in the target user group, users whose feature values for the key features are greater than the corresponding feature thresholds are potential users.

In some embodiments, the fourth server may further formulate targeted marketing strategies for the corresponding potential users according to the key features and feature thresholds, and periodically update the key features and feature thresholds according to the marketing results of the business personnel, thereby forming a closed loop.
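The screening rule can be sketched as follows. The embodiments do not state whether a user must exceed the threshold for every key feature or only for some; this sketch assumes every key feature must be exceeded. The function name `find_potential_users`, the feature names, and the sample values are hypothetical.

```python
def find_potential_users(users, thresholds):
    """Return the service numbers of users whose value for every key
    feature is strictly greater than that feature's threshold."""
    return [
        number
        for number, features in users.items()
        if all(features[name] > limit for name, limit in thresholds.items())
    ]


# Invented feature thresholds and user feature values.
thresholds = {"surfing_times": 150, "online_hours": 2.0}
users = {
    "1": {"surfing_times": 180, "online_hours": 3.5},
    "2": {"surfing_times": 160, "online_hours": 1.0},
    "3": {"surfing_times": 150, "online_hours": 4.0},
}
print(find_potential_users(users, thresholds))  # ['1']
```

User 3 sits exactly on the surfing times threshold and is excluded, because the text requires the feature value to be larger than the threshold.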

In some embodiments, as shown in fig. 7, the technical solution provided in the embodiments of the present application includes: user data is acquired, portraits are constructed, features are cleaned, features are selected, and targets are predicted.

Taking a game user group as an example, the technical scheme provided by the embodiment of the application is specifically implemented as the following steps d1 to d5.

Step d1: obtain user data. As shown in fig. 8, the data layer obtains the broadband internet logs of the game user group from the network layer and analyzes them to obtain game behavior information, obtains user service information from the service layer, and obtains complaint content from the customer service layer; it then integrates the behavior information, the user service information, and the complaint content to obtain user data, and stores the user data in the data warehouse.

Step d2: construct the portrait. The data layer constructs a panoramic portrait of the game user group based on the user data and obtains a plurality of target features. As shown in fig. 9, the panoramic portrait of the game user group includes 9 categories of features, namely game tags, user complaints, related users, related services, consumption behaviors, terminal broadband web pages, broadband convergence, broadband basic information, and broadband usage.

To facilitate understanding of the above 9 categories of features, each category is described below.

1. Game tags. In the embodiment of the application, there are 16 game tags in total.

2. User complaints. User complaints include broadband quality complaints and the like.

3. Related users. The related users include the user name, telephone number, certificate number, wireless local area network (wireless fidelity, WIFI) attached numbers, and the like. The game application (app) preference corresponding to a number can be determined according to the user name and telephone number, and the game app preferences corresponding to related numbers can be determined according to the same certificate number and the WIFI attached numbers.

4. Related services. Related services include Internet Protocol television (IPTV). The IPTV information includes the IPTV start and stop times, the service identity (ID), the IPTV account number, and the like.

5. Consumption behaviors. Consumption behaviors include apportioned broadband revenue, convergence group revenue, and the like.

6. Terminal broadband web pages. The terminal broadband web pages include the broadband rate, broadband speed-up, optical modem rate, router rate, and the like.

7. Broadband convergence. Broadband convergence includes the binding type, the convergence effective and expiry times, the convergence account number, the convergence primary number, and the like. The information on the convergence primary number includes the game app preference corresponding to the primary number.

8. Broadband basic information. The broadband basic information includes the transaction channel, online account, online time, installation address, user name, certificate number, broadband access mode, user status, user gender, user age, package year, package month, home cell, home region, network type, and the like. The information on the home cell includes the number of FTTR broadband subscribers in the cell.

9. Broadband usage. Broadband usage includes whether the user is active, the internet surfing duration, the internet surfing times, the broadband traffic (upstream and downstream), and the like.

Step d3: clean the features. The business personnel perform feature cleaning on the user data corresponding to the target features. Feature cleaning comprises missing-value processing, normalization processing, analysis processing, and feature selection processing.

Step d4: select the features. The fourth server labels the user data as positive sample data and negative sample data, inputs the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature, and then, based on the importance of the target features, selects from the plurality of target features those whose importance is greater than a preset threshold as the key features.

In some embodiments, the business personnel process the positive sample data and the key features through a data analysis method to determine the feature thresholds corresponding to the key features.

Step d5: predict the target. The business personnel analyze the potential users of FTTR broadband in the game user group based on the key features and feature thresholds.

It can be understood that, in the embodiments of the application, a specific user group (the user group with high requirements on broadband network speed and network quality) is first brought into focus through an understanding of the service and big-data analysis. On the one hand, this identifies the target market for FTTR broadband and improves the success rate of users subscribing to FTTR broadband; on the other hand, it narrows the marketing scope of the business personnel, which brings them great convenience. Then, data resources such as the broadband internet logs, basic user information, customer service complaint behaviors, and consumption behaviors of heavy-preference users are integrated to construct a targeted panoramic user portrait, and a new feature importance calculation formula is constructed in combination with the target algorithms. In this way, key features are determined and the potential users most willing to subscribe are identified from the specific user group, which provides front-line business personnel with a more accurate and reliable list of potential users for developing business and improves marketing efficiency.

It can be seen that the foregoing description of the solution provided by the embodiments of the present application has been presented mainly from a method perspective. To achieve the above-mentioned functions, embodiments of the present application provide corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The embodiment of the application may divide the functional modules of the network node according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. Optionally, the division of the modules in the embodiments of the present application is schematic, which is merely a logic function division, and other division manners may be actually implemented.

Fig. 10 is a schematic structural diagram of a potential user analysis device according to an embodiment of the present application. The potential user analysis device 1000 may be applied to the potential user analysis method described above. The potential user analysis device includes: an acquisition module 1001, a processing module 1002, a determination module 1003, and an analysis module 1004.

The obtaining module 1001 is configured to obtain user data of a target user group, where the target user group is a user group with a high traffic demand.

The processing module 1002 is configured to analyze a plurality of target features of the target user group according to the user data, where the target features are used to reflect the demand of the target user group for fiber-to-the-room (FTTR) broadband.

A determining module 1003, configured to determine a key feature and a feature threshold according to at least the plurality of target features, where the key feature is a feature with importance greater than a preset threshold in the plurality of target features, and the feature threshold is used to represent a minimum criterion of the key feature corresponding to a user handling the FTTR broadband.

An analysis module 1004 is configured to analyze potential users of the FTTR broadband in the target user group based on the key features and the feature threshold.

In some embodiments, the determining module 1003 is specifically configured to: label the user data as positive and negative samples to obtain positive sample data and negative sample data, where the positive sample data is the data of users who have subscribed to FTTR broadband and the negative sample data is the data of users who have not subscribed to FTTR broadband; input the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature, where the target algorithm is an algorithm for predicting the importance of features; select, based on the importance of the target features, the target features whose importance is greater than a preset threshold from the plurality of target features as the key features; and determine a feature threshold from the positive sample data and the key features.

In some embodiments, the target algorithm includes a first algorithm and a second algorithm, where the first algorithm corresponds to a first weight and the second algorithm corresponds to a second weight. The determination module 1003 is further configured to: input the positive sample data and the negative sample data into a first model including the first algorithm and into a second model including the second algorithm, to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model; and obtain the importance of each target feature based on the importance output by the first model together with the first weight and the importance output by the second model together with the second weight.
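One plausible way to combine the two models' outputs is a weighted sum, sketched below in Python. The weighted-sum form, the function name, the feature names and the numeric values are assumptions; the text only states that the final importance is based on both importance outputs and both weights.

```python
from typing import Dict


def combine_importance(first: Dict[str, float], second: Dict[str, float],
                       first_weight: float, second_weight: float) -> Dict[str, float]:
    """Weighted sum of the per-feature importance reported by two models (assumed form)."""
    if first.keys() != second.keys():
        raise ValueError("both models must score the same target features")
    return {
        name: first_weight * first[name] + second_weight * second[name]
        for name in first
    }


# Illustrative importance outputs of the first and second models.
first_model = {"monthly_traffic_gb": 0.5, "device_count": 0.3, "home_area": 0.2}
second_model = {"monthly_traffic_gb": 0.4, "device_count": 0.2, "home_area": 0.4}
combined = combine_importance(first_model, second_model, first_weight=0.6, second_weight=0.4)

# Features whose combined importance is greater than a preset threshold become key features.
key_features = [name for name, value in combined.items() if value > 0.27]
```

With weights that sum to 1, the combined scores stay on the same scale as the individual model outputs, which keeps a single preset threshold meaningful.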

In some embodiments, the determination module 1003 is further configured to: determine the data corresponding to the key feature from the positive sample data; when the data corresponding to the key feature is discrete data, determine the feature threshold according to a histogram of the data corresponding to the key feature; and when the data corresponding to the key feature is continuous data, perform binning on the data corresponding to the key feature to obtain the feature threshold.
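The two threshold cases can be sketched in Python as follows. The text does not say how the threshold is read off the histogram or the bins, so the rule used here is an assumption: take the smallest value (or the lower edge of the lowest bin) that holds at least a minimum share of the positive samples. The equal-width binning, the parameter names and the sample values are likewise illustrative.

```python
from collections import Counter
from typing import List


def discrete_threshold(values: List[int], min_share: float = 0.1) -> int:
    """Histogram rule (assumed): smallest value held by at least min_share of samples."""
    histogram = Counter(values)
    total = len(values)
    frequent = [v for v, count in histogram.items() if count / total >= min_share]
    # Fall back to the overall minimum if no single value is frequent enough.
    return min(frequent) if frequent else min(values)


def continuous_threshold(values: List[float], n_bins: int = 5,
                         min_share: float = 0.1) -> float:
    """Binning rule (assumed): lower edge of the lowest equal-width bin
    that contains at least min_share of the samples."""
    low, high = min(values), max(values)
    if low == high:
        return low
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    for index, count in enumerate(counts):
        if count / len(values) >= min_share:
            return low + index * width
    return low


# Illustrative positive-sample data for one discrete and one continuous key feature.
device_counts = [1, 4, 4, 5, 5, 5, 6, 6, 4, 5]
monthly_traffic = [50, 300, 320, 350, 400, 420, 450, 480, 500, 550]
device_min = discrete_threshold(device_counts, min_share=0.2)
traffic_min = continuous_threshold(monthly_traffic, n_bins=5, min_share=0.2)
```

Requiring a minimum share keeps a single outlier among the subscribed users (for example, the one user with only 1 device) from dragging the threshold down.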

In the case where the functions of the integrated modules described above are implemented in the form of hardware, the embodiments of the present invention provide another possible structural schematic diagram of the potential user analysis device involved in the above embodiments. As shown in FIG. 11, the potential user analysis device 1100 includes: a processor 1102, a communication interface 1103, and a bus 1104. Optionally, the potential user analysis device may further include a memory 1101.

The processor 1102 may be a processor that implements or performs the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 1102 may be a central processing unit, a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 1102 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The communication interface 1103 is used to connect with other devices via a communication network. The communication network may be an Ethernet, a radio access network, a wireless local area network (WLAN), or the like.

The memory 1101 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

As a possible implementation, the memory 1101 may exist separately from the processor 1102 and may be connected to the processor 1102 by the bus 1104; the memory 1101 is used for storing instructions or program code. When the processor 1102 calls and executes the instructions or program code stored in the memory 1101, it can implement the potential user analysis method provided by the embodiments of the present invention.

In another possible implementation, the memory 1101 may also be integrated with the processor 1102.

The bus 1104 may be an extended industry standard architecture (EISA) bus or the like. The bus 1104 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this does not mean that there is only one bus or only one type of bus.

From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the potential user analysis device may be divided into different functional modules to perform all or part of the functions described above.

Embodiments of the present application also provide a computer-readable storage medium. All or part of the flow in the above method embodiments may be implemented by a computer program instructing related hardware; the program may be stored in the computer-readable storage medium and, when executed, may include the flow of the above method embodiments. The computer-readable storage medium may be an internal storage unit, such as a memory, of the potential user analysis device of any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of the potential user analysis device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the potential user analysis device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the potential user analysis device. The computer-readable storage medium is used to store the computer program described above as well as other programs and data required by the potential user analysis device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

Embodiments of the present application also provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform any of the potential user analysis methods provided in the embodiments described above.

The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of potential user analysis, the method comprising:

acquiring user data of a target user group, wherein the target user group is a user group with high traffic demand;

analyzing, according to the user data, a plurality of target features of the target user group, wherein the target features reflect the target user group's demand for fiber-to-the-room (FTTR) broadband;

determining key features and feature thresholds at least according to the plurality of target features, wherein the key features are features, among the plurality of target features, whose importance is greater than a preset threshold, and the feature thresholds represent the minimum criterion of the key features for users who subscribe to the FTTR broadband; and

analyzing potential users of the FTTR broadband in the target user group based on the key features and the feature thresholds.

2. The method of claim 1, wherein the determining key features and feature thresholds based at least on the plurality of target features comprises:

labeling positive and negative samples in the user data to obtain positive sample data and negative sample data, wherein the positive sample data is data of users who have subscribed to the FTTR broadband, and the negative sample data is data of users who have not subscribed to the FTTR broadband;

inputting the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature, wherein the target algorithm is an algorithm for predicting the importance of a feature;

selecting a target feature with the importance degree larger than the preset threshold value from the target features based on the importance degree of the target feature as the key feature;

and determining the characteristic threshold according to the positive sample data and the key characteristic.

3. The method of claim 2, wherein the target algorithm comprises: a first algorithm and a second algorithm; the first algorithm corresponds to a first weight, and the second algorithm corresponds to a second weight; the step of inputting the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature comprises the following steps:

inputting the positive sample data and the negative sample data into a first model comprising the first algorithm and into a second model comprising the second algorithm, respectively, to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model;

and obtaining the importance of each target feature based on the importance of each target feature output by the first model together with the first weight, and the importance of each target feature output by the second model together with the second weight.

4. The method of claim 2, wherein said determining the feature threshold from the positive sample data and the key feature comprises:

determining data corresponding to the key features from the positive sample data;

determining the feature threshold according to a histogram of the data corresponding to the key feature when the data corresponding to the key feature is discrete data;

and when the data corresponding to the key feature is continuous data, performing binning on the data corresponding to the key feature to obtain the feature threshold.

5. A potential user analysis device, the device comprising:

an acquisition module, configured to acquire user data of a target user group, wherein the target user group is a user group with high traffic demand;

a processing module, configured to analyze a plurality of target features of the target user group according to the user data, wherein the target features reflect the target user group's demand for fiber-to-the-room (FTTR) broadband;

a determining module, configured to determine key features and feature thresholds at least according to the plurality of target features, wherein the key features are features, among the plurality of target features, whose importance is greater than a preset threshold, and the feature thresholds represent the minimum criterion of the key features for users who subscribe to the FTTR broadband; and

an analysis module, configured to analyze potential users of the FTTR broadband in the target user group based on the key features and the feature thresholds.

6. The device of claim 5, wherein

the determining module is specifically configured to perform positive and negative sample marking on the user data to obtain positive sample data and negative sample data; the positive sample data is data of a user who has transacted the FTTR broadband, and the negative sample data is data of a user who has not transacted the FTTR broadband; inputting the positive sample data and the negative sample data into a model comprising a target algorithm to obtain the importance of each target feature; selecting a target feature with the importance degree larger than the preset threshold value from the target features based on the importance degree of the target feature as the key feature; and determining the characteristic threshold according to the positive sample data and the key characteristic.

7. The device of claim 6, wherein

the target algorithm comprises: a first algorithm and a second algorithm; the first algorithm corresponds to a first weight, and the second algorithm corresponds to a second weight; the determining module is further configured to input the positive sample data and the negative sample data into a first model comprising the first algorithm and into a second model comprising the second algorithm, respectively, to obtain the importance of each target feature output by the first model and the importance of each target feature output by the second model; and obtain the importance of each target feature based on the importance of each target feature output by the first model together with the first weight, and the importance of each target feature output by the second model together with the second weight.

8. The device of claim 6, wherein

the determining module is further configured to determine data corresponding to the key feature from the positive sample data; determine the feature threshold according to a histogram of the data corresponding to the key feature when the data corresponding to the key feature is discrete data; and, when the data corresponding to the key feature is continuous data, perform binning on the data corresponding to the key feature to obtain the feature threshold.

9. An electronic device comprising a processor and a memory, the processor coupled to the memory; the memory is for storing computer instructions that are loaded and executed by the processor to cause a computer device to implement the method of potential user analysis of any one of claims 1 to 4.

10. A computer readable storage medium comprising computer executable instructions which, when run on a computer, cause the computer to perform the potential user analysis method of any one of claims 1 to 4.