WO2020134299A1

WO2020134299A1 - Indoor and outdoor label distinguishing method, training method and device of classifier and medium

Info

Publication number: WO2020134299A1
Application number: PCT/CN2019/109438
Authority: WO
Inventors: 钟勇才
Original assignee: 中兴通讯股份有限公司
Priority date: 2018-12-25
Filing date: 2019-09-30
Publication date: 2020-07-02
Also published as: CN111368862A

Abstract

An indoor and outdoor label distinguishing method, a training method and device of a classifier and a medium. The distinguishing method comprises: collecting measurement report data of a target user (S101); inputting the measurement report data of the target user into a random forest classifier for classifying user indoor and outdoor labels (S102); and determining the indoor and outdoor label of the target user according to the classified counting of the random forest classifier (S103).

Description

Method for distinguishing indoor and outdoor marks, training method for classifier, equipment and medium

This application claims priority to Chinese patent application CN201811595402.5, entitled "Division method for indoor and outdoor marks, training method for classifiers and equipment and media", filed on December 25, 2018, the entire contents of which are incorporated herein by reference.

Technical field

This article relates to the field of communications, and in particular to a method for distinguishing indoor and outdoor tags, a training method for classifiers, equipment and media.

Background technique

In the mobile Internet era, smart terminals have changed people's lifestyles and behavior. People habitually use location-based services (LBS, Location Based Service) to find shopping malls, hospitals, banks, and even friends. Some mobile services occur indoors and some occur outdoors, and accurately determining whether a mobile service user is indoors or outdoors is very important in specific scenarios. For example, distinguishing indoor users from outdoor users helps operators accurately identify deep-coverage problems and plan station additions accordingly: if indoor coverage is insufficient, add an indoor distribution station; if outdoor coverage is insufficient, add an outdoor station. For elderly people or children in need of care, the indoor/outdoor determination can show whether they are in a given room or area. For a company's internal network, access to company information can be cut off once the user leaves the office building.

According to the demand analysis of the above applications, indoor and outdoor differentiation of mobile services requires high real-time performance and high accuracy. However, in some cases, existing ways of distinguishing indoor mobile users from outdoor ones suffer from low efficiency, a high misjudgment rate, and poor real-time performance.

Summary of the invention

In order to overcome the above defects, the technical problem to be solved in this document is to provide a method for distinguishing indoor and outdoor marks, a training method for classifiers, equipment and media, to at least solve the problem of a high misjudgment rate in determining indoor and outdoor marks of users.

In order to solve the above technical problem, a method for distinguishing indoor and outdoor marks of users in the embodiments of this document includes: collecting measurement report data of a target user; inputting the measurement report data of the target user into a random forest classifier for classifying indoor and outdoor marks of users; and determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.

In order to solve the above technical problem, a training method of a random forest classifier in the embodiments of this document includes: extracting a training data set, with the actual indoor and outdoor mark corresponding to each piece of training data, from the collected measurement report data of sample users in a target area; inputting the training data set into a preset random forest classification model for training; during the training process, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.

To solve the above technical problem, a communication node device in the embodiments herein includes a memory and a processor. The memory stores a program for distinguishing indoor and outdoor marks of users, and the processor executes the program to implement the steps of the above distinguishing method.

In order to solve the above technical problem, a random forest classifier training device in the embodiments herein includes a memory and a processor. The memory stores a random forest classifier training program, and the processor executes the program to implement the steps of the above training method.

To solve the above technical problem, a computer-readable storage medium in the embodiments herein stores a user's indoor and outdoor labeling program, and the computer program may be executed by at least one processor to implement the steps of the above distinguishing method.

To solve the above technical problem, a computer-readable storage medium in the embodiments herein stores a training program of a random forest classifier, and the computer program may be executed by at least one processor to implement the steps of the training method above.

The above is only an overview of the technical solutions in this document. So that the technical means can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other purposes, features and advantages are more obvious and understandable, specific implementations are set out below.

Brief description of the drawings

By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of showing the preferred embodiments, and are not considered as limitations to this document. Furthermore, throughout the drawings, the same reference symbols are used to denote the same components. In the drawings:

FIG. 1 is a flowchart of a method for distinguishing indoor and outdoor marks of users in an embodiment of this document;

FIG. 2 is a flowchart of an optional method for distinguishing indoor and outdoor marks of users in the embodiment of this document;

FIG. 3 is a prediction effect diagram of indoor and outdoor marks of a target user in the embodiment of this document.

Detailed description

Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although the exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

In the subsequent description, the use of suffixes such as "module", "part" or "unit" used to denote elements is only for the benefit of the description herein, and has no specific meaning in itself. Therefore, "module", "component" or "unit" can be used in a mixed manner.

The use of prefixes such as "first", "second", etc. for distinguishing elements is only for the benefit of the description herein, and has no specific meaning in itself.

Embodiment 1

The embodiments herein provide a method for distinguishing indoor and outdoor marks of users. As shown in FIG. 1, the method includes: S101, collecting measurement report (MR, Measurement Report) data of a target user; S102, inputting the measurement report data of the target user into a random forest classifier for classifying indoor and outdoor marks (POSITIONMARK_REAL) of users; S103, determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.

The target user is the user to be located, and a user here generally means a mobile user. An MR records wireless measurement information such as the mobile user's serving cell ID (identification), RSRP (test power value), RSRQ (LTE reference signal reception quality), TA_CALC (time delay), AOA (incidence angle), STARTTIME (start time), ENDTIME (end time) and IMSI (International Mobile Subscriber Identity). The MR data of the target user collected in the embodiment of this document includes AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay) and TIME_DIFFERENCE (time difference, ENDTIME minus STARTTIME). The indoor and outdoor mark indicates whether a user is indoors or outdoors; it may also be called an indoor/outdoor label or tag.

The method in the embodiment of this document can be applied to the communication node side, for example, the base station side; in the determination process, the base station can collect MR data of the target user in real time, so the MR data in the embodiment of this document can also be described as real-time MR data. Since the determination process is realized by the classification calculation of the random forest classifier, the determination process is also a prediction process.

In this embodiment, the collected MR data of the target user is input to a random forest classifier for classification calculation, and the indoor and outdoor mark of the target user is determined from that calculation. This effectively reduces the misjudgment rate in determining the indoor and outdoor marks of users, and because the judgment is based on MR data, it also effectively guarantees the real-time nature of the determination.
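The classification calculation in S103 can be pictured as vote counting: each decision tree in the forest outputs one label, and the label with the most votes is taken as the result. The following is a minimal sketch under stated assumptions. The function name `majority_label`, the label strings and the tie-breaking rule are illustrative and do not come from this document. Some library implementations, scikit-learn's among them, average per-tree class probabilities instead of counting hard votes.

```python
from collections import Counter


def majority_label(tree_votes):
    """Return the label that receives the most votes among the trees.

    tree_votes: non-empty list of labels, one per decision tree.
    Ties go to the label that was seen first (an assumption).
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    counts = Counter(tree_votes)
    # most_common(1) returns a one-element list: [(label, count)]
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical outputs of five decision trees for one target user.
votes = ["indoor", "outdoor", "indoor", "indoor", "outdoor"]
print(majority_label(votes))
```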

Based on the foregoing embodiments, several specific and optional implementations are given below to refine and optimize the embodiments in this document, so that the implementation of the solutions in the embodiments in this document is more convenient and accurate. It should be noted that the following embodiments can be arbitrarily combined with each other without conflict.

To effectively ensure real-time performance in determining the indoor and outdoor marks of users, in some embodiments, before the measurement report data of the target user is input into the random forest classifier used to classify the indoor and outdoor marks of users, the method includes: collecting the measurement report data of sample users in a target area, together with the indoor or outdoor mark corresponding to each piece of measurement report data; extracting a training data set, with the actual indoor and outdoor mark corresponding to each piece of training data, from the collected data; inputting the training data set into a preset random forest classification model for training; during the training process, searching for the optimal model parameters of the random forest classification model through a GRIDSEARCHCV grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.

The target area may be a designated area. The model parameters may include the number of decision trees N_ESTIMATORS and the calculation attribute CRITERION. The random forest classification model may be implemented in Python code, and in the embodiment of this document it may be referred to simply as the model. To improve the prediction accuracy of the indoor and outdoor marks of users, before the training data set and the actual indoor and outdoor mark corresponding to each piece of training data are input into the preset random forest classification model for training, the collected measurement report data of the sample users in the target area and the indoor or outdoor mark corresponding to each piece of measurement report data may be taken as the original data, and the original data may be preprocessed to remove abnormal data.

During training, features such as AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay) and TIME_DIFFERENCE (time difference, ENDTIME minus STARTTIME) are extracted from the training data set as the independent variables X, and the corresponding POSITIONMARK_REAL (indoor and outdoor mark) is set as the dependent variable Y, so that X is used to determine Y. In other words, each piece of training data in the training data set is set as an independent variable, and the actual indoor and outdoor mark corresponding to it is set as the dependent variable determined by that independent variable. The task can then be treated as a 0-1 classification problem, which effectively reduces the complexity of training the random forest classifier and effectively improves the prediction accuracy of the indoor and outdoor marks of users.
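The split into independent variables X and dependent variable Y can be sketched as follows. This is only an illustration: the record layout (one dict per MR record), the numeric timestamps, the sample values and the encoding 1 = indoor, 0 = outdoor are assumptions, and `build_xy` is a hypothetical helper name. Only the field names are taken from the text above.

```python
# Feature order used for every row of X.
FEATURES = ["AOA", "TA_CALC", "RSRP", "TADLTVALUE", "TIME_DIFFERENCE"]


def build_xy(records):
    """Split MR records (dicts) into feature rows X and 0-1 labels y."""
    X, y = [], []
    for rec in records:
        row = dict(rec)
        # TIME_DIFFERENCE is derived as ENDTIME minus STARTTIME.
        row["TIME_DIFFERENCE"] = rec["ENDTIME"] - rec["STARTTIME"]
        X.append([float(row[name]) for name in FEATURES])
        # Assumed encoding: 1 = indoor, 0 = outdoor.
        y.append(int(rec["POSITIONMARK_REAL"]))
    return X, y


# Invented sample records, for illustration only.
records = [
    {"AOA": 120, "TA_CALC": 3, "RSRP": -95, "TADLTVALUE": 2,
     "STARTTIME": 100, "ENDTIME": 130, "POSITIONMARK_REAL": 1},
    {"AOA": 45, "TA_CALC": 7, "RSRP": -80, "TADLTVALUE": 5,
     "STARTTIME": 200, "ENDTIME": 210, "POSITIONMARK_REAL": 0},
]
X, y = build_xy(records)
print(X[0], y)
```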

The trained random forest classification model can also be verified by prediction on a test data set, which safeguards the prediction accuracy of the resulting indoor and outdoor marks of users. That is, the original data is preprocessed to remove abnormal data, feature values are extracted from the cleaned data to obtain a data set, and the data set is divided into a training data set and a test data set. The test data set is repeatedly input into the trained random forest model for cross-prediction verification until a sufficiently good model is found and taken as the final random forest classification model. Accordingly, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier may include: extracting a test data set from the sample measurement report data; inputting the test data set into the random forest classification model corresponding to the optimal model parameters for prediction verification; determining the minimum mean square error between the prediction verification result and the actual indoor and outdoor marks corresponding to the test data set; when the mean square error is not greater than a preset threshold, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier; and when the mean square error is greater than the threshold, searching the grid again for the optimal model parameters of the random forest classification model.
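The threshold test on the mean square error can be sketched in a few lines. The helper names, the example labels and the threshold values below are assumptions made for illustration; this document does not state a concrete threshold. For 0-1 labels, the mean square error equals the fraction of misclassified samples.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between true and predicted labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accept_model(y_true, y_pred, threshold):
    """True: keep the model. False: search the parameter grid again."""
    return mean_squared_error(y_true, y_pred) <= threshold


# Invented test-set labels and predictions (1 = indoor, 0 = outdoor).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(mean_squared_error(y_true, y_pred))
print(accept_model(y_true, y_pred, threshold=0.3))
```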

Embodiment 2

Based on Embodiment 1, the embodiments herein provide a specific method for distinguishing indoor and outdoor marks of users. As shown in FIG. 2, the method is divided into two stages: an offline stage, mainly used to train the random forest classifier, and an online stage, mainly used for real-time prediction for the target user. The method includes:

Step 201: Collect MR data of the sample user in the target area.

Select a designated area and collect 12,000 pieces of MR data reported by users on the base station side. The MR data records wireless measurement information captured during the user's service, such as the serving cell ID, TA_CALC, RSRP, RSRQ, TA, AOA, MRTIME, STARTTIME, ENDTIME and IMSI, together with the POSITIONMARK_REAL indoor and outdoor mark corresponding to each piece of measurement information.

Step 202: Process abnormal data.

Replace the abnormal data or null values in each field of the 12,000 pieces of MR data with 0, and perform orthogonal normalization on the entire data matrix. Randomly select 75% of the data set as the training set and the remaining 25% as the test set, and save the two sets in two separate files.
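Step 202 can be sketched in plain Python as follows. Several points are assumptions: this document does not define "orthogonal normalization", so column-wise min-max scaling is swapped in here as a stand-in. The helper names, the sample rows and the fixed random seed are also illustrative, and writing the two sets to files is omitted.

```python
import math
import random


def clean(rows):
    """Replace None, NaN and non-numeric fields with 0.0."""
    cleaned = []
    for row in rows:
        out = []
        for v in row:
            ok = isinstance(v, (int, float)) and not math.isnan(v)
            out.append(float(v) if ok else 0.0)
        cleaned.append(out)
    return cleaned


def min_max_scale(rows):
    """Scale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_75_25(rows, seed=0):
    """Shuffle a copy of the rows and split it 75% / 25%."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


# Invented rows containing a null, a NaN and a malformed field.
raw = [[120, 3, -95], [None, 7, -80], [45, float("nan"), -110], [90, 5, "bad"]]
data = min_max_scale(clean(raw))
train, test = split_75_25(data)
print(len(train), len(test))
```

In a real pipeline the label column would have to travel with each row through the shuffle so that features and marks stay aligned.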

Step 203: Select MR data corresponding to the feature value.

MR records many index items, and using all of them would greatly affect the computational cost and accuracy of the whole model. To improve both, feature values such as AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay) and TIME_DIFFERENCE (time difference, ENDTIME minus STARTTIME) are selected as the independent variables X, and the corresponding POSITIONMARK_REAL (indoor and outdoor mark) is set as the dependent variable Y. The problem thus becomes a mathematical one in which the indoor and outdoor mark Y is determined by the environment variables X, and it can be treated as a 0-1 classification problem. In the embodiment of this document, the random forest classification model offers good accuracy and generalization. Build a random forest classification model in Python code, input the training data set into the RANDOMFORESTCLASSIFIER model, and start training.

Step 204: Train the model to optimize model parameters.

Input the training data set into the RANDOMFORESTCLASSIFIER model, then use a GRIDSEARCHCV grid search to find the optimal number of decision trees N_ESTIMATORS and the optimal calculation attribute CRITERION for the random forest classification algorithm. Input the test data set into the trained model for cross-validation. If the error is small enough, select the model; otherwise continue adjusting the model parameters until the error on the test data is small enough.
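The grid search in Step 204 might look like the sketch below. It assumes that the uppercase names in the text correspond to scikit-learn's `RandomForestClassifier`, `GridSearchCV`, `n_estimators` and `criterion`. The synthetic data, the candidate values in `param_grid`, the 3-fold setting and the random seeds are illustrative choices, not values taken from this document.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the five MR features and the 0-1 mark.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# 75% training data, 25% test data, as in Step 202.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Candidate values are assumptions chosen to keep the example fast.
param_grid = {
    "n_estimators": [10, 50],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X_train, y_train)

# The refitted model with the best cross-validated score.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_)
print(test_accuracy)
```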

Step 205: Measure the accuracy of the model.

Input the test set data into the trained random forest classification model for cross-prediction verification, which determines the minimum mean square error between the predicted values and the true values of the test set data. The smaller the error, the better the model. The accuracy of the model on the prediction data set is recorded each time, and the model with the highest accuracy is selected and saved. When the mean square error is not greater than a preset threshold, the corresponding random forest classification model is used as the random forest classifier; when the mean square error is greater than the threshold, the grid is searched again for the optimal model parameters of the random forest classification model.

Step 206: Collect real-time MR data of the target user.

Randomly select target users in an area and collect real-time MR data of some of them on the base station side, including at least the indicators AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay) and TIME_DIFFERENCE (time difference, ENDTIME minus STARTTIME). These indicators are then used to predict the indoor and outdoor marks of the mobile users.

Step 207: Real-time MR data preprocessing.

The real-time data may contain abnormal or null values. These are replaced with 0, and the same indicators that were used to train the model are selected as the feature values. Orthogonal normalization of the feature data can effectively avoid overfitting.
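A real-time report has to be turned into a row with the same indicators, in the same order, as the training data. The sketch below assumes each report arrives as a dict keyed by the field names used above. The helper name `to_feature_row` and the sample report are hypothetical, and the normalization step is left out.

```python
import math

# Same indicators, in the same order, as used for training.
TRAINING_FEATURES = ["AOA", "TA_CALC", "RSRP", "TADLTVALUE", "TIME_DIFFERENCE"]


def to_feature_row(report):
    """Turn one real-time MR report (dict) into a model input row.

    Missing, null, NaN or non-numeric values are replaced with 0.0.
    Fields that were not used in training are ignored.
    """
    row = []
    for name in TRAINING_FEATURES:
        value = report.get(name)
        valid = (isinstance(value, (int, float))
                 and not isinstance(value, bool)
                 and math.isfinite(value))
        row.append(float(value) if valid else 0.0)
    return row


# Invented report with a null, a NaN and an unused field.
report = {"AOA": 30, "RSRP": None, "TA_CALC": float("nan"),
          "TADLTVALUE": 4, "TIME_DIFFERENCE": 12, "IMSI": "ignored"}
print(to_feature_row(report))
```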

Step 208: Real-time MR data is input into a random forest classifier for prediction.

As shown in FIG. 3, the processed real-time MR data is input into the previously trained random forest classifier, which then performs the classification.

Step 209: The indoor and outdoor marking results corresponding to the real-time MR data of these target users can be obtained.
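Steps 208 and 209 can be illustrated with scikit-learn, assuming that is the library behind the RANDOMFORESTCLASSIFIER name. The synthetic data stands in for preprocessed MR features, the last 20 rows play the role of real-time reports, and the mapping 1 = indoor, 0 = outdoor is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for preprocessed MR feature rows and 0-1 marks.
X, y = make_classification(n_samples=120, n_features=5, n_informative=3,
                           n_redundant=0, random_state=1)

# Offline stage: train on the first 100 rows.
clf = RandomForestClassifier(n_estimators=20, random_state=1)
clf.fit(X[:100], y[:100])

# Online stage: treat the remaining rows as real-time MR data.
realtime_rows = X[100:]
labels = clf.predict(realtime_rows)
proba = clf.predict_proba(realtime_rows)  # per-class probabilities

# Assumed encoding of the indoor and outdoor mark.
names = {0: "outdoor", 1: "indoor"}
results = [names[int(v)] for v in labels]
print(results[:5])
```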

The embodiments herein effectively improve the prediction accuracy of indoor and outdoor markings of users, and effectively ensure the real-time nature of the process of determining indoor and outdoor markings of users.

Embodiment 3

The embodiments herein provide a training method for a random forest classifier. The method includes: extracting a training data set, with the actual indoor and outdoor mark corresponding to each piece of training data, from the collected measurement report data of sample users in a target area; inputting the training data set into a preset random forest classification model for training; during the training process, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.

The training process of the random forest classifier of the embodiment of this document is the same as the training process of the first embodiment. For specific implementation, refer to the first embodiment, which has corresponding technical effects.

Embodiment 4

The embodiments herein provide a communication node device. The device includes a memory and a processor. The memory stores a program for distinguishing indoor and outdoor marks of users, and the processor executes the program to implement the steps of the method according to either of Embodiment 1 and Embodiment 2. The communication node device may be a base station or the like.

Embodiment 5

The embodiments herein provide a random forest classifier training device. The device includes a memory and a processor. The memory stores a random forest classifier training program, and the processor executes the program to implement the steps of the method of Embodiment 3.

Embodiment 6

The embodiments herein provide a computer-readable storage medium. The storage medium stores a program for distinguishing indoor and outdoor marks of users, and the program may be executed by at least one processor to implement the steps of the method according to either of Embodiment 1 and Embodiment 2.

Embodiment 7

The embodiments herein provide a computer-readable storage medium. The storage medium stores a random forest classifier training program, and the program may be executed by at least one processor to implement the steps of the method described in Embodiment 3.

It should be noted that the specific implementation of the third to seventh embodiments can refer to the first embodiment, which has corresponding technical effects.

It should be noted that in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

The sequence numbers of the above embodiments of the present invention are for description only, and do not represent the advantages and disadvantages of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions in this document can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and includes several instructions that enable a terminal (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to perform the methods described in the various embodiments of this document.

The beneficial effects of the embodiments herein are as follows. In each of the above embodiments, the collected MR data of the target user is input to a random forest classifier for classification calculation, and the indoor and outdoor mark of the target user is determined from that calculation. This effectively reduces the misjudgment rate in determining the indoor and outdoor marks of users, and because the judgment is based on MR data, it effectively guarantees the real-time nature of the determination.

The embodiments of this document have been described above in conjunction with the drawings, but this document is not limited to the above specific embodiments, which are merely illustrative and not limiting. Inspired by this document, those of ordinary skill in the art can devise many other forms without departing from its purpose and the scope of the claims, all of which are covered by this document.

Claims

1. A method for distinguishing indoor and outdoor marks of users, wherein the method includes:

collecting measurement report data of a target user;

inputting the measurement report data of the target user into a random forest classifier for classifying indoor and outdoor marks of users;

determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.

2. The method according to claim 1, wherein, before the inputting of the measurement report data of the target user into the random forest classifier used to classify the indoor and outdoor marks of users, the method includes:

extracting a training data set, with the actual indoor and outdoor mark corresponding to each piece of training data, from the collected measurement report data of sample users in a target area;

inputting the training data set into a preset random forest classification model for training;

during the training process, searching for the optimal model parameters of the random forest classification model through a grid search;

using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.

3. The method according to claim 2, wherein the inputting of the training data set into a preset random forest classification model for training includes:

setting each piece of training data in the training data set as an independent variable, and setting the actual indoor and outdoor mark corresponding to each piece of training data as a dependent variable determined by the independent variable.

4. The method according to claim 2, wherein the using of the random forest classification model corresponding to the optimal model parameters as the random forest classifier includes:

extracting a test data set from the sample measurement report data;

inputting the test data set into the random forest classification model corresponding to the optimal model parameters for prediction verification;

determining the minimum mean square error between the prediction verification result and the actual indoor and outdoor marks corresponding to the test data set;

when the mean square error is not greater than a preset threshold, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier;

when the mean square error is greater than the threshold, searching the grid again for the optimal model parameters of the random forest classification model.

5. The method according to any one of claims 1-4, wherein the measurement report data includes an incident angle, a time delay, a test power value, a downlink time delay, and a time difference.

6. A random forest classifier training method, wherein the method includes:

extracting a training data set, with the actual indoor and outdoor mark corresponding to each piece of training data, from the collected measurement report data of sample users in a target area;

inputting the training data set into a preset random forest classification model for training;

during the training process, searching for the optimal model parameters of the random forest classification model through a grid search;

using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.

7. A communication node device, wherein the device includes a memory and a processor, the memory stores a program for distinguishing indoor and outdoor marks of users, and the processor executes the program to implement the steps of the method according to any one of claims 1-5.

8. A random forest classifier training device, wherein the device includes a memory and a processor, the memory stores a random forest classifier training program, and the processor executes the program to implement the steps of the method according to claim 6.

9. A computer-readable storage medium, wherein the storage medium stores a program for distinguishing indoor and outdoor marks of users, and the program can be executed by at least one processor to implement the steps of the method according to any one of claims 1-5.

10. A computer-readable storage medium, wherein the storage medium stores a training program of a random forest classifier, and the program is executable by at least one processor to implement the steps of the method according to claim 6.