CN116663677A

CN116663677A - Model training method and device, electronic equipment and storage medium

Info

Publication number: CN116663677A
Application number: CN202310678757.5A
Authority: CN
Inventors: 冯朝阳; 尹卜一; 焦恒建; 王畔
Original assignee: Douyin Vision Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-08-29

Abstract

The present disclosure provides a model training method, device, equipment and medium. The method comprises: acquiring historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, where the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point; for each interest point, generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one piece of first WiFi information in each group of WiFi scanning information; for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and its corresponding label; and training a model to be trained on the training sample data to obtain a trained model, which is used to determine the corresponding interest point positioning information based on current WiFi scanning information. The embodiments of the disclosure help improve the accuracy of the positioning result.

Description

Model training method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of positioning technologies, and in particular to a model training method, a model training device, electronic equipment and a computer storage medium.

Background

WiFi fingerprint positioning is one of the key technologies for point of interest (POI) positioning, and it is widely used in industry because it requires no manual deployment of equipment and has good spatial distribution characteristics. In common fingerprint positioning techniques, the user's current interest point is determined from the cross information between the WiFi information scanned by the user's electronic device and the collected historical WiFi fingerprint information. Any error in this cross information reduces the reliability of the final positioning result, so how to improve that reliability is a pressing problem.

Disclosure of Invention

The embodiments of the disclosure provide at least a model training method, a device, electronic equipment and a storage medium, which can improve the reliability of the positioning result.

The embodiment of the disclosure provides a model training method, which comprises the following steps:

acquiring historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, where the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;

for each interest point, generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one piece of first WiFi information in each group of WiFi scanning information, where the co-occurrence information represents the scanning state of each piece of first WiFi information in each group relative to the second WiFi information associated with the interest point;

for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and its corresponding label, where the label is determined based on preset offline behavior information associated with the interest point corresponding to the co-occurrence information; and

training the model to be trained on the training sample data to obtain a trained model, where the trained model is used to determine the corresponding interest point positioning information based on current WiFi scanning information.

In the embodiments of the disclosure, co-occurrence information corresponding to each interest point is constructed from the historical WiFi distribution information and the historical WiFi scanning information. Because the co-occurrence information characterizes the scanning state of the historically scanned WiFi information relative to the second WiFi information associated with the interest point, training the model on samples generated from it strengthens the model's memorization of the association between WiFi scanning information and interest points. Interest point prediction can then be performed with the trained model, which improves the reliability of the positioning result.

In one possible implementation, each interest point has corresponding interest point identification information, and generating the co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one piece of first WiFi information in each group of WiFi scanning information includes:

for each interest point, generating scan distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group; and

generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scan distribution identification information corresponding to the interest point.
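A minimal sketch of this combination step, assuming the co-occurrence information is a string formed by joining the interest point identifier and its scan distribution identifier with an underscore. The separator and the helper names `build_cooccurrence` and `cooccurrences_for_scan` are hypothetical; the disclosure only states that both pieces of information are used.

```python
from typing import Dict


def build_cooccurrence(poi_id: str, scan_distribution_id: str) -> str:
    """Combine a POI's unique identifier with its scan distribution identifier.

    The "_" separator is a hypothetical choice made for this sketch.
    """
    return f"{poi_id}_{scan_distribution_id}"


def cooccurrences_for_scan(scan_ids_by_poi: Dict[str, str]) -> Dict[str, str]:
    """Build one co-occurrence string per POI for a single group of scan info.

    scan_ids_by_poi maps each POI identifier to the scan distribution
    identifier already computed for that POI against the scan group.
    """
    return {poi: build_cooccurrence(poi, sid) for poi, sid in scan_ids_by_poi.items()}


print(cooccurrences_for_scan({"A": "1100", "B": "0000"}))  # {'A': 'A_1100', 'B': 'B_0000'}
```

Keeping the interest point identifier inside the string is what lets a later one-hot encoding give each (interest point, scan pattern) pair its own feature.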

In the embodiments of the disclosure, the co-occurrence information corresponding to an interest point is generated from the interest point's unique identification information and its scan distribution identification information, so the co-occurrence information carries both the features of the interest point and the features of its associated WiFi. When the model is subsequently trained on the co-occurrence information, it can learn the association between each interest point and its WiFi, which helps improve model accuracy.

In a possible implementation, the first WiFi information includes a first WiFi and the signal strength of the first WiFi, the second WiFi information includes a second WiFi and the signal strength of the second WiFi, and the second WiFi associated with each interest point follow a distribution order determined by their respective signal strengths;

for each interest point, generating the scan distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group includes:

for each second WiFi associated with each interest point, generating sub-scan distribution identification information based on the second WiFi and each first WiFi in each group; and

splicing all the sub-scan distribution identification information according to the distribution order of the second WiFi associated with the interest point to generate the scan distribution identification information.
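The splicing step can be sketched as plain string concatenation in distribution order. This assumes each sub-scan distribution identifier is already a bit string and that the distribution order is given as a list of WiFi identifiers, strongest first; the function name `splice_scan_distribution` is hypothetical.

```python
from typing import Dict, List


def splice_scan_distribution(distribution_order: List[str],
                             sub_ids: Dict[str, str]) -> str:
    """Concatenate per-second-WiFi sub-identifiers in the POI's distribution order.

    distribution_order: identifiers (e.g. MAC addresses) of the POI's second
    WiFi, strongest signal first.
    sub_ids: maps each of those identifiers to its sub-scan distribution
    identifier (a bit string).
    """
    return "".join(sub_ids[wifi] for wifi in distribution_order)


order = ["W1", "W2", "W3"]
subs = {"W1": "1100", "W2": "0000", "W3": "1001"}
print(splice_scan_distribution(order, subs))  # 110000001001
```

Because the order is fixed per interest point, the same bit position always refers to the same second WiFi, which keeps identical scan patterns mapped to identical strings.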

In the embodiments of the disclosure, since each interest point is associated with at least one second WiFi, sub-scan distribution identification information can be generated for each second WiFi and then spliced to obtain the scan distribution identification information. This improves the accuracy of the scan distribution identification information, and in turn the accuracy of the generated training sample data and of model training.

In a possible implementation, the sub-scan distribution identification information corresponds to one scan flag bit and a plurality of signal strength flag bits, where different signal strength flag bits represent different signal strengths; generating, for each second WiFi associated with each interest point, the sub-scan distribution identification information based on the second WiFi and each first WiFi in each group includes:

for each second WiFi, when there is a target first WiFi whose media access control (MAC) address is the same as that of the second WiFi, determining the value of the scan flag bit as 1;

determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, setting the value of the target signal strength flag bit to 1, and setting the values of the other signal strength flag bits to 0; and

generating the sub-scan distribution identification information based on the value of the scan flag bit and the values of the signal strength flag bits.
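A minimal sketch of the flag-bit construction, under loudly stated assumptions: three signal strength flag bits (strong, medium, weak) with hypothetical thresholds of -60 dBm and -75 dBm, and all bits set to 0 when no first WiFi in the scan group shares the second WiFi's MAC address. The disclosure fixes neither the number of strength bits, their thresholds, nor the not-scanned encoding; `STRENGTH_EDGES`, `strength_bucket` and `sub_scan_id` are illustrative names.

```python
from typing import Dict

# Hypothetical bucket edges in dBm: >= -60 strong, >= -75 medium, otherwise weak.
STRENGTH_EDGES = [-60, -75]


def strength_bucket(rssi_dbm: int) -> int:
    """Return the index of the signal strength flag bit for an RSSI value."""
    for i, edge in enumerate(STRENGTH_EDGES):
        if rssi_dbm >= edge:
            return i
    return len(STRENGTH_EDGES)


def sub_scan_id(second_mac: str, scan_group: Dict[str, int]) -> str:
    """Build the sub-scan distribution identifier for one second WiFi.

    scan_group maps the MAC address of each first WiFi in one scan to its RSSI.
    Bit layout: [scan flag][strong][medium][weak].
    """
    strength_bits = [0] * (len(STRENGTH_EDGES) + 1)
    if second_mac in scan_group:  # a target first WiFi with the same MAC exists
        scan_flag = 1
        strength_bits[strength_bucket(scan_group[second_mac])] = 1
    else:  # assumption: a second WiFi absent from the scan gets all-zero bits
        scan_flag = 0
    return str(scan_flag) + "".join(str(b) for b in strength_bits)


scan = {"aa:bb:cc:00:00:01": -55, "aa:bb:cc:00:00:02": -80}
print(sub_scan_id("aa:bb:cc:00:00:01", scan))  # 1100
```

Matching on MAC address rather than SSID avoids conflating different access points that broadcast the same network name.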

In the embodiments of the disclosure, determining the sub-scan distribution identification information from the value of the scan flag bit and the values of the signal strength flag bits helps improve its accuracy.

In one possible implementation, the co-occurrence information is in character string format; for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and the label corresponding to it includes:

for each piece of co-occurrence information, performing feature processing on the co-occurrence information in character string format to generate a corresponding feature vector;

acquiring the preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determining the true value of the label based on the preset offline behavior information; and

generating the training sample data based on the feature vector and the true value of the label.

In the embodiments of the disclosure, converting the co-occurrence information from character string format into feature vectors facilitates model training in subsequent steps. In addition, the true value of each feature vector is determined based on the preset offline behavior information, which calibrates the feature vector and improves the accuracy of the training sample data.

In a possible implementation, determining the true value of the label based on the preset offline behavior information includes:

when the preset offline behavior occurs, determining the true value of the label as 1; or

when the preset offline behavior does not occur, determining the true value of the label as 0.
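The labeling rule and the pairing of a feature vector with its true value can be sketched as follows. The set `pois_with_behavior`, holding the interest points at which the preset offline behavior was observed, is a hypothetical input representation; the disclosure does not say how offline behavior is recorded.

```python
from typing import List, Set, Tuple


def label_for(poi_id: str, pois_with_behavior: Set[str]) -> int:
    """True value of the label: 1 if the preset offline behavior occurred at the POI, else 0.

    pois_with_behavior is a hypothetical set of POI identifiers at which the
    preset offline behavior was observed for the scan in question.
    """
    return 1 if poi_id in pois_with_behavior else 0


def make_sample(feature_vector: List[int], poi_id: str,
                pois_with_behavior: Set[str]) -> Tuple[List[int], int]:
    """Pair a feature vector with its label true value to form one training sample."""
    return feature_vector, label_for(poi_id, pois_with_behavior)


print(make_sample([0, 1, 0], "A", {"A"}))  # ([0, 1, 0], 1)
```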

In the embodiments of the disclosure, the true value of the label is determined as 1 if the preset offline behavior is performed at the interest point and as 0 otherwise, which improves the accuracy of true value determination and thus of the training sample data.

In one possible implementation, performing feature processing on the co-occurrence information in character string format for each piece of co-occurrence information to generate the corresponding feature vector includes:

numbering each piece of co-occurrence information in character string format according to a preset numbering mode, and constructing a feature dictionary based on the numbers corresponding to the co-occurrence information, where the feature dictionary is used to convert co-occurrence information in character string format into feature vectors; and

for each piece of co-occurrence information in character string format, performing one-hot encoding on it based on the feature dictionary to generate the corresponding feature vector.
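A minimal sketch of the feature dictionary, assuming the "preset numbering mode" assigns consecutive numbers from 0 in order of first appearance. That numbering rule and the name `build_feature_dictionary` are assumptions; any deterministic numbering would serve the same purpose.

```python
from typing import Dict, Iterable


def build_feature_dictionary(cooccurrences: Iterable[str]) -> Dict[str, int]:
    """Number each distinct co-occurrence string, starting from 0.

    Numbering by order of first appearance is a hypothetical stand-in for the
    unspecified preset numbering mode.
    """
    dictionary: Dict[str, int] = {}
    for item in cooccurrences:
        if item not in dictionary:
            dictionary[item] = len(dictionary)
    return dictionary


print(build_feature_dictionary(["A_1100", "B_0000", "A_1100", "C_1001"]))
# {'A_1100': 0, 'B_0000': 1, 'C_1001': 2}
```

Starting the numbers at 0 lets each number double as an index into the one-hot vector.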

In the embodiments of the disclosure, converting the co-occurrence information in character string format through the constructed feature dictionary improves the efficiency of subsequent model training. In addition, one-hot encoding represents each feature vector with a single active bit, which simplifies the feature conversion logic.

In one possible implementation, performing one-hot encoding on the co-occurrence information in character string format based on the feature dictionary to generate the corresponding feature vector includes:

determining the count of numbers contained in the feature dictionary, and creating a zero vector whose length equals that count; and

for each piece of co-occurrence information, setting to 1 the target index bit of the zero vector that matches the number of the co-occurrence information in the feature dictionary, to generate the corresponding feature vector.
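The two steps above map directly onto a short function. This sketch assumes the feature dictionary numbers entries from 0, so a number can be used as a list index; `one_hot` is an illustrative name.

```python
from typing import Dict, List


def one_hot(cooccurrence: str, dictionary: Dict[str, int]) -> List[int]:
    """One-hot encode a co-occurrence string with the feature dictionary."""
    vector = [0] * len(dictionary)        # zero vector, length = count of numbers
    vector[dictionary[cooccurrence]] = 1  # set the target index bit to 1
    return vector


dictionary = {"A_1100": 0, "B_0000": 1, "C_1001": 2}
print(one_hot("B_0000", dictionary))  # [0, 1, 0]
```

A string absent from the dictionary raises `KeyError` here; how unseen co-occurrence information should be handled at prediction time is not specified by the disclosure.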

In the embodiments of the disclosure, for each piece of co-occurrence information, the target index bit of the zero vector is set based on the number of that co-occurrence information in the feature dictionary, which improves the accuracy of the feature vector.

In a possible implementation, training the model to be trained based on the training sample data to obtain the trained model includes:

inputting each piece of training sample data into the model to be trained to obtain a corresponding prediction result;

adjusting model parameters of the model to be trained based on the prediction result of each piece of training sample data and its label; and

repeating the above process until the training result meets a preset requirement, to obtain the trained model.
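The predict/adjust/repeat loop can be sketched with logistic regression trained by per-sample gradient descent on log loss. Logistic regression is chosen here only because the description later names it as a model with strong memorization; the disclosure does not fix the model type. The learning rate, epoch cap and loss threshold `tol` are hypothetical stand-ins for the unspecified "preset requirement".

```python
import math
from typing import List, Tuple


def predict(w: List[float], x: List[int]) -> float:
    """Sigmoid output; the last weight is the bias (zip stops before it)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples: List[Tuple[List[int], int]],
                              lr: float = 0.5,
                              max_epochs: int = 200,
                              tol: float = 1e-3) -> List[float]:
    """Repeat predict-then-adjust until mean log loss < tol or max_epochs is hit."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in samples:
            p = predict(w, x)  # prediction result for this sample
            total_loss += -(y * math.log(p + 1e-12)
                            + (1 - y) * math.log(1 - p + 1e-12))
            g = p - y  # gradient of log loss w.r.t. the logit
            for i, xi in enumerate(x):
                w[i] -= lr * g * xi
            w[-1] -= lr * g
        if total_loss / len(samples) < tol:
            break
    return w


samples = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], 1)]
w = train_logistic_regression(samples)
print(predict(w, [1, 0, 0]) > 0.5, predict(w, [0, 1, 0]) < 0.5)  # True True
```

With one-hot inputs, each weight effectively memorizes how strongly one co-occurrence string is tied to the offline behavior, which matches the memorization argument in the description.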

In the embodiments of the disclosure, the model is trained in a supervised manner using the prediction result and the label of each piece of training sample data, which improves training performance and, in turn, the prediction accuracy of the model.

The embodiments of the disclosure provide an interest point positioning method, which comprises the following steps:

acquiring the current WiFi scanning information of a current device and the associated WiFi information corresponding to each interest point in the area where the current device is located;

for each interest point, generating current co-occurrence information based on the current WiFi scanning information and the associated WiFi information corresponding to the interest point; and

inputting each piece of current co-occurrence information into a trained model to obtain the interest point positioning information of the current device, where the trained model is obtained by the model training method of any one of claims 1-9.
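One way to turn per-interest-point model outputs into positioning information is to score every candidate interest point in the area and return the highest-scoring one. Selecting the arg-max is an assumption made for this sketch (the disclosure only states that the model outputs interest point positioning information), and `locate` and its `score` callable, which stands in for the trained model, are hypothetical names.

```python
from typing import Callable, Dict, Optional


def locate(current_cooccurrence: Dict[str, str],
           score: Callable[[str], float]) -> Optional[str]:
    """Return the POI whose current co-occurrence information scores highest.

    current_cooccurrence maps each candidate POI in the device's area to the
    co-occurrence string built from the current scan; score wraps the trained
    model. Returns None when there are no candidates.
    """
    if not current_cooccurrence:
        return None
    return max(current_cooccurrence,
               key=lambda poi: score(current_cooccurrence[poi]))


model_scores = {"A_1100": 0.9, "B_0000": 0.1}  # hypothetical model outputs
print(locate({"A": "A_1100", "B": "B_0000"}, lambda s: model_scores.get(s, 0.0)))  # A
```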

In the embodiments of the disclosure, the interest point positioning information of the current device is determined with the trained model, which improves the accuracy of interest point positioning.

The embodiments of the disclosure provide a model training device, comprising:

an information acquisition module, configured to acquire historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, where the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;

an information generation module, configured to generate, for each interest point, co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one piece of first WiFi information in each group of WiFi scanning information, where the co-occurrence information represents the scanning state of each piece of first WiFi information in each group relative to the second WiFi information associated with the interest point;

a sample generation module, configured to generate, for each piece of co-occurrence information, training sample data based on the co-occurrence information and its corresponding label, where the label is determined based on preset offline behavior information associated with the interest point corresponding to the co-occurrence information; and

a model training module, configured to train the model to be trained on the training sample data to obtain a trained model, where the trained model is used to determine the corresponding interest point positioning information based on current WiFi scanning information.

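In one possible implementation, each interest point has corresponding interest point identification information; the information generation module is specifically configured to: for each interest point, generate scan distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group; and generate the co-occurrence information corresponding to the interest point based on its interest point identification information and its scan distribution identification information.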

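In a possible implementation, the first WiFi information includes a first WiFi and the signal strength of the first WiFi, the second WiFi information includes a second WiFi and the signal strength of the second WiFi, and the second WiFi associated with each interest point follow a distribution order determined by their respective signal strengths; the information generation module is specifically configured to: for each second WiFi associated with each interest point, generate sub-scan distribution identification information based on the second WiFi and each first WiFi in each group; and splice all the sub-scan distribution identification information according to the distribution order of the second WiFi associated with the interest point to generate the scan distribution identification information.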

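In a possible implementation, the sub-scan distribution identification information corresponds to one scan flag bit and a plurality of signal strength flag bits, where different signal strength flag bits represent different signal strengths; the information generation module is specifically configured to: for each second WiFi, when there is a target first WiFi whose MAC address is the same as that of the second WiFi, determine the value of the scan flag bit as 1; determine, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, set it to 1 and set the other signal strength flag bits to 0; and generate the sub-scan distribution identification information based on the values of the scan flag bit and the signal strength flag bits.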

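In one possible implementation, the co-occurrence information is in character string format; the sample generation module is specifically configured to: for each piece of co-occurrence information, perform feature processing on the co-occurrence information in character string format to generate a corresponding feature vector; acquire the preset offline behavior information associated with the interest point corresponding to the co-occurrence information and determine the true value of the label based on it; and generate the training sample data based on the feature vector and the true value of the label.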

In one possible implementation manner, the sample generation module is specifically configured to:

In one possible implementation, the model training module is specifically configured to:

input each piece of training sample data into the model to be trained to obtain a corresponding prediction result;

adjust model parameters of the model to be trained based on the prediction result and the label of each piece of training sample data; and

repeat the above process until the training result meets a preset requirement, to obtain the trained model.

The embodiments of the disclosure provide an interest point positioning device, which comprises:

an acquisition module, configured to acquire the current WiFi scanning information of a current device and the associated WiFi information corresponding to each interest point in the area where the current device is located;

a generating module, configured to generate, for each interest point, current co-occurrence information based on the current WiFi scanning information and the associated WiFi information corresponding to the interest point; and

a positioning module, configured to input each piece of current co-occurrence information into the trained model to obtain the interest point positioning information of the current device, where the trained model is obtained by the model training method of any one of claims 1-9.

The embodiments of the disclosure also provide an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory via the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the model training method or of the interest point positioning method in any of the possible implementations described above.

The embodiments of the disclosure also provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the model training method in any of the possible implementations above or the steps of the interest point positioning method described above.

The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.

Drawings

To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required for the embodiments are briefly described below. These drawings, which are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure and, together with the description, serve to illustrate its technical solutions. It should be understood that the following drawings show only certain embodiments of the present disclosure and should not be regarded as limiting its scope; a person of ordinary skill in the art may derive other relevant drawings from them without inventive effort.

FIG. 1 illustrates a flow chart of a model training method provided by an embodiment of the present disclosure;

FIG. 2 illustrates a flow chart of a method for generating co-occurrence information provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a generation process of sub-scan distribution identification information provided by an embodiment of the present disclosure;

FIG. 4 illustrates a flowchart of a method for generating training sample data provided by an embodiment of the present disclosure;

FIG. 5 illustrates a flow chart of a feature processing method for co-occurrence information provided by an embodiment of the present disclosure;

FIG. 6 illustrates a flow chart of an interest point positioning method provided by an embodiment of the present disclosure;

FIG. 7 illustrates a schematic diagram of a model training apparatus provided by an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a point of interest locating device according to an embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

The term "and/or" herein describes only an association relationship and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.

It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved, and the user's authorization should be obtained.

Research has found that, in common fingerprint positioning techniques, the user's current interest point is usually determined from the cross information between the WiFi information scanned by the electronic device and the historically collected WiFi fingerprint information. The accuracy of the cross-information calculation directly affects the reliability of the final positioning result, so improving the accuracy of the cross information improves the reliability of the positioning result.

In personalized information recommendation, information scanning services and online advertising, the click-through rate (CTR) estimation model is one of the important technologies; it learns and predicts user feedback, such as clicking, collecting (favoriting) or purchasing. A CTR model achieves functions such as information recommendation through its memorization capability, that is, the model's ability to directly learn and use the "co-occurrence frequency" of requests and historical fingerprints in historical data. In general, models such as collaborative filtering and logistic regression have strong memorization capability: because their structure is simple, historical data often directly influences the results, meaning these models learn the distribution characteristics of the historical data and use that memory to predict results.
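The simplest form of the memorization described above is a frequency table: for each co-occurrence key, record how often it appeared in historical data and how often positive feedback followed. The sketch below is only an illustration of that idea, not the model of the disclosure; the `(key, label)` event format and the name `cooccurrence_rates` are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def cooccurrence_rates(events: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Memorize, per co-occurrence key, the share of historical events with positive feedback.

    events: (co-occurrence key, label) pairs, where label 1 means positive feedback.
    """
    seen: Counter = Counter()
    positive: Counter = Counter()
    for key, label in events:
        seen[key] += 1
        positive[key] += label
    return {key: positive[key] / seen[key] for key in seen}


print(cooccurrence_rates([("A_1100", 1), ("A_1100", 0), ("B_0000", 0)]))
# {'A_1100': 0.5, 'B_0000': 0.0}
```

A logistic regression over one-hot co-occurrence features learns a smoothed version of the same per-key statistic, which is why such simple models are described as having strong memorization.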

Based on the above research, the present disclosure provides a model training method, device, electronic equipment and storage medium. First, historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information are acquired, where the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point. Second, for each interest point, co-occurrence information is generated based on the second WiFi information associated with the interest point and the at least one piece of first WiFi information in each group of WiFi scanning information; the co-occurrence information represents the scanning state of each piece of first WiFi information in each group relative to the second WiFi information associated with the interest point. Then, for each piece of co-occurrence information, training sample data is generated from the co-occurrence information and its corresponding label, where the label is determined based on preset offline behavior information associated with the corresponding interest point. Finally, the model to be trained is trained on the training sample data to obtain a trained model, which is used to determine the corresponding interest point positioning information based on current WiFi scanning information.

In the embodiments of the disclosure, the co-occurrence information is constructed from the historical WiFi distribution information and the historical WiFi scanning information. Because it characterizes the scanning state of the WiFi information scanned by each first user through an electronic device relative to the at least one piece of WiFi information covering the interest point, training the model on samples generated from it strengthens the model's memorization of the association between WiFi scanning information and interest points. Interest point prediction can then be performed with the trained model, further improving the reliability of the positioning result.

To facilitate understanding of this embodiment, the execution body of the model training method provided in the embodiments of the present disclosure is first described in detail. The execution body is an electronic device. In this embodiment, the electronic device is a server, which may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud storage, big data and artificial intelligence platforms. In other embodiments, the electronic device may be a terminal device, such as a mobile device, a user terminal, a handheld device, a computing device or a wearable device. The model training method may be implemented by a processor invoking computer-readable instructions stored in a memory.

The model training method provided by the embodiments of the disclosure is described in detail below with reference to the accompanying drawings. Referring to FIG. 1, which shows a flowchart of a model training method according to an embodiment of the disclosure, the method includes steps S101 to S104:

S101: acquiring historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, where the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point.

The historical WiFi scanning information may be WiFi information scanned by electronic devices, that is, multiple groups of WiFi scanning information scanned by different electronic devices. Because each WiFi signal has a corresponding coverage area, each group of WiFi scanning information includes at least one piece of first WiFi information. The first WiFi information includes a first WiFi and the signal strength of the first WiFi, and each first WiFi corresponds to one interest point.

The electronic device may be a terminal device such as a smartphone, a tablet computer, or a smart watch, which is not limited herein. It should be understood that the historical WiFi information is WiFi information collected within a historical time period (e.g., the previous month).

The historical WiFi distribution information includes second WiFi information associated with each point of interest, and the second WiFi information includes a second WiFi and the signal strength of the second WiFi. The second WiFi associated with each point of interest follow a distribution order, which is determined by the signal strength of each second WiFi.

The historical WiFi distribution information (also referred to as historical WiFi fingerprint distribution information) may be WiFi distribution information in the point of interest (POI) dimension constructed from the historical WiFi scanning information; that is, for each point of interest, it may be a WiFi distribution constructed from the multiple groups of WiFi scanning information scanned within the historical time period. Because the signal strengths of the second WiFi differ, the second WiFi associated with each point of interest follow a corresponding distribution order.

For example, the second WiFi corresponding to point of interest A may include W1, W2, and W3. Because W1, W2, and W3 are at different distances from point of interest A, their signal strengths also differ; if, for example, the signal strengths satisfy W1 > W2 > W3, the distribution order of the second WiFi is W1, W2, W3.
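The ordering rule in the example above can be sketched in Python. This is a minimal illustration, assuming the historical distribution of one point of interest is available as a mapping from a second-WiFi identifier to a signal strength in dBm; the function name `distribution_order` and the sample values are hypothetical.

```python
from typing import Dict, List


def distribution_order(second_wifi: Dict[str, float]) -> List[str]:
    """Return the second WiFi of one point of interest, strongest signal first.

    `second_wifi` maps a WiFi identifier (hypothetical, e.g. a MAC address)
    to its signal strength in dBm; a value closer to 0 dBm is stronger.
    """
    # Descending sort by signal strength: W1 > W2 > W3 yields [W1, W2, W3].
    return sorted(second_wifi, key=lambda wifi: second_wifi[wifi], reverse=True)


if __name__ == "__main__":
    poi_a = {"W2": -62.0, "W3": -75.0, "W1": -48.0}  # hypothetical strengths
    print(distribution_order(poi_a))  # ['W1', 'W2', 'W3']
```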

In the embodiments of the present disclosure, a point of interest may refer to a store in a target location such as a shopping mall or an office building; for example, a point of interest may be a store in a mall. Each point of interest has unique point of interest identification information, which may consist of digits, letters, and the like.

S102, for each point of interest, generating co-occurrence information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the at least one piece of first WiFi information in each group of WiFi scanning information; the co-occurrence information is used to characterize the scanning state of each piece of first WiFi information in each group relative to the second WiFi information associated with the point of interest.

The scanning state may be either a scanned state or an unscanned state.

Here, the second WiFi information associated with each point of interest may be compared with the at least one piece of first WiFi information in each group of WiFi scanning information to determine whether any of that first WiFi information is identical to the second WiFi information associated with the point of interest, and thereby to determine the scanning state of each piece of first WiFi information relative to the second WiFi information associated with the point of interest.

Optionally, referring to Fig. 2, generating in step S102 the co-occurrence information corresponding to each point of interest based on the second WiFi information associated with the point of interest and the at least one piece of first WiFi information in each group of WiFi scanning information may include the following steps S1021 to S1022:

S1021, for each point of interest, generating scan distribution identification information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the first WiFi information in each group.

In this step, for each point of interest, the second WiFi information associated with the point of interest is compared with the first WiFi information in each group of WiFi scanning information to generate the scan distribution identification information corresponding to the point of interest.

Specifically, generating, for each point of interest, the scan distribution identification information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the first WiFi information in each group may include the following steps (1) to (2):

(1) For each second WiFi associated with each point of interest, generating sub-scan distribution identification information based on the second WiFi and each first WiFi in each group.

The sub-scan distribution identification information corresponds to one scan flag bit and a plurality of signal strength flag bits. The scan flag bit indicates whether the corresponding second WiFi is scanned. The signal strength flag bits represent signal strength, with different signal strength flag bits corresponding to different signal strength ranges. In this embodiment, there are 8 signal strength flag bits; for example, -70 dBm to -60 dBm corresponds to signal strength flag bit 0, and -60 dBm to -50 dBm corresponds to signal strength flag bit 1.

Specifically, for each second WiFi, the scan flag bit may be determined according to whether any first WiFi in each group of first WiFi information is the same as the second WiFi, the signal strength flag bits may be determined based on the signal strength of the matching first WiFi, and the sub-scan distribution identification information may be generated based on the scan flag bit and the signal strength flag bits. Specifically, this may include the following steps (a) to (c):

(a) For each second WiFi, when there is a target first WiFi having the same media access control (MAC) address as the second WiFi, determining the value of the scan flag bit to be 1.

(b) Determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, setting the value of the target signal strength flag bit to 1, and setting the values of the other signal strength flag bits to 0.

(c) Generating the sub-scan distribution identification information based on the value of the scan flag bit and the values of the signal strength flag bits.
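Steps (a) to (c) can be sketched as follows. This is a hedged sketch, not the disclosed implementation: the text fixes only the 8 signal strength flag bits and the two bins -70 dBm to -60 dBm (flag 0) and -60 dBm to -50 dBm (flag 1), so the 10 dB width of the remaining bins, the half-open bin edges, and the clamping of out-of-range values are assumptions, as are the names `rssi_to_bin` and `sub_scan_bits`.

```python
from typing import Dict, Optional

NUM_STRENGTH_FLAGS = 8   # 8 signal strength flag bits, per this embodiment
BIN_LOW_DBM = -70.0      # assumption: bin 0 starts at -70 dBm
BIN_WIDTH_DB = 10.0      # assumption: every bin is 10 dB wide


def rssi_to_bin(rssi_dbm: float) -> int:
    """Map a signal strength to a strength flag index I (hypothetical binning).

    Matches the two bins given in the text; other bins and the clamping
    at both ends are assumptions.
    """
    index = int((rssi_dbm - BIN_LOW_DBM) // BIN_WIDTH_DB)
    return min(max(index, 0), NUM_STRENGTH_FLAGS - 1)


def sub_scan_bits(second_mac: str, scan_group: Dict[str, float]) -> str:
    """Build the 9-bit code: 1 scan flag bit followed by 8 strength flag bits.

    `scan_group` maps MAC address -> signal strength (dBm) of each first WiFi
    in one group of WiFi scanning information.
    """
    rssi: Optional[float] = scan_group.get(second_mac)
    if rssi is None:
        # No target first WiFi with the same MAC: every flag bit stays 0.
        return "0" * (NUM_STRENGTH_FLAGS + 1)
    # Step (a): scan flag (bit 8) = 1; step (b): only strength flag I = 1.
    code = (1 << NUM_STRENGTH_FLAGS) | (1 << rssi_to_bin(rssi))
    # Step (c): the flags together form the sub-scan distribution code.
    return format(code, "09b")


if __name__ == "__main__":
    group = {"aa:bb:cc:00:00:01": -25.0}  # hypothetical scan group
    print(sub_scan_bits("aa:bb:cc:00:00:01", group))  # 100010000
    print(sub_scan_bits("aa:bb:cc:00:00:02", group))  # 000000000
```

Setting bits with shifts keeps the layout explicit: the scan flag is the most significant of the 9 bits, and strength flag I is counted from the least significant end, which is what the binary code 100010000 in the Fig. 3 example implies.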

In this embodiment, the sub-scan distribution identification information may be determined from the value of the scan flag bit and the values of the signal strength flag bits according to the generation formula shown in formula (1):

$$\mathrm{ID}_n = \mathrm{Tag}_{\mathrm{scan}} \times \left(2^{8} + 2^{I}\right) \qquad (1)$$

where $\mathrm{ID}_n$ is the sub-scan distribution identification information; $\mathrm{Tag}_{\mathrm{scan}}$ is the scan flag bit, which indicates whether the first WiFi information contains a target first WiFi having the same media access control (MAC) address as the second WiFi: its value is 1 if such a target first WiFi exists and 0 otherwise; and $I$ is the index of the signal bin into which the signal strength of the target first WiFi falls, a signal bin being a signal strength range.
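Formula (1) reproduces the binary construction directly: the scan flag occupies bit 8 and strength flag I occupies bit I, so a scanned second WiFi always yields a value from 2^8 + 2^0 = 257 to 2^8 + 2^7 = 384, and an unscanned one yields 0. The following Python check is a minimal sketch; the function name `sub_scan_id` is illustrative only.

```python
NUM_STRENGTH_FLAGS = 8  # signal strength flag bits in this embodiment


def sub_scan_id(tag_scan: int, bin_index: int) -> int:
    """Formula (1): ID_n = Tag_scan * (2**8 + 2**I)."""
    if tag_scan not in (0, 1):
        raise ValueError("Tag_scan must be 0 or 1")
    if not 0 <= bin_index < NUM_STRENGTH_FLAGS:
        raise ValueError("I must index one of the 8 signal strength flag bits")
    return tag_scan * (2 ** NUM_STRENGTH_FLAGS + 2 ** bin_index)


if __name__ == "__main__":
    # W1 in the Fig. 3 example: binary code 100010000, i.e. I = 4.
    assert sub_scan_id(1, 4) == int("100010000", 2) == 272
    # I = 1 and I = 5 give the values listed for W4 and W5; Tag_scan = 0 gives 0.
    print(sub_scan_id(1, 1), sub_scan_id(1, 5), sub_scan_id(0, 3))  # 258 288 0
```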

For example, refer to Fig. 3, which is a schematic diagram of a process for generating sub-scan distribution identification information according to an embodiment of the present disclosure. As shown in Fig. 3, five second WiFi (W1, W2, W3, W4, W5) are distributed at point of interest POI-1. For each second WiFi (taking W1 as an example), W1 is compared with each first WiFi in each group of first WiFi information. If there is a target first WiFi with the same MAC address as W1, the value of the scan flag bit is set to 1; the target signal strength flag bit corresponding to the signal strength of the target first WiFi is then determined and set to 1, and the other signal strength flag bits are set to 0. This yields the 9-bit binary code 100010000 (the scan flag bit followed by the 8 signal strength flag bits), which is converted into the decimal integer 272, the sub-scan distribution identification information of W1. This agrees with formula (1) for I = 4, since 2^8 + 2^4 = 256 + 16 = 272.

Similarly, the sub-scan distribution identification information is determined to be 000 for W2 (not scanned), 272 for W3, 258 for W4, and 288 for W5.

(2) Splicing all the sub-scan distribution identification information according to the distribution order of the second WiFi associated with the point of interest to generate the scan distribution identification information.

Here, after the sub-scan distribution identification information is obtained, it may be spliced according to the distribution order of the second WiFi. As shown in Fig. 3, the sub-scan distribution identification information obtained is 272, 000, 272, 258, and 288, and splicing it yields the scan distribution identification information 272000272258288.
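The splicing in step (2) can be sketched as follows, assuming the sub-scan distribution identification information is already listed in the distribution order of the second WiFi. Writing every value with three digits is inferred from the Fig. 3 example, where the unscanned W2 appears as 000; since every scanned value lies between 257 and 384, this fixed width also keeps each segment separable. The function name is hypothetical.

```python
from typing import List


def scan_distribution_id(sub_ids: List[int]) -> str:
    """Splice sub-scan IDs, already in distribution order, into one string.

    Each ID is zero-padded to three digits (an assumption drawn from the
    Fig. 3 example), so an unscanned second WiFi (ID 0) becomes "000".
    """
    for sub_id in sub_ids:
        if not 0 <= sub_id <= 999:
            raise ValueError(f"sub-scan ID {sub_id} does not fit in three digits")
    return "".join(f"{sub_id:03d}" for sub_id in sub_ids)


if __name__ == "__main__":
    print(scan_distribution_id([272, 0, 272, 258, 288]))  # 272000272258288
```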

S1022, generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point.

It can be understood that, after the scan distribution identification information is obtained, the co-occurrence information corresponding to the point of interest can be generated from the point of interest identification information and the scan distribution identification information corresponding to the point of interest. Specifically, the two may be spliced to generate the co-occurrence information, which is in character string format.

For example, if the point of interest identification information is 22535659086281011 and the corresponding scan distribution identification information is 272000272258288, the co-occurrence information is 272000272258288_22535659086281011.

It should be noted that the above example is described for one point of interest and one group of first WiFi scanning information. In the actual process of generating co-occurrence information, multiple pieces of co-occurrence information may therefore be determined for each point of interest based on multiple groups of first WiFi scanning information.
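Step S1022 and the note above can be sketched together: for one point of interest, each group of first WiFi scanning information contributes one scan distribution identifier, and each is joined to the point of interest identifier with an underscore, in the order used in the example. The function name is illustrative, and the second identifier in the usage lines is a made-up value.

```python
from typing import Iterable, List


def co_occurrences_for_poi(poi_id: str, scan_distribution_ids: Iterable[str]) -> List[str]:
    """Return one co-occurrence string per scanned group for a single POI.

    Format, following the example in the text: "<scan distribution ID>_<POI ID>".
    """
    return [f"{scan_id}_{poi_id}" for scan_id in scan_distribution_ids]


if __name__ == "__main__":
    groups = ["272000272258288", "000272258000288"]  # second value is hypothetical
    print(co_occurrences_for_poi("22535659086281011", groups))
    # ['272000272258288_22535659086281011', '000272258000288_22535659086281011']
```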

S103, for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and the label corresponding to the co-occurrence information; the label corresponding to the co-occurrence information is determined based on preset offline behavior information associated with the point of interest corresponding to the co-occurrence information.

The preset offline behavior information may indicate whether a preset offline behavior has occurred. For example, the preset offline behavior may be an offline consumption behavior, such as coupon redemption.

It should be noted that, when the training sample data is generated, if no offline behavior has occurred at any of the points of interest corresponding to the first WiFi in a group, the corresponding co-occurrence information is not used to generate training sample data. For example, if the first WiFi in a group of first WiFi scanning information include W11, W12, …, and W19, corresponding respectively to points of interest POI1, POI2, …, and POI9, and no offline behavior has occurred at any of POI1, POI2, …, and POI9, the corresponding co-occurrence information is not used to generate training sample data.
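The filtering rule above can be sketched as a predicate over one group of first WiFi scanning information. This is a minimal sketch under assumptions: the mapping from each first WiFi to its point of interest and the set of points of interest where the preset offline behavior occurred are taken as given inputs, and all names are hypothetical.

```python
from typing import Dict, Iterable, Set


def group_yields_samples(first_wifi: Iterable[str],
                         wifi_to_poi: Dict[str, str],
                         pois_with_offline_behavior: Set[str]) -> bool:
    """Return False when no POI behind the scanned first WiFi saw the preset
    offline behavior; the group's co-occurrence information is then skipped."""
    candidate_pois = {wifi_to_poi[w] for w in first_wifi if w in wifi_to_poi}
    return not candidate_pois.isdisjoint(pois_with_offline_behavior)


if __name__ == "__main__":
    # W11..W19 correspond to POI1..POI9, as in the example above.
    wifi_to_poi = {f"W1{k}": f"POI{k}" for k in range(1, 10)}
    scanned = list(wifi_to_poi)
    print(group_yields_samples(scanned, wifi_to_poi, set()))     # False
    print(group_yields_samples(scanned, wifi_to_poi, {"POI3"}))  # True
```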

Optionally, referring to Fig. 4, generating in step S103 the training sample data for each piece of co-occurrence information based on the co-occurrence information and its corresponding label may include the following steps S1031 to S1033:

S1031, for each piece of co-occurrence information, performing feature processing on the co-occurrence information in character string format to generate a feature vector corresponding to the co-occurrence information.

It should be understood that, because the co-occurrence information is in character string format, it needs to be converted into a corresponding feature vector to facilitate model training in subsequent steps.

Optionally, referring to Fig. 5, performing feature processing on each piece of co-occurrence information in character string format to generate the corresponding feature vector may include the following steps S10311 to S10312:

S10311, numbering each piece of co-occurrence information in character string format according to a preset numbering mode, and constructing a feature dictionary based on the number corresponding to each piece of co-occurrence information; the feature dictionary is used to convert co-occurrence information in character string format into feature vectors.

Specifically, all co-occurrence information may be numbered uniformly so that each piece of co-occurrence information corresponds to a unique integer. For example, Table 1 shows the correspondence between co-occurrence information and numbers.

TABLE 1

Co-occurrence information	Number in feature dictionary
272000272258288_22535659086281011	0
272000256258288_22535659086282098	1
272288272258288_22535659086281035	2
…	…
272000272258288_22535659086098765	n

Note that, when the feature dictionary is generated, low-frequency co-occurrence information (for example, information occurring fewer than 5 times) is not numbered.
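Construction of the feature dictionary, including the low-frequency cutoff, can be sketched as follows. This is a minimal sketch: the threshold of 5 follows the example in the text, while numbering strings in order of first appearance is an assumption (the text only requires each string to map to a unique integer), and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

MIN_COUNT = 5  # strings occurring fewer than 5 times are not numbered


def build_feature_dictionary(co_occurrences: Iterable[str],
                             min_count: int = MIN_COUNT) -> Dict[str, int]:
    """Number each sufficiently frequent co-occurrence string as 0, 1, 2, ...

    Numbers follow first appearance in the input (an assumption);
    Counter, being a dict subclass, keeps insertion order.
    """
    counts = Counter(co_occurrences)
    feature_dict: Dict[str, int] = {}
    for text, count in counts.items():
        if count >= min_count:
            feature_dict[text] = len(feature_dict)
    return feature_dict


if __name__ == "__main__":
    corpus = (["272000272258288_22535659086281011"] * 6
              + ["272000256258288_22535659086282098"] * 2   # too rare, skipped
              + ["272288272258288_22535659086281035"] * 5)
    print(build_feature_dictionary(corpus))
    # {'272000272258288_22535659086281011': 0, '272288272258288_22535659086281035': 1}
```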

S10312, for each piece of co-occurrence information in character string format, performing one-hot encoding on it based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.

The feature vector is a high-dimensional sparse feature vector.

After the feature dictionary is generated, the co-occurrence information in character string format can be encoded. In the embodiments of the present disclosure, one-hot encoding is adopted: N states are encoded with an N-bit state register, each state has its own register bit, and only one register bit is valid at any time.

Optionally, when one-hot encoding is performed on the co-occurrence information in character string format based on the feature dictionary, the number of entries in the feature dictionary may first be determined and a zero vector of that length created. Then, for each piece of co-occurrence information, the value at the index equal to the number of that co-occurrence information in the feature dictionary is set to 1, yielding the feature vector corresponding to the co-occurrence information.
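The one-hot conversion can be sketched as follows. This is a minimal sketch; mapping a string that is absent from the feature dictionary (for example, a low-frequency one) to an all-zero vector is an assumption, and the placeholder dictionary keys in the usage lines are hypothetical padding used to reproduce the 9-entry example in the text.

```python
from typing import Dict, List


def one_hot(text: str, feature_dict: Dict[str, int]) -> List[int]:
    """One-hot encode a co-occurrence string with the feature dictionary.

    Strings absent from the dictionary map to an all-zero vector here;
    this handling is an assumption.
    """
    vector = [0] * len(feature_dict)   # zero vector, length = dictionary size
    index = feature_dict.get(text)
    if index is not None:
        vector[index] = 1              # only the bit at the string's number is set
    return vector


if __name__ == "__main__":
    feature_dict = {
        "272000272258288_22535659086281011": 0,
        "272000256258288_22535659086282098": 1,
        "272288272258288_22535659086281035": 2,
    }
    # Pad to 9 entries with hypothetical placeholder keys.
    feature_dict.update({f"placeholder_{k}": k for k in range(3, 9)})
    vec = one_hot("272288272258288_22535659086281035", feature_dict)
    print("".join(map(str, vec)))  # 001000000
```

For a large dictionary, storing only the active index (a sparse representation) avoids materializing the zero vector; the dense list above simply mirrors the description.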

For example, Table 2 shows the process of converting co-occurrence information into feature vectors based on the feature dictionary.

TABLE 2

As can be seen from Table 2, if the feature dictionary contains 9 numbers, a zero vector of length 9 is created. For a piece of co-occurrence information (for example, 272288272258288_22535659086281035), its number in the feature dictionary is determined to be 2, and the value of the register bit at index 2 of the zero vector is set to 1, generating the feature vector 001000000 corresponding to the co-occurrence information.

S1032, acquiring the preset offline behavior information associated with the point of interest corresponding to the co-occurrence information, and determining the true value of the label based on the preset offline behavior information.

If the preset offline behavior has occurred, the true value of the label is determined to be 1; otherwise, it is 0.

S1033, generating the training sample data based on the feature vector and the true value of the label.

Thus, after the feature vectors are determined, each feature vector is calibrated with the true value determined above to obtain the training sample data.

S104, training the model to be trained based on the training sample data to obtain a trained model; the trained model is used for determining corresponding interest point positioning information based on the current WiFi scanning information.

The model to be trained may be a logistic regression model; in other embodiments, it may also be another model, which is not limited herein.

It can be understood that, after the training sample data is obtained, the model to be trained can be trained on it (that is, the logistic regression model to be trained is fitted to the training sample data) to obtain the trained model. Since each piece of training sample data has a corresponding label, the model can be trained in a supervised manner.

In the embodiments of the present disclosure, corresponding co-occurrence information is constructed from the historical WiFi distribution information and the historical WiFi scanning information. Because the co-occurrence information characterizes the scanning state of each scanned group of first WiFi scanning information relative to the second WiFi information associated with the point of interest, training the model on sample data generated from the co-occurrence information strengthens the model's memory of the association between WiFi scanning information and points of interest. Points of interest can then be predicted with the trained model, which improves the reliability of the positioning result.

Optionally, when the model to be trained is trained on the training sample data, each piece of training sample data may be input into the model to obtain a corresponding prediction result, and the model parameters are then adjusted based on the prediction result and the label of each piece of training sample data. Specifically, a loss function may be preset, the loss value between the prediction result and the label of each piece of training sample data is calculated, and the model parameters are adjusted based on the loss value. This process is repeated until the training result meets a preset requirement, yielding the trained model.
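The supervised training loop described above can be sketched for the logistic regression case. This is a minimal sketch, assuming log loss as the preset loss function, plain stochastic gradient descent, and a fixed number of epochs standing in for the "preset requirement"; the learning rate, epoch count, and function names are hypothetical. Because each feature vector is one-hot, a sample is stored as its single active index, and the logit reduces to one weight plus the bias.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(samples: List[Tuple[int, int]], num_features: int,
                              learning_rate: float = 0.5,
                              epochs: int = 200) -> Tuple[List[float], float]:
    """Fit logistic regression on one-hot samples given as (index, label)."""
    weights = [0.0] * num_features
    bias = 0.0
    for _ in range(epochs):
        for index, label in samples:
            prediction = sigmoid(weights[index] + bias)  # one-hot dot product
            error = prediction - label                   # d(log loss)/d(logit)
            weights[index] -= learning_rate * error
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Hypothetical samples: feature 0 always labelled 1, feature 1 always 0.
    data = [(0, 1), (0, 1), (1, 0), (1, 0), (2, 1), (2, 0)]
    w, b = train_logistic_regression(data, num_features=3)
    print(round(sigmoid(w[0] + b), 3), round(sigmoid(w[1] + b), 3))
```

Updating only the active weight per sample is exact for one-hot input, since the gradient for every inactive feature is zero.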

Referring to Fig. 6, a point of interest positioning method according to an embodiment of the present disclosure includes steps S601 to S603:

S601, acquiring current WiFi scanning information of the current device and the associated WiFi information corresponding to each point of interest in the area where the current device is located.

For example, the current WiFi scanning information may be obtained by the current electronic device, and includes at least one currently scanned WiFi and the signal strength of each such WiFi.

Here, the area where the current device is located may be determined from the location information of the current device, and the associated WiFi information corresponding to each point of interest may be constructed in advance.

S602, for each interest point, generating current co-occurrence information based on the current WiFi scanning information and associated WiFi information corresponding to the interest point.

In this step, the current WiFi scanning information is combined with the point of interest identification information of each point of interest, and the combined information is one-hot encoded to obtain the current co-occurrence information, which is in vector form.

S603, inputting each piece of current co-occurrence information into the trained model to obtain the point of interest positioning information of the current device.

The trained model is obtained by any one of the model training methods described above.

It can be understood that, after each piece of current co-occurrence information is input into the trained model, the model outputs a probability for each point of interest in the area where the current device is located. During POI positioning, therefore, only the current co-occurrence information in vector form needs to be input into the trained model, and the point of interest positioning information of the current device can be determined from the probability of each point of interest; specifically, the point of interest with the maximum probability may be determined as the point of interest where the current device is located.
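Selecting the point of interest with the maximum probability can be sketched as follows. This is a minimal sketch, assuming a logistic regression model represented by one weight per feature dictionary entry plus a bias, and current co-occurrence strings already built for each candidate point of interest; looking a string up in the feature dictionary is equivalent to finding the active index of its one-hot vector. Treating a string missing from the dictionary as contributing only the bias is an assumption, and the names and values in the usage lines are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def locate_poi(current_co_occurrences: Dict[str, str],
               feature_dict: Dict[str, int],
               weights: List[float],
               bias: float) -> Tuple[str, Dict[str, float]]:
    """Score every candidate POI and return the most probable one.

    `current_co_occurrences` maps POI identifier -> current co-occurrence string.
    Raises ValueError if there are no candidate POIs.
    """
    probabilities: Dict[str, float] = {}
    for poi_id, text in current_co_occurrences.items():
        index = feature_dict.get(text)
        logit = bias + (weights[index] if index is not None else 0.0)
        probabilities[poi_id] = sigmoid(logit)
    if not probabilities:
        raise ValueError("no candidate points of interest")
    best_poi = max(probabilities, key=lambda poi: probabilities[poi])
    return best_poi, probabilities


if __name__ == "__main__":
    feature_dict = {"272000272258288_POI-1": 0, "272000272258288_POI-2": 1}
    current = {"POI-1": "272000272258288_POI-1", "POI-2": "272000272258288_POI-2"}
    best, probs = locate_poi(current, feature_dict, weights=[2.0, -1.0], bias=0.0)
    print(best)  # POI-1
```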

In some embodiments, after the point of interest where the current device is located is determined, information recommendations (e.g., coupon recommendations, merchandise recommendations, or online advertisement delivery) may be made to the user of the current device based on that point of interest.

It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.

Based on the same inventive concept, the embodiment of the disclosure further provides a model training device corresponding to the model training method, and since the principle of solving the problem by the device in the embodiment of the disclosure is similar to that of the model training method in the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.

Referring to Fig. 7, which is a schematic diagram of a model training apparatus 700 according to an embodiment of the present disclosure, the apparatus includes:

the information acquisition module 701 is configured to acquire historical wireless fidelity WiFi scanning information and historical WiFi distribution information; the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;

an information generating module 702, configured to generate, for each point of interest, co-occurrence information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the at least one piece of first WiFi information in each group of WiFi scanning information; the co-occurrence information is used to characterize the scanning state of each piece of first WiFi information in each group relative to the second WiFi information associated with the point of interest;

a sample generation module 703, configured to generate training sample data for each co-occurrence information based on the co-occurrence information and a tag corresponding to the co-occurrence information; the label corresponding to the co-occurrence information is determined based on preset offline behavior information associated with the interest point corresponding to the co-occurrence information;

the model training module 704 is configured to train a model to be trained based on each training sample data, so as to obtain a trained model; the trained model is used for determining corresponding interest point positioning information based on the current WiFi scanning information.

In one possible implementation, each point of interest has corresponding point of interest identification information; the information generating module 702 is specifically configured to:

In a possible implementation manner, the first WiFi information includes a first WiFi and a signal strength of the first WiFi, the second WiFi information includes a second WiFi and a signal strength of the second WiFi, and a distribution sequence exists between the second WiFi associated with each interest point, and the distribution sequence is determined by the signal strengths of the respective second WiFi; the information generating module 702 is specifically configured to:

In a possible implementation manner, the sub-scan distribution identification information corresponds to one scan flag bit and a plurality of signal strength flag bits, and different signal strength flag bits represent different signal strengths; the information generating module 702 is specifically configured to:

In one possible implementation, the format of the co-occurrence information is a character string format; the sample generation module 703 is specifically configured to:

In one possible implementation, the sample generation module 703 is specifically configured to:

In one possible implementation, the model training module 704 is specifically configured to:

Referring to Fig. 8, a point of interest positioning apparatus provided in an embodiment of the present disclosure includes:

an obtaining module 801, configured to obtain current WiFi scan information of a current device and associated WiFi information corresponding to each interest point corresponding to an area where the current device is located;

a generating module 802, configured to generate, for each interest point, current co-occurrence information based on the current WiFi scan information and associated WiFi information corresponding to the interest point;

the positioning module 803 is configured to input each piece of current co-occurrence information to a trained model, so as to obtain interest point positioning information of the current device; wherein the trained model is obtained by the model training method of any one of claims 1-9.

The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.

Based on the same technical concept, an embodiment of the present disclosure further provides an electronic device. Referring to Fig. 9, which is a schematic structural diagram of an electronic device 900 according to an embodiment of the present disclosure, the electronic device includes a processor 901, a memory 902, and a bus 903. The memory 902 is configured to store execution instructions and includes an internal memory 9021 and an external memory 9022. The internal memory 9021 temporarily stores operation data of the processor 901 and data exchanged with the external memory 9022, such as a hard disk, and the processor 901 exchanges data with the external memory 9022 through the internal memory 9021.

In the embodiment of the present application, the memory 902 is specifically configured to store application program codes for executing the solution of the present application, and the processor 901 controls the execution. That is, when the electronic device 900 is running, communication between the processor 901 and the memory 902 is via the bus 903, such that the processor 901 executes the application code stored in the memory 902, thereby performing the methods described in any of the foregoing embodiments.

The memory 902 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.

The processor 901 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like, or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or any conventional processor.

It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 900. In other embodiments of the application, the electronic device 900 may include more or fewer components than illustrated, some components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of model training in the method embodiments described above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.

Embodiments of the present disclosure further provide a computer program product carrying program code, where the program code includes instructions for performing the steps of model training in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.

Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of model training, comprising:

2. The method of claim 1, wherein each point of interest has corresponding point-of-interest identification information, and generating the co-occurrence information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the at least one piece of first WiFi information in each group of WiFi scan information comprises:

3. The method of claim 2, wherein the first WiFi information comprises a first WiFi and the signal strength of the first WiFi, the second WiFi information comprises a second WiFi and the signal strength of the second WiFi, and the second WiFis associated with each point of interest follow a distribution order determined by their respective signal strengths;

4. The method of claim 3, wherein the sub-scan distribution identification information corresponds to a scan flag bit and a plurality of signal strength flag bits, different signal strength flag bits representing different signal strengths, and generating, for each second WiFi associated with each point of interest, the sub-scan distribution identification information based on that second WiFi and each first WiFi in each group comprises:

5. The method of claim 1, wherein the co-occurrence information is in a string format, and generating, for each piece of co-occurrence information, the training sample data based on the co-occurrence information and the label corresponding to the co-occurrence information comprises:

6. The method of claim 5, wherein determining the true value of the label based on the preset offline behavior information comprises:

determining the true value of the label as 1 if the preset offline behavior occurs, or as 0 if the preset offline behavior does not occur.

7. The method of claim 5, wherein performing, for each piece of co-occurrence information, feature processing on the string-format co-occurrence information to generate a feature vector corresponding to the co-occurrence information comprises:

8. The method of claim 7, wherein performing one-hot encoding on the string-format co-occurrence information based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information comprises:

9. The method of claim 1, wherein training the model to be trained based on the training sample data to obtain the trained model comprises:

10. A method of locating a point of interest, comprising:

11. A model training device, comprising:

12. A point of interest locating device, comprising:

13. An electronic device, comprising a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, performing the steps of the model training method according to any one of claims 1 to 9 or of the point-of-interest locating method according to claim 10.

14. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, performs the steps of the model training method according to any one of claims 1 to 9 or of the point-of-interest locating method according to claim 10.
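Claims 5 to 8 name a string-format co-occurrence record, a label whose true value is 1 or 0 depending on whether the preset offline behavior occurred, and one-hot encoding against a feature dictionary, but they do not fix the string layout. The following Python sketch illustrates one possible reading under stated assumptions: the `|` delimiter, the token names (such as `wifi1_s1`), and all function names are hypothetical and are not taken from the disclosure. Each token is one-hot encoded against the dictionary, and the per-token vectors are OR-ed into a single binary (multi-hot) vector.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical delimiter: the disclosure does not specify the string layout.
TOKEN_SEP = "|"


def build_feature_dictionary(cooccurrence_strings: Sequence[str]) -> Dict[str, int]:
    """Map every distinct token seen in the training strings to a column index."""
    dictionary: Dict[str, int] = {}
    for s in cooccurrence_strings:
        for token in s.split(TOKEN_SEP):
            if token and token not in dictionary:
                dictionary[token] = len(dictionary)
    return dictionary


def one_hot_encode(cooccurrence: str, dictionary: Dict[str, int]) -> List[int]:
    """One-hot encode each token and OR the results into one binary vector.

    Tokens absent from the dictionary are ignored (an assumption; a real
    system might reserve an 'unknown' column instead).
    """
    vector = [0] * len(dictionary)
    for token in cooccurrence.split(TOKEN_SEP):
        index = dictionary.get(token)
        if index is not None:
            vector[index] = 1
    return vector


def label_true_value(offline_behavior_occurred: bool) -> int:
    """Label rule of claim 6: 1 if the preset offline behavior occurred, else 0."""
    return 1 if offline_behavior_occurred else 0


def build_training_samples(
    records: Sequence[Tuple[str, bool]],
) -> Tuple[Dict[str, int], List[Tuple[List[int], int]]]:
    """Turn (co-occurrence string, offline-behavior flag) pairs into (vector, label)."""
    dictionary = build_feature_dictionary([s for s, _ in records])
    samples = [
        (one_hot_encode(s, dictionary), label_true_value(flag))
        for s, flag in records
    ]
    return dictionary, samples


if __name__ == "__main__":
    dictionary, samples = build_training_samples(
        [("poiA|wifi1_s1|wifi2_s0", True), ("poiA|wifi2_s1", False)]
    )
    print(dictionary)  # {'poiA': 0, 'wifi1_s1': 1, 'wifi2_s0': 2, 'wifi2_s1': 3}
    print(samples)     # [([1, 1, 1, 0], 1), ([1, 0, 0, 1], 0)]
```

Building the dictionary from the training strings keeps the vector width equal to the number of distinct tokens observed; tokens that appear only at inference time are dropped in this sketch.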