CN115243250A - Method, system and storage medium for acquiring wifi portrait

Info

Publication number: CN115243250A
Application number: CN202210880497.5A
Authority: CN
Inventors: 尹雅露; 莫志强; 陈志勇; 方宏源
Original assignee: Merit Interactive Co Ltd
Current assignee: Merit Interactive Co Ltd
Priority date: 2022-07-25
Filing date: 2022-07-25
Publication date: 2022-10-25

Abstract

The application provides a method, a system and a storage medium for acquiring a wifi portrait. The SSID name of a wifi, which contains Chinese characters, is first obtained and then input into a preset wifi classification model to obtain the category portrait of the wifi. The preset wifi classification model is obtained by training a plurality of different text-cnn models on a plurality of equally sized training wifi subsets randomly extracted from the training wifi set, together with their corresponding category subsets. This markedly improves the robustness of the preset wifi classification model, so that the obtained wifi category portrait is more accurate.

Description

Method, system and storage medium for acquiring wifi portrait

Technical Field

The application relates to the field of data processing, in particular to a method, a system and a storage medium for acquiring a wifi portrait.

Background

Wifi plays an increasingly important role in people's daily work and life because of its convenience. Since a wifi is usually fixed in location, building a portrait of the wifi helps in turn to build a portrait of the users who connect to it, and obtaining a wifi portrait from the wifi name (SSID name) is a common approach. In the prior art, people commonly name wifi using Chinese characters together with other characters, so the category portrait of some wifi can easily be obtained from the characters in its name; for example, the wifi of a shopping mall may be named after that mall. For a considerable proportion of wifi, however, the category is difficult to identify from the name even when Chinese characters are used. How to obtain the portrait of wifi named with Chinese characters is therefore a technical problem that currently needs to be solved.

Disclosure of Invention

To address the above technical problem, the technical solution adopted by the application is as follows. A method for acquiring a wifi portrait comprises the following steps: S100, acquiring the SSID name of the wifi, wherein the SSID name comprises Chinese characters; S200, acquiring the category portrait of the wifi based on the SSID name and a preset wifi classification model.

A system for acquiring a wifi portrait, the system comprising a processor and a non-transitory computer-readable storage medium storing at least one instruction or at least one program, the processor loading and executing the at least one instruction or at least one program to implement the method described above.

A computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method described above.

According to the method, the SSID name of the wifi, which contains Chinese characters, is first obtained and then input into a preset wifi classification model to obtain the category portrait of the wifi. The preset wifi classification model is obtained by training a plurality of different text-cnn models on a plurality of equally sized training wifi subsets randomly extracted from the training wifi set, together with their corresponding category subsets. This improves the robustness of the preset wifi classification model, so that the obtained wifi category portrait is more accurate.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flowchart of a method for obtaining a wifi portrait according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a method for obtaining a wifi portrait, as shown in fig. 1, comprising the following steps:

S100, acquiring the SSID name of the wifi, wherein the SSID name contains Chinese characters. As those skilled in the art know, the SSID is the wireless network name of the wifi, and the user can change the SSID name as needed. In this step, whether the SSID name contains a Chinese character may be determined using a regular expression, or according to a preset Chinese character library.
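By way of illustration only, the two checks mentioned in this step might be sketched in Python as follows. The Unicode range U+4E00 to U+9FFF (the basic CJK Unified Ideographs block), the function names and the sample SSID names are assumptions of this sketch and are not specified by the application.

```python
import re

# Assumed definition of "Chinese character": the basic CJK Unified Ideographs block.
_CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(ssid):
    """Regular-expression check: True if any character lies in U+4E00..U+9FFF."""
    return _CHINESE_CHAR.search(ssid) is not None


def contains_chinese_by_library(ssid, library):
    """Library check: True if any character of the SSID is in the preset character set."""
    return any(ch in library for ch in ssid)


# Hypothetical SSID names used only to exercise the two checks.
print(contains_chinese("万达广场_5G"))
print(contains_chinese("TP-LINK_1234"))
print(contains_chinese_by_library("东南", {"东", "西", "南", "北", "中"}))
```

A preset character library gives tighter control over which characters count, at the cost of maintaining the library; the regular expression is simpler but covers only the range chosen above.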

S200, acquiring the category portrait of the wifi based on the SSID name and a preset wifi classification model. In the present application, the preset wifi classification model may adopt any classification technique in the prior art, such as an SVM; in a preferred embodiment of the present application, the preset wifi classification model is a text-cnn model. Specifically, obtaining the preset wifi classification model comprises:

S301, acquiring the SSID name set $S = \{\mathrm{SSID}_1, \mathrm{SSID}_2, \ldots, \mathrm{SSID}_n\}$ of the wifi set for training and the corresponding category set $L = \{L_1, L_2, \ldots, L_n\}$, wherein the SSID name of the i-th wifi in the wifi set for training is $\mathrm{SSID}_i$, its corresponding category is $L_i$, $L_i$ is one of M preset categories, $\mathrm{SSID}_i$ contains Chinese characters, and $1 \le i \le n$. In the present application, the M preset categories may be set by the user, for example a traffic category, a peripheral trip category, a hotel category, and the like; these are only examples and are not intended to limit the scope of the present application. As known to those skilled in the art, in order to improve the robustness of the classification model, the wifi set for training should contain as many samples as possible of each of the M preset categories, so as to cover as many cases as possible.

S302, training the text-cnn model with the name set S and the category set L as input data, so as to obtain the preset wifi classification model. Specifically, this step comprises the following:

S3021, obtaining a vectorized name set $\mathrm{SV} = \{\mathrm{SSIDV}_1, \mathrm{SSIDV}_2, \ldots, \mathrm{SSIDV}_n\}$ based on the name set S. Specifically, in the present application, any vectorization method in the prior art, such as one-hot or word2vec, may be used to vectorize $\mathrm{SSID}_i$ into $\mathrm{SSIDV}_i$. In this step, those skilled in the art can enumerate all Chinese and non-Chinese characters that may appear in wifi SSID names and vectorize each SSID name based on the total number of characters enumerated. Illustratively, when the total number of enumerated characters is 5 (assuming only the five characters 东, 西, 南, 北 and 中, i.e. east, west, south, north and center) and the SSID names are vectorized using the one-hot method, the vectorized data corresponding to the 5 characters may be set to 10000, 01000, 00100, 00010 and 00001 respectively. When the SSID name of a wifi is "东南" (southeast), the vectorized data of that SSID name is a 2 × 5 matrix: [[1,0,0,0,0]; [0,0,1,0,0]].
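The one-hot example of this step can be reproduced with the following minimal Python sketch. It assumes that the five-character vocabulary is 东西南北中 (east, west, south, north, center) in that order, an ordering inferred from the example matrix; the function names are illustrative only.

```python
def build_vocab(chars):
    """Map each character to its column index, in the order given."""
    return {ch: idx for idx, ch in enumerate(chars)}


def one_hot_encode(ssid, vocab):
    """Return one row per character of the SSID name, each a one-hot list."""
    rows = []
    for ch in ssid:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1  # a character outside the vocabulary raises KeyError
        rows.append(row)
    return rows


vocab = build_vocab("东西南北中")      # assumed five-character vocabulary
ssidv = one_hot_encode("东南", vocab)  # the SSID name "southeast"
print(ssidv)  # [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
```

In practice the vocabulary would cover every character that may appear in an SSID name, so each row would be far wider than five columns.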

S3022, acquiring an equal-length name set $\mathrm{SH} = \{\mathrm{SSIDH}_1, \mathrm{SSIDH}_2, \ldots, \mathrm{SSIDH}_n\}$ based on the vectorized name set $\mathrm{SV} = \{\mathrm{SSIDV}_1, \mathrm{SSIDV}_2, \ldots, \mathrm{SSIDV}_n\}$. In this step, the target length is taken either from a preset maximum wifi name length or from the longest element in the vectorized name set SV, and zero-padding is applied so that $\mathrm{SSIDH}_1, \mathrm{SSIDH}_2, \ldots, \mathrm{SSIDH}_n$ all have the same number of matrix rows. For example, continuing the example in step S3021, if the preset maximum wifi name length, or the maximum length of an element in SV, is 4, the SSID name "southeast" becomes [[1,0,0,0,0]; [0,0,1,0,0]; [0,0,0,0,0]; [0,0,0,0,0]] after zero-padding, i.e. a 4 × 5 matrix.
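The zero-padding operation of this step may be sketched as follows. The function name is illustrative, and the truncation of names longer than the target length is an assumption of this sketch, since the text describes padding only.

```python
def pad_rows(matrix, max_len, width):
    """Append all-zero rows until the matrix has exactly max_len rows.

    Rows beyond max_len are dropped (an assumption; the text only pads).
    """
    padded = [list(row) for row in matrix[:max_len]]
    while len(padded) < max_len:
        padded.append([0] * width)
    return padded


ssidv = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]   # the 2 x 5 matrix from step S3021
ssidh = pad_rows(ssidv, max_len=4, width=5)  # now a 4 x 5 matrix
print(ssidh)
```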

S3023, inputting the equal-length name set $\mathrm{SH} = \{\mathrm{SSIDH}_1, \mathrm{SSIDH}_2, \ldots, \mathrm{SSIDH}_n\}$ and the category set L into the text-cnn model for training, so as to obtain the preset wifi classification model. In order to obtain the best wifi category portrait, the text-cnn model uses between three and five different convolution kernel sizes, preferably five, with 128 convolution kernels of each size. Preferably, the five convolution kernel sizes are 1, 2, 3, 4 and 5. Using convolution kernels of five different sizes allows the effective features in the SSID name of each wifi to be captured comprehensively, which further improves classification accuracy. Table 1 shows partial recognition accuracy data for different wifi category portraits.
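As a rough, non-limiting illustration of how convolution kernels of several sizes yield one feature vector per SSID name, the following pure-Python sketch shows only the convolution and max-over-time pooling stage of a text-cnn. The weights are random placeholders rather than trained values, bias terms are omitted, and the final classification layer and the training procedure are left out; all function names are assumptions of this sketch.

```python
import random


def conv_max_pool(matrix, kernel):
    """Slide one kernel (k rows) down the matrix, apply ReLU, max-pool over positions."""
    k = len(kernel)
    best = 0.0  # ReLU outputs are never below 0; also the result if the input is shorter than k
    for start in range(len(matrix) - k + 1):
        total = 0.0
        for i in range(k):
            for x, w in zip(matrix[start + i], kernel[i]):
                total += x * w
        best = max(best, total)
    return best


def text_cnn_features(matrix, kernel_sizes=(1, 2, 3, 4, 5), kernels_per_size=128, seed=0):
    """Concatenate the pooled output of every kernel of every size."""
    rng = random.Random(seed)
    width = len(matrix[0])
    features = []
    for k in kernel_sizes:
        for _ in range(kernels_per_size):
            # Random placeholder weights; a trained model would learn these.
            kernel = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(k)]
            features.append(conv_max_pool(matrix, kernel))
    return features


# A zero-padded 6 x 5 one-hot matrix, long enough for the size-5 kernels.
padded = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]] + [[0] * 5 for _ in range(4)]
features = text_cnn_features(padded)
print(len(features))  # 640 = 5 kernel sizes x 128 kernels
```

With five sizes and 128 kernels per size the feature vector has 640 entries, which a fully connected layer would then map to the M preset categories.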

TABLE 1

Metric	Prediction score > 0	Prediction score > 0.5	Prediction score > 0.6	Prediction score > 0.7
Accuracy	53%	66%-70%	73.5%-75%	75%-78%
Coverage rate	100%	80%	65%	50%

As can be seen from Table 1, for a given test set, the prediction accuracy depends on the prediction score threshold. When the prediction score threshold is 0, 100% of the samples in the test set have a prediction score > 0, and the prediction accuracy on those samples is 53%. When the threshold is 0.5, 80% of the samples have a prediction score > 0.5, and the prediction accuracy on those samples is 66%-70%. The accuracy is given as an interval because the category of some wifi cannot be objectively verified: if those wifi are counted as correctly identified the accuracy is 70%, otherwise it is 66%. When the threshold is 0.6, 65% of the samples have a prediction score > 0.6, and the prediction accuracy on those samples is 73.5%-75%. When the threshold is 0.7, 50% of the samples have a prediction score > 0.7, and the prediction accuracy on those samples is 75%-78%. According to the above, the SSID name of the wifi, which contains Chinese characters, is first obtained and then input into the preset wifi classification model to obtain the category portrait of the wifi. The category portrait of the wifi can thus be acquired effectively through the preset wifi classification model.
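The relationship between the prediction score threshold, the coverage rate and the accuracy can be made concrete with the short Python sketch below. The function name and the four-sample test set are hypothetical and serve only to show how the two quantities are computed; they are not data from the application.

```python
def coverage_and_accuracy(scores, predictions, labels, threshold):
    """Coverage: share of samples with score > threshold.
    Accuracy: share of correct predictions among those covered samples."""
    if not scores:
        raise ValueError("empty test set")
    covered = [(p, y) for s, p, y in zip(scores, predictions, labels) if s > threshold]
    coverage = len(covered) / len(scores)
    if not covered:
        return coverage, float("nan")  # accuracy is undefined when nothing is covered
    accuracy = sum(p == y for p, y in covered) / len(covered)
    return coverage, accuracy


# Hypothetical test set of four wifi.
scores = [0.9, 0.8, 0.55, 0.3]
predictions = ["mall", "hotel", "mall", "hotel"]
labels = ["mall", "hotel", "hotel", "mall"]

for threshold in (0, 0.5, 0.7):
    print(threshold, coverage_and_accuracy(scores, predictions, labels, threshold))
```

Raising the threshold keeps only the predictions the model is most confident about, which is why coverage falls while accuracy tends to rise.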

In a preferred embodiment of the present application, the obtaining of the preset wifi classification model includes:

S401, acquiring the SSID name set $S = \{\mathrm{SSID}_1, \mathrm{SSID}_2, \ldots, \mathrm{SSID}_n\}$ of the wifi set for training and the corresponding category set $L = \{L_1, L_2, \ldots, L_n\}$, wherein the SSID name of the i-th wifi in the wifi set for training is $\mathrm{SSID}_i$, its corresponding category is $L_i$, $L_i$ is one of M preset categories, $\mathrm{SSID}_i$ contains Chinese characters, and $1 \le i \le n$.

S402, acquiring, based on the SSID name set S and its category set L, the SSID name subsets $S^1, S^2, \ldots, S^K$ of K groups of wifi subsets for training and their corresponding category subsets $L^1, L^2, \ldots, L^K$, wherein m SSID names are randomly extracted from the SSID name set S to form the name subset $S^j = \{\mathrm{SSID}^j_1, \mathrm{SSID}^j_2, \ldots, \mathrm{SSID}^j_m\}$, its corresponding category subset is $L^j = \{L^j_1, L^j_2, \ldots, L^j_m\}$, $\mathrm{SSID}^j_t$ corresponds to category $L^j_t$, $L^j_t \in L$, m/n equals the preset grouping ratio threshold, $1 \le j \le K$, and $1 \le t \le m$. Specifically, in the present application, the preset grouping ratio threshold takes a value in the range [0.7, 0.9], preferably 0.8, and K takes a value in the range [3, 5], preferably 5.
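The random extraction of the K training subsets might be sketched in Python as follows. The function name and the toy data are illustrative; sampling without replacement inside each subset, and allowing different subsets to overlap, are assumptions of this sketch, since the text only states that the names are randomly extracted.

```python
import random


def draw_training_subsets(names, labels, k=5, ratio=0.8, seed=None):
    """Draw k random subsets, each holding m = round(ratio * n) (name, label) pairs."""
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    rng = random.Random(seed)
    n = len(names)
    m = round(ratio * n)
    subsets = []
    for _ in range(k):
        # Sample indices so that each name keeps its own category label.
        idx = rng.sample(range(n), m)
        subsets.append(([names[i] for i in idx], [labels[i] for i in idx]))
    return subsets


# Toy training set of n = 10 wifi with hypothetical names and category ids.
names = ["ssid%d" % i for i in range(10)]
labels = [i % 3 for i in range(10)]
subsets = draw_training_subsets(names, labels, k=5, ratio=0.8, seed=1)
print(len(subsets), len(subsets[0][0]))  # 5 8
```

Because every subset is drawn independently from the same set S, the K models see overlapping but different data, which is what lets their votes differ.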

S403, inputting the SSID name subsets $S^1, S^2, \ldots, S^K$ and their corresponding category subsets $L^1, L^2, \ldots, L^K$ into K different text-cnn classification models respectively for training, so as to obtain the preset wifi classification model. In this step, $S^1$ and $L^1$ are input into the 1st text-cnn classification model for training to obtain the 1st trained text-cnn model, $S^2$ and $L^2$ are input into the 2nd text-cnn classification model for training to obtain the 2nd trained text-cnn model, and so on, until K different trained text-cnn models are obtained. When the category portrait of a wifi needs to be predicted, the SSID name of the wifi is input into each of the K trained text-cnn models to obtain K category predictions, a vote is taken over the K predictions, and the category portrait of the wifi is finally obtained. The voting mechanism may adopt any existing technique, such as the commonly used majority rule, in which the minority yields to the majority.
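A minimal sketch of the majority-rule vote over the K model outputs is given below. The trained models are represented by hypothetical callables that map an SSID name to a category, and the tie-breaking rule (the category encountered first wins) is an assumption of this sketch, since the text does not specify one.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the category predicted by the most models.

    Ties go to the category that appears first in `predictions` (an assumption).
    """
    if not predictions:
        raise ValueError("no predictions to vote on")
    return Counter(predictions).most_common(1)[0][0]


def predict_category(ssid, models):
    """Ask each of the K models for a category, then vote."""
    return majority_vote([model(ssid) for model in models])


# Five stand-in "models" with fixed outputs, for illustration only.
models = [
    lambda ssid: "hotel",
    lambda ssid: "shopping",
    lambda ssid: "hotel",
    lambda ssid: "food",
    lambda ssid: "hotel",
]
print(predict_category("某某酒店", models))  # hotel
```

An odd K, such as the preferred value of 5, reduces the chance of a tie, although ties remain possible when there are more than two categories.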

In order to obtain the best wifi category portrait, each text-cnn model uses between three and five different convolution kernel sizes, preferably five, with 128 convolution kernels of each size. Preferably, the five convolution kernel sizes are 1, 2, 3, 4 and 5. Using convolution kernels of five different sizes allows the effective features in the SSID name of each wifi to be captured comprehensively, which in turn improves classification accuracy. Table 2 shows partial recognition accuracy data for different wifi category portraits based on the wifi classification model thus obtained.

TABLE 2

Metric	Prediction score > 0	Prediction score > 0.5	Prediction score > 0.6	Prediction score > 0.7
Accuracy	58.9%-72.1%	69.6%-78.5%	72.8%-79.4%	78.3%-83.9%
Coverage rate	100%	86%	73%	59%

As can be seen from Table 2, the prediction results of this preset wifi classification model are better than the data in Table 1 in terms of coverage rate and accuracy, which shows that the preset wifi classification model obtained with the above preferred embodiment performs better. Table 3 shows partial prediction accuracy data, for different wifi category portraits, of the preset wifi classification model obtained according to the above preferred embodiment of the present application.

TABLE 3

Category portrait	Classification prediction accuracy
Personal	0.9
Leisure and entertainment	0.9
Company or enterprise	0.8
Peripheral trip	0.6
College	0.9
Household building materials	0.7
Food	0.6
Shopping	0.8
......	......

As can be seen from Table 3, the method provided by the application can acquire the wifi category portrait accurately, can meet people's needs for wifi category portraits to a certain extent, and is highly adaptable.

Embodiments of the present application further provide a system for acquiring a wifi portrait, where the system includes a processor and a non-transitory computer-readable storage medium, where the storage medium is used to store at least one instruction or at least one program, and the processor loads and executes the at least one instruction or the at least one program to implement the method provided by the foregoing embodiments.

A computer-readable storage medium storing a program or instructions for causing a computer to perform the method provided by the above-described embodiments.

Embodiments of the present application also provide a non-transitory computer-readable storage medium that can be disposed in an electronic device to store at least one instruction or at least one program for implementing a method of the method embodiments, where the at least one instruction or the at least one program is loaded into and executed by a processor to implement the method provided by the above embodiments.

Embodiments of the present application also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.

Embodiments of the present application further provide a computer program product comprising program code means for causing an electronic device to carry out the steps of the method according to various exemplary embodiments of the present application described above in this description, when said program product is run on the electronic device.

Although some specific embodiments of the present application have been described in detail by way of illustration, it should be understood by those skilled in the art that the above illustration is only for purposes of illustration and is not intended to limit the scope of the present application. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the present application. The scope of the present application is defined by the appended claims.

Claims

1. A method for acquiring a wifi portrait, characterized by comprising the following steps:

S100, acquiring the SSID name of the wifi, wherein the SSID name comprises Chinese characters;

S200, acquiring the category portrait of the wifi based on the SSID name and a preset wifi classification model.

2. The method of claim 1, wherein the preset wifi classification model is a text-cnn model.

3. The method of claim 2, wherein the obtaining of the preset wifi classification model comprises:

S301, acquiring the SSID name set $S = \{\mathrm{SSID}_1, \mathrm{SSID}_2, \ldots, \mathrm{SSID}_n\}$ of the wifi set for training and the corresponding category set $L = \{L_1, L_2, \ldots, L_n\}$, wherein the SSID name of the i-th wifi in the wifi set for training is $\mathrm{SSID}_i$, its corresponding category is $L_i$, $L_i$ is one of M preset categories, $\mathrm{SSID}_i$ contains Chinese characters, and $1 \le i \le n$;

S302, training the text-cnn model with the name set S and the category set L as input data of the text-cnn model, so as to obtain the preset wifi classification model.

4. The method according to claim 1, wherein the obtaining of the preset wifi classification model comprises:

S401, acquiring the SSID name set $S = \{\mathrm{SSID}_1, \mathrm{SSID}_2, \ldots, \mathrm{SSID}_n\}$ of the wifi set for training and the corresponding category set $L = \{L_1, L_2, \ldots, L_n\}$, wherein the SSID name of the i-th wifi in the wifi set for training is $\mathrm{SSID}_i$, its corresponding category is $L_i$, $L_i$ is one of M preset categories, $\mathrm{SSID}_i$ contains Chinese characters, and $1 \le i \le n$;

S402, acquiring, based on the SSID name set S and its category set L, the SSID name subsets $S^1, S^2, \ldots, S^K$ of K groups of wifi subsets for training and their corresponding category subsets $L^1, L^2, \ldots, L^K$, wherein m SSID names are randomly extracted from the SSID name set S to form the name subset $S^j = \{\mathrm{SSID}^j_1, \mathrm{SSID}^j_2, \ldots, \mathrm{SSID}^j_m\}$, its corresponding category subset is $L^j = \{L^j_1, L^j_2, \ldots, L^j_m\}$, $\mathrm{SSID}^j_t$ corresponds to category $L^j_t$, $L^j_t \in L$, m/n equals the preset grouping ratio threshold, $1 \le j \le K$, and $1 \le t \le m$;

S403, inputting the SSID name subsets $S^1, S^2, \ldots, S^K$ and their corresponding category subsets $L^1, L^2, \ldots, L^K$ into K different text-cnn classification models respectively for training, so as to obtain the preset wifi classification model.

5. The method of claim 4, wherein the preset grouping ratio threshold takes a value in the range [0.7, 0.9].

6. The method according to claim 4, wherein K takes a value in the range [3, 5], preferably 5.

7. The method according to claim 3 or 4, wherein the text-cnn model uses five convolution kernel sizes, with 128 convolution kernels of each size.

8. The method of claim 7, wherein the convolution kernels are convolution kernels having sizes of 1, 2, 3, 4, and 5, respectively.

9. A system for acquiring a wifi portrait, the system comprising a processor and a non-transitory computer-readable storage medium storing at least one instruction or at least one program, wherein the at least one instruction or at least one program is loaded and executed by the processor to implement the method of any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that it stores a program or instructions for causing a computer to carry out the steps of the method according to any one of claims 1 to 8.