CN112364169B - Nlp-based wifi identification method, electronic device and medium - Google Patents

Nlp-based wifi identification method, electronic device and medium

Info

Publication number
CN112364169B
Authority
CN
China
Prior art keywords
wifi
name information
wifi name
labels
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110039208.4A
Other languages
Chinese (zh)
Other versions
CN112364169A (en)
Inventor
朱金星
张静雅
葛丹妮
段力阁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunzhenxin Technology Co ltd
Original Assignee
Beijing Yunzhenxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhenxin Technology Co ltd
Priority to CN202110039208.4A
Publication of CN112364169A
Application granted
Publication of CN112364169B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an nlp-based wifi identification method, an electronic device and a medium. The method comprises: step S1, obtaining a plurality of pieces of labeled wifi name information as corpus data to construct a corpus, wherein the labels in the corpus comprise a public wifi label and wifi labels of M categories; step S2, obtaining from the corpus the wifi name information corresponding to all mth labels, and constructing an mth target keyword list; step S3, obtaining wifi name information to be identified, judging whether it contains a keyword in the mth target keyword list, and if so, marking it with the mth label; and step S4, updating the corpus with all wifi name information labeled with the mth label in step S3 within a preset time period as corpus data, and returning to step S2. The invention improves the accuracy and efficiency of wifi identification.

Description

Nlp-based wifi identification method, electronic device and medium
Technical Field
The invention relates to the technical field of computers, and in particular to an nlp-based wifi identification method, an electronic device and a medium.
Background
When processing wifi data, a label generally needs to be added to each wifi to distinguish wifi categories. In the existing approach, labels are usually added manually, and each type of label represents one wifi category, such as hospital wifi or shopping mall wifi; manual labeling, however, is costly and inefficient. Once the added labels are stored in the database they are rarely changed, so the data becomes stale. Wifi data, by contrast, changes quickly: a device may stay the same while its corresponding wifi label has changed, or newly added wifi may need to be labeled after a period of time. If wifi labels are not updated in time, their accuracy cannot be guaranteed. How to identify wifi labels automatically and update the labeling in real time has therefore become a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide an nlp-based wifi identification method, an electronic device and a medium, which can automatically identify wifi labels, update the labeling in real time, and improve the accuracy and efficiency of wifi identification.
According to a first aspect of the invention, an nlp-based wifi identification method is provided, which comprises the following steps:
step S1, obtaining a plurality of pieces of labeled wifi name information as corpus data to construct a corpus, wherein the labels in the corpus comprise a public wifi label and wifi labels of M categories, the wifi labels of the M categories being respectively a first label, a second label, …, an mth label, …, an Mth label, where m = 1, 2, …, M and M is a positive integer, and the wifi name information is text information;
step S2, obtaining from the corpus the wifi name information corresponding to all mth labels, and constructing an mth target keyword list;
step S3, obtaining wifi name information to be identified, judging whether the wifi name information to be identified contains a keyword in the mth target keyword list, and if so, marking the wifi name information to be identified with the mth label;
and step S4, updating the corpus with all wifi name information labeled with the mth label in step S3 within a preset time period as corpus data, and returning to step S2.
According to a second aspect of the present invention, there is provided an electronic apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of the first aspect of the invention.
According to a third aspect of the invention, there is provided a computer-readable storage medium storing computer instructions for performing the method of the first aspect of the invention.
Compared with the prior art, the invention has obvious advantages and beneficial effects. By means of the above technical scheme, the nlp-based wifi identification method, electronic device and medium provided by the invention achieve considerable technical progress and practicability, have wide industrial utilization value, and have at least the following advantages:
The method can automatically identify wifi labels and update the labeling in real time, improving the accuracy and efficiency of wifi identification.
The foregoing is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention can be more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a wifi identification method based on nlp according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve its intended objects and the effects obtained, specific embodiments of an nlp-based wifi identification method, an electronic device and a medium according to the present invention, together with their effects, are described in detail below with reference to the accompanying drawings and preferred embodiments.
The embodiment of the invention provides a wifi identification method based on natural language processing (nlp for short), as shown in Fig. 1, comprising the following steps:
Step S1, obtaining a plurality of pieces of labeled wifi name information as corpus data to construct a corpus, wherein the labels in the corpus comprise a public wifi label and wifi labels of M categories, the wifi labels of the M categories being respectively a first label, a second label, …, an mth label, …, an Mth label, where m = 1, 2, …, M and M is a positive integer, and the wifi name information is text information;
The labeled wifi name information can be acquired from a preset first database. It should be noted that the wifi name information includes public wifi name information and private wifi name information. Public wifi name information generally does not need to be identified and classified; the wifi name information that needs to be identified and classified is the private wifi name information, and the wifi name information to be identified is to be determined as one of the M categories of wifi labels. Each piece of wifi name information is unique in the database and corresponds to exactly one label.
As an example, the wifi name information is the ssid of the wifi. It can be understood that the wifi name information may also be other text information that uniquely identifies the wifi and is capable of distinguishing wifi categories.
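The corpus of step S1 can be pictured as a mapping from each label to the wifi names that carry it. The following is a minimal sketch only, assuming Python and an in-memory list of rows standing in for the first database; the function name `build_corpus`, the label strings and the sample ssids are all hypothetical and do not come from the patent.

```python
from collections import defaultdict

PUBLIC_LABEL = "public"  # hypothetical name for the public wifi label


def build_corpus(labeled_rows):
    """Group labeled wifi names by label (step S1).

    labeled_rows: iterable of (wifi_name, label) pairs, e.g. rows read from
    the first database. Each wifi name keeps exactly one label; if a name
    appears more than once, the last row wins (an assumption of this sketch).
    """
    name_to_label = {}
    for name, label in labeled_rows:
        name_to_label[name] = label

    corpus = defaultdict(list)
    for name, label in name_to_label.items():
        corpus[label].append(name)
    return dict(corpus)


rows = [
    ("CityHospital-Guest", "hospital"),
    ("SunMall_Free", "mall"),
    ("Airport-Free-WiFi", PUBLIC_LABEL),
    ("CityHospital-Guest", "hospital"),  # duplicate name, stored only once
]
corpus = build_corpus(rows)
```

Keeping a name-to-label dictionary first is one simple way to enforce the constraint that each wifi name corresponds to only one label.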
Step S2, obtaining wifi name information corresponding to all mth labels from the corpus, and constructing an mth target keyword list;
It is understood that, through step S2, a target keyword list corresponding to each label can be obtained, and through the subsequent steps, multiple categories of wifi can be identified.
Step S3, obtaining wifi name information to be identified, judging whether the wifi name information to be identified comprises the keywords in the mth target keyword list, and if so, marking the wifi name information to be identified as the mth label;
The wifi name information to be identified comprises labeled wifi name information and unlabeled wifi name information. Unlabeled wifi name information can be obtained from a preset second database, and labeled wifi name information can be obtained from the first database. The first database and the second database may be the same database or may be set up independently, whichever makes the wifi name information convenient to obtain.
When the wifi name information to be identified is unlabeled, a corresponding label can be determined for it. When the wifi name information to be identified is already labeled and the wifi name information has changed, the label corresponding to it can be corrected in real time, which improves the accuracy of wifi information identification.
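The matching in step S3 can be sketched as a substring test against each target keyword list. This is an illustrative sketch in Python, not the patented implementation: the function name `label_wifi_name`, the case-insensitive comparison and the "first matching label wins" rule are assumptions, and the keyword lists are made up.

```python
def label_wifi_name(wifi_name, target_keywords):
    """Return the first label whose keyword list has a keyword in wifi_name.

    target_keywords: dict mapping label -> list of keywords, i.e. the mth
    target keyword list for each of the M labels. Returns None when no
    keyword matches. Case-insensitive matching is an assumption.
    """
    lowered = wifi_name.lower()
    for label, keywords in target_keywords.items():
        if any(kw.lower() in lowered for kw in keywords):
            return label
    return None


target_keywords = {
    "hospital": ["hospital", "clinic"],
    "mall": ["mall", "plaza"],
}

print(label_wifi_name("CityHospital-Guest", target_keywords))  # hospital
print(label_wifi_name("Home-5G", target_keywords))             # None
```

Running the same function over already-labeled names is one way to detect a label that no longer fits after the wifi name has changed.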
Step S4, updating the corpus with all wifi name information labeled with the mth label in step S3 within a preset time period as corpus data, and returning to step S2.
The preset time period can be set according to specific application requirements. Setting it too short wastes computing resources, while setting it too long delays updates of the wifi labels and thereby reduces the accuracy of wifi identification. The time period can be set within [20 days, 45 days], and is preferably one month.
The method provided by the embodiment of the invention can automatically identify wifi name information and update the corpus at every preset time period, so that the target keywords are updated based on the updated corpus, wifi labels can be updated in real time, and the accuracy and efficiency of wifi identification are improved.
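One update cycle of step S4 might look like the sketch below, written in Python under stated assumptions: the corpus is a dictionary from label to wifi names, `update_due` and `update_corpus` are hypothetical helper names, the 30-day period stands in for "one month", and moving a name to a new label when its label changes is an interpretation rather than something the text spells out.

```python
from datetime import datetime, timedelta

UPDATE_PERIOD = timedelta(days=30)  # "one month", inside the [20, 45] day range


def update_due(last_update, now, period=UPDATE_PERIOD):
    """Return True once the preset time period has elapsed."""
    return now - last_update >= period


def update_corpus(corpus, newly_labeled):
    """Merge names labeled in step S3 during the period into the corpus.

    corpus: dict label -> list of wifi names.
    newly_labeled: dict wifi name -> label assigned in step S3.
    A name already in the corpus is moved if its label changed (assumption).
    """
    name_to_label = {n: lab for lab, names in corpus.items() for n in names}
    name_to_label.update(newly_labeled)
    updated = {}
    for name, label in name_to_label.items():
        updated.setdefault(label, []).append(name)
    return updated


corpus = {"hospital": ["CityHospital-Guest"], "mall": ["SunMall_Free"]}
newly_labeled = {"Eastside-Clinic": "hospital"}

last_update = datetime(2021, 1, 13)
now = datetime(2021, 2, 13)
if update_due(last_update, now):
    corpus = update_corpus(corpus, newly_labeled)
    # Step S2 would be re-run here to rebuild the target keyword lists.
```

In a deployment the check would typically be driven by a scheduler rather than by comparing timestamps inline; the inline form is used here only to keep the sketch self-contained.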
It should be noted that some exemplary embodiments of the present invention are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. Moreover, the order of steps is merely set forth for convenience of reference and does not imply a required order of execution or steps to be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
As an example, the step S2 includes:
step S21, performing word segmentation on the wifi name information corresponding to all the m-th labels to obtain a word segmentation list;
In step S21, an existing word segmentation algorithm, for example jieba word segmentation, may be used; it is not described further here. The word segmentation list comprises a plurality of segmented words.
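To make step S21 concrete, the sketch below segments wifi names with a small regular-expression tokenizer. It is a stand-in, not jieba: it is used here only so the example runs with the Python standard library, and the name `segment` and the sample ssids are hypothetical. For Chinese names, a call such as `jieba.lcut(name)` from the jieba package would take its place; this regex keeps any run of Chinese characters as a single token.

```python
import re

# Letters (split on camelCase), all-caps runs, digit runs, and CJK runs.
_TOKEN = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|[\u4e00-\u9fff]+")


def segment(wifi_name):
    """Split one wifi name into lowercase tokens (stand-in for step S21)."""
    return [tok.lower() for tok in _TOKEN.findall(wifi_name)]


names = ["CityHospital-Guest", "SunMall_Free"]
segmentation_list = [tok for name in names for tok in segment(name)]
print(segmentation_list)
# ['city', 'hospital', 'guest', 'sun', 'mall', 'free']
```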
Step S22, performing stop word processing on the word segmentation list based on a pre-constructed stop word library to obtain a first keyword list;
Specifically, the method further includes step S20 of constructing the stop word library, including:
Step S201, based on the naming rule of wifi name information, selecting corresponding basic stop words from a basic stop word library and adding them to the stop word library;
The naming rule of the wifi name information includes containing no punctuation marks. The basic stop word library is a general-purpose stop word library, and the selected basic stop words include city names, structural auxiliary words, pronouns and other basic stop words suited to the usage scenario of the embodiment of the invention, which are not listed one by one.
Step S202, obtaining wifi name information of a plurality of public wifi from the corpus, determining high-frequency keywords of the public wifi based on that wifi name information, and adding them to the stop word library.
Because public wifi does not need to be identified, the high-frequency keywords of the public wifi can be added to the stop words, so that they are filtered out directly when the mth target keyword list is built. This avoids noise and redundant calculation and improves the accuracy and efficiency of wifi identification.
As an embodiment, the step S202 further includes:
step S2021, performing word segmentation processing on the obtained wifi name information of a plurality of public wifi to obtain a word segmentation list of the public wifi;
In step S2021, an existing word segmentation algorithm, for example jieba word segmentation, may likewise be used; it is not described further here. The word segmentation list of the public wifi comprises a plurality of segmented words of the public wifi.
Step S2022, performing stop word processing on the word segmentation list of the public wifi based on the basic stop word library to obtain a first keyword list of the public wifi;
The first keyword list of the public wifi comprises a plurality of keywords of the public wifi.
Step S2023, obtaining the high-frequency keywords of the public wifi from the first keyword list of the public wifi.
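Steps S201 to S2023 can be sketched as follows. This is a simplified illustration in Python: `build_stop_words` and the `top_k` cutoff are hypothetical, the token lists are assumed to come from an earlier segmentation step, and raw occurrence counts are used to pick the high-frequency keywords purely for brevity, whereas the description goes on to select them by TF-IDF.

```python
from collections import Counter


def build_stop_words(basic_stop_words, public_token_lists, top_k=2):
    """Build the stop word library from basic stop words and public wifi.

    basic_stop_words: set chosen from a general stop word library (S201).
    public_token_lists: one token list per public wifi name (output of S2021).
    top_k: hypothetical cutoff for what counts as "high-frequency".
    """
    counts = Counter()
    for tokens in public_token_lists:
        # S2022: drop basic stop words before counting.
        counts.update(t for t in tokens if t not in basic_stop_words)
    # S2023: keep the most frequent public wifi keywords.
    high_freq = [tok for tok, _ in counts.most_common(top_k)]
    return set(basic_stop_words) | set(high_freq)


basic = {"the", "of", "beijing"}
public_tokens = [
    ["airport", "free", "wifi"],
    ["beijing", "metro", "free", "wifi"],
    ["free", "wifi", "of", "the", "park"],
]
stop_words = build_stop_words(basic, public_tokens, top_k=2)
# stop_words now also contains "free" and "wifi"
```

Tokens such as "free" and "wifi" say little about a private wifi category, which is why filtering them before building the target keyword lists reduces noise.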
And step S23, acquiring high-frequency keywords from the first keyword list to construct the target keyword list.
As an embodiment, in both step S23 and step S2023, high-frequency keywords are extracted from the first keyword list obtained after stop word filtering, so as to improve the identification accuracy of wifi information. In both steps, obtaining the high-frequency keywords from the first keyword list can be implemented as follows:
Step S231, the keyword list is $\{a_1, a_2, \dots, a_n, \dots, a_N\}$, where $a_n$ denotes the nth keyword, $n = 1, 2, \dots, N$, and $N$ is a positive integer; the term frequency $\mathrm{TF}(a_n)$, the inverse document frequency $\mathrm{IDF}(a_n)$ and $\text{TF-IDF}(a_n)$ of $a_n$ are obtained:
$$\mathrm{TF}(a_n) = \frac{\text{number of occurrences of } a_n \text{ in the wifi name information of the category label}}{\text{total number of keywords in the wifi name information of the category label}}$$
It is understood that "the category label" refers to the wifi label category to which $a_n$ corresponds.
$$\mathrm{IDF}(a_n) = \log\frac{\text{total number of wifi name information items of all category labels}}{1 + \text{number of wifi name information items of all category labels in which } a_n \text{ appears}}$$
It is understood that "all category labels" refers to all labels in the corpus, that is, the wifi label category to which $a_n$ corresponds together with the other wifi labels.
$$\text{TF-IDF}(a_n) = \mathrm{TF}(a_n) \cdot \mathrm{IDF}(a_n)$$
Step S232, acquiring the keywords in the keyword list whose TF-IDF values are larger than a preset TF-IDF threshold as the high-frequency keywords.
It should be noted that the TF-IDF threshold may be set according to factors such as the required identification accuracy of the wifi name information.
As a variation of step S232, the TF-IDF values corresponding to all keywords in the keyword list may be sorted in descending order, and the keywords corresponding to the top Q TF-IDF values are selected as the target keywords. The value of Q may be set according to factors such as the required identification accuracy of the wifi name information.
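Steps S231 and S232, together with the top-Q variation, can be sketched in Python as below. The formulas follow the definitions above; everything else is an assumption of the sketch: the function names, the natural logarithm (the text does not state a base), the threshold value and the toy token lists, which are taken to be already segmented and stop-word filtered.

```python
import math
from collections import Counter


def tf_idf_for_label(label, tokens_by_label):
    """Score every keyword of one label.

    tokens_by_label: dict label -> list of token lists, one per wifi name.
    """
    label_docs = tokens_by_label[label]
    all_docs = [doc for docs in tokens_by_label.values() for doc in docs]

    counts = Counter(tok for doc in label_docs for tok in doc)
    total_keywords = sum(counts.values())

    scores = {}
    for tok, count in counts.items():
        tf = count / total_keywords  # TF within this label
        docs_with_tok = sum(1 for doc in all_docs if tok in doc)
        idf = math.log(len(all_docs) / (1 + docs_with_tok))  # over all labels
        scores[tok] = tf * idf
    return scores


def select_by_threshold(scores, threshold):
    """Step S232: keep keywords whose TF-IDF exceeds the threshold."""
    return [tok for tok, s in scores.items() if s > threshold]


def select_top_q(scores, q):
    """Variation of S232: keep the Q keywords with the highest TF-IDF."""
    return sorted(scores, key=scores.get, reverse=True)[:q]


tokens_by_label = {
    "hospital": [["city", "hospital"], ["hospital", "guest"], ["east", "hospital"]],
    "mall": [["sun", "mall"], ["mall", "guest"], ["plaza", "mall"]],
}
scores = tf_idf_for_label("hospital", tokens_by_label)
print(select_top_q(scores, 1))            # ['hospital']
print(select_by_threshold(scores, 0.19))  # ['hospital']
```

With the 1 + count smoothing in the denominator, a keyword that appears in nearly every wifi name gets an IDF at or below zero, so it drops out under any positive threshold.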
It can be understood that, after the mth target keyword list is obtained, in step S3 the wifi name information to be identified may be matched directly against the M target keyword lists one by one to determine its label; matching it against the mth target keyword list determines whether it should be given the mth label. A classification model may also be trained, taking the target keywords of a given category as positive samples and the target keywords of the other categories as negative samples; keywords are then extracted from the wifi name information to be identified and input into the classification model for identification. This is not elaborated here.
An embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions configured to perform a method according to an embodiment of the invention.
The embodiment of the invention also provides a computer-readable storage medium storing computer instructions for executing the method of the embodiment of the invention.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An nlp-based wifi identification method for identifying private wifi, comprising the following steps:
step S1, obtaining a plurality of pieces of labeled wifi name information from a preset first database as corpus data to construct a corpus, wherein the labels in the corpus comprise a public wifi label and M categories of wifi labels, the M categories of wifi labels being respectively a first label, a second label, …, an mth label, …, an Mth label, where m = 1, 2, …, M and M is a positive integer, and the wifi name information is text information;
the wifi name information comprises public wifi name information and private wifi name information, the wifi name information to be identified is the private wifi name information, and the wifi name information to be identified is to be determined as one of the M categories of wifi labels;
step S2, obtaining wifi name information corresponding to all mth labels from the corpus, and constructing an mth target keyword list;
step S3, obtaining wifi name information to be identified, judging whether the wifi name information to be identified comprises the keywords in the mth target keyword list, and if so, marking the wifi name information to be identified as the mth label;
the wifi name information to be identified comprises labeled private wifi name information acquired from the first database and unlabeled wifi name information acquired from a preset second database; when the wifi name information to be identified is unlabeled wifi name information, a corresponding label is determined for it; when the wifi name information to be identified is labeled private wifi name information and the wifi name information has changed, the label corresponding to the wifi name information is corrected in real time;
step S4, every preset time period, updating the corpus by taking all wifi name information labeled as the mth label in the step S3 in the time period as corpus, and returning to the step S2;
the step S2 includes:
step S21, performing word segmentation on the wifi name information corresponding to all the m-th labels to obtain a word segmentation list;
step S22, performing stop word processing on the word segmentation list based on a pre-constructed stop word library to obtain a first keyword list;
step S23, acquiring high-frequency keywords from the first keyword list to construct the target keyword list;
the method further includes step S20, constructing the stop word library, including:
step S201, selecting corresponding basic stop words from a basic stop word bank and adding the basic stop words into the stop word bank based on the naming rule of wifi name information;
step S202, obtaining wifi name information of a plurality of public wifi from the corpus, and determining high-frequency keywords of the public wifi to be added into the stop word bank based on the wifi name information of the plurality of public wifi;
the step S202 includes:
step S2021, performing word segmentation processing on the obtained wifi name information of a plurality of public wifi to obtain a word segmentation list of the public wifi;
step S2022, performing stop word processing on the word segmentation list of the public wifi based on the basic stop word library to obtain a first keyword list of the public wifi;
step S2023, obtaining the high-frequency keywords of the public wifi from the first keyword list of the public wifi.
2. The method of claim 1,
in step S23 and step S2023, the obtaining of the high-frequency keyword from the first keyword list includes:
step S231, the keyword list is $\{a_1, a_2, \dots, a_n, \dots, a_N\}$, where $a_n$ denotes the nth keyword, $n = 1, 2, \dots, N$, and $N$ is a positive integer, and the term frequency $\mathrm{TF}(a_n)$, the inverse document frequency $\mathrm{IDF}(a_n)$ and $\text{TF-IDF}(a_n)$ of $a_n$ are obtained:
$$\mathrm{TF}(a_n) = \frac{\text{number of occurrences of } a_n \text{ in the wifi name information of the category label}}{\text{total number of keywords in the wifi name information of the category label}}$$
$$\mathrm{IDF}(a_n) = \log\frac{\text{total number of wifi name information items of all category labels}}{1 + \text{number of wifi name information items of all category labels in which } a_n \text{ appears}}$$
$$\text{TF-IDF}(a_n) = \mathrm{TF}(a_n) \cdot \mathrm{IDF}(a_n)$$
and step S232, acquiring the keywords with the TF-IDF values larger than a preset TF-IDF threshold value as the high-frequency keywords in the keyword list.
3. The method of claim 1,
the wifi name information is wifi ssid information.
4. The method of claim 1,
the preset time period is one month.
5. An electronic device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the preceding claims 1-4.
6. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method of any of the preceding claims 1-4.
CN202110039208.4A 2021-01-13 2021-01-13 Nlp-based wifi identification method, electronic device and medium Active CN112364169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110039208.4A CN112364169B (en) 2021-01-13 2021-01-13 Nlp-based wifi identification method, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110039208.4A CN112364169B (en) 2021-01-13 2021-01-13 Nlp-based wifi identification method, electronic device and medium

Publications (2)

Publication Number Publication Date
CN112364169A CN112364169A (en) 2021-02-12
CN112364169B CN112364169B (en) 2022-03-04

Family

ID=74534828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110039208.4A Active CN112364169B (en) 2021-01-13 2021-01-13 Nlp-based wifi identification method, electronic device and medium

Country Status (1)

Country Link
CN (1) CN112364169B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891323B (en) * 2021-12-07 2022-03-18 杭州云信智策科技有限公司 WiFi-based user tag acquisition system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989184A (en) * 2015-08-25 2016-10-05 中国银联股份有限公司 Classification method and apparatus
CN107766371A (en) * 2016-08-19 2018-03-06 中兴通讯股份有限公司 A kind of text message sorting technique and its device
CN108112026A (en) * 2017-12-13 2018-06-01 北京奇虎科技有限公司 WiFi recognition methods and device
CN110287321A (en) * 2019-06-26 2019-09-27 南京邮电大学 A kind of electric power file classification method based on improvement feature selecting
CN111343564A (en) * 2018-11-30 2020-06-26 北京嘀嘀无限科技发展有限公司 Method and device for determining category of wireless network, electronic equipment and storage medium
CN111767403A (en) * 2020-07-07 2020-10-13 腾讯科技(深圳)有限公司 Text classification method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106970988A (en) * 2017-03-30 2017-07-21 联想(北京)有限公司 Data processing method, device and electronic equipment
CN108334533B (en) * 2017-10-20 2021-12-24 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN112364169A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN109885692B (en) Knowledge data storage method, apparatus, computer device and storage medium
CN107004159B (en) Active machine learning
EP3499384A1 (en) Word and sentence embeddings for sentence classification
CN110598001A (en) Method, device and storage medium for extracting association entity relationship
JP2005158010A (en) Apparatus, method and program for classification evaluation
CN111552766B (en) Using machine learning to characterize reference relationships applied on reference graphs
WO2022222300A1 (en) Open relationship extraction method and apparatus, electronic device, and storage medium
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN113704429A (en) Semi-supervised learning-based intention identification method, device, equipment and medium
CN109783801B (en) Electronic device, multi-label classification method and storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
Hossari et al. TEST: A terminology extraction system for technology related terms
CN112364169B (en) Nlp-based wifi identification method, electronic device and medium
JP6770709B2 (en) Model generator and program for machine learning.
CN112328655A (en) Text label mining method, device, equipment and storage medium
CN111985212A (en) Text keyword recognition method and device, computer equipment and readable storage medium
CN111797115A (en) Employee information searching method and device
CN111639500A (en) Semantic role labeling method and device, computer equipment and storage medium
CN115827871A (en) Internet enterprise classification method, device and system
CN115640376A (en) Text labeling method and device, electronic equipment and computer-readable storage medium
US11580499B2 (en) Method, system and computer-readable medium for information retrieval
CN112380348B (en) Metadata processing method, apparatus, electronic device and computer readable storage medium
CN111339301B (en) Label determining method, label determining device, electronic equipment and computer readable storage medium
Qu et al. Sentence dependency tagging in online question answering forums
CN112988699B (en) Model training method, and data label generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant