CN116032741A - Equipment identification method and device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN116032741A
Authority
CN
China
Prior art keywords
equipment
model
equipment information
information
matching sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111253826.5A
Other languages
Chinese (zh)
Inventor
陶禹诺
时均见
孙孝辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd
Priority to CN202111253826.5A
Publication of CN116032741A

Abstract

The embodiments of the present application disclose a device identification method and apparatus, an electronic device, and a computer storage medium. The method comprises: acquiring device information to be identified; inputting the device information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; identifying the device information to be identified by using the single character matching sub-model to determine a first identification result, and identifying the device information to be identified by using the character string matching sub-model to determine a second identification result; and determining a device identification result according to the first identification result and the second identification result. Because the single character matching sub-model and the character string matching sub-model jointly identify the device information to be identified, both the accuracy and the efficiency of device information identification can be improved.

Description

Equipment identification method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a device identification method and apparatus, an electronic device, and a computer storage medium.
Background
With the rapid development of the Internet and the Internet of Things, networks have become an indispensable part of people's lives, and more and more intelligent terminals are appearing, which poses challenges for managing gateway down-hanging devices (the devices connected beneath a gateway).
In recent years, with the rapid development of triple-network convergence and broadband, intelligent gateways have gradually entered people's daily lives and become the key to enabling smart living. How to tap the potential value of the intelligent gateway is a problem worth studying. The intelligent gateway is the "heart" of a smart home and allows its user to easily control the smart devices in the home. Identifying which devices a home user has connected through the intelligent gateway is therefore a challenge.
Disclosure of Invention
The present application provides a device identification method and apparatus, an electronic device, and a computer storage medium, which can accurately identify the devices associated with an intelligent gateway and improve identification efficiency and accuracy.
The technical scheme of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a device identification method, where the method includes:
acquiring equipment information to be identified;
inputting the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model;
identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
and determining a device identification result according to the first identification result and the second identification result.
In a second aspect, an embodiment of the present application provides a device identification apparatus, which includes an acquisition unit, an identification unit, and a determination unit, wherein,
the acquisition unit is configured to acquire equipment information to be identified;
the identification unit is configured to input the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
the determining unit is configured to determine a device identification result according to the first identification result and the second identification result.
In a third aspect, embodiments of the present application provide an electronic device comprising a memory and a processor, wherein,
the memory is used for storing a computer program capable of running on the processor;
the processor is configured to perform the device identification method according to the first aspect when the computer program is run.
In a fourth aspect, embodiments of the present application provide a computer storage medium storing a computer program, which when executed by at least one processor implements the device identification method according to the first aspect.
The embodiments of the present application provide a device identification method and apparatus, an electronic device, and a computer storage medium. Device information to be identified is acquired; the device information to be identified is input into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; the device information to be identified is identified by using the single character matching sub-model to determine a first identification result, and by using the character string matching sub-model to determine a second identification result; and a device identification result is determined according to the first identification result and the second identification result. In this way, the single character matching sub-model and the character string matching sub-model jointly identify the device information to be identified, that is, the device identification result is determined from the identification results of both sub-models, so that both the accuracy and the efficiency of device information identification can be improved.
Drawings
Fig. 1 is a schematic flow chart of a device identification method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a training flow of a preset recognition model according to an embodiment of the present application;
Fig. 3 is a detailed flowchart of a device identification method according to an embodiment of the present application;
Fig. 4 is a detailed flowchart of another device identification method according to an embodiment of the present application;
Fig. 5 is a detailed flowchart of another device identification method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a training framework of a preset recognition model according to an embodiment of the present application;
Fig. 7 is a detailed flowchart of still another device identification method according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a composition structure of a device identification apparatus according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a composition structure of an electronic device according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a composition structure of another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting. It should be noted that, for convenience of description, only the parts related to the present application are shown in the drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It should be noted that the term "first\second\third" in relation to the embodiments of the present application is merely to distinguish similar objects and does not represent a specific ordering for the objects, it being understood that the "first\second\third" may be interchanged in a specific order or sequence, where allowed, to enable the embodiments of the present application described herein to be practiced in an order other than that illustrated or described herein.
At present, methods for identifying the down-hanging devices of an intelligent gateway are very limited. Most schemes create a device model database and then perform character string matching between the device information reported by the intelligent gateway and the keywords in the device model database; information such as the model and type of the device is obtained only after an exact match succeeds. However, this approach is suitable only for identifying intelligent gateway down-hanging devices when the data volume is small and the device types are relatively fixed, and it has at least the following disadvantages: (1) when the keywords in the device information reported by the intelligent gateway are incomplete, or the order of the keywords changes slightly, the identification performance drops sharply; (2) the approach depends too heavily on the device model database, so a new device that is not contained in the database is discarded as invalid device information; (3) device information that is not successfully matched cannot be analyzed further, so the utilization rate of the device information reported by the intelligent gateway is low; (4) the device model database cannot be updated automatically according to the device information reported by the intelligent gateway, so the identifiable device models can be screened and expanded only manually.
That is, the down-hanging device information reported by the intelligent gateway is characterized by a large data volume, many device types (such as smart televisions, smart phones, smart air conditioners, and smart cameras), and non-uniform information formats (such as honor_7A-6036f0010, T1-821w-ed7e90066d182, and 15557ac28e_bbk-H8A-6173d), so it is difficult to accurately identify information such as the model of an intelligent gateway down-hanging device.
Based on this, the embodiment of the application provides a device identification method, and the basic idea of the method is that: acquiring equipment information to be identified; inputting the information of the equipment to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; identifying the equipment information to be identified by utilizing the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by utilizing the character string matching sub-model, and determining a second identification result; and determining a device identification result according to the first identification result and the second identification result. In this way, the single character matching sub-model and the character string matching sub-model are used for carrying out combined recognition on the equipment information to be recognized, namely, the equipment recognition result of the equipment information to be recognized is jointly determined according to the recognition results of the two sub-models, so that the accuracy of recognizing the equipment information can be improved, and meanwhile, the recognition efficiency is also improved.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In an embodiment of the present application, referring to fig. 1, a schematic flow chart of a device identification method provided in an embodiment of the present application is shown. As shown in fig. 1, the method may include:
S101, acquiring equipment information to be identified.
It should be noted that the device identification method provided in the embodiment of the present application may be applied to a device identification apparatus or an electronic device integrated with the apparatus. Here, the electronic device may be, for example, a computer, a smart phone, a tablet computer, a notebook computer, a palm computer, a personal digital assistant (PDA), a navigation device, a server, or the like, which is not particularly limited in the embodiments of the present application.
It should also be noted that, the device identification method provided by the embodiment of the application is mainly applied to accurately identifying the information of the down-hanging device of the intelligent gateway in the home, so that the device information to be identified is usually the device information reported by the intelligent gateway. It will be appreciated that the device information to be identified may be obtained by other means, which is not specifically limited in the embodiments of the present application.
In addition, the device information to be identified is typically a character string that may contain information related to the device, such as: wang-njaf-oppo-k9, 201-phinom fwr706, netcore-mg-1200ac, dh-nvr2108hs-8p-s1-kd gj, hikvision ezviz cs-n1p-204, and so forth.
S102, inputting the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model.
S103, identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, and identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result.
It should be noted that, in the embodiment of the present application, the device information to be identified is identified by a preset identification model, so as to obtain a device identification result, thereby determining a device tag corresponding to the device information to be identified, for example, tag information such as a device model, a device brand, a device type, and the like.
Specifically, the preset recognition model may include a single character matching sub-model and a character string matching sub-model, after the device information to be recognized is input into the preset recognition model, the device information to be recognized is recognized by using the single character matching sub-model and the character string matching sub-model, a first recognition result and a second recognition result are obtained respectively, and a final device recognition result is determined according to the first recognition result and the second recognition result. Here, the single character matching sub-model and the character string matching sub-model may be a convolutional neural network structure, or may be other neural network structures, which is not specifically limited in the embodiment of the present application.
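The two-branch flow of S101 to S104 can be sketched as follows. This is only an illustrative outline: the result shape (a label plus a confidence score), the function names, and the fusion rule `keep_more_confident` are assumptions made for the example, since the text here states only that the final result is determined according to both recognition results.

```python
from typing import Callable, Tuple

# Assumed result shape: (device label, confidence score in [0, 1]).
Result = Tuple[str, float]
SubModel = Callable[[str], Result]


def identify_device(device_info: str,
                    char_model: SubModel,
                    string_model: SubModel,
                    combine: Callable[[Result, Result], Result]) -> Result:
    """Run both sub-models on the same input and fuse their results."""
    first = char_model(device_info)     # first recognition result
    second = string_model(device_info)  # second recognition result
    return combine(first, second)


def keep_more_confident(first: Result, second: Result) -> Result:
    # Placeholder fusion rule (an assumption, not taken from the text):
    # keep whichever result carries the higher confidence score.
    return first if first[1] >= second[1] else second
```

Passing the fusion rule in as a parameter keeps the sketch neutral about how the two results are actually combined.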
Further, for a training process of a preset recognition model, referring to fig. 2, a schematic diagram of a training flow of the preset recognition model provided in an embodiment of the present application is shown. As shown in fig. 2, the process may include:
S201, acquiring a sample training set from a preset equipment model database.
When the preset recognition model is determined, a sample training set for model training is first acquired. The sample training set is acquired from the preset device model database, that is, it consists of device information acquired from the preset device model database and the device tags corresponding to that device information. The sample training set comprises at least one piece of device information and the device tag corresponding to each piece of device information, wherein the device tag comprises at least tag information such as the device model, device brand, and device type corresponding to the device information.
Further, for the preset device model database, in some embodiments, the method may further include:
acquiring an original equipment information set, wherein the original equipment information set comprises at least one piece of equipment information;
performing data cleaning and data statistics on the equipment information in the original equipment information set to obtain an intermediate equipment information set and the occurrence frequency of each piece of equipment information in the intermediate equipment information set;
comparing the number of occurrences of each piece of device information in the intermediate device information set with a preset count threshold to obtain the device information whose number of occurrences is greater than the preset count threshold;
constructing a device information total database according to the device information whose number of occurrences is greater than the preset count threshold;
and crawling web information for the device information total database to obtain the preset device model database.
It should be noted that, in the original device information set, at least one device information is included. Because the embodiment of the application is mainly applied to the accurate identification of the intelligent gateway down-hanging equipment, when the original equipment information set is acquired, the equipment information reported by all intelligent gateways can be acquired, and the equipment information can be acquired based on gateway plug-in components.
In this way, after the original equipment information set is acquired, at least some invalid equipment information, repeated equipment information and the like may exist in the equipment information, so that data cleaning and data statistics can be performed on the equipment information in the original equipment information set, invalid and repeated equipment information is removed, only valid equipment information is reserved, the occurrence frequency of each valid equipment information is counted, and an intermediate equipment information set and the occurrence frequency of each piece of equipment information in the intermediate equipment information set are obtained.
In some specific embodiments, performing data cleaning and data statistics on the device information in the original device information set to obtain an intermediate device information set and the occurrence number of each device information in the intermediate device information set may include:
removing invalid equipment information in the original equipment information set by using a preset expression to obtain at least one piece of valid equipment information;
counting the occurrence times of each piece of effective equipment information in at least one piece of effective equipment information, and performing de-duplication processing on the at least one piece of effective equipment information to obtain at least one piece of target equipment information;
and constructing an intermediate device information set according to the at least one target device information, and determining the occurrence number of each device information in the intermediate device information set.
When invalid device information is removed by using the preset expression, the device information that is removed is generally empty, anonymous, or the like. In addition, because device information generally consists of Chinese characters, letters, and digits, device information that contains other special characters is generally invalid and may also be removed. It should be noted that device information changes as technology advances, so the possibility that valid device information contains special characters cannot be excluded; whether device information containing special characters is invalid may therefore also be determined in combination with the actual scenario.
It should be further noted that, when invalid device information is removed, the preset expression used may be a regular expression.
Thus, at least one piece of effective equipment information is obtained after the invalid equipment information in the original equipment information set is removed. And counting the occurrence number of each piece of effective equipment information for the at least one piece of effective equipment information, and performing de-duplication processing on the effective equipment information, specifically, reserving only one piece of the effective equipment information for a plurality of pieces of completely consistent effective equipment information, thereby obtaining at least one piece of target equipment information and the occurrence number corresponding to the at least one piece of target equipment information. In addition, it should be noted that, when counting the occurrence number of each piece of effective device information, the statistics may be implemented by a preset statistics tool, where the preset statistics tool may include MapReduce, and the like, where MapReduce is a programming model for parallel operation of a large-scale data set.
In this way, the embodiment of the application can obtain at least one piece of target equipment information after cleaning and de-duplication, construct the intermediate equipment information set according to the at least one piece of target equipment information, and determine the occurrence frequency of each piece of equipment information in the intermediate equipment information set.
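The cleaning, counting, and de-duplication steps above can be sketched in a few lines of Python. The concrete regular expression, the placeholder names, and the decision to allow the separators `-`, `_`, and space (which appear in the example device information) are assumptions for illustration; a single-process `Counter` stands in for a large-scale tool such as MapReduce.

```python
import re
from collections import Counter

# Assumed validity rule: only Chinese characters, ASCII letters, digits,
# and the separators '-', '_' and space are allowed.
VALID_PATTERN = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_\- ]+$")
# Assumed placeholder values that mark empty or anonymous entries.
PLACEHOLDERS = {"null", "anonymous", "unknown"}


def clean_and_count(raw_infos):
    """Drop invalid entries, then count and de-duplicate the valid ones.

    The returned Counter has one key per distinct valid entry (the
    de-duplicated intermediate set) and its number of occurrences.
    """
    counts = Counter()
    for info in raw_infos:
        if info is None:
            continue
        text = info.strip()
        if not text or text.lower() in PLACEHOLDERS:
            continue
        if not VALID_PATTERN.match(text):
            continue  # contains special characters, treated as invalid
        counts[text] += 1
    return counts
```

Only exactly identical strings are merged, matching the statement that a single copy is kept for completely consistent entries.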
In this way, the occurrence frequency of each piece of equipment information in the intermediate equipment information set is compared with the preset frequency threshold value, the equipment information with the occurrence frequency larger than the preset frequency threshold value is obtained, and the equipment information with the occurrence frequency larger than the preset frequency threshold value is utilized to construct the equipment information total database.
It should be further noted that, for the device information reported by the intelligent gateway, a larger number of occurrences indicates that the device is used more frequently. In the embodiment of the application, the device information whose number of occurrences in the intermediate device information set is greater than the preset count threshold is retained. The preset count threshold can be set according to actual usage requirements, which is not specifically limited in the embodiment of the application.
It should be further noted that the embodiment of the application mainly identifies and analyzes the down-hanging devices of intelligent gateways in homes. A device whose information appears only once or twice most likely does not belong to the household, and retaining too much device information increases the difficulty of initially training the preset recognition model and affects the recognition accuracy. The embodiment of the application therefore builds the device information total database only from the valid device information whose number of occurrences is greater than the preset count threshold. If the application scenario is relatively simple and little device information is involved, all the device information may be retained; because the device information is usually massive, however, this filtering step is generally necessary.
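As a small illustration of the threshold comparison, the sketch below keeps only entries whose count is strictly greater than the threshold. The function name and the dictionary representation of the total database are assumptions for the example.

```python
from collections import Counter


def build_total_database(counts, count_threshold):
    """Keep device information that occurs more than count_threshold times."""
    return {info: n for info, n in counts.items() if n > count_threshold}
```

With a threshold of 2, for instance, entries seen once or twice are dropped, which matches the observation that such devices probably do not belong to the household.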
Thus, after the device information total database is determined, web information is crawled for the device information total database to obtain the preset device model database. Specifically, in some embodiments, crawling web information for the device information total database to obtain the preset device model database may include:
acquiring, by means of a web crawler, a seed Uniform Resource Locator (URL) for each piece of device information in the device information total database, and acquiring, by means of the web crawler, the web page information corresponding to the seed URL of each piece of device information;
determining a device tag corresponding to each piece of device information from the webpage information corresponding to the seed URL of each piece of device information;
and constructing a preset equipment model database according to each piece of equipment information and the equipment label corresponding to each piece of equipment information.
That is, for any device information in the device information total database, the web crawler is first utilized to automatically obtain the seed uniform resource locator (Uniform Resource Locator, URL) related to the device information from the query result of the search engine, wherein the query result of the search engine is determined after searching the device information in the search engine.
And then, for the grabbed seed URL, acquiring corresponding webpage information again by utilizing a web crawler technology, and screening out corresponding equipment labels from the webpage information, such as label information of equipment model, equipment brand, equipment type and the like.
And finally, constructing a preset equipment model database according to each piece of equipment information and each corresponding piece of equipment label.
Further, in some embodiments, the method may further comprise:
if the seed URL cannot be successfully acquired for a piece of equipment information, storing that piece of equipment information into an invalid equipment information database; and/or,
if one piece of equipment information cannot successfully determine the equipment label, storing the one piece of equipment information into an invalid equipment information database.
It should be noted that, for some device information in the device information total database, no relevant information can be found by the search engine, that is, the seed URL cannot be successfully obtained; for other device information, the seed URL is obtained but the device tag cannot be successfully obtained when the corresponding web page is crawled. This may be because the device information is invalid, or because the corresponding device has not yet been fully released to the market, so that the device tag cannot be obtained from web pages for the time being. Such device information is stored in the invalid device information database.
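The crawl-and-label procedure, including the fallback to the invalid device information database, can be outlined as below. The three callables `find_seed_url`, `fetch_page`, and `extract_label` are hypothetical stand-ins for the search-engine query, the page download, and the tag screening; the text does not specify how they are implemented, so they are injected rather than written out.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

DeviceLabel = Dict[str, str]  # e.g. {"model": ..., "brand": ..., "type": ...}


def build_model_database(
    device_infos: Iterable[str],
    find_seed_url: Callable[[str], Optional[str]],
    fetch_page: Callable[[str], Optional[str]],
    extract_label: Callable[[str], Optional[DeviceLabel]],
) -> Tuple[Dict[str, DeviceLabel], List[str]]:
    """Return (device model database, invalid device information database)."""
    model_db: Dict[str, DeviceLabel] = {}
    invalid_db: List[str] = []
    for info in device_infos:
        url = find_seed_url(info)
        if url is None:
            invalid_db.append(info)  # no seed URL could be obtained
            continue
        page = fetch_page(url)
        label = extract_label(page) if page is not None else None
        if label is None:
            invalid_db.append(info)  # page found, but no device tag
            continue
        model_db[info] = label
    return model_db, invalid_db
```

Keeping the failed entries in a separate list rather than discarding them mirrors the invalid device information database described above.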
Thus, after the preset device model database is constructed, a sample training set can be obtained from the preset device model database so as to perform model training.
S202, training the single character matching sub-network by using a sample training set to obtain a single character matching sub-model, and training the character string matching sub-network by using the sample training set to obtain a character string matching sub-model.
The sample training set is used to train the single character matching sub-network and the character string matching sub-network separately, so as to obtain the single character matching sub-model and the character string matching sub-model, respectively. In addition, the single character matching sub-network and the character string matching sub-network are typically convolutional neural networks.
Further, for training of the single character matching sub-model and the string matching sub-model, in some embodiments, the method may further comprise:
creating an equipment tag database according to the equipment tags corresponding to the at least one piece of equipment information;
correspondingly, training the single character matching sub-network by using the sample training set to obtain a single character matching sub-model can comprise:
performing single character segmentation on the equipment information in the sample training set to determine a single character word stock;
performing single-character-level vector conversion on the single character word stock and the sample training set to obtain a single character training vector set;
inputting the single character training vector set into the single character matching sub-network, and performing supervised iterative training by using the device tag database to obtain the single character matching sub-model;
training the character string matching sub-network by using the sample training set to obtain a character string matching sub-model, which may include:
performing character string segmentation on the equipment information in the sample training set, and determining a character string word stock;
performing string level vector conversion on the string word library and the sample training set to obtain a string training vector set;
and inputting the character string training vector set into a character string matching sub-network, and performing supervised iterative training by using the equipment tag database to obtain a character string matching sub-model.
When creating the device tag database, all the device tags may be extracted from the sample training set and subjected to the deduplication process, and then the device tag database may be created according to the device tags after the deduplication process. That is, in the device tag database, only one of each device tag is reserved.
It should also be noted that, in the embodiment of the present application, the device tag database is used to perform supervised iterative training on the single-character matching sub-network and the character string matching sub-network to obtain the single-character matching sub-model and the character string matching sub-model.
Specifically, single character segmentation and character string segmentation are respectively carried out on the equipment information in the sample training set, so that a single character word stock and a character string word stock are respectively obtained; then, single character level vector conversion is performed on the single character word stock and the sample training set to obtain a single character training vector set, and character string level vector conversion is performed on the character string word stock and the sample training set to obtain a character string training vector set.
Illustratively, for device information: yunuo-oppo-k9, the result of single character segmentation of which is: y u n u o o p p o k 9, eleven characters in total; the result of the character string segmentation is as follows: yunuo oppo k9, three words in total.
The single character training vector set is input into the single character matching sub-network to extract and classify key features, and supervised iterative training is performed by using the equipment tag database to obtain the single character matching sub-model; the character string training vector set is input into the character string matching sub-network to extract and classify key features, and supervised iterative training is performed by using the equipment tag database to obtain the character string matching sub-model.
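The segmentation and vector conversion described above can be sketched in Python. This is only a minimal sketch under assumptions the text does not state: hyphens, underscores and whitespace are treated as string separators, each word stock maps an entry to an integer index with 0 reserved for padding and unknown entries, and vectors are padded to a fixed length. All function names are hypothetical.

```python
import re

# Assumed separators; the text does not list them.
SEPARATORS = re.compile(r"[-_\s]+")

def split_strings(info):
    """Character string segmentation: split the device information on separators."""
    return [tok for tok in SEPARATORS.split(info) if tok]

def split_chars(info):
    """Single character segmentation: every character of every string is a token."""
    return [ch for tok in split_strings(info) for ch in tok]

def build_word_stock(samples, segment):
    """Map each distinct token to an index; 0 is reserved for padding/unknown."""
    stock = {}
    for info in samples:
        for tok in segment(info):
            stock.setdefault(tok, len(stock) + 1)
    return stock

def to_vector(info, segment, stock, length):
    """Convert device information into a fixed-length index vector."""
    ids = [stock.get(tok, 0) for tok in segment(info)][:length]
    return ids + [0] * (length - len(ids))

samples = ["yunuo-oppo-k9", "oppo-k9"]
char_stock = build_word_stock(samples, split_chars)
string_stock = build_word_stock(samples, split_strings)
```

With these assumptions, `split_chars("yunuo-oppo-k9")` yields the eleven characters of the example and `split_strings` yields the three words; `to_vector` then turns either segmentation into the numeric input a sub-network would consume.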
It should be further noted that, when performing iterative training on the two sub-networks, two independent loss functions may be used for the single character matching sub-network and the character string matching sub-network, and the two independent loss functions may be of the same type, for example, both are cross entropy loss functions. Thus, after the classification iterative training is completed, a single character matching sub-model and a character string matching sub-model can be obtained respectively.
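As an illustration of two independent cross entropy losses computed against the same device tag, the following sketch uses only the Python standard library. The logits are made-up values standing in for the outputs of the two sub-networks, and the function names are hypothetical; it is not the training code of the embodiment.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label_index):
    """Cross entropy loss for one sample whose true class is label_index."""
    return -math.log(softmax(logits)[label_index])

# Hypothetical outputs of the two sub-networks for the same training sample.
char_logits = [2.0, 0.5, -1.0]    # single character matching sub-network
string_logits = [0.1, 1.5, 0.3]   # character string matching sub-network
label_index = 0                   # index of the true device tag (shared by both)

# Each sub-network is optimized on its own loss; the losses are never summed.
char_loss = cross_entropy(char_logits, label_index)
string_loss = cross_entropy(string_logits, label_index)
```

Keeping the losses separate means each sub-network is updated only by its own error, even though both are supervised by the same device tag.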
S203, determining a preset recognition model according to the single character matching sub-model and the character string matching sub-model.
It should be noted that, model combination is performed on the single character matching sub-model and the character string matching sub-model, so as to obtain the preset recognition model.
That is, the embodiment of the present application may obtain the preset recognition model according to the above steps S201 to S203. In this way, the single character matching sub-model and the character string matching sub-model in the preset recognition model can be used for respectively recognizing the equipment information to be recognized, and a first recognition result and a second recognition result are respectively obtained.
For the first recognition result, in some embodiments, using the single character matching sub-model to recognize the device information to be recognized, determining the first recognition result may include:
performing feature extraction and classification on the equipment information to be identified by utilizing the single character matching sub-model, and determining a first probability that the equipment information to be identified belongs to each equipment label;
and selecting the maximum value from the first probability, and determining the equipment label corresponding to the maximum value as a first identification result.
For the second recognition result, in some embodiments, recognizing the device information to be recognized using the character string matching sub-model, determining the second recognition result may include:
Performing feature extraction and classification on the equipment information to be identified by using the character string matching sub-model, and determining a second probability that the equipment information to be identified belongs to each equipment label;
and selecting a maximum value from the second probability, and determining the equipment label corresponding to the maximum value as a second identification result.
That is, key feature extraction and classification are carried out on the equipment information to be identified by the single character matching sub-model and the character string matching sub-model respectively. The single character matching sub-model yields the first probability that the equipment information to be identified belongs to each equipment label, and the equipment label corresponding to the maximum value of the first probability is determined to be the first identification result; the character string matching sub-model yields the second probability that the equipment information to be identified belongs to each equipment label, and the equipment label corresponding to the maximum value of the second probability is determined to be the second identification result. Thus, the first recognition result and the second recognition result are obtained, respectively.
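Selecting the recognition result from the per-label probabilities amounts to taking the label with the largest probability. A minimal sketch follows; the device tags and probability values are invented for illustration and the function name is hypothetical.

```python
def pick_result(probabilities):
    """Return (device tag, classification probability) for the largest probability."""
    tag = max(probabilities, key=probabilities.get)
    return tag, probabilities[tag]

# Hypothetical outputs of the two sub-models for one piece of device information.
first_probs = {"OPPO K9": 0.81, "OPPO K7": 0.12, "vivo Y30": 0.07}
second_probs = {"OPPO K9": 0.40, "OPPO K7": 0.55, "vivo Y30": 0.05}

first_result, first_class_prob = pick_result(first_probs)
second_result, second_class_prob = pick_result(second_probs)
```

The returned probability is the classification probability that is later compared with the judgment threshold of the corresponding sub-model.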
In addition, in some embodiments, after obtaining the device information to be identified, the method may further include:
matching the equipment information to be identified with the equipment information in the invalid equipment information database;
If the matching is successful, determining the equipment information to be identified as invalid equipment information;
if the matching is unsuccessful, the step of inputting the equipment information to be identified into a preset identification model is executed.
It should be noted that, because the device information to be identified may be invalid, after the device information to be identified is obtained, the embodiment of the present application may also match it with the device information in the invalid device information database. If the matching is successful, the device information to be identified is determined to be invalid device information, and identification by the preset identification model is not needed; if the matching is unsuccessful, the step of inputting the device information to be identified into the preset identification model is executed to perform identification. In this way, the computational load of the equipment identification device can be reduced, and the resource waste caused by invalid identification is avoided.
It should be further noted that, in the embodiment of the present application, if the device information to be identified is invalid device information, the invalid device information database may also be updated by using the device information to be identified.
S104, determining a device identification result according to the first identification result and the second identification result.
The final device identification result can be determined according to the first identification result and the second identification result. Specifically, in some embodiments, determining the device identification result according to the first identification result and the second identification result may include:
if the first identification result is equal to the second identification result, determining the first identification result as a device identification result;
if the first recognition result is not equal to the second recognition result, determining a first classification probability corresponding to the first recognition result and a second classification probability corresponding to the second recognition result, and determining a device recognition result according to a comparison result of the first classification probability and a first judgment threshold and a comparison result of the second classification probability and a second judgment threshold.
In determining the device identification result, if the first identification result and the second identification result are the same, the first identification result (or the second identification result) is directly determined as the final device identification result.
If the first recognition result and the second recognition result are different, the first classification probability (namely, the maximum value of the first probability) corresponding to the first recognition result, the second classification probability (namely, the maximum value of the second probability) corresponding to the second recognition result, and the first judgment threshold value and the second judgment threshold value are combined for further determination.
The first judging threshold is an optimal threshold for judging whether the identification result of the single character matching sub-model is reliable or not, and the second judging threshold is an optimal threshold for judging whether the identification result of the character string matching sub-model is reliable or not. The embodiment of the application can respectively input the same test set into the single character matching sub-model and the character string matching sub-model, so as to respectively obtain a first judgment threshold value and a second judgment threshold value.
Specifically, in some embodiments, determining the device identification result according to the comparison result of the first classification probability and the first judgment threshold value and the comparison result of the second classification probability and the second judgment threshold value may include:
if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is less than the second judgment threshold value, determining that the equipment identification result is a first identification result;
if the first classification probability is smaller than the first judgment threshold value and the second classification probability is larger than or equal to the second judgment threshold value, determining that the equipment identification result is a second identification result;
and if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is greater than or equal to the second judgment threshold value, or the first classification probability is smaller than the first judgment threshold value and the second classification probability is smaller than the second judgment threshold value, determining that the equipment identification result is unidentified.
In the case where the first recognition result and the second recognition result are different, if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is less than the second judgment threshold value, the first recognition result is determined as the device recognition result. If the first classification probability is less than the first judgment threshold and the second classification probability is greater than or equal to the second judgment threshold, determining the second recognition result as a device recognition result. If the first classification probability is greater than or equal to the first judgment threshold value, and the second classification probability is greater than or equal to the second judgment threshold value; or, the first classification probability is smaller than the first judgment threshold value, and the second classification probability is smaller than the second judgment threshold value, then the equipment identification result is determined to be unidentified. Thus, the final device identification result of the device information to be identified is obtained.
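The decision rule above can be written as a small function. This is a sketch of the rule as described in the text; the function name, the parameter names and the string used for the "unidentified" outcome are assumptions.

```python
UNIDENTIFIED = "unidentified"  # assumed marker for the "unidentified" outcome

def decide(first_result, first_prob, second_result, second_prob,
           first_threshold, second_threshold):
    """Combine the results of the two sub-models into the device identification result."""
    if first_result == second_result:
        return first_result
    first_reliable = first_prob >= first_threshold
    second_reliable = second_prob >= second_threshold
    if first_reliable and not second_reliable:
        return first_result
    if second_reliable and not first_reliable:
        return second_result
    # Both reliable but contradictory, or both unreliable.
    return UNIDENTIFIED
```

Note that when the two sub-models disagree and both exceed their thresholds, the rule refuses to choose between them rather than preferring the higher probability.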
Further, in some embodiments, after determining the device identification result, the method may further include:
if the equipment identification result is unidentified, adding the equipment information to be identified into a preset equipment model database to obtain a new preset equipment model database;
and updating and training the preset recognition model by using a new preset equipment model database.
It should be noted that if the device identification result is unidentified, the device information to be identified may be new device information. In this case, the device information to be identified is added to the preset device model database to obtain a new preset device model database, and the new preset device model database is used for updating and training the preset identification model.
When the update training is performed on the preset recognition model, the relevant steps of the foregoing steps S201 to S203 may be followed: the web crawler technology is used to obtain the device tag of the device information to be recognized, the preset device model database is updated, the single character word stock and the character string word stock are updated, and the single character matching sub-model and the character string matching sub-model are then trained again, so as to implement iterative update of the preset recognition model.
It should be further noted that if the seed URL or the device tag is not successfully acquired in the process of performing the web crawler on the device information to be identified, the device information to be identified may be added to the invalid device information database as described above.
In addition, in the foregoing embodiment, the device information in the invalid device information database may be device information whose device tag only temporarily cannot be acquired on the network, and such device information may become valid as the network information increases. For example, when a product is in the testing stage, the device-related information is not disclosed and the web crawler cannot acquire the device tag; when the product is formally marketed, the device-related information is disclosed on the network, so that the device information may become valid.
Thus, in some embodiments, the method may further comprise:
judging whether the equipment information in the invalid equipment information database is valid or not;
if the equipment information in the invalid equipment information database is judged to be valid, updating and training the preset identification model by using the equipment information.
It should be noted that, in the embodiment of the present application, the web crawler technology may be used at regular intervals or at a preset time to try to obtain the device tag of the device information in the invalid device information database. If the device tag is obtained successfully, the device information has become valid device information; the preset recognition model may then be updated and trained accordingly in the same way as described above, and the corresponding device information is deleted from the invalid device information database, so as to avoid misjudgment caused by untimely information updating.
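The periodic re-check of the invalid device information database can be sketched as follows. The crawler is represented by a hypothetical callable `fetch_tag` that returns a device tag or `None`; the in-memory set and dictionary are stand-ins for the two databases, and scheduling and retraining are left out.

```python
def recheck_invalid(invalid_db, model_db, fetch_tag):
    """Move device information whose tag can now be crawled into the model database.

    invalid_db: set of device information strings (stand-in for the invalid database)
    model_db:   dict mapping device information to its device tag
    fetch_tag:  hypothetical crawler call; returns a tag, or None on failure
    """
    recovered = []
    for info in sorted(invalid_db):
        tag = fetch_tag(info)
        if tag is not None:
            model_db[info] = tag
            recovered.append(info)
    # Delete recovered entries so they are no longer reported as invalid.
    invalid_db.difference_update(recovered)
    return recovered
```

The returned list tells the caller which entries became valid, which is the trigger for update training of the preset recognition model.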
The embodiment provides an identification method, which comprises the steps of obtaining equipment information to be identified; inputting the information of the equipment to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; identifying the equipment information to be identified by utilizing the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by utilizing the character string matching sub-model, and determining a second identification result; and determining a device identification result according to the first identification result and the second identification result. In this way, the equipment information to be identified is combined and identified through the single character matching sub-model and the character string matching sub-model, and the equipment identification result of the equipment information to be identified is jointly determined according to the identification results of the two sub-models, so that the accuracy of identifying the equipment information is effectively improved, and meanwhile, the identification efficiency is also improved; even if the keywords of the equipment information are incomplete or the arrangement sequence of the keywords is changed, the equipment information can be accurately identified, and the identification effect is greatly improved; in addition, in the training and updating process of the preset identification model, the device labels corresponding to the device information are automatically acquired through the web crawler technology, so that the automatic and accurate acquisition of the device labels and the automatic updating of the preset identification model are realized, and the new device information which does not exist in the preset device model database can be effectively identified.
In another embodiment of the present application, reference is made to fig. 3, which is a detailed flow chart of a device identification method provided in an embodiment of the present application. As shown in fig. 3, the detailed flow may include:
S301, cleaning and counting the device information reported by intelligent gateways.
It should be noted that, in the embodiment of the present application, the device information reported by intelligent gateways may be used as the raw data for training the preset recognition model. Specifically, all the device information reported by the intelligent gateways is first cleaned and counted for model training.
Further, for step S301, the specific implementation process may be referred to fig. 4, which shows a detailed flowchart of another device identification method provided in the embodiment of the present application. As shown in fig. 4, the detailed flow may include:
S301a, storing the device information reported by the intelligent gateways into a device information total database.
It should be noted that, in this step, a device information total database (also referred to as a device report information total database) based on the Hadoop Distributed File System (HDFS) may be created, and the device information reported by all intelligent gateways may be stored in the device information total database. Since HDFS has advantages such as high reliability, high scalability, and high throughput, the embodiment of the present application creates the device information total database using HDFS, but this is not a specific limitation.
S301b, removing invalid equipment information in the equipment information total database by using the regular expression.
It should be noted that this step may use regular expressions to remove, from the device information total database, special characters other than Chinese characters, letters and numbers, as well as invalid device information such as empty or anonymous entries.
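A minimal sketch of this cleaning step with Python's `re` module is given below. The character range used for Chinese characters and the set of placeholder values treated as "empty or anonymous" are assumptions; the text does not give the actual regular expressions.

```python
import re

# Anything that is not a Chinese character, a letter or a digit (assumed range).
NOT_ALLOWED = re.compile(r"[^\u4e00-\u9fa5A-Za-z0-9]")

# Assumed placeholder values that mark empty or anonymous reports.
PLACEHOLDERS = {"", "anonymous", "unknown", "null"}

def clean(info):
    """Remove special characters from one piece of device information."""
    return NOT_ALLOWED.sub("", info)

def remove_invalid(records):
    """Clean every record and drop empty or anonymous ones."""
    cleaned = (clean(r) for r in records)
    return [c for c in cleaned if c.lower() not in PLACEHOLDERS]
```

Whether separators such as "-" survive cleaning is not stated in the text; this sketch removes them along with all other special characters.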
S301c, counting the occurrence times of different equipment information by using a preset counting tool, and completing de-duplication.
It should be noted that, in this step, the number of occurrences of different device information (i.e., the valid device information remaining after the invalid device information is removed) in the device information total database may be counted by using a preset counting tool (e.g., MapReduce), and the device information in the device information total database is subjected to a deduplication operation.
S301d, screening effective and high-frequency equipment information according to the statistics times of the equipment information.
S301e, updating a device information total database.
According to the statistics result of the occurrence number of the device information and the preset number threshold, the device information total database is updated by using the device information with the occurrence number of the device information larger than the preset number threshold, and only the effective and high-frequency device information is reserved in the device information total database.
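Counting, de-duplication and high-frequency screening can be illustrated with an in-memory stand-in. The text names MapReduce as an example counting tool; the sketch below uses `collections.Counter` instead, purely for illustration, and the function name is hypothetical.

```python
from collections import Counter

def high_frequency(records, min_count):
    """Count each distinct device information and keep those seen more than min_count times."""
    counts = Counter(records)  # counting also de-duplicates the records
    return {info: n for info, n in counts.items() if n > min_count}

reports = ["oppo-k9", "oppo-k9", "oppo-k9", "mi-tv", "mi-tv", "x1"]
kept = high_frequency(reports, min_count=1)
```

Here `min_count` plays the role of the preset number threshold, and only entries whose count is strictly larger are retained.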
S302, constructing/updating a preset equipment model database by the web crawler.
It should be noted that the preset device model database (also referred to as a device model database) may include four types of data: the device information in the device information total database, and the corresponding device model, device brand and device type.
Further, for step S302, the specific implementation process may be referred to fig. 5, which shows a detailed flowchart of still another device identification method provided in the embodiment of the present application. As shown in fig. 5, the detailed flow may include:
S302a, automatically acquiring, by the web crawler technology, seed URLs of all the device information in the device information total database.
It should be noted that, in this step, the web crawler technology is utilized to automatically obtain the relevant seed URL of the device information from the query result of the search engine.
S302b, capturing webpage information corresponding to the seed URL.
It should be noted that, in this step, the crawler is used again to fetch the web page information corresponding to the seed URL.
S302c, three kinds of information including equipment model, equipment brand and equipment type are screened out from the webpage information.
The three types of information (i.e., the device tag, also referred to as the device model information) including the device model, the device brand, and the device type are selected from the web page information.
S302d, if the equipment label grabbing is successful, adding the equipment information and the corresponding equipment model, equipment brand and equipment type into an equipment model database.
S302e, if the device tag grabbing is unsuccessful, adding the device information into an invalid device information database.
For the device information that can successfully capture the device tag, the device information and the corresponding device tag (i.e., the device model, the device brand, and the device type corresponding to the device information) are added into the device model database.
For device information that cannot be successfully captured to the device tag, the device information is added to the invalid device information database.
The above steps are carried out on each piece of equipment information in the equipment information total database to complete the construction of the equipment model database, and the equipment information for which the web crawler cannot obtain the equipment model, the equipment brand and the equipment type is stored into the invalid equipment information database.
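Steps S302a to S302e can be sketched as a routing loop. The two crawler stages are passed in as hypothetical callables (`find_seed_url` and `extract_tag`), each returning `None` on failure; a device tag is represented as a (model, brand, type) tuple. The real search-engine query and page parsing are not shown and are not specified by the text.

```python
def build_databases(all_info, find_seed_url, extract_tag):
    """Route each piece of device information to the model or the invalid database.

    find_seed_url: hypothetical search step; returns a seed URL or None
    extract_tag:   hypothetical page-parsing step; returns (model, brand, type) or None
    """
    model_db, invalid_db = {}, set()
    for info in all_info:
        url = find_seed_url(info)                           # S302a
        tag = extract_tag(url) if url is not None else None  # S302b, S302c
        if tag is None:
            invalid_db.add(info)                            # S302e
        else:
            model_db[info] = tag                            # S302d
    return model_db, invalid_db
```

Both failure modes described in the text, no seed URL and no tag on the crawled page, end in the invalid device information database.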
S303, training a preset recognition model based on a convolutional neural network.
After the device model database is obtained, model training may be performed according to the device model database to obtain a preset recognition model (also referred to as a device precise recognition model) based on the convolutional neural network.
It should be noted that, for step S303, the network structure and the training process of the model training may be referred to fig. 6, which shows a training frame schematic diagram of a preset recognition model provided in the embodiment of the present application. As shown in fig. 6, the frame may include: a training vector set acquisition section 601, a sub-network training section 602, and a model combining section 603.
In the training vector set obtaining portion 601, the main functions include: applying a single character segmentation method and a character string segmentation method to the equipment information in the equipment model database to obtain a single character word stock (also called a word stock) and a character string word stock (also called a word stock) respectively, and then performing single character level vector conversion and character string level vector conversion on the single character word stock, the character string word stock and the equipment model database to obtain a training vector set M1 and a training vector set N1; where M1 represents a single character training vector set and N1 represents a string training vector set.
In the sub-network training section 602, a single character matching sub-network and a character string matching sub-network are included; the single character matching sub-network comprises a convolution layer C1, a convolution layer C2, a convolution layer C3, a full connection layer F1, a dropout layer, an activation layer R1, a full connection layer F2 and a cross entropy loss layer S1; the character string matching sub-network comprises a convolution layer D1, a convolution layer D2, a convolution layer D3, a full connection layer T1, a dropout layer, an activation layer K1, a full connection layer T2 and a cross entropy loss layer S2. The main functions of the sub-network training part 602 include: inputting M1 and N1 into the single character matching sub-network and the character string matching sub-network respectively to extract key features of the equipment information; the two sub-networks adopt two independent cross entropy loss functions and unified equipment labels (taken from the equipment label database) to perform iterative supervised training; and after the equipment label classification iterative training is completed, a single character matching sub-model (also called equipment identification letter model) and a character string matching sub-model (also called equipment identification word model) are respectively obtained.
In the model combining section 603, the main functions include model combination of the single character matching sub-model and the character string matching sub-model to obtain a preset recognition model.
It should be noted that the classification training based on the convolutional neural network mainly adopts a parallel network structure of a single character matching sub-network and a character string matching sub-network; the network structure can refer to fig. 6. The sample training set used for training the model is the device model database constructed by the web crawler. The device model, the device brand and the device type corresponding to each piece of device information in the device model database are extracted, a unified duplication removal operation is performed, and a device label database is constructed by using the device models, device brands and device types after duplication removal. Then, a single character word stock and a character string word stock are automatically created and updated according to the sample training set by using the single character segmentation method and the character string segmentation method, and a training vector set M1 and a training vector set N1 are obtained respectively by applying single character level vector conversion and character string level vector conversion to the single character word stock, the character string word stock and the device model database. M1 and N1 are input into the single character matching sub-network and the character string matching sub-network respectively to extract key features of the device information, and the two sub-networks perform iterative supervised training with two independent cross entropy loss functions and unified device labels. After the classification iterative training is finished, a single character matching sub-model and a character string matching sub-model are obtained respectively, and the two sub-models are combined to serve as the preset recognition model.
S304, using the preset recognition model.
It should be noted that, firstly, the same test set is input into the two sub-models, and the optimal thresholds for identification by the two sub-models (i.e., the first judgment threshold and the second judgment threshold in the foregoing embodiment) are obtained respectively. The device information to be identified is then input into the single character matching sub-model and the character string matching sub-model respectively to extract and classify key features of the device information, and in each sub-model the probabilities that the device information belongs to each device label are sorted from large to small; the device label with the largest classification probability in each sub-model is taken as the identification result of that sub-model. The classification probabilities of the identification results of the two sub-models are compared with their respective optimal thresholds, and the device identification result (also referred to as the device accurate identification result) is finally output according to the set threshold discrimination rule.
Determining that the device identification result follows the following threshold discrimination rule, see in particular formula (1):
$$
R = \begin{cases}
R_L, & R_L = R_W \\
R_L, & R_L \neq R_W,\ P_L \geq T_L,\ P_W < T_W \\
R_W, & R_L \neq R_W,\ P_L < T_L,\ P_W \geq T_W \\
\text{unidentified}, & \text{otherwise}
\end{cases} \tag{1}
$$
In formula (1), $T_L$ represents the first judgment threshold and $T_W$ represents the second judgment threshold; $R_L$ represents the first recognition result of the equipment information to be recognized in the single character matching sub-model, and $P_L$ is the first classification probability that it belongs to $R_L$; $R_W$ represents the second recognition result of the equipment information to be recognized in the character string matching sub-model, and $P_W$ is the second classification probability that it belongs to $R_W$; $R$ represents the device identification result.
S305, automatically identifying and updating newly-added devices attached to the intelligent gateway.
It should be noted that, for step S305, the specific implementation process may be referred to fig. 7, which shows a detailed flowchart of still another device identification method provided in the embodiment of the present application. As shown in fig. 7, the detailed flow may include:
S305a, the intelligent gateway reports the information of the equipment to be identified.
S305b, determining whether the device information to be identified exists in the invalid device information database.
It should be noted that, if the determination result is yes, step S305i is executed; if the determination is negative, step S305c is performed.
S305c, inputting the equipment information to be identified into a preset identification model for identification.
S305d, judging whether the identification result is unidentified.
It should be noted that, if the determination result is yes, step S305e is executed; if the determination result is negative, step S305j is performed.
S305e, outputting the equipment information to be identified.
S305f, updating the equipment model database by the web crawler.
S305g, updating the single character word stock and the character string word stock.
S305h, iterative training of a preset recognition model based on a convolutional neural network.
It should be noted that if the device tag of the device information to be identified can be successfully obtained by using the web crawler, the device information and the corresponding device tag are used to update the device model database, the single character word stock and the character string word stock are then updated, and finally the preset identification model based on the convolutional neural network is iteratively trained, thereby completing the iterative update of the preset identification model.
S305i, outputting the device information to be identified as invalid device information.
It should be noted that if the device information to be identified exists in the invalid device information database, the device information to be identified is directly determined as the invalid device information.
S305j, outputting the identified device model, device brand and device type.
In the identification process of the equipment information to be identified, firstly, matching the equipment information to be identified reported by the intelligent gateway with equipment information in an invalid equipment information database, and if the matching is successful, ending the identification and outputting: the reported information is invalid equipment information, and an invalid equipment information database can be updated at the same time; if the matching is unsuccessful, continuing to execute the process of the step S304, identifying the equipment information to be identified by using a preset identification model, and when the identification result obtained by the preset identification model is 'unidentified', adding unidentified newly-added equipment information into an equipment model database according to the steps S302 and S303 in sequence, so as to complete automatic crawling of equipment labels and update the preset identification model.
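The flow of steps S305a to S305j can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `handle_reported_info` and the injected callables `recognize`, `crawl_label` and `retrain` are hypothetical stand-ins for the preset identification model, the web crawler and the training procedure, and the in-memory `set` and `dict` stand in for the invalid device information database and the equipment model database.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical label layout: (device model, device brand, device type).
Label = Tuple[str, str, str]


def handle_reported_info(
    info: str,
    invalid_db: Set[str],
    model_db: Dict[str, Label],
    recognize: Callable[[str], Optional[Label]],
    crawl_label: Callable[[str], Optional[Label]],
    retrain: Callable[[Dict[str, Label]], None],
):
    """Sketch of S305a-S305j for one piece of reported device information."""
    # S305b -> S305i: information already known to be invalid is reported as such.
    if info in invalid_db:
        return "invalid"
    # S305c/S305d: run the preset identification model; None means "unidentified".
    label = recognize(info)
    if label is not None:
        return label  # S305j: model, brand and type
    # S305e-S305h: try to obtain a label with the crawler and retrain.
    crawled = crawl_label(info)
    if crawled is not None:
        model_db[info] = crawled
        retrain(model_db)  # lexicon update and iterative training happen here
    else:
        # Information the crawler cannot label goes to the invalid database.
        invalid_db.add(info)
    return "unidentified"
```

The callables are injected so that the control flow can be exercised without a trained model or network access.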
As can be seen from the foregoing, the embodiment of the present application provides a device identification method, and the implementation of the method may be divided into five steps: s301, cleaning and counting intelligent gateway reporting equipment information, S302, constructing/updating an equipment model database by a web crawler, S303, training a preset recognition model based on a convolutional neural network, S304, using the preset recognition model, and S305, automatically recognizing and updating newly-added intelligent gateway down-hanging equipment.
Step S301 is briefly described as follows: a device information total database is created on the Hadoop distributed file system, and all device information reported by intelligent gateways is stored in the device information total database.
A regular expression is used to remove special characters other than Chinese characters, letters and numbers from the device information total database, and invalid device information such as empty or anonymous entries is removed.
MapReduce is used to count the number of occurrences of each distinct piece of device information in the device information total database, and the device information is de-duplicated.
According to the statistical result and a preset frequency threshold, the original device information total database is updated with the device information whose number of occurrences is greater than the preset frequency threshold, so that only valid, high-frequency device information is retained. The process flow diagram is shown in fig. 4.
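The cleaning and counting of step S301 can be illustrated with a small in-memory sketch. It deliberately replaces Hadoop and MapReduce with Python's `collections.Counter`, so it only shows the logic, not the distributed implementation. The regular expression, the `INVALID_VALUES` set and the function names are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed pattern: keep only digits, ASCII letters and common Chinese characters.
_SPECIAL_CHARS = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]")

# Assumed markers of invalid reports (empty or anonymous entries).
INVALID_VALUES = {"", "anonymous"}


def clean(info: str) -> str:
    """Remove every character that is not a Chinese character, letter or digit."""
    return _SPECIAL_CHARS.sub("", info)


def high_frequency_info(reports: Iterable[str], min_count: int) -> Dict[str, int]:
    """Clean, count and de-duplicate reports, keeping those seen more than min_count times."""
    cleaned = (clean(r) for r in reports)
    counts = Counter(c for c in cleaned if c.lower() not in INVALID_VALUES)
    # The dictionary keys are the de-duplicated device information.
    return {info: n for info, n in counts.items() if n > min_count}
```

For instance, with `min_count=1` the reports `"HUAWEI-P40"` and `"HUAWEI P40"` are both cleaned to `"HUAWEIP40"` and retained with a count of 2, while a single `"MiBox"` report is dropped.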
Step S302 is briefly described as follows: the device model database includes four types of data: the device information in the device information total database and the corresponding device model, device brand and device type.
First, the web crawler technology is used to automatically acquire seed URLs related to the device information from the query results of a search engine.
Then, the web crawler technology is used again to obtain the web page information corresponding to each captured seed URL.
Finally, the device model, device brand and device type are screened out from the web page information.
The above steps are carried out for each piece of valid device information reported by the intelligent gateway to complete the construction of the device model database, and the reported device information for which the crawler cannot obtain this information is stored in the invalid device information database. This process is detailed in fig. 5.
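The construction of the two databases in step S302 can be sketched as below. The search, download and extraction steps are passed in as callables (`find_seed_urls`, `fetch_page`, `extract_label`); these names are hypothetical, and the application does not specify which search engine or parsing rules are used, so the sketch covers only the bookkeeping around them.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical label layout: (device model, device brand, device type).
Label = Tuple[str, str, str]


def build_databases(
    device_infos: Iterable[str],
    find_seed_urls: Callable[[str], List[str]],
    fetch_page: Callable[[str], str],
    extract_label: Callable[[str], Optional[Label]],
) -> Tuple[Dict[str, Label], Set[str]]:
    """Return (device model database, invalid device information database)."""
    model_db: Dict[str, Label] = {}
    invalid_db: Set[str] = set()
    for info in device_infos:
        label = None
        # Try each seed URL until one page yields a model/brand/type triple.
        for url in find_seed_urls(info):
            label = extract_label(fetch_page(url))
            if label is not None:
                break
        if label is None:
            # No seed URL, or no label on any page: treat as invalid information.
            invalid_db.add(info)
        else:
            model_db[info] = label
    return model_db, invalid_db
```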
Step S303 is briefly described as follows: the device model classification training based on the convolutional neural network mainly adopts a parallel network structure consisting of a single character matching sub-network and a character string matching sub-network; the network structure is shown in figure X. The sample training set used for training is the device model database constructed by the web crawler. The device model, device brand and device type corresponding to each piece of device information in the device model database are extracted and uniformly de-duplicated, and a device label database is constructed from the de-duplicated device models, device brands and device types. A single character segmentation method and a character string segmentation method are then used to automatically create and update a single character word stock and a character string word stock from the training data set. The single character word stock and the character string word stock are each matched against the device model database, using single character level vector conversion and character string level vector conversion respectively, to obtain a training vector set M1 and a training vector set N1. M1 and N1 are input into the single character matching sub-network and the character string matching sub-network respectively to extract key features of the device information. The two sub-networks are iteratively trained with two independent cross entropy loss functions and unified device labels, and after training the single character matching sub-model and the character string matching sub-model are obtained respectively. This process is detailed in fig. 6.
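The word stock construction and vector conversion described for step S303 can be illustrated as follows. This is a sketch under stated assumptions: the application does not define the character string segmentation rule, so splitting into runs of letters, digits or Chinese characters is an assumption, as are the reserved `<pad>` and `<unk>` entries, the fixed vector length and all function names. The convolutional sub-networks themselves are not shown.

```python
import re
from typing import Callable, Dict, Iterable, List

Tokenizer = Callable[[str], List[str]]


def char_tokens(info: str) -> List[str]:
    """Single character segmentation: one token per character."""
    return list(info)


def string_tokens(info: str) -> List[str]:
    """Assumed string segmentation: runs of letters, digits or Chinese characters."""
    return re.findall(r"[A-Za-z]+|[0-9]+|[\u4e00-\u9fff]+", info)


def build_word_stock(samples: Iterable[str], tokenize: Tokenizer) -> Dict[str, int]:
    """Map each token seen in the training samples to an integer index."""
    stock = {"<pad>": 0, "<unk>": 1}
    for sample in samples:
        for token in tokenize(sample):
            stock.setdefault(token, len(stock))
    return stock


def to_vector(info: str, stock: Dict[str, int], tokenize: Tokenizer, length: int) -> List[int]:
    """Convert device information to a fixed-length index vector."""
    ids = [stock.get(token, stock["<unk>"]) for token in tokenize(info)][:length]
    return ids + [stock["<pad>"]] * (length - len(ids))
```

Applying `to_vector` with `char_tokens` to every training sample would give a set playing the role of M1, and applying it with `string_tokens` would give a set playing the role of N1.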
Step S304 is briefly described as follows: first, the same test set is input into the two sub-models, and an optimal threshold for device identification is obtained for each sub-model. The device information to be identified is then input into the single character matching sub-model and the character string matching sub-model respectively for key feature extraction and classification, and the probabilities that the device information belongs to each device label are arranged from large to small. The label with the largest classification probability in each model is taken as the identification result of that model, the identification results of the two models are compared with the thresholds, and the accurate identification result of the device is finally output according to the set threshold judgment rule. The threshold judgment rule followed in determining the device identification result is given in formula (1).
Step S305 is briefly described as follows: firstly, matching the newly added equipment information to be identified with the information in the invalid equipment information database, ending identification if the matching is successful, and outputting: the reported information is invalid equipment information; if the matching is unsuccessful, continuing to step S304, and when the result obtained by the preset recognition model is 'unidentified', sequentially performing step S302 and step S303, adding unidentified equipment information into an equipment model database, completing automatic equipment label crawling and updating the preset recognition model. This process is detailed in fig. 7.
This embodiment provides a device identification method and describes the specific implementation of the foregoing embodiments in detail. Compared with the identification methods for intelligent gateway down-hanging devices provided by the related art, the embodiment of the application combines big data analysis technology, a web crawler and a convolutional neural network classification algorithm, and realizes accurate identification of multi-dimensional information such as the type, brand and model of the down-hanging device. The web crawler technology is used to automatically capture the device model, device brand and device type corresponding to the device information reported by the intelligent gateway in the device information total database, and to construct a preset device model database and an invalid device information database. A single character word stock and a character string word stock are constructed by using a single character segmentation method and a character string segmentation method respectively; the two word stocks are combined with the device model database, using single character level vector conversion and character string level vector conversion respectively, to obtain the corresponding training vector sets; the two training vector sets are input into the corresponding single character matching sub-network and character string matching sub-network to extract key features of the device information; the two sub-networks are trained iteratively under supervision with two independent cross entropy loss functions and unified device labels; and finally a single character matching sub-model and a character string matching sub-model are obtained, the combination of which is used as the preset identification model. When the intelligent gateway reports new device information to be identified, key feature extraction and classification are carried out on it by using the preset identification model. When the preset identification model cannot classify it accurately, the web crawler is automatically started to update the device model database, and the preset identification model is trained again with the updated device model database to obtain an updated accurate device identification model. In this way, automatic and accurate identification of intelligent gateway down-hanging devices is realized, and automatic identification of newly added device labels and updating of the device model database are achieved while the identification accuracy is improved.
In yet another embodiment of the present application, referring to fig. 8, a schematic diagram of a composition structure of a device identification apparatus 80 provided in an embodiment of the present application is shown. As shown in fig. 8, the device identification apparatus 80 may include an acquisition unit 801, an identification unit 802, and a determination unit 803, wherein,
an acquiring unit 801 configured to acquire device information to be identified;
a recognition unit 802 configured to input the device information to be recognized into a preset recognition model, where the preset recognition model includes a single character matching sub-model and a character string matching sub-model; identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
a determining unit 803 configured to determine a device identification result according to the first identification result and the second identification result.
In some embodiments, the determining unit 803 is further configured to match the device information to be identified with device information in an invalid device information database; if the matching is successful, determining that the equipment information to be identified is invalid equipment information; and if the matching is unsuccessful, the step of inputting the equipment information to be identified into a preset identification model is executed.
In some embodiments, referring to fig. 8, the device identification apparatus 80 may further include a training unit 804 configured to obtain a sample training set from a preset device model database; the sample training set comprises at least one piece of equipment information and equipment labels corresponding to the at least one piece of equipment information, wherein the equipment labels at least comprise equipment models, equipment brands and equipment types; training a single character matching sub-network by using the sample training set to obtain the single character matching sub-model, and training a character string matching sub-network by using the sample training set to obtain the character string matching sub-model; and determining the preset recognition model according to the single character matching sub-model and the character string matching sub-model.
In some embodiments, training unit 804 is further configured to obtain a set of original device information, the set of original device information including at least one device information; performing data cleaning and data statistics on the equipment information in the original equipment information set to obtain an intermediate equipment information set and the occurrence frequency of each piece of equipment information in the intermediate equipment information set; comparing the occurrence frequency of each piece of equipment information in the intermediate equipment information set with a preset frequency threshold value to obtain equipment information with the occurrence frequency larger than the preset frequency threshold value; constructing a total database of the equipment information according to the equipment information of which the occurrence number is larger than the preset number threshold; and performing grabbing processing on the equipment information total database to obtain the preset equipment model database.
In some embodiments, the training unit 804 is specifically configured to reject the invalid device information in the original device information set by using a preset expression to obtain at least one valid device information; counting the occurrence times of each piece of effective equipment information in the at least one piece of effective equipment information, and performing de-duplication processing on the at least one piece of effective equipment information to obtain at least one piece of target equipment information; and constructing the intermediate device information set according to the at least one piece of target device information, and determining the occurrence number of each piece of device information in the intermediate device information set.
In some embodiments, the training unit 804 is further specifically configured to obtain, by using a web crawler manner, a seed URL of each piece of equipment information in the equipment information total database, and obtain, by using the web crawler manner, web page information corresponding to the seed URL of each piece of equipment information; determining a device tag corresponding to each piece of device information from the webpage information corresponding to the seed URL of each piece of device information; and constructing the preset equipment model database according to each piece of equipment information and the equipment label corresponding to each piece of equipment information.
In some embodiments, the training unit 804 is further configured to store one of the device information to the invalid device information database if the seed URL cannot be successfully obtained by the one of the device information; and/or if one piece of equipment information cannot successfully determine the equipment label, storing the one piece of equipment information into an invalid equipment information database.
In some embodiments, the training unit 804 is further specifically configured to create a device tag database according to the device tags corresponding to the at least one device information; performing single character segmentation on the equipment information in the sample training set to determine a single character word stock; performing single character level vector conversion on the single character word stock and the sample training set to obtain a single character training vector set; inputting the single character training vector set into the single character matching sub-network, and performing supervised iterative training by using the equipment tag database to obtain the single character matching sub-model; performing character string segmentation on the equipment information in the sample training set, and determining a character string word stock; performing character string level vector conversion on the character string word stock and the sample training set to obtain a character string training vector set; and inputting the character string training vector set into the character string matching sub-network, and performing supervised iterative training by using the equipment tag database to obtain the character string matching sub-model.
In some embodiments, the determining unit 803 is specifically configured to perform feature extraction and classification on the to-be-identified device information by using the single character matching sub-model, and determine a first probability that the to-be-identified device information belongs to each device tag; and selecting a maximum value from the first probability, and determining the equipment label corresponding to the maximum value as the first identification result.
In some embodiments, the determining unit 803 is specifically configured to perform feature extraction and classification on the to-be-identified device information by using the character string matching sub-model, and determine a second probability that the to-be-identified device information belongs to each device tag; and selecting a maximum value from the second probability, and determining the equipment label corresponding to the maximum value as the second identification result.
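Selecting the identification result of one sub-model from its per-label probabilities amounts to an arg-max. A minimal sketch, in which the function name `top_label` and the use of a plain dictionary for the sub-model output are assumptions:

```python
from typing import Dict, Tuple


def top_label(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the device label with the largest probability and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]
```

Applied to the output of the single character matching sub-model this yields the first identification result and its classification probability; applied to the output of the character string matching sub-model it yields the second.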
In some embodiments, the determining unit 803 is specifically configured to determine the first recognition result as the device recognition result if the first recognition result is equal to the second recognition result; and if the first recognition result is not equal to the second recognition result, determining a first classification probability corresponding to the first recognition result and a second classification probability corresponding to the second recognition result, and determining the equipment recognition result according to a comparison result of the first classification probability and a first judgment threshold value and a comparison result of the second classification probability and a second judgment threshold value.
In some embodiments, the determining unit 803 is specifically configured to determine that the device identification result is the first identification result if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is less than the second judgment threshold value; and if the first classification probability is smaller than the first judgment threshold value and the second classification probability is larger than or equal to the second judgment threshold value, determining that the equipment identification result is the second identification result; and if the first classification probability is greater than or equal to the first judgment threshold and the second classification probability is greater than or equal to the second judgment threshold, or if the first classification probability is less than the first judgment threshold and the second classification probability is less than the second judgment threshold, determining that the equipment identification result is unidentified.
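The judgment rule described in the two preceding paragraphs can be written compactly as a function. The sketch below follows the stated conditions; the function name `fuse_results`, the parameter names and the string used for the unidentified case are assumptions made for the example.

```python
UNIDENTIFIED = "unidentified"  # assumed marker for the unidentified result


def fuse_results(r_l: str, p_l: float, r_w: str, p_w: float,
                 t_l: float, t_w: float) -> str:
    """Combine the two sub-model results into one device identification result.

    r_l, p_l: first identification result and its classification probability.
    r_w, p_w: second identification result and its classification probability.
    t_l, t_w: first and second judgment thresholds.
    """
    # Agreement between the sub-models is accepted directly.
    if r_l == r_w:
        return r_l
    first_passes = p_l >= t_l
    second_passes = p_w >= t_w
    # Exactly one sub-model is confident: trust that one.
    if first_passes and not second_passes:
        return r_l
    if second_passes and not first_passes:
        return r_w
    # Both confident but disagreeing, or neither confident.
    return UNIDENTIFIED
```

Note that when the two results disagree and both probabilities reach their thresholds, the result is unidentified rather than the more probable label, which matches the rule as stated.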
In some embodiments, referring to fig. 8, the device identifying apparatus 80 may further include an updating unit 805 configured to, if the device identifying result is unidentified, add the device information to be identified to a preset device model database, to obtain a new preset device model database; and updating and training the preset recognition model by utilizing the new preset equipment model database.
In some embodiments, the updating unit 805 is further configured to determine whether the device information in the invalid device information database is valid; and if the equipment information in the invalid equipment information database is judged to be valid, updating and training the preset identification model by using the equipment information.
It will be appreciated that in this embodiment, the "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present embodiment may be embodied, in essence or in part, in the form of a software product, which is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the method described in the present embodiment. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Accordingly, the present embodiment provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the device identification method of any of the preceding embodiments.
Based on the above-described composition of the device identification apparatus 80 and the computer storage medium, referring to fig. 9, a schematic diagram of the composition structure of an electronic device 90 according to an embodiment of the present application is shown. As shown in fig. 9, the electronic device 90 may include: a communication interface 901, a memory 902, and a processor 903; the various components are coupled together by a bus system 904. It is appreciated that the bus system 904 is used to facilitate connected communications between these components. The bus system 904 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration, the various buses are labeled as bus system 904 in fig. 9. The communication interface 901 is configured to receive and send signals in a process of receiving and sending information with other external network elements;
a memory 902 for storing a computer program capable of running on the processor 903;
the processor 903 is configured to perform, when running the computer program:
Acquiring equipment information to be identified;
inputting the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model;
identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
and determining a device identification result according to the first identification result and the second identification result.
It is to be appreciated that the memory 902 in embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 902 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 903 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry of hardware in the processor 903 or by instructions in the form of software. The processor 903 described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 902, and the processor 903 reads information in the memory 902 and, in combination with its hardware, performs the steps of the method described above.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 903 is further configured to perform the device identification method of any of the previous embodiments when running the computer program.
Based on the above-mentioned composition and hardware structure of the device identification apparatus 80, referring to fig. 10, a schematic composition structure of another electronic device 90 according to an embodiment of the present application is shown. As shown in fig. 10, the electronic device 90 includes at least the device identification means 80 according to any of the foregoing embodiments.
For the electronic device 90, the device information to be identified is identified by combining the single character matching sub-model and the character string matching sub-model, and the identification result of the device information to be identified is determined jointly according to the identification results of the two sub-models, so that the accuracy of identifying the device information is effectively improved, and the identification efficiency is also improved.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application.
It should be noted that, in this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
The methods disclosed in the several method embodiments provided in the present application may be arbitrarily combined without collision to obtain a new method embodiment.
The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or apparatus embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments or apparatus embodiments.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method of device identification, the method comprising:
acquiring equipment information to be identified;
inputting the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model;
Identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
and determining a device identification result according to the first identification result and the second identification result.
2. The method of claim 1, wherein after the obtaining the device information to be identified, the method further comprises:
matching the equipment information to be identified with equipment information in an invalid equipment information database;
if the matching is successful, determining the equipment information to be identified as invalid equipment information;
and if the matching is unsuccessful, the step of inputting the equipment information to be identified into a preset identification model is executed.
3. The method according to claim 1, wherein the method further comprises:
acquiring a sample training set from a preset equipment model database; the sample training set comprises at least one piece of equipment information and equipment labels corresponding to the at least one piece of equipment information, wherein the equipment labels at least comprise equipment models, equipment brands and equipment types;
training a single character matching sub-network by using the sample training set to obtain the single character matching sub-model, and training a character string matching sub-network by using the sample training set to obtain the character string matching sub-model;
and determining the preset recognition model according to the single character matching sub-model and the character string matching sub-model.
4. A method according to claim 3, characterized in that the method further comprises:
acquiring an original equipment information set, wherein the original equipment information set comprises at least one piece of equipment information;
performing data cleaning and data statistics on the equipment information in the original equipment information set to obtain an intermediate equipment information set and the occurrence number of each piece of equipment information in the intermediate equipment information set;
comparing the occurrence frequency of each piece of equipment information in the intermediate equipment information set with a preset frequency threshold value to obtain equipment information with the occurrence frequency larger than the preset frequency threshold value;
constructing an equipment information total database according to the equipment information of which the occurrence times are larger than the preset times threshold;
and grabbing the equipment information total database to obtain the preset equipment model database.
5. The method of claim 4, wherein the performing data cleaning and data statistics on the device information in the original device information set to obtain an intermediate device information set and a number of occurrences of each device information in the intermediate device information set includes:
rejecting invalid equipment information in the original equipment information set by using a preset expression to obtain at least one piece of valid equipment information;
counting the occurrence times of each piece of effective equipment information in the at least one piece of effective equipment information, and performing de-duplication processing on the at least one piece of effective equipment information to obtain at least one piece of target equipment information;
and constructing the intermediate equipment information set according to the at least one piece of target equipment information, and determining the occurrence frequency of each piece of equipment information in the intermediate equipment information set.
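
The cleaning, counting and threshold steps of claims 4 and 5 might look like the sketch below. The claims do not specify the preset expression, so the regular expression here (empty strings, the literal `unknown`, and bare MAC addresses) is purely an assumption, as are the function names `clean_and_count` and `frequent_entries`.

```python
import re
from collections import Counter

# Hypothetical "preset expression": entries that match are treated as invalid.
INVALID_PATTERN = re.compile(
    r"^(?:\s*|unknown|[0-9a-f]{2}(?::[0-9a-f]{2}){5})$", re.IGNORECASE
)


def clean_and_count(raw):
    """Drop invalid entries, then count and de-duplicate the valid ones.

    The Counter's keys form the intermediate equipment information set and
    its values are the occurrence counts.
    """
    stripped = (s.strip() for s in raw)
    valid = [s for s in stripped if not INVALID_PATTERN.match(s)]
    return Counter(valid)


def frequent_entries(counts, threshold):
    """Keep entries whose occurrence count is strictly greater than the threshold."""
    return [info for info, n in counts.items() if n > threshold]
```

For example, `clean_and_count(["iPhone", "iPhone ", "unknown", "MiBox"])` counts `iPhone` twice and `MiBox` once, and `frequent_entries(..., 1)` then keeps only `iPhone`.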
6. The method of claim 4, wherein the grabbing the equipment information total database to obtain the preset equipment model database includes:
respectively acquiring a seed Uniform Resource Locator (URL) of each piece of equipment information in the equipment information total database by utilizing a web crawler mode, and acquiring webpage information corresponding to the seed URL of each piece of equipment information by utilizing the web crawler mode;
determining a device tag corresponding to each piece of device information from the webpage information corresponding to the seed URL of each piece of device information;
and constructing the preset equipment model database according to each piece of equipment information and the equipment label corresponding to each piece of equipment information.
7. The method of claim 6, wherein the method further comprises:
if the seed URL cannot be successfully acquired for one piece of equipment information, storing the one piece of equipment information into an invalid equipment information database; and/or,
if the equipment label cannot be successfully determined for one piece of equipment information, storing the one piece of equipment information into an invalid equipment information database.
8. A method according to claim 3, characterized in that the method further comprises:
creating an equipment tag database according to the equipment tags corresponding to the at least one piece of equipment information;
correspondingly, the training the single character matching sub-network by using the sample training set to obtain the single character matching sub-model comprises the following steps:
performing single character segmentation on the equipment information in the sample training set to determine a single character word stock;
performing single character level vector conversion on the single character word stock and the sample training set to obtain a single character training vector set;
inputting the single character training vector set into the single character matching sub-network, and performing supervised iterative training by using the equipment tag database to obtain the single character matching sub-model;
training the character string matching sub-network by using the sample training set to obtain the character string matching sub-model, wherein the training comprises the following steps:
performing character string segmentation on the equipment information in the sample training set, and determining a character string word stock;
performing character string level vector conversion on the character string word stock and the sample training set to obtain a character string training vector set;
and inputting the character string training vector set into the character string matching sub-network, and performing supervised iterative training by using the equipment tag database to obtain the character string matching sub-model.
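
The two segmentation and vector-conversion paths of claim 8 can be illustrated with a small sketch. The claims do not say how character strings are delimited or how vectors are encoded, so this example assumes that strings are split on non-alphanumeric characters and that each token maps to an integer index, with 0 reserved for tokens outside the word stock. All function names are hypothetical.

```python
import re


def char_tokens(info):
    """Single character segmentation: one token per character."""
    return list(info)


def string_tokens(info):
    """Character string segmentation: split on non-alphanumeric delimiters (assumed)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", info) if t]


def build_word_stock(samples, tokenize):
    """Map every token seen in the training samples to a positive integer index."""
    stock = {}
    for sample in samples:
        for tok in tokenize(sample):
            stock.setdefault(tok, len(stock) + 1)  # index 0 is kept for unknown tokens
    return stock


def to_vector(info, tokenize, stock):
    """Convert equipment information to a list of word-stock indices."""
    return [stock.get(tok, 0) for tok in tokenize(info)]
```

The same `build_word_stock` and `to_vector` helpers serve both sub-networks; only the tokenizer differs, which mirrors how the claim describes two parallel pipelines over one sample training set.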
9. The method of claim 1, wherein the identifying the device information to be identified using the single character matching sub-model, determining a first identification result, comprises:
performing feature extraction and classification on the equipment information to be identified by using the single character matching sub-model, and determining a first probability that the equipment information to be identified belongs to each equipment label;
and selecting a maximum value from the first probability, and determining the equipment label corresponding to the maximum value as the first identification result.
10. The method of claim 1, wherein the identifying the device information to be identified using the string matching sub-model, determining a second identification result, comprises:
performing feature extraction and classification on the equipment information to be identified by using the character string matching sub-model, and determining a second probability that the equipment information to be identified belongs to each equipment label;
and selecting a maximum value from the second probability, and determining the equipment label corresponding to the maximum value as the second identification result.
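
The selection step shared by claims 9 and 10 amounts to taking the label with the highest probability. A minimal sketch, assuming each sub-model's output is available as a dictionary from equipment label to probability (the function name `top_label` is hypothetical):

```python
def top_label(probabilities):
    """Return the equipment label with the highest probability and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]
```

Returning the probability alongside the label is convenient because the later comparison against a judgment threshold needs exactly that value.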
11. The method of claim 1, wherein the determining a device identification result based on the first identification result and the second identification result comprises:
if the first identification result is equal to the second identification result, determining the first identification result as the equipment identification result;
if the first recognition result is not equal to the second recognition result, determining a first classification probability corresponding to the first recognition result and a second classification probability corresponding to the second recognition result, and determining the equipment recognition result according to a comparison result of the first classification probability and a first judgment threshold value and a comparison result of the second classification probability and a second judgment threshold value.
12. The method of claim 11, wherein the determining the device identification result based on the comparison of the first classification probability with a first decision threshold and the comparison of the second classification probability with a second decision threshold comprises:
if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is smaller than the second judgment threshold value, determining that the equipment identification result is the first identification result;
if the first classification probability is smaller than the first judgment threshold value and the second classification probability is larger than or equal to the second judgment threshold value, determining that the equipment identification result is the second identification result;
and if the first classification probability is greater than or equal to the first judgment threshold value and the second classification probability is greater than or equal to the second judgment threshold value, or the first classification probability is smaller than the first judgment threshold value and the second classification probability is smaller than the second judgment threshold value, determining that the equipment identification result is unidentified.
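
The decision rules of claims 11 and 12 can be written out as a short function. This is a sketch rather than the applicant's implementation; the function name `fuse`, the parameter names, and the `UNIDENTIFIED` sentinel are assumptions introduced for illustration.

```python
UNIDENTIFIED = "unidentified"  # hypothetical sentinel for an unidentified result


def fuse(first, p1, second, p2, t1, t2):
    """Combine the two sub-model results.

    first, second: labels from the single character and character string sub-models
    p1, p2:        their classification probabilities
    t1, t2:        the first and second judgment thresholds
    """
    if first == second:
        return first  # claim 11: agreeing results are accepted directly
    first_ok = p1 >= t1
    second_ok = p2 >= t2
    if first_ok and not second_ok:
        return first
    if second_ok and not first_ok:
        return second
    # Both above or both below their thresholds: the disagreement cannot be resolved.
    return UNIDENTIFIED
```

Note that when the results differ and both probabilities clear their thresholds, the outcome is still "unidentified": two confident but conflicting answers are treated the same way as two unconfident ones.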
13. The method according to any one of claims 1 to 12, wherein after the determining the device identification result, the method further comprises:
if the equipment identification result is unidentified, adding the equipment information to be identified into a preset equipment model database to obtain a new preset equipment model database;
and updating and training the preset recognition model by using the new preset equipment model database.
14. A device identification apparatus, characterized in that the device identification apparatus comprises an acquisition unit, an identification unit and a determination unit, wherein,
the acquisition unit is configured to acquire equipment information to be identified;
the identification unit is configured to input the equipment information to be identified into a preset identification model, wherein the preset identification model comprises a single character matching sub-model and a character string matching sub-model; identifying the equipment information to be identified by using the single character matching sub-model, determining a first identification result, identifying the equipment information to be identified by using the character string matching sub-model, and determining a second identification result;
the determining unit is configured to determine a device identification result according to the first identification result and the second identification result.
15. An electronic device comprising a memory and a processor, wherein,
the memory is configured to store a computer program capable of running on the processor;
the processor is configured to perform the device identification method of any one of claims 1 to 13 when the computer program is run.
16. A computer storage medium storing a computer program which, when executed by at least one processor, implements the device identification method of any one of claims 1 to 13.
CN202111253826.5A 2021-10-27 2021-10-27 Equipment identification method and device, electronic equipment and computer storage medium Pending CN116032741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111253826.5A CN116032741A (en) 2021-10-27 2021-10-27 Equipment identification method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111253826.5A CN116032741A (en) 2021-10-27 2021-10-27 Equipment identification method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN116032741A true CN116032741A (en) 2023-04-28

Family

ID=86076651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111253826.5A Pending CN116032741A (en) 2021-10-27 2021-10-27 Equipment identification method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN116032741A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934195A (en) * 2023-09-14 2023-10-24 海信集团控股股份有限公司 Commodity information checking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112199375B (en) Cross-modal data processing method and device, storage medium and electronic device
CN107943792B (en) Statement analysis method and device, terminal device and storage medium
US20140101124A1 (en) System and method for recursively traversing the internet and other sources to identify, gather, curate, adjudicate, and qualify business identity and related data
CN111797239B (en) Application program classification method and device and terminal equipment
CN112307762B (en) Search result sorting method and device, storage medium and electronic device
US20180004815A1 (en) Stop word identification method and apparatus
CN110458324B (en) Method and device for calculating risk probability and computer equipment
CN110909160A (en) Regular expression generation method, server and computer readable storage medium
CN111563382A (en) Text information acquisition method and device, storage medium and computer equipment
KR101472451B1 (en) System and Method for Managing Digital Contents
CN110659175A (en) Log trunk extraction method, log trunk classification method, log trunk extraction equipment and log trunk storage medium
CN114416998A (en) Text label identification method and device, electronic equipment and storage medium
CN113962199B (en) Text recognition method, text recognition device, text recognition equipment, storage medium and program product
CN111368867B (en) File classifying method and system and computer readable storage medium
CN116032741A (en) Equipment identification method and device, electronic equipment and computer storage medium
CN112395881B (en) Material label construction method and device, readable storage medium and electronic equipment
CN116756327B (en) Threat information relation extraction method and device based on knowledge inference and electronic equipment
CN112199388A (en) Strange call identification method and device, electronic equipment and storage medium
Dehdar et al. Image steganalysis using modified graph clustering based ant colony optimization and Random Forest
CN115953123A (en) Method, device and equipment for generating robot automation flow and storage medium
CN111310176B (en) Intrusion detection method and device based on feature selection
CN114398887A (en) Text classification method and device and electronic equipment
CN113657443A (en) Online Internet of things equipment identification method based on SOINN network
CN111460088A (en) Similar text retrieval method, device and system
CN104484414A (en) Processing method and device of favourite information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination