CN112800043A - Internet of things terminal information extraction method, device, equipment and storage medium - Google Patents

Internet of things terminal information extraction method, device, equipment and storage medium

Info

Publication number
CN112800043A
Authority
CN
China
Prior art keywords
information
internet
things
user agent
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110162802.2A
Other languages
Chinese (zh)
Inventor
彭世裕
赵一衡
凌宏喜
曹雄
杨学刚
邱卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kaitong Technology Co ltd
Original Assignee
Kaitong Technology Co ltd
Application filed by Kaitong Technology Co ltd
Priority to CN202110162802.2A
Publication of CN112800043A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/75 Information technology; Communication
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/20 Analytics; Diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for extracting terminal information of the Internet of things, which are applied to a server, wherein the server is in communication connection with a client, and the method comprises the following steps: acquiring user agent information corresponding to a client; inputting user agent information into a preset target information extraction model; wherein the target information extraction model comprises an encoding part and a decoding part; extracting output information and hidden layer information in the user agent information through the encoding part; when a start mark sent by a client is received, generating a plurality of pieces of information to be spliced through a decoding part according to the start mark, output information and hidden layer information; and splicing a plurality of pieces of information to be spliced, generating terminal information of the Internet of things and returning the terminal information of the Internet of things to the client. Therefore, the Internet of things terminal can be matched and identified more accurately, and further analysis of the Internet of things information is facilitated.

Description

Internet of things terminal information extraction method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of terminal information extraction, in particular to a method, a device, equipment and a storage medium for extracting terminal information of an internet of things.
Background
At present, most methods for extracting terminal information in the Internet of Things rely on regular expression matching. In this approach, the position of the terminal information within the User Agent (UA) is studied in advance, and a regular expression is written according to that position, a general rule or an industry protocol in order to match the terminal information. Regular expression matching is better suited to the Web field, where terminal information is well defined and standardized and terminals are parsed from the UA of each request.
However, Internet of Things UAs follow no overall unified specification; each manufacturer defines its own. When the specifications of different manufacturers differ greatly, or when a new manufacturer or a new terminal appears, some regular expressions are likely to need adding or rewriting, which makes maintenance very difficult. As a result, terminal information often fails to match during ETL (Extract-Transform-Load, a data warehouse technology), the terminal cannot be accurately identified, and further analysis of Internet of Things information is hindered.
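The maintenance problem described above can be illustrated with a minimal Python sketch. The pattern, the vendor layout and both UA strings are made up for illustration and are not taken from the patent; the sketch only shows how a regular expression written for one manufacturer's layout silently fails on another's.

```python
import re

# Hypothetical pattern written for one vendor's UA layout: "<Vendor>-<Model>/<version>".
VENDOR_A_PATTERN = re.compile(
    r"^(?P<vendor>[A-Za-z]+)-(?P<model>[A-Za-z0-9]+)/(?P<version>[\d.]+)"
)

def match_terminal(ua: str):
    """Return (vendor, model) if the UA follows the expected layout, else None."""
    m = VENDOR_A_PATTERN.match(ua)
    return (m.group("vendor"), m.group("model")) if m else None

# Made-up UA strings, for illustration only.
known_ua = "AcmeCam-X200/1.4.2 (Linux; armv7)"
new_vendor_ua = "fw=2.1;dev=SensorHub 5;mfr=Globex"

print(match_terminal(known_ua))       # ('AcmeCam', 'X200')
print(match_terminal(new_vendor_ua))  # None: a new expression would have to be written
```

Every new layout of this kind requires another hand-written expression, which is the maintenance cost the background section refers to.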
Disclosure of Invention
The invention provides an Internet of Things terminal information extraction method, device, equipment and storage medium, which solve the technical problems in the prior art that, because regular expression matching is applied to Internet of Things UAs, terminal information cannot be accurately matched, terminals are difficult to identify accurately, and further analysis of Internet of Things information is hindered.
The invention provides an Internet of things terminal information extraction method, which is applied to a server, wherein the server is in communication connection with a client, and the method comprises the following steps:
acquiring user agent information corresponding to the client;
inputting the user agent information into a preset target information extraction model; wherein the target information extraction model includes an encoding portion and a decoding portion;
extracting output information and hidden layer information in the user agent information through the encoding part;
when a start mark sent by the client is received, generating a plurality of pieces of information to be spliced according to the start mark, the output information and the hidden layer information through the decoding part;
and splicing a plurality of information to be spliced, generating terminal information of the Internet of things and returning the terminal information to the client.
Optionally, the encoding portion includes a data preprocessing component and a first recurrent neural network, and the step of extracting the output information and the hidden layer information in the user agent information by the encoding portion includes:
performing data preprocessing operation on the user agent information through the data preprocessing component to obtain a vector to be extracted;
and inputting the vector to be extracted into the first recurrent neural network to obtain output information and hidden layer information.
Optionally, the step of performing data preprocessing operation on the user agent information through the data preprocessing component to obtain a vector to be extracted includes:
performing data cleaning on the user agent information through the data cleaning layer to generate cleaning data;
and converting the cleaning data into a vector to be extracted through the word embedding layer.
Optionally, the decoding portion includes an attention component and a second recurrent neural network, and the step of generating, by the decoding portion, a plurality of pieces of information to be spliced according to the start flag, the output information, and the hidden layer information when receiving the start flag sent by the client includes:
when a start mark sent by the client is received, splicing the start mark and the hidden layer information through the attention component, and performing linear transformation to generate transformation information;
determining attention information corresponding to the user agent information according to an inner product result of the transformation information and the output information through the attention component;
inputting the attention information into the second recurrent neural network to obtain intermediate information and recurrent hidden layer information;
updating the output information and the start mark into the intermediate information, updating the hidden layer information into the cyclic hidden layer information, skipping to execute the step of splicing the start mark and the hidden layer information through the attention component and performing linear transformation to generate transformation information until a preset amount of intermediate information is obtained;
and extracting all the intermediate information to obtain a plurality of pieces of information to be spliced.
Optionally, the method further comprises:
acquiring training data;
and training a preset initial information extraction model by using the training data to obtain a target information extraction model.
Optionally, the step of training a preset initial information extraction model by using the training data to obtain a target information extraction model includes:
inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof;
constructing a loss function and calculating a loss value according to the extraction information and the classification probability;
and if the loss value is greater than a preset threshold value, generating the target information extraction model.
Optionally, the method further comprises:
if the loss function is smaller than the preset threshold value, adjusting the model parameters of the initial information extraction model by adopting a gradient descent method;
and skipping to execute the step of inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof until the loss function is greater than the preset threshold value to obtain the target information extraction model.
A second aspect of the invention provides an Internet of Things terminal information extraction device, which is applied to a server, wherein the server is in communication connection with a client, and the device comprises:
the user agent information acquisition module is used for acquiring user agent information corresponding to the client;
the information input module is used for inputting the user agent information into a preset target information extraction model; wherein the target information extraction model includes an encoding portion and a decoding portion;
the information extraction module is used for extracting output information and hidden layer information in the user agent information through the coding part;
the information decoding module is used for generating a plurality of pieces of information to be spliced according to the starting mark, the output information and the hidden layer information through the decoding part when the starting mark sent by the client is received;
and the information splicing module is used for splicing a plurality of information to be spliced, generating terminal information of the Internet of things and returning the terminal information of the Internet of things to the client.
The third aspect of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for extracting terminal information of an internet of things according to any one of the first aspect of the present invention.
The fourth aspect of the present invention also provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for extracting terminal information of the internet of things according to any one of the first aspect of the present invention.
According to the technical scheme, the invention has the following advantages:
The server acquires the user agent information corresponding to the client and inputs it into a preset target information extraction model; the encoding part of the target information extraction model extracts output information and hidden layer information from the user agent information. After a start mark sent by the client is received, the decoding part performs information prediction using the output information, the hidden layer information and the start mark to generate a plurality of pieces of information to be spliced. The pieces of information to be spliced are then spliced to generate the Internet of Things terminal information, which is returned to the client. This solves the technical problems in the prior art that, because regular expression matching is applied to Internet of Things UAs, terminal information cannot be accurately matched, terminals are difficult to identify accurately, and further analysis of Internet of Things information is hindered; the Internet of Things terminal can therefore be matched and identified more accurately.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart illustrating steps of a method for extracting information from a terminal of an internet of things according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating steps of a method for extracting information from a terminal of the internet of things according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of a training process of a target information extraction model according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram of a decoding process according to a second embodiment of the present invention;
Fig. 5 is a block diagram of a structure of an internet of things terminal information extraction device according to a third embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an Internet of Things terminal information extraction method, device, equipment and storage medium, and aims to solve the technical problems in the prior art that, because regular expression matching is applied to Internet of Things UAs, terminal information cannot be accurately matched, terminals are difficult to identify accurately, and further analysis of Internet of Things information is hindered.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for extracting information from a terminal of an internet of things according to an embodiment of the present invention.
The invention provides an Internet of things terminal information extraction method, which is applied to a server, wherein the server is in communication connection with a client, and the method comprises the following steps:
Step 101, obtaining user agent information corresponding to the client;
The user agent information refers to the User Agent (UA for short), a special string header that lets the server identify the operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plug-ins and the like used by the client.
In the embodiment of the invention, when a user needs to extract the Internet of Things terminal information of the client, the server can acquire the corresponding user agent information from the client.
Step 102, inputting the user agent information into a preset target information extraction model; wherein the target information extraction model includes an encoding portion and a decoding portion;
After the user agent information is acquired, it is input into a preset target information extraction model to extract the Internet of Things terminal information it contains, wherein the target information extraction model comprises an encoding part and a decoding part.
Step 103, extracting output information and hidden layer information in the user agent information through the encoding part;
The encoding part of the target information extraction model extracts information from the user agent information to obtain the output information and hidden layer information.
Step 104, when a start mark sent by the client is received, generating a plurality of pieces of information to be spliced according to the start mark, the output information and the hidden layer information through the decoding part;
After the output information and hidden layer information of the encoding part are obtained, if a start mark input by a user from the client is received, the decoding part decodes and predicts according to the start mark, the output information and the hidden layer information to generate a plurality of pieces of information to be spliced.
Step 105, splicing a plurality of pieces of information to be spliced, generating terminal information of the Internet of things and returning the terminal information of the Internet of things to the client.
In a specific implementation, after the information to be spliced is obtained, it can be spliced in output order to obtain the Internet of Things terminal information, which is returned to the client so that the client can be further analyzed subsequently.
In the embodiment of the invention, the server acquires the user agent information corresponding to the client and inputs it into a preset target information extraction model; the encoding part of the target information extraction model extracts output information and hidden layer information from the user agent information. After a start mark sent by the client is received, the decoding part performs information prediction using the output information, the hidden layer information and the start mark to generate a plurality of pieces of information to be spliced. The pieces of information to be spliced are then spliced to generate the Internet of Things terminal information, which is returned to the client. This solves the technical problems in the prior art that, because regular expression matching is applied to Internet of Things UAs, terminal information cannot be accurately matched, terminals are difficult to identify accurately, and further analysis of Internet of Things information is hindered; the Internet of Things terminal can therefore be matched and identified more accurately.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a method for extracting information from a terminal of an internet of things according to a second embodiment of the present invention.
The invention provides an Internet of things terminal information extraction method, which is applied to a server, wherein the server is in communication connection with a client, and the method comprises the following steps:
Step 201, acquiring training data;
Training data refers to user agent information that has been cleaned and then converted into vectors by a word embedding layer. In the training data, a first label [ sep ] and a second label [ cls ] mark the beginning and the end of a terminal name respectively, and data that contains no terminal name is marked with a third label, such as [ None ]. Some training data contains terminals of several different manufacturers or different models; such data is considered to be possibly forged and is labeled [ may protocol ].
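The labeling scheme can be sketched as follows. The sketch assumes that the first and second labels wrap the terminal name to form the decoder's target sequence, which is one plausible reading of the paragraph above; the function name `make_training_pair` and the sample UA strings are hypothetical.

```python
from typing import Optional, Tuple

FIRST_LABEL = "[ sep ]"   # marks the beginning of a terminal name
SECOND_LABEL = "[ cls ]"  # marks the end of a terminal name
THIRD_LABEL = "[ None ]"  # used when the UA contains no terminal name

def make_training_pair(ua: str, terminal_name: Optional[str]) -> Tuple[str, str]:
    """Return (input UA, labeled target) for one training sample (assumed layout)."""
    if not terminal_name or terminal_name not in ua:
        return ua, THIRD_LABEL
    return ua, f"{FIRST_LABEL}{terminal_name}{SECOND_LABEL}"

# Made-up samples, for illustration only.
print(make_training_pair("AcmeCam-X200/1.4.2", "X200"))
print(make_training_pair("curl/7.0", None))
```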
Step 202, training a preset initial information extraction model by using the training data to obtain a target information extraction model.
Optionally, step 202 may include the following sub-steps:
inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof;
constructing a loss function and calculating a loss value according to the extraction information and the classification probability;
and if the loss value is greater than a preset threshold value, generating the target information extraction model.
In the embodiment of the invention, the training data is input into the preset initial information extraction model, with the ASCII code table used as the code table of the model. The training data is converted into vectors by the word embedding layer, and the vectors are input into a GRU (Gated Recurrent Unit) recurrent neural network for information extraction, yielding the output information and hidden layer information of the GRU. The output information and the hidden layer information are then input to the decoder for decoding.
In actual training, a newly initialized model cannot predict the next word after being given the start mark. If the model's own prediction is fed back into the model to predict the next word, the input deviates greatly from the real data, which hinders training. Teacher forcing is therefore adopted in the training of the first n epochs: regardless of the model's prediction, the real result is forced into the model to guide training, and teacher forcing ends once accuracy on the training set reaches 95%. Finally, NLLLoss (Negative Log Likelihood Loss) is adopted as the loss function to guide the gradient descent of the model. The NLLLoss formula is as follows:
$$\mathrm{loss}(p, x) = -\sum x \log(p)$$
where x is the class of classification and p is the probability of classifying as x.
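As a worked example of this formula, the following sketch treats x as a one-hot indicator over the classes (an assumption made here so that the sum can be evaluated term by term) and p as the predicted probability of each class. The probability values are made up.

```python
import math

def nll_loss(probs, one_hot):
    """loss(p, x) = -sum(x * log(p)), summed over all classes."""
    # Terms with x == 0 contribute nothing, so they are skipped.
    return -sum(x * math.log(p) for p, x in zip(probs, one_hot) if x != 0)

target = [1, 0, 0]                      # the true class is class 0
loss = nll_loss([0.7, 0.2, 0.1], target)
better = nll_loss([0.9, 0.05, 0.05], target)

print(loss)    # equals -log(0.7), roughly 0.357
print(better)  # a more confident correct prediction gives a smaller loss
```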
It is worth mentioning that an end flag may also be entered to exit the decoding loop of the model.
Further, step 202 may also include the following sub-steps:
if the loss function is smaller than the preset threshold value, adjusting the model parameters of the initial information extraction model by adopting a gradient descent method;
and skipping to execute the step of inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof until the loss function is greater than the preset threshold value to obtain the target information extraction model.
Optionally, after the target information extraction model is obtained, additional verification data may be input, and after the information extraction of the verification data is performed through the target information extraction model, if the obtained terminal information of the internet of things is completely the same as the real terminal information corresponding to the verification data, it is indicated that the target information extraction model is trained; if the difference exists, continuing to train the model.
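The verification step can be sketched as an exact-match check. The names `passes_validation` and `toy_extract` are hypothetical, and the toy extractor merely stands in for the trained target information extraction model.

```python
from typing import Callable, Iterable, Tuple

def passes_validation(
    extract: Callable[[str], str],
    validation_pairs: Iterable[Tuple[str, str]],
) -> bool:
    """True only if every extracted result is identical to the real terminal information."""
    return all(extract(ua) == expected for ua, expected in validation_pairs)

# Stand-in for the trained model: takes the text before the first "/".
def toy_extract(ua: str) -> str:
    return ua.split("/")[0]

print(passes_validation(toy_extract, [("X200/1.0", "X200")]))  # True
print(passes_validation(toy_extract, [("X200/1.0", "X300")]))  # False: keep training
```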
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a training process of a target information extraction model according to a second embodiment of the present invention.
In the embodiment of the invention, the initial information extraction model has an Encoder-Decoder (encoding-decoding) structure. The encoding part Encoder receives the user agent information Input, converts it into a vector through the word embedding layer Embedding, and inputs the vector into the recurrent neural network GRU to obtain the output information Out and the hidden layer information Hidden, which are passed to the decoding part Decoder. When the attention component Attention receives the start flag Start pocket, the output information Out and the hidden layer information Hidden, it produces the attention information and inputs it to the recurrent neural network GRU, which outputs the hidden layer information Hidden and the intermediate information top; these are fed back into the attention component Attention. After a predetermined number of iterations, a plurality of pieces of intermediate information top are obtained as Output, and the Internet of Things terminal information Target is obtained after splicing.
Optionally, teacher forcing is used in the training of the first n epochs: regardless of the model's prediction, the real result Target is forcibly input into the recurrent neural network GRU to guide training, yielding the target information extraction model.
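Teacher forcing can be sketched independently of any particular network. In the sketch below, `step_fn` is a hypothetical stand-in for one decoder step, and combining the "first n epochs" rule with the 95% training accuracy rule in `use_teacher_forcing` is an assumption about how the two conditions interact.

```python
from typing import Callable, List

def decode_for_training(
    step_fn: Callable[[str], str],
    target: List[str],
    start_flag: str,
    teacher_forcing: bool,
) -> List[str]:
    """Run one decoding pass and return the prediction made at each step."""
    predictions = []
    decoder_input = start_flag
    for true_token in target:
        predicted = step_fn(decoder_input)
        predictions.append(predicted)
        # Teacher forcing: feed the real token, whatever the model predicted.
        decoder_input = true_token if teacher_forcing else predicted
    return predictions

def use_teacher_forcing(epoch: int, train_accuracy: float, n_epochs: int,
                        accuracy_limit: float = 0.95) -> bool:
    """Assumed rule: active during the first n epochs and until 95% training accuracy."""
    return epoch < n_epochs and train_accuracy < accuracy_limit

# An untrained toy "model" that only echoes its input in lower case.
echo = str.lower
target = ["X", "2", "0", "0"]
forced = decode_for_training(echo, target, "[ sep ]", teacher_forcing=True)
free = decode_for_training(echo, target, "[ sep ]", teacher_forcing=False)
print(forced)  # each step sees the real previous token
print(free)    # each step sees only the model's own earlier output
```

Without teacher forcing, the toy model never sees anything except its own first output, which mirrors the drift away from real data described above.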
Step 203, obtaining user agent information corresponding to the client;
Step 204, inputting the user agent information into a preset target information extraction model; wherein the target information extraction model includes an encoding portion and a decoding portion;
In the embodiment of the present invention, the specific implementation of steps 203-204 is similar to that of steps 101-102 and is not described again here.
Step 205, extracting output information and hidden layer information in the user agent information through the encoding part;
In one example of the present invention, the encoding portion includes a data preprocessing component and a first recurrent neural network, and step 205 may include the following sub-steps S11-S12:
S11, performing data preprocessing operation on the user agent information through the data preprocessing component to obtain a vector to be extracted;
Further, step S11 may include the following sub-steps:
performing data cleaning on the user agent information through the data cleaning layer to generate cleaning data;
and converting the cleaning data into a vector to be extracted through the word embedding layer.
Data cleaning (data cleansing) refers to the process of reviewing and verifying data in order to delete duplicate information, correct existing errors and ensure data consistency. It may specifically include checking data consistency, handling invalid values and missing values, and the like.
In the embodiment of the invention, because the user agent information does not contain characters outside the ASCII code table, data cleaning can be performed on the user agent information to obtain cleaning data, and the cleaning data is then converted by the word embedding layer to obtain the vector to be extracted.
S12, inputting the vector to be extracted into the first recurrent neural network to obtain output information and hidden layer information.
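Sub-steps S11 and S12 can be sketched in plain Python as below. This is a minimal sketch under several assumptions: cleaning is reduced to dropping non-ASCII characters and trimming whitespace, the embedding table and GRU weights are random and untrained, the layer sizes are arbitrary, and biases are omitted. The GRU cell follows the common textbook formulation.

```python
import math
import random

VOCAB_SIZE = 128  # ASCII code table used as the model's code table
EMBED_DIM = 4     # illustrative size (assumption)
HIDDEN_DIM = 3    # illustrative size (assumption)
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def clean(ua):
    """Data cleaning (assumed form): drop non-ASCII characters, trim whitespace."""
    return "".join(ch for ch in ua if ord(ch) < VOCAB_SIZE).strip()

embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)  # untrained word embedding layer

def embed(text):
    """Look up one embedding vector per character, indexed by ASCII code."""
    return [embedding[ord(ch)] for ch in text]

# One (input, hidden) weight pair per gate: update z, reset r, candidate n.
W = {g: rand_matrix(HIDDEN_DIM, EMBED_DIM) for g in "zrn"}
U = {g: rand_matrix(HIDDEN_DIM, HIDDEN_DIM) for g in "zrn"}

def gru_step(x, h):
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(W["n"], x), matvec(U["n"], rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def encode(ua):
    """Return (output information, hidden layer information) for one UA string."""
    h = [0.0] * HIDDEN_DIM
    outputs = []
    for x in embed(clean(ua)):
        h = gru_step(x, h)
        outputs.append(h)
    return outputs, h

outputs, hidden = encode("AcmeCam-X200/1.4.2 \u4e2d")  # made-up UA with one non-ASCII character
print(len(outputs), len(hidden))
```

The output information holds one state per input character, while the hidden layer information is the final state; both are what the decoding part consumes in step 206.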
Step 206, when a start mark sent by the client is received, generating, through the decoding portion, a plurality of pieces of information to be spliced according to the start mark, the output information and the hidden layer information;
In another example of the present invention, the decoding portion includes an attention component and a second recurrent neural network, and step 206 may include the following sub-steps:
when a start mark sent by the client is received, splicing the start mark and the hidden layer information through the attention component, and performing a linear transformation to generate transformation information;
determining, through the attention component, attention information corresponding to the user agent information according to the inner product of the transformation information and the output information;
inputting the attention information into the second recurrent neural network to obtain intermediate information and recurrent hidden layer information;
updating the output information and the start mark to the intermediate information, updating the hidden layer information to the recurrent hidden layer information, and returning to the step of splicing the start mark and the hidden layer information through the attention component and performing a linear transformation to generate transformation information, until a preset number of pieces of intermediate information is obtained;
and extracting all the intermediate information to obtain the plurality of pieces of information to be spliced.
In the embodiment of the invention, when a start mark sent by the client is received, the attention component splices the start mark with the hidden layer information and applies a linear transformation to the spliced information to generate transformation information; the inner product of the transformation information and the output information is computed, and the attention information corresponding to the user agent information is determined from the inner product result; the attention information is input into the second recurrent neural network to obtain intermediate information and recurrent hidden layer information; the output information and the start mark are updated to the intermediate information, and the hidden layer information is updated to the recurrent hidden layer information. After these steps have been repeated, a plurality of pieces of intermediate information are obtained and are determined to be the plurality of pieces of information to be spliced.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a decoding process according to an embodiment of the invention.
After the decoder receives the output information and the hidden layer information, it splices the start mark with the hidden layer information and applies a linear transformation; the inner product of the resulting transformation information and the output information gives the attention information in the training data, which is input into a GRU (gated recurrent unit) for decoding to obtain the first character H and the first hidden layer information corresponding to the first character. The first character and the first hidden layer information are then input into the GRU model and the same operation is repeated a preset number of times to obtain the characters H, u, a, W, e and i, which are spliced to obtain the extracted information HuaWei.
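The decoding loop described above can be sketched as follows. This is a hedged illustration under several assumptions rather than the method of the patent: weights are random instead of trained, the GRU cell has no bias terms, the inner products are normalised with a softmax, the predicted token is fed back as the next "start mark", and the encoder outputs are kept fixed across iterations. All names (`attend`, `gru_step`, `decode`, `params`) are hypothetical.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(token_vec, hidden, enc_outputs, w_attn):
    """Splice token and hidden state, transform linearly, then weight the encoder outputs."""
    query = matvec(w_attn, token_vec + hidden)  # splicing followed by linear transformation
    scores = [sum(q * o for q, o in zip(query, out)) for out in enc_outputs]  # inner products
    weights = softmax(scores)
    dim = len(enc_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, enc_outputs)) for k in range(dim)]

def gru_step(x, h, w_z, w_r, w_n):
    """One bias-free GRU update with update gate z, reset gate r and candidate n."""
    z = [sigmoid(v) for v in matvec(w_z, x + h)]
    r = [sigmoid(v) for v in matvec(w_r, x + h)]
    n = [math.tanh(v) for v in matvec(w_n, x + [ri * hi for ri, hi in zip(r, h)])]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def decode(enc_outputs, hidden, params, start_id, steps):
    """Repeat attention + GRU for a preset number of steps; return the predicted token ids."""
    token, pieces = start_id, []
    for _ in range(steps):
        context = attend(params["embed"][token], hidden, enc_outputs, params["w_attn"])
        hidden = gru_step(context, hidden, params["w_z"], params["w_r"], params["w_n"])
        logits = matvec(params["w_out"], hidden)
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        pieces.append(token)
    return pieces

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB, EMBED, HIDDEN = 8, 4, 3  # toy sizes chosen for illustration
params = {
    "embed": rand_matrix(rng, VOCAB, EMBED),
    "w_attn": rand_matrix(rng, HIDDEN, EMBED + HIDDEN),
    "w_z": rand_matrix(rng, HIDDEN, 2 * HIDDEN),
    "w_r": rand_matrix(rng, HIDDEN, 2 * HIDDEN),
    "w_n": rand_matrix(rng, HIDDEN, 2 * HIDDEN),
    "w_out": rand_matrix(rng, VOCAB, HIDDEN),
}
enc_outputs = rand_matrix(rng, 5, HIDDEN)  # stands in for the encoder's output information
pieces = decode(enc_outputs, enc_outputs[-1], params, start_id=0, steps=6)
```

With untrained weights the six token ids are meaningless; the point is only the control flow: splice, transform, take inner products, run the recurrent cell, feed the result back.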
Step 207, splicing the plurality of pieces of information to be spliced to generate Internet of things terminal information, and returning the Internet of things terminal information to the client.
After the plurality of pieces of information to be spliced are obtained, they are spliced in the order in which they were output to obtain the Internet of things terminal information, which is returned to the client for display so that it can be processed further.
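The splicing in step 207 amounts to concatenating the decoded pieces in output order. A minimal sketch, assuming the decoder emits integer ids and that a hypothetical `id_to_char` table maps them back to characters:

```python
from typing import Dict, Iterable

def splice(piece_ids: Iterable[int], id_to_char: Dict[int, str]) -> str:
    """Concatenate decoded pieces in the order the decoder produced them."""
    return "".join(id_to_char[i] for i in piece_ids)

# Hypothetical id table for the HuaWei example given in the description.
id_to_char = {0: "H", 1: "u", 2: "a", 3: "W", 4: "e", 5: "i"}
terminal_info = splice([0, 1, 2, 3, 4, 5], id_to_char)
print(terminal_info)  # HuaWei
```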
In the embodiment of the invention, the server acquires the user agent information corresponding to the client and inputs it into a preset target information extraction model, so that the encoding portion of the model extracts output information and hidden layer information from the user agent information; after a start mark sent by the client is received, the decoding portion performs information prediction using the output information, the hidden layer information and the start mark to generate a plurality of pieces of information to be spliced; the pieces of information to be spliced are then spliced to generate Internet of things terminal information, which is returned to the client. This solves the technical problem in the prior art that, because Internet of things UA information is processed by regular-expression matching, terminal information cannot be matched accurately and terminals are difficult to identify accurately, which hinders further analysis of Internet of things information.
Referring to fig. 5, fig. 5 is a block diagram of an Internet of things terminal information extraction device according to a third embodiment of the present invention.
The invention provides an Internet of things terminal information extraction device, which is applied to a server, wherein the server is in communication connection with a client, and the device comprises:
a user agent information obtaining module 501, configured to obtain user agent information corresponding to the client;
an information input module 502, configured to input the user agent information into a preset target information extraction model; wherein the target information extraction model includes an encoding portion and a decoding portion;
an information extraction module 503, configured to extract output information and hidden layer information in the user agent information through the encoding portion;
an information decoding module 504, configured to generate, by the decoding portion, a plurality of pieces of information to be spliced according to the start flag, the output information, and the hidden layer information when receiving the start flag sent by the client;
and an information splicing module 505, configured to splice the plurality of pieces of information to be spliced, generate Internet of things terminal information, and return the Internet of things terminal information to the client.
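One way to picture how modules 501-505 cooperate is a small pipeline object. The sketch below is an assumption-laden illustration, not the device itself: the class name, the method names, reading the user agent from an HTTP `User-Agent` header, and the toy `toy_encode` / `toy_decode` stand-ins for the trained encoding and decoding portions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (output information, hidden layer information)
Encoded = Tuple[List[List[float]], List[float]]

@dataclass
class TerminalInfoExtractor:
    encode: Callable[[str], Encoded]             # stands in for the encoding portion
    decode: Callable[[Encoded, str], List[str]]  # stands in for the decoding portion

    def get_user_agent(self, headers: Dict[str, str]) -> str:
        """Role of module 501: obtain the client's user agent information."""
        return headers.get("User-Agent", "")

    def extract(self, user_agent: str) -> Encoded:
        """Roles of modules 502-503: feed the model and run the encoding portion."""
        return self.encode(user_agent)

    def decode_pieces(self, encoded: Encoded, start_mark: str) -> List[str]:
        """Role of module 504: generate the pieces once the start mark arrives."""
        return self.decode(encoded, start_mark)

    def splice(self, pieces: List[str]) -> str:
        """Role of module 505: splice the pieces into the terminal information."""
        return "".join(pieces)

    def handle(self, headers: Dict[str, str], start_mark: str) -> str:
        user_agent = self.get_user_agent(headers)
        encoded = self.extract(user_agent)
        return self.splice(self.decode_pieces(encoded, start_mark))

def toy_encode(user_agent: str) -> Encoded:
    # Placeholder: a real encoder would return learned representations.
    return [[float(len(user_agent))]], [0.0]

def toy_decode(encoded: Encoded, start_mark: str) -> List[str]:
    # Placeholder: a real decoder would predict these characters one by one.
    return list("HuaWei")

extractor = TerminalInfoExtractor(encode=toy_encode, decode=toy_decode)
result = extractor.handle({"User-Agent": "Mozilla/5.0 (Linux; Android 10; HUAWEI)"}, "<s>")
```

Passing the encoder and decoder in as callables keeps the pipeline independent of any particular model, which mirrors the logical (rather than physical) division into modules.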
Optionally, the encoding portion includes a data preprocessing component and a first recurrent neural network, and the information extraction module 503 includes:
a to-be-extracted vector conversion submodule, configured to perform a data preprocessing operation on the user agent information through the data preprocessing component to obtain a vector to be extracted;
and a vector input submodule, configured to input the vector to be extracted into the first recurrent neural network to obtain output information and hidden layer information.
Optionally, the data preprocessing component includes a data cleaning layer and a word embedding layer, and the to-be-extracted vector conversion submodule includes:
a data cleaning unit, configured to perform data cleaning on the user agent information through the data cleaning layer to generate cleaning data;
and a data conversion unit, configured to convert the cleaning data into the vector to be extracted through the word embedding layer.
Optionally, the decoding portion includes an attention component and a second recurrent neural network, and the information decoding module 504 includes:
a transformation information generation submodule, configured to, when the start mark sent by the client is received, splice the start mark and the hidden layer information through the attention component and perform a linear transformation to generate transformation information;
an attention information determination submodule, configured to determine, through the attention component, attention information corresponding to the user agent information according to the inner product of the transformation information and the output information;
an attention information input submodule, configured to input the attention information into the second recurrent neural network to obtain intermediate information and recurrent hidden layer information;
an information updating submodule, configured to update the output information and the start mark to the intermediate information, update the hidden layer information to the recurrent hidden layer information, and return to the step of splicing the start mark and the hidden layer information through the attention component and performing a linear transformation to generate transformation information, until a preset number of pieces of intermediate information is obtained;
and an intermediate information extraction submodule, configured to extract all the intermediate information to obtain the plurality of pieces of information to be spliced.
Optionally, the device further comprises:
a training data acquisition module, configured to acquire training data;
and a training module, configured to train a preset initial information extraction model with the training data to obtain the target information extraction model.
Optionally, the training module comprises:
an extraction information generation submodule, configured to input the training data into the preset initial information extraction model to obtain extraction information and its classification probability;
a loss function construction submodule, configured to construct a loss function and calculate a loss value according to the extraction information and the classification probability;
and a first model judgment submodule, configured to generate the target information extraction model if the loss value is greater than a preset threshold value.
Optionally, the training module further comprises:
a parameter adjustment submodule, configured to adjust the model parameters of the initial information extraction model by a gradient descent method if the loss value is smaller than the preset threshold value;
and a second model judgment submodule, configured to return to the step of inputting the training data into the preset initial information extraction model to obtain extraction information and its classification probability, until the loss value is greater than the preset threshold value, to obtain the target information extraction model.
The embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for extracting information from a terminal of an internet of things according to any embodiment of the present invention.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the Internet of things terminal information extraction method according to any embodiment of the invention is implemented.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An Internet of things terminal information extraction method, applied to a server in communication connection with a client, the method comprising:
acquiring user agent information corresponding to the client;
inputting the user agent information into a preset target information extraction model, wherein the target information extraction model includes an encoding portion and a decoding portion;
extracting output information and hidden layer information from the user agent information through the encoding portion;
when a start mark sent by the client is received, generating, through the decoding portion, a plurality of pieces of information to be spliced according to the start mark, the output information and the hidden layer information;
and splicing the plurality of pieces of information to be spliced to generate Internet of things terminal information, and returning the Internet of things terminal information to the client.
2. The method for extracting terminal information of the internet of things as claimed in claim 1, wherein the encoding part comprises a data preprocessing component and a first recurrent neural network, and the step of extracting the output information and the hidden layer information in the user agent information through the encoding part comprises:
performing data preprocessing operation on the user agent information through the data preprocessing component to obtain a vector to be extracted;
and inputting the vector to be extracted into the first recurrent neural network to obtain output information and hidden layer information.
3. The method for extracting terminal information of the internet of things according to claim 2, wherein the data preprocessing component comprises a data cleaning layer and a word embedding layer, and the step of performing data preprocessing operation on the user agent information through the data preprocessing component to obtain the vector to be extracted comprises the steps of:
performing data cleaning on the user agent information through the data cleaning layer to generate cleaning data;
and converting the cleaning data into a vector to be extracted through the word embedding layer.
4. The method for extracting terminal information of the internet of things according to claim 1, wherein the decoding part comprises an attention component and a second recurrent neural network, and the step of generating a plurality of pieces of information to be spliced by the decoding part according to the start mark, the output information and the hidden layer information when the start mark sent by the client is received comprises:
when a start mark sent by the client is received, splicing the start mark and the hidden layer information through the attention component, and performing linear transformation to generate transformation information;
determining attention information corresponding to the user agent information according to an inner product result of the transformation information and the output information through the attention component;
inputting the attention information into the second recurrent neural network to obtain intermediate information and recurrent hidden layer information;
updating the output information and the start mark to the intermediate information, updating the hidden layer information to the recurrent hidden layer information, and returning to the step of splicing the start mark and the hidden layer information through the attention component and performing linear transformation to generate transformation information, until a preset number of pieces of intermediate information is obtained;
and extracting all the intermediate information to obtain a plurality of pieces of information to be spliced.
5. The method for extracting the terminal information of the internet of things according to claim 1, further comprising:
acquiring training data;
and training a preset initial information extraction model by using the training data to obtain a target information extraction model.
6. The internet of things terminal information extraction method according to claim 5, wherein the step of training a preset initial information extraction model by using the training data to obtain a target information extraction model comprises:
inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof;
constructing a loss function and calculating a loss value according to the extraction information and the classification probability;
and if the loss value is greater than a preset threshold value, generating the target information extraction model.
7. The internet of things terminal information extraction method of claim 6, further comprising:
if the loss function is smaller than the preset threshold value, adjusting the model parameters of the initial information extraction model by adopting a gradient descent method;
and skipping to execute the step of inputting the training data into the preset initial information extraction model to obtain extraction information and classification probability thereof until the loss function is greater than the preset threshold value to obtain the target information extraction model.
8. An Internet of things terminal information extraction device, applied to a server in communication connection with a client, the device comprising:
a user agent information acquisition module, configured to acquire user agent information corresponding to the client;
an information input module, configured to input the user agent information into a preset target information extraction model, wherein the target information extraction model includes an encoding portion and a decoding portion;
an information extraction module, configured to extract output information and hidden layer information from the user agent information through the encoding portion;
an information decoding module, configured to generate, through the decoding portion, a plurality of pieces of information to be spliced according to the start mark, the output information and the hidden layer information when the start mark sent by the client is received;
and an information splicing module, configured to splice the plurality of pieces of information to be spliced, generate Internet of things terminal information, and return the Internet of things terminal information to the client.
9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the method for extracting terminal information of internet of things according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the Internet of things terminal information extraction method according to any one of claims 1 to 7.
CN202110162802.2A 2021-02-05 2021-02-05 Internet of things terminal information extraction method, device, equipment and storage medium Pending CN112800043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110162802.2A CN112800043A (en) 2021-02-05 2021-02-05 Internet of things terminal information extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110162802.2A CN112800043A (en) 2021-02-05 2021-02-05 Internet of things terminal information extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112800043A true CN112800043A (en) 2021-05-14

Family

ID=75814423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110162802.2A Pending CN112800043A (en) 2021-02-05 2021-02-05 Internet of things terminal information extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112800043A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256436A (en) * 2021-07-02 2021-08-13 平安科技(深圳)有限公司 Vehicle insurance claim payment pre-prompting method, device, equipment and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051672A (en) * 2012-11-21 2013-04-17 中兴通讯股份有限公司 Terminal information obtaining method and device in heterogeneous terminal environment
CN103678393A (en) * 2012-09-20 2014-03-26 腾讯科技(深圳)有限公司 Information obtaining method and device
CN103873477A (en) * 2014-03-27 2014-06-18 江苏物联网研究发展中心 Access authentication method based on two-dimension code and asymmetric encryption in agricultural material Internet of Things
CN104079608A (en) * 2013-03-29 2014-10-01 株式会社日立制作所 Proxy module equipment for Internet of things and method thereof
CN104144180A (en) * 2013-05-07 2014-11-12 中兴通讯股份有限公司 Internet-of-things management method, internet-of-things client side and internet-of-things platform
CN106230917A (en) * 2016-07-26 2016-12-14 广东凯通科技股份有限公司 A kind of batch data communication means, device and system
CN106650256A (en) * 2016-12-20 2017-05-10 安徽安龙基因医学检验所有限公司 Precise medical platform for molecular diagnosis and treatment
KR20180003665A (en) * 2016-06-30 2018-01-10 전자부품연구원 Method for web service by apparatus for managing factories in internet of things
WO2018184418A1 (en) * 2017-04-06 2018-10-11 平安科技(深圳)有限公司 Data cleaning method, terminal and computer readable storage medium
CN110309407A (en) * 2018-03-13 2019-10-08 优酷网络技术(北京)有限公司 Viewpoint extracting method and device
WO2020107878A1 (en) * 2018-11-30 2020-06-04 平安科技(深圳)有限公司 Method and apparatus for generating text summary, computer device and storage medium
CN111797076A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Data cleaning method and device, storage medium and electronic equipment
CN111835836A (en) * 2020-06-24 2020-10-27 清科优能(深圳)技术有限公司 Intelligent Internet of things terminal data processing device and method
CN112217831A (en) * 2017-09-18 2021-01-12 创新先进技术有限公司 Information interaction method, device and equipment about Internet of things equipment
CN112270172A (en) * 2020-10-21 2021-01-26 北京钛氪新媒体科技有限公司 Automatic network data cleaning method and system based on webpage label distribution characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
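常新旭; 张杨; 杨林; 寇金桥; 王昕; 徐冬冬: "Speech enhancement method incorporating a multi-head self-attention mechanism" (融合多头自注意力机制的语音增强方法), Journal of Xidian University (西安电子科技大学学报), no. 01, 15 November 2019 (2019-11-15) *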

Similar Documents

Publication Publication Date Title
CN105224623B (en) The training method and device of data model
CN107220386A (en) Information-pushing method and device
CN105095444A (en) Information acquisition method and device
CN111460807A (en) Sequence labeling method and device, computer equipment and storage medium
CN110705301A (en) Entity relationship extraction method and device, storage medium and electronic equipment
CN110717325A (en) Text emotion analysis method and device, electronic equipment and storage medium
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN107862058B (en) Method and apparatus for generating information
CN112906361A (en) Text data labeling method and device, electronic equipment and storage medium
CN114021646A (en) Image description text determination method and related equipment thereof
CN112800043A (en) Internet of things terminal information extraction method, device, equipment and storage medium
CN113343701A (en) Extraction method and device for text named entities of power equipment fault defects
CN115082041B (en) User information management method, device, equipment and storage medium
CN116739408A (en) Power grid dispatching safety monitoring method and system based on data tag and electronic equipment
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
CN113920497B (en) Nameplate recognition model training method, nameplate recognition method and related devices
CN116433934A (en) Multi-mode pre-training method for generating CT image representation and image report
CN113342932B (en) Target word vector determining method and device, storage medium and electronic device
CN115221977A (en) Text similarity calculation model training method, calculation method and related device
CN114638229A (en) Entity identification method, device, medium and equipment of record data
CN112395855A (en) Comment-based evaluation method and device
CN112651449A (en) Method and device for determining content characteristics of video, electronic equipment and storage medium
CN113378921A (en) Data screening method and device and electronic equipment
CN116822522B (en) Semantic analysis method, semantic analysis device, semantic analysis equipment and storage medium
CN108038230B (en) Information generation method and device based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination