CN114444443A - Identification recognition method and device and terminal equipment - Google Patents

Info

Publication number
CN114444443A
Authority
CN
China
Prior art keywords
sequence
internet
identification
things
encoder
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011232011.4A
Other languages
Chinese (zh)
Inventor
李小涛
游树娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN202011232011.4A
Publication of CN114444443A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an identification recognition method, an identification recognition device and terminal equipment, and relates to the field of communication technology. The method comprises the following steps: acquiring an Internet of Things identification code; segmenting the characters of the Internet of Things identification code to obtain a first sequence; inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence; inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a one-hot coding sequence, where the one-hot coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs; and identifying the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence. The scheme of the invention addresses the problem that different codes currently cannot be parsed and interworked.

Description

Identification recognition method and device and terminal equipment
Technical Field
The present invention relates to the field of communication technology, and in particular to an identification recognition method, an identification recognition apparatus, and a terminal device.
Background
An Internet of Things identifier is a name tag used to identify different objects in the Internet of Things. Commodity codes, equipment serial numbers, equipment network addresses, page Uniform Resource Identifiers (URIs), and the like are all Internet of Things identifiers. With the rapid development of the Internet of Things, different countries and different fields keep proposing their own Internet of Things identification systems, so that many heterogeneous Internet of Things identifiers now coexist. Because each identifier has its own coding format and parsing method, mutual data recognition and resource sharing between Internet of Things systems face a serious challenge, which hinders further development of the Internet of Things.
Because the coding schemes currently in use are many and varied, each code has to be parsed by its own corresponding code-parsing system during identification, and no unified identification system can parse and interwork different codes. In addition, to identify heterogeneous Internet of Things identifiers automatically, the current approach is to use a compatible identifier for which mapping rules to the other codes are agreed in advance, which makes the heterogeneous identifiers compatible and in turn allows them to be identified and parsed automatically.
Disclosure of Invention
The invention aims to provide an identification recognition method, an identification recognition device and terminal equipment, so as to solve the problem that different codes currently cannot be parsed and interworked.
To achieve the above object, an embodiment of the present invention provides an identification recognition method, including:
acquiring an Internet of Things identification code;
segmenting the characters of the Internet of Things identification code to obtain a first sequence;
inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a one-hot coding sequence, where the one-hot coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and identifying the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence.
Optionally, segmenting the characters of the Internet of Things identification code to obtain a first sequence includes:
segmenting the Internet of Things identification code character by character to obtain a character sequence;
and determining the character sequence as the first sequence.
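As a rough illustration of character-level segmentation, the following Python sketch splits a code into single characters and maps each character to a vocabulary index. The sample code string, the function name `segment_characters`, and the toy vocabulary built from a single code are assumptions made for illustration; the source does not specify them.

```python
def segment_characters(code: str) -> list[str]:
    """Split an identification code into a sequence of single characters."""
    return list(code)


# Hypothetical sample code; the source does not give a concrete identifier.
first_sequence = segment_characters("6901234567892")

# Toy vocabulary built from this one code (a real system would presumably
# build it from the whole training set).
vocab = sorted(set(first_sequence))
char_to_index = {ch: i for i, ch in enumerate(vocab)}

# Integer indices that an embedding layer could later turn into vectors.
indices = [char_to_index[ch] for ch in first_sequence]
```

Splitting by character needs no knowledge of any particular coding scheme, which is what lets one model accept heterogeneous codes as input.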
Optionally, inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence, includes:
inputting the first sequence into the encoder of the sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtaining the vector output by the last recurrent neural network unit;
and determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
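The encoding steps above can be sketched with a toy recurrent encoder in plain Python. This is a minimal sketch under stated assumptions, not the trained model of the source: the Elman-style update `h_t = tanh(W_x x_t + W_h h_{t-1})`, the hidden size, the digit-only vocabulary, and the randomly initialised weights (standing in for trained parameters) are all hypothetical.

```python
import math
import random


def one_hot(index: int, size: int) -> list[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class TinyRNNEncoder:
    """Toy Elman-style encoder: h_t = tanh(W_x x_t + W_h h_{t-1})."""

    def __init__(self, vocab: str, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(vocab)}
        self.hidden_size = hidden_size
        # Random weights stand in for parameters learned during training.
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in vocab]
                    for _ in range(hidden_size)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
                    for _ in range(hidden_size)]

    def step(self, x: list[float], h: list[float]) -> list[float]:
        """One recurrent unit: combine the current input and previous state."""
        return [
            math.tanh(
                sum(w * xi for w, xi in zip(self.w_x[j], x))
                + sum(w * hi for w, hi in zip(self.w_h[j], h))
            )
            for j in range(self.hidden_size)
        ]

    def encode(self, first_sequence: list[str]) -> list[float]:
        h = [0.0] * self.hidden_size
        for ch in first_sequence:
            # Each unit receives the one-hot vector of exactly one character.
            x = one_hot(self.index[ch], len(self.index))
            h = self.step(x, h)
        return h  # output of the last unit, used as the semantic vector


encoder = TinyRNNEncoder(vocab="0123456789", hidden_size=4)
semantic_vector = encoder.encode(list("6901234567892"))
```

The semantic vector has a fixed length (here 4) regardless of how long the input code is, which is what allows the decoder to start from a single state.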
Optionally, segmenting the characters of the Internet of Things identification code to obtain a first sequence includes:
segmenting the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and determining the sequence of n-gram strings as the first sequence.
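A minimal Python sketch of n-gram segmentation follows. The text here does not state the value of n or whether neighbouring n-grams overlap, so the sliding window with stride 1, the default `n=2`, and the function name `segment_ngrams` are assumptions.

```python
def segment_ngrams(code: str, n: int = 2) -> list[str]:
    """Split a code into overlapping character n-grams (stride 1, assumed)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(code) < n:
        # A code shorter than n yields itself as the only string (assumed).
        return [code] if code else []
    return [code[i:i + n] for i in range(len(code) - n + 1)]


# Hypothetical sample code, for illustration only.
first_sequence = segment_ngrams("ABC123", n=2)
# first_sequence == ["AB", "BC", "C1", "12", "23"]
```

Compared with single characters, n-gram strings carry a little local context (for example a fixed prefix), at the cost of a larger vocabulary for the embedding layer.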
Optionally, inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence, includes:
inputting the first sequence into the encoder of the sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtaining the vector output by the last recurrent neural network unit;
and determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a one-hot coding sequence, includes:
inputting the semantic vector into the decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining the sequence formed by the output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
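The decoding loop can be sketched as a greedy decoder in Python. Everything concrete here is an assumption for illustration: the start and end tokens, the token vocabularies, greedy selection, and in particular `scripted_step`, which replaces a trained recurrent unit with a fixed script (and ignores its input vector) so that the example runs without a trained model.

```python
INPUT_TOKENS = ["<s>", "0", "1"]    # elements a decoder unit may receive (assumed)
OUTPUT_TOKENS = ["0", "1", "</s>"]  # elements a decoder unit may emit (assumed)


def one_hot(index: int, size: int) -> list[int]:
    vec = [0] * size
    vec[index] = 1
    return vec


def greedy_decode(semantic_vector, step_fn, max_len: int = 16) -> list[int]:
    """Run decoder units one after another and collect their output elements."""
    state = list(semantic_vector)  # the semantic vector initialises the decoder
    token = "<s>"
    outputs = []
    for _ in range(max_len):
        # The element fed to each unit is first turned into a one-hot vector.
        x = one_hot(INPUT_TOKENS.index(token), len(INPUT_TOKENS))
        scores, state = step_fn(x, state)
        token = OUTPUT_TOKENS[scores.index(max(scores))]
        if token == "</s>":
            break
        outputs.append(int(token))
    return outputs


def category_index(one_hot_sequence: list[int]) -> int:
    """Position of the single 1 in the sequence (0-based numbering, assumed)."""
    if any(v not in (0, 1) for v in one_hot_sequence) or one_hot_sequence.count(1) != 1:
        raise ValueError("not a valid one-hot coding sequence")
    return one_hot_sequence.index(1)


def scripted_step(x, state):
    """Stand-in for a trained unit: emits 0, 1, 0, 0 and then the end token."""
    script = ["0", "1", "0", "0", "</s>"]
    position = int(state[0])
    scores = [1.0 if t == script[position] else 0.0 for t in OUTPUT_TOKENS]
    return scores, [position + 1]


sequence = greedy_decode([0], scripted_step)
number = category_index(sequence)
```

With a trained model, `step_fn` would be the recurrent unit of the decoder and the scores would come from its output layer; the surrounding loop would stay the same.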
Optionally, identifying the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence includes:
calling the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identifying, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
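One way to picture this dispatch step is a lookup table from coding-type number to a service function, sketched below in Python. The two resolver functions, their return fields, the 0-based numbering, and the registry `RESOLVERS` are hypothetical placeholders; in a real deployment each entry would presumably call the parsing service of an existing identification system.

```python
from typing import Callable, Dict, List


def resolve_type_a(code: str) -> dict:
    """Hypothetical analysis service for coding type 0."""
    return {"coding_type": "type-a", "code": code, "description": "placeholder"}


def resolve_type_b(code: str) -> dict:
    """Hypothetical analysis service for coding type 1."""
    return {"coding_type": "type-b", "code": code, "description": "placeholder"}


# Registry mapping a coding-type number to its service (assumed layout).
RESOLVERS: Dict[int, Callable[[str], dict]] = {0: resolve_type_a, 1: resolve_type_b}


def identify(code: str, one_hot_sequence: List[int]) -> dict:
    """Pick the service indicated by the one-hot sequence and call it."""
    if sorted(one_hot_sequence) != [0] * (len(one_hot_sequence) - 1) + [1]:
        raise ValueError("not a valid one-hot coding sequence")
    category = one_hot_sequence.index(1)
    try:
        resolver = RESOLVERS[category]
    except KeyError:
        raise LookupError(f"no service registered for coding type {category}") from None
    return resolver(code)


result = identify("ABC-0001", [0, 1, 0])
```

Because the model only selects which existing service to call, supporting another coding scheme amounts to retraining with one more category and registering one more entry.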
To achieve the above object, an embodiment of the present invention provides an identification recognition apparatus, including:
an acquisition module, configured to acquire an Internet of Things identification code;
a processing module, configured to segment the characters of the Internet of Things identification code to obtain a first sequence;
an encoding module, configured to input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
a decoding module, configured to input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a one-hot coding sequence, where the one-hot coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and an identification module, configured to identify the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence.
Optionally, the processing module includes:
a first processing unit, configured to segment the Internet of Things identification code character by character to obtain a character sequence;
and a second processing unit, configured to determine the character sequence as the first sequence.
Optionally, the encoding module includes:
a first conversion unit, configured to input the first sequence into the encoder of the sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
a first encoding unit, configured to take the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtain the vector output by the last recurrent neural network unit;
and a first determining unit, configured to determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processing module includes:
a third processing unit, configured to segment the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and a fourth processing unit, configured to determine the sequence of n-gram strings as the first sequence.
Optionally, the encoding module includes:
a second conversion unit, configured to input the first sequence into the encoder of the sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
a second encoding unit, configured to take the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtain the vector output by the last recurrent neural network unit;
and a second determining unit, configured to determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the decoding module includes:
a decoding unit, configured to input the semantic vector into the decoder of the sequence-to-sequence model, and convert the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and a third determining unit, configured to determine the sequence formed by the output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
Optionally, the identification module includes:
a calling unit, configured to call the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and an identification unit, configured to identify, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
To achieve the above object, an embodiment of the present invention provides a terminal device, including: a transceiver and a processor;
the transceiver is configured to acquire an Internet of Things identification code;
the processor is configured to:
segment the characters of the Internet of Things identification code to obtain a first sequence;
input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a one-hot coding sequence, where the one-hot coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and identify the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence.
Optionally, the processor is further configured to:
segment the Internet of Things identification code character by character to obtain a character sequence;
and determine the character sequence as the first sequence.
Optionally, the processor is further configured to:
input the first sequence into the encoder of the sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
take the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtain the vector output by the last recurrent neural network unit;
and determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processor is further configured to:
segment the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and determine the sequence of n-gram strings as the first sequence.
Optionally, the processor is further configured to:
input the first sequence into the encoder of the sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
take the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtain the vector output by the last recurrent neural network unit;
and determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processor is further configured to:
input the semantic vector into the decoder of the sequence-to-sequence model, and convert the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determine the sequence formed by the output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
Optionally, the processor is further configured to:
call the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identify, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
To achieve the above object, an embodiment of the present invention provides a terminal device, including: a transceiver, a processor, a memory, and a program or instructions stored on the memory and executable on the processor; the processor, when executing the program or instructions, performs the steps of the identification recognition method as described above.
To achieve the above object, an embodiment of the present invention provides a readable storage medium on which a program or instructions are stored, which when executed by a processor implement the steps of the identification recognition method as described above.
To achieve the above object, an embodiment of the present invention further provides an identification recognition method, including:
acquiring an Internet of Things identification code;
segmenting the characters of the Internet of Things identification code to obtain a first sequence;
inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, where the binary coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and identifying the Internet of Things identification code according to the target coding type corresponding to the binary coding sequence.
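In this variant the decoder output is read as a binary number rather than as a one-hot sequence. The Python sketch below shows one possible conversion in both directions; the most-significant-bit-first ordering and the function names are assumptions, since the text here does not fix the bit order.

```python
def binary_to_number(bits: list[int]) -> int:
    """Read a binary coding sequence as an integer (most significant bit first, assumed)."""
    number = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("a binary coding sequence may only contain 0 and 1")
        number = number * 2 + bit
    return number


def number_to_binary(number: int, width: int) -> list[int]:
    """Encode a coding-type number as a fixed-width binary coding sequence."""
    if number < 0 or number >= 2 ** width:
        raise ValueError("number does not fit in the given width")
    return [(number >> shift) & 1 for shift in range(width - 1, -1, -1)]


# Hypothetical example: coding type number 5 in a 3-element sequence.
label = number_to_binary(5, width=3)   # [1, 0, 1]
decoded = binary_to_number(label)      # 5
```

A one-hot coding sequence needs one element per coding type, whereas a binary coding sequence of width w can distinguish 2^w numbers, so the decoder output stays short as the number of supported coding types grows.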
Optionally, segmenting the characters of the Internet of Things identification code to obtain a first sequence includes:
segmenting the Internet of Things identification code character by character to obtain a character sequence;
and determining the character sequence as the first sequence.
Optionally, inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence, includes:
inputting the first sequence into the encoder of the sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtaining the vector output by the last recurrent neural network unit;
and determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, segmenting the characters of the Internet of Things identification code to obtain a first sequence includes:
segmenting the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and determining the sequence of n-gram strings as the first sequence.
Optionally, inputting the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence, includes:
inputting the first sequence into the encoder of the sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtaining the vector output by the last recurrent neural network unit;
and determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, includes:
inputting the semantic vector into the decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining the sequence formed by the output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
Optionally, identifying the Internet of Things identification code according to the target coding type corresponding to the binary coding sequence includes:
calling the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the binary coding sequence;
and identifying, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
To achieve the above object, an embodiment of the present invention further provides an identification recognition apparatus, including:
an acquisition module, configured to acquire an Internet of Things identification code;
a processing module, configured to segment the characters of the Internet of Things identification code to obtain a first sequence;
an encoding module, configured to input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
a decoding module, configured to input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, where the binary coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and an identification module, configured to identify the Internet of Things identification code according to the target coding type corresponding to the binary coding sequence.
Optionally, the processing module includes:
a first processing unit, configured to segment the Internet of Things identification code character by character to obtain a character sequence;
and a second processing unit, configured to determine the character sequence as the first sequence.
Optionally, the encoding module includes:
a first conversion unit, configured to input the first sequence into the encoder of the sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
a first encoding unit, configured to take the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtain the vector output by the last recurrent neural network unit;
and a first determining unit, configured to determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processing module includes:
a third processing unit, configured to segment the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and a fourth processing unit, configured to determine the sequence of n-gram strings as the first sequence.
Optionally, the encoding module includes:
a second conversion unit, configured to input the first sequence into the encoder of the sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
a second encoding unit, configured to take the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtain the vector output by the last recurrent neural network unit;
and a second determining unit, configured to determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the decoding module includes:
a decoding unit, configured to input the semantic vector into the decoder of the sequence-to-sequence model, and convert the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and a third determining unit, configured to determine the sequence formed by the output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
Optionally, the identification module includes:
a calling unit, configured to call the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the binary coding sequence;
and an identification unit, configured to identify, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
To achieve the above object, an embodiment of the present invention further provides a terminal device, including: a transceiver and a processor;
the transceiver is configured to acquire an Internet of Things identification code;
the processor is configured to:
segment the characters of the Internet of Things identification code to obtain a first sequence;
input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, where the binary coding sequence indicates the number of the target coding type to which the Internet of Things identification code belongs;
and identify the Internet of Things identification code according to the target coding type corresponding to the binary coding sequence.
Optionally, the processor is further configured to:
segment the Internet of Things identification code character by character to obtain a character sequence;
and determine the character sequence as the first sequence.
Optionally, the processor is further configured to:
input the first sequence into the encoder of the sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
take the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the first one-hot vector corresponding to one character, and obtain the vector output by the last recurrent neural network unit;
and determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processor is further configured to:
segment the characters of the Internet of Things identification code according to an n-gram language model to obtain a sequence of n-gram strings;
and determine the sequence of n-gram strings as the first sequence.
Optionally, the processor is further configured to:
input the first sequence into the encoder of the sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
take the word vectors as the inputs of the recurrent neural network units in the encoder, where each recurrent neural network unit receives the word vector corresponding to one n-gram string, and obtain the vector output by the last recurrent neural network unit;
and determine the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the processor is further configured to:
input the semantic vector into the decoder of the sequence-to-sequence model, and convert the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determine the sequence formed by the output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
Optionally, the processor is further configured to:
call the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the binary coding sequence;
and identify, through the identification analysis service corresponding to the target coding type, the description information of the Internet of Things object corresponding to the Internet of Things identification code.
To achieve the above object, an embodiment of the present invention provides a terminal device, including: a transceiver, a processor, a memory, and a program or instructions stored on the memory and executable on the processor; the processor, when executing the program or instructions, implements the steps of the identification recognition method as described above.
To achieve the above object, an embodiment of the present invention provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the identification recognition method as described above.
The technical scheme of the invention has the following beneficial effects:
According to the embodiment of the invention, the identification category of the Internet of Things identification code is automatically identified through the sequence-to-sequence model, and the Internet of Things identification code is then parsed accordingly. Automatic identification of heterogeneous Internet of Things identification codes is thus achieved without creating a new identification system and without changing the existing identification systems, which solves the problem that parsing and intercommunication across different codes cannot currently be achieved.
Drawings
FIG. 1 is a flow chart of a method for identifying an identifier according to an embodiment of the present invention;
FIG. 2 is a block diagram of a sequence-to-sequence model according to an embodiment of the present invention;
FIG. 3 is a flow chart of sequence-to-sequence model training according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram of a sequence-to-sequence model according to an embodiment of the present invention;
FIG. 5 is a third schematic diagram of a sequence-to-sequence model according to an embodiment of the present invention;
FIG. 6 is a second flowchart of an identification recognition method according to an embodiment of the present invention;
FIG. 7 is a fourth schematic diagram of a sequence-to-sequence model according to an embodiment of the present invention;
FIG. 8 is a fifth schematic diagram of a sequence-to-sequence model according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a normal flow of a heterogeneous identity service use case according to an embodiment of the present invention;
FIG. 10 is a high-level schematic diagram of a heterogeneous identity recognition service according to an embodiment of the present invention;
FIG. 11 is one of the block diagrams of a terminal device according to an embodiment of the present invention;
FIG. 12 is a second block diagram of a terminal device according to an embodiment of the present invention;
FIG. 13 is a third block diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Additionally, the terms "system" and "network" are often used interchangeably herein.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information.
The embodiment of the invention provides an identification recognition method, which aims to solve the problem that the analysis and intercommunication of different codes cannot be realized at present.
Optionally, according to the compatibility and extensibility of the identification system, identifiers can be divided into two categories: exclusive identifiers and compatible identifiers.
An exclusive identifier has an independent coding structure and a fixed identification domain or identified object. Examples include GS1 (including European Article Number (EAN), barcodes (UCC), etc.), Electronic Product Code (EPC), sensor node identifiers, Internet Protocol Version 4 (IPv4), and Internet Protocol Version 6 (IPv6).
A compatible identifier supports identifying any object in different fields and forms a comprehensive identification system. There are three common comprehensive identification systems: Handle, Object Identifier (OID), and the national Internet of Things identification system (Entity Code, Ecode).
Handle is a globally distributed system that defines a layered service model together with a corresponding global resolution system and a segment-managed operation and maintenance mechanism. It can provide services such as identifier definition, dynamic resolution, and security management for digital objects. The coding structure is: authoritative domain (prefix)/local name (suffix). An authoritative domain may host several sub-authoritative domains, separated by ".". The prefix and the suffix are separated by "/". For example: "86.1000.15/201308081001".
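The prefix/suffix structure described above can be illustrated with a minimal sketch. The function name `split_handle` is hypothetical, and the sketch assumes that "." separates the sub-authoritative domains, as in the example "86.1000.15/201308081001"; it is not a full Handle parser.

```python
from typing import List, Tuple


def split_handle(handle: str) -> Tuple[List[str], str]:
    """Split a Handle into its authority segments (prefix) and local name (suffix)."""
    # The first "/" separates the authoritative domain from the local name.
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix Handle: {handle!r}")
    # Assumption: sub-authoritative domains are separated by ".".
    return prefix.split("."), suffix


authority_segments, local_name = split_handle("86.1000.15/201308081001")
print(authority_segments, local_name)
```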
OID is an identification mechanism proposed jointly by the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). It employs a hierarchical tree structure to globally and uniquely name any type of object, concept, or "thing". The coding structure is a tree in which the levels are separated by "." and the number of levels is not limited. When an object is identified, the identifier is formed by sequentially combining the nodes on the path from the root of the tree to the leaf.
Ecode is a coding solution independently formulated by China and applicable to any Internet of Things object. It is a complete system composed of Ecode coding, data identification, middleware, a resolution system, information query and discovery services, a security mechanism, and the like. The coding structure is: version (V) + code system identification (NSI) + main code (MD). V is used to distinguish Ecodes with different data structures, NSI indicates the code of a particular identification system, and MD represents the standardized code used in a particular industry or application system.
Ecode assigns a corresponding combination of version (V) and code system identification (NSI) to each identification system and takes the code in the original coding system as the main code (MD), thereby forming the complete Ecode. For example, in "100036901234567892", V is 1 and NSI is 0003; the combination "10003" of V and NSI denotes the GS1 goods coding system, and "6901234567892" is the goods identification code in GS1. When an Ecode is parsed, the identification system of the MD part is determined from V and NSI, and the MD part is then sent to the corresponding resolution system to complete the parsing.
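The V + NSI + MD split in the example above can be sketched as follows. The name `split_ecode` and the default field widths (1 character for V, 4 characters for NSI) are assumptions taken only from this single example; other Ecode versions may use different widths.

```python
from typing import NamedTuple


class Ecode(NamedTuple):
    version: str  # V: distinguishes Ecodes with different data structures
    nsi: str      # NSI: code of the identification system
    md: str       # MD: code in the original coding system


def split_ecode(code: str, v_len: int = 1, nsi_len: int = 4) -> Ecode:
    """Split an Ecode string into V, NSI and MD using assumed field widths."""
    header_len = v_len + nsi_len
    if len(code) <= header_len:
        raise ValueError(f"Ecode too short: {code!r}")
    return Ecode(code[:v_len], code[v_len:header_len], code[header_len:])


parts = split_ecode("100036901234567892")
print(parts.version, parts.nsi, parts.md)
```

Once V and NSI are known, the MD part can be forwarded to the resolution system of the identification system they denote.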
As shown in fig. 1, an identifier identification method according to an embodiment of the present invention includes:
Step 11: acquiring an Internet of Things identification code.
Step 12: segmenting the characters in the Internet of Things identification code to obtain a first sequence.
Step 13: inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence.
Step 14: inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence.
The one-hot coding sequence is used to indicate the number of the target coding type to which the Internet of Things identification code belongs.
Step 15: performing identification processing on the Internet of Things identification code according to the target coding type corresponding to the one-hot coding sequence.
For example: Internet of Things identification codes may be acquired from different carriers such as two-dimensional codes, bar codes, and Radio Frequency Identification (RFID). Of course, the way of acquiring the Internet of Things identification code in the embodiment of the present invention is not limited thereto.
Optionally, the characters in the Internet of Things identification code may be segmented by single character or by character string, so as to convert the identifier into sequence form. The sequence obtained by converting the Internet of Things identification code is then used as the input of a sequence-to-sequence (seq2seq) model to obtain a sequence indicating the identification category to which the Internet of Things identification code belongs.
The sequence-to-sequence model is a kind of encoder-decoder structure whose basic idea is to use Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) networks, as the encoder and the decoder, as shown in FIG. 2.
The encoder is responsible for compressing an input sequence into a vector of a specified length, which can be regarded as the semantics of the sequence; this process is called encoding. Optionally, the hidden state h_m of the last input may be used directly as the semantic vector; alternatively, the last hidden state may be transformed to obtain the semantic vector, or all hidden states of the input sequence may be transformed to obtain the semantic vector.
The decoder is responsible for generating the specified sequence from the semantic vector; this process is called decoding. Optionally, the semantic vector obtained by the encoder may be input into the RNN of the decoder as the initial state to obtain an output sequence. For example: the output at the previous time step is used as the input at the current time step, and the semantic vector participates in the operation as the initial state of the decoder.
In addition, the encoder and the decoder of the sequence-to-sequence model each have an embedding layer, which is responsible for converting each element in the sequence into vector form so that it can participate in computing the hidden state and the predicted output of each unit.
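As a rough illustration of how an encoder compresses a sequence into a fixed-length semantic vector, the sketch below runs a plain (Elman) recurrent unit over the characters of an identifier and returns the last hidden state. This is a simplification and not the LSTM described above: the weights are random and untrained, and the names `encode`, `hidden_size` and `seed` are hypothetical.

```python
import math
import random
from typing import List


def encode(sequence: str, vocab_size: int = 128, hidden_size: int = 8,
           seed: int = 0) -> List[float]:
    """Run a vanilla RNN over `sequence` and return the last hidden state."""
    rng = random.Random(seed)
    # Untrained, randomly initialised weights, for illustration only.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    hidden = [0.0] * hidden_size
    for char in sequence:
        index = ord(char)  # position of the 1 in the one-hot input vector
        if index >= vocab_size:
            raise ValueError(f"character outside vocabulary: {char!r}")
        # For a one-hot input, W_in @ x reduces to selecting column `index`.
        hidden = [
            math.tanh(w_in[i][index]
                      + sum(w_rec[i][j] * hidden[j] for j in range(hidden_size)))
            for i in range(hidden_size)
        ]
    return hidden  # the last hidden state serves as the semantic vector


semantic_vector = encode("6945091708456")
print(len(semantic_vector))
```

Whatever the length of the input identifier, the returned vector has `hidden_size` entries, which is what allows it to initialise the decoder.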
According to the scheme, the identification category of the Internet of Things identification code is automatically identified through the sequence-to-sequence model, and the Internet of Things identification code is then parsed accordingly. Automatic identification of heterogeneous Internet of Things identification codes is thus achieved without creating a new identification system and without changing the existing identification systems, which solves the problem that parsing and intercommunication across different codes cannot currently be achieved.
In addition, the scheme adopts one-hot coding at the decoder end: each identification category is assigned a unique integer number, which is then converted into an N-bit one-hot code (N being the total number of identification categories supported by the sequence-to-sequence model) as the output sequence. The output sequence is therefore a binary sequence containing only 0 and 1, and each of its elements can be represented by a 2-dimensional one-hot vector at the embedding layer of the decoder, which makes the decoding process simpler.
Optionally, a sequence-to-sequence model for identifying the identification category to which the internet of things identification code belongs may be trained in advance, and after the sequence-to-sequence model is trained, a sequence obtained by converting the internet of things identification code to be identified is used as an input of the sequence-to-sequence model, so as to obtain a sequence for indicating the identification category to which the internet of things identification code belongs.
Training a sequence-to-sequence model for identifying the identification category to which an Internet of Things identification code belongs converts the automatic identification of identification code categories into a sequence-to-sequence learning problem. Decoding the output sequence at the decoder end of the model yields the coding system of the input identifier, thereby achieving accurate classification of identifiers. As shown in FIG. 3, the training process of the sequence-to-sequence model may include:
Step 31: taking the identification codes of each identification category as samples of that category, and taking the number assigned to the identification category as the label.
For example: the number corresponding to the identification category OID is 1, the number corresponding to the identification category Ecode is 2, the number corresponding to the identification category EAN is 3, and so on. The number corresponding to a specific identification category may be preset, and the embodiment of the present invention is not limited thereto.
Step 32: converting the identification code samples in the training set into sequence form.
Step 33: converting each element of the input sequence into vector form through the embedding layer of the encoder, as the input of the encoder.
Step 34: converting the number of the identification category corresponding to each identification code sample into sequence form.
For example: the number of the identification category may be converted into a sequence containing only 0 and 1.
Step 35: converting each element of the output sequence into a 2-dimensional one-hot vector through the embedding layer of the decoder, as the input of the decoder.
Step 36: training the sequence-to-sequence model after serialization and vectorization are completed.
Optionally, as an implementation: the step of segmenting the characters in the internet of things identification code to obtain a first sequence may specifically include:
dividing the characters in the Internet of things identification codes according to each character to obtain a character sequence;
determining the character sequence as the first sequence.
In the embodiment of the invention, the identification code is the input at the encoder end of the sequence-to-sequence model, and the identification category corresponding to the identification code is the output at the decoder end. To meet the input and output requirements of the sequence-to-sequence model, the identifier and its identification system need to be converted into sequence form, and the embedding layers of the encoder and the decoder need to be designed to obtain a vector representation of each sequence element.
Each identification code is a character string consisting of letters, digits, special symbols (such as '-'), and the like. In this embodiment, it may be converted into a character sequence by segmenting it character by character. For example: the EAN identifier "6945091708456" is segmented character by character into the sequence {6,9,4,5,0,9,1,7,0,8,4,5,6}.
Optionally, the step of inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence may specifically include:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the inputs of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to one character is input into one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the converting, by the embedding layer of the encoder, each character in the first sequence into a first one-hot vector includes: converting, through the embedding layer of the encoder, each character in the first sequence into a 128-dimensional first one-hot vector according to the ASCII code corresponding to the character.
For example: at the embedding layer of the encoder, each character is converted into a 128-dimensional one-hot vector using one-hot encoding. One-hot encoding represents a categorical variable as a binary vector: each integer value is represented by a binary vector that is all zeros except at the index of that integer, which is set to 1. In the ASCII coding system, each character corresponds to an integer from 0 to 127, and the index of the 1 in the one-hot vector of a character is the ASCII code of that character. For instance, the ASCII code of the character '6' is 54, so in its one-hot vector the element at index 54 (the 55th element of the vector) is 1 and all other elements are 0.
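The character segmentation and the 128-dimensional one-hot embedding can be sketched as follows. The function name `one_hot_ascii` is hypothetical; the sketch assumes that every character of the identifier is an ASCII character.

```python
from typing import List


def one_hot_ascii(char: str, size: int = 128) -> List[int]:
    """Return a one-hot vector whose single 1 sits at the character's ASCII code."""
    code = ord(char)
    if code >= size:
        raise ValueError(f"not an ASCII character: {char!r}")
    vector = [0] * size
    vector[code] = 1
    return vector


# Character-by-character segmentation of the EAN example.
first_sequence = list("6945091708456")
# One 128-dimensional vector per character, fed to one recurrent unit each.
vectors = [one_hot_ascii(char) for char in first_sequence]

print(first_sequence)
print(vectors[0].index(1))  # index of the 1 for the character '6'
```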
Further, the semantic vector output by the encoder may be used directly as the input of the decoder, or may be transformed before being used as the input of the decoder. In addition, a semantic vector obtained by transforming all hidden states output by the encoder may also be used as the input of the decoder; the embodiments of the present invention are not limited in this respect.
Optionally, the step of inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence may specifically include:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
For example: at the decoder end, each identification category is assigned a unique integer number, which is then converted into an N-bit one-hot code (N being the total number of identification categories supported by the seq2seq model) as the output sequence, so that the output sequence is a binary sequence containing only 0 and 1. At the embedding layer of the decoder, each element of the output sequence is represented by a 2-dimensional one-hot vector, and the output sequence of the decoder is decoded to obtain the identification category corresponding to the Internet of Things identification code.
Optionally, the step of performing identification processing on the internet of things identifier code according to the target code category corresponding to the one-hot code sequence may specifically include:
calling an identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
For example: the EAN identification code "6945091708456" is parsed by the Chinese article coding center, so the parsing of the identifier "6945091708456" can be completed by calling the analysis service of that center.
Therefore, the identification category to which the Internet of things identification code belongs is obtained through sequence-to-sequence model identification, and a proper analysis system is selected to automatically analyze the Internet of things identification code, so that automatic identification of the heterogeneous Internet of things identification code is realized on the basis of no need of creating a new identification system and no need of changing the existing identification system.
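Selecting an analysis service from the identified category can be sketched as a lookup table from category number to a handler. The handlers below are hypothetical placeholders that only return a description string; the category numbers follow the example numbering given earlier (OID is 1, Ecode is 2, EAN is 3), and a real deployment would call the analysis service of the corresponding identification system instead.

```python
from typing import Callable, Dict


def analyse_oid(code: str) -> str:
    return f"OID analysis requested for {code}"      # placeholder, no network call


def analyse_ecode(code: str) -> str:
    return f"Ecode analysis requested for {code}"    # placeholder, no network call


def analyse_ean(code: str) -> str:
    return f"EAN analysis requested for {code}"      # placeholder, no network call


# Hypothetical registry: category number -> analysis service handler.
ANALYSIS_SERVICES: Dict[int, Callable[[str], str]] = {
    1: analyse_oid,
    2: analyse_ecode,
    3: analyse_ean,
}


def analyse(code: str, category_number: int) -> str:
    """Dispatch the identification code to the service of its category."""
    try:
        service = ANALYSIS_SERVICES[category_number]
    except KeyError:
        raise LookupError(f"no analysis service for category {category_number}") from None
    return service(code)


print(analyse("6945091708456", 3))
```

Keeping the registry separate from the model means that supporting a new identification system only requires retraining the classifier and registering one more handler.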
The above method is explained below with reference to specific examples:
As shown in FIG. 4, the training set of the sequence-to-sequence model includes 20 identification categories. Take the EAN identification code "6945091708456" as an example, where the number corresponding to the EAN identification category is 3.
The EAN identification code "6945091708456" is converted into the sequence {6,9,4,5,0,9,1,7,0,8,4,5,6}, and at the embedding layer of the encoder each character is converted into a 128-dimensional one-hot vector according to its ASCII code value. For example: the ASCII code of the character '6' is 54, so in its one-hot vector the element at index 54 (the 55th element of the vector) is 1 and all other elements are 0.
The output sequence at the decoder end is 00100 … 00 (the length is 20; since the EAN system corresponds to number 3, the 3rd element of the output sequence is set to 1 and the rest are 0). At the decoder end, the output of the previous LSTM unit is used as the input of the next LSTM unit (e.g., the output of H1 is used as the input of H2, and the input of H1 may be a starting default value, such as 0 or 1). The elements of the decoder's output sequence are only '0' and '1'; the vector of 0 is {1,0} and the vector of 1 is {0,1}.
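The decoder-side representation in this example can be sketched as follows. All function names are hypothetical; the start value 0 is one of the two default values mentioned above, and `decoder_inputs` simply shows what each unit would receive if the previous unit produced the expected element.

```python
from typing import List, Tuple

# 2-dimensional one-hot vectors for the two possible output elements.
BIT_VECTORS = {0: (1, 0), 1: (0, 1)}


def target_sequence(category_number: int, num_categories: int) -> List[int]:
    """N-bit one-hot sequence with a 1 at the (1-based) category position."""
    if not 1 <= category_number <= num_categories:
        raise ValueError("category number out of range")
    return [1 if position == category_number else 0
            for position in range(1, num_categories + 1)]


def decoder_inputs(outputs: List[int], start: int = 0) -> List[int]:
    """Input of each decoder unit: a start value, then the previous unit's output."""
    return [start] + outputs[:-1]


def embed(bits: List[int]) -> List[Tuple[int, int]]:
    """Map each 0/1 element to its 2-dimensional one-hot vector."""
    return [BIT_VECTORS[bit] for bit in bits]


def category_of(bits: List[int]) -> int:
    """Recover the category number from a one-hot output sequence."""
    if bits.count(1) != 1:
        raise ValueError("not a valid one-hot sequence")
    return bits.index(1) + 1


ean_target = target_sequence(3, 20)
print("".join(map(str, ean_target)))
print(category_of(ean_target))
```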
In the scheme of the invention, the encoder adopts character encoding, and the decoder adopts one-hot encoding to express the identification category. At the embedding layer of the encoder, each character is represented by a 128-dimensional one-hot vector, and the index position of the 1 in the vector is the ASCII code value of the character. At the embedding layer of the decoder, each one-hot element value is represented by a 2-dimensional 0-1 vector. For scenarios with limited storage space and few identification categories, the scheme can accurately and effectively identify the identification category corresponding to an Internet of Things identifier, and is more concise.
Optionally, the identification recognition method may be applied to a terminal, and may of course also be applied to a cloud, which is not limited in this embodiment of the application.
Optionally, as another implementation: the step of segmenting the characters in the internet of things identification code to obtain a first sequence may specifically include:
segmenting the characters in the Internet of Things identification code according to a Chinese language model n-gram to obtain an n-gram string sequence;
determining the n-gram string sequence as the first sequence.
For example: when two identifiers belonging to different identification categories have the same code length and the same range of element values and differ only in their coding rules, they cannot be fully distinguished by the form of a character sequence alone. For instance, the MSISDN identification code "6524054521877" and the EAN identification code "6945091708456" both contain 13 digits. In an EAN identification code, different parts of the string express definite physical meanings: the string formed by the first three characters is a country code, and the string "694" indicates that the currently identified product originates from China. By contrast, the string "652" formed by the first three characters of the MSISDN identification code is not used by any identifier in the EAN identification system. In summary, representing identifiers as n-gram string sequences distinguishes them better.
Optionally, the step of inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence may specifically include:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the inputs of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to one n-gram string is input into one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the step of converting each n-gram string in the first sequence into a word vector by the embedding layer of the encoder may include: converting, through the embedding layer of the encoder, each n-gram string in the first sequence into a word vector using the pre-trained natural language model ELMo (Embeddings from Language Models).
For example: each n-gram string is treated as a single word and the whole sequence as a sentence, which is input into the ELMo model to obtain a word vector of dimension 1024 for each n-gram string. Introducing the ELMo model thus solves the problem that, if each n-gram string were represented by one-hot coding, the vector dimension would reach 128^n, so that the finally trained sequence-to-sequence model would be extremely large and have poor precision.
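The dimension argument can be checked with a short calculation. The sketch assumes an alphabet of 128 ASCII characters, as in the character-level scheme, and compares the size of a one-hot vector over all possible n-gram strings with the 1024-dimensional word vector; the function name is hypothetical.

```python
def one_hot_ngram_dimension(n: int, alphabet_size: int = 128) -> int:
    """Number of distinct n-gram strings, i.e. the one-hot vector dimension."""
    return alphabet_size ** n


ELMO_DIMENSION = 1024  # word-vector dimension stated for the ELMo model

for n in (1, 2, 3):
    dimension = one_hot_ngram_dimension(n)
    print(n, dimension, dimension / ELMO_DIMENSION)
```

For n = 3, one-hot coding already needs 2,097,152 dimensions per element, 2048 times the size of the 1024-dimensional word vector.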
Further, the semantic vector output by the encoder may be used directly as the input of the decoder, or may be transformed before being used as the input of the decoder. In addition, a semantic vector obtained by transforming all hidden states output by the encoder may also be used as the input of the decoder; the embodiments of the present invention are not limited in this respect.
Optionally, the inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence includes:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit of the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
For example: at the decoder end, each identification category is assigned a unique integer number, which is then converted into an N-bit one-hot code (N being the total number of identification categories supported by the seq2seq model) as the output sequence, so that the output sequence is a binary sequence containing only 0 and 1. At the embedding layer of the decoder, each element of the output sequence is represented by a 2-dimensional one-hot vector, and the output sequence of the decoder is decoded to obtain the identification category corresponding to the Internet of Things identification code.
Optionally, the step of performing identification processing on the internet of things identifier code according to the target code category corresponding to the one-hot code sequence may specifically include:
calling an identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
For example: the EAN identification code "6945091708456" is parsed by the Chinese article coding center, so the parsing of the identifier "6945091708456" can be completed by calling the analysis service of that center.
Therefore, the identification type of the identification code of the Internet of things is obtained through sequence-to-sequence model identification, a proper analysis system is selected, and the identification code of the Internet of things is automatically analyzed, so that the automatic identification of the identification code of the heterogeneous Internet of things is realized on the basis of not creating a new identification system and not changing the existing identification system.
The above method is explained below with reference to specific examples:
As shown in FIG. 5, the training set of the sequence-to-sequence model includes 20 identification categories. Take the EAN identification code "6945091708456" as an example, where the number corresponding to the EAN identification category is 3.
The EAN identification code "6945091708456" is converted into the sequence {694,509,170,845,6}, and at the embedding layer of the encoder each n-gram string is passed through the ELMo model to obtain a 1024-dimensional real-valued vector.
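The segmentation used in this example can be sketched as a split into consecutive, non-overlapping chunks of n characters, with a shorter final chunk when the length is not a multiple of n. The function name `segment_ngrams` and the default n = 3 are assumptions read off the example sequence {694,509,170,845,6}.

```python
from typing import List


def segment_ngrams(code: str, n: int = 3) -> List[str]:
    """Split `code` into consecutive, non-overlapping strings of length n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [code[start:start + n] for start in range(0, len(code), n)]


ean_sequence = segment_ngrams("6945091708456")
msisdn_sequence = segment_ngrams("6524054521877")

print(ean_sequence)
print(msisdn_sequence)
```

Both identifiers are 13 digits long, yet their first elements ("694" and "652") already differ in a way that the EAN coding rules make meaningful.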
The output sequence at the decoder end is 00100 … 00 (the length is 20; since EAN corresponds to number 3, the 3rd element of the output sequence is set to 1 and the rest are 0). At the decoder end, the output of the previous LSTM unit is used as the input of the next LSTM unit (e.g., the output of H1 is used as the input of H2, and the input of H1 may be a starting default value, such as 0 or 1). The elements of the decoder's one-hot output sequence are only '0' and '1'; the vector of 0 is {1,0} and the vector of 1 is {0,1}.
According to the scheme, n-gram coding is adopted at the encoder end and one-hot coding at the decoder end, and at the embedding layer of the encoder an ELMo model is used to generate a 1024-dimensional vector for each n-gram string. At the embedding layer of the decoder, each one-hot element value is represented by a 2-dimensional 0-1 vector. For scenarios with sufficient storage space and few identification categories, the scheme can accurately and effectively identify the identification category corresponding to an Internet of Things identifier, and has higher identification precision.
Optionally, the identification recognition method may be applied to a cloud, and may of course also be applied to a terminal with sufficient storage space, which is not limited in this embodiment of the application.
As shown in fig. 6, an embodiment of the present invention further provides an identifier identification method, including:
Step 61: acquiring an Internet of Things identification code.
Step 62: segmenting the characters in the Internet of Things identification code to obtain a first sequence.
Step 63: inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence.
Step 64: inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coding sequence.
The binary coding sequence is used to indicate the number of the target coding type to which the Internet of Things identification code belongs.
Step 65: performing identification processing on the Internet of Things identification code according to the target coding type corresponding to the binary coding sequence.
For example: and acquiring Internet of things identification codes on different carriers such as two-dimensional codes, bar codes and RFID. Of course, the obtaining route of the internet of things identification code in the embodiment of the present invention is not limited thereto.
Optionally, the characters in the internet of things identification code may be segmented according to a single character or a character string, so as to convert the identification into a sequence form. Therefore, the sequence obtained by converting the Internet of things identification code is used as the input of the sequence to the sequence model, and the sequence used for indicating the identification category to which the Internet of things identification code belongs is obtained.
The sequence-to-sequence model is one of encoder-decoder structures, and the basic idea is to use RNNs, such as LSTM, as encoders and decoders, as shown in fig. 2, for specific description, refer to the above embodiments, and no further description is given here to avoid repetition.
According to the scheme, the identification category to which the Internet of things identification code belongs is automatically identified through the sequence-to-sequence model, and the Internet of things identification code is then analyzed according to the identified category. Automatic identification of heterogeneous Internet of things identification codes is thus achieved without creating a new identification system or changing the existing identification systems, which solves the problem that analysis and intercommunication of different codes cannot currently be achieved.
In addition, the scheme adopts binary coding at the decoder end: the binary coding of the number corresponding to the identification category is used as the output sequence of the decoder, so that the length of the decoder output sequence is equal to $\lceil \log_2 N \rceil$ (where N is the number of identification categories), thereby effectively reducing the length of the output sequence.
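The relation between the number of identification categories and the decoder output length can be illustrated with a minimal Python sketch. It assumes that category numbers run from 0 to N-1 and that the binary code is left-padded with zeros; the function names are hypothetical and are not taken from the embodiment.

```python
def binary_code_length(num_categories: int) -> int:
    """Number of binary digits needed to number N categories as 0..N-1.

    For N >= 2 this equals ceil(log2(N)); integer arithmetic is used to
    avoid floating-point rounding. A single category still uses one digit.
    """
    if num_categories < 1:
        raise ValueError("num_categories must be positive")
    return max(1, (num_categories - 1).bit_length())


def encode_category(number: int, num_categories: int) -> str:
    """Binary coding of a category number, left-padded with zeros (assumed)."""
    if not 0 <= number < num_categories:
        raise ValueError("category number out of range")
    return format(number, "b").zfill(binary_code_length(num_categories))


if __name__ == "__main__":
    # 20 categories, category number 3, as in the worked example of this embodiment.
    print(binary_code_length(20), encode_category(3, 20))
```

With 20 categories the sketch yields a 5-element sequence, and category number 3 is written as 00011, which agrees with the worked example given for this embodiment.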
Optionally, a sequence-to-sequence model for identifying the identification category to which the internet of things identification code belongs may be trained in advance, and after the sequence-to-sequence model is trained, a sequence obtained by converting the internet of things identification code to be identified is used as an input of the sequence-to-sequence model, so as to obtain a sequence for indicating the identification category to which the internet of things identification code belongs.
Training a sequence-to-sequence model for identifying the identification category to which an Internet of things identification code belongs converts the automatic identification of the identification code category into a sequence-to-sequence learning problem: the coding system of the input identification is obtained by decoding the output sequence at the decoder end of the model, so that the identification is classified accurately. For the specific sequence-to-sequence model training process, refer to the above embodiments; details are not repeated here.
Optionally, as an implementation: the step of segmenting the characters in the internet of things identification code to obtain a first sequence may specifically include:
segmenting the characters in the Internet of things identification code character by character to obtain a character sequence;
determining the character sequence as the first sequence.
In the embodiment of the invention, in the sequence-to-sequence model, the identification code is the input at the encoder end, and the identification category corresponding to the identification code is the output of the decoder. To meet the input and output requirements of the sequence-to-sequence model, the identifier and its coding system need to be converted into sequence form, and the embedding layers of the encoder and the decoder need to be designed to obtain a vector representation of each sequence element.
Each identification code is a character string consisting of letters, digits, special symbols (such as '-') and the like. In this embodiment, it may be converted into a character sequence by segmenting it character by character. For example, the EAN identifier "6945091708456" is segmented character by character into the sequence {6,9,4,5,0,9,1,7,0,8,4,5,6}.
Optionally, the step of inputting the first sequence into an encoder in a sequence model for encoding to obtain a semantic vector corresponding to the first sequence may specifically include:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedded layer of the encoder;
taking the first one-hot vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to one character is input into one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
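The encoding steps above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: a plain tanh recurrent unit is substituted for the LSTM mentioned in the description, the weights are random and untrained, and the hidden size of 8 and all names are hypothetical.

```python
import math
import random
from typing import List


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_encode(inputs: List[List[float]], hidden_size: int = 8, seed: int = 0) -> List[float]:
    """Feed one input vector per recurrent step and return the last hidden state.

    The returned vector plays the role of the semantic vector of the sequence.
    """
    if not inputs:
        raise ValueError("the input sequence must not be empty")
    rng = random.Random(seed)
    w_in = make_matrix(hidden_size, len(inputs[0]), rng)
    w_rec = make_matrix(hidden_size, hidden_size, rng)
    hidden = [0.0] * hidden_size
    for x in inputs:  # one recurrent unit (time step) per sequence element
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(hidden_size)
        ]
    return hidden


if __name__ == "__main__":
    # Two toy 2-dimensional input vectors stand in for the one-hot vectors.
    print(rnn_encode([[1.0, 0.0], [0.0, 1.0]]))
```

In a trained model the weights would be learned from labelled identification codes; the sketch only shows how the output of the last recurrent step is taken as the semantic vector.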
Optionally, the converting, by the embedded layer of the encoder, each character in the first sequence into a first one-hot vector includes: and converting each character in the first sequence into a 128-dimensional first one-hot vector according to an ASCII code corresponding to the character through an embedded layer of the encoder.
For example: at the embedding layer of the encoder, each character is converted into a 128-dimensional one-hot vector using one-hot encoding. One-hot encoding represents a categorical variable as a binary vector: each integer value is represented by a binary vector that is all zeros except at the index of that integer, which is marked 1. In the ASCII coding system, each character corresponds to an integer from 0 to 127, and the index of the 1 in the one-hot vector of a character is the ASCII code of that character. For instance, the ASCII code of the character '6' is 54, so the element at index 54 of its one-hot vector (the 55th element in the vector) has the value 1, and the rest are 0.
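The character-level embedding described above can be sketched in Python as follows; the function names are hypothetical, and the sketch assumes that every character of the identification code lies in the ASCII range 0 to 127.

```python
from typing import List

VOCAB_SIZE = 128  # ASCII code points 0-127


def one_hot_ascii(char: str) -> List[int]:
    """128-dimensional one-hot vector whose single 1 sits at the ASCII code."""
    if len(char) != 1:
        raise ValueError("expected a single character")
    code = ord(char)
    if code >= VOCAB_SIZE:
        raise ValueError("character is outside the ASCII range")
    vector = [0] * VOCAB_SIZE
    vector[code] = 1
    return vector


def encode_identifier(identifier: str) -> List[List[int]]:
    """Segment the identifier character by character and one-hot encode each one."""
    return [one_hot_ascii(char) for char in identifier]


if __name__ == "__main__":
    vectors = encode_identifier("6945091708456")
    # Index of the 1 in the vector of the first character '6'.
    print(len(vectors), vectors[0].index(1))
```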
Further, after the encoder outputs the semantic vector, the semantic vector may be used as the input of the decoder either directly or after being transformed. In addition, a semantic vector obtained by transforming all the hidden states output by the encoder may also be used as the input of the decoder, which is not limited in the embodiment of the present invention.
Optionally, the step of inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coding sequence may specifically include:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedded layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
For the case where the number of identification categories in the training samples is large (for example, greater than 1000), if one-hot coding is adopted, the number of recurrent neural network units at the decoder end becomes excessive (the number of units equals the number of sequence elements), which degrades the robustness of the model. Therefore, in the embodiment of the present application, the binary coding of the number corresponding to the identification category is used as the output sequence at the decoder end, so that the length of the output sequence is $\lceil \log_2 N \rceil$ (where N is the number of identification categories); this effectively reduces the length of the output sequence and improves the robustness of the sequence-to-sequence model.
Optionally, the step of performing identification processing on the internet of things identification code according to the target code category corresponding to the binary coding sequence may specifically include:
calling an identification analysis service corresponding to the target code type according to the target code type corresponding to the binary code sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
For example: the EAN identification code "6945091708456" is analyzed by the Article Numbering Center of China, so the analysis of the identification "6945091708456" can be completed by calling the analysis service of that center.
Therefore, the identification category to which the Internet of things identification code belongs is obtained through sequence-to-sequence model identification, and a proper analysis system is selected to automatically analyze the Internet of things identification code, so that automatic identification of the heterogeneous Internet of things identification code is realized on the basis of no need of creating a new identification system and no need of changing the existing identification system.
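The selection of an analysis service from the decoded category can be sketched as a simple lookup table in Python. Everything below is hypothetical: the registry, the placeholder `resolve_ean` function and the returned strings merely stand in for real identification analysis services, which the embodiment does not specify at this level.

```python
from typing import Callable, Dict

Resolver = Callable[[str], str]


def resolve_ean(code: str) -> str:
    """Placeholder for a call to the EAN identification analysis service."""
    return f"EAN lookup requested for {code}"


# Hypothetical registry: category number -> analysis service.
# Category number 3 for EAN follows the worked example of this embodiment.
RESOLVERS: Dict[int, Resolver] = {3: resolve_ean}


def identify(code: str, binary_sequence: str) -> str:
    """Map the decoder's binary sequence to a category and call its service."""
    category = int(binary_sequence, 2)
    resolver = RESOLVERS.get(category)
    if resolver is None:
        raise LookupError(f"no analysis service registered for category {category}")
    return resolver(code)


if __name__ == "__main__":
    print(identify("6945091708456", "00011"))
```

Supporting a further identification system in this sketch only requires registering another entry, which reflects the point that no existing identification system needs to be changed.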
The above method is explained below with reference to specific examples:
As shown in fig. 7, the training set of the sequence-to-sequence model includes 20 identification categories. For example, the EAN identification code is "6945091708456", and the number corresponding to the EAN identification category is 3.
The EAN identification code "6945091708456" is converted into a sequence {6,9,4,5,0,9,1,7,0,8,4,5,6}, and at the embedding layer of the encoder, each character is converted into a 128-dimensional one-hot vector according to its ASCII code value. For example: the ASCII code for the character '6' is 54, then its corresponding one-hot vector index is 54 (the 55 th element in the vector) and has a value of 1, and the remainder is 0.
The output sequence at the decoder end is 00011 (length 5). At the decoder end, the output of the previous LSTM unit is used as the input of the next LSTM unit (e.g., the output of H1 is used as the input of H2, and the input of H1 may be a default start value, such as 0 or 1). Since the output sequence of the decoder contains only the elements '0' and '1', each element is one-hot encoded as a 2-dimensional vector: the vector of 0 is {1,0} and the vector of 1 is {0,1}.
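The 2-dimensional representation of the decoder elements can be sketched in Python as follows. The helper names are hypothetical, and reading the category number back by taking the larger of the two components of each vector is an assumption about how the decoder outputs would be interpreted.

```python
from typing import List

# One-hot vectors of the two possible decoder elements, as in the text.
BIT_TO_ONE_HOT = {"0": [1, 0], "1": [0, 1]}


def embed_bits(bits: str) -> List[List[int]]:
    """Map each binary element to its 2-dimensional one-hot vector."""
    return [BIT_TO_ONE_HOT[bit] for bit in bits]


def bits_from_vectors(vectors: List[List[float]]) -> str:
    """Recover the binary sequence by picking the larger component (assumed)."""
    return "".join("1" if vector[1] > vector[0] else "0" for vector in vectors)


def category_number(bits: str) -> int:
    """Interpret the binary sequence as the number of the identification category."""
    return int(bits, 2)


if __name__ == "__main__":
    vectors = embed_bits("00011")
    print(vectors, category_number(bits_from_vectors(vectors)))
```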
In this scheme of the invention, the encoder end adopts character coding, and the decoder end adopts binary coding to express the identification category. At the embedding layer of the encoder, each character is represented by a 128-dimensional one-hot vector, in which the index position of the 1 is the ASCII code value of the character. At the decoder end, binary coding is adopted to represent the number corresponding to the identification category, which reduces the length of the decoder output sequence, and at the embedding layer of the decoder each binary element value is represented by a 2-dimensional 0-1 vector. For the case of limited storage space and many identification categories, the scheme can accurately and effectively identify the identification category corresponding to an Internet of things identification while ensuring the robustness of the sequence-to-sequence model.
Optionally, the identifier identification method may be applied to a terminal, and certainly may be applied to a cloud, which is not limited in this embodiment of the application.
Optionally, as another implementation: the step of segmenting the characters in the internet of things identification code to obtain a first sequence may specifically include:
segmenting the characters in the Internet of things identification code according to the n-gram language model to obtain an n-gram string sequence;
determining the n-gram string sequence as the first sequence.
For example: when two identifications belonging to different identification categories have the same coding length and the same element value range and differ only in their coding rules, they cannot be fully distinguished by the character sequence alone. For instance, the MSISDN identification code "6524054521877" and the EAN identification code "6945091708456" each contain 13 digits. In an EAN identification code, different parts of the string express definite physical meanings; for example, the string formed by the first three characters is a country code, and the string "694" indicates that the identified product originates from China. By contrast, the string "652" formed by the first three characters of the MSISDN identification code is not used by any identifier in the EAN identification system. In summary, representing identifications by n-gram string sequences distinguishes them better.
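One possible segmentation into n-gram strings is sketched below in Python. It assumes non-overlapping chunks of length n = 3, with a shorter final chunk when the length is not a multiple of n, which is consistent with the sequence {694,509,170,845,6} given in the worked example of this embodiment; the function name is hypothetical, and other n-gram schemes (for example overlapping windows) are equally conceivable.

```python
from typing import List


def segment_by_ngram(identifier: str, n: int = 3) -> List[str]:
    """Split the identifier into consecutive, non-overlapping strings of length n.

    The last string may be shorter when len(identifier) is not a multiple of n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [identifier[i:i + n] for i in range(0, len(identifier), n)]


if __name__ == "__main__":
    print(segment_by_ngram("6945091708456"))  # EAN example from the text
    print(segment_by_ngram("6524054521877"))  # MSISDN example from the text
```

The first strings of the two results, "694" and "652", are the prefixes that the paragraph above uses to tell the two 13-digit codes apart.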
Optionally, the step of inputting the first sequence into an encoder in a sequence model for encoding to obtain a semantic vector corresponding to the first sequence may specifically include:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedded layer of the encoder;
taking the word vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to one n-gram string is input into one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the step of converting each n-gram string in the first sequence into a word vector by the embedding layer of the encoder may include: converting, by the embedded layer of the encoder, each n-gram string in the first sequence into a word vector using a natural language pre-training model (ELMo).
For example: each n-gram string is taken as a single word and the whole sequence as a sentence, which is input into the ELMo model to obtain a word vector of dimensionality 1024 for each n-gram string. Introducing the ELMo model thus solves the problem that arises when each n-gram string is represented with one-hot coding: the vector dimensionality would reach $128^n$, so that the finally trained sequence-to-sequence model would be extremely large and have poor precision.
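The size difference between the two representations can be checked with a short Python calculation. It assumes that every position of an n-gram string may hold any of the 128 ASCII characters, so that a one-hot vocabulary over all n-gram strings has 128 to the power n entries; the constant and function names are hypothetical.

```python
ASCII_SIZE = 128           # possible values of a single character
ELMO_DIMENSION = 1024      # word-vector dimensionality stated in the text


def one_hot_ngram_dimension(n: int) -> int:
    """Dimensionality of a one-hot vector over all ASCII strings of length n."""
    return ASCII_SIZE ** n


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, one_hot_ngram_dimension(n), ELMO_DIMENSION)
```

For n = 3 the one-hot dimensionality is already 2,097,152, compared with the fixed 1024 dimensions of the ELMo word vector.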
Optionally, the step of inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coding sequence may specifically include:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedded layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
For the case where the number of identification categories in the training samples is large (for example, greater than 1000), if one-hot coding is adopted, the number of recurrent neural network units at the decoder end becomes excessive (the number of units equals the number of sequence elements), which degrades the robustness of the model. Therefore, in the embodiment of the present application, the binary coding of the number corresponding to the identification category is used as the output sequence at the decoder end, so that the length of the output sequence is $\lceil \log_2 N \rceil$ (where N is the number of identification categories); this effectively reduces the length of the output sequence and improves the robustness of the sequence-to-sequence model.
Optionally, the step of performing identification processing on the internet of things identification code according to the target code category corresponding to the binary coding sequence may specifically include:
calling an identification analysis service corresponding to the target coding type according to the target coding type corresponding to the binary coding sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
For example: the EAN identification code "6945091708456" is analyzed by the Article Numbering Center of China, so the analysis of the identification "6945091708456" can be completed by calling the analysis service of that center.
Therefore, the identification type to which the Internet of things identification code belongs is obtained through sequence-to-sequence model identification, a proper analysis system is selected, and the Internet of things identification code is automatically analyzed, so that the automatic identification of the heterogeneous Internet of things identification code is realized on the basis that a new identification system is not required to be created and the existing identification system is not required to be changed, and the scheme can have better expandability and can continuously increase the identification types supporting the automatic identification.
The above method is explained below with reference to specific examples:
As shown in fig. 8, the training set of the sequence-to-sequence model includes 20 identification categories. For example, the EAN identification code is "6945091708456", and the number corresponding to the EAN identification category is 3.
The EAN identification code "6945091708456" is converted into the sequence {694,509,170,845,6}, and at the embedding layer of the encoder each n-gram string is passed through the ELMo model to obtain a 1024-dimensional real-valued vector.
The output sequence at the decoder end is 00011 (length 5). At the decoder end, the output of the previous LSTM unit is used as the input of the next LSTM unit (e.g., the output of H1 is used as the input of H2, and the input of H1 may be a default start value, such as 0 or 1). Since the output sequence of the decoder contains only the elements '0' and '1', each element is one-hot encoded as a 2-dimensional vector: the vector of 0 is {1,0} and the vector of 1 is {0,1}.
According to this scheme, n-gram coding is adopted at the encoder end and binary coding is adopted at the decoder end, and at the embedding layer of the encoder an ELMo model is used to generate a 1024-dimensional vector for each n-gram string. At the decoder end, binary coding is adopted to represent the number corresponding to the identification category, which reduces the length of the decoder output sequence, and at the embedding layer of the decoder each binary element value is represented by a 2-dimensional 0-1 vector. For the case of sufficient storage space and many identification categories, the scheme can accurately and effectively identify the identification category corresponding to an Internet of things identification, achieves higher identification precision, and also ensures the robustness of the sequence-to-sequence model.
Optionally, the identifier identification method may be applied to a cloud, and certainly may also be applied to a terminal with a sufficient storage space, which is not limited in this embodiment of the application.
Four methods for identifying the identification category in the embodiments of the present application are introduced above: in strategy 1, the encoder adopts character coding and the decoder adopts one-hot coding; in strategy 2, the encoder adopts character coding and the decoder adopts binary coding; in strategy 3, the encoder adopts n-gram coding and the decoder adopts one-hot coding; in strategy 4, the encoder adopts n-gram coding and the decoder adopts binary coding.
In order to verify the effectiveness of the scheme of the present invention, a data set of 20 identification categories is taken as an example, and the classification accuracy of the scheme of the present application is compared with that of 7 typical sequence classification methods on the data set. LR and SVM are classic machine learning algorithms, the LSTM model is good at classifying time-series data, and the other 4 are deep neural networks based on one-dimensional sequence data. The classification results are shown in table 1. The comparison shows that the sequence-to-sequence models for identifying identification categories realized by the 4 different strategies in the scheme of the present invention all achieve a classification accuracy of more than 94%, exceeding the other comparison schemes. The accuracy of the models realized by strategy 3 and strategy 4 exceeds that of strategy 1 and strategy 2, which verifies that the n-gram coding mode improves the accuracy of identification classification.
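TABLE 1 (classification accuracy of the compared methods on the data set of 20 identification categories; the table is an image in the original publication and is not reproduced here)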
The scheme of the invention can be applied to the field of new retail represented by unmanned retail, such as unmanned supermarket business, and the automatic identification technology of the identification codes of the heterogeneous Internet of things provided by the scheme of the invention can support automatic shopping of commodities identified by different coding modes, does not need a seller to label the commodities again, and saves a large amount of labor force and operation cost. Moreover, the technology can support the verification of various identities (such as identity card numbers, passport numbers, mobile phone numbers and the like) in the future when being applied to unmanned retail, and is more favorable for the promotion of the customer volume and the product popularization.
The technical scheme of the invention can also be applied to the field of product tracing. Product tracing refers to recording and querying information such as product state, attributes and position by means of identification technology throughout the whole life cycle of a product, from design, production planning, manufacture, transportation and service to recovery. Its aim is to comprehensively grasp the data of the product over the whole process, promote the interconnection and intercommunication of information data among systems within an enterprise, between enterprises, and between an enterprise and its clients, realize optimized allocation of enterprise resources, and improve product quality, production efficiency and the core competitiveness of the enterprise. Automatic identification and analysis of identifications is the key to realizing product tracing. By using the automatic identification technology for heterogeneous Internet of things identification codes, the identifications of a product at the different stages of its whole life cycle (raw materials, production, transportation and sale) can be identified, and automatic acquisition of product information and comprehensive, real-time monitoring and management are thereby realized.
In addition, 5G is an important direction for upgrading a new generation of information communication technology, and the industrial Internet is a development trend for transformation and upgrading of the manufacturing industry. The combination of 5G and the industrial Internet is undoubtedly becoming an important driving force for the digital transformation of the manufacturing industry. An industrial internet identification analysis system is a key hub for realizing information intercommunication of all industrial elements and all links. By giving an identifier to each object and by means of an industrial internet identifier analysis system, cross-region, cross-industry and cross-enterprise information query and sharing are achieved. In the industrial field, there are currently various identification resolution systems in the world, such as GS1 system, OID system, Handle system, Ecode system, UID system, and the like. By means of the scheme provided by the invention, compatibility and communication of various industrial identification systems can be effectively realized, sharing and application of industrial data are supported, and development of industrial internet is promoted.
As shown in fig. 9, a normal flow of a heterogeneous identification service use case is provided, which includes 9 steps; step 2 explicitly states that the M2M platform has the capability to identify the heterogeneous identification type ("identify the identifier type"). As shown in fig. 10, a high-level description of the heterogeneous identification service is provided: according to the identified identification type, an analysis request is sent to the identification analysis system corresponding to that type, thereby completing a unified analysis service for heterogeneous identifications. The embodiment of the invention provides a corresponding solution for this potential requirement of automatic identification of heterogeneous identifiers.
An embodiment of the present invention further provides an identifier recognition apparatus, including:
the acquisition module is used for acquiring the identification code of the Internet of things;
the processing module is used for segmenting characters in the Internet of things identification codes to obtain a first sequence;
the encoding module is used for inputting the first sequence into an encoder in a sequence-to-sequence model for encoding processing to obtain a semantic vector corresponding to the first sequence;
the decoding module is used for inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding processing to obtain a one-hot coding sequence; the one-hot coding sequence is used for indicating the number of the target coding type to which the Internet of things identification code belongs;
and the identification module is used for identifying the Internet of things identification code according to the target code category corresponding to the one-hot coding sequence.
Optionally, the processing module includes:
the first processing unit is used for segmenting the characters in the Internet of things identification code according to each character to obtain a character sequence;
a second processing unit for determining the character sequence as the first sequence.
Optionally, the encoding module comprises:
a first conversion unit, configured to input the first sequence to an encoder in a sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedded layer of the encoder;
the first coding unit is used for taking the first one-hot vectors as the input of the recurrent neural network units in the encoder and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to one character is input into one recurrent neural network unit in the encoder;
and the first determining unit is used for determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processing module includes:
the third processing unit is used for segmenting the characters in the Internet of things identification code according to n-grams to obtain an n-gram string sequence;
a fourth processing unit configured to determine a sequence of n-grams as the first sequence.
Optionally, the encoding module comprises:
a second conversion unit, configured to input the first sequence to an encoder in a sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedded layer of the encoder;
the second coding unit is used for taking the word vectors as the input of the recurrent neural network units in the encoder to obtain the vector output by the last recurrent neural network unit, wherein the word vector corresponding to one n-gram string is input into one recurrent neural network unit in the encoder;
and the second determining unit is used for determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the decoding module comprises:
a decoding unit, configured to input the semantic vector to a decoder in the sequence-to-sequence model, and convert the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedded layer of the decoder;
and a third determining unit, configured to determine a sequence formed by output elements of all recurrent neural network units in the decoder as the one-hot encoding sequence.
Optionally, the identification module comprises:
the calling unit is used for calling the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and the identification unit is used for identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
The identifier recognition device in the embodiment of the present invention can implement each process implemented by the method and achieve the same technical effect, and is not described herein again to avoid repetition.
The identification recognition device in the embodiment of the invention automatically recognizes the identification category to which the Internet of things identification code belongs through the sequence-to-sequence model, and further analyzes the Internet of things identification code by recognizing the identification category to which the obtained Internet of things identification code belongs, so that the automatic recognition of the heterogeneous Internet of things identification code is realized on the basis of not creating a new identification system and not changing the existing identification system, and the problem that the analysis and intercommunication of different codes cannot be realized at present is solved.
In addition, the scheme adopts one-hot coding at the decoder end: a unique integer number is assigned to each identification category, and the number is then converted into an N-bit one-hot code (N being the total number of identification categories supported by the sequence-to-sequence model) that serves as the output sequence. The output sequence is therefore a binary sequence containing only 0 and 1, and each element in it can be represented by a 2-dimensional one-hot vector at the embedded layer of the decoder, which keeps the decoding process simple.
As shown in fig. 11, an embodiment of the present invention further provides a terminal device 1100, including: a transceiver 1110 and a processor 1120;
the transceiver 1110 is configured to obtain an internet of things identification code;
the processor 1120 is configured to:
segmenting the characters in the Internet of things identification code to obtain a first sequence;
inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence; the one-hot coding sequence is used for indicating the number of the target coding type to which the Internet of things identification code belongs;
and identifying the Internet of things identification code according to the target code type corresponding to the one-hot coding sequence.
Optionally, the processor 1120 is further configured to:
dividing the characters in the Internet of things identification codes according to each character to obtain a character sequence;
determining the character sequence as the first sequence.
Optionally, the processor 1120 is further configured to:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedded layer of the encoder;
taking the first one-hot vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to one character is input into one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processor 1120 is further configured to:
segmenting characters in the Internet of things identification codes according to n-grams to obtain a string sequence of the n-grams;
determining a sequence of n-gram strings as the first sequence.
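The two segmentation options described above (per character and per n-gram) can be illustrated with a short Python sketch. The function names, the sliding window with a stride of one character, the choice n = 2, and the sample identification code are assumptions for illustration; the embodiment does not specify the stride or the value of n here.

```python
from typing import List


def split_chars(code: str) -> List[str]:
    """Segment an identification code into a sequence of single characters."""
    return list(code)


def split_ngrams(code: str, n: int = 2) -> List[str]:
    """Segment an identification code into overlapping n-gram strings.

    Assumes a sliding window that advances one character at a time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(code) < n:
        # Too short for a full n-gram: keep the code as one string.
        return [code] if code else []
    return [code[i:i + n] for i in range(len(code) - n + 1)]


sample = "1.2.156"  # hypothetical identification code
print(split_chars(sample))
print(split_ngrams(sample, n=2))
```

Either result can serve as the first sequence; the n-gram variant yields a larger input vocabulary, which is why it is paired with word vectors rather than one-hot vectors in the encoder.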
Optionally, the processor 1120 is further configured to:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to each n-gram string is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processor 1120 is further configured to:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
Optionally, the processor 1120 is further configured to:
calling an identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
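The final step above, selecting an identification resolution service according to the recognised category, can be sketched as a simple dispatch table. Everything named here is hypothetical: the resolver functions are local stubs standing in for real resolution services, and the mapping from category numbers to services is invented for the illustration.

```python
from typing import Callable, Dict, List


def resolve_with_service_a(code: str) -> str:
    """Stub for a resolution service; a real one would query a remote system."""
    return "service_a description for " + code


def resolve_with_service_b(code: str) -> str:
    """Stub for a second, different resolution service."""
    return "service_b description for " + code


# Hypothetical mapping: category number -> resolution service.
RESOLVERS: Dict[int, Callable[[str], str]] = {
    0: resolve_with_service_a,
    1: resolve_with_service_b,
}


def recognise(code: str, one_hot_sequence: List[int]) -> str:
    """Decode the category number and call the matching resolution service."""
    if one_hot_sequence.count(1) != 1:
        raise ValueError("decoder output is not a valid one-hot sequence")
    category = one_hot_sequence.index(1)
    if category not in RESOLVERS:
        raise LookupError("no resolution service for category %d" % category)
    return RESOLVERS[category](code)


print(recognise("1.2.156", [0, 1]))
```

Keeping the mapping in a table means a newly supported identification system only requires a new entry, without changing the existing resolution services.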
The terminal device in the embodiment of the present invention can implement each process implemented by the method and achieve the same technical effect, and is not described herein again to avoid repetition.
According to the terminal device in the embodiment of the present invention, the identification category to which the Internet of things identification code belongs is automatically recognized by the sequence-to-sequence model, and the Internet of things identification code is then further resolved according to the recognized identification category. Automatic recognition of heterogeneous Internet of things identification codes is thus achieved without creating a new identification system or changing the existing identification systems, which solves the problem that resolution and interworking across different codes cannot currently be achieved.
In addition, the scheme adopts one-hot coding at the decoder end: each identification category is assigned a unique integer number, and that number is then converted into an N-bit one-hot code (N being the total number of identification categories supported by the sequence-to-sequence model) that serves as the output sequence. The output sequence is therefore a binary sequence containing only 0 and 1, so each of its elements can be represented by a 2-dimensional one-hot vector at the embedding layer of the decoder, which keeps the decoding process simple.
An embodiment of the present invention further provides an identifier recognition apparatus, including:
the acquisition module is used for acquiring the identification code of the Internet of things;
the processing module is used for segmenting characters in the Internet of things identification codes to obtain a first sequence;
the encoding module is used for inputting the first sequence into an encoder in a sequence-to-sequence model for encoding processing to obtain a semantic vector corresponding to the first sequence;
the decoding module is used for inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding processing to obtain a binary coding sequence; the binary coding sequence is used for indicating the number of the target coding category to which the Internet of things identification code belongs;
and the identification module is used for identifying the Internet of things identification code according to the target code category corresponding to the binary coding sequence.
Optionally, the processing module includes:
the first processing unit is used for segmenting the characters in the Internet of things identification code according to each character to obtain a character sequence;
a second processing unit for determining the character sequence as the first sequence.
Optionally, the encoding module comprises:
a first conversion unit, configured to input the first sequence to an encoder in a sequence-to-sequence model, and convert each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
the first coding unit is used for taking the first one-hot vectors as the input of the recurrent neural network units in the encoder and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to each character is input to one recurrent neural network unit in the encoder;
and the first determining unit is used for determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processing module includes:
the third processing unit is used for segmenting the characters in the Internet of things identification codes according to n-grams to obtain a string sequence of the n-grams;
a fourth processing unit configured to determine a sequence of n-grams as the first sequence.
Optionally, the encoding module comprises:
a second conversion unit, configured to input the first sequence to an encoder in a sequence-to-sequence model, and convert each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
the second coding unit is used for taking the word vectors as the input of the recurrent neural network units in the encoder and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to each n-gram string is input to one recurrent neural network unit in the encoder;
and the second determining unit is used for determining the vector output by the last recurrent neural network unit as the semantic vector corresponding to the first sequence.
Optionally, the decoding module comprises:
a decoding unit, configured to input the semantic vector to a decoder in the sequence-to-sequence model, and convert the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedding layer of the decoder;
and the third determining unit is used for determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
Optionally, the identification module comprises:
the calling unit is used for calling the identification analysis service corresponding to the target coding type according to the target coding type corresponding to the binary coding sequence;
and the identification unit is used for identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
The identifier recognition device in the embodiment of the present invention can implement each process implemented by the method and achieve the same technical effect, and is not described herein again to avoid repetition.
According to the identification recognition device, the identification category to which the Internet of things identification code belongs is automatically recognized by the sequence-to-sequence model, and the Internet of things identification code is then further resolved according to the recognized identification category. Automatic recognition of heterogeneous Internet of things identification codes is thus achieved without creating a new identification system or changing the existing identification systems, which solves the problem that resolution and interworking across different codes cannot currently be achieved.
In addition, the scheme adopts binary coding at the decoder end: the binary code of the number corresponding to the identification category is used as the output sequence of the decoder, so that the length of the decoder's output sequence equals the expression given in formula image BDA0002765523320000361 (where N is the number of identification categories), thereby effectively reducing the length of the output sequence.
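The length saving of binary coding over one-hot coding can be illustrated with a short Python sketch. The exact length formula appears only in a formula image that is not reproduced in this text; the sketch assumes the usual fixed-width binary representation of category numbers counted from 0, which needs ceil(log2(N)) bits for N categories (at least 1). The function names are hypothetical.

```python
from typing import List


def binary_length(num_categories: int) -> int:
    """Bits needed to number num_categories categories from 0 (assumed formula)."""
    if num_categories < 1:
        raise ValueError("num_categories must be positive")
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 2.
    return max(1, (num_categories - 1).bit_length())


def category_to_binary(category_index: int, num_categories: int) -> List[int]:
    """Fixed-width binary output sequence for a category number."""
    if not 0 <= category_index < num_categories:
        raise ValueError("category_index out of range")
    width = binary_length(num_categories)
    return [int(bit) for bit in format(category_index, "0%db" % width)]


def binary_to_category(bits: List[int]) -> int:
    """Recover the category number from a binary output sequence."""
    return int("".join(str(bit) for bit in bits), 2)


for n in (4, 8, 100):
    print(n, "categories: one-hot length", n, "binary length", binary_length(n))

print(category_to_binary(5, 8))
```

Under this assumption, 100 categories need a 100-element one-hot output sequence but only a 7-element binary one, at the cost of an output that is no longer a single set bit.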
As shown in fig. 11, an embodiment of the present invention further provides a terminal device, including: a transceiver 1110 and a processor 1120;
the transceiver 1110 is configured to obtain an internet of things identification code;
the processor 1120 is configured to:
dividing characters in the Internet of things identification codes to obtain a first sequence;
inputting the first sequence into an encoder in a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coding sequence; the binary coding sequence is used for indicating the number of the target coding category to which the Internet of things identification code belongs;
and identifying the Internet of things identification code according to the target code category corresponding to the binary coding sequence.
Optionally, the processor 1120 is further configured to:
dividing the characters in the Internet of things identification codes according to each character to obtain a character sequence;
determining the character sequence as the first sequence.
Optionally, the processor 1120 is further configured to:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to each character is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processor 1120 is further configured to:
segmenting characters in the Internet of things identification codes according to n-grams to obtain a string sequence of the n-grams;
determining a sequence of n-gram strings as the first sequence.
Optionally, the processor 1120 is further configured to:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to each n-gram string is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
Optionally, the processor 1120 is further configured to:
inputting the semantic vector to a decoder in the sequence-to-sequence model, and converting elements of each recurrent neural network unit input to the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
Optionally, the processor 1120 is further configured to:
calling an identification analysis service corresponding to the target code type according to the target code type corresponding to the binary code sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
The terminal device in the embodiment of the present invention can implement each process implemented by the method and achieve the same technical effect, and is not described herein again to avoid repetition.
According to the terminal device, the identification category to which the Internet of things identification code belongs is automatically recognized by the sequence-to-sequence model, and the Internet of things identification code is then further resolved according to the recognized identification category. Automatic recognition of heterogeneous Internet of things identification codes is thus achieved without creating a new identification system or changing the existing identification systems, which solves the problem that resolution and interworking across different codes cannot currently be achieved.
In addition, the scheme adopts binary coding at the decoder end: the binary code of the number corresponding to the identification category is used as the output sequence of the decoder, so that the length of the decoder's output sequence equals the expression given in formula image BDA0002765523320000381 (where N is the number of identification categories), thereby effectively reducing the length of the output sequence.
An embodiment of the present invention further provides a terminal device, as shown in fig. 12, including a transceiver 1210, a processor 1200, a memory 1220, and a program or instructions stored in the memory 1220 and executable on the processor 1200. When executing the program or instructions, the processor 1200 implements the steps of the above identification recognition method and can achieve the same technical effects, which are not described again here to avoid repetition.
The transceiver 1210 is configured to receive and transmit data under the control of the processor 1200.
In fig. 12, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, specifically one or more processors represented by processor 1200 and memory represented by memory 1220. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 1210 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. The processor 1200 is responsible for managing the bus architecture and general processing, and the memory 1220 may store data used by the processor 1200 in performing operations.
An embodiment of the present invention further provides a terminal device, as shown in fig. 13, including a transceiver 1310, a processor 1300, a memory 1320, and a program or instructions stored in the memory 1320 and executable on the processor 1300. When executing the program or instructions, the processor 1300 implements the steps of the above identification recognition method and can achieve the same technical effects, which are not described again here to avoid repetition.
The transceiver 1310 is used for receiving and transmitting data under the control of the processor 1300.
In fig. 13, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, specifically one or more processors represented by processor 1300 and memory represented by memory 1320. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 1310 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 1330 may also be an interface capable of connecting externally or internally to a desired device, including but not limited to a keypad, display, speaker, microphone, joystick, etc.
The processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1320 may store data used by the processor 1300 in performing operations.
The embodiment of the present invention further provides a readable storage medium, on which a program or an instruction is stored, where the program or the instruction is executed by a processor to implement the steps of the above-mentioned identifier recognition method, and can achieve the same technical effects, and in order to avoid repetition, the detailed description is omitted here.
The processor is the processor in the terminal device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It is further noted that the terminals described in this specification include, but are not limited to, smart phones, tablets, etc., and that many of the functional components described are referred to as modules in order to more particularly emphasize their implementation independence.
In embodiments of the present invention, modules may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be constructed as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified within the modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
Where a module can be implemented in software, then, given the level of existing hardware technology and leaving cost aside, those skilled in the art could also build a corresponding hardware circuit to implement the same function. Such a hardware circuit may include conventional Very Large Scale Integration (VLSI) circuits or gate arrays, and existing semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
The exemplary embodiments above are described with reference to the drawings. Many different forms and embodiments of the invention are possible without departing from the spirit and teaching of the invention; therefore, the invention is not to be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. In the drawings, the size and relative sizes of elements may be exaggerated for clarity. The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Unless otherwise indicated, a range of values, when stated, includes the upper and lower limits of the range and any subranges therebetween.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (20)

1. An identification recognition method, comprising:
acquiring an Internet of things identification code;
dividing characters in the Internet of things identification codes to obtain a first sequence;
inputting the first sequence into an encoder in a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence; the one-hot coding sequence is used for indicating the number of the target coding type to which the Internet of things identification code belongs;
and identifying the Internet of things identification code according to the target code type corresponding to the one-hot coding sequence.
2. The identifier recognition method according to claim 1, wherein the dividing the characters in the internet of things identifier code to obtain a first sequence comprises:
dividing the characters in the Internet of things identification codes according to each character to obtain a character sequence;
determining the character sequence as the first sequence.
3. The method according to claim 2, wherein the inputting the first sequence into an encoder in a sequence-to-sequence model for encoding to obtain the semantic vector corresponding to the first sequence comprises:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to each character is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
4. The identifier recognition method according to claim 1, wherein the segmenting the characters in the internet of things identifier code to obtain a first sequence comprises:
segmenting characters in the Internet of things identification codes according to a Chinese language model n-gram to obtain a n-gram string sequence;
determining a sequence of n-gram strings as the first sequence.
5. The method according to claim 4, wherein the inputting the first sequence into an encoder in a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence comprises:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector through an embedding layer of the encoder;
taking the word vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to each n-gram string is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
6. The method according to claim 1, wherein the inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence comprises:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the one-hot coding sequence.
7. The identification method according to claim 1, wherein the identifying the internet of things identification code according to the target code category corresponding to the one-hot coding sequence comprises:
calling an identification analysis service corresponding to the target coding type according to the target coding type corresponding to the one-hot coding sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
8. An identification recognition method, comprising:
acquiring an Internet of things identification code;
dividing characters in the Internet of things identification codes to obtain a first sequence;
inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coding sequence; the binary coding sequence is used for indicating the number of the target coding category to which the Internet of things identification code belongs;
and identifying the Internet of things identification code according to the target code category corresponding to the binary coding sequence.
9. The identifier recognition method according to claim 8, wherein the dividing the characters in the internet of things identifier code to obtain a first sequence comprises:
dividing the characters in the Internet of things identification codes according to each character to obtain a character sequence;
determining the character sequence as the first sequence.
10. The method according to claim 9, wherein the inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence comprises:
inputting the first sequence into an encoder of a sequence-to-sequence model, and converting each character in the first sequence into a first one-hot vector through an embedding layer of the encoder;
taking the first one-hot vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the first one-hot vector corresponding to each character is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
11. The identifier recognition method according to claim 8, wherein the dividing the characters in the internet of things identifier code to obtain a first sequence comprises:
segmenting characters in the Internet of things identification codes according to a Chinese language model n-gram to obtain a n-gram string sequence;
determining a sequence of n-gram strings as the first sequence.
12. The method according to claim 11, wherein the inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain the semantic vector corresponding to the first sequence comprises:
inputting the first sequence to an encoder in a sequence-to-sequence model, and converting each n-gram string in the first sequence into a word vector by an embedding layer of the encoder;
taking the word vectors as the input of the recurrent neural network units in the encoder, and obtaining the vector output by the last recurrent neural network unit, wherein the word vector corresponding to each n-gram string is input to one recurrent neural network unit in the encoder;
and determining the vector output by the last recurrent neural network unit as a semantic vector corresponding to the first sequence.
13. The method according to claim 8, wherein the inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a binary coded sequence comprises:
inputting the semantic vector into a decoder of the sequence-to-sequence model, and converting the element input to each recurrent neural network unit in the decoder into a second one-hot vector through an embedding layer of the decoder;
and determining a sequence formed by output elements of all the recurrent neural network units in the decoder as the binary coding sequence.
14. The identification recognition method according to claim 8, wherein the recognition processing of the internet of things identification code according to the target code category corresponding to the binary code sequence comprises:
calling an identification analysis service corresponding to the target code type according to the target code type corresponding to the binary code sequence;
and identifying the description information of the internet-of-things object corresponding to the internet-of-things identification code through the identification analysis service corresponding to the target code type.
15. An identification recognizing apparatus, comprising:
the acquisition module is used for acquiring the identification code of the Internet of things;
the processing module is used for segmenting characters in the Internet of things identification codes to obtain a first sequence;
the encoding module is used for inputting the first sequence into an encoder in a sequence-to-sequence model for encoding processing to obtain a semantic vector corresponding to the first sequence;
the decoding module is used for inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding processing to obtain a one-hot coding sequence; the one-hot coding sequence is used for indicating the number of the target coding type to which the Internet of things identification code belongs;
and the identification module is used for identifying the Internet of things identification code according to the target code category corresponding to the one-hot coding sequence.
16. A terminal device, comprising: a transceiver and a processor;
the transceiver is used for acquiring an Internet of things identification code;
the processor is configured to:
dividing characters in the Internet of things identification codes to obtain a first sequence;
inputting the first sequence into an encoder of a sequence-to-sequence model for encoding to obtain a semantic vector corresponding to the first sequence;
inputting the semantic vector into a decoder of the sequence-to-sequence model for decoding to obtain a one-hot coding sequence; the one-hot coding sequence is used for indicating the number of the target coding type to which the Internet of things identification code belongs;
and identifying the Internet of things identification code according to the target code type corresponding to the one-hot coding sequence.
17. An identification recognition apparatus, comprising:
an acquisition module, configured to acquire an Internet of Things identification code;
a processing module, configured to segment the characters in the Internet of Things identification code to obtain a first sequence;
an encoding module, configured to input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
a decoding module, configured to input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, wherein the binary coding sequence indicates the number of the target coding category to which the Internet of Things identification code belongs; and
an identification module, configured to identify the Internet of Things identification code according to the target coding category corresponding to the binary coding sequence.
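The binary variant can be sketched in the same way. The snippet below assumes that the binary coding sequence is read most significant bit first and that categories are numbered from 0; neither convention is stated in the claim, and the function names are hypothetical. It also shows how many decoder outputs are needed for a given number of categories under that assumption.

```python
from typing import Sequence


def binary_to_category(bits: Sequence[int]) -> int:
    """Interpret a bit sequence as an unsigned integer, most significant bit first."""
    if len(bits) == 0 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected a non-empty sequence of 0s and 1s")
    value = 0
    for b in bits:
        value = (value << 1) | b  # shift the previous bits left, append the new one
    return value


def bits_needed(num_categories: int) -> int:
    """Smallest bit count that can number `num_categories` categories from 0."""
    if num_categories < 1:
        raise ValueError("at least one category is required")
    return max(1, (num_categories - 1).bit_length())
```

With these assumptions, `binary_to_category([1, 0, 1])` yields category number 5, and eight categories need three output positions, whereas a one-hot coding sequence for the same eight categories would need eight.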
18. A terminal device, comprising a transceiver and a processor, wherein:
the transceiver is configured to acquire an Internet of Things identification code; and
the processor is configured to:
segment the characters in the Internet of Things identification code to obtain a first sequence;
input the first sequence into an encoder of a sequence-to-sequence model for encoding, to obtain a semantic vector corresponding to the first sequence;
input the semantic vector into a decoder of the sequence-to-sequence model for decoding, to obtain a binary coding sequence, wherein the binary coding sequence indicates the number of the target coding category to which the Internet of Things identification code belongs; and
identify the Internet of Things identification code according to the target coding category corresponding to the binary coding sequence.
19. A terminal device, comprising: a transceiver, a processor, a memory, and a program or instructions stored on the memory and executable on the processor; characterized in that the processor, when executing the program or instructions, implements the steps of the identification recognition method according to any one of claims 1 to 14.
20. A readable storage medium on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the identification recognition method according to any one of claims 1 to 14.
CN202011232011.4A 2020-11-06 2020-11-06 Identification recognition method and device and terminal equipment Pending CN114444443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011232011.4A CN114444443A (en) 2020-11-06 2020-11-06 Identification recognition method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011232011.4A CN114444443A (en) 2020-11-06 2020-11-06 Identification recognition method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN114444443A (en) 2022-05-06

Family

ID=81360636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011232011.4A Pending CN114444443A (en) 2020-11-06 2020-11-06 Identification recognition method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN114444443A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117978366A (en) * 2024-03-26 2024-05-03 杭州三一谦成科技有限公司 Vehicle information query system based on Internet of things

Similar Documents

Publication Publication Date Title
CN103455574B (en) The method and apparatus of Internet of Things unifying identifier coding multimode recognition
CN112257417A (en) Multi-task named entity recognition training method, medium and terminal
CN111401486A (en) Identification method and device of Internet of things identifier and terminal equipment
CN115526236A (en) Text network graph classification method based on multi-modal comparative learning
CN103761532B (en) Label space dimensionality reducing method and system based on feature-related implicit coding
CN110457459A (en) Dialog generation method, device, equipment and storage medium based on artificial intelligence
CN112084752A (en) Statement marking method, device, equipment and storage medium based on natural language
CN106953841A (en) Enter customer service with Quick Response Code and carry out identity authentication method
CN112380238A (en) Database data query method and device, electronic equipment and storage medium
CN114020272A (en) Serialized encoding and decoding methods and devices and electronic equipment
CN116978011A (en) Image semantic communication method and system for intelligent target recognition
CN114444443A (en) Identification recognition method and device and terminal equipment
CN112598039A (en) Method for acquiring positive sample in NLP classification field and related equipment
CN111949720A (en) Data analysis method based on big data and artificial intelligence and cloud data server
CN116775875A (en) Question corpus construction method and device, question answering method and device and storage medium
CN115238009A (en) Metadata management method, device and equipment based on blood vessel margin analysis and storage medium
CN115099359A (en) Address recognition method, device, equipment and storage medium based on artificial intelligence
CN114861666A (en) Entity classification model training method and device and computer readable storage medium
CN111199259B (en) Identification conversion method, device and computer readable storage medium
CN112396111A (en) Text intention classification method and device, computer equipment and storage medium
Li et al. CCAH: A CLIP‐Based Cycle Alignment Hashing Method for Unsupervised Vision‐Text Retrieval
CN112416354A (en) Code readability assessment method based on multi-dimensional features and hybrid neural network
CN115880550A (en) Identification, device and equipment of Internet of things identification
CN113434657B (en) E-commerce customer service response method and corresponding device, equipment and medium thereof
CN114415829B (en) Cross-platform equipment universal interface implementation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination