CN111199259B - Identification conversion method, device and computer readable storage medium - Google Patents


Info

Publication number
CN111199259B
Authority
CN
China
Prior art keywords
identifier
identification
text
semantic
topic
Legal status
Active
Application number
CN201811375603.4A
Other languages
Chinese (zh)
Other versions
CN111199259A (en)
Inventor
杨震
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd
Priority to CN201811375603.4A
Publication of CN111199259A
Application granted
Publication of CN111199259B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K17/00 Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an identification conversion method, an identification conversion device and a computer readable storage medium, and relates to the field of data processing. The identification conversion method comprises the following steps: obtaining keywords in a first identifier according to a conversion rule of the first identifier system to which the first identifier belongs; converting the first identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the first identifier comprises the subject word to which each keyword in the first identifier belongs, each identifier in the semantic identifier system comprises one or more identifier fields, and each identifier field corresponds to one or more subject words; obtaining a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system; and establishing a mapping relationship between the first identifier and the second identifier. The method can associate the same object across different identification systems, integrating information from the Internet of Things, the industrial Internet and the Internet, and laying a foundation for more future business and artificial intelligence applications.

Description

Identification conversion method, device and computer readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and apparatus for converting an identifier, and a computer readable storage medium.
Background
The development of the Internet of Things needs to solve two basic problems: the communication problem of interconnecting everything, and the problem of discovering the value of networked objects. The second problem is the core of Internet of Things development, namely, why things are interconnected and what value the interconnection can generate. Intelligent manufacturing and the industrial Internet are important components of the Internet of Things. With the shifting world economic landscape, major economies represented by China, the United States and Germany have invested heavily in this field, expecting advanced technology to upgrade traditional manufacturing and to raise the production efficiency and competitiveness of their domestic manufacturing industries.
One of the key future technologies and application directions of the industrial Internet is the understanding and application of the various fragmented data related to intelligent manufacturing. Understanding such data requires knowing its source, flow, use and so on. Identification resolution technology is currently one of the techniques expected to meet this need, so it is receiving increasing attention from all participants in the industrial Internet.
An identifier can be understood as a name label that distinguishes different objects, entities and items in the Internet of Things; it may be a character string of digits, letters, symbols, characters and the like composed according to certain rules. Current mainstream identification technologies include Handle (a name service system), OID (Object Identifier), ecode (Entity Code for IoT, the national Internet of Things identification system), EPC (Electronic Product Code) and UCode (electronic tag), each proposed by a different organization. Their common starting point is to uniquely mark physical objects, digital objects and the like and to provide information queries about them, so as to build an underlying information architecture.
Current identification resolution technology can locate the corresponding object through the corresponding protocol. In specific business applications such as traceability and tracking, the current state of an item can be obtained, further meeting related management requirements.
However, because different identification resolution systems coexist, applications that span these systems are difficult. Meanwhile, identification technology generally provides only codes and offers insufficient support for understanding and applying the information about the corresponding items. In addition, semantic expressions on the Internet differ from those in industry, and different industries also describe the same item differently. Therefore, the identification resolution technology in the related art has difficulty associating the same object across different identification systems.
Disclosure of Invention
One technical problem to be solved by the embodiment of the invention is as follows: how to associate the same object under different identification systems.
According to a first aspect of some embodiments of the present invention, there is provided an identification conversion method, including: obtaining keywords in the first identifier according to a conversion rule of a first identifier system to which the first identifier belongs; converting the first identifier into semantic identifiers according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the first identifier comprises a subject word to which each keyword in the first identifier belongs, each identifier in the semantic identifier system comprises one or more identifier fields, and each identifier field corresponds to one or more subject words; obtaining a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system; and establishing a mapping relation between the first identifier and the second identifier.
In some embodiments, obtaining the second identifier corresponding to the semantic identifier includes: obtaining keywords in the second identifier according to the conversion rule of the second identifier system to which the second identifier belongs; and converting the second identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises a subject term to which each keyword in the second identifier belongs.
In some embodiments, the semantic identification system includes topic distribution information under each category and word distribution information under each topic; converting the first identifier into the semantic identifier according to the pre-established semantic identification system comprises: determining the subject word to which each keyword in the first identifier belongs according to the topic distribution information under the category of the first identifier and the word distribution information under each topic; and constructing the semantic identifier based on the subject word to which each keyword in the first identifier belongs.
In some embodiments, the identification conversion method further comprises: mapping texts in a text library to a vector space to generate text feature data, wherein the feature items of the vector space include words in the text library and the value of each feature item is the weight of the word; predicting the category of the text feature data with a pre-trained classification model to obtain the category of each text in the text library; and performing topic analysis on texts of the same category in the text library to obtain the topic distribution information under that category in the semantic identification system and the word distribution information under each topic.
In some embodiments, the identification conversion method further comprises: constructing a vector space using words in training texts as feature items, wherein each feature item has a weight and each training text has a pre-labeled category; mapping the training texts to the vector space to generate training data; and training a machine learning model with the training data and the pre-labeled categories corresponding to the training data to obtain a classification model.
In some embodiments, the training text is a standard description text, the standard description text including Internet of Things text; the identification conversion method further comprises: performing topic analysis on a supplementary description text to construct a key feature set comprising topics in the supplementary description text, wherein the supplementary description text includes Internet text; and increasing the weights of the feature items in the vector space that belong to the key feature set.
In some embodiments, performing topic analysis on the supplementary description text to construct the key feature set comprising topics in the supplementary description text includes: performing topic analysis on the training text and the supplementary description text to obtain the topics in the training text and the supplementary description text, the words under each topic, and the distribution information of each word in the training text and the supplementary description text; extracting classification keywords from the standard description text; and constructing the key feature set from the classification keywords together with the words in the supplementary description text whose distribution information differs from that of the classification keywords by no more than a preset degree.
In some embodiments, the identification conversion method further comprises: adding the words in the key feature set to the corresponding topic sequences in the semantic identification system according to the topics to which those words belong, wherein a topic sequence represents one or more topics and the set of relations of the words under the corresponding topics.
According to a second aspect of some embodiments of the present invention, there is provided an identification conversion device, comprising: a first identifier keyword obtaining module configured to obtain keywords in a first identifier according to a conversion rule of the first identifier system to which the first identifier belongs; a semantic identifier conversion module configured to convert the first identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the first identifier comprises the subject word to which each keyword in the first identifier belongs, each identifier in the semantic identifier system comprises one or more identifier fields, and each identifier field corresponds to one or more subject words; a second identifier obtaining module configured to obtain a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system; and a mapping relationship establishing module configured to establish a mapping relationship between the first identifier and the second identifier.
In some embodiments, the second identifier obtaining module is further configured to obtain the keywords in the second identifier according to the conversion rule of the second identifier system to which the second identifier belongs, and to convert the second identifier into a semantic identifier according to the pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises the subject word to which each keyword in the second identifier belongs.
In some embodiments, the semantic identification system includes topic distribution information under each category and word distribution information under each topic; the semantic identifier conversion module is further configured to determine the subject word to which each keyword in the first identifier belongs according to the topic distribution information under the category of the first identifier and the word distribution information under each topic, and to construct the semantic identifier based on the subject word to which each keyword in the first identifier belongs.
In some embodiments, the identification conversion device further comprises a semantic identification system construction module configured to map texts in the text library to a vector space and generate text feature data, wherein the feature items of the vector space include words in the text library and the value of each feature item is the weight of the word; predict the category of the text feature data with a pre-trained classification model to obtain the category of each text in the text library; and perform topic analysis on texts of the same category in the text library to obtain the topic distribution information under that category in the semantic identification system and the word distribution information under each topic.
In some embodiments, the identification conversion device further comprises a classification model training module configured to construct a vector space using words in training texts as feature items, wherein each feature item has a weight and each training text has a pre-labeled category; map the training texts to the vector space to generate training data; and train a machine learning model with the training data and the pre-labeled categories corresponding to the training data to obtain the classification model.
In some embodiments, the training text is a standard description text, the standard description text including Internet of Things text; the identification conversion device further includes a feature item weight updating module configured to perform topic analysis on a supplementary description text to construct a key feature set comprising topics in the supplementary description text, wherein the supplementary description text includes Internet text, and to increase the weights of the feature items in the vector space that belong to the key feature set.
In some embodiments, the feature item weight updating module is further configured to perform topic analysis on the training text and the supplementary description text to obtain the topics in the training text and the supplementary description text, the words under each topic, and the distribution information of each word in the training text and the supplementary description text; extract classification keywords from the standard description text; and construct the key feature set from the classification keywords together with the words in the supplementary description text whose distribution information differs from that of the classification keywords by no more than a preset degree.
In some embodiments, the identification conversion device further comprises a semantic identification system expansion module configured to add the words in the key feature set to the corresponding topic sequences in the semantic identification system according to the topics to which those words belong, wherein a topic sequence represents one or more topics and the set of relations of the words under the corresponding topics.
According to a third aspect of some embodiments of the present invention, there is provided an identification conversion device, comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the aforementioned identification conversion methods based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements any one of the aforementioned identification conversion methods.
Some of the embodiments of the above invention have the following advantages or benefits: according to the embodiment of the invention, semantic analysis can be performed according to text information directly corresponding to the first identifier and the second identifier, so that the semantic identifier corresponding to the first identifier and the second identifier corresponding to the semantic identifier are obtained, and the association between the first identifier and the second identifier is realized by means of the semantic identifier. Therefore, the same object under different identification systems can be associated, fusion of the information of the Internet of things, the industrial Internet and the Internet is realized, and a foundation is laid for more business applications and artificial intelligence applications in the future.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below are only some embodiments of the invention, and a person skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flow chart of an identification conversion method according to some embodiments of the present invention.
Fig. 2 is a flow chart of a semantic identification conversion method according to some embodiments of the present invention.
Fig. 3 is a flow chart of a semantic identification hierarchy construction method according to some embodiments of the present invention.
FIG. 4 is a flow chart of a classification model training method according to some embodiments of the invention.
Fig. 5 is a flow chart of a vector space construction method according to some embodiments of the invention.
Fig. 6 is a flow chart of a key feature set construction method according to some embodiments of the invention.
Fig. 7 is a schematic structural view of an identification conversion device according to some embodiments of the present invention.
Fig. 8 is a schematic structural view of an identification conversion device according to other embodiments of the present invention.
Fig. 9 is a schematic structural view of an identification conversion device according to still other embodiments of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 is a flow chart of an identification conversion method according to some embodiments of the present invention. As shown in fig. 1, the identification conversion method of this embodiment includes steps S102 to S108.
In step S102, the keywords in the first identifier are obtained according to the conversion rule of the first identifier system to which the first identifier belongs.
For example, suppose the first identifier system specifies that the first two digits of an identifier are the country code, the 3rd to 6th digits the product name code, the 7th to 8th digits the model code, and so on, and also specifies the meaning of each value of each code. The keyword corresponding to the value of each code in the first identifier can then be obtained according to these rules.
In step S104, the first identifier is converted into a semantic identifier according to a pre-established semantic identification system. The semantic identifier corresponding to the first identifier comprises the subject word to which each keyword in the first identifier belongs.
The semantic identification system expresses identifiers in natural language. Each identifier in the semantic identification system includes one or more identifier fields, and each identifier field corresponds to one or more subject words. The semantic identification system can be constructed in advance from the texts in a text library.
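Step S102 can be sketched as positional parsing of the identifier. The rule table `RULE`, the code book `CODE_BOOK` and the function `extract_keywords` below are hypothetical and only mirror the example above (a 2-digit country code, a 4-digit product name code and a 2-digit model code); a real identifier system such as Handle or OID defines its own syntax and code tables.

```python
# Hypothetical conversion rule of the "first identifier system":
# (field name, start, end) with 0-based, end-exclusive positions.
RULE = [
    ("country", 0, 2),
    ("product", 2, 6),
    ("model", 6, 8),
]

# Hypothetical meaning of each code value in each field.
CODE_BOOK = {
    "country": {"86": "China", "49": "Germany"},
    "product": {"0101": "motor", "0202": "bearing"},
    "model": {"01": "type A", "02": "type B"},
}


def extract_keywords(identifier, rule=RULE, code_book=CODE_BOOK):
    """Step S102: return {field name: keyword} for one identifier."""
    keywords = {}
    for name, start, end in rule:
        code = identifier[start:end]
        if len(code) != end - start:
            raise ValueError(f"identifier too short for field {name!r}")
        # Unknown code values are kept verbatim so later steps can still see them.
        keywords[name] = code_book[name].get(code, code)
    return keywords


print(extract_keywords("86010102"))
# {'country': 'China', 'product': 'motor', 'model': 'type B'}
```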
In step S106, a second identifier corresponding to the semantic identifier is obtained, where the second identifier belongs to a second identifier system.
The second identification system is different from the first identification system. The second identifier can be converted into a semantic identifier in the same way as the first identifier.
In some embodiments, the keywords in the second identifier may be obtained according to a conversion rule of a second identifier system to which the second identifier belongs; and converting the second identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises a subject term to which each keyword in the second identifier belongs.
In step S108, a mapping relationship between the first identifier and the second identifier is established. Through the mapping relation, the mapping of the same entity in different identification systems can be completed. The mapping relationship may include a mapping relationship of the identifier or the code itself, and may further include a correspondence relationship between semantics associated with the identifier or the code.
The first identification system and the second identification system may be, for example, Handle, OID, ecode, EPC or UCode, or any other coded identification system. It will be clear to a person skilled in the art that "first" and "second" are used only to distinguish the different identification systems and do not limit the invention in any way.
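The end-to-end idea of steps S102 to S108, joining two identifier systems through the semantic identifier, can be illustrated with a toy sketch. It assumes keywords have already been extracted from each identifier (step S102), uses a hypothetical dictionary `SUBJECT_WORD` in place of a real semantic identification system, and treats the sorted tuple of subject words as the semantic identifier. All identifiers, keywords and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the semantic identification system: keywords from
# different identifier systems map to the subject word of one identifier field.
SUBJECT_WORD = {
    "China": "country:China",
    "CN": "country:China",
    "motor": "product:electric motor",
    "elec-motor": "product:electric motor",
}


def to_semantic(keywords):
    """Steps S104/S106: replace each keyword by its subject word. Sorting makes
    the result independent of field order in the source identifier system."""
    return tuple(sorted(SUBJECT_WORD.get(k, k) for k in keywords))


def build_mapping(first_ids, second_ids):
    """Step S108: map each first identifier to the second identifiers that
    share its semantic identifier."""
    index = defaultdict(list)
    for ident, keywords in second_ids.items():
        index[to_semantic(keywords)].append(ident)
    return {ident: list(index.get(to_semantic(kws), [])) for ident, kws in first_ids.items()}


first_ids = {"86010102": ["China", "motor"]}
second_ids = {"CN.MTR.7": ["CN", "elec-motor"], "CN.GEAR.1": ["CN", "gear"]}
print(build_mapping(first_ids, second_ids))
# {'86010102': ['CN.MTR.7']}
```

An exact match on the whole semantic identifier is the simplest possible join; a partial or weighted match over identifier fields would be an equally plausible design.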
Through the method of the embodiment, semantic analysis can be performed according to text information directly corresponding to the first identifier and the second identifier, so that the semantic identifier corresponding to the first identifier and the second identifier corresponding to the semantic identifier are obtained, and the association between the first identifier and the second identifier is realized by means of the semantic identifier. Therefore, the same object under different identification systems can be associated, fusion of the information of the Internet of things, the industrial Internet and the Internet is realized, and a foundation is laid for more business applications and artificial intelligence applications in the future.
The semantic identification system comprises topic distribution information under each category and word distribution information under each topic. With this information, the keywords in the first identifier or the second identifier can be converted. An embodiment of the semantic identification conversion method of the present invention is described below with reference to fig. 2.
Fig. 2 is a flow chart of a semantic identification conversion method according to some embodiments of the present invention. As shown in fig. 2, the semantic identification conversion method of this embodiment includes steps S202 to S204.
In step S202, a subject term to which each keyword in the first identifier belongs is determined according to the topic distribution information under the category of the first identifier and the word distribution information under each topic.
In some embodiments, the distribution information indicates a probability of occurrence. That is, each topic under each category has a corresponding probability of occurrence, and each word under each topic also has a corresponding probability of occurrence. After the category of the first identifier is determined, one may, for example, find the topic under that category in which the keyword in the first identifier has the highest occurrence probability; alternatively, among the several topics to which the keyword belongs, the topic with the highest occurrence probability under the category may be selected. The selected topics are then used as constituent elements of the semantic identifier.
In step S204, a semantic identifier is constructed based on the subject word to which each keyword in the first identifier belongs.
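Steps S202 and S204 leave the selection rule open, and the paragraph on distribution information names two options. The sketch below implements both over a toy, hypothetical semantic identification system: `TOPIC_DIST` holds the topic distribution under each category and `WORD_DIST` the word distribution under each topic. The names, probabilities and the `strategy` switch are illustrative assumptions, not taken from the patent.

```python
# Hypothetical semantic identification system for a single category.
TOPIC_DIST = {  # topic distribution under each category: P(topic | category)
    "machinery": {"power equipment": 0.6, "transmission parts": 0.4},
}
WORD_DIST = {  # word distribution under each topic: P(word | topic)
    "power equipment": {"motor": 0.30, "generator": 0.20, "bearing": 0.05},
    "transmission parts": {"bearing": 0.25, "gear": 0.20},
}


def topic_for_keyword(keyword, category, strategy="word"):
    """Step S202: pick the subject word (topic) for one keyword, or None if unknown."""
    topics = TOPIC_DIST[category]
    # Candidate topics: topics of this category under which the keyword occurs.
    candidates = [t for t in topics if WORD_DIST[t].get(keyword, 0.0) > 0.0]
    if not candidates:
        return None
    if strategy == "word":
        # Option 1: the topic in which the keyword is most probable.
        return max(candidates, key=lambda t: WORD_DIST[t][keyword])
    # Option 2: among the keyword's topics, the most probable one in the category.
    return max(candidates, key=lambda t: topics[t])


def semantic_identifier(keywords, category, strategy="word"):
    """Step S204: one identifier field (subject word) per keyword."""
    return [topic_for_keyword(k, category, strategy) for k in keywords]


print(semantic_identifier(["motor", "bearing"], "machinery"))
# ['power equipment', 'transmission parts']
print(semantic_identifier(["motor", "bearing"], "machinery", strategy="topic"))
# ['power equipment', 'power equipment']
```

The two options disagree on "bearing": option 1 follows the word distribution, option 2 favors the dominant topic of the category.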
By the method of the embodiment, the topic with the strongest association with the keyword in the first identifier can be determined in a semantic analysis mode, so that the accuracy of converting the first identifier into the semantic identifier can be improved.
The invention can pre-construct a semantic identification system according to the collected data. An embodiment of the semantic identification system construction method of the present invention is described below with reference to fig. 3.
Fig. 3 is a flow chart of a semantic identification hierarchy construction method according to some embodiments of the present invention. As shown in fig. 3, the semantic identification system construction method of this embodiment includes steps S302 to S306.
In step S302, the texts in the text library are mapped to a vector space to generate text feature data, wherein the feature items of the vector space include words in the text library, and the value of each feature item is the weight of the word. The vector space may be built in advance, for example from training data, which may be a subset of the text library or may overlap with it.
In step S304, the category of the text feature data is predicted using a pre-trained classification model, and the category of the text in the text library is obtained.
In step S306, topic analysis is performed on the texts of the same category in the text library to obtain the topic distribution information under that category in the semantic identification system and the word distribution information under each topic. The topic analysis may use, for example, the LDA (Latent Dirichlet Allocation) algorithm.
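Step S306 names LDA as one option for topic analysis. The following is a minimal collapsed Gibbs sampler for LDA in pure Python, offered as a sketch rather than the patent's implementation. The function name `lda_gibbs`, the hyperparameters `alpha`, `beta` and `iters`, and the choice to report a category's topic distribution as the average of per-text topic proportions are all assumptions. Its two outputs have the shape the semantic identification system needs: the topic distribution under one category and the word distribution under each topic.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on the tokenized texts of one category.

    Returns (topic_dist, word_dist): topic_dist[k] is the share of topic k in the
    category, and word_dist[k][w] is the probability of word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]                      # topic counts per text
    n_kw = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                       # token count per topic
    z = []                                                     # topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    word_dist = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    # Per-text topic proportions (theta), averaged over the category.
    thetas = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_dist = [sum(theta[k] for theta in thetas) / len(docs) for k in range(n_topics)]
    return topic_dist, word_dist


docs = [
    ["motor", "generator", "voltage", "motor"],
    ["generator", "voltage", "motor"],
    ["gear", "bearing", "shaft", "gear"],
    ["bearing", "shaft", "gear"],
]
topic_dist, word_dist = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(word_dist):
    top = sorted(words, key=words.get, reverse=True)[:3]
    print(f"topic {k}: share={topic_dist[k]:.2f}, top words={top}")
```

For a production corpus, an optimized library implementation (for example scikit-learn's `LatentDirichletAllocation`) would normally replace this loop.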
Through the method of the embodiment, the text can be classified by taking the words in the text as the characteristic items, so that the accuracy of text classification can be improved, the topic distribution and the word distribution under each category can be obtained more accurately, and the accuracy of identification conversion is improved.
In some embodiments, the classification model may also be trained in advance. An embodiment of the classification model training method of the present invention is described below with reference to fig. 4.
FIG. 4 is a flow chart of a classification model training method according to some embodiments of the invention. As shown in fig. 4, the classification model training method of this embodiment includes steps S402 to S406.
In step S402, a vector space is constructed using words in the training texts as feature items, where each feature item has a weight, and each training text has a category marked in advance.
In some embodiments, the training texts may be segmented into words, stop words such as punctuation marks, auxiliary words and modal particles may be removed, and a bag-of-words model may be constructed. To further improve classification accuracy and computational efficiency, dimensionality reduction may be performed, for example with a feature selection algorithm. In some embodiments, word frequency, information gain, the χ² statistic, mutual information, expected cross entropy and the like may be used.
Taking feature selection according to the χ² statistic as an example, assume that the feature term t and the category c_i follow a χ² distribution with one degree of freedom. The χ² statistic of feature term t for category c_i is then defined by formula (1).
$$\chi^2(t, c_i) = \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)} \qquad (1)$$
In formula (1), A is the number of documents that contain t and belong to c_i; B is the number of documents that contain t but do not belong to c_i; C is the number of documents that belong to c_i but do not contain t; D is the number of documents that neither belong to c_i nor contain t; and N is the total number of documents. In each category, the feature terms whose χ²(t, c_i) values rank within a predetermined top percentage (e.g., the top 20% to 50%) are selected to form a new bag-of-words model. This reduces the dimensionality of the bag-of-words model and determines the feature items of the vector space.
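A minimal sketch of χ² feature selection follows. It computes χ²(t, c_i) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)) from document counts, then keeps the top share of terms per category. The function names, the `ratio` parameter (set to 0.3, inside the 20% to 50% range mentioned above) and the choice to take the union of per-category selections are illustrative assumptions.

```python
def chi_square(docs, labels, term, category):
    """chi^2(t, c_i) from the document counts A, B, C, D and N."""
    N = len(docs)
    A = sum(1 for d, c in zip(docs, labels) if term in d and c == category)
    B = sum(1 for d, c in zip(docs, labels) if term in d and c != category)
    C = sum(1 for d, c in zip(docs, labels) if term not in d and c == category)
    D = N - A - B - C
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def select_features(docs, labels, ratio=0.3):
    """Keep, for each category, the top `ratio` share of terms by chi^2."""
    docs = [set(d) for d in docs]  # only presence/absence matters for A..D
    vocab = sorted(set().union(*docs))
    selected = set()
    for category in sorted(set(labels)):
        ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t, category), reverse=True)
        keep = max(1, round(len(ranked) * ratio))
        selected.update(ranked[:keep])
    return selected


docs = [["motor", "power"], ["motor", "voltage"], ["gear", "bearing"], ["gear", "shaft"]]
labels = ["electrical", "electrical", "mechanical", "mechanical"]
print(sorted(select_features(docs, labels)))
# ['gear', 'motor']
```

Because the statistic is squared, a term that is systematically absent from a category (here "gear" for "electrical") scores as high as one that is systematically present; both are informative for classification.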
The weights of the words may be preset, or determined from word-frequency information or by a weighting model such as TF-IDF. For example, TF-IDF weights may be calculated as shown in formula (2).
$$\mathrm{tfidf}(t_i,d_j)=tf(t_i,d_j)\times\log\frac{N}{n(t_i)} \tag{2}$$
In formula (2), d_j is the j-th training text, t_i is the i-th feature term of text d_j, tf(t_i, d_j) is the frequency of feature term t_i in text d_j, N is the total number of texts, and n(t_i) is the number of texts that contain feature term t_i. The weights may then be normalized as shown in formula (3).
$$w_{ij}=\frac{\mathrm{tfidf}(t_i,d_j)}{\sqrt{\sum_{k=1}^{M}\mathrm{tfidf}(t_k,d_j)^2}} \tag{3}$$
In formula (3), w_ij is the weight of feature term t_i of text d_j in the vector space, and M is the dimension of the vector space, which equals the total number of feature terms. This yields the weight of each feature item in the vector space.
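A minimal sketch of mapping texts to the vector space, assuming formula (2) is plain tf × log(N / n(t)) and formula (3) is L2 normalization over the M feature terms (both forms are assumptions, since the original equation images are not legible). `tfidf_vectors` is a hypothetical helper name.

```python
import math


def tfidf_vectors(docs, features):
    """Map each token list to an L2-normalized TF-IDF vector over `features`.

    Assumed weighting: tf(t, d) * log(N / n(t)) (formula (2)), then each
    document vector is divided by its Euclidean norm (formula (3)).
    Returns (ordered feature list, list of vectors).
    """
    feats = sorted(features)
    n_docs = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in feats}
    vectors = []
    for doc in docs:
        raw = [doc.count(t) * math.log(n_docs / df[t]) if df[t] else 0.0 for t in feats]
        norm = math.sqrt(sum(w * w for w in raw))
        # A document with no informative feature terms stays the zero vector.
        vectors.append([w / norm if norm else 0.0 for w in raw])
    return feats, vectors
```

With this IDF form, a term that occurs in every training text gets weight log(1) = 0, so it contributes nothing to any vector.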
In step S404, training text is mapped to a vector space, and training data is generated.
In step S406, the machine learning model is trained using the training data and the pre-labeled class corresponding to the training data, and a classification model is obtained.
In some embodiments, an SVM (Support Vector Machine) model may be used. SVMs generally have a small generalization error and low sensitivity to noise, which can improve the prediction accuracy of the classification model.
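Steps S404 and S406 can be sketched with a linear SVM. The following is a minimal sketch only: it swaps in a Pegasos-style stochastic sub-gradient solver (without the optional projection step) and a one-vs-rest scheme for multiple categories, neither of which is specified in the original. In practice a library SVM implementation would normally be used; the function names here are hypothetical.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient training of a binary linear SVM.

    xs: feature vectors (e.g. TF-IDF rows); ys: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer step
            if margin < 1.0:                          # hinge loss active
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_one_vs_rest(xs, labels, **kwargs):
    """Train one binary SVM per pre-labeled category (category vs. the rest)."""
    return {c: train_linear_svm(xs, [1 if y == c else -1 for y in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Return the category whose SVM gives the largest decision value for x."""
    xa = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xa)))
```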
In this way, the vector space can be constructed in advance from the training data and the classification model obtained, which improves the accuracy of text classification and, in turn, the accuracy of identification conversion.
In some embodiments, the training texts may be standard description texts, such as Internet of Things texts, including texts that describe products in a standardized way (e.g., product specifications and product specification documents); they may also include Internet texts that describe standards. Although the wording of such training texts is concise and precise, Internet data is richer, so the weights of the feature items in the vector space can be updated using the results of analyzing Internet text data. An embodiment of the vector space construction method of the present invention is described below with reference to fig. 5.
Fig. 5 is a flow chart of a vector space construction method according to some embodiments of the invention. As shown in fig. 5, the vector space construction method of this embodiment includes steps S502 to S506.
In step S502, a vector space is constructed using words in the training texts as feature items, where each feature item has a weight, and each training text has a category that is labeled in advance. The training text is a standard description text, and the standard description text comprises an internet of things text.
In step S504, a topic analysis is performed on the supplemental descriptive text, and a set of key features is constructed that includes topics in the supplemental descriptive text, wherein the supplemental descriptive text includes internet text. The set of key features may include, for example, features that have a greater degree of impact on the classification results of the supplemental descriptive text.
In step S506, the weights of the feature items belonging to the key feature set in the vector space are increased. For example, the weight update of the feature term may be accomplished using equation (4).
$$w_{\mathrm{new}}=w+\lambda\cdot u \tag{4}$$
In formula (4), w_new is the updated weight of the feature item, w is its weight before the update, u is the mean of the non-zero entries in the vector space, and λ is an adjustment parameter.
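A minimal sketch of applying formula (4), assuming the feature-item weights are held in a dict keyed by feature and that u is the mean of the non-zero weights in that dict. If weights are stored per document instead, the same update would be applied to each row. `boost_key_features` and the default `lam` are hypothetical.

```python
def boost_key_features(weights, key_features, lam=0.5):
    """Formula (4): w_new = w + lam * u for feature items in the key feature set.

    weights:      dict feature -> weight
    key_features: set of features to boost
    u is taken as the mean of the non-zero weights (assumed reading of the text).
    """
    nonzero = [w for w in weights.values() if w != 0]
    u = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return {f: (w + lam * u if f in key_features else w) for f, w in weights.items()}
```

For example, with weights {"a": 0.2, "b": 0.0, "c": 0.4}, key set {"a", "b"} and lam = 0.5, u is 0.3, so "a" becomes about 0.35, "b" about 0.15, and "c" is unchanged.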
With this method, the vector space can be optimized using the supplementary description texts, which improves the accuracy of text classification and, further, the accuracy of identification conversion.
The method of the above embodiment may further include step S508. In step S508, according to the topics to which the words in the key feature set belong, the words in the key feature set are added to the corresponding topic sequences in the semantic identification system, where a topic sequence represents one or more topics and the set of relationships among the words under those topics. The relationships between these topics or words may be, for example, tree relationships, mesh relationships, and the like. In this way, the semantic identification system can be further expanded with supplementary description texts such as Internet information, so that it is continuously updated as information grows, improving the accuracy of semantic identification conversion.
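Step S508 can be sketched as follows, assuming a flat representation in which each topic sequence is simply a set of words (a hypothetical stand-in for the tree or mesh relation sets mentioned above) and each word's topic is, for example, its most probable topic from the topic analysis.

```python
def expand_topic_sequences(topic_sequences, key_words, word_topic):
    """Add each key-feature word to the topic sequence of the topic it belongs to.

    topic_sequences: dict topic -> set of words (hypothetical flat representation)
    key_words:       words of the key feature set
    word_topic:      dict word -> topic; words with no known topic are skipped
    """
    for word in key_words:
        topic = word_topic.get(word)
        if topic is not None:
            topic_sequences.setdefault(topic, set()).add(word)
    return topic_sequences
```

A richer implementation would also record the parent or neighbor relation of each new word inside the sequence rather than only set membership.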
An embodiment of the key feature set construction method of the present invention is described below with reference to fig. 6.
Fig. 6 is a flow chart of a key feature set construction method according to some embodiments of the invention. As shown in fig. 6, the key feature set construction method of this embodiment includes steps S602 to S606.
In step S602, topic analysis is performed on the training text and the supplementary description text, so as to obtain topics in the training text and the supplementary description text and words under each topic, and distribution information of each word in the training text and the supplementary description text.
In some embodiments, latent topic analysis may be performed on the texts using the LDA algorithm to obtain the word distribution φ_k of topic k, where k ∈ [1, K] and K is the total number of topics, and the topic distribution θ_n of the n-th text, where n ∈ [1, N] and N is the total number of documents. Both φ_k and θ_n follow Dirichlet distributions. The parameters φ_k and θ_n may be estimated by Gibbs sampling, and the value of K may be determined using the model's perplexity.
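To make the estimation step concrete, the following is a minimal sketch of the common collapsed Gibbs sampler for LDA, assuming symmetric Dirichlet priors `alpha` and `beta` (hypothetical defaults). It returns φ (word distribution per topic) and θ (topic distribution per text); choosing K by perplexity is not shown.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (phi, theta, vocab) where
    phi[k][v] ~ p(vocab[v] | topic k) and theta[n][k] ~ p(topic k | doc n).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta, vocab
```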
In step S604, classification keywords are extracted from the standard description text. The classification keywords may be, for example, class tag names that embody a generalized description of the text of the class and are most distinct from other classes.
In step S606, a key feature set is constructed from the classification keywords together with the words in the supplementary description text whose distribution information differs from that of the classification keywords by no more than a preset degree.
When the distribution information of two words is close, their meanings are also close. In this way, feature items that have a large influence on the classification result can be extracted and their weights increased, further improving classification accuracy.
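Step S606 leaves the distance measure and the "preset degree" open. The following is a minimal sketch that assumes each word's distribution information is a probability vector over topics and swaps in the Hellinger distance with a threshold `max_dist`; both choices, and the function names, are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def build_key_feature_set(word_dists, supplement_words, class_keywords, max_dist=0.3):
    """Classification keywords plus supplementary words close to any of them.

    word_dists: dict word -> distribution over topics (assumed representation)
    max_dist:   stand-in for the "preset degree" of the text
    """
    key = set(class_keywords)
    for w in supplement_words:
        if w in key or w not in word_dists:
            continue
        if any(hellinger(word_dists[w], word_dists[c]) <= max_dist
               for c in class_keywords if c in word_dists):
            key.add(w)
    return key
```

With "sensor" as a classification keyword at [0.9, 0.1], a supplementary word "probe" at [0.85, 0.15] (distance about 0.05) is added, while "video" at [0.1, 0.9] (distance about 0.63) is not.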
An embodiment of the identification conversion device of the present invention is described below with reference to fig. 7.
Fig. 7 is a schematic structural view of an identification conversion device according to some embodiments of the present invention. As shown in fig. 7, the identification conversion device 70 of this embodiment includes: a first identifier keyword obtaining module 710 configured to obtain keywords in a first identifier according to a conversion rule of the first identifier system to which the first identifier belongs; a semantic identifier conversion module 720 configured to convert the first identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the first identifier comprises the subject word to which each keyword in the first identifier belongs, each identifier in the semantic identifier system comprises one or more identifier fields, and each identifier field corresponds to one or more subject words; a second identifier obtaining module 730 configured to obtain a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system; and a mapping relationship establishing module 740 configured to establish a mapping relationship between the first identifier and the second identifier.
In some embodiments, the second identifier obtaining module 730 is further configured to obtain the keyword in the second identifier according to a conversion rule of a second identifier system to which the second identifier belongs; and converting the second identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises a subject term to which each keyword in the second identifier belongs.
In some embodiments, the semantic identifier system includes topic distribution information under each category and word distribution information under each topic; the semantic identifier conversion module 720 is further configured to determine, according to the topic distribution information under the category of the first identifier and the word distribution information under each topic, the subject word to which each keyword in the first identifier belongs, and to construct the semantic identifier based on those subject words.
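The conversion performed by modules 710 to 740 can be sketched end to end. This is a minimal sketch under several assumptions not in the original: a keyword belongs to the topic k maximizing p(k | category) · p(keyword | k); each topic is represented by one hypothetical subject word (`topic_label`); the semantic identifier has one field per keyword; keywords are assumed already extracted by each system's conversion rule; and the identifiers in the usage line are made up.

```python
def keyword_to_subject(keyword, category, model):
    """Subject word of the topic k maximizing p(k | category) * p(keyword | k).

    model = (topic_dist, word_dist, topic_label):
      topic_dist[category][k] -> p(k | category)
      word_dist[k][word]      -> p(word | k)
      topic_label[k]          -> subject word representing topic k (hypothetical)
    """
    topic_dist, word_dist, topic_label = model
    scores = {k: p * word_dist[k].get(keyword, 0.0) for k, p in topic_dist[category].items()}
    best = max(scores, key=scores.get)
    # Assumed fallback: keep the keyword itself when no topic explains it.
    return topic_label[best] if scores[best] > 0 else keyword


def to_semantic_id(keywords, category, model):
    """One semantic-identifier field per keyword, in the given order."""
    return tuple(keyword_to_subject(kw, category, model) for kw in keywords)


def map_identifiers(first_keywords, second_keywords, category, model):
    """Map each first identifier to the second identifier with the same semantic id.

    first_keywords / second_keywords: dict identifier -> extracted keywords.
    Unmatched first identifiers map to None.
    """
    index = {to_semantic_id(kws, category, model): sid for sid, kws in second_keywords.items()}
    return {fid: index.get(to_semantic_id(kws, category, model))
            for fid, kws in first_keywords.items()}


model = (
    {"device": {"t0": 0.5, "t1": 0.5}},
    {"t0": {"temp": 0.6, "thermo": 0.4}, "t1": {"ahu": 0.5, "hvac": 0.5}},
    {"t0": "temperature", "t1": "air-handling"},
)
print(map_identifiers({"first-001": ["thermo", "hvac"]},
                      {"second-A": ["temp", "ahu"]}, "device", model))
# prints {'first-001': 'second-A'}
```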
In some embodiments, the identification conversion device 70 further comprises a semantic identification system construction module 750 configured to map the texts in the text library to a vector space and generate text feature data, wherein the feature items of the vector space comprise words in the text library and the values of the feature items are the weights of the words; to predict the category of the text feature data with a pre-trained classification model to obtain the categories of the texts in the text library; and to perform topic analysis on texts of the same category in the text library to obtain the topic distribution information under that category in the semantic identification system and the word distribution information under each topic.
In some embodiments, the identification conversion device 70 further comprises a classification model training module 760 configured to construct a vector space using words in the training texts as feature items, wherein each feature item has a weight and each training text has a pre-labeled category; to map the training texts to the vector space to generate training data; and to train a machine learning model with the training data and the corresponding pre-labeled categories to obtain a classification model.
In some embodiments, the training texts are standard description texts, which include Internet of Things texts; the identification conversion device 70 further includes a feature item weight update module 770 configured to perform topic analysis on supplementary description texts, which include Internet texts, to construct a key feature set comprising topics in the supplementary description texts, and to increase the weights of the feature items in the vector space that belong to the key feature set.
In some embodiments, the feature item weight update module 770 is further configured to perform a topic analysis on the training text and the supplemental description text to obtain topics in the training text and the supplemental description text and terms under each topic, and distribution information for each term in the training text and the supplemental description text; extracting classification keywords from the standard description text; and constructing a key feature set by adopting words with differences between the distribution information in the supplementary description text and the distribution information of the classification keywords within a preset degree and the classification keywords.
In some embodiments, the identification conversion device 70 further comprises a semantic identification system expansion module 780 configured to add the words in the key feature set to the corresponding topic sequences in the semantic identification system according to the topics to which those words belong, wherein a topic sequence represents one or more topics and the set of relationships among the words under those topics.
Fig. 8 is a schematic structural view of an identification conversion device according to other embodiments of the present invention. As shown in fig. 8, the identification conversion device 800 of this embodiment includes a memory 810 and a processor 820 coupled to the memory 810, the processor 820 being configured to perform the identification conversion method of any one of the previous embodiments based on instructions stored in the memory 810.
The memory 810 may include, for example, system memory, a fixed non-volatile storage medium, and so forth. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
Fig. 9 is a schematic structural view of an identification conversion device according to still other embodiments of the present invention. As shown in fig. 9, the identification conversion device 900 of this embodiment includes a memory 910 and a processor 920, and may further include an input/output interface 930, a network interface 940, a storage interface 950, and so forth. The interfaces 930, 940, and 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as SD cards and USB flash drives.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements any one of the aforementioned identification conversion methods.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (14)

1. An identification conversion method, comprising:
mapping texts in a text library to a vector space and generating text feature data, wherein feature items of the vector space comprise words in the text library, and the values of the feature items are weights of the words;
Predicting the category of the text characteristic data by adopting a pre-trained classification model to obtain the category of the text in the text library;
performing topic analysis on texts in the same category in a text library to obtain topic distribution information in the same category in a semantic identification system and word distribution information in each topic;
obtaining keywords in the first identifier according to a conversion rule of a first identifier system to which the first identifier belongs;
according to a pre-established semantic identification system, converting the first identification into a semantic identification comprises: according to the topic distribution information under the category of the first identifier and the word distribution information under each topic, determining the topic word to which each keyword in the first identifier belongs; the semantic identification is constructed based on the subject words to which each keyword in the first identification belongs, wherein the semantic identification corresponding to the first identification comprises the subject words to which each keyword in the first identification belongs, the semantic identification system comprises subject distribution information under each category and word distribution information under each subject, each identification in the semantic identification system comprises one or more identification fields, and each identification field corresponds to one or more subject words;
Obtaining a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system;
and establishing a mapping relation between the first identifier and the second identifier.
2. The method of claim 1, wherein the obtaining the second identifier corresponding to the semantic identifier includes:
obtaining keywords in the second identifier according to the conversion rule of the second identifier system to which the second identifier belongs;
and converting the second identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises a subject term to which each keyword in the second identifier belongs.
3. The identification conversion method according to claim 1, further comprising:
constructing a vector space by taking words in training texts as feature items, wherein each feature item has a weight, and each training text has a category marked in advance;
mapping the training text to a vector space to generate training data;
training a machine learning model by using training data and pre-marked categories corresponding to the training data to obtain a classification model.
4. The identification conversion method according to claim 3, wherein the training text is a standard description text, and the standard description text comprises internet of things text;
The identification conversion method further comprises the following steps:
performing topic analysis on the supplementary description text, and constructing a key feature set comprising topics in the supplementary description text, wherein the supplementary description text comprises internet text;
and increasing the weights of the feature items in the vector space that belong to the key feature set.
5. The identification conversion method according to claim 4, wherein the performing a topic analysis on the supplementary description text, and constructing a key feature set including topics in the supplementary description text includes:
performing topic analysis on the training text and the supplementary description text to obtain topics in the training text and the supplementary description text and words under each topic, and distribution information of each word in the training text and the supplementary description text;
extracting classification keywords from the standard description text;
and constructing a key feature set from the classification keywords and the words whose distribution information in the supplementary description text differs from the distribution information of the classification keywords by no more than a preset degree.
6. The identification conversion method according to claim 4, further comprising:
and adding the words in the key feature set into corresponding topic sequences in the semantic identification system according to topics to which the words in the key feature set belong, wherein the topic sequences represent one or more topics and a relation set of the words under the corresponding topics.
7. A semantic identification conversion device comprising:
the semantic identification system construction module is configured to map texts in a text library to a vector space and generate text feature data, wherein feature items of the vector space comprise words in the text library, and the values of the feature items are weights of the words; predicting the category of the text characteristic data by adopting a pre-trained classification model to obtain the category of the text in the text library; performing topic analysis on texts in the same category in a text library to obtain topic distribution information in the same category in a semantic identification system and word distribution information in each topic;
the first identification keyword obtaining module is configured to obtain keywords in the first identification according to a conversion rule of a first identification system to which the first identification belongs;
the semantic identification conversion module is configured to convert the first identification into semantic identification according to a pre-established semantic identification system, and comprises the following steps: according to the topic distribution information under the category of the first identifier and the word distribution information under each topic, determining the topic word to which each keyword in the first identifier belongs; the semantic identification is constructed based on the subject words to which each keyword in the first identification belongs, wherein the semantic identification corresponding to the first identification comprises the subject words to which each keyword in the first identification belongs, the semantic identification system comprises subject distribution information under each category and word distribution information under each subject, each identification in the semantic identification system comprises one or more identification fields, and each identification field corresponds to one or more subject words;
The second identifier obtaining module is configured to obtain a second identifier corresponding to the semantic identifier, wherein the second identifier belongs to a second identifier system;
and the mapping relation establishing module is configured to establish a mapping relation between the first identifier and the second identifier.
8. The identification conversion device of claim 7, wherein the second identifier obtaining module is further configured to obtain keywords in the second identifier according to a conversion rule of the second identifier system to which the second identifier belongs; and to convert the second identifier into a semantic identifier according to a pre-established semantic identifier system, wherein the semantic identifier corresponding to the second identifier comprises the subject word to which each keyword in the second identifier belongs.
9. The identification conversion device of claim 7, further comprising:
a classification model training module configured to construct a vector space using words in training texts as feature items, wherein each feature item has a weight, and each training text has a pre-labeled class; mapping the training text to a vector space to generate training data; training a machine learning model by using training data and pre-marked categories corresponding to the training data to obtain a classification model.
10. The identification conversion device of claim 9, wherein the training text is a standard description text, the standard description text comprising internet of things text;
the identification conversion device further includes:
the feature item weight updating module is configured to perform topic analysis on the supplementary description text and construct a key feature set comprising topics in the supplementary description text, wherein the supplementary description text comprises Internet text; and to increase the weights of the feature items in the vector space that belong to the key feature set.
11. The identification conversion device according to claim 10, wherein the feature item weight updating module is further configured to perform topic analysis on the training text and the supplemental description text to obtain topics in the training text and the supplemental description text and words under each topic, and distribution information of each word in the training text and the supplemental description text;
extracting classification keywords from the standard description text;
and constructing a key feature set from the classification keywords and the words whose distribution information in the supplementary description text differs from the distribution information of the classification keywords by no more than a preset degree.
12. The identification conversion device of claim 10, further comprising:
The semantic identification system expansion module is configured to add the words in the key feature set to corresponding topic sequences in the semantic identification system according to topics to which the words in the key feature set belong, wherein the topic sequences represent one or more topics and a relation set of words under the corresponding topics.
13. An identification conversion device comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the identification conversion method of any one of claims 1-6 based on instructions stored in the memory.
14. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the identification conversion method of any one of claims 1 to 6.
CN201811375603.4A 2018-11-19 2018-11-19 Identification conversion method, device and computer readable storage medium Active CN111199259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811375603.4A CN111199259B (en) 2018-11-19 2018-11-19 Identification conversion method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111199259A CN111199259A (en) 2020-05-26
CN111199259B true CN111199259B (en) 2023-06-20

Family

ID=70744186

Country Status (1)

Country Link
CN (1) CN111199259B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780005B (en) * 2021-09-14 2024-04-16 码客工场工业科技(北京)有限公司 Semantic model-based Handle stock identification analysis method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06187373A (en) * 1992-12-16 1994-07-08 Sanyo Electric Co Ltd Key word extracting device
AU2009229679A1 (en) * 2008-03-24 2009-10-01 Min Soo Kang Keyword-advertisement method using meta-information related to digital contents and system thereof
CN101802776A (en) * 2008-07-29 2010-08-11 特克斯特怀茨有限责任公司 Method and apparatus for relating datasets by using semantic vectors and keyword analyses
CN101833710A (en) * 2010-05-07 2010-09-15 中国科学院自动化研究所 Semantics-based article information tracking and tracing method for Internet of things
CN103647813A (en) * 2013-11-29 2014-03-19 中国物品编码中心 A method and an apparatus for analyzing Internet of Things unified identification codes
CN107197001A (en) * 2017-05-05 2017-09-22 工业和信息化部电信研究院 A kind of industry internet module information method
CN107193973A (en) * 2017-05-25 2017-09-22 百度在线网络技术(北京)有限公司 The field recognition methods of semanteme parsing information and device, equipment and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant