CN114090736A - Enterprise industry identification system and method based on text similarity - Google Patents

Enterprise industry identification system and method based on text similarity

Info

Publication number
CN114090736A
Authority
CN
China
Prior art keywords
data
industry
module
enterprise
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111372067.4A
Other languages
Chinese (zh)
Inventor
张晖
冯海
杨弋
王铮
张鹏
魏兵兵
姚晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Institute Of Standardization
Southwest University of Science and Technology
Original Assignee
Sichuan Institute Of Standardization
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Institute Of Standardization and Southwest University of Science and Technology
Priority to CN202111372067.4A
Publication of CN114090736A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/3347 Query execution using vector based model
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/355 Class or cluster creation or modification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an enterprise industry identification system and method based on text similarity. The system comprises a data preprocessing module, a data sampling module, a synonym expansion module, a vector space conversion module, a data labeling module and an industry identification module. The data preprocessing module preprocesses text and generates a bag of words containing verbs and nouns; the data sampling module samples and reads part of the data in the unified social credit code database; and the synonym expansion module performs synonym expansion on the sampled data and on the national economic industry classification data. Synonym expansion improves the accuracy of the similarity comparison. Random sampling extracts a small amount of data from the unified social credit code database for similarity comparison with the national economic industry classification standard data; because the amount of sampled data is smaller than the amount of unsampled data, the overall efficiency of industry identification is effectively improved.

Description

Enterprise industry identification system and method based on text similarity
Technical Field
The invention relates to the technical field of data processing, in particular to an enterprise industry identification system and an enterprise industry identification method based on text similarity.
Background
The unified social credit code database contains basic information on legal entities such as companies. However, the "business scope" field of each enterprise is filled in by the enterprise itself and is therefore not standardized, so the industry to which an enterprise belongs cannot be obtained directly, and subsequent analysis and statistics by industry are difficult to carry out. The state has published a national economic industry classification standard, so the industry of an enterprise can be determined by comparing the "business scope" text of the enterprise in the unified social credit code database with the business scope descriptions in the standard.
At present there are many text similarity calculation methods, mainly methods based on word distance, methods based on bags of words and methods based on ontologies. Enterprise business scope data, however, has particular characteristics: the texts are short and carry little information, so similarity calculation on them has low accuracy; there are so many enterprise records that comparing them one by one against the national standard data is slow; and the wording of the business scope of each enterprise in the unified social credit code database is not standardized, which makes identification difficult or impossible. The invention therefore provides an enterprise industry identification system and method based on text similarity to solve these problems in the prior art.
Disclosure of Invention
In view of the above problems, the invention aims to provide an enterprise industry identification system and method based on text similarity. The system and method perform synonym expansion on the data, which improves the accuracy of the similarity comparison, and use random sampling to extract a small amount of data from the unified social credit code database for similarity comparison with the national economic industry classification standard data. Because the amount of sampled data is smaller than the amount of unsampled data, the overall efficiency of industry identification is effectively improved.
To achieve this purpose, the invention adopts the following technical solution: an enterprise industry identification system based on text similarity, comprising a data preprocessing module, a data sampling module, a synonym expansion module, a vector space conversion module, a data labeling module and an industry identification module. The data preprocessing module preprocesses text and generates a bag of words containing verbs and nouns. The data sampling module samples and reads part of the data in the unified social credit code database. The synonym expansion module performs synonym expansion on the sampled data and on the national economic industry classification data. The vector space conversion module converts the synonym-expanded data and the unsampled data into a vector space through word embedding. The data labeling module calculates the similarity between the enterprise business scope field in the sampled data and the business scope descriptions of the national economic industries, and labels the data accordingly. The industry identification module trains on the labeled data with a machine learning algorithm and uses the resulting classification model to obtain the industry category of the unlabeled unified social credit code data.
A further improvement is that: during text preprocessing, the data preprocessing module removes punctuation marks and stop words from the text data and performs word segmentation; it then tags each word with its part of speech and keeps only verbs and nouns to form the bag of words.
A further improvement is that: the data sampling module randomly extracts part of the data in the unified social credit code database according to a sampling proportion set by the user. Because random sampling is used, the amount of sampled data is smaller than the amount of unsampled data, which effectively improves the overall efficiency of industry identification.
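The sampling described above can be sketched as follows. This is a minimal illustration only: the function name `split_sample`, the record layout and the fixed seed are assumptions introduced here, not details from the source.

```python
import random


def split_sample(records, proportion, seed=None):
    """Randomly split records into (sampled, unsampled) by a user-set proportion."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    rng = random.Random(seed)
    # Take at least one record when any exist, and never more than are available.
    k = min(len(records), max(1, round(len(records) * proportion)))
    chosen = set(rng.sample(range(len(records)), k))
    sampled = [r for i, r in enumerate(records) if i in chosen]
    unsampled = [r for i, r in enumerate(records) if i not in chosen]
    return sampled, unsampled


# Hypothetical records: (enterprise id, business scope text)
records = [(f"E{i:03d}", f"scope text {i}") for i in range(100)]
sampled, unsampled = split_sample(records, proportion=0.1, seed=42)
```

The sampled part would go on to similarity-based labeling, while the unsampled part is left for the trained classifier.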
A further improvement is that: for each word in the sampled data and in the national economic industry classification data, the synonym expansion module looks up the most similar words in the synonym forest database, up to the number set by the user, and adds them to the database;
the vector space conversion module converts the data into a vector space using the word2vec word embedding algorithm.
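A minimal sketch of the synonym expansion step, assuming the thesaurus is available as a mapping from each word to its synonyms ordered from most to least similar. The toy thesaurus, the English example words and the function name `expand_with_synonyms` are hypothetical; the actual synonym forest data has its own format.

```python
def expand_with_synonyms(words, thesaurus, top_n):
    """Append up to top_n synonyms per word, keeping order and dropping duplicates."""
    expanded = []
    seen = set()
    for word in words:
        # The word itself comes first, followed by its top_n closest synonyms.
        candidates = [word] + thesaurus.get(word, [])[:top_n]
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Toy thesaurus (hypothetical data): synonyms listed from most to least similar.
TOY_THESAURUS = {
    "sell": ["vend", "retail", "market"],
    "clothing": ["apparel", "garments"],
}
bag = ["sell", "clothing", "process"]
expanded_bag = expand_with_synonyms(bag, TOY_THESAURUS, top_n=2)
```

Words without a thesaurus entry, such as "process" here, pass through unchanged.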
A further improvement is that: the data labeling module calculates, one by one, the cosine similarity between the business scope field of each sampled record and the data of each national economic industry;
if one or more national economic industries with a similarity higher than a preset threshold are found, the enterprise is labeled as belonging to that industry;
if no national economic industry with a similarity higher than the preset threshold is found, the record is labeled manually.
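The threshold-based labeling rule can be sketched as below. The vectors and industry codes are toy values, and choosing the single highest-scoring industry when several exceed the threshold is an assumption made here; the text only states that a record above the threshold is labeled as belonging to the industry.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_enterprise(scope_vec, industry_vecs, threshold):
    """Return the best industry code above the threshold, or None for manual labeling."""
    best_code, best_sim = None, threshold
    for code, vec in industry_vecs.items():
        sim = cosine_similarity(scope_vec, vec)
        if sim > best_sim:
            best_code, best_sim = code, sim
    return best_code


# Toy industry vectors keyed by hypothetical category codes.
industry_vecs = {"A": [1.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0]}
auto_label = label_enterprise([0.9, 0.1, 0.0], industry_vecs, threshold=0.8)
needs_manual = label_enterprise([0.5, 0.5, 0.7], industry_vecs, threshold=0.8)
```

A return value of `None` marks the record for manual labeling, which corresponds to the fallback branch of the rule.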
A further improvement is that: the industry identification module trains an XGBoost classification model on the labeled, word-embedded unified social credit code enterprise business scope data and national economic industry classification data, and uses the trained XGBoost model to identify the industry category of the word-embedded unsampled records.
An identification method using the enterprise industry identification system based on text similarity comprises the following steps:
step one, inputting the unified social credit code database and the national economic industry classification database into the data preprocessing module, which removes punctuation and stop words and performs word segmentation, then performs part-of-speech tagging and keeps only verbs and nouns;
step two, setting a sampling proportion, randomly sampling the unified social credit code data with the data sampling module according to the set proportion, and extracting a small amount of sampled data to form a training set;
step three, bringing the synonyms and near-synonyms of the verbs and nouns in the sampled data set and in the national economic industry classification data into the calculation: using downloaded synonym forest data, selecting for each verb and noun obtained by the data preprocessing module the most similar synonyms, up to the number set by the user, and storing them in the database;
step four, converting the word data into a vector space using the word2vec word embedding algorithm;
step five, labeling the unified social credit code data automatically and manually by using the data sampling module;
step six, training a model with a machine learning algorithm on the training set from step two, automatically identifying the industry category of the unsampled unified social credit code data with the trained model, and outputting the identification result.
A further improvement is that: in step five, the records converted into the vector space are taken in turn from the sampled unified social credit code data, and the cosine similarity between each record and the national economic industry classification data is calculated one by one; when the similarity is higher than the threshold set by the user, the record is labeled as belonging to that industry; when its similarity to all industry data is lower than the threshold, it is labeled manually.
The beneficial effects of the invention are as follows: the invention uses random sampling to extract a small amount of data from the unified social credit code database for similarity comparison with the national economic industry classification standard data; because the amount of sampled data is smaller than the amount of unsampled data, the overall efficiency of industry identification is effectively improved;
the invention performs synonym expansion on the verbs and nouns in the bag of words, adding words with the same or similar meaning as the original words to the database, so that a semantically similar industry can still be found even when the wording of the unified social credit code data is not standardized;
the invention splits automatic industry identification into semi-automatic labeling of a small amount of data combined with machine learning on a large amount of data, which improves the efficiency of industry identification while maintaining its accuracy.
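As a rough illustration of the efficiency argument above, let N be the number of enterprise records, M the number of industry categories in the standard, and p the sampling proportion. These symbols and the figures that follow are assumptions introduced here, not values from the source. Counting only pairwise similarity comparisons:

```latex
\begin{aligned}
C_{\text{full}}    &= N \cdot M, \\
C_{\text{sampled}} &= p \cdot N \cdot M, \\
\frac{C_{\text{sampled}}}{C_{\text{full}}} &= p .
\end{aligned}
```

With, say, N = 10^6, M = 10^3 and p = 0.05, this gives 5 × 10^7 comparisons instead of 10^9. The remaining (1 − p)N records each need one pass through the trained classifier, whose cost is not counted in this estimate.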
Drawings
Fig. 1 is a system structure diagram according to an embodiment of the invention.
Fig. 2 is a flowchart of the method according to embodiment two of the invention.
Fig. 3 is a flowchart of the data preprocessing according to embodiment two of the invention.
Fig. 4 is a flowchart of step five of the method according to embodiment two of the invention.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be further described in detail with reference to the following examples, which are provided for illustration only and are not intended to limit the scope of the present invention.
Embodiment one
As shown in Fig. 1, this embodiment provides an enterprise industry identification system based on text similarity, comprising a data preprocessing module, a data sampling module, a synonym expansion module, a vector space conversion module, a data labeling module and an industry identification module. The data preprocessing module preprocesses text and generates a bag of words containing verbs and nouns. The data sampling module samples and reads part of the data in the unified social credit code database. The synonym expansion module performs synonym expansion on the sampled data and on the national economic industry classification data. The vector space conversion module converts the synonym-expanded data and the unsampled data into a vector space through word embedding. The data labeling module calculates the similarity between the enterprise business scope field in the sampled data and the business scope descriptions of the national economic industries, and labels the data accordingly. The industry identification module trains on the labeled data with a machine learning algorithm and uses the resulting classification model to obtain the industry category of the unlabeled unified social credit code data.
During text preprocessing, the data preprocessing module removes punctuation marks and stop words from the text data and performs word segmentation; it then tags each word with its part of speech and keeps only verbs and nouns to form the bag of words.
The data sampling module randomly extracts part of the data in the unified social credit code database according to a sampling proportion set by the user. Because random sampling is used, the amount of sampled data is smaller than the amount of unsampled data, which effectively improves the overall efficiency of industry identification.
For each word in the sampled data and in the national economic industry classification data, the synonym expansion module looks up the most similar words in the synonym forest database, up to the number set by the user, and adds them to the database; this synonym expansion effectively improves the accuracy of the similarity comparison;
the vector space conversion module converts the data into a vector space using the word2vec word embedding algorithm.
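The text does not say how per-word word2vec vectors are combined into one vector per business scope. One common choice, shown here purely as an assumption, is to average the vectors of the words in the bag. The embedding table below is a hand-written toy lookup standing in for a trained word2vec model, and `mean_vector` is a hypothetical helper.

```python
def mean_vector(words, embeddings):
    """Average the embeddings of the known words; return None if none are known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional embeddings (hypothetical values, not trained word2vec output).
TOY_EMBEDDINGS = {
    "sell": [1.0, 0.0],
    "clothing": [0.0, 1.0],
    "apparel": [0.2, 0.8],
}
doc_vec = mean_vector(["sell", "clothing", "apparel", "unknown"], TOY_EMBEDDINGS)
```

Out-of-vocabulary words are skipped, so a bag made up only of unknown words yields `None` and would need separate handling.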
The data labeling module calculates, one by one, the cosine similarity between the business scope field of each sampled record and the data of each national economic industry;
if one or more national economic industries with a similarity higher than a preset threshold are found, the enterprise is labeled as belonging to that industry;
if no national economic industry with a similarity higher than the preset threshold is found, the record is labeled manually.
The industry identification module trains an XGBoost classification model on the labeled, word-embedded unified social credit code enterprise business scope data and national economic industry classification data, and uses the trained XGBoost model to identify the industry category of the word-embedded unsampled records.
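The train-on-labeled, predict-on-unsampled flow can be sketched as below. To keep the example free of third-party dependencies, a nearest-centroid classifier stands in for XGBoost; it is not the algorithm named in the text and is used only to show the data flow. The vectors and labels are toy values.

```python
import math
from collections import defaultdict


def train_centroids(vectors, labels):
    """Compute one mean vector (centroid) per industry label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Labeled sample (toy vectors standing in for word-embedded business scopes).
train_vecs = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["retail", "retail", "manufacturing", "manufacturing"]
model = train_centroids(train_vecs, train_labels)

# Unsampled records are classified by the trained model instead of pairwise comparison.
predictions = [predict(model, v) for v in [[0.8, 0.2], [0.2, 0.7]]]
```

In practice the stand-in would be replaced by a gradient-boosted model, for example the `XGBClassifier` class of the `xgboost` package, trained on the same labeled vectors.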
Embodiment two
As shown in Figs. 2, 3 and 4, this embodiment provides an identification method using the enterprise industry identification system based on text similarity, comprising the following steps:
step one, inputting the unified social credit code database and the national economic industry classification database into the data preprocessing module, which removes punctuation and stop words, performs word segmentation and then part-of-speech tagging, finally obtaining a bag of words containing only verbs and nouns and storing it in the database, wherein punctuation removal is implemented with regular expressions and deletes all punctuation marks;
word segmentation uses a conditional random field (CRF) algorithm, with the Chinese word segmentation corpus of Microsoft Research Asia as the corpus; stop-word removal compares the segmented words against a stop-word list and deletes those that appear in it;
part-of-speech tagging uses a conditional random field algorithm, with the People's Daily part-of-speech-tagged corpus as the corpus; all verbs and nouns are selected and stored in order in the corresponding records;
step two, setting a sampling proportion, randomly sampling the unified social credit code data with the data sampling module according to the set proportion, and extracting a small amount of sampled data to form a training set;
step three, bringing the synonyms of the verbs and nouns in the sampled data set and in the national economic industry classification data into the calculation: using downloaded synonym forest data, selecting for each verb and noun obtained by the data preprocessing module the most similar synonyms, up to the number set by the user, and storing them in the database;
step four, converting the word data into a vector space using the word2vec word embedding algorithm, laying the foundation for the subsequent similarity calculation and industry identification;
step five, labeling the unified social credit code data automatically and manually by using the data sampling module, according to the similarity between the business scope description in the sampled unified social credit code data and the business scope description in the national standard for national economic industry classification;
the records converted into the vector space are taken in turn from the sampled unified social credit code data, and the cosine similarity between each record and the national economic industry classification data is calculated one by one; if the similarity is higher than the threshold set by the user, the record is labeled as belonging to that industry, and if its similarity to all industry data is lower than the threshold, it is labeled manually;
step six, training a model with a machine learning algorithm on the training set from step two, automatically identifying the industry category of the unsampled unified social credit code data with the trained model, and outputting the identification result.
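The preprocessing in step one can be sketched as follows. The CRF-based segmenter and part-of-speech tagger are not reproduced here: a whitespace splitter and a small lookup table stand in for them, and the stop-word list, tag names and English example text are all hypothetical.

```python
import re

# Anything that is neither a word character nor whitespace counts as punctuation.
PUNCTUATION = re.compile(r"[^\w\s]")
STOP_WORDS = {"and", "of", "the"}  # hypothetical stop-word list
TOY_POS = {  # hypothetical tags: "v" verb, "n" noun, "d" adverb
    "sell": "v", "process": "v", "clothing": "n", "shoes": "n", "quickly": "d",
}


def toy_segment(text):
    """Stand-in for CRF word segmentation: split on whitespace."""
    return text.split()


def toy_pos_tag(tokens):
    """Stand-in for CRF part-of-speech tagging: table lookup, "x" if unknown."""
    return [(token, TOY_POS.get(token, "x")) for token in tokens]


def preprocess(text, segment=toy_segment, pos_tag=toy_pos_tag):
    """Remove punctuation, segment, drop stop words, keep only verbs and nouns."""
    # Punctuation is replaced by a space so adjacent English words do not fuse;
    # the text describes deleting it, which suits unspaced Chinese input.
    text = PUNCTUATION.sub(" ", text)
    tokens = [t for t in segment(text.lower()) if t not in STOP_WORDS]
    return [word for word, tag in pos_tag(tokens) if tag in ("v", "n")]


bag = preprocess("Sell clothing, shoes; and process of clothing quickly!")
```

Passing the segmenter and tagger as parameters keeps the pipeline order from step one visible while leaving room to plug in real CRF models.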
The foregoing shows and describes the basic principles, main features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. An enterprise industry identification system based on text similarity, characterized in that: the system comprises a data preprocessing module, a data sampling module, a synonym expansion module, a vector space conversion module, a data labeling module and an industry identification module, wherein the data preprocessing module is used for preprocessing text and generating a bag of words containing verbs and nouns; the data sampling module is used for sampling and reading part of the data in the unified social credit code database; the synonym expansion module is used for performing synonym expansion on the sampled data and the national economic industry classification data; the vector space conversion module is used for converting the synonym-expanded data and the unsampled data into a vector space through word embedding; the data labeling module is used for calculating the similarity between the enterprise business scope field in the sampled data and the business scope description data of the national economic industries, and for labeling the data; and the industry identification module is used for training on the labeled data with a machine learning algorithm and obtaining the industry category of the unlabeled unified social credit code data with the classification model obtained by training.
2. The enterprise industry identification system based on text similarity according to claim 1, wherein: during text preprocessing, the data preprocessing module removes punctuation marks and stop words from the text data and performs word segmentation; it then tags each word with its part of speech and keeps only verbs and nouns to form the bag of words.
3. The enterprise industry identification system based on text similarity according to claim 1, wherein: the data sampling module randomly extracts part of the data in the unified social credit code database according to a sampling proportion set by the user.
4. The enterprise industry identification system based on text similarity according to claim 1, wherein: for each word in the sampled data and in the national economic industry classification data, the synonym expansion module looks up the most similar words in the synonym forest database, up to the number set by the user, and adds them to the database;
the vector space conversion module converts the data into a vector space using the word2vec word embedding algorithm.
5. The enterprise industry identification system based on text similarity according to claim 1, wherein: the data labeling module calculates, one by one, the cosine similarity between the business scope field of each sampled record and the data of each national economic industry;
if one or more national economic industries with a similarity higher than a preset threshold are found, the enterprise is labeled as belonging to that industry;
if no national economic industry with a similarity higher than the preset threshold is found, the record is labeled manually.
6. The enterprise industry identification system based on text similarity according to claim 1, wherein: the industry identification module trains an XGBoost classification model on the labeled, word-embedded unified social credit code enterprise business scope data and national economic industry classification data, and uses the trained XGBoost model to identify the industry category of the word-embedded unsampled records.
7. An identification method using the enterprise industry identification system based on text similarity according to claim 1, characterized by comprising the following steps:
step one, inputting the unified social credit code database and the national economic industry classification database into the data preprocessing module, which removes punctuation and stop words and performs word segmentation, then performs part-of-speech tagging and keeps only verbs and nouns;
step two, setting a sampling proportion, randomly sampling the unified social credit code data with the data sampling module according to the set proportion, and extracting a small amount of sampled data to form a training set;
step three, bringing the synonyms and near-synonyms of the verbs and nouns in the sampled data set and in the national economic industry classification data into the calculation: using downloaded synonym forest data, selecting for each verb and noun obtained by the data preprocessing module the most similar synonyms, up to the number set by the user, and storing them in the database;
step four, converting the word data into a vector space using the word2vec word embedding algorithm;
step five, labeling the unified social credit code data automatically and manually by using the data sampling module;
step six, training a model with a machine learning algorithm on the training set from step two, automatically identifying the industry category of the unsampled unified social credit code data with the trained model, and outputting the identification result.
8. The identification method according to claim 7, wherein: in step five, the records converted into the vector space are taken in turn from the sampled unified social credit code data, and the cosine similarity between each record and the national economic industry classification data is calculated one by one; when the similarity is higher than the threshold set by the user, the record is labeled as belonging to that industry; when its similarity to all industry data is lower than the threshold, it is labeled manually.
CN202111372067.4A 2021-11-18 2021-11-18 Enterprise industry identification system and method based on text similarity Pending CN114090736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111372067.4A CN114090736A (en) 2021-11-18 2021-11-18 Enterprise industry identification system and method based on text similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111372067.4A CN114090736A (en) 2021-11-18 2021-11-18 Enterprise industry identification system and method based on text similarity

Publications (1)

Publication Number Publication Date
CN114090736A (en) 2022-02-25

Family

ID=80301965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111372067.4A Pending CN114090736A (en) 2021-11-18 2021-11-18 Enterprise industry identification system and method based on text similarity

Country Status (1)

Country Link
CN (1) CN114090736A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587230A (en) * 2022-09-23 2023-01-10 国网江苏省电力有限公司营销服务中心 High-energy-consumption enterprise identification method and system combining industry text and power load
CN115587230B (en) * 2022-09-23 2024-04-12 国网江苏省电力有限公司营销服务中心 High-energy-consumption enterprise identification method and system combining industry text and electricity load
CN115827934A (en) * 2023-02-21 2023-03-21 四川省计算机研究院 Enterprise portrait intelligent analysis system and method based on unified social credit code
CN115827934B (en) * 2023-02-21 2023-05-09 四川省计算机研究院 Enterprise portrait intelligent analysis system and method based on unified social credit code
CN117216688A (en) * 2023-11-07 2023-12-12 西南科技大学 Enterprise industry identification method and system based on hierarchical label tree and neural network
CN117216688B (en) * 2023-11-07 2024-01-23 西南科技大学 Enterprise industry identification method and system based on hierarchical label tree and neural network

Similar Documents

Publication Publication Date Title
CN114090736A (en) Enterprise industry identification system and method based on text similarity
CN109145260B (en) Automatic text information extraction method
CN113158653B (en) Training method, application method, device and equipment for pre-training language model
CN110196977B (en) Intelligent warning condition supervision processing system and method
CN104199965A (en) Semantic information retrieval method
CN113569050B (en) Method and device for automatically constructing government affair field knowledge map based on deep learning
CN107784048B (en) Question classification method and device for question and answer corpus
CN110728117A (en) Paragraph automatic identification method and system based on machine learning and natural language processing
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN116150651A (en) AI-based depth synthesis detection method and system
CN113343701B (en) Extraction method and device for text named entities of power equipment fault defects
CN112380848B (en) Text generation method, device, equipment and storage medium
CN116629258B (en) Structured analysis method and system for judicial document based on complex information item data
CN116975738A (en) Polynomial naive Bayesian classification method for question intent recognition
CN111708862B (en) Text matching method and device and electronic equipment
CN114298041A (en) Network security named entity identification method and identification device
CN113688233A (en) Text understanding method for semantic search of knowledge graph
CN112990091A (en) Research and report analysis method, device, equipment and storage medium based on target detection
Nagasudha et al. Key word spotting using HMM in printed Telugu documents
CN110175268B (en) Longest matching resource mapping method
CN116484010B (en) Knowledge graph construction method and device, storage medium and electronic device
CN117150046B (en) Automatic task decomposition method and system based on context semantics
CN117131159A (en) Method, device, equipment and storage medium for extracting sensitive information
CN114662499A (en) Text-based emotion recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination