CN112115710A - Industry information identification method and device - Google Patents

Info

Publication number
CN112115710A
Authority
CN
China
Prior art keywords
industry
text
recognized
target
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910476994.7A
Other languages
Chinese (zh)
Other versions
CN112115710B (en)
Inventor
陈遥烽
叶龙
李佳
杨小宇
缪招兵
李倩
魏曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910476994.7A
Publication of CN112115710A
Application granted
Publication of CN112115710B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3346: Query execution using probabilistic model

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide an industry information identification method and device, relating to the technical field of machine learning. The method comprises: obtaining a text to be recognized, and extracting target keywords corresponding to the text to be recognized by using an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry; inputting the target keywords corresponding to the text to be recognized into an industry prediction model, and determining the target matching probability between the text to be recognized and each industry; and determining the target industry corresponding to the text to be recognized according to the target matching probability between the text to be recognized and each industry. Compared with manual identification of industry information, the method greatly improves identification efficiency and is suitable for big data scenarios. In addition, when the determined target industry does not match the industry information entered by the user, the industry information entered by the user is determined to be abnormal, so that incorrectly filled-in industry information is discovered in time and the risk it causes is avoided.

Description

Industry information identification method and device
Technical Field
The embodiment of the application relates to the technical field of machine learning, in particular to an industry information identification method and device.
Background
A company name can reflect the business scope and industry classification of the company to a certain extent. A user may make mistakes when filling in the industry, which brings certain risks to auditing. At present, the industry information filled in by a user can be identified manually, and whether the filled-in industry is mismatched is then analyzed manually. Although this approach can accurately determine whether the user has filled in the industry information incorrectly, it consumes a large amount of manpower and cannot cope with big data scenarios.
Disclosure of Invention
Because manual identification of industry information is labor-intensive and unsuitable for big data scenarios, embodiments of the present application provide an industry information identification method and device.
In one aspect, an embodiment of the present application provides an industry information identification method, where the method includes:
acquiring a text to be recognized, and extracting target keywords corresponding to the text to be recognized by adopting an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry;
inputting target keywords corresponding to the text to be recognized into an industry prediction model, and determining the target matching probability of the text to be recognized and each industry;
and determining the target industry corresponding to the text to be recognized according to the target matching probability of the text to be recognized and each industry.
In one aspect, an embodiment of the present application provides an industry information identification apparatus, including:
the extraction module is used for acquiring a text to be recognized and extracting a target keyword corresponding to the text to be recognized by adopting an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry;
the prediction module is used for inputting the target keywords corresponding to the text to be recognized into an industry prediction model and determining the target matching probability of the text to be recognized and each industry;
and the matching module is used for determining the target industry corresponding to the text to be recognized according to the target matching probability of the text to be recognized and each industry.
Optionally, the apparatus further comprises a determination module;
the determination module is specifically configured to:
acquiring industry information configured corresponding to the text to be recognized;
and when the determined target industry is not matched with the correspondingly configured industry information, determining that the correspondingly configured industry information is abnormal.
Optionally, the extracting module is specifically configured to:
performing word segmentation on a text to be recognized by adopting an industry keyword dictionary, and determining a first class of keywords from the text to be recognized;
determining semantic interpretation texts corresponding to the first class of keywords according to a semantic dictionary;
adopting the industry keyword dictionary to perform word segmentation on the semantic interpretation texts corresponding to the first type of keywords, and determining second type of keywords from the semantic interpretation texts corresponding to the first type of keywords;
and determining the first class of keywords and the second class of keywords as target keywords corresponding to the text to be recognized.
Optionally, the prediction module is specifically configured to:
for each industry, determining the target matching probability between the text to be recognized and the industry according to the probability that each target keyword occurs under the industry, the probability that the industry occurs, and the probability that the target keywords corresponding to the text to be recognized occur simultaneously.
Optionally, the prediction module is specifically configured to:
for each industry, determining the initial matching probability between the text to be recognized and the industry according to the probability that each target keyword occurs under the industry, the probability that the industry occurs, and the probability that the target keywords corresponding to the text to be recognized occur simultaneously;
determining the adjacent industries of the industry according to the similarity between the industry and other industries;
and determining the target matching probability between the text to be recognized and the industry according to the initial matching probability between the text to be recognized and the industry, the initial matching probability between the text to be recognized and each adjacent industry, and the similarity between the industry and each adjacent industry.
Optionally, the prediction module is specifically configured to:
determining the similarity between the industry and any other industry according to the intersection and the union of the keyword set of the industry and the keyword set of that other industry;
and determining the other industries whose similarity meets a preset condition as the adjacent industries of the industry.
Optionally, the prediction module is further configured to:
when the industry is not matched with the industry information correspondingly configured to the text to be recognized, the probability of each target keyword appearing under the industry accords with the following formula (1):
Figure BDA0002082596660000031
wherein $P_1(t_j \mid c_i)$ represents the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match the industry information configured for the text to be recognized; $\mathrm{count}(t_j, c_i)$ represents the number of times target keyword $t_j$ occurs under industry $c_i$; $\mathrm{count}(t_j, c_k)$ represents the number of times target keyword $t_j$ occurs under industry $c_k$; $V$ is the set of all keywords; and $m$ is the number of industries;
when the industry is matched with the industry information correspondingly configured to the text to be recognized, the probability of each target keyword appearing under the industry accords with the following formula (2):
Figure BDA0002082596660000032
wherein $P_2(t_j \mid c_i)$ represents the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ matches the industry information configured for the text to be recognized; $P_1(t_j \mid c_i)$ represents the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match the industry information configured for the text to be recognized; and $\alpha$ and $\beta$ are confidence parameters.
In the embodiment of the application, after the text to be recognized is obtained, the industry keyword dictionary is adopted to extract the target keywords in the text to be recognized, so that the keywords representing industry information are extracted from the text to be recognized, then the target keywords are input into the industry prediction model, the target matching probability of the text to be recognized and each industry is determined, the target industry corresponding to the text to be recognized is determined according to the target matching probability, compared with the manual recognition of the industry information, the recognition efficiency is greatly improved, and meanwhile, the method is suitable for big data scenes. By comparing the target industry with the industry information input by the user, whether the industry information input by the user is wrong is judged, so that the risk caused by filling in wrong industry information is avoided.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of an application interface according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an application interface according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an application interface according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an industry information identification apparatus according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of an industry information identification method according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of a method for building an industry keyword dictionary according to an embodiment of the present application;
Fig. 8 is a schematic flowchart of a method for extracting target keywords according to an embodiment of the present application;
Fig. 9 is a schematic flowchart of an industry information identification method according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an industry information identification apparatus according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solution and beneficial effects of the present application more clear and more obvious, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below.
TF-IDF algorithm: term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining. TF stands for Term Frequency, and IDF stands for Inverse Document Frequency.
In practice, a company name can reflect the business scope and industry classification of the company to a certain extent. A user may make mistakes when filling in the industry, which brings certain risks to auditing. At present, the industry information filled in by a user can be identified manually, and whether the filled-in industry is mismatched is then analyzed manually. However, this approach depends too heavily on manpower and is no longer workable when the data volume grows rapidly.
For this reason, considering that industry information is usually expressed with specific words, the industry of a company can be predicted by extracting the industry-related keywords from its company name or company profile. Accordingly, an embodiment of the present application provides an industry information identification method, including: first, obtaining a text to be recognized, and extracting target keywords corresponding to the text to be recognized by using an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry; then, determining the target matching probability between the text to be recognized and each industry according to the target keywords corresponding to the text to be recognized and an industry prediction model; and finally, determining the target industry corresponding to the text to be recognized according to the target matching probability between the text to be recognized and each industry.
After the text to be recognized is obtained, the industry keyword dictionary is adopted to extract the target keywords in the text to be recognized, so that the target keywords representing industry information are extracted from the text to be recognized, then the target keywords are input into the industry prediction model, the target matching probability of the text to be recognized and each industry is determined, and the target industry corresponding to the text to be recognized is determined according to the target matching probability.
The industry information identification method in the embodiment of the application can be applied to an application scenario as shown in fig. 1, where the application scenario includes a terminal device 101 and a server 102.
The terminal device 101 is an electronic device with network communication capability, such as a smartphone, a tablet computer or a portable personal computer. An application program for industry information identification is installed in advance on the terminal device 101. The user can use the application program to identify the target industry corresponding to a text to be recognized, such as a company name or a company profile, and then further judge whether the target industry matches the industry information entered by the user, so that incorrectly filled-in industry information can be found in time. The application program can identify and verify a single text to be recognized, or identify and verify multiple texts to be recognized in batches.
Illustratively, as shown in fig. 2, after the user starts the application on the terminal device 101, the user may input a company name "XX company" in a company name box of the display interface or select the company name "XX company" in a pull-down menu form, then input industry information "XX industry" in an industry box, and then click a submit button.
In one possible implementation, the industry information identification apparatus is located in the terminal device 101. The terminal device 101 directly identifies the target industry corresponding to the company name and then compares the target industry with the industry information entered by the user. If the two match, a dialog box pops up in the display interface and displays "the industry information is normal", as shown in Fig. 3. Otherwise, a dialog box pops up in the display interface and displays "the industry information is abnormal", as shown in Fig. 4.
In another possible implementation, the terminal device 101 is connected to the server 102 via a wireless network, and the industry information identification apparatus is located in the server 102. The terminal device 101 sends the company name and the industry information entered by the user to the server 102. The server 102 identifies the target industry corresponding to the company name and then compares the target industry with the industry information entered by the user. If the two match, the server 102 returns a message "the industry information is normal" to the terminal device 101, and the terminal device 101 pops up a dialog box in the display interface and displays "the industry information is normal", as shown in Fig. 3. Otherwise, the server 102 returns a message "the industry information is abnormal" to the terminal device 101, and the terminal device 101 pops up a dialog box in the display interface and displays "the industry information is abnormal", as shown in Fig. 4.
Further, in the application scenario diagram shown in fig. 1, the structure of the industry information identification apparatus located in the terminal device 101 or the server 102 is shown in fig. 5, and includes a keyword extraction module 501, an industry prediction module 502, and a determination module 503.
The keyword extraction module 501 is configured to extract keywords from a text to be recognized by using an industry keyword dictionary, where the keywords are words capable of identifying an industry. For example, when the company name is "Yiyige Clothing Shop, Cuiping District, Yibin City", the extracted keyword is "clothing shop".
The industry prediction module 502 is configured to determine the target matching probability between the text to be recognized and each industry according to the target keywords corresponding to the text to be recognized and the industry prediction model, and to determine the target industry corresponding to the text to be recognized according to those target matching probabilities. For example, the probability that "Yiyige Clothing Shop, Cuiping District, Yibin City" belongs to each industry is determined according to the keyword "clothing shop" and the industry prediction model, and the industries whose probability meets a preset condition are then determined as the target industries of the shop. There may be one or more target industries.
The determination module 503 is configured to determine whether the industry information configured for the text to be recognized is abnormal, according to the target industry determined by the industry prediction module 502 and the configured industry information, where the configured industry information may be the industry information entered by the user. For example, if the industry prediction module 502 determines that "Yiyige Clothing Shop, Cuiping District, Yibin City" belongs to the clothing industry while the industry information entered by the user is the catering industry, the industry information entered by the user can be judged to be abnormal.
Based on the application scenario diagram shown in fig. 1 and the schematic structural diagram of the industry information identification apparatus shown in fig. 5, an embodiment of the present application provides a flow of an industry information identification method, where the flow of the method may be executed by the industry information identification apparatus, as shown in fig. 6, including the following steps:
Step S601, obtaining a text to be recognized, and extracting target keywords corresponding to the text to be recognized by using an industry keyword dictionary.
Specifically, the text to be recognized may be a company name, or may be a paragraph such as a company profile. The industry keyword dictionary comprises a keyword set corresponding to each industry.
The process of constructing the industry keyword dictionary is shown in Fig. 7. First, company names or company profiles are obtained as an initial corpus, and the initial corpus is segmented using a general-purpose dictionary to obtain a word list. Unimportant words in the word list, such as the names of provinces, cities, districts and counties, are then filtered out using a filtering dictionary, and the remaining words are grouped by industry into a word list in (industry, word) format to obtain a training corpus. Next, the TF-IDF algorithm is used to calculate the TF-IDF value of each word in each industry, giving a TF-IDF value library in (industry, word, TF-IDF value) format. Finally, words whose TF-IDF value is lower than a set threshold are removed, and the remaining words form the industry keyword dictionary.
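The dictionary construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_industry_keyword_dictionary` is hypothetical, the corpus is assumed to be already segmented, and each industry is treated as one "document" when computing the inverse document frequency, which the text does not specify.

```python
import math
from collections import Counter, defaultdict


def build_industry_keyword_dictionary(corpus, filter_words, threshold):
    """Build {industry: set of keywords} from segmented company texts.

    corpus: iterable of (industry, [word, ...]) pairs (already segmented).
    filter_words: words to discard, e.g. province/city/district names.
    threshold: words with a TF-IDF value below this are dropped.
    """
    # Group the filtered words by industry: the (industry, word) word list.
    counts = defaultdict(Counter)
    for industry, words in corpus:
        counts[industry].update(w for w in words if w not in filter_words)

    # Assumption: one industry = one document for the IDF computation.
    n_industries = len(counts)
    industry_freq = Counter()
    for word_counts in counts.values():
        industry_freq.update(word_counts.keys())

    dictionary = {}
    for industry, word_counts in counts.items():
        total = sum(word_counts.values())
        keywords = set()
        for word, count in word_counts.items():
            tf = count / total
            idf = math.log(n_industries / industry_freq[word])
            if tf * idf >= threshold:
                keywords.add(word)
        dictionary[industry] = keywords
    return dictionary
```

With this weighting, a word that appears in every industry gets an IDF of zero and is always dropped, which matches the intent of keeping only words that discriminate between industries.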
Step S602, inputting the target key words corresponding to the text to be recognized into an industry prediction model, and determining the target matching probability of the text to be recognized and each industry.
Specifically, the industry prediction model may be a naive bayes model, a neural network model, or the like. And inputting target keywords corresponding to the text to be recognized into an industry prediction model, and predicting the target matching probability of the text to be recognized and each industry by the industry prediction model according to the target keywords.
Step S603, determining the target industry corresponding to the text to be recognized according to the target matching probability between the text to be recognized and each industry.
The higher the target matching probability between the text to be recognized and a certain industry, the more likely that industry is the target industry corresponding to the text to be recognized. There may be one or more target industries. For example, the industry with the highest target matching probability is determined as the target industry corresponding to the text to be recognized. As another example, the industries are sorted by target matching probability from high to low, and the top N industries are determined as the target industries, where N is a preset threshold. As a further example, the industries whose target matching probability is greater than a preset probability are determined as the target industries corresponding to the text to be recognized.
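The three selection rules above (highest probability, top N, and probability above a preset value) can be sketched in one helper. The function name `select_target_industries` and its parameters are hypothetical names introduced for illustration only.

```python
def select_target_industries(probabilities, top_n=None, min_probability=None):
    """Pick target industries from {industry: target matching probability}.

    - neither option set: return the single most probable industry;
    - top_n set: return the N most probable industries;
    - min_probability set: keep only industries above that probability.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    if min_probability is not None:
        ranked = [(c, p) for c, p in ranked if p > min_probability]
    if top_n is not None:
        ranked = ranked[:top_n]
    elif min_probability is None:
        ranked = ranked[:1]
    return [industry for industry, _ in ranked]
```

The probabilities passed in are assumed to come from the industry prediction model of step S602; the helper only ranks and filters them.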
After the text to be recognized is obtained, the industry keyword dictionary is adopted to extract the target keywords in the text to be recognized, so that the keywords representing industry information are extracted from the text to be recognized, then the target matching probability of the text to be recognized and each industry is determined according to the target keywords and the industry prediction model, and the target industry corresponding to the text to be recognized is determined according to the target matching probability.
Optionally, after step S603, industry information configured corresponding to the text to be recognized is obtained, and when the determined target industry is not matched with the industry information configured corresponding to the text to be recognized, it is determined that the industry information configured corresponding to the text to be recognized is abnormal.
In a specific implementation, the industry information configured for the text to be recognized may be the industry information entered by the user. For example, suppose the text to be recognized is "Yiyige Clothing Shop, Cuiping District, Yibin City". Industry information identification is performed on this text, and the three industries with the highest target matching probabilities, namely the clothing industry, the retail industry and the wholesale industry, are determined as the target industries, while the industry information entered by the user is the catering industry. Because the industry information entered by the user matches none of the three identified target industries, the industry information entered by the user is determined to be abnormal. By identifying the target industry corresponding to the company name and comparing it with the industry information entered by the user, it can be judged whether the user has entered the wrong industry information, so that the risk caused by incorrectly filled-in industry information is avoided.
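As a minimal sketch of this check, the comparison can be reduced to a membership test. Treating "matching" as exact equality with one of the identified target industries is an assumption; the function name `is_industry_info_abnormal` is hypothetical.

```python
def is_industry_info_abnormal(target_industries, configured_industry):
    """Return True when the configured (user-entered) industry matches
    none of the identified target industries.

    Assumption: "matching" means exact equality of industry labels.
    """
    return configured_industry not in set(target_industries)
```

For the example above, the user-entered catering industry is not among the clothing, retail and wholesale industries, so the check flags it as abnormal.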
Optionally, in step S601, when an industry keyword dictionary is used to extract a target keyword corresponding to a text to be recognized, the embodiment of the present application provides at least the following two implementation manners:
In one possible implementation, the industry keyword dictionary is used to segment the text to be recognized, first-class keywords are determined from the text to be recognized, and the semantic interpretation text corresponding to each first-class keyword is determined according to a semantic dictionary. The industry keyword dictionary is then used to segment the semantic interpretation texts corresponding to the first-class keywords, and second-class keywords are determined from those semantic interpretation texts. The first-class keywords and the second-class keywords are then determined as the target keywords corresponding to the text to be recognized.
For example, as shown in Fig. 8, suppose the company name is "xx supermarket". The name is first segmented using the industry keyword dictionary to obtain the first-class keyword "supermarket". A semantic dictionary is then used to determine the interpretation text of the first-class keyword "supermarket": "A supermarket, short for 'super market', generally refers to a store that displays commodities openly, lets customers select goods themselves and queue to pay at checkout, and deals mainly in fresh food and daily sundries; it is a retail enterprise in which consumers shop on a self-service basis and settle at a unified checkout." The semantic interpretation text corresponding to the first-class keyword is segmented using the industry keyword dictionary to obtain the second-class keywords: supermarket, market, commodity, management, fresh food, daily sundries, article, store, self-service, retail and enterprise. The target keywords corresponding to "xx supermarket" are then obtained from the first-class and second-class keywords: supermarket, market, commodity, management, fresh food, daily sundries, article, store, self-service, retail and enterprise. Because both the keywords in the text to be recognized and the keywords in the corresponding interpretation texts are extracted, the resulting target keywords represent the characteristics of the text to be recognized more comprehensively, which improves the accuracy of the subsequent industry prediction.
In another possible implementation, the industry keyword dictionary is used to segment the text to be recognized, and the target keywords corresponding to the text to be recognized are determined directly from the text to be recognized.
For example, suppose the text to be recognized is a company profile, specifically: "Carrefour, founded in 1959, is the originator of the hypermarket format, the largest retailer in Europe and the second-largest international retail chain group in the world. It operates 11,000 retail units in 30 countries and regions around the world. The group leads the market with three major business formats: hypermarkets, supermarkets and discount stores. In addition, Carrefour has also developed convenience stores and membership warehouse stores in some countries." The text to be recognized is segmented using the industry keyword dictionary, and the target keywords obtained are: hypermarket, retail, supermarket, discount store, convenience store and warehouse store. When the text to be recognized is a paragraph such as a company profile, the industry keyword dictionary can be used directly to segment the text and determine the corresponding target keywords, which improves the efficiency of keyword extraction.
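The two-stage extraction can be sketched as below. This is an illustrative sketch only: the text does not say how the dictionary-based segmentation works, so forward maximum matching is assumed here, and the names `segment_keywords`, `extract_target_keywords` and the toy vocabulary are hypothetical.

```python
def segment_keywords(text, vocabulary):
    """Return dictionary words found in text, in order of first appearance.

    Assumption: forward maximum matching, i.e. at each position the
    longest dictionary word starting there is taken.
    """
    max_len = max((len(w) for w in vocabulary), default=0)
    found, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocabulary:
                if piece not in found:
                    found.append(piece)
                i += size
                break
        else:
            i += 1  # no dictionary word starts here; move on
    return found


def extract_target_keywords(text, vocabulary, semantic_dictionary):
    """First-class keywords come from the text itself; second-class
    keywords come from the interpretation text of each first-class keyword."""
    first_class = segment_keywords(text, vocabulary)
    second_class = []
    for keyword in first_class:
        interpretation = semantic_dictionary.get(keyword, "")
        for word in segment_keywords(interpretation, vocabulary):
            if word not in first_class and word not in second_class:
                second_class.append(word)
    return first_class + second_class
```

Character-level matching suits unsegmented Chinese text; on English text it can also match inside longer words, so a production version would likely tokenize first.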
Optionally, in step S602, when inputting the target keyword corresponding to the text to be recognized into the industry prediction model and determining the target matching probability between the text to be recognized and each industry, the embodiment of the present application provides at least the following two implementation manners:
In one possible implementation, for each industry, the target matching probability between the text to be recognized and the industry is determined from the probability of each target keyword occurring under the industry, the probability of the industry occurring, and the probability of the target keywords of the text to be recognized occurring together, according to the following formula (3):
$$P(c_i \mid t_1, t_2, \ldots, t_n) = \frac{P(c_i)\prod_{j=1}^{n} P(t_j \mid c_i)}{P(t_1, t_2, \ldots, t_n)} \tag{3}$$
where $P(c_i \mid t_1, t_2, \ldots, t_n)$ denotes the target matching probability between the text to be recognized and industry $c_i$ given that the target keywords are $t_1, t_2, \ldots, t_n$; $P(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$; $P(c_i)$ denotes the probability that industry $c_i$ occurs; and $P(t_1, t_2, \ldots, t_n)$ denotes the probability that the target keywords $t_1, t_2, \ldots, t_n$ occur together. In a specific embodiment, $P(t_1, t_2, \ldots, t_n)$ is obtained from the occurrence probabilities $P(t_q)$, $q = 1, 2, \ldots, n$, of the individual target keywords, and $P(c_i)$ and $P(t_q)$ may be obtained statistically.
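As a rough illustration of the Bayes-rule calculation described for formula (3), the sketch below multiplies the industry prior by the per-keyword conditional probabilities and divides by the joint keyword probability. Treating the joint probability as the product of the individual keyword probabilities is an assumption made here for simplicity; the function name, argument layout, and all numbers are hypothetical.

```python
from math import prod


def match_probability(keywords, industry, cond_prob, prior, keyword_prob):
    """Naive Bayes style score of an industry given the target keywords.

    cond_prob[industry][t] plays the role of P(t | c), prior[industry] of P(c),
    and keyword_prob[t] of P(t). The joint keyword probability is approximated
    by the product of the individual P(t) (an assumption of this sketch).
    """
    likelihood = prod(cond_prob[industry][t] for t in keywords)
    evidence = prod(keyword_prob[t] for t in keywords)
    return prior[industry] * likelihood / evidence


# Hypothetical statistics for two industries and two keywords.
COND_PROB = {
    "retail": {"supermarket": 0.2, "store": 0.1},
    "catering": {"supermarket": 0.05, "store": 0.1},
}
PRIOR = {"retail": 0.5, "catering": 0.5}
KEYWORD_PROB = {"supermarket": 0.125, "store": 0.1}

scores = {
    industry: match_probability(
        ["supermarket", "store"], industry, COND_PROB, PRIOR, KEYWORD_PROB
    )
    for industry in PRIOR
}
```

Because the denominator is the same for every industry, it does not change how the industries rank against each other.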
In another possible implementation, for each industry, an initial matching probability between the text to be recognized and the industry is first determined from the probability of each target keyword occurring under the industry, the probability of the industry occurring, and the probability of the target keywords of the text to be recognized occurring together. The adjacent industries of the industry are then determined according to the similarity between the industry and other industries, and the target matching probability between the text to be recognized and the industry is determined from the initial matching probabilities and the similarity between the industry and its adjacent industries.
Specifically, the initial matching probability between the text to be recognized and the industry satisfies the following formula (4):
$$P_y(c_i \mid t_1, t_2, \ldots, t_n) = \frac{P(c_i)\prod_{j=1}^{n} P(t_j \mid c_i)}{P(t_1, t_2, \ldots, t_n)} \tag{4}$$
where $P_y(c_i \mid t_1, t_2, \ldots, t_n)$ denotes the initial matching probability between the text to be recognized and industry $c_i$ given that the target keywords are $t_1, t_2, \ldots, t_n$; $P(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$; $P(c_i)$ denotes the probability that industry $c_i$ occurs; and $P(t_1, t_2, \ldots, t_n)$ denotes the probability that the target keywords $t_1, t_2, \ldots, t_n$ occur together.
When determining the adjacent industries of the industry according to the similarity between the industry and other industries, the similarity between the industry and any other industry can be determined from the intersection and union of the keyword set of the industry and the keyword set of that other industry, according to the following formula (5):
$$\mathrm{sim}(c_i, c_k) = \frac{|K_{c_i} \cap K_{c_k}|}{|K_{c_i} \cup K_{c_k}|} \tag{5}$$
where $\mathrm{sim}(c_i, c_k)$ denotes the similarity between industry $c_i$ and industry $c_k$, $K_{c_i}$ denotes the keyword set of industry $c_i$, and $K_{c_k}$ denotes the keyword set of industry $c_k$.
Other industries whose similarity meets a preset condition are then determined as the adjacent industries of the industry.
In a specific implementation, the other industries can be sorted by similarity in descending order, and the top M industries are determined as the adjacent industries of the industry, where M is a preset threshold. For example, with M set to 3, the similarity between the clothing industry and each other industry is calculated; if, sorted by similarity in descending order, the top three industries are the retail, wholesale, and textile industries, these three industries are determined as the adjacent industries of the clothing industry.
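The similarity calculation and the top-M selection can be sketched as below. The similarity is taken as the size of the intersection of two keyword sets divided by the size of their union (the Jaccard index), which is one natural reading of "intersection and union"; the keyword sets in `KEYWORD_SETS` and the function names are hypothetical.

```python
def keyword_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union of two keyword sets."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def adjacent_industries(industry, keyword_sets, m=3):
    """Return the m other industries most similar to the given industry."""
    scored = [
        (keyword_similarity(keyword_sets[industry], keywords), name)
        for name, keywords in keyword_sets.items()
        if name != industry
    ]
    # Sort by similarity, largest first, and keep the top m names.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:m]]


# Hypothetical keyword sets, chosen so that the clothing example ranks as in the text.
KEYWORD_SETS = {
    "clothing": {"shirt", "fabric", "store", "wholesale"},
    "retail": {"store", "shop", "shirt"},
    "wholesale": {"wholesale", "bulk", "store", "depot"},
    "textile": {"fabric", "yarn"},
    "catering": {"restaurant", "food"},
}
```

With these toy sets, `adjacent_industries("clothing", KEYWORD_SETS, m=3)` selects retail, wholesale, and textile, in that order.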
The target matching probability between the text to be recognized and the industry is then determined from the initial matching probability between the text to be recognized and the industry, the initial matching probabilities between the text to be recognized and the adjacent industries, and the similarity between the industry and the adjacent industries, according to the following formula (6):
$$P(c_i \mid t_1, t_2, \ldots, t_n) = \frac{S \cdot P_y(c_i \mid t_1, t_2, \ldots, t_n) + \sum_{k} \mathrm{sim}(c_i, c_k)\, P_y(c_k \mid t_1, t_2, \ldots, t_n)}{S + \sum_{k} \mathrm{sim}(c_i, c_k)} \tag{6}$$
where the sums run over the adjacent industries $c_k$ of industry $c_i$; $P(c_i \mid t_1, t_2, \ldots, t_n)$ denotes the target matching probability between the text to be recognized and industry $c_i$ given that the target keywords are $t_1, t_2, \ldots, t_n$; $P_y(c_k \mid t_1, t_2, \ldots, t_n)$ denotes the initial matching probability between the text to be recognized and industry $c_k$ given the same target keywords; $\mathrm{sim}(c_i, c_k)$ denotes the similarity between industry $c_i$ and industry $c_k$; and $S$ is the similarity of industry $c_i$ with itself, which equals 1.
Some industries are highly similar to one another: industries may be related hierarchically or may be different subdivisions of the same industry. Therefore, when calculating the probability that a company name belongs to an industry, the probabilities that the company name belongs to the adjacent industries can be incorporated and a weighted average taken according to the industry similarities. This shares information among similar industries, which improves prediction accuracy and reduces the false alarm rate of industry mismatch identification.
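The similarity-weighted average can be sketched as follows. The exact expression is an assumption of this sketch: the industry's own initial probability is weighted by its self-similarity of 1, and each adjacent industry's initial probability is weighted by its similarity to the industry. The function name and the numbers are hypothetical.

```python
def target_match_probability(industry, initial_prob, neighbour_similarity):
    """Similarity-weighted average of initial matching probabilities.

    initial_prob maps industry name -> initial matching probability.
    neighbour_similarity maps each adjacent industry -> similarity to `industry`.
    The industry itself is weighted by its self-similarity, taken as 1.
    """
    self_similarity = 1.0
    weighted_sum = self_similarity * initial_prob[industry]
    weight_total = self_similarity
    for neighbour, similarity in neighbour_similarity.items():
        weighted_sum += similarity * initial_prob[neighbour]
        weight_total += similarity
    return weighted_sum / weight_total


# Hypothetical initial probabilities and similarities.
INITIAL = {"clothing": 0.6, "retail": 0.3, "textile": 0.1}
NEIGHBOURS = {"retail": 0.5, "textile": 0.5}
smoothed = target_match_probability("clothing", INITIAL, NEIGHBOURS)
```

With no adjacent industries the result reduces to the initial matching probability, so the averaging only changes the score where similar industries exist.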
Optionally, when determining the probability of occurrence of each target keyword in any industry, the embodiments of the present application provide at least the following two implementations:
In one possible implementation, the probability of each target keyword occurring under any industry is determined using the following formula (7):
[Formula (7): equation image BDA0002082596660000121, not reproduced in this text]
where $P(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$, $\mathrm{count}(t_j, c_i)$ denotes the number of times target keyword $t_j$ occurs under industry $c_i$, $\mathrm{count}(t_j, c_k)$ denotes the number of times target keyword $t_j$ occurs under industry $c_k$, $V$ is the set of all keywords, and $m$ is the number of industries.
In another possible implementation, most users fill in their real industry when providing industry information, and the real industry information helps reduce the false alarm rate. Therefore, when calculating the probability of each target keyword occurring under an industry, different calculations can be used depending on whether the industry matches the industry information configured for the text to be recognized. Specifically:
When the industry does not match the industry information configured for the text to be recognized, the probability of each target keyword occurring under the industry satisfies the following formula (1):
[Formula (1): equation image BDA0002082596660000122, not reproduced in this text]
where $P_1(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match the industry information configured for the text to be recognized, $\mathrm{count}(t_j, c_i)$ denotes the number of times target keyword $t_j$ occurs under industry $c_i$, $\mathrm{count}(t_j, c_k)$ denotes the number of times target keyword $t_j$ occurs under industry $c_k$, $V$ is the set of all keywords, and $m$ is the number of industries.
When the industry matches the industry information configured for the text to be recognized, the probability of each target keyword occurring under the industry satisfies the following formula (2):
[Formula (2): equation image BDA0002082596660000131, not reproduced in this text]
where $P_2(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ matches the industry information configured for the text to be recognized, $P_1(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match that industry information, and $\alpha$ and $\beta$ are confidence parameters.
In a specific implementation, the degree of trust placed in the user is determined by how well the company name matches the industry information. If the industry information filled in by the user matches the company name closely, higher confidence is given that the user has filled in correct industry information; otherwise, lower confidence is given. The confidence parameters are therefore determined using an embedded linear-exponential enhancement formula, so that the confidence increases or decreases exponentially with the degree of match between the user-supplied industry information and the company name. Because the slope of an exponential function increases, the confidence function is differentiated by degree of match: the confidence grows ever faster as the degree of match increases.
In this way, the industry information input by the user is incorporated when predicting the target matching probability between the text to be recognized and each industry, which improves prediction accuracy and reduces the false alarm rate of industry mismatch identification.
In order to better explain the embodiment of the present application, an industry information identification method provided by the embodiment of the present application is described below with reference to a specific implementation scenario, as shown in fig. 9, the method includes the following steps:
The text to be recognized is set to "XX restaurant". The keyword "restaurant" in "XX restaurant" is extracted using the industry keyword dictionary, and the semantic interpretation text of the keyword "restaurant" is determined using the semantic dictionary: "A restaurant is a facility or public dining hall that openly provides food and beverages to the general public at a fixed location." The industry keyword dictionary is then used to extract the keywords "food, beverage, catering" from the semantic interpretation text. The keywords "restaurant, food, beverage, catering" are used as the target keywords of the text to be recognized "XX restaurant". The target keywords are input into the industry prediction model, which outputs the target matching probability between "XX restaurant" and each industry; the industry prediction model is a naive Bayes model. In a specific implementation, the probability of each target keyword occurring under any industry is determined using formula (1) and formula (2), and the target matching probability between "XX restaurant" and each industry is then determined using formulas (4) to (6). All industries are sorted by target matching probability in descending order, and the top five industries are determined as the target industries corresponding to "XX restaurant". Suppose the target industries are "industry A, industry B, industry C, industry D, industry E" and the industry information input by the user is "industry A". Because the industry information input by the user matches "industry A" among the target industries, the industry information input by the user is determined to be normal.
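The final ranking and comparison steps of this scenario can be sketched as below. The function names and the probabilities are hypothetical; only the top-five selection and the membership check come from the description.

```python
def target_industries(match_probabilities, top_n=5):
    """Return the top_n industries ordered by matching probability, largest first."""
    ranked = sorted(match_probabilities, key=match_probabilities.get, reverse=True)
    return ranked[:top_n]


def is_industry_info_normal(user_industry, match_probabilities, top_n=5):
    """User-supplied industry is treated as normal if it is among the target industries."""
    return user_industry in target_industries(match_probabilities, top_n)


# Hypothetical target matching probabilities for "XX restaurant".
PROBABILITIES = {
    "industry A": 0.30,
    "industry B": 0.25,
    "industry C": 0.15,
    "industry D": 0.12,
    "industry E": 0.10,
    "industry F": 0.08,
}
```

Here a user who entered "industry A" would pass the check, while "industry F" falls outside the top five and would be flagged as abnormal.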
After the text to be recognized is obtained, the industry keyword dictionary is adopted to extract the target keywords in the text to be recognized, so that the keywords representing industry information are extracted from the text to be recognized, then the target matching probability of the text to be recognized and each industry is determined according to the target keywords and the industry prediction model, and the target industry corresponding to the text to be recognized is determined according to the target matching probability. By comparing the target industry with the industry information input by the user, whether the industry information input by the user is wrong is judged, so that the risk caused by filling in wrong industry information is avoided.
Based on the same technical concept, the embodiment of the present application provides an industry information identification apparatus, as shown in fig. 10, the apparatus 1000 includes:
the extraction module 1001 is used for acquiring a text to be recognized and extracting a target keyword corresponding to the text to be recognized by adopting an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry;
the prediction module 1002 is configured to input a target keyword corresponding to the text to be recognized into an industry prediction model, and determine a target matching probability between the text to be recognized and each industry;
and the matching module 1003 is configured to determine a target industry corresponding to the text to be recognized according to the target matching probability of the text to be recognized and each industry.
Optionally, the apparatus further includes a determination module 1004;
the determination module 1004 is specifically configured to:
acquiring industry information configured corresponding to the text to be recognized;
and when the determined target industry is not matched with the correspondingly configured industry information, determining that the correspondingly configured industry information is abnormal.
Optionally, the extraction module 1001 is specifically configured to:
performing word segmentation on a text to be recognized by adopting an industry keyword dictionary, and determining a first class of keywords from the text to be recognized;
determining semantic interpretation texts corresponding to the first class of keywords according to a semantic dictionary;
adopting the industry keyword dictionary to perform word segmentation on the semantic interpretation texts corresponding to the first type of keywords, and determining second type of keywords from the semantic interpretation texts corresponding to the first type of keywords;
and determining the first class of keywords and the second class of keywords as target keywords corresponding to the text to be recognized.
Optionally, the prediction module 1002 is specifically configured to:
and aiming at each industry, determining the target matching probability of the text to be recognized and the industry according to the occurrence probability of each target keyword in the industry, the occurrence probability of the industry and the probability of the simultaneous occurrence of the target keywords corresponding to the text to be recognized.
Optionally, the prediction module 1002 is specifically configured to:
aiming at each industry, determining the initial matching probability of the text to be recognized and the industry according to the probability of each target keyword appearing in the industry, the probability of the industry appearing and the probability of the target keyword corresponding to the text to be recognized appearing at the same time;
determining the adjacent industries of the industries according to the similarity between the industries and other industries;
and determining the target matching probability of the text to be recognized and the industry according to the initial matching probability of the text to be recognized and the industry, the initial matching probability of the text to be recognized and the adjacent industry and the similarity of the industry and the adjacent industry.
Optionally, the prediction module 1002 is specifically configured to:
determining the similarity between the industry and any other industry according to the intersection and union of the keyword sets of the industry and the keyword sets of any other industry;
and determining other industries of which the similarity meets a preset condition as the adjacent industries of the industries.
Optionally, the prediction module 1002 is further configured to:
when the industry does not match the industry information configured for the text to be recognized, the probability of each target keyword occurring under the industry satisfies the following formula (1):
[Formula (1): equation image BDA0002082596660000151, not reproduced in this text]
where $P_1(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match the industry information configured for the text to be recognized, $\mathrm{count}(t_j, c_i)$ denotes the number of times target keyword $t_j$ occurs under industry $c_i$, $\mathrm{count}(t_j, c_k)$ denotes the number of times target keyword $t_j$ occurs under industry $c_k$, $V$ is the set of all keywords, and $m$ is the number of industries;
when the industry matches the industry information configured for the text to be recognized, the probability of each target keyword occurring under the industry satisfies the following formula (2):
[Formula (2): equation image BDA0002082596660000161, not reproduced in this text]
where $P_2(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ matches the industry information configured for the text to be recognized, $P_1(t_j \mid c_i)$ denotes the probability that target keyword $t_j$ occurs under industry $c_i$ when industry $c_i$ does not match that industry information, and $\alpha$ and $\beta$ are confidence parameters.
Based on the same technical concept, the embodiment of the present application provides a computer device, as shown in fig. 11, including at least one processor 1101 and a memory 1102 connected to the at least one processor, where a specific connection medium between the processor 1101 and the memory 1102 is not limited in the embodiment of the present application, and the processor 1101 and the memory 1102 are connected through a bus in fig. 11 as an example. The bus may be divided into an address bus, a data bus, a control bus, etc.
In the embodiment of the present application, the memory 1102 stores instructions executable by the at least one processor 1101, and the at least one processor 1101 may execute the steps included in the aforementioned industry information identification method by executing the instructions stored in the memory 1102.
The processor 1101 is the control center of the computer device. It may be connected to the various parts of the computer device through various interfaces and lines, and identifies the industry information by running or executing the instructions stored in the memory 1102 and invoking the data stored in the memory 1102. Optionally, the processor 1101 may include one or more processing units, and the processor 1101 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 1101. In some embodiments, the processor 1101 and the memory 1102 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 1101 may be a general purpose processor such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 1102, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1102 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1102 may also be, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1102 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
The computer device further includes an input unit 1103, a display unit 1104, a radio frequency unit 1105, an audio circuit 1106, a speaker 1107, a microphone 1108, a Wireless Fidelity (WiFi) module 1109, a bluetooth module 1110, a power supply 1111, an external interface 1112, a headphone jack 1113, and the like.
The input unit 1103 may be configured to receive a request for downloading a target application input by a user, an instruction for installing the target application input by the user, an instruction for authorizing the application manager to use the network interception component input by the user, and the like. For example, the input unit 1103 may include a touch screen 11031 and other input devices 11032. The touch screen 11031 may collect touch operations of a user (e.g., operations of the user on or near the touch screen 11031 using any suitable object such as a finger, a joint, a stylus, etc.), i.e., the touch screen 11031 may be used to detect a touch pressure and a touch input position and a touch input area, and drive corresponding connection devices according to a preset program. The touch screen 11031 may detect a touch operation of the touch screen 11031 by a user, convert the touch operation into a touch signal and transmit the touch signal to the processor 1101, or may be understood as transmitting touch information of the touch operation to the processor 1101, and may receive and execute a command transmitted from the processor 1101. The touch information may include at least one of pressure magnitude information and pressure duration information. The touch screen 11031 may provide an input interface and an output interface between the computer device and a user. In addition, the touch screen 11031 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1103 may include other input devices 11032 in addition to the touch screen 11031. For example, other input devices 11032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 1104 may be used to display information input by the user or information provided to the user. Further, the touch screen 11031 may cover the display unit 1104, and when the touch screen 11031 detects a touch operation thereon or nearby, the touch screen 11031 may transmit the pressure information of the touch operation to the processor 1101 to be determined. In the embodiment of the present application, the touch screen 11031 and the display unit 1104 may be integrated into one component to implement input, output, and display functions of the computer device. For convenience of description, the embodiment of the present application is schematically illustrated by taking the touch screen 11031 as an example of the functional set of the touch screen 11031 and the display unit 1104, but in some embodiments, the touch screen 11031 and the display unit 1104 may also be taken as two separate components.
When the display unit 1104 and the touch panel are superimposed on each other in the form of layers to form the touch screen 11031, the display unit 1104 may function as an input device and an output device, and when functioning as an output device, may be used to display an image, for example, to display an installation interface of a target application. The Display unit 1104 may include at least one of a Liquid Crystal Display (LCD), a Thin Film Transistor Liquid Crystal Display (TFT-LCD), an Organic Light Emitting Diode (OLED) Display, an Active Matrix Organic Light Emitting Diode (AMOLED) Display, an In-Plane Switching (IPS) Display, a flexible Display, a 3D Display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and the computer device may include two or more display units, depending on the particular desired implementation.
The RF unit 1105 may be used for receiving and transmitting information, or for receiving and transmitting signals during a call. Typically, the radio frequency circuitry includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. Further, the RF unit 1105 may also communicate with network devices and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
The audio circuit 1106, the speaker 1107, and the microphone 1108 can provide an audio interface between a user and the computer device. The audio circuit 1106 may convert received audio data into an electrical signal and transmit it to the speaker 1107, which converts the electrical signal into a sound signal for output. Conversely, the microphone 1108 converts a collected sound signal into an electrical signal, which the audio circuit 1106 receives and converts into audio data; the audio data is then output to the processor 1101 for processing and subsequently sent to another electronic device via the RF unit 1105, or output to the memory 1102 for further processing. The audio circuit may also include a headphone jack 1113 for providing a connection interface between the audio circuit and headphones.
WiFi belongs to short-distance wireless transmission technology, and the computer equipment can help a user to send and receive e-mails, browse webpages, access streaming media and the like through the WiFi module 1109, and provides wireless broadband internet access for the user. Although fig. 11 shows the WiFi module 1109, it is understood that it does not belong to the essential constitution of the computer device, and may be omitted entirely as needed within the scope not changing the essence of the invention.
Bluetooth is a short-range wireless communication technology. Bluetooth technology can effectively simplify communication between mobile computing devices such as palmtop computers, notebook computers, and mobile phones, and can also simplify communication between such devices and the Internet. Through the Bluetooth module 1110, data transmission between the computer device and the Internet becomes faster and more efficient, broadening the possibilities for wireless communication. Bluetooth technology is an open solution that enables wireless transmission of voice and data. Although fig. 11 shows the Bluetooth module 1110, it is understood that it is not an essential component of the computer device and may be omitted as needed without changing the essence of the invention.
The computer device may also include a power supply 1111, such as a battery, for receiving external power to power the various components within the computer device. Preferably, the power supply 1111 may be logically connected to the processor 1101 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
The computer device may further include an external interface 1112, where the external interface 1112 may include a standard Micro USB interface, may also include a multi-pin connector, and may be used to connect the computer device to communicate with other devices, and may also be used to connect a charger to charge the computer device.
Although not shown, the computer device may further include a camera, a flash, and other possible functional modules, which are not described in detail herein.
Based on the same technical concept, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which, when the program runs on the computer device, causes the computer device to perform the steps of the industry information identification method.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An industry information identification method, comprising:
acquiring a text to be recognized, and extracting target keywords corresponding to the text to be recognized by adopting an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry;
inputting target keywords corresponding to the text to be recognized into an industry prediction model, and determining the target matching probability of the text to be recognized and each industry;
and determining the target industry corresponding to the text to be recognized according to the target matching probability of the text to be recognized and each industry.
2. The method of claim 1, further comprising:
acquiring industry information configured corresponding to the text to be recognized;
and when the determined target industry does not match the correspondingly configured industry information, determining that the correspondingly configured industry information is abnormal.
3. The method of claim 1, wherein the extracting the target keyword corresponding to the text to be recognized by using the industry keyword dictionary comprises:
performing word segmentation on the text to be recognized using the industry keyword dictionary, and determining first-type keywords from the text to be recognized;
determining semantic interpretation texts corresponding to the first-type keywords according to a semantic dictionary;
performing word segmentation on the semantic interpretation texts corresponding to the first-type keywords using the industry keyword dictionary, and determining second-type keywords from those semantic interpretation texts;
and determining the first-type keywords and the second-type keywords as the target keywords corresponding to the text to be recognized.
4. The method of claim 1, wherein the inputting the target keywords corresponding to the text to be recognized into an industry prediction model and the determining the target matching probability of the text to be recognized and each industry comprises:
for each industry, determining the target matching probability between the text to be recognized and the industry according to the probability of each target keyword appearing under the industry, the probability of the industry appearing, and the probability of the target keywords corresponding to the text to be recognized appearing simultaneously.
5. The method of claim 1, wherein the determining the target matching probability of the text to be recognized and each industry according to the target keywords corresponding to the text to be recognized and an industry prediction model comprises:
for each industry, determining an initial matching probability between the text to be recognized and the industry according to the probability of each target keyword appearing under the industry, the probability of the industry appearing, and the probability of the target keywords corresponding to the text to be recognized appearing simultaneously;
determining an adjacent industry of the industry according to the similarity between the industry and other industries;
and determining the target matching probability between the text to be recognized and the industry according to the initial matching probability between the text to be recognized and the industry, the initial matching probability between the text to be recognized and the adjacent industry, and the similarity between the industry and the adjacent industry.
6. The method of claim 5, wherein the determining an adjacent industry of the industry according to the similarity between the industry and other industries comprises:
determining the similarity between the industry and any other industry according to the intersection and the union of the keyword set of the industry and the keyword set of that other industry;
and determining any other industry whose similarity satisfies a preset condition as an adjacent industry of the industry.
7. The method of any of claims 4 to 6, further comprising:
when the industry does not match the industry information configured for the text to be recognized, the probability of each target keyword appearing under the industry satisfies the following formula (1):
[Formula (1): rendered as image FDA0002082596650000021 in the original publication; not reproduced in this text]
wherein P_1(t_j|c_i) denotes the probability that target keyword t_j appears under industry c_i when industry c_i does not match the industry information configured for the text to be recognized; count(t_j, c_i) denotes the number of times target keyword t_j appears under industry c_i; count(t_j, c_k) denotes the number of times target keyword t_j appears under industry c_k; V is the set of all keywords; and m is the number of industries;
when the industry matches the industry information configured for the text to be recognized, the probability of each target keyword appearing under the industry satisfies the following formula (2):
[Formula (2): rendered as image FDA0002082596650000031 in the original publication; not reproduced in this text]
wherein P_2(t_j|c_i) denotes the probability that target keyword t_j appears under industry c_i when industry c_i matches the industry information configured for the text to be recognized; P_1(t_j|c_i) denotes the probability that target keyword t_j appears under industry c_i when industry c_i does not match the industry information configured for the text to be recognized; and α and β are confidence parameters.
8. An industry information identification device, comprising:
an extraction module, configured to acquire a text to be recognized and to extract target keywords corresponding to the text to be recognized using an industry keyword dictionary, wherein the industry keyword dictionary comprises a keyword set corresponding to each industry;
a prediction module, configured to input the target keywords corresponding to the text to be recognized into an industry prediction model and to determine the target matching probability between the text to be recognized and each industry;
and a matching module, configured to determine the target industry corresponding to the text to be recognized according to the target matching probability between the text to be recognized and each industry.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 7 are performed by the processor when the program is executed.
10. A computer-readable storage medium, having stored thereon a computer program executable by a computer device, for causing the computer device to perform the steps of the method of any one of claims 1 to 7, when the program is run on the computer device.
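
The probability steps recited in claims 1, 2 and 4 to 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The toy industries, keyword counts and uniform priors are invented for illustration. Add-one smoothing stands in for formula (1), which is not reproduced in this text. The similarity-weighted average in `target_probabilities` and the `threshold` parameter are hypothetical choices for the combination rule and the preset condition, which the claims leave open.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Similarity of two keyword sets: |intersection| / |union| (claim 6)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_probabilities(keywords, counts, priors, vocab_size):
    """Naive Bayes posterior of each industry given the keywords (claim 4).

    counts[c][t] is how often keyword t was seen under industry c.
    Add-one smoothing is an assumption of this sketch.
    """
    log_scores = {}
    for industry, prior in priors.items():
        total = sum(counts[industry].values())
        score = math.log(prior)
        for kw in keywords:
            score += math.log((counts[industry][kw] + 1) / (total + vocab_size))
        log_scores[industry] = score
    # Normalizing over industries plays the role of dividing by the
    # probability of the keywords appearing simultaneously.
    peak = max(log_scores.values())
    unnorm = {c: math.exp(s - peak) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def target_probabilities(initial, keyword_sets, threshold=0.1):
    """Blend each industry's probability with its adjacent industries (claim 5).

    The similarity-weighted average is a hypothetical combination rule.
    """
    target = {}
    for industry, p in initial.items():
        value, weight = p, 1.0
        for other, q in initial.items():
            if other == industry:
                continue
            sim = jaccard(keyword_sets[industry], keyword_sets[other])
            if sim >= threshold:  # hypothetical preset condition
                value += sim * q
                weight += sim
        target[industry] = value / weight
    return target


# Toy data, invented for illustration only.
counts = {
    "games": Counter({"game": 8, "player": 5, "level": 3, "download": 4}),
    "education": Counter({"course": 7, "teacher": 4, "exam": 5, "download": 2}),
    "finance": Counter({"loan": 6, "interest": 5, "fund": 4}),
}
keyword_sets = {c: set(cnt) for c, cnt in counts.items()}
vocab = set().union(*keyword_sets.values())
priors = {c: 1 / len(counts) for c in counts}

keywords = ["game", "download"]  # target keywords extracted from a text
initial = initial_probabilities(keywords, counts, priors, len(vocab))
target = target_probabilities(initial, keyword_sets)
best = max(target, key=target.get)  # target industry (claim 1)

configured = "education"            # industry configured for the text
is_abnormal = best != configured    # mismatch check (claim 2)
print(best, is_abnormal)
```

In this toy setting "games" and "education" share only the keyword "download", so they count as adjacent only because the threshold is low. The blended scores are not renormalized, since only their ranking is used to pick the target industry.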
CN201910476994.7A 2019-06-03 2019-06-03 Industry information identification method and device Active CN112115710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910476994.7A CN112115710B (en) 2019-06-03 2019-06-03 Industry information identification method and device

Publications (2)

Publication Number Publication Date
CN112115710A (en) 2020-12-22
CN112115710B (en) 2023-08-08

Family

ID=73795187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910476994.7A Active CN112115710B (en) 2019-06-03 2019-06-03 Industry information identification method and device

Country Status (1)

Country Link
CN (1) CN112115710B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343684A (en) * 2021-06-22 2021-09-03 广州华多网络科技有限公司 Core product word recognition method and device, computer equipment and storage medium
CN113377904A (en) * 2021-06-04 2021-09-10 百度在线网络技术(北京)有限公司 Industry action recognition method and device, electronic equipment and storage medium
CN116842180A (en) * 2023-08-30 2023-10-03 中电科大数据研究院有限公司 Method and device for identifying industry to which document belongs

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105190489A (en) * 2013-03-14 2015-12-23 微软技术许可有限责任公司 Language model dictionaries for text predictions
US20160124933A1 (en) * 2014-10-30 2016-05-05 International Business Machines Corporation Generation apparatus, generation method, and program
CN107436875A (en) * 2016-05-25 2017-12-05 华为技术有限公司 File classification method and device
CN107832287A (en) * 2017-09-26 2018-03-23 晶赞广告(上海)有限公司 A kind of label identification method and device, storage medium, terminal
CN108733778A (en) * 2018-05-04 2018-11-02 百度在线网络技术(北京)有限公司 The industry type recognition methods of object and device
CN108959247A (en) * 2018-06-19 2018-12-07 深圳市元征科技股份有限公司 A kind of data processing method, server and computer-readable medium
US20190005014A1 (en) * 2017-06-29 2019-01-03 bizocean Co., Ltd. Information input method, information input apparatus, and information input system
CN109670837A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 Recognition methods, device, computer equipment and the storage medium of bond default risk

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377904A (en) * 2021-06-04 2021-09-10 百度在线网络技术(北京)有限公司 Industry action recognition method and device, electronic equipment and storage medium
CN113377904B (en) * 2021-06-04 2024-05-10 百度在线网络技术(北京)有限公司 Industry action recognition method and device, electronic equipment and storage medium
CN113343684A (en) * 2021-06-22 2021-09-03 广州华多网络科技有限公司 Core product word recognition method and device, computer equipment and storage medium
CN113343684B (en) * 2021-06-22 2023-05-26 广州华多网络科技有限公司 Core product word recognition method, device, computer equipment and storage medium
CN116842180A (en) * 2023-08-30 2023-10-03 中电科大数据研究院有限公司 Method and device for identifying industry to which document belongs
CN116842180B (en) * 2023-08-30 2023-12-19 中电科大数据研究院有限公司 Method and device for identifying industry to which document belongs

Also Published As

Publication number Publication date
CN112115710B (en) 2023-08-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant