CN108460014A - Recognition methods, device, computer equipment and the storage medium of business entity - Google Patents

Recognition methods, device, computer equipment and the storage medium of business entity

Info

Publication number
CN108460014A
Authority
CN
China
Prior art keywords
enterprise
word
full name
abbreviation
referred
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810121667.5A
Other languages
Chinese (zh)
Other versions
CN108460014B (en
Inventor
宋烈金
崔燕
岳爱珍
李维之
张琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810121667.5A
Publication of CN108460014A
Application granted
Publication of CN108460014B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566: URL specific, e.g. using aliases, detecting broken or misspelled links
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a method, device, computer equipment, and storage medium for recognizing business entities. The method includes: acquiring public sentiment text and performing word segmentation on it to obtain a word segmentation result; performing enterprise full name recognition according to the word segmentation result to obtain a first enterprise full name contained in the public sentiment text; performing enterprise abbreviation recognition according to the word segmentation result to obtain a first enterprise abbreviation contained in the public sentiment text; and determining the recognized first enterprise full name and first enterprise abbreviation as names of business entities. In this way, business entities can be extracted from network public sentiment text, and the accuracy of business entity recognition is improved.

Description

Recognition methods, device, computer equipment and the storage medium of business entity
Technical field
The present invention relates to the field of Internet technology, and in particular to a method, device, computer equipment, and storage medium for recognizing business entities.
Background
Network public opinion refers to the set of emotions, attitudes, opinions, and views, expressed and spread over the Internet, that the public holds about events it cares about or that closely affect its own interests. Business entities usually carry key information in network public opinion, so recognizing business entities is the key to mining network public opinion.
However, business entity names follow weak naming rules, are used loosely, and often appear in abbreviated form. How to extract business entities from network public sentiment text has therefore become an urgent problem.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a method for recognizing business entities. The method performs enterprise full name recognition and enterprise abbreviation recognition separately to obtain enterprise full names and enterprise abbreviations, and then determines the recognized enterprise full names and enterprise abbreviations as names of business entities. Business entities can thus be extracted from network public sentiment text, and the accuracy of business entity recognition is improved.
A second object of the present invention is to propose a device for recognizing business entities.
A third object of the present invention is to propose computer equipment.
A fourth object of the present invention is to propose a computer program product.
A fifth object of the present invention is to propose a non-transitory computer-readable storage medium.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for recognizing business entities, including:
acquiring public sentiment text and performing word segmentation on it to obtain a word segmentation result;
performing enterprise full name recognition according to the word segmentation result to obtain a first enterprise full name contained in the public sentiment text;
performing enterprise abbreviation recognition according to the word segmentation result to obtain a first enterprise abbreviation contained in the public sentiment text; and
determining the recognized first enterprise full name and first enterprise abbreviation as names of business entities.
The business entity recognition method of the embodiment of the present invention performs word segmentation on the acquired public sentiment text to obtain a word segmentation result, performs enterprise full name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public sentiment text, performs enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text, and then determines the recognized first enterprise full name and first enterprise abbreviation as names of business entities. Business entities can thus be extracted from network public sentiment text, and the accuracy of business entity recognition is improved.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a device for recognizing business entities, including:
a word segmentation module, configured to acquire public sentiment text and perform word segmentation on it to obtain a word segmentation result;
a full name recognition module, configured to perform enterprise full name recognition according to the word segmentation result to obtain a first enterprise full name contained in the public sentiment text;
an abbreviation recognition module, configured to perform enterprise abbreviation recognition according to the word segmentation result to obtain a first enterprise abbreviation contained in the public sentiment text; and
a determining module, configured to determine the recognized first enterprise full name and first enterprise abbreviation as names of business entities.
The business entity recognition device of the embodiment of the present invention performs word segmentation on the acquired public sentiment text to obtain a word segmentation result, performs enterprise full name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public sentiment text, performs enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text, and then determines the recognized first enterprise full name and first enterprise abbreviation as names of business entities. Business entities can thus be extracted from network public sentiment text, and the accuracy of business entity recognition is improved.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes computer equipment, including a processor and a memory, wherein the processor reads executable program code stored in the memory and runs a program corresponding to that code, so as to implement the business entity recognition method described in the embodiment of the first aspect.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a computer program product; when instructions in the computer program product are executed by a processor, the business entity recognition method described in the embodiment of the first aspect is implemented.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the business entity recognition method described in the embodiment of the first aspect is implemented.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a first business entity recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a second business entity recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a third business entity recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for building an enterprise abbreviation dictionary tree;
Fig. 5 is a schematic flowchart of a fourth business entity recognition method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a first business entity recognition device provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a second business entity recognition device provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a third business entity recognition device provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a fourth business entity recognition device provided by an embodiment of the present invention; and
Fig. 10 is a schematic structural diagram of computer equipment proposed by an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The business entity recognition method, device, computer equipment, and storage medium of the embodiments of the present invention are described below with reference to the accompanying drawings.
At present, to recognize business entities in network public opinion, existing recognition methods mostly identify enterprise names based on a hidden Markov model, a conditional random field model, or a specific dictionary tree.
However, recognition methods based on hidden Markov models and conditional random field models require a model to be built and trained in advance, and the trained model is then used to recognize business entities in public sentiment text. Both methods can recognize only enterprise full names with fairly standard naming; they cannot recognize abbreviations or alternative names.
To address these problems, the present invention proposes a business entity recognition method that accurately recognizes business entities in public sentiment text and improves the accuracy of business entity recognition. Fig. 1 is a schematic flowchart of a first business entity recognition method provided by an embodiment of the present invention.
As shown in Fig. 1, the business entity recognition method includes the following steps:
Step 101: acquire public sentiment text and perform word segmentation on it to obtain a word segmentation result.
Network public opinion refers to online opinion, prevalent on the Internet, that expresses differing views on social issues. It is one form of public opinion: statements and views, spread over the Internet, that the public holds about certain hot-spot or focal issues in real life and that are strongly influential and tendentious.
Network public opinion is expressed and spread with the network as its carrier. In this embodiment, the network public sentiment text on which business entity recognition is to be performed can therefore be obtained from the network, and preprocessing operations such as word segmentation, punctuation removal, and stop word removal can be applied to the acquired text. A suitable tokenizer can be used to segment the public sentiment text; the segmentation granularity may be, for example, basic granularity.
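The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the tokenizer used by the invention: it uses an English stand-in sentence with a hypothetical company name and a hypothetical stop word list, and a regular expression takes the place of a real Chinese word segmenter.

```python
import re
from typing import List

# Hypothetical stop word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "of", "in", "has", "its"}

def preprocess(text: str) -> List[str]:
    """Segment text, drop punctuation, and remove stop words."""
    # \w+ keeps runs of letters and digits, so punctuation is discarded here.
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]

# "Acme" is a made-up trade name used only for illustration.
tokens = preprocess("Shenzhen Acme Finance Co., Ltd. has expanded its business.")
print(tokens)
```

For Chinese text, the regular expression would be replaced by a dedicated word segmenter, since Chinese words are not separated by spaces.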
Step 102: perform enterprise full name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public sentiment text.
In this embodiment, after the public sentiment text has been segmented and the word segmentation result obtained, enterprise full name recognition can be performed according to the word segmentation result to obtain the first enterprise full name contained in the public sentiment text.
As one possible implementation, a machine learning model can be trained with known business entity names. The word segmentation result is then input into the trained model, and the business entity name output by the model is taken as the first enterprise full name of the acquired public sentiment text.
Step 103: perform enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text.
Existing business entity recognition methods focus only on enterprise names and do not consider attribute features related to the enterprise, which hinders accurate recognition of business entities. In this embodiment, after the word segmentation result of the public sentiment text is obtained, enterprise abbreviation recognition can be performed according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text. For example, the abbreviations of different enterprises and the attribute information corresponding to each enterprise can be mined in advance, and the first enterprise abbreviation contained in the public sentiment text can then be identified from the word segmentation result according to the attribute information and the enterprise abbreviations.
Note that the specific process of performing enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text is explained in later sections; to avoid repetition, it is not described in detail here.
Step 104: determine the recognized first enterprise full name and first enterprise abbreviation as names of business entities.
In this embodiment, after the first enterprise full name and the first enterprise abbreviation contained in the public sentiment text have each been obtained according to the word segmentation result, both can be determined as names of business entities.
The business entity recognition method of this embodiment performs word segmentation on the acquired public sentiment text to obtain a word segmentation result, performs enterprise full name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public sentiment text, performs enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public sentiment text, and then determines the recognized first enterprise full name and first enterprise abbreviation as names of business entities. Business entities can thus be extracted from network public sentiment text, and the accuracy of business entity recognition is improved.
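The control flow of steps 101 to 104 might be sketched as below. The function and parameter names are hypothetical, and the two toy recognizers are simple lookups that merely stand in for the full name model and the abbreviation matching described in the text; only the order of the four steps is taken from the description.

```python
from typing import Callable, List

def recognize_entities(text: str,
                       segment: Callable[[str], List[str]],
                       full_name_recognizer: Callable[[List[str]], List[str]],
                       abbreviation_recognizer: Callable[[List[str]], List[str]]) -> List[str]:
    tokens = segment(text)                            # step 101: word segmentation
    full_names = full_name_recognizer(tokens)         # step 102: full name recognition
    abbreviations = abbreviation_recognizer(tokens)   # step 103: abbreviation recognition
    # Step 104: both kinds of names count as entity names.
    # Removing duplicates while keeping order is a choice made for this sketch.
    return list(dict.fromkeys(full_names + abbreviations))

# Toy stand-ins with a made-up company name.
KNOWN_FULL_NAMES = ["Shenzhen Acme Finance Co Ltd"]
KNOWN_ABBREVIATIONS = {"Acme"}

def toy_full_names(tokens: List[str]) -> List[str]:
    joined = " ".join(tokens)
    return [name for name in KNOWN_FULL_NAMES if name in joined]

def toy_abbreviations(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t in KNOWN_ABBREVIATIONS]

entities = recognize_entities(
    "Shenzhen Acme Finance Co Ltd said Acme will expand",
    str.split, toy_full_names, toy_abbreviations)
print(entities)
```

Passing the recognizers in as parameters keeps the flow independent of how each recognition step is actually implemented.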
To describe more clearly how, in the foregoing embodiment, enterprise full name recognition is performed on the word segmentation result to obtain the first enterprise full name contained in the public sentiment text, an embodiment of the present invention proposes another business entity recognition method. Fig. 2 is a schematic flowchart of a second business entity recognition method provided by an embodiment of the present invention.
As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, step 102 may include the following steps:
Step 201: annotate the segmented words in the word segmentation result according to their semantics.
An enterprise name usually has a fixed compositional structure and generally consists of an administrative division, a trade name, an industry, and an organizational form. The trade name is the most important component of an enterprise name: it identifies the enterprise itself and fundamentally distinguishes it from other enterprises, and it is characteristically expressive and exclusive. For example, in "Shenzhen Youcai Finance Co., Ltd.", "Shenzhen" is the administrative division, "Youcai" is the trade name, "Finance" is the industry, and "Co., Ltd." is the organizational form. In this embodiment, after the acquired public sentiment text has been segmented and the word segmentation result obtained, the segmented words can be further annotated according to the semantics of each segmented word in the result.
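The four-part name structure can be illustrated with a small labeling sketch. The lexicons, the label strings, and the company name are all hypothetical, and the lexicon lookup is only a stand-in for the manual or model-based annotation that the description goes on to discuss.

```python
from typing import List, Tuple

# Hypothetical lexicons for illustration only.
ADMIN_DIVISIONS = {"Shenzhen", "Beijing"}
INDUSTRIES = {"Finance", "Technology"}
ORG_FORMS = {"Co., Ltd.", "Inc."}

def label_name_parts(name_tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each token of an enterprise name with its structural role."""
    labeled = []
    for tok in name_tokens:
        if tok in ADMIN_DIVISIONS:
            label = "administrative_division"
        elif tok in INDUSTRIES:
            label = "industry"
        elif tok in ORG_FORMS:
            label = "organizational_form"
        else:
            # Whatever is left inside the name is treated as the trade name.
            label = "trade_name"
        labeled.append((tok, label))
    return labeled

labeled = label_name_parts(["Shenzhen", "Acme", "Finance", "Co., Ltd."])
print(labeled)
```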
As one possible implementation, the semantics of each segmented word in the word segmentation result can be judged manually, and the segmented words of the enterprise names contained in the word segmentation result can be annotated accordingly: the segmented words of each enterprise name are labeled as administrative division, trade name, industry, and so on, and the remaining segmented words in the public sentiment text are labeled as the non-entity part, which contains no business entity name.
As another possible implementation, a language model can be used to annotate each segmented word in the word segmentation result; in this case the language model must first be trained. Specifically, a large amount of public sentiment text can be collected from the Internet and preprocessed by word segmentation, stop word removal, and punctuation removal to obtain word segmentation results. The results are then annotated: the enterprise names contained in the public sentiment text are identified, and each segmented word of an enterprise name is given a label such as trade name or industry. Next, the word segmentation result of the public sentiment text is used as the input of a machine learning model, and the segmented words of the enterprise names together with their labels are used as its output; training the machine learning model in this way yields the language model. During subsequent business entity recognition, the word segmentation result of the public sentiment text can be input into the language model to obtain the annotation result for the segmented words.
Step 202: input the segmented words and their annotation information sequentially into a conditional random field model for enterprise full name recognition to obtain a recognition result for each segmented word. The recognition result includes first information and second information: the first information indicates whether the segmented word is a word in the first enterprise full name, and the second information is the position of the segmented word within the first enterprise full name.
In this embodiment, after the word segmentation result of the public sentiment text has been annotated, the resulting segmented words and their annotation information can be input in sequence into the conditional random field model for enterprise full name recognition to obtain the recognition result for each segmented word. The annotation information of a segmented word is the label assigned to it during annotation, and the conditional random field model is trained in advance.
A conditional random field (CRF) model is a model of the conditional probability distribution of one set of output random variables given another set of input random variables; its defining assumption is that the output random variables form a Markov random field. Unlike generative models, a CRF model can use rich, mutually overlapping features of the observation sequence and requires no particularly strict assumptions. Unlike the maximum entropy Markov model and similar probabilistic models, a CRF model does not normalize each label individually and then search globally; instead it solves for one optimal label sequence over the entire observation sequence, which avoids the label bias problem.
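For reference, a commonly used linear-chain form of the CRF can be written as follows. The source does not state which CRF variant is used, so the linear-chain structure, the feature functions f_k, and the weights λ_k are assumptions made for illustration; x denotes the observation sequence (the segmented words with their annotation information) and y the label sequence.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Because Z(x) sums over whole label sequences rather than being computed separately at each position, the normalization is global, which is the property the paragraph above credits with avoiding label bias.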
In this embodiment, a training corpus can be built in advance and used to train the CRF model. To build the corpus, a large amount of public sentiment text can be obtained from the network, and an enterprise full name dictionary tree can be built from known enterprise full names. The acquired public sentiment text is then traversed with the enterprise full name dictionary tree to obtain the texts that contain an enterprise full name. Each such text is divided into an entity part and a non-entity part, and the entity part is further structured into administrative division, trade name, industry, and so on; that is, the entity part is segmented and annotated. The segmentation and annotation of the entity part can be done manually or with a trained model. The segmented words and their annotation information are used as the input; the first information, which indicates whether a given segmented word is a word in an enterprise full name, and the position of that segmented word within the enterprise full name are used as the output. The CRF model is trained on these pairs, and the trained CRF model is used for subsequent enterprise full name recognition.
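A rough sketch of how training targets might be derived from a text that contains a known full name is given below. It assumes the S/M/E position tags (start, middle, end) that the description offers as an example for recognition results, uses "O" as a hypothetical tag for non-entity words, and locates the name with a direct token sequence match rather than the dictionary tree traversal described above. The company name is made up.

```python
from typing import List, Tuple

def label_tokens(tokens: List[str], name_tokens: List[str]) -> List[Tuple[str, int, str]]:
    """Return (token, is_name_word, position_tag) triples for one text."""
    n = len(name_tokens)
    labels = [(tok, 0, "O") for tok in tokens]  # default: non-entity part
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == name_tokens:
            for j in range(n):
                if j == 0:
                    tag = "S"      # first word of the full name
                elif j == n - 1:
                    tag = "E"      # last word of the full name
                else:
                    tag = "M"      # word inside the full name
                labels[i + j] = (tokens[i + j], 1, tag)
    return labels

text_tokens = ["News", "Shenzhen", "Acme", "Finance", "Co", "Ltd", "expanded"]
name = ["Shenzhen", "Acme", "Finance", "Co", "Ltd"]
training_labels = label_tokens(text_tokens, name)
print(training_labels)
```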
When recognizing the enterprise full names contained in public sentiment text, the segmented words and annotation information of the text are input into the trained CRF model, which yields the first information of each segmented word belonging to the first enterprise full name and the position of that segmented word within the first enterprise full name. The first information indicates whether a segmented word is a word that makes up the first enterprise full name, and the position information indicates whether the segmented word is at the start, in the middle, or at the end of the first enterprise full name. For example, S, M, and E can represent the position of a segmented word in the first enterprise full name, where S denotes the start, M the middle, and E the end.
A CRF model can use the N words before and the M words after a core word together as that word's context (M and N are positive integers), so the final label of the core word is jointly determined by the preceding and following words, which better fits enterprise full name recognition in practice. Accordingly, in one possible implementation of the embodiment of the present invention, an observation window of a preset length is set for the CRF model. The preset length can be chosen freely; for example, it can be set to 5, that is, N and M are both 5. When the segmented words and their annotation information are input into the conditional random field model for enterprise full name recognition, the model can use the observation window to determine, for the segmented word being recognized, the first segmented words that have a context relation with it, and obtain those first segmented words and their annotation information. Enterprise full name recognition is then performed based on the segmented word and its annotation information together with the first segmented words and their annotation information, giving the recognition result for the segmented word. In this way the recognition process better matches the structure of enterprise names.
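The observation window can be pictured as a feature extraction step like the one sketched here. The feature names, the padding symbol, and the default window sizes are hypothetical; the sketch only shows how the N preceding and M following words and their labels could be gathered as context for one core word.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical placeholder for positions outside the text

def window_features(tokens: List[str], labels: List[str], i: int,
                    n: int = 2, m: int = 2) -> Dict[str, str]:
    """Collect the core word at index i plus n words before and m words after it."""
    feats = {"word[0]": tokens[i], "label[0]": labels[i]}
    for offset in list(range(-n, 0)) + list(range(1, m + 1)):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset}]"] = tokens[j] if inside else PAD
        feats[f"label[{offset}]"] = labels[j] if inside else PAD
    return feats

tokens = ["Shenzhen", "Acme", "Finance", "Co", "Ltd"]
labels = ["administrative_division", "trade_name", "industry",
          "organizational_form", "organizational_form"]
feats = window_features(tokens, labels, 0, n=1, m=2)
print(feats)
```

Feature dictionaries of this shape are a common input format for CRF toolkits, although the source does not say which toolkit, if any, is used.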
Step 203: if the first information indicates that the segmented words include the first enterprise full name, extract the first enterprise full name from the segmented words according to the position information.
In this embodiment, after the recognition results of the segmented words have been obtained, the first information can be used to determine whether the public sentiment text contains an enterprise full name. When the first information indicates that a first enterprise full name is present, the first enterprise full name can be extracted from the segmented words according to the position information given by the second information.
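Extraction from the recognition results might look like the following sketch. It assumes each result is a (word, first information, position tag) triple with the S/M/E tags mentioned earlier in the description; the triple layout and the space used to join words are choices made for this English illustration (Chinese words would be concatenated without a separator).

```python
from typing import List, Tuple

def extract_full_names(results: List[Tuple[str, int, str]]) -> List[str]:
    """Rebuild full names from words flagged as name words and tagged S, M, or E."""
    names, current = [], []
    for word, is_name_word, position in results:
        if not is_name_word:
            current = []              # a non-entity word breaks any open name
            continue
        if position == "S":
            current = [word]          # start a new name
        elif current:
            current.append(word)      # continue a name that has a start
        if position == "E" and current:
            names.append(" ".join(current))
            current = []
    return names

results = [("News", 0, "O"), ("Shenzhen", 1, "S"), ("Acme", 1, "M"),
           ("Finance", 1, "M"), ("Co", 1, "M"), ("Ltd", 1, "E"),
           ("expanded", 0, "O")]
print(extract_full_names(results))
```

Sequences that reach an end tag without a preceding start tag are ignored here, which is one conservative way to handle inconsistent model output.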
Further, in one possible implementation of the embodiment of the present invention, as shown in Fig. 2, the following may also be included after step 203:
Step 204: using the pre-built enterprise full name dictionary tree, query the first enterprise full name in the enterprise full name dictionary library.
In this embodiment, after the first enterprise full name in the public sentiment text has been recognized, it can further be determined whether the first enterprise full name exists in the enterprise full name dictionary library. The enterprise full names stored in the enterprise full name dictionary library are consistent with those in the pre-built enterprise full name dictionary tree.
After the first enterprise full name is obtained, it can be matched against the enterprise full name dictionary tree. First, the dictionary tree is searched for a node that matches the first character or word of the first enterprise full name. If no such node is found, the first enterprise full name does not exist in the enterprise full name dictionary library. Otherwise, the child nodes of that node are checked against the second character or word of the first enterprise full name; if none matches, the first enterprise full name does not exist in the dictionary library. Otherwise, the children of the matching child node are checked against the next character or word, and these steps are repeated until the whole first enterprise full name has been traversed. If the last character or word of the first enterprise full name is found in a child node of the dictionary tree, it can be determined that the first enterprise full name exists in the enterprise full name dictionary library.
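The character-by-character lookup can be sketched with a dictionary tree built from nested dicts. The node layout and helper names are hypothetical. One detail goes beyond the description: the sketch stores an end-of-name marker so that a mere prefix of a stored name is not reported as present.

```python
from typing import Dict, Iterable

END = "$end"  # multi-character key, so it cannot collide with a single character

def build_trie(names: Iterable[str]) -> Dict:
    """Build a dictionary tree in which each node maps a character to a child node."""
    root: Dict = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def contains(root: Dict, name: str) -> bool:
    """Walk the tree one character at a time; fail as soon as a character is missing."""
    node = root
    for ch in name:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

# Made-up company name used only for illustration.
trie = build_trie(["Shenzhen Acme Finance Co Ltd"])
print(contains(trie, "Shenzhen Acme Finance Co Ltd"))
print(contains(trie, "Shenzhen Acme"))
```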
Step 205: if the first enterprise full name is not found in the enterprise full name dictionary library, add the first enterprise full name to the library to update it.
Step 206: update the enterprise full name dictionary tree with the first enterprise full name.
In this embodiment, if the recognized first enterprise full name is not found in the enterprise full name dictionary library, it is added to the library so that the library is updated, and the enterprise full name dictionary tree is updated with the first enterprise full name.
Specifically, when the enterprise full name dictionary tree is updated with the first enterprise full name, the search can start from the root node of the tree. The node holding the first character of the first enterprise full name is located, and its child nodes are searched for the second character. If no child node containing the second character is found, a child node holding the second character is added under the node of the first character, and new child nodes are built in turn from the remaining characters of the first enterprise full name. If a node holding the second character is found under the node of the first character, the search continues under that node for the third character: if it is not found, new nodes are added; if it is found, the search moves on to the next character. This continues until the first enterprise full name has been added to the enterprise full name dictionary tree.
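Steps 205 and 206 can be sketched as follows, again with a nested dict as the dictionary tree. The function names, the set used as the dictionary library, and the end-of-name marker are hypothetical; the membership test on the set stands in for the tree-based query of step 204.

```python
from typing import Dict, Set

END = "$end"  # hypothetical end-of-name marker

def insert(root: Dict, name: str) -> int:
    """Add a name to the tree, reusing existing nodes; return how many nodes were created."""
    node, created = root, 0
    for ch in name:
        if ch not in node:
            node[ch] = {}      # no child for this character yet: add one
            created += 1
        node = node[ch]        # descend and continue with the next character
    node[END] = True
    return created

def update_with_full_name(name: str, library: Set[str], root: Dict) -> bool:
    """Add a newly recognized full name to the library and the tree if it is missing."""
    if name in library:        # stand-in for the tree-based query
        return False
    library.add(name)          # step 205: update the dictionary library
    insert(root, name)         # step 206: update the dictionary tree
    return True

root: Dict = {}
library: Set[str] = set()
print(insert(root, "abc"), insert(root, "abd"), insert(root, "abc"))
print(update_with_full_name("Shenzhen Acme Finance Co Ltd", library, root))
```

Reusing shared prefixes is what keeps the tree compact: names that begin with the same characters share the corresponding nodes.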
In the enterprise full name recognition method of this embodiment, the segmented words are labeled according to their semantics, and the segmented words together with their label information are input in sequence into a conditional random field model for enterprise full name recognition, yielding a recognition result that contains first information and second information. When the first information indicates that a first enterprise full name is included, the first enterprise full name is extracted from the segmented words according to the position information in the second information, so that enterprise full names in the public sentiment text can be recognized accurately. By using the pre-built enterprise full-name dictionary tree, the first enterprise full name is queried in the enterprise full-name dictionary library; when it is not found, it is added to the library to update the library, and the dictionary tree is updated with it as well. This enables self-iteration of the enterprise full-name dictionary tree, expands the enterprise entity library automatically, reduces manual intervention, and improves the recall of enterprise entity recognition.
To describe clearly how, in the foregoing embodiment, enterprise abbreviation recognition is performed on the word segmentation result to obtain the first enterprise abbreviation included in the public sentiment text, an embodiment of the present invention proposes another method for recognizing enterprise entities. Fig. 3 is a schematic flowchart of a third method for recognizing enterprise entities provided by an embodiment of the present invention.
As shown in Fig. 3, on the basis of the embodiment shown in Fig. 1, step 103 may include the following steps:
Step 301, using the pre-built enterprise abbreviation dictionary library, match the second enterprise abbreviation included in the word segmentation result.
Here, the second enterprise abbreviation is an enterprise abbreviation that is present in the pre-built enterprise abbreviation dictionary library, and the enterprise abbreviation dictionary library is determined from the click logs of historical search terms and/or the trade name (字号) information of enterprises.
In this embodiment, the enterprise abbreviation dictionary library can be built before it is used to match second enterprise abbreviations in the word segmentation result. The pre-built library is then used to match the second enterprise abbreviation included in the word segmentation result. For example, the word segmentation result of the public sentiment text can be traversed against the enterprise abbreviation dictionary library, and the segmented words that match an abbreviation in the library are selected as second enterprise abbreviations.
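As a minimal sketch of the traversal in step 301, the abbreviation dictionary library can be modeled as a set and the word segmentation result as a list of strings. The function name and the sample data are hypothetical and serve only to illustrate the matching.

```python
def match_second_abbreviations(segmented_words, abbreviation_library):
    """Return the segmented words that appear in the abbreviation library.

    Duplicates are dropped and the order of first occurrence is kept.
    """
    seen = set()
    matches = []
    for word in segmented_words:
        if word in abbreviation_library and word not in seen:
            seen.add(word)
            matches.append(word)
    return matches


# Hypothetical library and segmentation result, for illustration only.
library = {"Acme", "Globex"}
words = ["Yesterday", "Acme", "released", "a", "phone", "and", "Acme", "shares", "rose"]
candidates = match_second_abbreviations(words, library)
```

A match at this stage is only a candidate: the later steps check the context before the candidate is accepted as a first enterprise abbreviation.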
It should be noted that the construction of the enterprise abbreviation dictionary library is described later; to avoid repetition, it is not detailed here.
Step 302, obtain from the public sentiment text the context sentence that contains the second enterprise abbreviation, and build a first word vector from the context sentence.
Specifically, after a second enterprise abbreviation has been matched in the word segmentation result of the public sentiment text, the context sentence containing that second enterprise abbreviation can be obtained from the public sentiment text, and the first word vector is built from the context sentence.
For example, the context sentence containing the second enterprise abbreviation can be preprocessed by word segmentation, stop-word removal, and similar operations to convert it into a word set, and the words in the word set are then converted into the first word vector. For example, word2vector can be used to obtain the word vector.
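The preprocessing in step 302 can be sketched as follows. The text does not say how the individual word vectors are combined into a single first word vector, so averaging them is an assumption made here; the stop-word list and the tiny embedding table are also hypothetical stand-ins for a trained word-embedding model.

```python
# Hypothetical stop words and toy 3-dimensional embeddings (illustration only).
STOP_WORDS = {"the", "a", "of", "in"}
TOY_EMBEDDINGS = {
    "acme": [1.0, 0.0, 0.0],
    "released": [0.0, 1.0, 0.0],
    "phone": [0.0, 0.0, 1.0],
}


def context_to_word_set(tokens, stop_words=STOP_WORDS):
    """Lower-case the tokens, drop stop words, and keep each word once."""
    words = []
    for token in tokens:
        token = token.lower()
        if token not in stop_words and token not in words:
            words.append(token)
    return words


def build_context_vector(words, embeddings, dim):
    """Average the vectors of the known words (assumed combination rule)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


tokens = ["Acme", "released", "a", "phone", "in", "the", "phone"]
word_set = context_to_word_set(tokens)
first_vector = build_context_vector(word_set, TOY_EMBEDDINGS, dim=3)
```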
Step 303, calculate the first similarity between the first word vector and the second word vector of the enterprise attribute words of the second enterprise abbreviation.
For example, cosine similarity can be used to calculate the first similarity between the first word vector and the second word vector of the enterprise attribute words, as shown in formula (1).
$$D = \cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}} \qquad (1)$$
Here, D denotes the first similarity; x_i and y_i denote the i-th elements of the first word vector and the second word vector respectively; n denotes the dimension of the first word vector and the second word vector; and θ denotes the angle between the first word vector and the second word vector.
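The cosine similarity used for the first similarity can be computed directly from the symbol definitions above. The sketch below uses plain Python lists; returning 0.0 for a zero-length vector is a convention chosen here, not something the text specifies.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors of equal dimension n."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_x * norm_y)


same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
diagonal = cosine_similarity([1.0, 1.0], [1.0, 0.0])
```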
It should be noted that the process of obtaining the enterprise attribute words is described later; to avoid repetition, it is not detailed here.
Step 304, if the first similarity exceeds a preset threshold, take the second enterprise abbreviation as the first enterprise abbreviation in the public sentiment text.
The threshold for the first similarity can be set in advance. The higher the threshold, the stricter the condition for accepting a second enterprise abbreviation as a first enterprise abbreviation, and the higher the accuracy of the recognized first enterprise abbreviations.
In this embodiment, when the calculated first similarity between the first word vector and the second word vector is higher than the preset threshold, the matched second enterprise abbreviation is taken as the first enterprise abbreviation of the public sentiment text.
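The threshold decision in step 304 can be illustrated with a small filter. The similarity values and the thresholds below are made up for illustration; the function name is hypothetical.

```python
def accept_abbreviations(similarities, threshold):
    """Keep the candidates whose first similarity strictly exceeds the threshold."""
    return [abbr for abbr, score in similarities.items() if score > threshold]


# Hypothetical first similarities for three matched second enterprise abbreviations.
scores = {"Acme": 0.82, "Apple": 0.35, "Delta": 0.60}
strict = accept_abbreviations(scores, threshold=0.6)
lenient = accept_abbreviations(scores, threshold=0.3)
```

With the stricter threshold only the candidate whose context clearly matches its enterprise attribute words is kept, which mirrors the stated trade-off: a higher threshold accepts fewer abbreviations but with higher accuracy.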
In the enterprise entity recognition method of this embodiment, the second enterprise abbreviation is matched using the pre-built enterprise abbreviation dictionary library, the context sentence containing the second enterprise abbreviation is obtained from the public sentiment text, the first word vector is built from the context sentence, and the first similarity between the first word vector and the second word vector of the enterprise attribute words is calculated. A second enterprise abbreviation whose first similarity exceeds the preset threshold is taken as the first enterprise abbreviation of the public sentiment text. In this way, enterprise abbreviations can be recognized in the public sentiment text, and the accuracy of enterprise entity recognition is improved.
To match second enterprise abbreviations using the enterprise abbreviation dictionary library, the library can first be constructed. Specifically, it can be built in two ways.
On the one hand, enterprise abbreviations can be determined from click logs, and the enterprise abbreviation dictionary library can be built from the abbreviations so determined. Fig. 4 is a schematic flowchart of a method for building the enterprise abbreviation dictionary library. As shown in Fig. 4, on the basis of the embodiment shown in Fig. 3, the following steps may also be included before step 301:
Step 401, obtain the click logs of historical search terms.
A user's click logs are an expression of the user's search needs and can also reveal how users understand and refer to an enterprise; the abbreviation of an enterprise can therefore be determined from click logs. In this embodiment, the click logs of historical search terms can be obtained from the server of a search engine, where the historical search terms are related to enterprises, for example enterprise types or enterprise abbreviations.
Step 402, extract from the click logs the first historical search terms that have a linking relationship with the uniform resource locator (URL) of an enterprise.
In the click logs, multiple historical search terms may correspond to the uniform resource locator (URL) of the same enterprise; that is, searches using different historical search terms can lead to the URL of the same enterprise. In this embodiment, all historical search terms that have a linking relationship with the URL of the same enterprise can be extracted from the obtained click logs as the first historical search terms.
Step 403, obtain the common prefix substring among the first historical search terms corresponding to a specified URL, where the specified URL is any one of the URLs of all enterprises.
For the URL of any enterprise, the common prefix substring among the first historical search terms corresponding to that URL can be obtained, that is, the characters that the first historical search terms share at their beginning.
Step 404, add the first historical search terms and/or the common prefix substring to the enterprise abbreviation dictionary library as enterprise abbreviations.
In this embodiment, the obtained common prefix substring and/or the first historical search terms can be used as enterprise abbreviations and added to the enterprise abbreviation dictionary library. For example, when only one first historical search term is extracted, that term can be used as the enterprise abbreviation. When multiple first historical search terms are extracted and they share a common prefix substring, the common prefix substring is used as the enterprise abbreviation. When multiple first historical search terms are extracted and at least one of them shares no common prefix substring with the others, both that first historical search term and the common prefix substring of the others can be used as enterprise abbreviations.
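The three cases in step 404 can be sketched with the standard-library helper `os.path.commonprefix`, which returns the longest character-level prefix shared by a list of strings. How terms are grouped when no prefix is shared by all of them is not specified in the text; grouping by first character is an assumption made for this sketch, and the function name and sample search terms are hypothetical.

```python
import os
from collections import defaultdict


def abbreviations_for_url(first_search_terms):
    """Derive candidate abbreviations from the search terms that link to one URL."""
    # Normalize, drop empty strings, and de-duplicate while keeping order.
    unique = list(dict.fromkeys(t.strip() for t in first_search_terms if t.strip()))

    # Assumption: terms that start with the same character may share a prefix.
    groups = defaultdict(list)
    for term in unique:
        groups[term[0]].append(term)

    candidates = []
    for members in groups.values():
        if len(members) == 1:
            candidates.append(members[0])  # lone term is used as it is
        else:
            candidates.append(os.path.commonprefix(members).strip())
    return candidates


single = abbreviations_for_url(["acme"])
shared = abbreviations_for_url(["acme phone", "acme cloud", "acme"])
mixed = abbreviations_for_url(["acme phone", "acme cloud", "roadrunner corp"])
```

Because the prefix is computed character by character, the same sketch applies to Chinese search terms, where a shared leading run of characters often corresponds to the enterprise's short name.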
In a possible implementation of the embodiment of the present invention, the weight of the first historical search term or common prefix substring under the specified URL can be obtained, the first historical search terms or common prefix substrings can be screened according to the weight, and the first historical search terms and/or common prefix substrings that remain after screening are added to the enterprise abbreviation dictionary library as enterprise abbreviations.
Specifically, to obtain the weight of the first historical search term or common prefix substring under the specified URL, two probabilities are obtained first: the first conditional probability that, given the specified URL, it was reached by clicking the first historical search term or common prefix substring; and the second conditional probability that, given a click on the first historical search term or common prefix substring, the jump leads to the specified URL. The weight is then obtained from the first conditional probability and the second conditional probability, as shown in formula (2).
W(query | url) = P(query | url) * P(url | query)    (2)
Here, W(query | url) denotes the weight of the first historical search term or common prefix substring under the specified URL; P(query | url) denotes the first conditional probability and P(url | query) denotes the second conditional probability. Both probabilities can be obtained by counting. For example, to obtain the first conditional probability, the total number of historical search terms or common prefix substrings that can link to the specified URL is counted, and the number belonging to the first historical search term and its common prefix substring is determined from that total; the ratio of the two (number / total number) is the first conditional probability. To obtain the second conditional probability, the total number of URLs that can be reached by clicking the first historical search term or common prefix substring is counted, and the number of those that are the specified URL is determined; the ratio of the two (number / total number) is the second conditional probability.
The first historical search terms or common prefix substrings can then be screened according to the weights obtained. For example, the first historical search terms or common prefix substrings whose weight does not reach a preset weight threshold can be screened out, and those whose weight reaches the weight threshold are retained and added to the enterprise abbreviation dictionary library.
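Formula (2) and the screening step can be sketched as below. The click log is modeled here as a list of (query, url) click events, and both conditional probabilities are estimated from click counts; this representation, the function names, the sample URLs, and the threshold value are all assumptions made for illustration, being one plausible reading of the counting procedure described above.

```python
from collections import Counter


def query_url_weights(click_events):
    """Compute W(query | url) = P(query | url) * P(url | query) from click counts."""
    pair_counts = Counter(click_events)
    url_totals = Counter(url for _, url in click_events)
    query_totals = Counter(query for query, _ in click_events)
    weights = {}
    for (query, url), n in pair_counts.items():
        p_query_given_url = n / url_totals[url]      # first conditional probability
        p_url_given_query = n / query_totals[query]  # second conditional probability
        weights[(query, url)] = p_query_given_url * p_url_given_query
    return weights


def screen_by_weight(weights, url, weight_threshold):
    """Keep the queries whose weight under the given URL reaches the threshold."""
    return [q for (q, u), w in weights.items() if u == url and w >= weight_threshold]


# Hypothetical click events: (search term, clicked enterprise URL).
events = (
    [("acme", "acme.example")] * 3
    + [("acme phone", "acme.example")]
    + [("acme", "other.example")]
)
weights = query_url_weights(events)
kept = screen_by_weight(weights, "acme.example", weight_threshold=0.5)
```

Multiplying the two probabilities favors a search term that both dominates the traffic to the URL and leads mostly to that URL, so generic terms that scatter across many sites receive a low weight.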
In summary, by obtaining the click logs of historical search terms, extracting from them the first historical search terms that have a linking relationship with the URL of an enterprise, obtaining the common prefix substring among the first historical search terms corresponding to a specified URL, and adding the common prefix substring and/or the first historical search terms to the enterprise abbreviation dictionary library as enterprise abbreviations, an enterprise abbreviation dictionary library recognized by users can be constructed, which lays the foundation for enterprise abbreviation recognition.
On the other hand, the enterprise abbreviation dictionary library can be built from the trade name (字号) of an enterprise. As mentioned above, the trade name is the most important component of an enterprise name: it identifies the enterprise itself and distinguishes it from other enterprises. Therefore, in this embodiment, the trade name information of an enterprise can be obtained from a known enterprise full name, used as an enterprise abbreviation, and added to the enterprise abbreviation dictionary library.
Since a trade name is not necessarily recognized by users, in a possible implementation of the embodiment of the present invention the obtained trade name information can further be screened. Specifically, the uniform resource locator (URL) of the enterprise corresponding to the trade name information and the click proportion of that URL can be obtained, and all trade name information is filtered according to the click proportion so that trade name information with weak ability to characterize the enterprise is filtered out. The trade name information retained after filtering is added to the enterprise abbreviation dictionary library as enterprise abbreviations.
By obtaining trade name information and using it as enterprise abbreviations to build the enterprise abbreviation dictionary library, distinctive enterprise abbreviations can be recognized, which lays a good foundation for enterprise abbreviation recognition.
It should be noted that the enterprise abbreviation dictionary library can be built using only the approach of the embodiment shown in Fig. 4, or using only the approach of obtaining trade name information. However, to expand the coverage of the enterprise abbreviation dictionary library as far as possible, it is preferable to build the enterprise abbreviation dictionary library by combining the two approaches, so that as many abbreviations as possible can be recognized.
To calculate the first similarity between the first word vector and the second word vector of the enterprise attribute words, the enterprise attribute words of the enterprise abbreviation and the second word vector of those attribute words can be obtained first. It should be noted that the process of obtaining the enterprise attribute words and the second word vector can be carried out at any time before the first similarity is calculated. The embodiment of the present invention is illustrated only with the example in which the enterprise attribute words and the second word vector are obtained before the pre-built enterprise abbreviation dictionary library is used to match the second enterprise abbreviation included in the word segmentation result. Fig. 5 is a schematic flowchart of a fourth method for recognizing enterprise entities provided by an embodiment of the present invention.
As shown in Fig. 5, on the basis of the embodiment shown in Fig. 3, the following steps may also be included before step 301:
Step 501, for an enterprise abbreviation in the enterprise abbreviation dictionary library, mine the corresponding enterprise type word from the enterprise abbreviation.
As mentioned above, an enterprise name usually consists of an administrative division, a trade name, an industry, and an organizational form, where the industry element indicates, to some extent, the enterprise's type or mode of operation. To avoid the situation where an enterprise abbreviation cannot uniquely identify an enterprise, more information can be obtained to supplement the abbreviation. For example, the enterprise type in the enterprise name can be mined to supplement the enterprise abbreviation.
In this embodiment, for an enterprise abbreviation in the enterprise abbreviation dictionary library, the corresponding enterprise type word can be mined from the enterprise abbreviation. Specifically, for each enterprise abbreviation in the library, the enterprise full name corresponding to the abbreviation can first be obtained, and the industry of the enterprise is then mined from the full name as the enterprise type word corresponding to the abbreviation.
Step 502, using the enterprise abbreviation as a seed word, mine from historical public sentiment texts the historical click search terms that contain the seed word.
For each enterprise abbreviation in the enterprise abbreviation dictionary library, the abbreviation is used as a seed word and each obtained historical public sentiment text is traversed, so as to mine from the historical public sentiment texts the historical click search terms that contain the seed word.
Further, in a possible implementation of the embodiment of the present invention, the historical click search terms can also be screened to select those with a higher click proportion, so that the enterprise attribute words obtained subsequently are more closely related to the enterprise abbreviation.
Step 503, generate a third word vector from the words in the historical click search terms other than the seed word.
In this embodiment, after the historical click search terms have been mined, they can be preprocessed by removing punctuation, removing stop words, and similar operations. The words in the preprocessed historical click search terms other than the seed word are then converted into word vectors using word2vector, yielding the third word vector.
Step 504, calculate the second similarity between the third word vector and a fourth word vector built in advance from industry type words.
In this embodiment, matching industry type words can be obtained in advance for the enterprise abbreviation and converted into word vectors using word2vector to obtain the fourth word vector. The second similarity between the third word vector and the fourth word vector is then calculated. For example, the cosine similarity between the third word vector and the fourth word vector can be calculated as the second similarity.
Step 505, select the industry type word corresponding to the highest second similarity as the industry type word of the enterprise abbreviation.
For the historical click search terms that contain any given seed word, after at least one second similarity has been calculated from the third word vector of the words other than the seed word and the fourth word vectors built in advance, the second similarities can be compared, and the industry type word corresponding to the largest second similarity is taken as the industry type word of the enterprise abbreviation.
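Steps 504 and 505 amount to an argmax over candidate industry type words. The sketch below assumes each candidate industry type word already has a fourth word vector; the vectors, the candidate words, and the function names are hypothetical toy values, not data from the source.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def pick_industry_type_word(third_vector, industry_vectors):
    """Return the industry type word with the highest second similarity, and that score."""
    best_word, best_score = None, float("-inf")
    for word, fourth_vector in industry_vectors.items():
        score = cosine(third_vector, fourth_vector)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy vectors: the third vector comes from words co-occurring with the seed word.
third = [0.9, 0.1, 0.0]
industries = {
    "internet": [1.0, 0.0, 0.0],
    "finance": [0.0, 1.0, 0.0],
    "retail": [0.0, 0.0, 1.0],
}
industry_word, second_similarity = pick_industry_type_word(third, industries)
```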
Step 506, form the enterprise attribute words of the enterprise abbreviation from the enterprise type word and the industry type word.
Step 507, form the second word vector from the enterprise attribute words.
In this embodiment, after the industry type word of the enterprise abbreviation has been determined, the enterprise type word and the industry type word can be used to form the enterprise attribute words of the enterprise abbreviation. The second word vector can then be obtained from the enterprise attribute words. For example, word2vector can be used to form the second word vector from the enterprise attribute words.
In the enterprise entity recognition method of this embodiment, the enterprise type word is mined from the enterprise abbreviation, the historical click search terms containing the enterprise abbreviation are mined from historical public sentiment texts, and the third word vector is generated from the words in the historical click search terms other than the enterprise abbreviation. The second similarity between the third word vector and the fourth word vector built in advance from industry type words is calculated, and the industry type word with the highest second similarity is selected. The enterprise attribute words of the enterprise abbreviation are formed from the industry type word and the enterprise type word, and the second word vector is formed from the enterprise attribute words. In this way, enterprise attribute words related to the enterprise can be obtained and then used to recognize enterprise abbreviations in the public sentiment text, which improves the accuracy of enterprise abbreviation recognition.
To implement the above embodiments, the present invention also proposes a device for recognizing enterprise entities.
Fig. 6 is a schematic structural diagram of a first device for recognizing enterprise entities provided by an embodiment of the present invention.
As shown in Fig. 6, the enterprise entity recognition device 60 includes a word segmentation module 610, a full name recognition module 620, an abbreviation recognition module 630, and a determination module 640. Specifically:
The word segmentation module 610 is configured to collect public sentiment text and perform word segmentation on it to obtain a word segmentation result.
The full name recognition module 620 is configured to perform enterprise full name recognition according to the word segmentation result to obtain the first enterprise full name included in the public sentiment text.
The abbreviation recognition module 630 is configured to perform enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation included in the public sentiment text.
The determination module 640 is configured to determine the recognized first enterprise full name and first enterprise abbreviation as the name of the enterprise entity.
Further, in a possible implementation of the embodiment of the present invention, as shown in Fig. 7, on the basis of the embodiment shown in Fig. 6, the full name recognition module 620 includes:
A labeling unit 621, configured to label the segmented words in the word segmentation result according to their semantics.
A recognition unit 622, configured to input the segmented words and their label information in sequence into a conditional random field model for enterprise full name recognition to obtain the recognition result of each segmented word. The recognition result includes first information and second information: the first information indicates that the segmented word is a word in the first enterprise full name, and the second information is the position information of the segmented word within the first enterprise full name.
In a possible implementation of the embodiment of the present invention, an observation window of preset length is provided in the conditional random field model. In this case, the recognition unit 622 is specifically configured to, while the conditional random field model recognizes a segmented word, determine through the observation window the first segmented words that have a context relationship with that segmented word and obtain the first segmented words and their label information; and to perform enterprise full name recognition based on the segmented word and its label information together with the first segmented words and their label information, to obtain the recognition result of the segmented word.
An extraction unit 623, configured to extract the first enterprise full name from the segmented words according to the position information when the first information indicates that a first enterprise full name is included.
The segmented words are labeled according to their semantics, and the segmented words and their label information are input in sequence into the conditional random field model for enterprise full name recognition, yielding a recognition result that contains first information and second information. When the first information indicates that a first enterprise full name is included, the first enterprise full name is extracted from the segmented words according to the position information in the second information, so that enterprise full names in the public sentiment text can be recognized accurately.
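The observation window used by recognition unit 622 can be pictured as a feature extractor that collects, for each segmented word, the neighboring words and their labels. The sketch below is an assumption-laden illustration: it uses a symmetric window of `window` words on each side, and the feature naming scheme, the label set, and the sample sentence are all hypothetical. It covers only the window lookup, not the conditional random field model itself.

```python
def window_features(tokens, labels, index, window=2):
    """Collect the words and labels inside the observation window around `index`."""
    features = {}
    for offset in range(-window, window + 1):
        pos = index + offset
        if 0 <= pos < len(tokens):  # skip positions outside the sentence
            features[f"tok[{offset}]"] = tokens[pos]
            features[f"label[{offset}]"] = labels[pos]
    return features


# Hypothetical segmentation result and semantic labels.
tokens = ["Beijing", "Acme", "Technology", "Co", "announced"]
labels = ["LOC", "NAME", "INDUSTRY", "ORG", "O"]
first_word_features = window_features(tokens, labels, index=0, window=1)
middle_word_features = window_features(tokens, labels, index=2, window=1)
```

Feature dictionaries of this kind are a common input format for conditional random field toolkits, so each segmented word is judged together with the context words that fall inside the window.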
In a possible implementation of the embodiment of the present invention, as shown in Fig. 7, the enterprise entity recognition device 60 may further include:
An update module 650, configured to, after the first enterprise full name has been extracted from the segmented words, query the first enterprise full name in the enterprise full-name dictionary library using the pre-built enterprise full-name dictionary tree; when the first enterprise full name is not found in the enterprise full-name dictionary library, add it to the library to update the library; and update the enterprise full-name dictionary tree using the first enterprise full name.
By using the pre-built enterprise full-name dictionary tree, the first enterprise full name is queried in the enterprise full-name dictionary library; when it is not found, it is added to the library to update the library, and the dictionary tree is updated with it as well. This enables self-iteration of the enterprise full-name dictionary tree, expands the enterprise entity library automatically, reduces manual intervention, and improves the recall of enterprise entity recognition.
In a possible implementation of the embodiment of the present invention, as shown in Fig. 8, on the basis of the embodiment shown in Fig. 6, the abbreviation recognition module 630 includes:
An abbreviation matching unit 631, configured to match, using the pre-built enterprise abbreviation dictionary library, the second enterprise abbreviation included in the word segmentation result. Here, the second enterprise abbreviation is an enterprise abbreviation present in the pre-built enterprise abbreviation dictionary library, and the enterprise abbreviation dictionary library is determined from the click logs of historical search terms and/or the trade name (字号) information of enterprises.
An acquiring unit 632, configured to obtain from the public sentiment text the context sentence that contains the second enterprise abbreviation, and to build the first word vector from the context sentence.
A computing unit 633, configured to calculate the first similarity between the first word vector and the second word vector of the enterprise attribute words of the second enterprise abbreviation.
A determination unit 634, configured to take the second enterprise abbreviation as the first enterprise abbreviation in the public sentiment text when the first similarity exceeds the preset threshold.
The second enterprise abbreviation is matched using the pre-built enterprise abbreviation dictionary library, the context sentence containing the second enterprise abbreviation is obtained from the public sentiment text, the first word vector is built from the context sentence, and the first similarity between the first word vector and the second word vector of the enterprise attribute words is calculated. A second enterprise abbreviation whose first similarity exceeds the preset threshold is taken as the first enterprise abbreviation of the public sentiment text. In this way, enterprise abbreviations can be recognized in the public sentiment text, and the accuracy of enterprise entity recognition is improved.
Further, in a possible implementation of the embodiment of the present invention, as shown in Fig. 9, on the basis of the embodiment shown in Fig. 8, the enterprise entity recognition device 60 further includes:
An enterprise abbreviation dictionary library building module 601, configured to obtain the click logs of historical search terms; extract from the click logs the first historical search terms that have a linking relationship with the uniform resource locator (URL) of an enterprise; obtain the common prefix substring among the first historical search terms corresponding to a specified URL, where the specified URL is any one of the URLs of all enterprises; and add the first historical search terms and/or the common prefix substring to the enterprise abbreviation dictionary library as enterprise abbreviations.
Specifically, when adding the first historical search terms and/or the common prefix substring to the enterprise abbreviation dictionary library as enterprise abbreviations, the enterprise abbreviation dictionary library building module 601 can obtain the weight of the first historical search term or common prefix substring under the specified URL; screen the first historical search terms or common prefix substrings according to the weight; and add the first historical search terms and/or common prefix substrings that remain after screening to the enterprise abbreviation dictionary library as enterprise abbreviations.
As a possible implementation, when obtaining the weight of the first historical search term or common prefix substring under the specified URL, the enterprise abbreviation dictionary library building module 601 can obtain the first conditional probability that, given the specified URL, it was reached by clicking the first historical search term or common prefix substring, and the second conditional probability that, given a click on the first historical search term or common prefix substring, the jump leads to the specified URL; the weight is then obtained from the first conditional probability and the second conditional probability.
By obtaining the click logs of historical search terms, extracting from them the first historical search terms that have a linking relationship with the URL of an enterprise, obtaining the common prefix substring among the first historical search terms corresponding to a specified URL, and adding the common prefix substring and/or the first historical search terms to the enterprise abbreviation dictionary library as enterprise abbreviations, an enterprise abbreviation dictionary library recognized by users can be constructed, which lays the foundation for enterprise abbreviation recognition.
In a kind of possible realization method of the embodiment of the present invention, enterprise's abbreviation dictionary library structure module 601 can also obtain The font size information for taking enterprise referred to as using font size information as enterprise is added in enterprise's abbreviation dictionary library.
Since font size not necessarily can be by customer acceptance, further, in a kind of possible realization of the embodiment of the present invention In mode, enterprise's abbreviation dictionary library structure module 601 can also screen the font size information of acquisition.Specifically, enterprise's letter Claim dictionary library structure module 601 that can obtain the uniform resource position mark URL of enterprise corresponding with font size information and the point of URL Proportion is hit, all font size information is filtered according to proportion is clicked, filters out the weaker font size information of enterprise's characterization ability, it will The font size information retained after filtering referred to as, is added in enterprise's abbreviation dictionary library as enterprise.
By obtaining trade names and using them as enterprise abbreviations to construct the enterprise abbreviation dictionary library, distinctive enterprise abbreviations can be recognized, which lays a good foundation for enterprise abbreviation recognition.
The enterprise attribute word acquisition module 602 is configured to: for an enterprise abbreviation in the enterprise abbreviation dictionary library, mine the corresponding enterprise type word from the enterprise abbreviation; take the enterprise abbreviation as a seed word and mine, from historical public opinion text, the historical click search terms that contain the seed word; generate a third term vector from the words in the historical click search terms other than the seed word; calculate a second similarity between the third term vector and each fourth term vector built in advance from an industry type word; select the industry type word corresponding to the highest second similarity as the industry type word of the enterprise abbreviation; form the enterprise attribute words of the enterprise abbreviation from the enterprise type word and the industry type word; and form the second term vector from the enterprise attribute words.
By mining the enterprise type word from the enterprise abbreviation, mining from historical public opinion text the historical click search terms that contain the enterprise abbreviation, generating a third term vector from the words in those search terms other than the enterprise abbreviation, calculating the second similarity between the third term vector and the fourth term vectors built in advance from industry type words, and selecting the industry type word with the highest second similarity, the enterprise attribute words of the enterprise abbreviation are formed from the industry type word and the enterprise type word, and the second term vector is formed from the enterprise attribute words. In this way, enterprise attribute words that are related to the enterprise can be obtained, and enterprise abbreviations can then be recognized in public opinion text according to those attribute words, which can improve the accuracy of enterprise abbreviation recognition.
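The selection of an industry type word by similarity can be sketched as follows. The text does not say how the term vectors are built, so this example assumes simple bag-of-words count vectors and cosine similarity; the function names, the pre-segmented query format, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pick_industry(seed, segmented_queries, industry_vectors):
    """Choose the industry type word most similar to the seed word's context.

    segmented_queries: list of word lists, each containing the seed word.
    industry_vectors:  {industry_type_word: Counter} built in advance
                       (standing in for the "fourth term vectors").
    """
    # "Third term vector": counts of all words other than the seed word.
    context = Counter(w for q in segmented_queries for w in q if w != seed)
    scores = {name: cosine(context, vec) for name, vec in industry_vectors.items()}
    return max(scores, key=scores.get), scores


queries = [["acme", "loan", "rate"], ["acme", "bank", "branch"], ["acme", "loan"]]
industry_vectors = {
    "finance": Counter({"loan": 1, "bank": 1, "rate": 1}),
    "retail": Counter({"store": 1, "discount": 1, "branch": 1}),
}
best, scores = pick_industry("acme", queries, industry_vectors)
```

In practice dense embeddings would likely replace the count vectors, but the selection rule stays the same: the industry type word with the highest similarity is attached to the abbreviation.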
It should be noted that the foregoing explanation of the embodiments of the business entity recognition method also applies to the business entity recognition device of this embodiment; the implementation principle is similar and is not repeated here.
The business entity recognition device of this embodiment performs word segmentation on the collected public opinion text to obtain a word segmentation result, performs enterprise full-name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public opinion text, performs enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public opinion text, and then determines the recognized first enterprise full name and first enterprise abbreviation as names of business entities. Business entities can thereby be extracted from online public opinion text, and the accuracy of business entity recognition is improved.
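In the full-name recognition step, each segmented word receives an indication of whether it belongs to an enterprise full name and of its position within that name. Assuming, purely for illustration, that these two pieces of information are encoded as the tags `B` (begin), `M` (middle), `E` (end) and `O` (outside), the extraction of full names from a tagged sequence can be sketched as below; the tag set and the function name are not taken from the source, and the tagging model itself is out of scope here.

```python
def extract_full_names(tokens, labels):
    """Join consecutive B, M..., E tokens into enterprise full names.

    tokens: segmented words of the public opinion text
    labels: one tag per token from a sequence labeller (assumed B/M/E/O scheme)
    """
    names, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            current = [token]            # start a new candidate name
        elif label == "M" and current:
            current.append(token)        # continue the candidate
        elif label == "E" and current:
            current.append(token)        # close the candidate and emit it
            names.append("".join(current))
            current = []
        else:
            current = []                 # "O" or an ill-formed sequence resets
    return names


tokens = ["据悉", "北京", "百度", "网讯", "科技", "有限公司", "发布", "新品"]
labels = ["O", "B", "M", "M", "M", "E", "O", "O"]
names = extract_full_names(tokens, labels)
```

Candidates that never reach an `E` tag are discarded in this sketch, so a truncated or inconsistent tag sequence does not produce a partial company name.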
To implement the above embodiments, the present invention also proposes a computer equipment comprising a processor and a memory, wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the business entity recognition method described in the foregoing embodiments.
Figure 10 is a schematic structural diagram of the computer equipment proposed by one embodiment of the present invention, showing a block diagram of an exemplary computer equipment 90 suitable for implementing embodiments of the present invention. The computer equipment 90 shown in Figure 10 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Figure 10, computer equipment 90 takes the form of a general-purpose computing device. The components of computer equipment 90 may include, but are not limited to: one or more processors or processing units 906, a system memory 910, and a bus 908 connecting the different system components (including system memory 910 and processing unit 906).
Bus 908 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer equipment 90 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by computer equipment 90, including volatile and non-volatile media and removable and non-removable media.
System memory 910 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 911 and/or cache memory 912. Computer equipment 90 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 913 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 10, commonly referred to as a "hard disk drive"). Although not shown in Figure 10, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 908 through one or more data media interfaces. System memory 910 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
A program/utility 914 having a set of (at least one) program modules 9140 may be stored, for example, in system memory 910. Such program modules 9140 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 9140 generally perform the functions and/or methods of the embodiments described in the present invention.
Computer equipment 90 may also communicate with one or more external devices 10 (such as a keyboard, a pointing device, display 100, etc.), with one or more devices that enable a user to interact with computer equipment 90, and/or with any device (such as a network card, a modem, etc.) that enables computer equipment 90 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 902. Computer equipment 90 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 900. As shown in Figure 10, network adapter 900 communicates with the other modules of computer equipment 90 through bus 908. It should be understood that, although not shown in Figure 10, other hardware and/or software modules may be used in conjunction with computer equipment 90, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Processing unit 906 performs various functional applications and data processing by running programs stored in system memory 910, for example implementing the business entity recognition method mentioned in the foregoing embodiments.
To implement the above embodiments, the present invention also proposes a computer program product; when the instructions in the computer program product are executed by a processor, the business entity recognition method described in the foregoing embodiments is implemented.
To implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the business entity recognition method described in the foregoing embodiments is implemented.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (16)

1. A method for recognizing a business entity, characterized by comprising:
collecting public opinion text and performing word segmentation on it to obtain a word segmentation result;
performing enterprise full-name recognition according to the word segmentation result to obtain a first enterprise full name contained in the public opinion text;
performing enterprise abbreviation recognition according to the word segmentation result to obtain a first enterprise abbreviation contained in the public opinion text;
determining the recognized first enterprise full name and first enterprise abbreviation as names of business entities.
2. The method according to claim 1, characterized in that performing enterprise full-name recognition according to the word segmentation result to obtain the first enterprise full name contained in the public opinion text comprises:
labeling the segmented words in the word segmentation result according to their semantics;
inputting the segmented words and their label information, in sequence, into a conditional random field model for enterprise full-name recognition to obtain a recognition result for each segmented word, wherein the recognition result includes first information and second information, the first information indicating whether the segmented word is a word in the first enterprise full name, and the second information being the position information of the segmented word within the first enterprise full name;
if the first information indicates that the first enterprise full name is present, extracting the first enterprise full name from the segmented words according to the position information.
3. The method according to claim 2, characterized in that an observation window of preset length is provided in the conditional random field model, and inputting the segmented words and their label information, in sequence, into the conditional random field model for enterprise full-name recognition to obtain the recognition result for each segmented word comprises:
while the conditional random field model is recognizing a segmented word, determining through the observation window the first segmented words that have a contextual relationship with that segmented word, and obtaining the first segmented words and their label information;
performing enterprise full-name recognition based on the segmented word and its label information together with the first segmented words and their label information, to obtain the recognition result for the segmented word.
4. The method according to claim 2, characterized in that, after extracting the first enterprise full name from the segmented words, the method further comprises:
querying the first enterprise full name in an enterprise full-name dictionary library by using an enterprise full-name dictionary tree built in advance;
if the first enterprise full name is not found in the enterprise full-name dictionary library, adding the first enterprise full name to the enterprise full-name dictionary library so as to update the enterprise full-name dictionary library.
5. The method according to claim 4, characterized in that, after updating the enterprise full-name dictionary library, the method further comprises:
updating the enterprise full-name dictionary tree by using the first enterprise full name.
6. The method according to claim 1, characterized in that performing enterprise abbreviation recognition according to the word segmentation result to obtain the first enterprise abbreviation contained in the public opinion text comprises:
matching, by using an enterprise abbreviation dictionary library built in advance, a second enterprise abbreviation contained in the word segmentation result, wherein the second enterprise abbreviation is an enterprise abbreviation present in the enterprise abbreviation dictionary library built in advance, and the enterprise abbreviation dictionary library is determined according to the click logs of historical search terms and/or the trade names of enterprises;
obtaining, from the public opinion text, the context sentences containing the second enterprise abbreviation, and building a first term vector from the context sentences;
calculating a first similarity between the first term vector and a second term vector of the enterprise attribute words of the second enterprise abbreviation;
if the first similarity exceeds a preset threshold, taking the second enterprise abbreviation as the first enterprise abbreviation in the public opinion text.
7. The method according to claim 6, characterized in that, before matching the second enterprise abbreviation contained in the word segmentation result by using the enterprise abbreviation dictionary library built in advance, the method further comprises:
obtaining the click logs of historical search terms;
extracting from the click logs the first historical search terms that have a linking relationship with the uniform resource locator (URL) of an enterprise;
obtaining the common prefix substring among the first historical search terms corresponding to a specified URL, wherein the specified URL is any one of the URLs of all enterprises;
adding the first historical search terms and/or the common prefix substring to the enterprise abbreviation dictionary library as enterprise abbreviations.
8. The method according to claim 7, characterized in that adding the first historical search terms and/or the common prefix substring to the enterprise abbreviation dictionary library as enterprise abbreviations comprises:
obtaining the weight of the first historical search term or the common prefix substring for the specified URL;
screening the first historical search terms or the common prefix substring according to the weight;
adding the first historical search terms and/or the common prefix substring retained after screening to the enterprise abbreviation dictionary library as enterprise abbreviations.
9. The method according to claim 8, characterized in that obtaining the weight of the first historical search term or the common prefix substring for the specified URL comprises:
obtaining a first conditional probability that, given the specified URL, the link to the specified URL was reached by clicking through the first historical search term or the common prefix substring;
obtaining a second conditional probability that, given a click on the first historical search term or the common prefix substring, the user jumps to the specified URL;
obtaining the weight according to the first conditional probability and the second conditional probability.
10. The method according to claim 7 or 8, characterized by further comprising:
obtaining the trade name of an enterprise and adding the trade name to the enterprise abbreviation dictionary library as an enterprise abbreviation.
11. The method according to claim 10, characterized in that adding the trade name to the enterprise abbreviation dictionary library as an enterprise abbreviation comprises:
obtaining the URL of the enterprise corresponding to the trade name and the click proportion of that URL;
filtering all trade names according to the click proportion, and adding the trade names retained after filtering to the enterprise abbreviation dictionary library as enterprise abbreviations.
12. The method according to claim 6, characterized in that, before matching the second enterprise abbreviation contained in the word segmentation result by using the enterprise abbreviation dictionary library built in advance, the method further comprises:
for an enterprise abbreviation in the enterprise abbreviation dictionary library, mining the corresponding enterprise type word from the enterprise abbreviation;
taking the enterprise abbreviation as a seed word and mining, from historical public opinion text, the historical click search terms that contain the seed word;
generating a third term vector from the words in the historical click search terms other than the seed word;
calculating a second similarity between the third term vector and each fourth term vector built in advance from an industry type word;
selecting the industry type word corresponding to the highest second similarity as the industry type word of the enterprise abbreviation;
forming the enterprise attribute words of the enterprise abbreviation from the enterprise type word and the industry type word;
forming the second term vector from the enterprise attribute words.
13. A device for recognizing a business entity, characterized by comprising:
a word segmentation module, configured to collect public opinion text and perform word segmentation on it to obtain a word segmentation result;
a full-name recognition module, configured to perform enterprise full-name recognition according to the word segmentation result to obtain a first enterprise full name contained in the public opinion text;
an abbreviation recognition module, configured to perform enterprise abbreviation recognition according to the word segmentation result to obtain a first enterprise abbreviation contained in the public opinion text;
a determining module, configured to determine the recognized first enterprise full name and first enterprise abbreviation as names of business entities.
14. A computer equipment, characterized by comprising a processor and a memory;
wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the business entity recognition method according to any one of claims 1-12.
15. A computer program product, characterized in that, when the instructions in the computer program product are executed by a processor, the business entity recognition method according to any one of claims 1-12 is implemented.
16. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the business entity recognition method according to any one of claims 1-12 is implemented.
CN201810121667.5A 2018-02-07 2018-02-07 Enterprise entity identification method and device, computer equipment and storage medium Active CN108460014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810121667.5A CN108460014B (en) 2018-02-07 2018-02-07 Enterprise entity identification method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108460014A true CN108460014A (en) 2018-08-28
CN108460014B CN108460014B (en) 2022-02-25

Family

ID=63239855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810121667.5A Active CN108460014B (en) 2018-02-07 2018-02-07 Enterprise entity identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108460014B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060069693A1 (en) * 2004-09-14 2006-03-30 International Business Machines Corporation System and method for using demographic organization and segmentation to manage large scale projects
CN104731771A (en) * 2015-03-27 2015-06-24 大连理工大学 Term vector-based abbreviation ambiguity elimination system and method
CN105574092A (en) * 2015-12-10 2016-05-11 百度在线网络技术(北京)有限公司 Information mining method and device
CN105574111A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Enterprise entity authentication method based on enterprise attribute library
CN105630884A (en) * 2015-12-18 2016-06-01 中国科学院信息工程研究所 Geographic position discovery method for microblog hot event
CN105718586A (en) * 2016-01-26 2016-06-29 中国人民解放军国防科学技术大学 Word division method and device
CN105740353A (en) * 2016-01-26 2016-07-06 中国人民解放军国防科学技术大学 Calculation method and system for relevance degree of individual share and article
CN105975555A (en) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 Bidirectional recursive neural network-based enterprise abbreviation extraction method
CN105975491A (en) * 2016-04-26 2016-09-28 重庆誉存企业信用管理有限公司 Enterprise news analysis method and system
CN106250524A (en) * 2016-08-04 2016-12-21 浪潮软件集团有限公司 Organization name extraction method and device based on semantic information
US20170060919A1 (en) * 2015-08-31 2017-03-02 Salesforce.Com, Inc. Transforming columns from source files to target files
CN106991085A (en) * 2017-04-01 2017-07-28 中国工商银行股份有限公司 The abbreviation generation method and device of a kind of entity
CN107193959A (en) * 2017-05-24 2017-09-22 南京大学 A kind of business entity's sorting technique towards plain text
CN107357779A (en) * 2017-06-27 2017-11-17 北京神州泰岳软件股份有限公司 A kind of method and device for obtaining organization names
CN107423285A (en) * 2017-06-23 2017-12-01 广州市万隆证券咨询顾问有限公司 A kind of company's abbreviation recognition methods and system based on text rule
CN107463935A (en) * 2016-06-06 2017-12-12 工业和信息化部电信研究院 Application class methods and applications sorter

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
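SEKINE SATOSHI et al.: "Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy", 《LREC》 *
TOMAS MIKOLOV et al.: "Context dependent recurrent neural network language model", 《2012 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP》 *
伍大勇 (Wu Dayong): "Research on Techniques for Named Entity Query Processing in Search Engines", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
史杰 (Shi Jie): "Semantics-Based Optimization and Improvement of Full-Text Retrieval", 《China Master's Theses Full-text Database, Information Science and Technology》 *
孙丽萍 (Sun Liping) et al.: "Enterprise Abbreviation Prediction Based on Composition Patterns and Conditional Random Fields", 《Journal of Computer Applications》 *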

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635276A (en) * 2018-11-12 2019-04-16 厦门市美亚柏科信息股份有限公司 A kind of information matching method and terminal
CN109800332A (en) * 2018-12-04 2019-05-24 北京明略软件系统有限公司 Method, apparatus, computer storage medium and the terminal of processing field name
CN111353308A (en) * 2018-12-20 2020-06-30 北京深知无限人工智能研究院有限公司 Named entity recognition method, device, server and storage medium
CN109726398A (en) * 2018-12-27 2019-05-07 北京奇安信科技有限公司 A kind of Entity recognition and determined property method, system, equipment and medium
CN109726398B (en) * 2018-12-27 2023-07-07 奇安信科技集团股份有限公司 Entity identification and attribute judgment method, system, equipment and medium
WO2020133291A1 (en) * 2018-12-28 2020-07-02 深圳市优必选科技有限公司 Text entity recognition method and apparatus, computer device, and storage medium
CN110188357A (en) * 2019-05-31 2019-08-30 阿里巴巴集团控股有限公司 The industry recognition methods of object and device
CN110188357B (en) * 2019-05-31 2023-06-20 创新先进技术有限公司 Industry identification method and device for objects
CN110381115A (en) * 2019-06-14 2019-10-25 平安科技(深圳)有限公司 Information-pushing method, device, computer readable storage medium and computer equipment
CN110381115B (en) * 2019-06-14 2022-03-11 平安科技(深圳)有限公司 Information pushing method and device, computer readable storage medium and computer equipment
CN110728150B (en) * 2019-10-08 2023-06-20 支付宝(杭州)信息技术有限公司 Named entity screening method, named entity screening device, named entity screening equipment and readable medium
CN110728150A (en) * 2019-10-08 2020-01-24 支付宝(杭州)信息技术有限公司 Named entity screening method, device, equipment and readable medium
CN110738055A (en) * 2019-10-23 2020-01-31 北京字节跳动网络技术有限公司 Text entity identification method, text entity identification equipment and storage medium
CN111104791A (en) * 2019-11-14 2020-05-05 北京金堤科技有限公司 Industry information acquisition method and apparatus, electronic device and medium
CN111104791B (en) * 2019-11-14 2024-02-20 北京金堤科技有限公司 Industry information acquisition method and device, electronic equipment and medium
CN111191103A (en) * 2019-12-30 2020-05-22 河南拓普计算机网络工程有限公司 Method, device and storage medium for identifying and analyzing enterprise subject information from internet
CN111191103B (en) * 2019-12-30 2021-08-24 河南拓普计算机网络工程有限公司 Method, device and storage medium for identifying and analyzing enterprise subject information from internet
CN111177391A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Method and device for acquiring social public opinion volume and computer-readable storage medium
CN111177391B (en) * 2019-12-31 2023-08-08 北京明略软件系统有限公司 Method and device for acquiring social public opinion volume and computer readable storage medium
CN111339319A (en) * 2020-03-02 2020-06-26 北京百度网讯科技有限公司 Disambiguation method and device for enterprise name, electronic equipment and storage medium
CN111339319B (en) * 2020-03-02 2023-08-04 北京百度网讯科技有限公司 Enterprise name disambiguation method and device, electronic equipment and storage medium
CN111597304B (en) * 2020-05-15 2023-04-07 上海财经大学 Secondary matching method for accurately identifying Chinese enterprise name entity
CN111597304A (en) * 2020-05-15 2020-08-28 上海财经大学 Secondary matching method for accurately identifying Chinese enterprise name entity
CN111651987A (en) * 2020-05-18 2020-09-11 北京金堤科技有限公司 Identity distinguishing method and device, computer readable storage medium and electronic equipment
CN111651987B (en) * 2020-05-18 2023-10-20 北京金堤科技有限公司 Identity discrimination method and device, computer readable storage medium and electronic equipment
CN111814479B (en) * 2020-07-09 2023-08-25 上海明略人工智能(集团)有限公司 Method and device for generating enterprise abbreviations and training model thereof
CN111814479A (en) * 2020-07-09 2020-10-23 上海明略人工智能(集团)有限公司 Enterprise short form generation and model training method and device
CN111899090A (en) * 2020-07-14 2020-11-06 苏宁金融科技(南京)有限公司 Enterprise associated risk early warning method and system
CN112015865B (en) * 2020-08-26 2023-09-26 京北方信息技术股份有限公司 Word segmentation-based full scale matching search method, device, equipment and storage medium
CN112015865A (en) * 2020-08-26 2020-12-01 京北方信息技术股份有限公司 Full-name matching search method, device and equipment based on word segmentation and storage medium
WO2021159757A1 (en) * 2020-09-09 2021-08-19 平安科技(深圳)有限公司 Method and apparatus for entity recognition in abbreviated data based on model, and computer
CN112613299A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Method and device for constructing enterprise synonym library and electronic equipment
CN113065343A (en) * 2021-03-25 2021-07-02 天津大学 Enterprise research and development resource information modeling method based on semantics
CN113177412A (en) * 2021-04-05 2021-07-27 北京智慧星光信息技术有限公司 Named entity identification method and system based on bert, electronic equipment and storage medium
CN113190682A (en) * 2021-06-30 2021-07-30 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment
CN113190682B (en) * 2021-06-30 2021-09-28 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment
CN113626600A (en) * 2021-08-18 2021-11-09 企查查科技有限公司 Text processing method and device, computer equipment and storage medium
CN113626600B (en) * 2021-08-18 2024-03-19 企查查科技股份有限公司 Text processing method, device, computer equipment and storage medium
CN115438145A (en) * 2022-04-13 2022-12-06 盐城金堤科技有限公司 Method and device for adding enterprise detail internal chain
CN115438145B (en) * 2022-04-13 2024-05-14 盐城天眼察微科技有限公司 Method and device for adding enterprise detail inner links
CN115618824A (en) * 2022-10-31 2023-01-17 上海苍阙信息科技有限公司 Data set labeling method and device, electronic equipment and medium
CN115618824B (en) * 2022-10-31 2023-10-27 上海苍阙信息科技有限公司 Data set labeling method and device, electronic equipment and medium
CN116522911B (en) * 2023-06-29 2023-10-03 恒生电子股份有限公司 Entity alignment method and device
CN116522911A (en) * 2023-06-29 2023-08-01 恒生电子股份有限公司 Entity alignment method and device
CN116976320A (en) * 2023-09-22 2023-10-31 湖南财信数字科技有限公司 Mechanism short extraction method, device, computer equipment and storage medium
CN116976320B (en) * 2023-09-22 2023-12-15 湖南财信数字科技有限公司 Mechanism short extraction method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108460014B (en) 2022-02-25

Similar Documents

Publication Publication Date Title
CN108460014A (en) Recognition methods, device, computer equipment and the storage medium of business entity
Liu et al. A survey of CRF algorithm based knowledge extraction of elementary mathematics in Chinese
US11216504B2 (en) Document recommendation method and device based on semantic tag
CN108170773A (en) Media event method for digging, device, computer equipment and storage medium
CN104252533B (en) Searching method and searcher
US7788099B2 (en) Method and apparatus for query expansion based on multimodal cross-vocabulary mapping
CN107301227A (en) Search information analysis method and device based on artificial intelligence
CN105243129A (en) Commodity property characteristic word clustering method
EP3933657A1 (en) Conference minutes generation method and apparatus, electronic device, and computer-readable storage medium
CN111291566B (en) Event main body recognition method, device and storage medium
CN110046350A (en) Grammatical bloopers recognition methods, device, computer equipment and storage medium
CN111046656B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN108681537A (en) Chinese entity linking method based on neural network and word vector
JP2005526317A (en) Method and system for automatically searching a concept hierarchy from a document corpus
CN109325201A (en) Generation method, device, equipment and the storage medium of entity relationship data
CN109710759A (en) Text dividing method, device, computer equipment and readable storage medium storing program for executing
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
CN113220835B (en) Text information processing method, device, electronic equipment and storage medium
CN109815500A (en) Management method, device, computer equipment and the storage medium of unstructured official document
US20230111911A1 (en) Generation and use of content briefs for network content authoring
CN109710710A (en) The event method for digging and its device of point of interest
CN115248839A (en) Knowledge system-based long text retrieval method and device
Repke et al. Extraction and representation of financial entities from text
CN114141384A (en) Method, apparatus and medium for retrieving medical data
CN111814481A (en) Shopping intention identification method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant