CN107247707B - Enterprise association relation information extraction method and device based on completion strategy - Google Patents

Publication number
CN107247707B
Authority
CN
China
Prior art keywords
clause
name
enterprise
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710502217.6A
Other languages
Chinese (zh)
Other versions
CN107247707A (en)
Inventor
李德彦
席丽娜
晋耀红
Original Assignee
Dingfu Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co Ltd
Priority to CN201710502217.6A
Publication of CN107247707A
Application granted
Publication of CN107247707B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying

Abstract

The invention provides a method and a device for extracting enterprise association relation information based on a completion strategy, and belongs to the technical field of computers. The method comprises the following steps: acquiring a text to be detected; splitting the text to be detected to obtain at least one clause; determining, among the at least one clause, the clauses containing a preset associated keyword; for each clause containing an associated keyword, determining the sentence pattern type of the clause, and determining the enterprise association relation information contained in the clause according to the sentence pattern type of the clause, the part of speech of the associated keyword and the position of the associated keyword in the clause; and if the number of characters of a target enterprise name in the enterprise association relation information, not counting the enterprise name suffix it contains, is less than or equal to a first preset value, completing the target enterprise name based on the characters before the enterprise name suffix in the clause, and updating the enterprise association relation information with the completed target enterprise name. The method and the device can improve the efficiency of extracting enterprise association relation information.

Description

Enterprise association relation information extraction method and device based on completion strategy
Technical Field
The invention relates to the technical field of computers, and in particular to a method and a device for extracting enterprise association relation information based on a completion strategy.
Background
In the current financial market, enterprise managers need to grasp the association relations between enterprises quickly and accurately when making strategic decisions. With knowledge of the association relations among enterprises, enterprise managers can avoid investment risks as far as possible and make more reasonable decisions.
In the prior art, economic news reports are generally searched for manually on the network in order to determine the association relations among enterprises. As the number of enterprises grows, these association relations become complicated, manual searching takes a large amount of time, and efficiency is low.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for extracting enterprise association relationship information based on a completion policy. The technical scheme is as follows:
in a first aspect, a completion strategy-based method for extracting enterprise association relation information is provided, where the method includes:
acquiring a text to be detected;
splitting the text to be detected to obtain at least one clause;
determining a clause containing a preset associated keyword in the at least one clause;
determining a sentence pattern type of each clause containing the associated keywords, and determining enterprise association relation information contained in the clause according to the sentence pattern type of the clause, the part of speech of the associated keywords and the positions of the associated keywords in the clause;
and if the number of characters of a target enterprise name in the enterprise association relation information, not counting the enterprise name suffix it contains, is less than or equal to a first preset value, completing the target enterprise name based on the characters before the enterprise name suffix in the clause, and updating the enterprise association relation information with the completed target enterprise name.
Optionally, the completing the target enterprise name based on the characters before the enterprise name suffix in the clause includes:
in the clause, if a place name exists before the enterprise name suffix, intercepting the character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name.
Optionally, the intercepting, if a place name exists before the enterprise name suffix in the clause, the character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name includes:
in the clause, if a place name exists before the enterprise name suffix, determining whether the place name is enclosed in brackets;
if the place name is not enclosed in brackets, intercepting the character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name; and if the place name is enclosed in brackets, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, the completing the target enterprise name based on the characters before the enterprise name suffix in the clause includes:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, the method further includes:
for any clause that contains the associated keyword, from which no enterprise association relation information has been determined, and whose number of characters is greater than a second preset value, matching the clause against the enterprise names contained in a preset enterprise name lexicon;
and if at least two enterprise names contained in the preset enterprise name lexicon are matched, determining the enterprise association relation information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause and the position of the associated keyword in the clause.
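The completion condition and the unbracketed place-name case described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `complete_name`, the place-name list, the English sample strings, and the choice of the place name nearest to the suffix when several occur are all hypothetical and are not specified by the method.

```python
def complete_name(clause, target_name, suffix, place_names, first_preset_value):
    """Complete a short target enterprise name using a preceding place name.

    Covers only the case where the place name is not enclosed in brackets.
    All parameter names are hypothetical.
    """
    if not suffix or not target_name.endswith(suffix):
        return target_name
    # Characters of the target name, not counting the enterprise name suffix.
    stem = target_name[: -len(suffix)].strip()
    if len(stem) > first_preset_value:
        return target_name  # the name is long enough, no completion needed
    name_start = clause.find(target_name)
    if name_start < 0:
        return target_name
    suffix_end = name_start + len(target_name)
    before_suffix = clause[: suffix_end - len(suffix)]
    # Assumption: if several place names occur, use the one nearest the suffix.
    starts = [before_suffix.rfind(p) for p in place_names if p in before_suffix]
    if not starts:
        return target_name
    # Intercept from the start of the place name to the end of the suffix.
    return clause[max(starts):suffix_end]


clause = "Beijing Acme Tech Co., Ltd. acquired Foo Inc."
print(complete_name(clause, "Tech Co., Ltd.", "Co., Ltd.", ["Beijing", "Shanghai"], 4))
```

In this sketch a name whose stem exceeds the first preset value, or a clause with no place name before the suffix, is returned unchanged.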
In a second aspect, an apparatus for extracting enterprise association relation information based on a completion policy is provided, where the apparatus includes:
the acquisition module is used for acquiring the text to be detected;
the splitting module is used for splitting the text to be detected to obtain at least one clause;
the determining module is used for determining a clause containing a preset associated keyword in the at least one clause;
the determining module is further used for determining, for each clause containing the associated keyword, the sentence pattern type of the clause, and determining the enterprise association relation information contained in the clause according to the sentence pattern type of the clause, the part of speech of the associated keyword and the position of the associated keyword in the clause;
and a completion module, configured to, if the number of characters of the target enterprise name excluding the included enterprise name suffix in the enterprise association relationship information is less than or equal to a first preset value, complete the target enterprise name based on the characters before the enterprise name suffix in the clause, and update the completed target enterprise name into the enterprise association relationship information.
Optionally, the completion module is configured to:
in the clause, if a place name exists before the enterprise name suffix, intercept the character string from the starting position of the place name to the ending position of the enterprise name suffix, and determine the character string as the completed target enterprise name.
Optionally, the completion module includes a determining submodule and a completion submodule, where:
the determining submodule is used for determining, if a place name exists before the enterprise name suffix in the clause, whether the place name is enclosed in brackets;
the completion submodule is used for intercepting, if the place name is not enclosed in brackets, the character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name; and, if the place name is enclosed in brackets, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, the completion module is configured to:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, add the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, add the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, the apparatus further comprises:
the matching module is used for matching any clause that contains the associated keyword, from which no enterprise association relation information has been determined, and whose number of characters is greater than a second preset value, against the enterprise names contained in a preset enterprise name lexicon;
the determining module is further configured to determine, if at least two enterprise names contained in the preset enterprise name lexicon are matched, the enterprise association relation information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause, and the position of the associated keyword in the clause.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, a text to be detected is acquired and split to obtain at least one clause, and the clauses containing a preset associated keyword are determined among the at least one clause. For each clause containing an associated keyword, the sentence pattern type of the clause is determined, and the enterprise association relation information contained in the clause is determined according to the sentence pattern type of the clause, the part of speech of the associated keyword and the position of the associated keyword in the clause. If the number of characters of a target enterprise name in the enterprise association relation information, not counting the enterprise name suffix, is less than or equal to a first preset value, the target enterprise name is completed based on the characters before the enterprise name suffix in the clause, and the enterprise association relation information is updated with the completed target enterprise name. Therefore, the enterprise association relation information contained in the text to be detected can be acquired directly without manual checking, which improves the efficiency of extracting enterprise association relation information.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an enterprise association relationship information extraction method based on a completion policy according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating display of enterprise association relationship information according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an enterprise association relationship information extraction apparatus based on a completion policy according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an enterprise association relationship information extraction apparatus based on a completion policy according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an enterprise association relationship information extraction apparatus based on a completion policy according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a method for extracting enterprise association relation information based on a completion strategy, and the method may be executed by a server. The server is a background server of an application program for identifying enterprise association relations. The server may be provided with a processor, a memory, a transceiver and the like: the processor may be used for the processing involved in extracting the enterprise association relation information, the memory may be used for storing the data required and generated during the extraction, and the transceiver may be used for receiving and sending messages.
As shown in fig. 1, the processing flow of the method may include the following steps:
Step 101, acquiring a text to be detected.
In implementation, a technician prestores a plurality of URLs, such as the URLs of news media websites, in the server. The server can access these URLs at preset time intervals to determine whether the websites contain new announcements; if so, the server copies the new announcements, and the copied announcements can be called texts to be detected.
In addition, a technician may also operate the terminal to send the text to be detected to the server, which is not limited in the embodiment of the present invention.
Step 102, splitting the text to be detected to obtain at least one clause.
In implementation, after the server obtains the text to be detected, the server may search for the preset punctuation marks contained in the text, starting from the start position of the text, and determine the characters between two adjacent preset punctuation marks as a clause, so as to obtain at least one clause. The preset punctuation marks may be separators between sentences, and may include a period, a comma, an exclamation mark, a semicolon, and the like.
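The splitting in step 102 can be sketched in a few lines. This is a minimal sketch, not the implementation of the embodiment: the name `split_clauses` and the exact punctuation set (full-width and ASCII forms of the period, comma, exclamation mark and semicolon) are assumptions made for illustration.

```python
import re

# Hypothetical preset punctuation marks: full-width and ASCII variants.
PRESET_PUNCTUATION = "。，！；.,!;"


def split_clauses(text, punctuation=PRESET_PUNCTUATION):
    """Split text at the preset punctuation marks and drop empty pieces."""
    pattern = "[" + re.escape(punctuation) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


print(split_clauses("Company A acquired Company B, and the deal closed; details follow!"))
```

Treating the ASCII period as a separator would also cut abbreviations such as "Ltd." in English text, so the punctuation set would need to be chosen to suit the language of the text to be detected.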
Step 103, determining, among the at least one clause, the clauses containing a preset associated keyword.
A technician may pre-store a lexicon of associated keywords in the server. The lexicon includes a plurality of preset associated keywords, which are words that can represent a relation between two objects, for example that company A purchases company B or that company A is a subsidiary of company C. The associated keywords may be "investment", "stock holder", "purchase", "parent company", and the like.
In implementation, after the server obtains a plurality of clauses, it can perform word segmentation on each clause according to a preset word segmentation rule and match the words obtained by segmentation against the preset associated keywords. If the words obtained by segmenting a certain clause include any associated keyword, the clause is determined as a clause containing a preset associated keyword. For example, a certain clause is "# Limited Liability Company invested in ## Limited Liability Company", the words obtained after word segmentation are "# Limited Liability Company", "invested" and "## Limited Liability Company", and the preset associated keywords include "invested", so the clause is a clause containing a preset associated keyword.
It should be noted that the preset word segmentation rule mentioned above may be any word segmentation rule, such as a CRF (Conditional Random Field) model, and the embodiment of the present invention is not limited thereto.
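The keyword filtering of step 103 can be illustrated as follows, assuming the clauses have already been segmented into word lists by some word segmentation rule (the segmentation itself, for example with a CRF model, is outside this sketch). The keyword set and the function name `clauses_with_keywords` are hypothetical.

```python
# Hypothetical lexicon of preset associated keywords.
ASSOCIATED_KEYWORDS = {"invested", "investment", "stock holder", "purchase", "parent company"}


def clauses_with_keywords(segmented_clauses, keywords=ASSOCIATED_KEYWORDS):
    """Keep the segmented clauses that contain at least one associated keyword.

    Returns a list of (words, matched_keywords) pairs.
    """
    hits = []
    for words in segmented_clauses:
        matched = [w for w in words if w in keywords]
        if matched:
            hits.append((words, matched))
    return hits


segmented = [
    ["# Limited Liability Company", "invested", "## Limited Liability Company"],
    ["the", "weather", "is", "fine"],
]
print(clauses_with_keywords(segmented))
```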
Step 104, for each clause containing an associated keyword, determining the sentence pattern type of the clause, and determining the enterprise association relation information contained in the clause according to the sentence pattern type of the clause, the part of speech of the associated keyword and the position of the associated keyword in the clause.
In the pre-stored lexicon of associated keywords, each associated keyword corresponds to its part of speech. The parts of speech of the associated keywords include verbs, nouns and the like; for example, the verbs include "investment", "acquisition" and the like, and the nouns include "stock holder", "subsidiary", "parent company", "controlling shareholder" and the like. The enterprise association relation information describes the association relations among a plurality of enterprises constructed based on the associated keywords. For example, if the associated keyword is "acquisition", the association relation information takes the form implementation object - acquisition - implemented object; if the associated keyword is "parent company", it takes the form implementation object - parent company - implemented object, and so on.
In implementation, for a certain clause containing an associated keyword, the server may first determine the sentence pattern type of the clause, and then determine the enterprise association relation information contained in the clause by using the sentence pattern type of the clause, the part of speech of the associated keyword, and the position of the associated keyword in the clause.
Optionally, the server may use the sentence pattern templates of the associated keywords to determine the sentence pattern type of a clause containing a preset associated keyword, and the corresponding process may be as follows:
determining the sentence pattern type of the clause according to the preset correspondence among associated keywords, sentence pattern templates of the associated keywords, and sentence pattern types.
In implementation, the correspondence among the associated keywords, the sentence pattern templates of the associated keywords and the sentence pattern types may be preset by a technician and stored in the server. After determining the associated keywords, the technician may design sentence pattern templates for the different associated keywords, such as "(to | toward | for) … … (investment)", "(by) … … (investment)" and "… … (investment) … …" for the associated keyword "investment", and "… … (parent company)" for the associated keyword "parent company". The technician also defines several sentence pattern types, such as a passive sentence pattern type, a hidden relation sentence pattern type and an obvious relation sentence pattern type, and assigns a sentence pattern type to each sentence pattern template of the associated keywords. For example, for the associated keyword "investment", the template "(to | toward | for) … … (investment)" is of the hidden relation sentence pattern type, the template "(by) … … (investment)" is of the passive sentence pattern type, and the template "… … (investment) … …" is of the obvious relation sentence pattern type. These contents are then stored in the correspondence among the associated keywords, the sentence pattern templates of the associated keywords and the sentence pattern types, as shown in Table 1.
Table 1
[Table 1 appears as an image in the original publication: Figure BDA0001333956390000071]
After determining a clause containing a preset associated keyword, the server may match the clause against the sentence pattern templates of that associated keyword, and determine the sentence pattern type corresponding to the matched template as the sentence pattern type of the clause. For example, the clause is "# Limited Liability Company is purchased by ## Limited Liability Company"; after word segmentation the words are "# Limited Liability Company", "by", "## Limited Liability Company" and "purchased"; the associated keyword contained in the clause is "purchase", and among the sentence pattern templates corresponding to "purchase", the template "(by) … … (purchase)" is matched, so the sentence pattern type of the clause is the passive sentence pattern type.
It should be noted that the words in a sentence pattern template other than the associated keyword may be referred to as sentence type words; for example, if the sentence pattern template is "(by) … … (investment)", the sentence type word is "by".
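One way to picture the template lookup is as a small table of (associated keyword, sentence type words, sentence pattern type) rows that is scanned in order. The sketch below is only an assumption about how such a correspondence could be encoded: the table contents, the English glosses of the sentence type words, and the function name `sentence_pattern_type` are hypothetical, and the matching rule (a sentence type word appearing somewhere before the keyword) is a simplification of real template matching.

```python
# Hypothetical correspondence: (associated keyword, sentence type words, type).
TEMPLATES = [
    ("investment", ("by",), "passive"),
    ("investment", ("to", "toward", "for"), "hidden relation"),
    ("investment", (), "obvious relation"),
]


def sentence_pattern_type(words, keyword, templates=TEMPLATES):
    """Return the sentence pattern type of a segmented clause, or None."""
    if keyword not in words:
        return None
    k = words.index(keyword)
    for template_keyword, type_words, pattern_type in templates:
        if template_keyword != keyword:
            continue
        if type_words:
            # "(type word) ... (keyword)": a type word precedes the keyword.
            if any(w in words[:k] for w in type_words):
                return pattern_type
        elif 0 < k < len(words) - 1:
            # "... (keyword) ...": the keyword has words on both sides.
            return pattern_type
    return None


print(sentence_pattern_type(["Company A", "by", "Company B", "investment"], "investment"))
```

Listing the templates with sentence type words before the bare "… … (keyword) … …" template makes the more specific patterns win when several could apply.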
Optionally, if the clause includes preset parallel words, the processing of step 104 may be as follows:
if the clause contains preset parallel words, replacing the preset parallel words contained in the clause with a preset character, and determining the enterprise association relation information contained in the replaced clause based on the sentence pattern type of the replaced clause, the position of the associated keyword in the replaced clause and the part of speech of the associated keyword contained in the replaced clause.
The preset parallel words can be preset by a technician and stored in the server. They are words indicating that a plurality of objects are in a parallel relation, such as "joint", "association", "and", and the like. The preset character is also preset by the technician and stored in the server, such as "&".
In implementation, based on the words obtained after word segmentation of a clause, the server checks whether the clause contains a preset parallel word, and if so, replaces each preset parallel word with the preset character. For example, if the clause is "company A and company B jointly develop product O", the words "and" and "jointly" in the clause can be replaced with the preset character "&", and the clause after replacement is "company A & company B & develop product O". The server then determines the enterprise association relation information contained in the replaced clause by using the sentence pattern type of the clause, the part of speech of the associated keyword and the position of the associated keyword in the replaced clause.
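The replacement of parallel words can be sketched on a segmented clause as below. The word set, the default preset character and the function name `replace_parallel_words` are assumptions for illustration only.

```python
# Hypothetical preset parallel words and preset character.
PARALLEL_WORDS = {"and", "joint", "jointly", "association"}


def replace_parallel_words(words, parallel_words=PARALLEL_WORDS, preset_char="&"):
    """Replace every preset parallel word in a segmented clause."""
    return [preset_char if w in parallel_words else w for w in words]


words = ["company A", "and", "company B", "jointly", "develop", "product O"]
print(" ".join(replace_parallel_words(words)))
```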
Optionally, if the sentence pattern type of the clause is a passive sentence pattern type or a hidden relation sentence pattern type, the clause may be adjusted first, and then enterprise association relation information included in the adjusted clause is determined. The corresponding processing of step 104 may be as follows:
if the sentence pattern type of the clause is a passive sentence pattern type or a hidden relation sentence pattern type, adjusting the associated keyword contained in the clause to a first position, and determining the enterprise association relation information contained in the adjusted clause based on the position of the associated keyword in the adjusted clause and the part of speech of the associated keyword contained in the adjusted clause. The first position is the position immediately before the sentence type word of the sentence pattern template that the clause contains; for example, if the clause is "# Limited Liability Company is purchased by ## Limited Liability Company", the first position is the position immediately before "by". The sentence type words are the words other than the associated keyword in the sentence pattern template of the associated keyword.
In implementation, if the sentence pattern type of the clause is a passive sentence pattern type or a hidden relation sentence pattern type, the server may adjust the position of the associated keyword contained in the clause by moving it to the first position. The server then determines the enterprise association relation information contained in the adjusted clause by using the position of the associated keyword in the adjusted clause and the part of speech of the associated keyword. For example, the sentence pattern type of the clause is the passive sentence pattern type and the clause is "# Limited Liability Company is purchased by ## Limited Liability Company". After word segmentation the words are "# Limited Liability Company", "by", "## Limited Liability Company" and "purchase". The first position in the clause is immediately before "by", so "purchase" is moved to before "by" and the adjusted word order is "# Limited Liability Company", "purchase", "by", "## Limited Liability Company". The enterprise association relation information contained in the clause can then be determined based on the part of speech of "purchase" and its position in the adjusted clause. For another example, the sentence pattern type of the clause is the hidden relation sentence pattern type and the clause is "# Limited Liability Company makes an investment increase toward ## Limited Liability Company". After word segmentation the words are "# Limited Liability Company", "toward", "## Limited Liability Company" and "investment increase". The words "investment increase" are moved to before "toward", giving the adjusted word order "# Limited Liability Company", "investment increase", "toward", "## Limited Liability Company", and the enterprise association relation information contained in the clause can then be determined based on the part of speech of "investment increase" and its position in the adjusted clause.
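The adjustment of the associated keyword to the first position can be sketched as a list operation on the segmented clause. The function name `move_keyword_before_type_word` and the English glosses used as sentence type words are hypothetical; the sketch assumes the sentence type word precedes the keyword, as in the passive and hidden relation templates.

```python
def move_keyword_before_type_word(words, keyword, type_words):
    """Move the keyword to immediately before the first sentence type word.

    Returns a new list; the clause is returned unchanged if the keyword or
    a preceding sentence type word is not found.
    """
    if keyword not in words:
        return list(words)
    k = words.index(keyword)
    positions = [i for i, w in enumerate(words[:k]) if w in type_words]
    if not positions:
        return list(words)
    first_position = positions[0]
    remaining = words[:k] + words[k + 1:]
    return remaining[:first_position] + [keyword] + remaining[first_position:]


passive = ["# Limited Liability Company", "by", "## Limited Liability Company", "purchase"]
print(move_keyword_before_type_word(passive, "purchase", {"by"}))
```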
Optionally, if the sentence pattern type of the clause is a passive sentence pattern type, and the part of speech of the associated keyword included in the clause is a verb, the method for determining the enterprise association relationship information included in the adjusted clause based on the position of the associated keyword in the adjusted clause and the part of speech of the associated keyword included in the adjusted clause is as follows:
identifying backward (toward the end of the clause) from the position of the associated keyword in the adjusted clause, determining the identified first enterprise name as the name of the implementation object of the associated keyword, identifying forward (toward the start of the clause), and determining the identified second enterprise name as the name of the implemented object of the associated keyword;
and generating enterprise association relation information between the name of the implementation object and the name of the implemented object based on the associated keyword.
In implementation, when the sentence pattern type of the clause is the passive sentence pattern type, the part of speech of each word can be tagged at the same time as the clause is segmented with the CRF model. Starting from the position of the associated keyword in the adjusted clause, the server identifies backward the words tagged as nouns after the associated keyword, in combination with their context information; if a first enterprise name can be identified, it is determined as the name of the implementation object of the associated keyword. Starting from the same position, the server then identifies forward the words tagged as nouns before the associated keyword, in combination with their context information, and determines the identified second enterprise name as the name of the implemented object of the associated keyword. The determined enterprise association relation information is "first enterprise name" - associated keyword - "second enterprise name".
For example, the clause is "company A is purchased by company B" and the part of speech of "purchase" is a verb. In the adjusted clause, "purchase" is moved to immediately before "by", giving the word order "company A", "purchase", "by", "company B". Identifying backward from "purchase" finds company B, which is determined as the name of the implementation object of "purchase"; identifying forward from "purchase" finds company A, which is determined as the name of the implemented object of "purchase". The enterprise association relation information is "company B" - "purchase" - "company A".
Optionally, if the sentence pattern type of the clause is a passive sentence pattern type, and the part of speech of the associated keyword included in the clause is a noun, based on the position of the associated keyword in the adjusted clause and the part of speech of the associated keyword included in the adjusted clause, the method for determining the enterprise association relationship information included in the adjusted clause is as follows:
identifying forward from the position of the associated keyword in the adjusted clause, and determining the identified third enterprise name as the name of the implementation object of the associated keyword; identifying backward, and determining the identified fourth enterprise name as the name of the implemented object of the associated keyword;
and generating enterprise association relation information between the name of the implementation object and the name of the implemented object based on the association keywords.
In implementation, when the sentence pattern type of the clause is the passive sentence pattern type, the part of speech of each word can be labeled at the same time as the clause is segmented with the CRF model. Starting from the position of the associated keyword in the adjusted clause, and using the context information of the words labeled as nouns before the associated keyword, the server identifies forward among the words labeled as nouns before the associated keyword; if a third enterprise name can be identified, it is determined as the name of the implementation object of the associated keyword. Then, using the context information of the words labeled as nouns after the associated keyword, the server identifies backward from the same position among the words labeled as nouns after the associated keyword, and the identified fourth enterprise name is determined as the name of the implemented object of the associated keyword. The determined enterprise association relation information is "third enterprise name" - "associated keyword" - "fourth enterprise name".
For example, the clause is "A company is a parent company of C company", and the part of speech of "parent company" is a noun. The adjusted clause is "A company parent company is C company". Identifying forward from "parent company" finds A company, which is determined as the name of the implementation object of "parent company"; identifying backward finds C company, which is determined as the name of the implemented object of "parent company". The enterprise association relation information is "A company" - "parent company" - "C company".
Optionally, if the sentence pattern type of the clause is a hidden relation sentence pattern type, and the part of speech of the associated keyword included in the clause is a verb, the method for determining the enterprise association relation information included in the adjusted clause based on the position of the associated keyword in the adjusted clause and the part of speech of the associated keyword included in the adjusted clause is as follows:
identifying forward from the position of the associated keyword in the adjusted clause, and determining the identified fifth enterprise name as the name of the implementation object of the associated keyword; identifying backward, and determining the identified sixth enterprise name as the name of the implemented object of the associated keyword;
and generating enterprise association relation information between the name of the implementation object and the name of the implemented object based on the association keywords.
In implementation, when the sentence pattern type of the clause is the hidden relation sentence pattern type, the part of speech of each word can be labeled at the same time as the clause is segmented with the CRF model. Starting from the position of the associated keyword in the adjusted clause, and using the context information of the words labeled as nouns before the associated keyword, the server identifies forward among the words labeled as nouns before the associated keyword; if a fifth enterprise name can be identified, it is determined as the name of the implementation object of the associated keyword. Then, using the context information of the words labeled as nouns after the associated keyword, the server identifies backward from the same position among the words labeled as nouns after the associated keyword, and the identified sixth enterprise name is determined as the name of the implemented object of the associated keyword. The determined enterprise association relation information is "fifth enterprise name" - "associated keyword" - "sixth enterprise name".
For example, the clause is "E corporation adds 3 million yuan to C corporation", and the part of speech of "add" is a verb. The adjusted clause is "E corporation adds 3 million yuan to C corporation". Identifying forward from "add" finds E corporation, which is determined as the name of the implementation object of "add"; identifying backward finds C corporation, which is determined as the name of the implemented object of "add". The enterprise association relation information is "E corporation" - "add" - "C corporation".
Optionally, if the sentence pattern type of the clause is an obvious relational sentence pattern type, the part of speech of the associated keyword needs to be determined, and if the part of speech is a noun, the processing of step 104 may be as follows:
in each clause containing the associated keyword, identifying backward from the position of the associated keyword in the clause and, if a seventh enterprise name is identified, determining the seventh enterprise name as the name of the implementation object of the associated keyword; identifying forward from the position of the associated keyword in the clause, and determining the identified eighth enterprise name as the name of the implemented object of the associated keyword; and generating enterprise association relation information between the name of the implementation object and the name of the implemented object based on the associated keyword.
The parts of speech of the associated keywords are nouns, such as "stockholder", "parent company", "subsidiary company", and so on. The seventh business name and the eighth business name are any business name.
In implementation, after the server determines the clauses containing a preset associated keyword, the part of speech of each word can be labeled at the same time as each clause is segmented with the CRF model, and the words obtained by segmenting each clause can be arranged from front to back in the original word order of the clause. For a given clause, the server may determine the position of the associated keyword in the clause and, using the context information of the words labeled as nouns after the associated keyword, identify backward from that position among the words labeled as nouns after the associated keyword; if a seventh enterprise name can be identified, it is determined as the name of the implementation object of the associated keyword. Using the context information of the words labeled as nouns before the associated keyword, the server then identifies forward from the position of the associated keyword among the words labeled as nouns before it, and the identified eighth enterprise name is determined as the name of the implemented object of the associated keyword. The associated keyword is then used to obtain the enterprise association relation information "seventh enterprise name" - "associated keyword" - "eighth enterprise name". In this way the enterprise association relation information contained in the clause is determined, and the same applies by analogy to every clause containing an associated keyword.
For example, the clause containing the associated keyword is "a subsidiary of company A is company F". After word segmentation, the words obtained from front to back are "company A", "of", "subsidiary", "is", and "company F". The server identifies backward from "subsidiary", finds "company F", and determines it as the name of the implementation object of the associated keyword; it then identifies forward from "subsidiary", finds "company A", and determines it as the name of the implemented object. The enterprise association relation information so determined is "company F" - "subsidiary" - "company A".
Optionally, if the sentence pattern type of the clause is an obvious relational sentence pattern type, the part of speech of the associated keyword needs to be determined, and if the part of speech is a verb, the processing of step 104 may be as follows:
in each clause containing the associated keyword, identifying forward from the position of the associated keyword in the clause and, if a ninth enterprise name is identified, determining the ninth enterprise name as the name of the implementation object of the associated keyword; identifying backward from the position of the associated keyword in the clause, and determining the identified tenth enterprise name as the name of the implemented object of the associated keyword; and generating enterprise association relation information between the name of the implementation object and the name of the implemented object based on the associated keyword.
The part of speech of the associated keyword is a verb, such as "investment", "acquisition", and the like. The ninth business name and the tenth business name are any business name.
In implementation, after the server determines the clauses containing a preset associated keyword, the part of speech of each word can be labeled at the same time as each clause is segmented with the CRF model, and the words obtained by segmenting each clause can be arranged from front to back in the original word order of the clause. For a given clause, the server may determine the position of the associated keyword in the clause and, using the context information of the words labeled as nouns before the associated keyword, identify forward from that position among the words labeled as nouns before the associated keyword; if a ninth enterprise name can be identified, it is determined as the name of the implementation object of the associated keyword. Using the context information of the words labeled as nouns after the associated keyword, the server then identifies backward from the position of the associated keyword among the words labeled as nouns after it, and the identified tenth enterprise name is determined as the name of the implemented object of the associated keyword. The associated keyword is then used to obtain the enterprise association relation information "ninth enterprise name" - "associated keyword" - "tenth enterprise name". In this way the enterprise association relation information contained in the clause is determined, and the same applies by analogy to every clause containing an associated keyword.
For example, the clause containing the associated keyword is "A invests D". After word segmentation, the words obtained from front to back are "A", "invests", and "D". The server identifies forward from "invests", finds "A", and determines it as the name of the implementation object of the associated keyword; it then identifies backward from "invests", finds "D", and determines it as the name of the implemented object. The enterprise association relation information so determined is "A" - "invests" - "D".
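The cases described so far differ only in which side of the associated keyword holds the implementation object. A hedged Python sketch of that idea uses a lookup table keyed by sentence pattern type and keyword part of speech. The table keys, the tag values `"n"` and `"v"`, and the names `IMPLEMENTER_SIDE`, `find_name`, and `extract_relation` are assumptions made for illustration. The enterprise-name test is passed in as a plain predicate rather than the CRF-based recognition described in the text.

```python
# Side of the keyword on which the implementation object is found.
# "before" = identify forward (toward the clause start); "after" = identify backward.
IMPLEMENTER_SIDE = {
    ("passive", "verb"): "after",
    ("passive", "noun"): "before",
    ("hidden", "verb"): "before",
    ("obvious", "noun"): "after",
    ("obvious", "verb"): "before",
}


def find_name(tagged, start, step, is_enterprise):
    """Return the first noun-tagged enterprise name reached from `start` in direction `step`."""
    i = start + step
    while 0 <= i < len(tagged):
        word, pos = tagged[i]
        if pos == "n" and is_enterprise(word):
            return word
        i += step
    return None


def extract_relation(tagged, keyword, sentence_type, is_enterprise):
    """Build (implementation object, keyword, implemented object) for one clause."""
    k = next(i for i, (word, _) in enumerate(tagged) if word == keyword)
    keyword_pos = "verb" if tagged[k][1] == "v" else "noun"
    before = find_name(tagged, k, -1, is_enterprise)
    after = find_name(tagged, k, +1, is_enterprise)
    if before is None or after is None:
        return None
    if IMPLEMENTER_SIDE[(sentence_type, keyword_pos)] == "before":
        return (before, keyword, after)
    return (after, keyword, before)


known = {"A", "D", "company A", "company F"}
verb_clause = [("A", "n"), ("invests", "v"), ("D", "n")]
noun_clause = [("company A", "n"), ("of", "u"), ("subsidiary", "n"),
               ("is", "v"), ("company F", "n")]
print(extract_relation(verb_clause, "invests", "obvious", known.__contains__))
print(extract_relation(noun_clause, "subsidiary", "obvious", known.__contains__))
```

Keeping the direction rule in a table means a further sentence pattern type could be supported by adding one entry rather than another branch.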
It should be noted that, in the embodiment of the present invention, preset parallel words may first be replaced with a preset character, and the sentence pattern type of the clause is then determined. If the sentence pattern type is the obvious relational sentence pattern type and the part of speech of the associated keyword in the clause is a verb, then, starting from the position of the associated keyword in the adjusted clause, the words labeled as nouns before the associated keyword are identified forward. If an eleventh enterprise name is identified and the preset character exists in front of it, identification continues forward among the words labeled as nouns; if a twelfth enterprise name is then identified and no preset character exists in front of the twelfth enterprise name, the eleventh and twelfth enterprise names are both determined as names of implementation objects of the associated keyword. The words labeled as nouns after the associated keyword are then identified backward from the position of the associated keyword in the adjusted clause; if a thirteenth enterprise name is identified and no preset character exists behind it, the thirteenth enterprise name is determined as the name of the implemented object of the associated keyword. The determined enterprise association relation information is "eleventh enterprise name, twelfth enterprise name" - "associated keyword" - "thirteenth enterprise name".
For example, the preset character is &, and the clause is "A company and S company joint investment P company". The clause after replacement is "A company & S company & investment P company". Identifying forward from "investment" finds S company; because the preset character & is in front of S company, identification continues forward and finds A company; no preset character is in front of A company, so forward identification stops. Identifying backward from "investment" then finds "P company"; no preset character is behind "P company", so identification stops. The determined enterprise association relation information is "A company, S company" - "investment" - "P company".
It should be noted that the above first determines whether preset parallel words are included and then determines the sentence pattern type; equally, the sentence pattern type may be determined first and the presence of preset parallel words afterwards, which is not limited in the embodiment of the present invention. In addition, only the obvious relational sentence pattern type is taken as an example above. For the passive sentence pattern type and the hidden relation sentence pattern type, the position of the associated keyword needs to be adjusted first, and the enterprise association relation information is then identified based on the adjusted clause in the same way as for the obvious relational sentence pattern type, which is not repeated here.
It should be noted that some associated keywords are preset words that have no implemented object, such as "partner" and "strategic partnership". If a clause contains a preset parallel word and such an associated keyword, then after detecting the associated keyword in the clause the server may replace the parallel word with the preset character and, starting from the position of the associated keyword, identify forward among the words labeled as nouns before the associated keyword. If an enterprise name is identified, the server determines whether the preset character exists in front of it; if so, it continues to identify forward among the words labeled as nouns, identifies another enterprise name, and again determines whether the preset character exists in front of that name. This continues until an enterprise name with no preset character in front of it is reached, and the enterprise names so identified are determined as names of implementation objects of the associated keyword. For example, the clause is "A company and G company are partners"; it contains the parallel word "and", which is replaced with &, so the clause after replacement is "A company & G company are partners". The determined enterprise association relation information may be "A company, G company" - "partner", which indicates that A company and G company are partners.
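The chained forward identification across the preset character can be sketched as follows. This is an illustrative simplification under stated assumptions: the clause is already tokenized, enterprise names are looked up in a given set, and the backward step simply takes the first enterprise name after the keyword without checking for a trailing preset character. The names `collect_implementers` and `extract_with_parallel` are hypothetical.

```python
PRESET_CHAR = "&"


def collect_implementers(tokens, keyword_index, enterprise_names):
    """Identify forward from the keyword; keep going while the name just
    found has the preset character directly in front of it."""
    found = []
    i = keyword_index - 1
    while i >= 0:
        if tokens[i] in enterprise_names:
            found.append(tokens[i])
            if i > 0 and tokens[i - 1] == PRESET_CHAR:
                i -= 2  # step over the preset character and continue forward
                continue
            break  # no preset character in front: stop identifying
        i -= 1
    found.reverse()  # restore clause order
    return found


def extract_with_parallel(tokens, keyword, enterprise_names):
    k = tokens.index(keyword)
    implementers = collect_implementers(tokens, k, enterprise_names)
    # Simplification: first enterprise name after the keyword is the implemented object.
    implemented = next((t for t in tokens[k + 1:] if t in enterprise_names), None)
    return (implementers, keyword, implemented)


tokens = ["A company", "&", "S company", "&", "investment", "P company"]
names = {"A company", "S company", "P company"}
print(extract_with_parallel(tokens, "investment", names))
# (['A company', 'S company'], 'investment', 'P company')
```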
Optionally, after determining the enterprise association relationship information included in the clause, determining whether the enterprise association relationship information is accurate based on a negative term included in the clause, where the corresponding processing may be as follows:
if no preset negative word lies between any two enterprise names contained in the clause, storing the determined enterprise association relation information; and if a preset negative word lies between any two enterprise names contained in the clause, not storing the determined enterprise association relation information.
The preset negative words are words that carry a negative meaning, such as "not" and "plan"; they can be preset by a technician and stored in the server.
In implementation, after the server determines the enterprise association relationship information included in a certain clause, the server may further use a preset negative word to match the clause, determine whether the clause includes the preset negative word, if the clause includes the preset negative word, further determine whether the preset negative word is between any two enterprise names, if the preset negative word is between any two enterprise names, the server does not store the enterprise association relationship information included in the clause, and if the preset negative word is not between any two enterprise names, the server stores the enterprise association relationship information included in the clause.
For example, if the clause is "company A plans to invest company B", the determined enterprise association relation information is "company A - investment - company B". The server determines that the clause contains the preset negative word "plan" and that "plan" lies between the two enterprise names, so the enterprise association relation information contained in the clause is not stored. As another example, the clause is "the message that company A purchased company F is not issued by us", and the determined enterprise association relation information is "company A - purchase - company F". The server determines that the clause contains the preset negative word "not", but "not" does not lie between the two enterprise names, so the enterprise association relation information contained in the clause may be stored.
Optionally, the embodiment of the present invention further provides that, before the enterprise association relation information is determined, the clauses containing a preset associated keyword are filtered. The corresponding processing may be as follows:
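The negative-word check can be sketched in a few lines of Python. The tokenizations, the word list, and the function name `should_store` are assumptions for illustration. The sketch relies on the observation that a word lying between any two enterprise names necessarily lies between the first and the last enterprise name in the clause.

```python
NEGATIVE_WORDS = {"not", "plans"}  # hypothetical preset negative words


def should_store(tokens, enterprise_names, negative_words=NEGATIVE_WORDS):
    """Return False when a preset negative word lies between two enterprise names."""
    positions = [i for i, t in enumerate(tokens) if t in enterprise_names]
    if len(positions) < 2:
        return True
    first, last = positions[0], positions[-1]
    return not any(t in negative_words for t in tokens[first + 1:last])


names = {"company A", "company B", "company F"}
planned = ["company A", "plans", "to", "invest", "company B"]
denied = ["the", "message", "that", "company A", "purchased", "company F",
          "is", "not", "issued", "by", "us"]
print(should_store(planned, names))  # False: "plans" sits between the names
print(should_store(denied, names))   # True: "not" is outside the two names
```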
and filtering the clauses containing the associated keywords based on the preset interference filtering keywords.
In practice, the technician pre-stores the interference filtering keywords in the server, which are words related to news reports that may appear in sentences, such as reports, publications, disclosures, reporters, news, and the like.
After determining the clauses containing a preset associated keyword, the server can match each of those clauses against the preset interference filtering keywords. If any interference filtering keyword is matched in a clause, the clause is filtered out and is not processed for determining enterprise association relation information. For example, the clause is "## News Media Limited reports that the shareholder of ## Limited Company is ## Limited Company". After word segmentation, the words obtained are "## News Media Limited", "report", ",", "## Limited Company", "of", "shareholder", "is", and "## Limited Company". The clause matches the interference filtering keyword "report", so it can be filtered out. Otherwise, the enterprise name of the news medium could be mistakenly identified and the determined enterprise association relation information would be inaccurate, which is why clauses containing words related to news reports are filtered.
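A minimal sketch of this keyword-based filtering, assuming clauses arrive as token lists and that matching is done on whole tokens. The keyword set and the function name `filter_clauses` are illustrative only.

```python
# Hypothetical interference filtering keywords, stored as whole tokens.
INTERFERENCE_KEYWORDS = {"report", "reports", "publication", "disclosure", "reporter", "news"}


def filter_clauses(segmented_clauses, interference=INTERFERENCE_KEYWORDS):
    """Keep only the clauses in which no token is an interference filtering keyword."""
    return [tokens for tokens in segmented_clauses
            if not any(t in interference for t in tokens)]


clauses = [
    ["## News Media Limited", "reports", "that", "the", "shareholder",
     "of", "## Limited Company", "is", "## Limited Company"],
    ["company A", "invests", "company D"],
]
print(filter_clauses(clauses))
# [['company A', 'invests', 'company D']]
```

Because matching is per token, the word "News" inside the segmented enterprise name does not trigger the filter by itself; only the standalone token "reports" does.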
In addition, a sentence pattern template composed of interference filtering keywords can be used to filter clauses containing preset associated keywords, and the corresponding processing can be as follows:
A technician stores in advance preset sentence pattern templates composed of interference filtering keywords, such as "according to … report/disclosure/release/disclosure", "… broadcast disclosure/release/disclosure", and the like. The server can use these preset sentence pattern templates to filter the clauses containing a preset associated keyword: every clause that matches a preset sentence pattern template composed of interference filtering keywords is filtered out and is not processed for determining enterprise association relation information.
It should be noted that, in the embodiment of the present invention, the determined enterprise association relation information is stored in the form "implementation object - associated keyword - implemented object". Other forms may equally be used to store the enterprise association relation information, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, after the enterprise association relation information has been determined in each clause of the text to be detected that contains a preset associated keyword, the server can perform duplicate checking, delete repeated enterprise association relation information, and send any contradictory enterprise association relation information to the terminal used by the technician for confirmation.
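The duplicate-checking step can be sketched as an order-preserving deduplication of triples. The text does not define what counts as contradictory information, so the rule used in `find_contradictions` below (the same keyword asserted in both directions between two enterprises) is purely an assumption for illustration, as are both function names.

```python
def deduplicate(triples):
    """Remove repeated (implementer, keyword, implemented) triples, keeping first occurrences."""
    seen = set()
    unique = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            unique.append(triple)
    return unique


def find_contradictions(triples):
    """ASSUMED rule: a triple and its reverse with the same keyword contradict each other."""
    unique = set(triples)
    pairs = []
    for a, keyword, b in sorted(unique):
        if a < b and (b, keyword, a) in unique:
            pairs.append(((a, keyword, b), (b, keyword, a)))
    return pairs


triples = [
    ("company A", "parent company", "company C"),
    ("company A", "parent company", "company C"),
    ("company C", "parent company", "company A"),
    ("company E", "add", "company C"),
]
print(deduplicate(triples))
print(find_contradictions(triples))
```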
In addition, before determining the enterprise association relationship information, the text to be detected may be preprocessed, and the corresponding processing may be as follows:
A technician stores a plurality of indicating pronouns in the server in advance. The indicating pronouns can be divided into back-pointing indicating pronouns and pre-pointing indicating pronouns: for a back-pointing indicating pronoun, the object it refers to appears before the pronoun; for a pre-pointing indicating pronoun, the object it refers to appears after the pronoun. Back-pointing indicating pronouns can be further divided into global and local back-pointing indicating pronouns. Global back-pointing indicating pronouns include, for example, "this company"; pre-pointing indicating pronouns include "parent company", "subsidiary", and the like.
If a global back-pointing indicating pronoun appears in the text to be detected, the server can locate the earliest position at which the pronoun appears in the text and then identify forward from that position to find the enterprise name closest to the pronoun. Every position at which the global back-pointing indicating pronoun appears is then replaced with the identified enterprise name. For example, if the enterprise name corresponding to "this company" is identified as "ABC Limited Liability Company", each "this company" appearing in the text to be detected may be replaced with "ABC Limited Liability Company".
If a local back-pointing indicating pronoun appears in the text to be detected, the server can identify forward from the position of the pronoun to find the enterprise name nearest to that position, and then replace the pronoun with the identified enterprise name. For example, if the local back-pointing indicating pronoun is "the company" and the enterprise name identified forward is "D Limited Liability Company", "the company" can be replaced with "D Limited Liability Company".
If a pre-pointing indicating pronoun appears in the text to be detected, the server can identify backward from its position to find the nearest enterprise name, replace the pronoun with the identified enterprise name, and also replace subsequent occurrences of the pronoun in the text with that name. For example, the pre-pointing indicating pronoun is "subsidiary", and the sentence in which "subsidiary" appears is "the subsidiary of E company is F company". It can be determined that "subsidiary" refers to "F company", and each "subsidiary" appearing later in the text to be detected can be replaced with "F company". It should be noted that this processing applies only when the pre-pointing indicating pronoun refers to a single company throughout the text to be detected; it does not apply if, for example, "the subsidiary of E company is F company, and the subsidiary of A company is D company" appears in the text.
The technician also stores sentence pattern templates in the server, such as (… ), (…, …) and the like. If a clause matches such a template, the server can identify forward, and the enterprise name so identified and the enterprise name following "title" in the template are determined to be the same enterprise name. The sentence pattern template may also be (…, … or …); if a clause matches such a template, the server can identify forward, and the enterprise name so identified and the enterprise name following "call" or its alternative in the template are determined to be the same enterprise name.
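Replacing a local back-pointing pronoun with the nearest preceding enterprise name can be sketched with a single left-to-right pass that remembers the most recent enterprise name seen. The example sentence, its tokenization, and the function name `resolve_local_pronoun` are hypothetical.

```python
def resolve_local_pronoun(tokens, pronoun, enterprise_names):
    """Replace each occurrence of `pronoun` with the closest enterprise name before it.
    Occurrences with no preceding enterprise name are left unchanged."""
    resolved = []
    last_name = None
    for token in tokens:
        if token in enterprise_names:
            last_name = token
            resolved.append(token)
        elif token == pronoun and last_name is not None:
            resolved.append(last_name)
        else:
            resolved.append(token)
    return resolved


tokens = ["D Limited Liability Company", "is", "located", "in", "Beijing", ",",
          "the company", "invests", "company E"]
names = {"D Limited Liability Company", "company E"}
print(resolve_local_pronoun(tokens, "the company", names))
```

The global case described earlier would differ only in resolving the name once, at the earliest occurrence of the pronoun, and reusing it for every occurrence.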
By the preprocessing of the text to be detected, the enterprise names contained in the text to be detected can be more complete, and the determined enterprise association relation information is more complete.
Step 105: if, in the enterprise association relation information, the number of characters of a target enterprise name excluding its enterprise name suffix is less than or equal to a first preset value, the target enterprise name is completed based on the characters before the enterprise name suffix in the clause, and the completed target enterprise name is updated into the enterprise association relation information.
The enterprise name suffixes mentioned here are determined as follows. 40,000 full enterprise names are used as a training set. Because common enterprise name suffixes appear at the end of full enterprise names, any two full names are compared character by character from the end until the characters differ, and the common character string so determined is recorded. This is repeated until every pair of the 40,000 full names has been compared; the different character strings recorded are the enterprise name suffixes. In addition, to guard against errors in the determined suffixes, a technician may screen them manually to determine the final enterprise name suffixes.
The target enterprise name is any enterprise name in the enterprise association relation information contained in the clause. The first preset value can be preset by a technician and stored in the server; for example, the first preset value is 3. In general, a full enterprise name with its enterprise name suffix removed should have more characters than a certain value, namely the first preset value. If an enterprise name with its suffix removed has fewer characters than that, this indicates that the enterprise name is one whose full name is missing.
In implementation, for each clause, after the enterprise association relation information has been determined, the preset enterprise name suffixes can be matched against the enterprise names in the enterprise association relation information, starting from the end of each enterprise name. If a target enterprise name in the enterprise association relation information matches an enterprise name suffix, the number of characters of the target enterprise name excluding that suffix is determined; if this number is less than or equal to the first preset value, the target enterprise name is determined to be an enterprise name whose full name is missing. The target enterprise name is then completed using the characters before the enterprise name suffix in the clause, and the completed target enterprise name is updated into the enterprise association relation information. The target enterprise name in the enterprise association relation information is thus a full name, which makes the enterprise names in the enterprise association relation information more uniform and easier to count. For example, the sentence is "reported that the network innovative technology limited liability company has bought the network sharing limited liability company" and the first preset value is 3. The target enterprise name in the enterprise association relation information is "technology limited liability company"; after "limited liability company" is removed, the remaining characters are "technology", whose number of characters is less than 3. The target enterprise name is therefore determined to be an enterprise name whose full name is missing, and it can be completed using the characters before "limited liability company".
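The pairwise end-matching used to collect candidate suffixes can be sketched as follows. The four company names are invented for illustration, and stripping surrounding whitespace is an adaptation for space-delimited English names that the source does not mention. The sketch compares every pair directly, which for 40,000 names means roughly 800 million comparisons, so a practical implementation would likely need something more economical.

```python
from itertools import combinations


def common_suffix(a, b):
    """Longest string that both `a` and `b` end with (may be empty)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def mine_suffixes(full_names):
    """Record the distinct non-empty common suffixes over all pairs of names."""
    suffixes = set()
    for a, b in combinations(full_names, 2):
        suffix = common_suffix(a, b).strip()
        if suffix:
            suffixes.add(suffix)
    return suffixes


# Hypothetical training names.
names = ["Alpha Media Co Ltd", "Orbit Tech Co Ltd", "Gamma Soft Inc", "Delta Foods Inc"]
print(sorted(mine_suffixes(names)))
# ['Co Ltd', 'Inc']
```

Two names that happen to share more than the legal-form ending would produce an over-long candidate, which is one reason the manual screening step described above is useful.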
In addition, if the number of characters of the target enterprise name excluding the included enterprise name suffix in the enterprise association relationship information is larger than a first preset numerical value, the target enterprise name is determined not to be the enterprise name with missing full name, and the enterprise name in the enterprise association relationship information is not complemented.
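The decision of whether a target enterprise name needs completion can be sketched as a suffix match followed by a length comparison. The suffix list, the sample names, and the function names `strip_suffix` and `is_full_name_missing` are assumptions; short Latin-script names are used so that a character count of 3 remains a meaningful threshold.

```python
def strip_suffix(name, suffixes):
    """Match suffixes from the end of `name`, longest first.
    Returns (remaining text, matched suffix), or None when no suffix matches."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if name.endswith(suffix):
            return name[:len(name) - len(suffix)].strip(), suffix
    return None


def is_full_name_missing(name, suffixes, first_preset_value=3):
    """True when the name ends with a known suffix and what remains has
    at most `first_preset_value` characters."""
    result = strip_suffix(name, suffixes)
    if result is None:
        return False
    remaining, _ = result
    return len(remaining) <= first_preset_value


suffixes = {"Ltd", "Co Ltd"}
print(is_full_name_missing("AB Co Ltd", suffixes))          # True: "AB" has 2 characters
print(is_full_name_missing("Acme Media Co Ltd", suffixes))  # False: "Acme Media" has 10
```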
Alternatively, the target business name may be complemented with the place name existing before the business name suffix, and the corresponding process may be as follows:
in the clause, if the place name exists before the enterprise name suffix, the character string from the starting position of the place name to the ending position of the enterprise name suffix is intercepted and determined as the complemented target enterprise name.
In implementation, when the clause is segmented using the CRF model in step 102, the place names in the clause may be labeled. If a place name exists before the enterprise name suffix of the target enterprise name, the server may intercept the character string from the starting position of the place name to the ending position of the enterprise name suffix and determine that character string as the completed target enterprise name. For example, the sentence is "Beijing Innovation Media Limited Liability Company invests in Shandong Novel Media Limited Liability Company", and the enterprise association relation information is "Beijing Innovation Media Limited Liability Company - investment - Media Limited Liability Company". For the target enterprise name "Media Limited Liability Company", the characters remaining after the enterprise name suffix is removed are "Media", whose number is smaller than the first preset value. The character string from "Shandong" to "Limited Liability Company" can be intercepted as the full name of the target enterprise name, so the completed target enterprise name is "Shandong Novel Media Limited Liability Company", and the enterprise association relation information becomes "Beijing Innovation Media Limited Liability Company - investment - Shandong Novel Media Limited Liability Company".
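The string interception can be sketched as a search for the closest place name before the target name, followed by a slice. Passing the target's start offset explicitly is a design choice of this sketch, since the incomplete name may also occur inside another enterprise name in the same clause. "LLC" abbreviates the suffix for brevity, and the function name `complete_with_place_name` is hypothetical.

```python
def complete_with_place_name(clause, target_name, target_start, place_names):
    """Return the substring from the nearest place name before the target
    to the end of the target's suffix, or the target unchanged if none is found."""
    target_end = target_start + len(target_name)
    nearest = -1
    for place in place_names:
        # rfind restricted to the text before the target name
        nearest = max(nearest, clause.rfind(place, 0, target_start))
    if nearest == -1:
        return target_name
    return clause[nearest:target_end]


clause = "Beijing Innovation Media LLC invests in Shandong Novel Media LLC"
target = "Media LLC"
start = clause.rindex(target)  # the second occurrence is the incomplete target
print(complete_with_place_name(clause, target, start, {"Beijing", "Shandong"}))
# Shandong Novel Media LLC
```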
Optionally, if a place name exists before the enterprise name suffix included in the target enterprise name, it is also necessary to consider whether the place name is enclosed in parentheses, and the corresponding processing may be as follows:
in the clause, if a place name exists before the enterprise name suffix, determining whether the place name is enclosed in parentheses;
if the place name is not enclosed in parentheses, extracting the character string from the start position of the place name to the end position of the enterprise name suffix, and determining the character string as the completed target enterprise name; and if the place name is enclosed in parentheses, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
In implementation, in step 102, when the clause is segmented into words using the CRF model, the place names in the clause and the part of speech of each word obtained by segmentation may be tagged. If a place name exists before the enterprise name suffix of the target enterprise name, it is then determined whether the place name is enclosed in parentheses, where the parentheses are round brackets "()" in either their Chinese or English form, for example (Beijing), (Shanxi), (Hainan), and the like. If the place name is not enclosed in parentheses, the character string from the start position of the place name to the end position of the enterprise name suffix can be extracted and determined as the completed target enterprise name.
If the place name is enclosed in parentheses, the words tagged as nouns before the target enterprise name in the clause can be determined, and it is then determined whether these noun phrases are adjacent. If so, the place name and the plurality of adjacent noun phrases can be added before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name. For example, suppose the clause is "Network Technology (Shanghai) Innovation Limited Liability Company invests in Shandong New Media Limited Liability Company" and the extracted enterprise association relationship information is "Innovation Limited Liability Company-investment-Shandong New Media Limited Liability Company". The target enterprise name is "Innovation Limited Liability Company"; a place name exists before the suffix "Limited Liability Company" and is enclosed in parentheses, and in the clause two adjacent noun phrases, "Network" and "Technology", exist before the target enterprise name. "Network", "Technology" and "(Shanghai)" can therefore be added before the target enterprise name, giving the completed target enterprise name "Network Technology (Shanghai) Innovation Limited Liability Company", and the final enterprise association relationship information is "Network Technology (Shanghai) Innovation Limited Liability Company-investment-Shandong New Media Limited Liability Company".
In addition, if the place name is enclosed in parentheses, the words tagged as nouns or verbs before the enterprise name suffix in the clause can be determined, and it is then determined whether these noun phrases and verb phrases are adjacent. If so, the place name and the plurality of adjacent noun phrases and verb phrases can be added before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name. For example, suppose the clause is "Innovation Technology (Shanghai) Media Limited Liability Company invests in Shandong New Media Limited Liability Company" and the extracted enterprise association relationship information is "Media Limited Liability Company-investment-Shandong New Media Limited Liability Company". The target enterprise name is "Media Limited Liability Company"; a place name exists before the suffix "Limited Liability Company" and is enclosed in parentheses, and in the clause two adjacent noun and verb phrases, "Innovation" and "Technology", exist before the target enterprise name. "Innovation", "Technology" and "(Shanghai)" can therefore be added before the target enterprise name, giving the completed target enterprise name "Innovation Technology (Shanghai) Media Limited Liability Company", and the final enterprise association relationship information is "Innovation Technology (Shanghai) Media Limited Liability Company-investment-Shandong New Media Limited Liability Company".
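The parenthesized place name case can be sketched over a list of tagged tokens. Everything specific here is an assumption made for illustration: the tag labels ("n" noun, "v" verb, "ns" place name, "w" punctuation), the function name, the requirement that "(place)" sits directly before the target name, and reading "a plurality" as at least two phrases.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (text, tag); the tag labels are assumed, not from the source


def prefix_for_bracketed_place(tokens: List[Token], target_start: int,
                               allow_verbs: bool = False,
                               min_phrases: int = 2) -> Optional[List[str]]:
    """Collect the words to prepend when "(place)" directly precedes the target name."""
    if target_start < 3:
        return None
    open_b, place, close_b = tokens[target_start - 3:target_start]
    if (open_b[0] not in {"(", "（"} or place[1] != "ns"
            or close_b[0] not in {")", "）"}):
        return None
    allowed = {"n", "v"} if allow_verbs else {"n"}
    i = target_start - 4
    # Walk backwards over the run of adjacent noun (and optionally verb) phrases.
    while i >= 0 and tokens[i][1] in allowed:
        i -= 1
    phrase_count = (target_start - 3) - (i + 1)
    if phrase_count < min_phrases:
        return None
    return [text for text, _ in tokens[i + 1:target_start]]


tokens = [("Network", "n"), ("Technology", "n"), ("(", "w"), ("Shanghai", "ns"),
          (")", "w"), ("Innovation", "n"), ("Limited Liability Company", "n")]
prefix = prefix_for_bracketed_place(tokens, target_start=5)
print(prefix)
```

Setting `allow_verbs=True` corresponds to the variant in which adjacent verb phrases are also accepted; the returned words are prepended to the target enterprise name in clause order.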
In addition, if no place name exists before the enterprise name suffix, the plurality of adjacent noun phrases existing before the target enterprise name can be determined directly and added before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or the plurality of adjacent noun phrases and verb phrases existing before the target enterprise name can be determined directly and added before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name. The specific processing is described in detail below.
Optionally, the target enterprise name may be completed based on the noun phrases and verb phrases before the target enterprise name, and the corresponding processing may be as follows:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
In implementation, after the enterprise association relationship information included in the clause is determined, the words tagged as nouns before the target enterprise name may be searched for among the words obtained by word segmentation of the clause, and if a plurality of such noun phrases are adjacent, they may be added before the target enterprise name to form the completed target enterprise name. For example, suppose the clause is "Network Technology Innovation Limited Liability Company invests in Shandong New Media Limited Liability Company" and the extracted enterprise association relationship information is "Innovation Limited Liability Company-investment-Shandong New Media Limited Liability Company". The target enterprise name is "Innovation Limited Liability Company", and in the clause two adjacent noun phrases, "Network" and "Technology", exist before the target enterprise name. The completed target enterprise name is therefore "Network Technology Innovation Limited Liability Company", and the final enterprise association relationship information is "Network Technology Innovation Limited Liability Company-investment-Shandong New Media Limited Liability Company".
Alternatively, after the enterprise association relationship information included in the clause is determined, the words tagged as nouns and the words tagged as verbs before the target enterprise name may be searched for among the words obtained by word segmentation of the clause, and if a plurality of such noun phrases and verb phrases are adjacent, they may be added before the target enterprise name to form the completed target enterprise name. For example, suppose the clause is "Innovation Technology Media Limited Liability Company invests in Shandong New Media Limited Liability Company" and the extracted enterprise association relationship information is "Media Limited Liability Company-investment-Shandong New Media Limited Liability Company". The target enterprise name is "Media Limited Liability Company", and two adjacent noun and verb phrases, "Innovation" and "Technology", exist before the target enterprise name. "Innovation" and "Technology" can therefore be added before the target enterprise name, giving the completed target enterprise name "Innovation Technology Media Limited Liability Company", and the final enterprise association relationship information is "Innovation Technology Media Limited Liability Company-investment-Shandong New Media Limited Liability Company".
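A backward scan over tagged tokens is one way to picture the two variants above. This is a sketch only: the tag labels "n" and "v", the function name, and the interpretation of "a plurality" as at least two phrases are assumptions, not details given in the description.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (text, tag); "n" = noun, "v" = verb (assumed labels)


def adjacent_phrases_before(tokens: List[Token], target_start: int,
                            allow_verbs: bool = False,
                            min_phrases: int = 2) -> List[str]:
    """Return the run of adjacent noun (and optionally verb) phrases that
    immediately precedes the target name, in clause order."""
    allowed = {"n", "v"} if allow_verbs else {"n"}
    i = target_start - 1
    while i >= 0 and tokens[i][1] in allowed:
        i -= 1
    words = [text for text, _ in tokens[i + 1:target_start]]
    return words if len(words) >= min_phrases else []


tokens = [("Network", "n"), ("Technology", "n"),
          ("Innovation", "n"), ("Limited Liability Company", "n")]
target = ["Innovation", "Limited Liability Company"]
completed = adjacent_phrases_before(tokens, target_start=2) + target
print(" ".join(completed))
```

With `allow_verbs=True` the same scan also accepts verb phrases, which matches the second variant; for Chinese text the words would be concatenated without spaces.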
Optionally, in order to more accurately complement the target business name, the following processing may be performed:
in the clause, if a plurality of adjacent noun phrases exist within a preset character number range before the target enterprise name, adding the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist within a preset character number range before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
The preset character number range may be preset by a technician and stored in the server, for example, within 6 characters before the target enterprise name.
In implementation, after the enterprise association relationship information included in the clause is determined, the words tagged as nouns within the preset character number range before the target enterprise name may be searched for among the words obtained by word segmentation of the clause, and if a plurality of such noun phrases are adjacent, they may be added before the target enterprise name to form the completed target enterprise name. Restricting the search to noun phrases within the preset character number range before the target enterprise name excludes, as far as possible, phrases that are not part of the enterprise name, so that the completed target enterprise name is more accurate.
Alternatively, after the enterprise association relationship information included in the clause is determined, the words tagged as nouns and the words tagged as verbs within the preset character number range before the target enterprise name may be searched for among the words obtained by word segmentation of the clause, and if a plurality of such noun phrases and verb phrases are adjacent, they may be added before the target enterprise name to form the completed target enterprise name. Restricting the search to noun phrases and verb phrases within the preset character number range before the target enterprise name excludes, as far as possible, phrases that are not part of the enterprise name, so that the completed target enterprise name is more accurate.
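The effect of the character window can be sketched as a filter over candidate phrases. The function name, the English stand-in words, and the choice to drop any phrase that does not fit entirely inside the window are assumptions; the description only states that the range is preset (for example, 6 characters).

```python
from typing import List


def limit_to_window(words: List[str], max_chars: int) -> List[str]:
    """Keep only the phrases that fit within `max_chars` characters
    immediately before the target name, preserving clause order."""
    kept: List[str] = []
    used = 0
    for word in reversed(words):  # start with the phrase nearest the target
        if used + len(word) > max_chars:
            break
        kept.append(word)
        used += len(word)
    kept.reverse()
    return kept


# "Tech" (4) and "Net" (3) fit in 8 characters; "Daily" would exceed the window.
print(limit_to_window(["Daily", "Net", "Tech"], max_chars=8))
```

Phrases farther from the target enterprise name are the ones discarded first, which reflects the stated goal of excluding words that are unlikely to belong to the name.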
In addition, after the target enterprise name is completed, the completed enterprise name can be sent to a terminal used by a technician so that the technician can check it. If the completed target enterprise name is correct, it can be stored in the dictionary model of the CRF (conditional random field) model, so that it can subsequently be treated directly as a single word when clauses are segmented.
In addition, if the number of characters of the target enterprise name in the enterprise association relationship information, excluding the enterprise name suffix, is less than or equal to the first preset value, but in the clause no place name exists before the enterprise name suffix of the target enterprise name, no plurality of adjacent noun phrases exists, and no plurality of adjacent noun phrases and verb phrases exists, the enterprise association relationship information included in the clause may not be stored.
Optionally, the embodiment of the present invention further provides a method for re-extracting enterprise association relationship information, and corresponding processing may be as follows:
for any clause that contains an associated keyword and for which no enterprise association relationship information has been determined, matching the word segments of the clause whose number of characters is greater than a second preset value against the enterprise names included in a preset enterprise name lexicon; and if at least two enterprise names included in the preset enterprise name lexicon are matched, determining the enterprise association relationship information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause, and the position of the associated keyword in the clause.
The second preset value may be preset by a technician and stored in the server; for example, the second preset value may be 2. The preset enterprise name lexicon includes a large number of abbreviated enterprise names and full enterprise names, together with the correspondence between the abbreviated names and the full names.
In implementation, among the clauses of the text to be detected that contain an associated keyword, the clauses for which no enterprise association relationship information has been determined are identified, and the enterprise association relationship information can be re-extracted for these clauses. For any one of these clauses, the word segments obtained by word segmentation whose number of characters is greater than the second preset value can be determined, and each such word segment is then matched against the enterprise names included in the preset enterprise name lexicon. If the word segments contained in the clause match at least two enterprise names and the matched names are full enterprise names, the enterprise association relationship information of the at least two enterprise names can be determined using the part of speech of the associated keyword, the sentence pattern type of the clause, and the position of the associated keyword in the clause. If the word segments contained in the clause match at least two enterprise names and the matched names are abbreviated enterprise names, the enterprise association relationship information of the at least two enterprise names can be determined in the same way; the full names corresponding to the abbreviated names are then looked up in the preset enterprise name lexicon, and the abbreviated names in the enterprise association relationship information are replaced by the full names.
It should be noted that, after at least two enterprise names are matched in a clause, the enterprise association relationship information of the at least two enterprise names is determined in the same way as in step 104; that is, the sentence pattern type of the clause is determined first, and the enterprise association relationship information included in the clause is then determined using the part of speech of the associated keyword and the position of the associated keyword in the clause. For the specific processing, refer to step 104, which is not described here again.
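The lexicon matching step of this re-extraction can be sketched as follows. The enterprise names, the function name, and the data structures (a set of full names plus an abbreviation-to-full-name mapping) are hypothetical. As a simplification, the sketch resolves abbreviations to full names at match time, whereas the description replaces them after the relationship has been determined; the relationship determination itself (step 104) is not shown.

```python
from typing import Dict, List, Set


def match_enterprise_names(segments: List[str], full_names: Set[str],
                           short_to_full: Dict[str, str],
                           second_preset: int = 2) -> List[str]:
    """Match word segments longer than `second_preset` characters against
    the lexicon and return full names if at least two enterprises match."""
    matched: List[str] = []
    for seg in segments:
        if len(seg) <= second_preset:
            continue  # too short to be considered
        if seg in full_names:
            matched.append(seg)
        elif seg in short_to_full:
            matched.append(short_to_full[seg])  # abbreviation -> full name
    return matched if len(matched) >= 2 else []


full_names = {"Acme Holdings Limited", "Globex Corporation"}  # hypothetical lexicon
short_to_full = {"Acme": "Acme Holdings Limited"}
segments = ["Acme", "invests", "in", "Globex Corporation"]
print(match_enterprise_names(segments, full_names, short_to_full))
```

An empty result means fewer than two enterprise names were found, in which case no relationship can be extracted from the clause.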
In addition, when a user wants to view enterprise association relationship information, an application program for identifying enterprise association relationships can be installed on the terminal. When the application program is opened, the terminal displays its main interface, which contains a search box and a search option. The user can enter the enterprise name to be searched in the search box and then click the search option; the terminal detects the click instruction on the search option and sends an association relationship query request for the enterprise name to the server. After receiving the query request, the server may look up the association relationships related to the enterprise name and send them to the terminal, and the terminal may display the enterprise association relationship information corresponding to the enterprise name. The information may be displayed in the form of a graph (for example, M corporation invests in N corporation, the subsidiary of M corporation is O corporation, M corporation purchases P corporation and S corporation, and M corporation invests in U corporation) or in the form of a table, as shown in fig. 2.
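On the server side, the lookup triggered by such a query might resemble the sketch below. The storage format (a list of subject, relation, object triples), the relation labels, and the function name are assumptions; the description does not specify how the relationships are stored or transmitted.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), an assumed storage format

# Relationships taken from the example in the text, with hypothetical relation labels.
TRIPLES: List[Triple] = [
    ("M", "invests", "N"),
    ("M", "subsidiary", "O"),
    ("M", "purchases", "P"),
    ("M", "purchases", "S"),
    ("M", "invests", "U"),
]


def query_relations(name: str, triples: List[Triple]) -> List[Triple]:
    """Return every relationship in which `name` is the subject or the object."""
    return [t for t in triples if name in (t[0], t[2])]


for subject, relation, obj in query_relations("N", TRIPLES):
    print(f"{subject} -{relation}-> {obj}")
```

The returned triples can be rendered by the terminal either as edges of a graph or as rows of a table.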
It should be noted that, in the embodiment of the present invention, the correspondence between full enterprise names and abbreviated enterprise names is pre-stored in the server, and the enterprise names included in the enterprise association relationship information are full enterprise names. If an abbreviated enterprise name is identified in a clause, it may be mapped to the corresponding full enterprise name, and the full enterprise name is stored in the enterprise association relationship information.
In the embodiment of the invention, a text to be detected is acquired and split to obtain at least one clause; among the at least one clause, the clauses containing a preset associated keyword are determined; for each clause containing the associated keyword, the sentence pattern type of the clause is determined, and the enterprise association relationship information contained in the clause is determined according to the sentence pattern type of the clause, the part of speech of the associated keyword, and the position of the associated keyword in the clause; if the number of characters of a target enterprise name in the enterprise association relationship information, excluding the enterprise name suffix it contains, is less than or equal to a first preset value, the target enterprise name is completed based on the characters before the enterprise name suffix in the clause, and the completed target enterprise name is updated into the enterprise association relationship information. In this way, the enterprise association relationship information contained in the text to be detected can be acquired directly without manual review, which improves the efficiency of extracting enterprise association relationship information.
Based on the same technical concept, an embodiment of the present invention further provides an enterprise association relationship information extraction apparatus based on a completion policy, as shown in fig. 3, the apparatus includes:
an obtaining module 310, configured to obtain a text to be detected;
the splitting module 320 is configured to split the text to be detected to obtain at least one clause;
a determining module 330, configured to determine, in the at least one clause, a clause including a preset associated keyword;
the determining module 330 is further configured to determine, for each clause that includes the associated keyword, a sentence pattern type of the clause, and determine enterprise association relationship information included in the clause according to the sentence pattern type of the clause, the part of speech of the associated keyword, and a position of the associated keyword in the clause;
a completion module 340, configured to, if the number of characters of the target enterprise name excluding the included enterprise name suffix in the enterprise association relationship information is less than or equal to a first preset value, complete the target enterprise name based on the characters before the enterprise name suffix in the clause, and update the completed target enterprise name into the enterprise association relationship information.
Optionally, the completing module 340 is configured to:
in the clause, if a place name exists before the enterprise name suffix, intercepting a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the complemented target enterprise name.
Optionally, as shown in fig. 4, the completion module 340 includes a determination sub-module 341 and a completion sub-module 342, where:
the determining sub-module 341 is configured to determine, in the clause, whether a place name is marked by parentheses if the place name exists before the business name suffix;
the completion sub-module 342 is configured to, if the place name is not labeled by the parentheses, intercept a character string from a start position of the place name to an end position of the suffix of the enterprise name, and determine the character string as the completed target enterprise name; and if the place name is marked by the brackets, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases to the front of the target enterprise name according to the position sequence in the clause to form the complemented target enterprise name.
Optionally, the completing module 340 is configured to:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, as shown in fig. 5, the apparatus further includes:
the matching module 650 is configured to, for any clause that includes an associated keyword and for which no enterprise association relationship information has been determined, match the word segments of the clause whose number of characters is greater than a second preset value against the enterprise names included in a preset enterprise name lexicon;
the determining module 330 is further configured to determine, if at least two enterprise names included in the preset enterprise name lexicon are matched, enterprise association relationship information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause, and the position of the associated keyword in the clause.
In the embodiment of the invention, a text to be detected is acquired and split to obtain at least one clause; among the at least one clause, the clauses containing a preset associated keyword are determined; for each clause containing the associated keyword, the sentence pattern type of the clause is determined, and the enterprise association relationship information contained in the clause is determined according to the sentence pattern type of the clause, the part of speech of the associated keyword, and the position of the associated keyword in the clause; if the number of characters of a target enterprise name in the enterprise association relationship information, excluding the enterprise name suffix it contains, is less than or equal to a first preset value, the target enterprise name is completed based on the characters before the enterprise name suffix in the clause, and the completed target enterprise name is updated into the enterprise association relationship information. In this way, the enterprise association relationship information contained in the text to be detected can be acquired directly without manual review, which improves the efficiency of extracting enterprise association relationship information.
It should be noted that: the device for extracting enterprise association relationship information provided in the above embodiment is only illustrated by the division of the above functional modules when extracting enterprise association relationship information, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the apparatus for extracting enterprise association relationship information and the method embodiment for extracting enterprise association relationship information provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiment and are not described herein again.
Referring to fig. 6, a schematic structural diagram of a server according to an embodiment of the present invention is shown, where the server may be used to implement the method for extracting enterprise association relationship information based on a completion strategy provided in the foregoing embodiment. Specifically:
the server 600 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Further, the central processor 1922 may be configured to communicate with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 600.
The server 600 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The server 600 may include memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring a text to be detected;
splitting the text to be detected to obtain at least one clause;
determining a clause containing a preset associated keyword in the at least one clause;
determining a sentence pattern type of each clause containing the associated keywords, and determining enterprise association relation information contained in the clause according to the sentence pattern type of the clause, the part of speech of the associated keywords and the positions of the associated keywords in the clause;
and if the number of characters of the target enterprise name without the included enterprise name suffix in the enterprise association relationship information is less than or equal to a first preset numerical value, completing the target enterprise name based on the characters before the enterprise name suffix in the clause, and updating the completed target enterprise name into the enterprise association relationship information.
Optionally, the completing the target business name based on each character before the business name suffix in the clause includes:
in the clause, if a place name exists before the enterprise name suffix, intercepting a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the complemented target enterprise name.
Optionally, in the clause, if a place name exists before the business name suffix, intercepting a character string from a starting position of the place name to an ending position of the business name suffix, and determining the character string as the complemented target business name includes:
in the clause, if a place name exists before the business name suffix, determining whether the place name is marked by parentheses;
if the place name is not marked by the brackets, intercepting a character string from the starting position of the place name to the ending position of the postfix of the enterprise name, and determining the character string as the complemented target enterprise name; and if the place name is marked by the brackets, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases to the front of the target enterprise name according to the position sequence in the clause to form the complemented target enterprise name.
Optionally, the completing the target business name based on each character before the business name suffix in the clause includes:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases before the target enterprise name in the order of their positions in the clause to form the completed target enterprise name.
Optionally, the method further includes:
for any clause that contains an associated keyword and for which no enterprise association relationship information has been determined, matching the word segments of the clause whose number of characters is greater than a second preset value against the enterprise names included in a preset enterprise name lexicon;
and if at least two enterprise names included in the preset enterprise name word bank are matched, determining enterprise association relation information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of any clause and the position of the associated keyword in any clause.
In the embodiment of the invention, a text to be detected is acquired and split to obtain at least one clause; among the at least one clause, the clauses containing a preset associated keyword are determined; for each clause containing the associated keyword, the sentence pattern type of the clause is determined, and the enterprise association relationship information contained in the clause is determined according to the sentence pattern type of the clause, the part of speech of the associated keyword, and the position of the associated keyword in the clause; if the number of characters of a target enterprise name in the enterprise association relationship information, excluding the enterprise name suffix it contains, is less than or equal to a first preset value, the target enterprise name is completed based on the characters before the enterprise name suffix in the clause, and the completed target enterprise name is updated into the enterprise association relationship information. In this way, the enterprise association relationship information contained in the text to be detected can be acquired directly without manual review, which improves the efficiency of extracting enterprise association relationship information.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for extracting enterprise association relation information based on a completion strategy, characterized by comprising the following steps:
acquiring a text to be detected;
splitting the text to be detected to obtain at least one clause;
determining a clause containing a preset associated keyword in the at least one clause;
determining a sentence pattern type of each clause containing the associated keyword; if the sentence pattern type of the clause is a passive sentence pattern type or a hidden relation sentence pattern type, adjusting the position of the associated keyword contained in the clause to a first position, and determining enterprise association relation information contained in the adjusted clause based on the position of the associated keyword contained in the adjusted clause and the part of speech of the associated keyword; if the sentence pattern type of the clause is an obvious relational sentence pattern type, determining enterprise association relation information contained in the clause according to the part of speech of the associated keyword in the clause and the position of the associated keyword in the clause; if the clause contains a preset parallel word, replacing the preset parallel word contained in the clause with a preset character, and determining enterprise association relation information contained in the replaced clause based on the sentence pattern type of the replaced clause, the position of the associated keyword in the replaced clause and the part of speech of the associated keyword contained in the replaced clause;
and if, in the enterprise association relation information, the number of characters of a target enterprise name excluding its enterprise name suffix is less than or equal to a first preset value, completing the target enterprise name based on the characters before the enterprise name suffix in the clause, and updating the enterprise association relation information with the completed target enterprise name.
2. The method of claim 1, wherein said completing the target enterprise name based on the characters before the enterprise name suffix in the clause comprises:
in the clause, if a place name exists before the enterprise name suffix, intercepting a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name.
3. The method according to claim 2, wherein said intercepting, in the clause, if a place name exists before the enterprise name suffix, a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name, comprises:
in the clause, if a place name exists before the enterprise name suffix, determining whether the place name is marked by parentheses;
if the place name is not marked by parentheses, intercepting a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determining the character string as the completed target enterprise name; and if the place name is marked by parentheses, determining a plurality of adjacent noun phrases existing before the target enterprise name, and adding the place name and the plurality of adjacent noun phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name.
4. The method of claim 1, wherein said completing the target enterprise name based on the characters before the enterprise name suffix in the clause comprises:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, adding the plurality of adjacent noun phrases and verb phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name.
5. The method of claim 1, further comprising:
for any clause that contains the associated keyword, from which no enterprise association relation information has been determined, and whose number of characters is greater than a second preset value, matching the clause against the enterprise names contained in a preset enterprise name lexicon;
and if at least two enterprise names included in the preset enterprise name lexicon are matched, determining enterprise association relation information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause and the position of the associated keyword in the clause.
6. A device for extracting enterprise association relation information based on a completion strategy, characterized by comprising:
the acquisition module is used for acquiring the text to be detected;
the splitting module is used for splitting the text to be detected to obtain at least one clause;
the determining module is used for determining a clause containing a preset associated keyword in the at least one clause;
the determining module is further configured to determine a sentence pattern type of each clause containing the associated keyword; if the sentence pattern type of the clause is a passive sentence pattern type or a hidden relation sentence pattern type, adjust the position of the associated keyword contained in the clause to a first position, and determine enterprise association relation information contained in the adjusted clause based on the position of the associated keyword contained in the adjusted clause and the part of speech of the associated keyword; if the sentence pattern type of the clause is an obvious relational sentence pattern type, determine enterprise association relation information contained in the clause according to the part of speech of the associated keyword in the clause and the position of the associated keyword in the clause; if the clause contains a preset parallel word, replace the preset parallel word contained in the clause with a preset character, and determine enterprise association relation information contained in the replaced clause based on the sentence pattern type of the replaced clause, the position of the associated keyword in the replaced clause and the part of speech of the associated keyword contained in the replaced clause;
and a completion module, configured to, if, in the enterprise association relation information, the number of characters of a target enterprise name excluding its enterprise name suffix is less than or equal to a first preset value, complete the target enterprise name based on the characters before the enterprise name suffix in the clause, and update the enterprise association relation information with the completed target enterprise name.
7. The apparatus of claim 6, wherein the completion module is configured to:
in the clause, if a place name exists before the enterprise name suffix, intercept a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determine the character string as the completed target enterprise name.
8. The apparatus of claim 7, wherein the completion module comprises a determination submodule and a completion submodule, wherein:
the determination submodule is configured to determine, if a place name exists before the enterprise name suffix in the clause, whether the place name is marked by parentheses;
the completion submodule is configured to, if the place name is not marked by parentheses, intercept a character string from the starting position of the place name to the ending position of the enterprise name suffix, and determine the character string as the completed target enterprise name; and, if the place name is marked by parentheses, determine a plurality of adjacent noun phrases existing before the target enterprise name, and add the place name and the plurality of adjacent noun phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name.
9. The apparatus of claim 6, wherein the completion module is configured to:
in the clause, if a plurality of adjacent noun phrases exist before the target enterprise name, add the plurality of adjacent noun phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name; or,
in the clause, if a plurality of adjacent noun phrases and verb phrases exist before the target enterprise name, add the plurality of adjacent noun phrases and verb phrases in front of the target enterprise name, in the order of their positions in the clause, to form the completed target enterprise name.
10. The apparatus of claim 6, further comprising:
a matching module, configured to, for any clause that contains the associated keyword, from which no enterprise association relation information has been determined, and whose number of characters is greater than a second preset value, match the clause against the enterprise names contained in a preset enterprise name lexicon;
wherein the determining module is further configured to, if at least two enterprise names included in the preset enterprise name lexicon are matched, determine enterprise association relation information of the at least two enterprise names based on the part of speech of the associated keyword, the sentence pattern type of the clause and the position of the associated keyword in the clause.
CN201710502217.6A 2017-06-27 2017-06-27 Enterprise association relation information extraction method and device based on completion strategy Active CN107247707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710502217.6A CN107247707B (en) 2017-06-27 2017-06-27 Enterprise association relation information extraction method and device based on completion strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710502217.6A CN107247707B (en) 2017-06-27 2017-06-27 Enterprise association relation information extraction method and device based on completion strategy

Publications (2)

Publication Number Publication Date
CN107247707A CN107247707A (en) 2017-10-13
CN107247707B true CN107247707B (en) 2020-08-04

Family

ID=60015114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710502217.6A Active CN107247707B (en) 2017-06-27 2017-06-27 Enterprise association relation information extraction method and device based on completion strategy

Country Status (1)

Country Link
CN (1) CN107247707B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608949B (en) * 2017-10-16 2019-04-16 北京神州泰岳软件股份有限公司 Text information extraction method and device based on semantic model
CN108182179B (en) * 2018-01-29 2019-07-30 北京神州泰岳软件股份有限公司 Natural language processing method and device
CN108763507A (en) 2018-05-30 2018-11-06 北京百度网讯科技有限公司 Enterprise association relation mining method and device
CN108959575B (en) * 2018-07-06 2019-09-24 北京神州泰岳软件股份有限公司 Enterprise association relation information mining method and device
CN109543002B (en) * 2018-10-19 2020-12-11 中南民族大学 Method, device and equipment for restoring abbreviated characters and storage medium
CN109902148B (en) * 2019-02-21 2023-05-26 陈包容 Automatic enterprise name completion method for address book contacts
CN111126052B (en) * 2019-12-26 2023-11-03 鼎富智能科技有限公司 Function point generation method, device, electronic equipment and computer readable storage medium
CN111369294B (en) * 2020-03-06 2023-06-23 中国铁塔股份有限公司 Software cost estimation method and device
CN111507088B (en) * 2020-04-15 2022-12-16 深圳前海微众银行股份有限公司 Sentence completion method, equipment and readable storage medium
CN111783467A (en) * 2020-07-21 2020-10-16 致诚阿福技术发展(北京)有限公司 Enterprise name identification method and device
CN112836919A (en) * 2020-11-30 2021-05-25 广东电网有限责任公司 Supplier association analysis method and device based on knowledge graph
CN113553360A (en) * 2021-07-30 2021-10-26 北京金堤征信服务有限公司 Multi-enterprise relationship analysis method, device, electronic equipment, storage medium and computer program
CN116127009A (en) * 2022-11-17 2023-05-16 上海倍通医药科技咨询有限公司 Enterprise information matching system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699645A (en) * 2013-12-26 2014-04-02 中国人民银行征信中心 System and method for identifying association relations among enterprises
CN105138652A (en) * 2015-08-28 2015-12-09 山东合天智汇信息技术有限公司 Enterprise association recognition method and system
CN105718444A (en) * 2016-01-26 2016-06-29 中国人民解放军国防科学技术大学 Financial concept and corresponding stock associating method based on news corpora and device thereof
CN106598999A (en) * 2015-10-19 2017-04-26 北京国双科技有限公司 Method and device for calculating text theme membership degree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226854A1 (en) * 2005-12-12 2013-08-29 Qin Zhang Search Methods and Various Applications


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
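Research and Application of Key Technologies of Enterprise Knowledge Bases (企业知识库关键技术的研究与应用); Zuo Ling (左玲); China Master's Theses Full-text Database, Information Science and Technology Series; 20160115; pages I138-951 *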

Also Published As

Publication number Publication date
CN107247707A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN107247707B (en) Enterprise association relation information extraction method and device based on completion strategy
US10049100B2 (en) Financial event and relationship extraction
US9864741B2 (en) Automated collective term and phrase index
RU2571373C2 (en) Method of analysing text data tonality
US9524291B2 (en) Visual display of semantic information
US10762293B2 (en) Using parts-of-speech tagging and named entity recognition for spelling correction
CN106462604B (en) Identifying query intent
US9639522B2 (en) Methods and apparatus related to determining edit rules for rewriting phrases
US9934220B2 (en) Content revision using question and answer generation
KR101495240B1 (en) Method and system for statistical context-sensitive spelling correction using confusion set
US10592236B2 (en) Documentation for version history
US11593557B2 (en) Domain-specific grammar correction system, server and method for academic text
US10606903B2 (en) Multi-dimensional query based extraction of polarity-aware content
US20190303522A1 (en) Document implementation tool for pcb refinement
GB2555207A (en) System and method for identifying passages in electronic documents
US20140149106A1 (en) Categorization Based on Word Distance
US20240028650A1 (en) Method, apparatus, and computer-readable medium for determining a data domain associated with data
US20190303437A1 (en) Status reporting with natural language processing risk assessment
US10509812B2 (en) Reducing translation volume and ensuring consistent text strings in software development
CN107908792B (en) Information pushing method and device
CN109992651A (en) A kind of problem target signature automatic identification and abstracting method
US20190303521A1 (en) Document implementation tool for pcb refinement
CN112733517B (en) Method for checking requirement template conformity, electronic equipment and storage medium
CN113919352A (en) Database sensitive data identification method and device
Nguyen-Son et al. Identifying adversarial sentences by analyzing text complexity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190905

Address after: 100089 Unit 6, Floor 3, 25 Shangdi East Road, Haidian District, Beijing

Applicant after: China Science and Technology (Beijing) Co., Ltd.

Address before: Room 601, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: Beijing Shenzhou Taiyue Software Co., Ltd.

Applicant before: China Science and Technology (Beijing) Co., Ltd.

CB02 Change of applicant information

Address after: 230000 Zone B, 19th Floor, Building A1, 3333 Xiyou Road, High-tech Zone, Hefei City, Anhui Province

Applicant after: Dingfu Intelligent Technology Co., Ltd

Address before: 100089 Unit 6, Floor 3, 25 Shangdi East Road, Haidian District, Beijing

Applicant before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.

GR01 Patent grant