CN111539806A - Method and related device for structuring announcement content - Google Patents

Info

Publication number
CN111539806A
Authority
CN
China
Prior art keywords
information
name information
model
content
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010290894.8A
Other languages
Chinese (zh)
Inventor
席丽娜
晋耀红
刘大双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co Ltd
Original Assignee
Dingfu Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co Ltd
Priority to CN202010290894.8A
Publication of CN111539806A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The embodiment of the invention provides a method and a related device for structuring announcement content, which are used to automatically identify the effective information contained in enterprise announcements and output it in a structured manner. The method comprises: obtaining announcement content; identifying first-class, second-class and third-class name information in the announcement content and marking its position information; processing the announcement content according to a preset rule so as to assimilate the first-class, second-class and third-class name information; inputting the processed announcement content into a pre-trained first model and a pre-trained second model; obtaining the entity information output by the first model and the relation information output by the second model; and aggregating and outputting the entity information and the relation information according to the preset rule. By preprocessing the announcement content and inputting the processed content into pre-trained models, the effective information in the announcement content is automatically identified and output in structured form.

Description

Method and related device for structuring announcement content
Technical Field
The present invention relates to the field of text processing, and in particular to a method and a related apparatus for structuring announcement content.
Background
Banks and some investment institutions often need to monitor the risks of companies they plan to invest in or have already invested in. Generally, the business information of a target company is obtained from the announcements published on its website, and the announcements are then sorted manually by dedicated operators.
Disclosure of Invention
The embodiment of the invention provides a method and a related device for structuring announcement content, which are used for automatically identifying effective information contained in enterprise announcements and outputting the effective information in a structured manner.
The first aspect of the present invention provides a method for structuring announcement content, including:
acquiring announcement content;
identifying first type name information, second type name information and third type name information in the announcement content, and marking position information of the first type name information, the second type name information and the third type name information, wherein the position information is used for mapping the first type name information, the second type name information and the third type name information;
processing the announcement content according to the mapping, wherein the processing is used for assimilating first class name information, second class name information and third class name information in the announcement content;
inputting the processed announcement content into a pre-trained first model and a second model, wherein the first model is used for analyzing the announcement content to obtain entity information, and the second model is used for analyzing the announcement content to obtain relationship information;
and acquiring entity information output by the first model and relationship information output by the second model, and aggregating and outputting the entity information and the relationship information according to a preset rule.
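The five steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the patent's implementation: the class `AnnouncementPipeline` and its three callables are hypothetical names, the toy lambdas stand in for the pre-trained models, and the aggregation step is reduced to order-preserving de-duplication.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A relation is represented as (head entity, relation name, tail entity).
Relation = Tuple[str, str, str]


@dataclass
class AnnouncementPipeline:
    """Hypothetical wiring of the method steps; each callable stands in for a model."""
    assimilate: Callable[[str], str]                            # name recognition + assimilation
    entity_model: Callable[[str], List[str]]                    # first model
    relation_model: Callable[[str, List[str]], List[Relation]]  # second model

    def run(self, announcement: str) -> dict:
        text = self.assimilate(announcement)             # processed announcement content
        entities = self.entity_model(text)               # entity information
        relations = self.relation_model(text, entities)  # relation information
        # Aggregation (simplified): order-preserving de-duplication of both outputs.
        return {
            "entities": list(dict.fromkeys(entities)),
            "relations": list(dict.fromkeys(relations)),
        }


# Toy stand-ins for the pre-trained models, for illustration only.
pipeline = AnnouncementPipeline(
    assimilate=lambda t: t.replace("Alpha ", "Alpha Limited Company ", 1),
    entity_model=lambda t: re.findall(r"\w+ Limited Company", t),
    relation_model=lambda t, es: (
        [(es[0], "acquisition", es[1])] if "acquires" in t and len(es) >= 2 else []
    ),
)
result = pipeline.run("Alpha acquires Beta Limited Company.")
```

Keeping each stage as a separate callable mirrors the claim structure, so a real named entity recognition model or relation model could replace a lambda without changing `run`.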
Optionally, identifying the first type name information in the announcement content includes:
inputting the announcement content into a pre-trained third model, wherein the third model is used for named entity recognition so as to obtain the first type name information.
Optionally, identifying the second type name information and the third type name information in the announcement content includes:
inputting the announcement content into a pre-trained fourth model, wherein the fourth model is used for extracting the announcement content according to rules so as to obtain the second type name information and the third type name information.
Optionally, processing the announcement content according to a preset rule includes:
replacing the second type name information with the first type name information according to the mapping;
and deleting the third type name information in the announcement content.
Optionally, after replacing the second type name information with the first type name information and deleting the third type name information in the announcement content, processing the announcement content according to a preset rule further includes:
inputting the announcement content into a pre-trained fifth model, wherein the fifth model is used for processing sentence structures that express multi-party relations so as to remove interference-type relation data.
Optionally, analyzing the announcement content by the second model to obtain the relationship information includes:
acquiring entity information output by the first model and building a relation frame according to the entity information;
and identifying the relation information contained in the announcement content according to a preset expression and the relation framework.
Optionally, the aggregating and outputting the entity information and the relationship information according to a preset rule includes:
judging whether the aggregated entity information and relationship information contain repeated content;
if so, outputting the repeated content only once.
Optionally, the aggregating and outputting the entity information and the relationship information according to a preset rule includes:
judging whether the relation information is a verb;
if so, carrying out forward aggregation on the entity information and the relationship information and outputting;
if not, performing reverse aggregation on the entity information and the relationship information and outputting.
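The two optional aggregation rules can be illustrated as follows. The patent does not define "forward" and "reverse" aggregation or how the part of speech is determined, so this sketch assumes that forward aggregation keeps the head-relation-tail order for verb relations, that reverse aggregation states the tail entity first for noun relations, and that a hypothetical lookup set `VERB_RELATIONS` replaces a real part-of-speech tagger.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for a part-of-speech tagger: relation names treated as verbs.
VERB_RELATIONS: Set[str] = {"acquires", "holds shares in", "invests in", "guarantees"}


def aggregate(triples: Iterable[Tuple[str, str, str]],
              verb_relations: Set[str] = VERB_RELATIONS) -> List[str]:
    """Aggregate (head, relation, tail) triples into output records, each output once."""
    seen: Set[str] = set()
    records: List[str] = []
    for head, relation, tail in triples:
        if relation in verb_relations:
            record = f"{head} {relation} {tail}"            # assumed forward aggregation
        else:
            record = f"{tail} is the {relation} of {head}"  # assumed reverse aggregation
        if record not in seen:                               # repeated content output once
            seen.add(record)
            records.append(record)
    return records


records = aggregate([
    ("A Limited Company", "acquires", "C Limited Company"),
    ("A Limited Company", "acquires", "C Limited Company"),
    ("A Limited Company", "subsidiary", "D Limited Company"),
])
```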
A second aspect of the present invention provides a system for structuring announcement content, comprising:
an acquisition unit configured to acquire the announcement content;
an identification unit configured to identify first-class name information, second-class name information and third-class name information in the announcement content and to mark position information of the first-class, second-class and third-class name information, wherein the position information is used for mapping the first-class, second-class and third-class name information;
a processing unit configured to process the announcement content according to the mapping, the processing being used for assimilating the first-class, second-class and third-class name information in the announcement content;
the processing unit is further configured to input the processed announcement content into a pre-trained first model and a pre-trained second model, wherein the first model is used to analyze the announcement content to obtain entity information, and the second model is used to analyze the announcement content to obtain relationship information;
and an output unit configured to acquire the entity information output by the first model and the relationship information output by the second model, and to aggregate and output the entity information and the relationship information according to a preset rule.
A third aspect of an embodiment of the present invention provides a computer apparatus, including:
a processor, a memory, an input-output device, and a bus;
the processor, the memory and the input and output equipment are respectively connected with the bus;
the processor is configured to perform the method according to any of the preceding embodiments.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any of the preceding embodiments.
According to the above technical solutions, the embodiments of the invention have the following advantages. In this embodiment, announcement content is obtained; first-class, second-class and third-class name information in the announcement content is identified and its position information is marked, the position information being used to map the three classes of name information to one another; the announcement content is processed according to the mapping so as to assimilate the three classes of name information; the processed announcement content is input into a pre-trained first model, which analyzes it to obtain entity information, and a pre-trained second model, which analyzes it to obtain relationship information; and the entity information and relationship information output by the two models are aggregated and output according to a preset rule. By preprocessing the announcement content and inputting the processed content into pre-trained models, the effective information in the announcement content is automatically identified and output in structured form.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 2 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 3 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 4 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 5 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 6 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 7 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 8 is another diagram illustrating an embodiment of a method for structuring announcement content according to an embodiment of the present invention;
FIG. 9 is a diagram of an embodiment of a system for structuring announcement content according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The embodiment of the invention provides a method and a related device for structuring announcement content, which are used for automatically identifying effective information contained in enterprise announcements and outputting the effective information in a structured manner.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of the embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of a method for structuring announcement content in the embodiment of the present invention includes:
101. acquiring announcement content;
Specifically, the invention is mainly applied where effective information needs to be obtained from the announcements published on a target website. For example, when a bank evaluates the risk level of a company to which it has lent money, part of the work is to obtain the company's business situation from the announcements disclosed on its official website in order to evaluate its risk level.
102. Identifying first type name information, second type name information and third type name information in the announcement content, and marking position information of the first type name information, the second type name information and the third type name information, wherein the position information is used for mapping the first type name information, the second type name information and the third type name information;
In this embodiment, most announcements contain a lot of redundant content, and only the key information needs to be extracted. Because the same company may be referred to by several different names in one announcement, which would degrade key-information extraction, the name information in the announcement is processed in advance.
Specifically, the present invention divides the names that may be used for the same company in an announcement into three types: the first type is full-name information, the second type is abbreviated-name information, and the third type is interference-type abbreviated-name information. For example, "XXX Limited Company" is full-name information and "XXX" is abbreviated-name information; in the form "XXX Limited Company (hereinafter referred to as "XXX")", the content in parentheses is interference-type abbreviated-name information. The content of an interference-type abbreviation is similar to that of an ordinary abbreviation; the difference is that the abbreviation defined in the present invention appears on its own, whereas the interference-type abbreviation appears together with the full name.
Specifically, the announcement content acquired from the target web page is input into a pre-trained named entity recognition model, which recognizes all company names in the announcement; an information extraction model then determines which of these names are abbreviated-name information and which are interference-type abbreviated-name information. The recognized full names, abbreviations and interference-type abbreviations are labeled, and their positions in the announcement content are marked. Because more than one company may appear in the same announcement, the position information is used to map the full name and the abbreviations of the same company to each other, that is, to link the different names of the same company in the announcement; this mapping is then used to assimilate the full name and abbreviations of the same company.
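The position marking described above can be sketched as follows. This is an illustration under assumptions: `NameMention` and `mark_positions` are hypothetical names, the names and their categories are assumed to have been recognized already, and matching the longest names first is one plausible way to keep an abbreviation from being marked again inside its own full name.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameMention:
    start: int      # character offset in the announcement
    end: int        # exclusive end offset
    text: str
    category: str   # hypothetical labels: "full" (first type), "short" (second type)


def mark_positions(text: str, names: Dict[str, str]) -> List[NameMention]:
    """Mark every occurrence of each known name, longest names first, without overlaps."""
    taken = [False] * len(text)
    mentions: List[NameMention] = []
    for name in sorted(names, key=len, reverse=True):
        for m in re.finditer(re.escape(name), text):
            if not any(taken[m.start():m.end()]):    # skip text already inside a longer name
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
                mentions.append(NameMention(m.start(), m.end(), name, names[name]))
    return sorted(mentions, key=lambda x: x.start)


text = "Alpha Limited Company signed a contract. Alpha will pay in cash."
mentions = mark_positions(text, {"Alpha Limited Company": "full", "Alpha": "short"})
```

The stored offsets are what later allow an abbreviation to be replaced, or an interference-type abbreviation deleted, at exactly the marked position.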
103. Processing the announcement content according to the mapping, wherein the processing is used for assimilating first class name information, second class name information and third class name information in the announcement content;
Specifically, to eliminate the interference that the different names of the same company cause when extracting information from the announcement content, the name information in the announcement is processed, mainly by assimilating the different names of the same company: full-name information is retained, that is, content identified as a full name is left unchanged; abbreviated-name information is replaced with the corresponding full name according to the mapping; and interference-type abbreviated-name information is deleted. Because an interference-type abbreviation is attached to the full name of the same company, deleting it is sufficient during assimilation. After name assimilation, only the full name of each company remains in the announcement content, which reduces the difficulty of extracting relation information.
104. Inputting the processed announcement content into a pre-trained first model and a second model, wherein the first model is used for analyzing the announcement content to obtain entity information, and the second model is used for analyzing the announcement content to obtain relationship information;
In this embodiment, entity information is obtained by inputting the announcement content, after enterprise-name assimilation, into the pre-trained classification model, and relation information between the entities is obtained by inputting the same content into the pre-trained relation model.
Specifically, an attribute analysis model of the announcement content is constructed, and the attribute labels corresponding to the content of the announcement are divided into an entity class and a non-entity class. In the invention, entity-class information refers to organizations and natural persons, while other information that may appear in an enterprise announcement, such as registration date, capital and business scope, is treated as non-entity-class information. A classification model is then constructed on the basis of this distinction to identify the analysis results of entity-class attribute labels; its classification rule judges whether the attribute value of an attribute label belongs to the entity class or the non-entity class, thereby dividing the content of the announcement into entity-class and non-entity-class content. After the enterprise attribute identification results are obtained, the classification model automatically classifies the attribute label names that have results, and the enterprise attribute analysis result set of the specified entity class is filtered out.
Specifically, for relation extraction, a list of the enterprise relation names to be analyzed is determined in advance and a relation model framework is built; the framework can cover all valid relations between entity-class information, such as shareholding, investment establishment, acquisition, subsidiary, parent company and provision of guarantees. A relation analysis rule expression is then established for each target relation name, and the relation model uses these rule expressions to extract, from the announcement content, the relation information that matches them.
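A minimal sketch of the entity/non-entity filtering step, assuming the attribute analysis has already produced (label, value) pairs. The label table `LABEL_CLASS` and the function `entity_results` are hypothetical; the patent does not fully specify how its classification model decides, so a fixed lookup table built from the examples in the text stands in for it here.

```python
from typing import Dict, List, Tuple

# Hypothetical label table: organizations and natural persons are entity-class;
# registration date, capital and business scope are non-entity-class.
LABEL_CLASS: Dict[str, str] = {
    "organization": "entity",
    "natural person": "entity",
    "registration date": "non-entity",
    "registered capital": "non-entity",
    "business scope": "non-entity",
}


def entity_results(attributes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (label, value) results whose label is classified as entity-class.

    Unknown labels are treated as non-entity, a conservative assumption.
    """
    return [(label, value) for label, value in attributes
            if LABEL_CLASS.get(label, "non-entity") == "entity"]


results = entity_results([
    ("organization", "Alpha Limited Company"),
    ("registration date", "2015-06-01"),
    ("natural person", "Zhang San"),
    ("business scope", "software development"),
])
```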
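One rule expression per target relation name can be sketched with regular expressions. The English patterns, the `ORG` sub-pattern and `extract_relations` are hypothetical illustrations; the patent does not publish its rule expressions, and real announcements (originally in Chinese) would need far richer patterns.

```python
import re
from typing import List, Tuple

# Hypothetical organisation pattern: a capitalised word followed by "Limited Company".
ORG = r"([A-Z]\w* Limited Company)"

# One hypothetical rule expression per target relation name.
RELATION_RULES = {
    "acquisition": re.compile(ORG + r" acquires " + ORG),
    "holding shares": re.compile(ORG + r" holds shares in " + ORG),
    "providing guarantee": re.compile(ORG + r" provides a guarantee for " + ORG),
}


def extract_relations(text: str) -> List[Tuple[str, str, str]]:
    """Return (head, relation name, tail) for every rule expression that matches."""
    triples = []
    for name, pattern in RELATION_RULES.items():
        for m in pattern.finditer(text):
            triples.append((m.group(1), name, m.group(2)))
    return triples


triples = extract_relations(
    "Alpha Limited Company acquires Beta Limited Company. "
    "Alpha Limited Company provides a guarantee for Gamma Limited Company."
)
```

These patterns only work because name assimilation has already replaced abbreviations with full names, which is the reason the patent performs assimilation before relation extraction.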
105. Acquiring the entity information output by the first model and the relationship information output by the second model, and aggregating and outputting the entity information and the relationship information according to a preset rule.
Specifically, the preprocessing, analysis and extraction of the announcement content serve to express it in a more convenient and understandable form. Therefore, the enterprise-name and natural-person information output by the first model is aggregated with the relation information output by the second model. An announcement may contain only enterprise-name information or both enterprise-name and natural-person information; in either case, the aggregated output is organized according to the relation information.
In this embodiment, the different names of each company in the announcement are identified, mapped by position and assimilated, and the processed announcement content is then input into the pre-trained first model (entity information) and second model (relationship information), whose outputs are aggregated and output according to a preset rule. In this way, the effective information in the announcement content is automatically identified and output in structured form.
In an embodiment of the present invention, based on the embodiment shown in FIG. 1, the acquisition of the first type name information is further described. Referring to FIG. 2, another embodiment of a method for structuring announcement content includes:
201. inputting the announcement content into a pre-trained third model, wherein the third model is used for named entity recognition so as to obtain the first type name information.
In the invention, the third model is a named entity recognition model. Named entity recognition is a subtask of information extraction that aims to locate named entities in text and classify them into predefined categories, such as persons, organizations, locations, time expressions, quantities, monetary values and percentages.
Specifically, the first company name that ends with a characteristic full-name suffix of common enterprises, such as "limited company" or "stock company", is taken as the company full-name information, because the company issuing the announcement necessarily appears first in the announcement content, and in its full-name form.
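The full-name heuristic can be sketched with a regular expression. This assumes English stand-ins for the full-name suffixes ("Limited Company", "Stock Company") and a capitalized-word name shape; `FULL_NAME` and `main_company_full_name` are hypothetical, and in the patent the candidate names come from the named entity recognition model rather than from a regular expression.

```python
import re
from typing import Optional

# Hypothetical English stand-ins for suffixes such as "limited company" / "stock company":
# one or more capitalised words followed by the suffix.
FULL_NAME = re.compile(r"[A-Z]\w*(?: [A-Z]\w*)* (?:Limited Company|Stock Company)")


def main_company_full_name(text: str) -> Optional[str]:
    """Return the first full-name match, taken as the announcing company's full name."""
    m = FULL_NAME.search(text)
    return m.group(0) if m else None


text = ("Announcement of Alpha Technology Limited Company on the acquisition of "
        "Beta Stock Company. Alpha will pay in cash.")
```

Because `search` returns the leftmost match, the sketch encodes the patent's assumption that the announcing company is the first full name to appear.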
In an embodiment of the present invention, based on the embodiment shown in FIG. 1, the acquisition of the second type name information and the third type name information is further described. Referring to FIG. 3, another embodiment of a method for structuring announcement content includes:
301. inputting the announcement content into a pre-trained fourth model, wherein the fourth model is used for extracting the announcement content according to rules so as to obtain the second type name information and the third type name information.
In the invention, for convenience of expression, an enterprise announcement usually uses a company's full name at the beginning and its abbreviation thereafter, and the passages that use the abbreviation may carry important information. For example, in an acquisition announcement, the full name has already been given at the beginning, so the passage describing the acquisition may use only the abbreviation; yet that passage is exactly what the invention aims to collect. The abbreviations contained in the announcement therefore need to be obtained and processed.
Specifically, the second type name information (abbreviated-name information) is extracted by inputting the announcement content into the fourth model, that is, the rule extraction model, which yields all enterprise abbreviations in the announcement content together with their position indexes in the text; the position indexes are stored.
Specifically, the third type name information is interference-type abbreviated-name information, that is, an abbreviation adjacent to a full name, such as the content in parentheses in "XXX Limited Company (hereinafter referred to as "XXX")". By inputting the announcement content into the fourth model, that is, the rule extraction model, all interference-type enterprise abbreviations in the announcement content can be extracted.
Before an interference-type abbreviation is stored, it is further checked: using all the enterprise names identified in the previous step, the model determines whether an identified enterprise name occurs within 0-1 characters before the currently identified interference-type abbreviation. When identifying whether target information is interference-type name information, the rule extraction model mainly judges whether it matches the parenthesized form. If an enterprise full name appears within 0-1 characters before the opening parenthesis, the current interference-type abbreviation paraphrases that enterprise name, so a full-name-to-abbreviation mapping is established with the identified enterprise name and the mapping is stored. Otherwise, the parenthesized content is an abbreviation paraphrasing something other than an enterprise name, and it is not stored as interference-type abbreviation information, for example (hereinafter referred to as "this major asset restructuring") and (hereinafter referred to as "the controlling enterprise").
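The 0-1 character check for interference-type abbreviations can be sketched as follows. The English bracket pattern `DEFINITION` and the function `find_interference` are hypothetical; the sketch assumes the enterprise full names have already been identified, stores a short-to-full mapping only when a full name ends 0 or 1 characters before the opening parenthesis, and ignores non-enterprise paraphrases.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical English rendering of the bracketed definition form.
DEFINITION = re.compile(r'\(hereinafter referred to as "([^"]+)"\)')


def find_interference(text: str, full_names: List[str]
                      ) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Return a short->full mapping and the spans of interference-type abbreviations."""
    mapping: Dict[str, str] = {}
    spans: List[Tuple[int, int]] = []
    for m in DEFINITION.finditer(text):
        before = text[:m.start()]
        # A full name must end 0 characters (before) or 1 character (before[:-1])
        # ahead of the opening parenthesis.
        owner = next((f for f in full_names
                      if before.endswith(f) or before[:-1].endswith(f)), None)
        if owner is not None:
            mapping[m.group(1)] = owner
            spans.append((m.start(), m.end()))
        # Otherwise the bracket paraphrases a non-enterprise term and is not stored.
    return mapping, spans


text = ('Alpha Limited Company (hereinafter referred to as "Alpha") plans a major '
        'asset restructuring (hereinafter referred to as "this restructuring"). '
        'Alpha will pay in cash.')
mapping, spans = find_interference(text, ["Alpha Limited Company"])
```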
In this embodiment of the present invention, based on the embodiment shown in FIG. 1, the assimilation of the first type, second type and third type name information is further described. Referring to FIG. 4, another embodiment of a method for structuring announcement content includes:
401. replacing the second type name information with the first type name information according to the mapping, and deleting the third type name information in the announcement content.
In the invention, in order to reduce the interference of multiple names of the same enterprise on information extraction, the information of the multiple names is assimilated.
Specifically, the full-name information is retained, that is, content identified as a full name in the announcement content is left unchanged; abbreviated-name information that has a mapping relation with a full name is replaced with that full name according to the mapping; and interference-type abbreviated-name information is deleted.
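The assimilation step can be sketched as a single pass over the text, assuming the short-to-full mapping and the character spans of the interference-type abbreviations are already known. `assimilate` is a hypothetical name; removing one preceding space with each bracket and using a longest-first regular-expression alternation are implementation assumptions, not details from the patent.

```python
import re
from typing import Dict, List, Tuple


def assimilate(text: str, short_to_full: Dict[str, str],
               interference_spans: List[Tuple[int, int]]) -> str:
    """Keep full names, replace short names with full names, delete interference spans."""
    # 1) Delete interference-type abbreviations right-to-left so earlier offsets stay
    #    valid; one preceding space is removed with each bracket (a layout assumption).
    for start, end in sorted(interference_spans, reverse=True):
        if start > 0 and text[start - 1] == " ":
            start -= 1
        text = text[:start] + text[end:]
    # 2) Longest-first alternation: at each position a full name is tried before any
    #    short name it contains, so an existing full name is never rewritten.
    names = sorted(set(short_to_full) | set(short_to_full.values()), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: short_to_full.get(m.group(0), m.group(0)), text)


text = ('Alpha Limited Company (hereinafter referred to as "Alpha") signed an agreement. '
        'Alpha will pay in cash.')
span = (text.index("("), text.index(")") + 1)
assimilated = assimilate(text, {"Alpha": "Alpha Limited Company"}, [span])
```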
In this embodiment of the present invention, based on the embodiment shown in fig. 1, the processing of any interference-class relation data is further described. Referring to fig. 5, another embodiment of the method for structuring announcement content includes:
501. Input the announcement content into a pre-trained fifth model, where the fifth model processes sentence structures that express multi-party relations so as to remove interference-class relation data.
Specifically, inputting the announcement content into the fifth model means using the pre-built data-cleaning model to preprocess sentence structures that express multi-party relations and to clean out text that would distort the relation-analysis results, for example "A acquires C through B" or "A acquires C under an agreement signed with B". Text with such a relation structure is first identified; the fragments that interfere with relation analysis, such as "through B", "held by B", and "signed with B", are then extracted and replaced with an empty string, so that each of these sentences reduces to the form "A acquires C".
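The cleaning step can be illustrated with regular expressions. This is a sketch only: the patent's fifth model is pre-trained, whereas here the interfering fragments are hypothetical English patterns, and an entity is approximated by a single capitalized token.

```python
import re

# Assumption: an entity is a single capitalized token such as "B".
ENTITY = r"[A-Z]\w*"

# Hypothetical patterns for the interfering fragments named in the text.
INTERFERENCE_PATTERNS = [
    re.compile(rf"\s+through\s+{ENTITY}"),
    re.compile(rf"\s+held\s+by\s+{ENTITY}"),
    re.compile(rf"\s+under\s+an\s+agreement\s+signed\s+with\s+{ENTITY}"),
]


def clean_relation_sentence(sentence):
    """Replace fragments that interfere with relation analysis by the empty string."""
    for pattern in INTERFERENCE_PATTERNS:
        sentence = pattern.sub("", sentence)
    return sentence
```

After cleaning, all three example sentences reduce to the two-party form that the relation rules expect.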
In this embodiment of the present invention, based on the embodiment shown in fig. 1, and to further describe relation extraction from the preprocessed announcement content, referring to fig. 6, another embodiment of the method for structuring announcement content includes:
601. Acquire the entity information output by the first model, and build a relation framework based on that entity information.
Specifically, regarding the entity information output by the first model: when the preprocessed announcement content is input into the first model, i.e. the enterprise attribute analysis model, the attribute tags of the announcement content are classified into two types, entity class and non-entity class. A classification model built on these attribute tags picks out, from the attribute analysis results, those whose tags are entity-class. Its classification rule is to judge whether each attribute value in the announcement content is entity-class or non-entity-class and to output only the content whose attribute value is entity-class.
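The output rule of the classification model can be sketched with a fixed set of tags. This is a rule-based stand-in, not the patent's classification model: the tag names in `ENTITY_TAGS` and the `Attribute` record are hypothetical, since the source only states that tags fall into an entity class and a non-entity class.

```python
from dataclasses import dataclass

# Hypothetical entity-class tags; any other tag is treated as non-entity.
ENTITY_TAGS = {"company", "shareholder", "subsidiary", "legal_representative"}


@dataclass
class Attribute:
    tag: str    # attribute tag produced by attribute analysis
    value: str  # attribute value found in the announcement


def entity_class_attributes(attributes):
    """Keep only attributes whose tag is entity-class, preserving order."""
    return [a for a in attributes if a.tag in ENTITY_TAGS]
```

Only the retained values (company and person names) are passed on to the relation model.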
Further, the second model, i.e. the association model, builds a relation framework over the entity-class information output by the first model. The framework covers all valid relation types between entity-class information, such as shareholding, investment in establishing a company, acquisition, subsidiary, parent company, and provision of guarantees. A relation-analysis rule expression is then written for each target relation name, and the relation model uses these rule expressions to extract from the announcement content the relation information that matches them.
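One way to picture the relation framework is a table from relation name to rule expression, with entity slots filled by the entities the first model output. The English rule texts and the `{E}` slot marker are hypothetical; real rules would be written against Chinese announcement phrasing.

```python
import re

# Hypothetical relation framework: relation name -> rule expression.
# "{E}" marks an entity slot; the two slots are captured as groups 1 and 2.
RELATION_RULES = {
    "acquisition": r"({E}) acquires ({E})",
    "shareholding": r"({E}) holds shares in ({E})",
    "guarantee": r"({E}) provides a guarantee for ({E})",
}


def extract_relations(text, entities):
    """Return (head, relation, tail) triples whose ends are known entities."""
    # Longest names first so that a longer entity is preferred at each slot.
    names = sorted(entities, key=len, reverse=True)
    slot = "|".join(re.escape(e) for e in names)
    triples = []
    for relation, rule in RELATION_RULES.items():
        pattern = re.compile(rule.replace("{E}", f"(?:{slot})"))
        for m in pattern.finditer(text):
            triples.append((m.group(1), relation, m.group(2)))
    return triples
```

Restricting the slots to entity-class output keeps rules from matching arbitrary noun phrases at either end of a relation.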
602. Identify the relation information contained in the announcement content according to the preset expressions and the relation framework.
Specifically, a relation is bounded at both ends by entity-class information, so relation information can be understood as a relation between two entities. In relation extraction, the entities at the two ends are enterprise names and the names of natural persons associated with enterprises. The model supports two ways of designing recognition rules for these two types of entity-class information.
The first provides two entity operators, [nt] (organization name) and [nr] (natural person name). These operators automatically recognize nt and nr entities in natural-language text using a pre-trained named entity recognition model.
The second uses semantic concepts: an organization-name list is accumulated and built in the concept resources, and it is used to analyze the relations between entity classes contained in the announcement content.
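The concept-resource approach can be sketched by expanding the [nt] and [nr] operators in a rule into alternations over accumulated name lists. The lists in `CONCEPTS` and the functions `compile_rule` and `find_relations` are hypothetical; the NER-based operator variant is not shown because it needs a trained model.

```python
import re

# Hypothetical concept resources: accumulated name lists per operator.
CONCEPTS = {
    "nt": ["Alpha Holdings Co., Ltd.", "Beta Ltd."],  # organization names
    "nr": ["Zhang San", "Li Si"],                      # natural person names
}


def compile_rule(rule):
    """Expand [nt]/[nr] operators into capture groups over the concept lists."""
    def expand(match):
        names = sorted(CONCEPTS[match.group(1)], key=len, reverse=True)
        return "(" + "|".join(re.escape(n) for n in names) + ")"
    # With a function as the replacement, its return value is inserted literally.
    return re.compile(re.sub(r"\[(nt|nr)\]", expand, rule))


def find_relations(rule, relation, text):
    """Apply one operator rule and return (head, relation, tail) triples."""
    return [(m.group(1), relation, m.group(2))
            for m in compile_rule(rule).finditer(text)]
```

For example, the rule `"[nr] is the legal representative of [nt]"` links a person name to an organization name.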
In this embodiment of the present invention, based on the embodiment shown in fig. 1, and to further describe aggregating and outputting the entity information and relation information according to a preset rule, referring to fig. 7, another embodiment of the method for structuring announcement content includes:
701. Determine whether the aggregated entity information and relation information contain duplicate content.
Specifically, the relation model outputs relation information for each occurrence of a relation in the announcement content. If the same relation is mentioned more than once, the relation model's output may contain duplicates, and the aggregated entity information and relation information will then contain duplicate content.
702. If so, output only one item for each piece of duplicate content.
Specifically, whenever the same content occurs more than once, the invention outputs only one item, so that duplicates do not degrade the user's information extraction results.
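Step 702 amounts to order-preserving deduplication. A minimal sketch, assuming aggregated items are hashable tuples; `deduplicate` is a hypothetical name.

```python
def deduplicate(items):
    """Output each repeated item only once, keeping first-occurrence order.

    dict preserves insertion order (Python 3.7+), so dict.fromkeys keeps
    the first occurrence of every item and drops later repeats.
    """
    return list(dict.fromkeys(items))
```

Preserving first-occurrence order keeps the output aligned with where relations first appear in the announcement.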
In this embodiment of the present invention, based on the embodiment shown in fig. 1, and to further describe the conditions for aggregating entity information and relation information, referring to fig. 8, another embodiment of the method for structuring announcement content includes:
801. Determine whether the relation information has verb part of speech. If so, aggregate the entity information and relation information in the forward direction and output the result; if not, aggregate them in the reverse direction and output the result.
Specifically, the entity information and relation information are aggregated using a triple-structure rule of the form XXX + XXX + XXX, i.e. a combination of three elements in which the first and last elements are entity-class information and the middle element is the relation.
If no sentence-structure operator is set, relations are analyzed directly in the order in which the triple expression was designed, and triples are identified from the positions of their elements. For "A acquires B", the relation direction is identified from the position order and represented as A-acquisition-B, which is understood correctly. For "the shareholder of A is B", however, the same positional representation A-shareholder-B may be misread. This is because, in terms of Chinese grammar, the part of speech of the relation affects how its direction should be analyzed. For a verb relation, the direction is forward, i.e. the relation-direction attribute follows the order in which the relation and its two associated entities appear; for a noun relation, the direction attribute is set to reverse. A switch for the relation's part of speech is therefore built into the sentence-structure operator of the relation model: a value of 0 indicates that a noun relation is being analyzed, and a value of 1 indicates that a verb relation is being analyzed.
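The part-of-speech switch can be sketched as follows, assuming the two entities arrive in their order of appearance in the sentence; `orient`, `Triple`, and the constant names are hypothetical, while the 0/1 values follow the description above.

```python
from typing import NamedTuple

NOUN, VERB = 0, 1  # values of the part-of-speech switch described above


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def orient(first, relation, second, pos_switch):
    """Aggregate two entities and a relation into a directed triple.

    `first` and `second` are the entities in order of appearance.
    Verb relations keep that order (forward); noun relations reverse it.
    """
    if pos_switch == VERB:
        return Triple(first, relation, second)
    return Triple(second, relation, first)
```

For "the shareholder of A is B", `orient("A", "shareholder", "B", NOUN)` yields `Triple("B", "shareholder", "A")`, read as "B is a shareholder of A".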
The method in the embodiment of the present invention is introduced above, and the embodiment of the present invention is described below from the perspective of a virtual device.
Referring to fig. 9, an embodiment of a system for structuring announcement content according to an embodiment of the present invention includes:
an acquisition unit 901, configured to acquire announcement content;
an identification unit 902, configured to identify first type name information, second type name information, and third type name information in the announcement content and to mark the position information of each, where the position information is used to map the first type, second type, and third type name information;
a processing unit 903, configured to process the announcement content according to the mapping, where the processing assimilates the first type, second type, and third type name information in the announcement content;
the processing unit 903 being further configured to input the processed announcement content into a pre-trained first model and second model, where the first model analyzes the announcement content to obtain entity information and the second model analyzes it to obtain relation information;
an output unit 904, configured to acquire the entity information output by the first model and the relation information output by the second model, and to aggregate and output them according to a preset rule.
In this embodiment, the device includes: the acquisition unit 901, which acquires announcement content; the identification unit 902, which identifies first type, second type, and third type name information in the announcement content; the processing unit 903, which processes the announcement content according to a preset rule so as to assimilate the first type, second type, and third type name information, and which further inputs the processed announcement content into the pre-trained first and second models, the first model analyzing the announcement content to obtain entity information and the second model analyzing it to obtain relation information; and the output unit 904, which acquires the entity information output by the first model and the relation information output by the second model and aggregates and outputs them according to a preset rule.
A computer device according to an embodiment of the present invention is described below from the perspective of a physical device with reference to fig. 10. An embodiment of the computer device includes:
The computer device 1000 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 1001 (e.g., one or more processors) and a memory 1005, where the memory 1005 stores one or more applications or data.
The memory 1005 may be volatile memory or persistent storage. The program stored in the memory 1005 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the central processing unit 1001 may be configured to communicate with the memory 1005 and to execute, on the computer device 1000, the series of instruction operations stored in the memory 1005.
The computer device 1000 may also include one or more power supplies 1002, one or more wired or wireless network interfaces 1003, one or more input-output interfaces 1004, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the above steps do not mean the execution sequence, and the execution sequence of each step should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for structuring announcement content, comprising:
acquiring announcement content;
identifying first type name information, second type name information and third type name information in the announcement content, and marking position information of the first type name information, the second type name information and the third type name information, wherein the position information is used for mapping the first type name information, the second type name information and the third type name information;
processing the announcement content according to the mapping, wherein the processing is used for assimilating first class name information, second class name information and third class name information in the announcement content;
inputting the processed announcement content into a pre-trained first model and a second model, wherein the first model is used for analyzing the announcement content to obtain entity information, and the second model is used for analyzing the announcement content to obtain relationship information;
and acquiring entity information output by the first model and relationship information output by the second model, and aggregating and outputting the entity information and the relationship information according to a preset rule.
2. The method of claim 1, wherein identifying the first type name information in the announcement content comprises:
inputting the announcement content into a pre-trained third model, wherein the third model performs named entity recognition to obtain the first class name information.
3. The method of claim 1, wherein identifying the second and third class name information in the announcement content comprises:
inputting the announcement content into a pre-trained fourth model, wherein the fourth model performs rule-based extraction on the announcement content to obtain the second type name information and the third type name information.
4. The method of claim 1, wherein processing the announcement content according to a preset rule comprises:
replacing the second type name information with the first type name information according to the mapping;
and deleting the third type name information in the announcement content.
5. The method according to claim 4, wherein after the replacing the second type name information with the first type name information and the deleting the third type name information in the announcement content, processing the announcement content according to a preset rule further comprises:
inputting the announcement content into a pre-trained fifth model, wherein the fifth model is used for processing a sentence pattern structure of the multi-party relation so as to remove interference type relation data.
6. The method of claim 1, wherein the second model analyzing the announcement content to obtain relationship information comprises:
acquiring entity information output by the first model and building a relation frame according to the entity information;
and identifying the relation information contained in the announcement content according to a preset expression and the relation framework.
7. The method according to claim 1, wherein before aggregating and outputting the entity information and the relationship information according to a preset rule, the method comprises:
judging whether the aggregated entity information and the relationship information have repeated content or not;
if so, only one item is output for the duplicate content.
8. The method of claim 1, wherein aggregating and outputting the entity information and the relationship information according to a preset rule comprises:
judging whether the relation information is verb part of speech;
if so, carrying out forward aggregation on the entity information and the relationship information and outputting;
if not, performing reverse aggregation on the entity information and the relationship information and outputting.
9. A system for structuring announcement content, comprising:
an acquisition unit configured to acquire announcement content;
the identification unit is used for identifying first-class name information, second-class name information and third-class name information in the announcement content and marking position information of the first-class name information, the second-class name information and the third-class name information, wherein the position information is used for mapping the first-class name information, the second-class name information and the third-class name information;
the processing unit is used for processing the announcement content according to the mapping, and the processing is used for assimilating first class name information, second class name information and third class name information in the announcement content;
the processing unit is further configured to input the processed announcement content into a pre-trained first model and a second model, where the first model is used to analyze the announcement content to obtain entity information, and the second model is used to analyze the announcement content to obtain relationship information;
and the output unit is used for acquiring the entity information output by the first model and the relationship information output by the second model, and aggregating and outputting the entity information and the relationship information according to a preset rule.
10. A computer device, comprising:
a processor, a memory, an input-output device, and a bus;
the processor, the memory and the input and output equipment are respectively connected with the bus;
the processor is configured to perform the method of any one of claims 1 to 8.
CN202010290894.8A 2020-04-14 2020-04-14 Method and related device for structuring announcement content Pending CN111539806A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010290894.8A CN111539806A (en) 2020-04-14 2020-04-14 Method and related device for structuring announcement content

Publications (1)

Publication Number Publication Date
CN111539806A true CN111539806A (en) 2020-08-14

Family

ID=71979917

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination