CN115357688B

CN115357688B - Enterprise list information acquisition method and device, storage medium and electronic equipment

Info

Publication number: CN115357688B
Application number: CN202211248926.3A
Authority: CN
Inventors: 李凯
Original assignee: Beijing Jindi Technology Co Ltd
Current assignee: Beijing Jindi Technology Co Ltd
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2023-02-21
Anticipated expiration: 2042-10-12
Also published as: CN115357688A

Abstract

The invention provides a method and a device for acquiring enterprise list information, a storage medium and electronic equipment. The method comprises: acquiring a title of public opinion data, judging whether preset keyword information exists in the title, and if so, preprocessing a public opinion text of the public opinion data; acquiring target data according to the title and the preprocessed public opinion text, wherein the target data comprises one or more of an enterprise listed in the negative enterprise list, a list issuing organization, a listing time and a penalty type; and taking the acquired target data as enterprise list information. Extracting enterprise list information from public opinion data in this way effectively filters out invalid information and yields the four key dimensions of an enterprise list (time, organization, company and type), which are provided to the user directly as structured information, reducing the user's cost of reading complicated public opinion data.

Description

Enterprise list information acquisition method and device, storage medium and electronic equipment

Technical Field

The invention relates to the technical field of computers, in particular to an enterprise list information acquisition method, an enterprise list information acquisition device, a storage medium, electronic equipment and a computer program product.

Background

At present, a large amount of news and public opinion content is generated every day, much of it related to blacklists. Many users want to pick out blacklist-related items from this mass of public opinion information and, at the same time, quickly learn the blacklist listing time, the issuing organization, the blacklisted enterprises, the penalty type and so on. This information helps users quickly discover important developments at related enterprises and make timely decisions. However, because the volume of news and public opinion content is huge and its content is complex, users cannot obtain the relevant information from it directly.

Most of the prior art obtains blacklist-related information either through manually constructed extraction rules or through manual browsing and screening. Because public opinion formats are complex, extraction rules alone cannot cover all cases, so the accuracy of the extraction result is low; manual browsing and screening is labor-intensive and inefficient.

Therefore, how to obtain public opinion information of a blacklist in news public opinions is a technical problem to be solved.

Disclosure of Invention

Based on this, the invention provides a method, a device, a storage medium, an electronic device and a computer program product for acquiring enterprise list information, aiming at the problem that the prior art can not effectively acquire public opinion information related to an enterprise list in news public opinions.

In a first aspect, an embodiment of the present invention provides a method for acquiring enterprise list information, where the method includes:

acquiring a title of public opinion data, judging whether preset keyword information exists in the title, and if so, preprocessing a public opinion text of the public opinion data;

acquiring target data according to the title and the preprocessed public opinion text, wherein the target data comprises: one or more of an enterprise listed in the negative enterprise list, a list issuing organization, listing time and a penalty type;

and taking the acquired target data as enterprise list information.

Optionally, the preset keyword information includes preset negative keyword information corresponding to a negative enterprise list and preset positive keyword information corresponding to a positive enterprise list,

the obtaining of the target data according to the title and the preprocessed public opinion text comprises:

traversing each sentence in the preprocessed public sentiment text, and extracting a first type of target sentence which contains negative keyword information and does not contain positive keyword information;

and extracting business entity information from the first type of target statements, and determining the businesses listed in the negative business list based on the extracted business entity information.

Optionally, if the first type of target statement cannot be extracted, or the entity information of the enterprise cannot be extracted from the first type of target statement, the following steps are performed:

respectively determining positive keyword information, negative keyword information and positioning information of an enterprise entity in the preprocessed public opinion text;

sequencing the positive keyword information, the negative keyword information and the positioning information of the enterprise entities from front to back to obtain a target array;

and determining the enterprises listed in the negative enterprise list according to the relative positions, in the target array, of the positioning information of the negative keyword information and the positioning information of the enterprise entity.

Optionally, in the pre-processed public opinion body, positive keyword information, negative keyword information, and positioning information of an enterprise entity are respectively determined, and the positive keyword information, the negative keyword information, and the respective positioning information of the enterprise entity are sorted from front to back to obtain a target array, including:

respectively determining first type positioning information of the first character of the positive keyword and the first character of the negative keyword in the preprocessed public opinion text, and sequencing all the first type positioning information from front to back to obtain an initial array;

carrying out entity identification on the preprocessed public sentiment text to obtain an enterprise entity contained in the public sentiment text;

respectively determining second type positioning information of the first character of each enterprise entity in the preprocessed public opinion text;

and inserting the second type of positioning information into the initial array, and sequencing all positioning information in the initial array from front to back to obtain a target array.

Optionally, the obtaining of target data according to the title and the preprocessed public opinion text includes:

extracting, by named entity recognition, a date entity and an organization entity from a first preset number of characters at the beginning and a second preset number of characters at the end of the public opinion text, to serve as the listing time and the list issuing organization respectively;

inputting the title and a third preset number of characters at the beginning of the public sentiment text into a classification model trained in advance, and determining the punishment type in the target data based on an output result of the classification model.

Optionally, if the date entity and the organization entity cannot be extracted by named entity recognition, the following steps are performed:

extracting a source data address of the public opinion data from the public opinion data;

and acquiring an original text from the source data address, and extracting a date entity and an organization entity from the original text to be respectively used as the listing time and the listing issuing organization.

Optionally, the extracting a source data address of the public opinion data from the public opinion data includes:

extracting all links in the public opinion data according to a first preset matching rule;

traversing the extracted links, and analyzing the link data title of the link data corresponding to each link;

screening out, as first similar titles, the link data titles whose number of characters is greater than or equal to a preset number of title characters;

acquiring a title of the public opinion data, screening a second similar title from the first similar title according to the title of the public opinion data and the first similar title, and taking corresponding link data as similar data;

acquiring the text content of the public opinion data and the text content of similar data, calculating the text similarity between the text content of the public opinion data and the text content of the similar data, and taking the address of the similar data with the highest text similarity as the source data address of the public opinion data.

Optionally, if the link does not exist in the public opinion data, or the similar data does not exist in the link data, the method further includes:

determining that no source data address exists for the public opinion data;

or forwarding the title of the public opinion data to a target search engine for a search operation to obtain at least one search result link;

and acquiring text content of link data corresponding to the search result link, calculating text similarity between the text content of the public opinion data and the text content of the link data, and taking the address of the link data with the highest text similarity as the source data address of the public opinion data.

In a second aspect, an embodiment of the present invention provides an apparatus for obtaining enterprise list information, where the apparatus includes:

the pre-processing module is used for acquiring a title of public opinion data, judging whether preset keyword information exists in the title, and if so, pre-processing the public opinion text of the public opinion data;

the target data acquisition module is used for acquiring target data according to the title and the preprocessed public sentiment text, and the target data comprises: one or more of an enterprise listed in the negative enterprise list, a list issuing organization, listing time and a penalty type;

and the enterprise list information acquisition module is used for taking the acquired target data as enterprise list information.

the target data acquisition module is used for acquiring the target data according to the title and the preprocessed public sentiment text, and is specifically used for:

Optionally, the target data obtaining module, when the first type of target statement is not extracted or the enterprise entity information is not extracted from the first type of target statement, is specifically configured to:

and determining the enterprises listed in the negative enterprise list according to the relative positions of the positioning information of the negative keyword information in the target array and the positioning information of the enterprise entity.

Optionally, the target data obtaining module is configured to determine, in the preprocessed public opinion context, positive keyword information, negative keyword information, and location information of the enterprise entity, sort the positive keyword information, the negative keyword information, and the location information of the enterprise entity from front to back, and when a target array is obtained, specifically configured to:

respectively determining first type positioning information of the first character of the positive keyword and the first character of the negative keyword in the pre-processed public opinion text, and sequencing the first type positioning information from front to back to obtain an initial array;

and inserting the second type of positioning information into the initial array, and sequencing all the positioning information in the initial array from front to back to obtain a target array.

Optionally, the target data obtaining module is configured to obtain the target data according to the title and the preprocessed public opinion text, and specifically is configured to:

extracting, by named entity recognition, a date entity and an organization entity from a first preset number of characters at the beginning and a second preset number of characters at the end of the public opinion text, to serve as the listing time and the list issuing organization respectively.

Optionally, the target data obtaining module is specifically configured to, when obtaining the target data according to the title and the preprocessed public opinion text:

In a third aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, where the computer program is used to execute the steps of the method.

In a fourth aspect, an embodiment of the present invention provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the above-described method.

The invention provides a method, a device, a storage medium and electronic equipment for acquiring enterprise list information. The method comprises: acquiring a title of public opinion data, judging whether preset keyword information exists in the title, and if so, preprocessing a public opinion text of the public opinion data; acquiring target data according to the title and the preprocessed public opinion text, wherein the target data comprises one or more of an enterprise listed in the negative enterprise list, a list issuing organization, a listing time and a penalty type; and taking the obtained target data as enterprise list information. Extracting enterprise list information from public opinion data in this way effectively filters out invalid information and yields the four key dimensions of an enterprise list (time, organization, company and type), which are provided to the user directly as structured information, reducing the user's cost of reading complicated public opinion data. Meanwhile, the scheme extracts information mainly by combining regular expressions with model-based extraction, which ensures high processing performance; for massive public opinion data the method works faster and saves manpower and material resources.

Drawings

Exemplary embodiments of the present invention may be more completely understood in consideration of the following drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a flowchart of an enterprise list information obtaining method according to an exemplary embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an apparatus of a method for obtaining enterprise list information according to an exemplary embodiment of the present invention;

Fig. 3 illustrates a schematic diagram of an electronic device provided by an exemplary embodiment of the invention;

Fig. 4 is a schematic diagram of a computer-readable medium according to an exemplary embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the present invention belongs.

In addition, the terms "first" and "second", etc. are used to distinguish different objects, and are not used to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

The embodiment of the invention provides a method and a device for acquiring enterprise list information, a storage medium and electronic equipment, which are described in the following with reference to the attached drawings.

Fig. 1 is a flowchart of a method for obtaining enterprise list information according to an exemplary embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:

Step S101: acquiring a title of the public opinion data, judging whether preset keyword information exists in the title, and if so, preprocessing the public opinion text of the public opinion data.

After the public opinion data is obtained, it is judged whether preset keyword information exists in its title. The preset keyword information consists of preselected keywords and comprises preset negative keyword information corresponding to the negative enterprise list and preset positive keyword information corresponding to the positive enterprise list. The negative keywords may be, for example, "blacklist" and "black board", and the positive keywords may be, for example, "red list" and "red board". The following description takes the keyword "blacklist" as an example.

Since the title of the public opinion data of the enterprise list type often has keyword information such as "black list", the public opinion data is divided into two types, namely the public opinion data which may contain the enterprise list information and the public opinion data which does not contain the enterprise list information, by judging whether the title of the public opinion data has the keyword information such as "black list".

By making this preliminary judgment on the title, the method can accurately identify the public opinion data that may contain enterprise list information.

If the title of the public opinion data contains the keyword "blacklist", subsequent operations are carried out on the public opinion data; if the title does not contain the keyword "blacklist", the public opinion data is discarded and no subsequent operations are performed on it.

After the title of the public opinion data is judged to contain the keyword information of the blacklist, the public opinion text of the public opinion data is obtained, and the obtained public opinion text is preprocessed.
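The title check of step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the keyword tuples, the record layout (a dict with a "title" key) and the function names are assumptions made for the example.

```python
# Illustrative keyword lists; the description only gives "blacklist"-style
# and "red list"-style keywords as examples.
NEGATIVE_KEYWORDS = ("blacklist", "black board")
POSITIVE_KEYWORDS = ("red list", "red board")


def title_has_preset_keyword(title: str) -> bool:
    """Return True if the title contains any preset keyword."""
    lowered = title.lower()
    return any(kw in lowered for kw in NEGATIVE_KEYWORDS + POSITIVE_KEYWORDS)


def filter_by_title(records):
    """Keep only the records whose title contains a preset keyword.

    Each record is assumed to be a dict with a "title" key; records that
    fail the check are dropped and receive no further processing.
    """
    return [r for r in records if title_has_preset_keyword(r["title"])]
```

Only the records that pass this filter would go on to the preprocessing step.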

Since the public opinion text is data in HTML format, the public opinion text is preprocessed before the subsequent steps are carried out.

Specifically, the preprocessing step comprises: replacing HTML tags in the HTML-format data with spaces or empty characters, replacing carriage returns with spaces, removing redundant space characters, replacing some English punctuation marks with their Chinese counterparts, and the like. When the redundant spaces are removed, the spaces that replaced HTML tags and carriage returns are removed as well.

For example, when the following public opinion texts are obtained:

“<div>·↵

<span>#2 unit phase advance test project and other procurement (Zhaoqing) project procurement</span>·↵

·<div>↵

Result announcement”

where "·" represents a space and "↵" represents a carriage return, the public opinion text is processed as follows:

First, the HTML tags and carriage returns are replaced with spaces: "····#2 unit phase advance test project and other procurement (Zhaoqing) project procurement······Result announcement";

Then the redundant spaces are removed, giving the final result: "#2 unit phase advance test project and other procurement (Zhaoqing) project procurement result announcement".

After the public opinion text is preprocessed, not only can keyword information in the text be acquired, but also the data volume can be effectively reduced.

In an actual application scenario, when the public sentiment text is preprocessed, the selection and setting of the preprocessing mode can be performed according to the actual situation, and the method is not limited here.
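The preprocessing described above could look roughly like the sketch below. The punctuation table is an assumption (the description only says that some English symbols are replaced with Chinese ones), and this version collapses each run of whitespace to a single space so that English text stays readable; for Chinese text the remaining spaces could be dropped as well.

```python
import re

# Assumed mapping; the description does not list which symbols are replaced.
PUNCT_MAP = str.maketrans({",": "，", ";": "；", "?": "？", "!": "！"})


def preprocess(html: str) -> str:
    """Strip HTML tags and line breaks, then remove redundant spaces."""
    text = re.sub(r"<[^>]+>", " ", html)   # HTML tags -> space
    text = re.sub(r"[\r\n]", " ", text)    # carriage returns / newlines -> space
    text = re.sub(r"\s+", " ", text).strip()  # collapse redundant whitespace
    return text.translate(PUNCT_MAP)       # English -> Chinese punctuation
```

For instance, `preprocess("<div> \n<span>#2 unit test</span> \n </div>\nResult announcement")` yields a single clean line with the tags and line breaks gone.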

Step S102: acquiring target data according to the title and the preprocessed public opinion text, wherein the target data comprises one or more of an enterprise listed in the negative enterprise list, a list issuing organization, a listing time and a penalty type.

After a public opinion text is preprocessed through the steps, information of one or more dimensions in four dimensions of an enterprise list is mined according to the title of the public opinion data and the public opinion text: enterprises listed in negative enterprise lists, list issuing organizations, listing time and penalty types.

Step S103: and taking the obtained target data as enterprise list information.

In an optional implementation manner, when acquiring target data according to a title and a preprocessed public sentiment text, the method comprises the following steps:

Step S1021: traversing each sentence in the preprocessed public opinion text, and extracting first-type target sentences, which contain negative keyword information and do not contain positive keyword information;

Step S1022: extracting enterprise entity information from the first-type target sentences, and determining the enterprises listed in the negative enterprise list based on the extracted enterprise entity information.

Specifically, all sentences containing a negative keyword such as "blacklist" or "black board" may be extracted from the public opinion text. For each such sentence, it is determined whether only negative keywords appear and no positive keyword (for example, "red list" or "red board") appears; if so, the sentence is regarded as a first-type target sentence. The same public opinion text may contain zero, one or more first-type target sentences.

After the first-type target sentences are obtained, the enterprise entities they contain are extracted. If an enterprise is extracted, it is returned as the result.
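A minimal sketch of the first-type target sentence selection in step S1021 is given below. The sentence splitter is deliberately naive and the keyword tuples are illustrative assumptions; enterprise entity extraction (step S1022) is left out because the description does not fix a particular recognizer.

```python
import re

NEGATIVE_KEYWORDS = ("blacklist", "black board")  # illustrative
POSITIVE_KEYWORDS = ("red list", "red board")     # illustrative

# Split right after Chinese sentence-ending punctuation, or after English
# sentence-ending punctuation followed by whitespace. Abbreviations such as
# "Co., Ltd." are not handled by this naive rule.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s and s.strip()]


def first_type_sentences(text):
    """Return sentences with a negative keyword and no positive keyword."""
    selected = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        has_negative = any(k in lowered for k in NEGATIVE_KEYWORDS)
        has_positive = any(k in lowered for k in POSITIVE_KEYWORDS)
        if has_negative and not has_positive:
            selected.append(sentence)
    return selected
```

An empty result corresponds to the case in which no first-type target sentence can be extracted.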

Optionally, if the first type of target statement cannot be extracted or the enterprise information cannot be extracted from the first type of target statement, the following steps are performed:

Step S1023: determining, in the preprocessed public opinion text, the positioning information of the positive keyword information, of the negative keyword information and of the enterprise entities respectively;

Step S1024: sorting the positioning information of the positive keyword information, the negative keyword information and the enterprise entities from front to back to obtain a target array;

Step S1025: determining the enterprises listed in the negative enterprise list according to the relative positions, in the target array, of the positioning information of the negative keyword information and the positioning information of the enterprise entities.

Specifically, first type positioning information of the first character of the positive keyword and the first character of the negative keyword in the pre-processed public opinion text can be respectively determined, and the first type positioning information is sequenced from front to back to obtain an initial array; then, entity recognition is carried out on the preprocessed public sentiment text to obtain enterprise entities contained in the public sentiment text; respectively determining second type positioning information of the first character of each enterprise entity in the pre-processed public opinion text; and finally, inserting the second type of positioning information into the initial array, and sequencing all the positioning information in the initial array from front to back to obtain a target array.

First, the positions of keywords such as "blacklist" and "red list" in the public opinion text are sorted from front to back to obtain an initial array. All enterprise entities in the text are obtained, for example, through a public opinion service. Each enterprise entity is then traversed: its position in the text is looked up, and the slot of the initial array into which that position falls is found by insertion sort. Whether the enterprise is blacklisted is judged from the position immediately preceding it in the array: if the preceding position belongs to a negative keyword such as "blacklist" or "black board", the enterprise is judged to be a blacklisted enterprise; otherwise, it is not.

For example, the text is: "XX Network, February 25 (reporter Hong XX, correspondent Zhao XX): To further implement traffic safety management for transport companies, the XX traffic police department recently published the 2021 monthly "red and black boards" of the city's passenger and freight transport companies and ride-hailing platforms. Red board (companies that better fulfil their primary responsibility for traffic safety): XX Branch of Fujian XX Group Co., Ltd.; XX County XX Automobile Transport Co., Ltd. Black board (companies with high traffic safety risk): Fujian XX Co., Ltd.; Fujian XX City XX Petroleum Gas Co., Ltd.; XX City XX District XX Co., Ltd. Traffic police reminder: passenger and freight transport, tourist passenger transport and hazardous chemical transport carry high traffic safety risk, and road traffic safety problems must not be ignored. The traffic police department reminds drivers to strictly obey traffic regulations and to drive safely and courteously!"

First, the positions of "black board" and "red board" are located. The positions below are character offsets in the original Chinese text: the first-character positions of "black board" are 75 and 131, and the first-character position of "red board" is 79. Sorting the positions from small to large gives the array A = [75, 79, 131].

The enterprise entities in the text and their positions are then obtained, for example, through an online service, with the following results: XX Branch of Fujian XX Group Co., Ltd., first-character position 98; XX County XX Automobile Transport Co., Ltd., first-character position 117; Fujian XX Co., Ltd., first-character position 145; Fujian XX City XX Petroleum Gas Co., Ltd., first-character position 156; XX City XX District XX Co., Ltd., first-character position 174.

After the company positions are obtained, the first-character position of each company is inserted into array A in turn. For example, the position of "XX Branch of Fujian XX Group Co., Ltd." is 98; inserting it into array A gives [75, 79, 98, 131]. The position preceding 98 is 79, which corresponds to "red board", so "XX Branch of Fujian XX Group Co., Ltd." is a red-board company. The position of "Fujian XX Co., Ltd." is 145; inserting it into array A gives [75, 79, 131, 145]. The preceding position is 131, which corresponds to "black board", so "Fujian XX Co., Ltd." is a black-board company. The labels of all companies are obtained in the same way.
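The position-based labelling in this worked example can be sketched with a binary search over the sorted keyword positions. As in the example, each company is looked up against the keyword-only array rather than against an array that already holds other companies. The function name, the tuple layouts and the "neg"/"pos" labels are assumptions made for the illustration.

```python
import bisect


def label_companies(keyword_positions, company_positions):
    """Label each company by the keyword that most closely precedes it.

    keyword_positions: list of (position, label), label being "neg" or "pos".
    company_positions: list of (position, name).
    Returns a dict name -> label, or None if no keyword precedes the company.
    """
    ordered = sorted(keyword_positions)
    positions = [pos for pos, _ in ordered]
    labels = {}
    for pos, name in company_positions:
        # Index at which the company position would be inserted into the array.
        idx = bisect.bisect_left(positions, pos)
        labels[name] = ordered[idx - 1][1] if idx > 0 else None
    return labels


# Positions taken from the worked example; company names shortened to A..E.
keywords = [(75, "neg"), (79, "pos"), (131, "neg")]
companies = [(98, "A"), (117, "B"), (145, "C"), (156, "D"), (174, "E")]
labels = label_companies(keywords, companies)
blacklisted = [name for name, label in labels.items() if label == "neg"]
```

With these inputs, A and B fall after the "red board" keyword at 79 and C, D and E fall after the "black board" keyword at 131, which matches the result described in the text.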

In an optional implementation, target data is acquired according to the title and the preprocessed public opinion text as follows.

The issuing organization and the publication time are usually found at the beginning or the end of the public opinion text. Therefore, the first 150 characters and the last 150 characters of the public opinion text, 300 characters in total, may be selected as the text to be extracted. That is, selection starts from the first character of the text and proceeds toward the end until 150 characters are selected, and starts from the last character of the text and proceeds toward the beginning until 150 characters are selected. Optionally, if a sentence is cut off at the 150-character boundary, the selection may continue until the complete sentence is included.

The value of 150 characters is a preferred choice; 50, 100 or 200 characters may also be selected. The number of characters may be set according to the actual situation; the value here is only an example and is not specifically limited.
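The head and tail selection, including the optional extension to a full sentence, might be sketched as follows. The set of sentence-ending characters and the function names are assumptions; `n` is expected to be a positive integer such as the 150 used above.

```python
SENTENCE_END = "。！？!?."  # assumed sentence terminators


def head_window(text: str, n: int = 150) -> str:
    """First n characters, extended forward to the end of the cut sentence."""
    end = min(n, len(text))
    while end < len(text) and text[end - 1] not in SENTENCE_END:
        end += 1
    return text[:end]


def tail_window(text: str, n: int = 150) -> str:
    """Last n characters, extended backward to the start of the cut sentence."""
    start = max(len(text) - n, 0)
    while start > 0 and text[start - 1] not in SENTENCE_END:
        start -= 1
    return text[start:]
```

The two windows together form the text that is passed to named entity recognition.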

When named entity recognition is used to extract the date entity and the organization entity, the following approach may be adopted:

Constructing a model: the model is trained with batch-labeled data and may adopt the ERNIE model. ERNIE is a pre-trained language model proposed by Baidu, built on the ideas of BERT and augmented with knowledge-graph information. Compared with BERT, ERNIE can capture semantic information between words in the text. The training method is similar to that of BERT: ERNIE is fine-tuned with labeled data to obtain a model for the specific task.

Text data from Chinese Wikipedia is input to the model as a pre-training corpus, so that the model learns the association between each character in the corpus and its context, yielding the pre-trained model.

Person name information, enterprise information and organization information in the public data set of XX Daily are manually labeled to obtain a training data set, which is then input to the pre-trained model for fine-tuning.

After the model is trained, the sentence to be predicted is input to the model and the result is obtained. For example, the sentence "On January 24, the XX Property Supervision and Management Office formally released the integrity rating results; 22 companies, including XX company and XX company, were rated unqualified, entered the black board and were publicly disclosed." is input to the model, and the model outputs one tag per character (runs of consecutive O tags are abbreviated here as "O … O"): "B-TIME I-TIME I-TIME I-TIME I-TIME O B-ORG I-ORG I-ORG I-ORG I-ORG I-ORG I-ORG I-ORG O … O B-COM I-COM I-COM O … O". Here O represents no label; B-ORG marks the first character of an organization entity and I-ORG marks a non-first character of an organization entity; B-COM marks the first character of a company entity and I-COM marks a non-first character of a company entity; B-TIME marks the first character of a time entity and I-TIME marks a non-first character of a time entity. The organization entity is obtained by taking out all text tagged B-ORG and I-ORG in the output, and the time entity is obtained by taking out all text tagged B-TIME and I-TIME.
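Turning the per-character tags back into entities can be sketched as below. This covers only the decoding step; the tagging model itself is not reproduced, and the function name and the sample characters are assumptions made for the illustration.

```python
def decode_bio(chars, tags):
    """Group characters into (entity_type, text) spans using B-/I- tags."""
    entities = []
    current_type, current_chars = None, []

    def flush():
        if current_type is not None:
            entities.append((current_type, "".join(current_chars)))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()                      # a new entity starts here
            current_type, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(ch)     # continue the open entity
        else:
            flush()                      # "O" or a stray I- tag closes it
            current_type, current_chars = None, []
    flush()
    return entities


# Hypothetical sample: a short date followed by an organization name.
sample_chars = list("1月24日，XX办公室发布")
sample_tags = (["B-TIME"] + ["I-TIME"] * 4 + ["O"]
               + ["B-ORG"] + ["I-ORG"] * 4 + ["O", "O"])
found = decode_bio(sample_chars, sample_tags)
```

Filtering `found` by type then gives the listing time (TIME spans) and the issuing organization (ORG spans).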

Penalty types can be divided into 7 types: product quality, traffic safety, delinquent wages, violation, travel, credit, and others. For the penalty type, a classification model is trained on batch-labeled data, and the classification model is built on the pre-trained ERNIE model. During prediction, the first 512 characters of the title + text are input into the model, and the model outputs the corresponding penalty type.
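The input construction for the penalty-type classifier might look like the sketch below. The `classify` callable is a hypothetical stand-in for the fine-tuned ERNIE classifier and is assumed to return a class index; the ordering of `PENALTY_TYPES` is likewise an assumption, since the text only lists the seven categories.

```python
from typing import Callable

# The seven categories named in the text; the index order is an assumption.
PENALTY_TYPES = [
    "product quality",
    "traffic safety",
    "delinquent wages",
    "violation",
    "travel",
    "credit",
    "others",
]


def build_classifier_input(title: str, body: str, max_chars: int = 512) -> str:
    """Concatenate title and body text and keep the first max_chars characters."""
    return (title + body)[:max_chars]


def predict_penalty_type(title: str, body: str, classify: Callable[[str], int]) -> str:
    """Map the class index returned by the (hypothetical) classifier to a label."""
    index = classify(build_classifier_input(title, body))
    return PENALTY_TYPES[index]
```

In practice `classify` would wrap tokenization and a forward pass of the trained model; it is injected here so the truncation logic can be checked without one.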

In an alternative implementation, if the named entity recognition method fails to extract the date entity and the organization entity, the following steps are performed:

extracting a source data address of the public opinion data from the public opinion data; and acquiring an original text from the source data address, and extracting a date entity and an organization entity from the original text to be respectively used as the listing time and the listing issuing organization.

In an optional implementation manner, when extracting a source data address of the public opinion data from the public opinion data, the method includes the following steps:

extracting all links in the public opinion data according to a first preset matching rule; traversing the extracted links, and parsing the link data title of the link data corresponding to each link; screening out, as first similar titles, the link data titles whose number of characters is greater than or equal to a preset title character count; acquiring the title of the public opinion data, screening second similar titles from the first similar titles according to the title of the public opinion data and the first similar titles, and taking the corresponding link data as similar data; acquiring the text content of the public opinion data and the text content of the similar data, calculating the text similarity between the two, and taking the address of the similar data with the highest text similarity as the source data address of the public opinion data.
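The link-based tracing steps above can be sketched as follows. This is an illustration under stated assumptions: the URL regular expression stands in for the "first preset matching rule", `difflib.SequenceMatcher` stands in for the unspecified similarity measure, the thresholds are arbitrary, and `fetch` is a hypothetical callable that returns a `(title, text)` pair for a link or `None` when the page is unavailable.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple

# Assumed stand-in for the "first preset matching rule".
LINK_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; the text does not name a specific measure."""
    return SequenceMatcher(None, a, b).ratio()


def find_source_address(
    opinion_title: str,
    opinion_text: str,
    fetch: Callable[[str], Optional[Tuple[str, str]]],
    min_title_chars: int = 5,
    title_threshold: float = 0.6,
) -> Optional[str]:
    """Return the linked address whose text is most similar to the opinion text."""
    best_url, best_score = None, -1.0
    for url in LINK_PATTERN.findall(opinion_text):
        page = fetch(url)
        if page is None:
            continue
        link_title, link_text = page
        # First similar titles: long enough to be a real headline.
        if len(link_title) < min_title_chars:
            continue
        # Second similar titles: close to the title of the opinion data.
        if similarity(opinion_title, link_title) < title_threshold:
            continue
        score = similarity(opinion_text, link_text)
        if score > best_score:
            best_url, best_score = url, score
    return best_url
```

Passing the fetcher in as a parameter keeps the sketch free of any particular HTTP or HTML-parsing library; a `None` result corresponds to the case in which no similar data is found.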

In an optional implementation manner, if no link exists in the public opinion data, or no similar data exists in the link data, the following steps may further be performed:

forwarding the title of the public opinion data to a target search engine for a search operation to obtain at least one search result link; and acquiring the text content of the link data corresponding to the search result link, calculating the text similarity between the text content of the public opinion data and the text content of the link data, and taking the address of the link data with the highest text similarity as the source data address of the public opinion data.

The target search engine may be the Baidu search engine, the Google search engine, the 360 search engine, etc., and those skilled in the art may flexibly set the target search engine according to actual needs, which is not limited herein.

It should be noted that when the target search engine returns a plurality of search result links, the top P search result links may be extracted for subsequent operations, where P is a positive integer.
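The search-engine fallback with a top-P cutoff can be sketched as below. Both `search` (title in, ranked list of result links out) and `fetch_text` (link in, page text out) are hypothetical callables, and `difflib.SequenceMatcher` is again an assumed stand-in for the similarity measure, which the text leaves open.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional


def trace_by_search(
    opinion_title: str,
    opinion_text: str,
    search: Callable[[str], List[str]],
    fetch_text: Callable[[str], Optional[str]],
    top_p: int = 5,
) -> Optional[str]:
    """Search by title, keep the top P links, return the most similar one."""
    best_url, best_score = None, -1.0
    for url in search(opinion_title)[:top_p]:
        text = fetch_text(url)
        if not text:
            continue  # page unavailable or empty
        score = SequenceMatcher(None, opinion_text, text).ratio()
        if score > best_score:
            best_url, best_score = url, score
    return best_url
```

A smaller P trades recall for fewer page fetches; the value 5 used as the default here is arbitrary.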

Optionally, if multiple similar documents share the highest text similarity, the method further includes: acquiring the webpage source code of the multiple similar documents; searching for the time attribute tag in each webpage source code to obtain the publication time of each similar document; and sorting the publication times of the multiple similar documents in chronological order, and taking the similar document with the earliest publication time as the source document of the document being traced.

Specifically, taking a similar document that is an HTML webpage as an example, after the source code of the multiple HTML webpages is obtained, the time tag (time attribute tag) in each source code can be searched for, and the publication time of the similar document is read from the content following the time tag.
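One way to realize this tie-break is sketched below. It assumes that the publication time is carried in the `datetime` attribute of an HTML `<time>` element in ISO 8601 form; the original text only speaks of a "time tag", so real pages may need a different lookup. The function names are hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TimeTagParser(HTMLParser):
    """Collect datetime attributes of <time> tags (an assumed page layout)."""

    def __init__(self) -> None:
        super().__init__()
        self.times: List[datetime] = []

    def handle_starttag(self, tag, attrs):
        if tag != "time":
            return
        value = dict(attrs).get("datetime")
        if value:
            try:
                self.times.append(datetime.fromisoformat(value))
            except ValueError:
                pass  # ignore values that are not ISO 8601


def publication_time(html: str) -> Optional[datetime]:
    """Return the first parseable <time datetime=...> value, if any."""
    parser = TimeTagParser()
    parser.feed(html)
    return parser.times[0] if parser.times else None


def earliest_document(pages: Dict[str, str]) -> Optional[str]:
    """Given {address: html}, return the address with the earliest time."""
    dated = []
    for url, html in pages.items():
        stamp = publication_time(html)
        if stamp is not None:
            dated.append((stamp, url))
    return min(dated)[1] if dated else None
```

Documents without a recognizable time tag are skipped rather than guessed at, so the function returns `None` when no candidate carries a publication time.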

Fig. 2 is a schematic structural diagram of an apparatus for obtaining enterprise list information according to an exemplary embodiment of the present invention. As shown in fig. 2, the apparatus includes:

the pre-processing module 201 is configured to obtain a title of public opinion data, determine whether preset keyword information exists in the title, and if so, pre-process a public opinion text of the public opinion data;

a target data obtaining module 202, configured to obtain target data according to a title and a preprocessed public opinion text, where the target data includes: one or more of an enterprise listed in the negative enterprise list, a list issuing organization, listing time and a penalty type;

and an enterprise list information obtaining module 203, configured to use the obtained target data as enterprise list information.

Optionally, the preset keyword information includes preset negative keyword information corresponding to the negative enterprise list and preset positive keyword information corresponding to the positive enterprise list.

Optionally, the target data obtaining module, when the first type of target statement is not extracted or the enterprise information is not extracted from the first type of target statement, is specifically configured to:

inputting the title and a third preset number of characters at the beginning of the public opinion text into a classification model trained in advance, and determining the penalty type in the target data based on an output result of the classification model.

Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

In some implementation manners of the embodiments of the present invention, the enterprise list information obtaining apparatus provided in the embodiments of the present invention has the same beneficial effects as the enterprise list information obtaining method provided in the foregoing embodiments of the present invention.

An embodiment of the present invention further provides an electronic device corresponding to the enterprise list information acquisition method provided in the foregoing embodiments. The electronic device may be a server-side electronic device, such as a server, including an independent server and a distributed server cluster; it may also be a client-side electronic device, such as a mobile phone, a notebook computer, a tablet computer or a desktop computer. In either case the electronic device executes the above enterprise list information acquisition method.

Fig. 3 is a schematic diagram of an electronic device according to an exemplary embodiment of the present invention, and as shown in fig. 3, the electronic device 40 includes: a processor 400, a memory 401, a bus 402 and a communication interface 403, wherein the processor 400, the communication interface 403 and the memory 401 are connected through the bus 402; the memory 401 stores a computer program that can be executed on the processor 400, and the processor 400 executes the method for obtaining the enterprise list information according to the present invention when executing the computer program.

The memory 401 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 403 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.

Bus 402 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 401 is configured to store a program, and the processor 400 executes the program after receiving an execution instruction, where the method for acquiring enterprise list information disclosed in any embodiment of the present invention may be applied to the processor 400, or implemented by the processor 400.

The processor 400 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 400 or by instructions in the form of software. The processor 400 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 401, and the processor 400 reads the information in the memory 401 and completes the steps of the method in combination with its hardware.

The electronic device provided by the embodiment of the invention and the enterprise list information acquisition method provided by the embodiment of the invention have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.

Referring to fig. 4, a computer-readable storage medium is shown as an optical disc 50, on which a computer program (i.e., a program product) is stored, where the computer program is executed by a processor to execute the method for obtaining the enterprise list information.

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memories (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above embodiment of the present invention and the method for acquiring the enterprise list information provided by the embodiment of the present invention are based on the same inventive concept, and have the same beneficial effects as methods adopted, operated or implemented by application programs stored in the computer-readable storage medium.

It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A method for acquiring enterprise list information is characterized by comprising the following steps:

taking the obtained target data as enterprise list information;

the preset keyword information comprises preset negative keyword information corresponding to a negative enterprise list and preset positive keyword information corresponding to a positive enterprise list, and the acquiring of the target data according to the title and the preprocessed public opinion text comprises:

traversing each sentence in the preprocessed public sentiment text, and extracting a first type of target sentence which contains negative keyword information and does not contain positive keyword information; extracting enterprise entity information from the first type of target statements, and determining enterprises listed in a negative enterprise list based on the extracted enterprise entity information;

if the first type of target sentences cannot be extracted or the enterprise entity information cannot be extracted from the first type of target sentences, the following steps are executed:

respectively determining positive keyword information, negative keyword information and positioning information of an enterprise entity in the preprocessed public opinion text; sorting the positive keyword information, the negative keyword information and the positioning information of the enterprise entities from front to back to obtain a target array; and determining the enterprises listed in the negative enterprise list according to the relative positions of the positioning information of the negative keyword information in the target array and the positioning information of the enterprise entity.

2. The method of claim 1, wherein the determining positive keyword information, negative keyword information, and positioning information of the business entity in the pre-processed public opinion body, respectively, and sorting the positive keyword information, the negative keyword information, and the positioning information of the business entity from front to back to obtain a target array comprises:

3. The method for obtaining the enterprise list information according to claim 1, wherein the obtaining of the target data according to the title and the preprocessed public sentiment text comprises:

4. The method for obtaining enterprise list information according to any one of claims 1-3, wherein the obtaining target data according to the title and the preprocessed public opinion text comprises:

5. The method of claim 3, wherein if the date entity and the organization entity cannot be extracted by named entity recognition, the following steps are performed:

6. The method for obtaining enterprise list information according to claim 5, wherein the extracting the source data address of the public opinion data from the public opinion data includes:

acquiring a title of the public opinion data, screening a second similar title from a first similar title according to the title of the public opinion data and the first similar title, and taking corresponding link data as similar data;

7. The method of claim 6, wherein if the link does not exist in the public opinion data or the similar data does not exist in the link data, the method further comprises:

8. An apparatus for obtaining enterprise list information, the apparatus comprising:

the enterprise list information acquisition module is used for taking the acquired target data as enterprise list information;

the preset keyword information comprises preset negative keyword information corresponding to a negative enterprise list and preset positive keyword information corresponding to a positive enterprise list, and the target data acquisition module is specifically used for acquiring target data according to a title and a preprocessed public opinion text:

if the first type of target sentences cannot be extracted or the enterprise entity information cannot be extracted from the first type of target sentences, the following steps are executed: respectively determining positive keyword information, negative keyword information and positioning information of an enterprise entity in the preprocessed public opinion text; sorting the positive keyword information, the negative keyword information and the positioning information of the enterprise entities from front to back to obtain a target array; and determining the enterprises listed in the negative enterprise list according to the relative positions of the positioning information of the negative keyword information in the target array and the positioning information of the enterprise entity.

9. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing the processor-executable instructions;

the processor is used for reading the executable instructions from the memory and executing the executable instructions to realize the method of any one of the claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for performing the method of any of the preceding claims 1-7.
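For illustration only, the sentence-level extraction and the position-based fallback recited in claim 1 can be sketched as follows. The sentence splitter, the sample keywords, and in particular the attribution rule (an enterprise counts as negatively listed when the first keyword after it in the target array is a negative keyword) are assumptions made for this sketch; the claim text above states only that relative positions in the target array are used. Enterprise names are assumed to come from a separate entity recognizer.

```python
import re
from typing import List, Tuple


def first_type_sentences(text: str, negative: List[str], positive: List[str]) -> List[str]:
    """Sentences containing a negative keyword and no positive keyword."""
    sentences = [s.strip() for s in re.split(r"[。！？.!?]", text) if s.strip()]
    return [
        s for s in sentences
        if any(k in s for k in negative) and not any(k in s for k in positive)
    ]


def positions(text: str, terms: List[str], kind: str) -> List[Tuple[int, str, str]]:
    """(offset, kind, term) for every occurrence of every term."""
    return [
        (m.start(), kind, term)
        for term in terms
        for m in re.finditer(re.escape(term), text)
    ]


def negative_listed_by_position(
    text: str, enterprises: List[str], negative: List[str], positive: List[str]
) -> List[str]:
    """Fallback: sort all hits front to back and apply the assumed rule."""
    target_array = sorted(
        positions(text, positive, "POS")
        + positions(text, negative, "NEG")
        + positions(text, enterprises, "COM")
    )
    listed: List[str] = []
    for i, (_, kind, term) in enumerate(target_array):
        if kind != "COM":
            continue
        # Assumed rule: look at the first keyword after this enterprise.
        following = next((k for _, k, _ in target_array[i + 1:] if k != "COM"), None)
        if following == "NEG" and term not in listed:
            listed.append(term)
    return listed
```

With the hypothetical sentence "Company A and Company B entered the blacklist, while Company C entered the red list", the first function finds no first-type sentence because both keyword kinds occur together, and the fallback attributes Company A and Company B to the negative list.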