CN115238676A

CN115238676A - Method and device for identifying hot spots of bidding demands, storage medium and electronic equipment

Info

Publication number: CN115238676A
Application number: CN202210929119.1A
Authority: CN
Inventors: 裴迎栋; 田盼; 左芳芳; 邓丽华; 李国钦
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-08-03
Filing date: 2022-08-03
Publication date: 2022-10-25
Anticipated expiration: 2042-08-03
Also published as: CN115238676B

Abstract

The application belongs to the field of emerging information technologies and relates to a method and a device for identifying bidding requirement hotspots, a storage medium, and electronic equipment. The method comprises the following steps: preprocessing each bidding requirement text in a bidding requirement text set corresponding to a target industry to obtain a keyword corresponding to the bidding requirement text, and generating a bidding requirement corpus according to the keyword; inputting the bidding requirement corpus into a topic model, and extracting information from the bidding requirement corpus through the topic model to obtain topic-word probability; and determining a topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and determining a bidding requirement hotspot corresponding to the target industry according to the topic association matching degree. The method and the device can improve the efficiency and accuracy of bidding requirement hotspot identification.

Description

Method and device for identifying hotspot of bidding demand, storage medium and electronic equipment

Technical Field

The present application relates to the field of emerging information technologies, and in particular, to a bidding requirement hotspot identification method, a bidding requirement hotspot identification device, a computer storage medium, and an electronic device.

Background

At present, China is in an era of comprehensive development of the digital industry, which is also an important window of opportunity for developing and building 5G in China. The three major operators are actively positioning themselves in industrial digitalization and informatization and are striving to find new business opportunities.

However, with the vigorous development of emerging technologies and industries, the procurement websites of national government agencies, public institutions, and state-owned enterprises release tens of thousands of bidding notices every day, and the traditional approach of labeling requirements manually or by expert experience cannot cope with the processing and mining of massive bidding texts. At present, there are few text topic mining methods built for bidding data; topic mining in text is mainly based on the weighted probability distribution of word senses in the text, and the extraction of hotspot requirement information in the bidding industry is lacking.

It should be noted that the information disclosed in the background section above is only used to enhance understanding of the background of the present application.

Disclosure of Invention

The application aims to provide a bidding requirement hotspot identification method, a bidding requirement hotspot identification device, a computer storage medium, and electronic equipment, so that industry hotspot requirement information in bidding texts can be identified quickly, conveniently, and accurately, at least to a certain extent.

Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.

According to a first aspect of the application, a method for identifying a hotspot of a bidding demand is provided, which comprises the following steps:

preprocessing each bidding requirement text in a bidding requirement text set corresponding to a target industry to obtain a keyword corresponding to the bidding requirement text, and generating a bidding requirement corpus according to the keyword;

inputting the bidding requirement corpus into a topic model, and extracting information of the bidding requirement corpus through the topic model to obtain topic-word probability;

and determining a topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and determining a bidding requirement hotspot corresponding to the target industry according to the topic association matching degree.

According to a second aspect of the present application, there is provided a bidding requirement hotspot identification device, comprising:

a preprocessing module, configured to preprocess each bidding requirement text in a bidding requirement text set corresponding to a target industry to obtain a keyword corresponding to the bidding requirement text, and to generate a bidding requirement corpus according to the keyword;

a topic processing module, configured to input the bidding requirement corpus into a topic model and to extract information from the bidding requirement corpus through the topic model to obtain topic-word probability;

and a hotspot identification module, configured to determine the topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and to determine the bidding requirement hotspot corresponding to the target industry according to the topic association matching degree.

In one embodiment of the present application, the preprocessing module is configured to:

performing word segmentation and stop word removal on the bidding requirement text to obtain a text to be processed;

calculating the TF-IDF value of each segmented word in the text to be processed, and sorting all segmented words contained in the bidding requirement text set in descending order of TF-IDF value to obtain a segmented-word sequence;

and sequentially acquiring a preset number of segmented words from the segmented-word sequence as the keywords.

In one embodiment of the application, the bidding requirement corpus comprises a plurality of bidding requirement texts and keywords corresponding to the bidding requirement texts; the topic processing module is configured to:

acquiring a text vector corresponding to the bidding requirement text and a word vector corresponding to the keyword, and constructing an input matrix according to the text vector and the word vector;

and inputting the input matrix into the topic model, and extracting information of the input matrix through the topic model to output the topic-word probability, wherein the topic-word probability is used for indicating the probability that each keyword in the bidding requirement text corresponds to a preset topic.

In one embodiment of the present application, the topic processing module is further configured to:

before the bidding requirement corpus is input into the topic model, calculating the perplexity of the topic model for different numbers of topics;

constructing a topic number-perplexity curve according to the numbers of topics and the perplexities, and acquiring the inflection point in the topic number-perplexity curve;

and taking the number of the topics corresponding to the inflection point as the optimal number of the topics, and determining the topic-word probability based on the optimal number of the topics.
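The text does not state how the perplexity is computed. As a reference point only, the definition commonly used with LDA topic models is given below; the symbols $D$ (a held-out document set of $M$ documents), $\mathbf{w}_d$ (the words of document $d$), and $N_d$ (the number of words in document $d$) are introduced here for illustration and do not appear in the original. A lower value indicates that the model predicts the held-out text better.

```latex
\mathrm{perplexity}(D) = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```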

In an embodiment of the present application, the bidding requirement hot spot identification apparatus further includes:

the sample acquisition module is used for acquiring a bidding requirement text sample set corresponding to the target industry before the bidding requirement corpus is input into a topic model;

the sample preprocessing module is used for preprocessing each bidding requirement text sample in the bidding requirement text sample set to obtain a keyword sample corresponding to the bidding requirement text sample;

and the training module is used for generating a bidding requirement corpus sample base according to the keyword samples and training a to-be-trained topic model according to the bidding requirement corpus sample base so as to obtain the topic model.

In an exemplary embodiment of the present application, the hot spot identification module includes:

the calculation unit is used for determining a plurality of topic association matching degrees corresponding to each bidding requirement text according to the TF-IDF value corresponding to each keyword in each bidding requirement text and the topic-word probability;

and the determining unit is used for taking the maximum topic association matching degree among the plurality of topic association matching degrees corresponding to each bidding requirement text as the target topic association matching degree corresponding to that bidding requirement text.

In an exemplary embodiment of the present application, the calculation unit is configured to:

calculating the topic association matching degree according to formula (1):

wherein $p(T)$ is the topic association matching degree corresponding to the current topic $T$; $v_i$ is the $i$-th keyword in the keyword set $V$ corresponding to the current topic $T$; $\alpha$ is the topic weight, with $\alpha \in [0,1]$; $TF_{v_i}$ is the TF value corresponding to keyword $v_i$; $IDF_{v_i}$ is the IDF value corresponding to keyword $v_i$; and $p(v_i \mid T)$ is the topic-word probability of keyword $v_i$ under the current topic $T$.

In an exemplary embodiment of the present application, the hotspot identification module is further configured to:

and taking the topic corresponding to the target topic association matching degree as the bidding requirement hotspot.
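The selection of the target topic can be sketched as follows. Because the body of formula (1) is not available in this text, the way the two quantities are combined below is purely an assumption: each keyword of a topic contributes `alpha * p(v|T) + (1 - alpha) * TF-IDF(v)`, and the contributions are summed. The function names and the sample topics are likewise hypothetical; only the "take the maximum over topics" step follows the description directly.

```python
def matching_degree(text_tfidf, topic_words, alpha=0.5):
    """Score one topic against one text.

    ASSUMED form (not the patent's formula (1)): sum over the topic's keywords
    of alpha * p(v|T) + (1 - alpha) * TF-IDF(v), with TF-IDF(v) = 0 when the
    keyword does not occur in the text.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum(
        alpha * prob + (1.0 - alpha) * text_tfidf.get(word, 0.0)
        for word, prob in topic_words.items()
    )


def target_topic(text_tfidf, topics, alpha=0.5):
    """Return (topic name, degree) for the topic with the largest matching degree."""
    degrees = {
        name: matching_degree(text_tfidf, words, alpha)
        for name, words in topics.items()
    }
    best = max(degrees, key=degrees.get)
    return best, degrees[best]


# Hypothetical inputs: TF-IDF values of one text and two topic-word tables.
sample_tfidf = {"cloud": 0.3, "platform": 0.1}
sample_topics = {
    "cloud_services": {"cloud": 0.4, "platform": 0.2},
    "security": {"firewall": 0.5, "audit": 0.1},
}
hotspot, degree = target_topic(sample_tfidf, sample_topics)
```

With these made-up numbers the first topic wins because the text's own keywords reinforce it, which is the intended effect of mixing text-level TF-IDF with topic-level probabilities.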

In an exemplary embodiment of the present application, the bidding requirement hot spot identifying apparatus 600 is further configured to:

before preprocessing each bidding requirement text in a bidding requirement text set corresponding to a target industry, acquiring the bidding text set corresponding to the target industry, wherein the bidding text set comprises one or more bidding texts;

extracting a target text corresponding to the service requirement in the bidding text, and constructing the bidding requirement text set according to the target text.

According to a third aspect of the present application, there is provided a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the bidding requirement hotspot identification method described above.

According to a fourth aspect of the present application, there is provided an electronic apparatus, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the method for bidding requirement hotspot identification described above via execution of the executable instructions.

As can be seen from the foregoing technical solutions, the bid inviting requirement hotspot identification method, the bid inviting requirement hotspot identification device, the computer storage medium and the electronic device in the exemplary embodiments of the present application have at least the following advantages and positive effects:

the bidding requirement hotspot identification method comprises obtaining a bidding requirement text set corresponding to a target industry, preprocessing each bidding requirement text in the set to obtain the keywords corresponding to that text, generating a bidding requirement corpus according to the keywords, extracting information from the bidding requirement corpus through a topic model to obtain topic-word probabilities, calculating the topic association matching degree between a topic and a bidding text according to the TF-IDF values corresponding to the keywords and the topic-word probabilities, and obtaining the bidding requirement hotspots according to the topic association matching degree. On the one hand, the application effectively fuses keyword probability information with topic-word probability information to calculate the association matching degree between a topic and a bidding text, which improves the accuracy of bidding requirement hotspot identification compared with running topic model extraction directly on the original text; on the other hand, because hotspot identification relies only on the keyword probability information and the topic-word probability information, the method is convenient, simple, and quick to implement.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 schematically shows a structural diagram of a system architecture to which the method for identifying a bidding requirement hotspot in the embodiment of the present application is applied.

Fig. 2 schematically shows a flowchart of a bidding requirement hotspot identification method in an embodiment of the present application.

Fig. 3 schematically shows a flowchart of acquiring a keyword in an embodiment of the present application.

Fig. 4 schematically shows a flowchart of determining an optimal number of topics based on perplexity in an embodiment of the present application.

Fig. 5 schematically shows a flow diagram for acquiring a hotspot of a bidding requirement in the construction industry in an embodiment of the present application.

Fig. 6 schematically shows a block diagram of a structure of a bidding requirement hotspot identification device in the present application.

FIG. 7 schematically illustrates a block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.

The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/parts/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first" and "second", etc. are used merely as labels, and are not limiting on the number of their objects.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

In the related art, text topic mining on bidding documents is mainly based on the weighted probability distribution of word senses in the text. This approach cannot effectively obtain the bidding requirement hotspots in the bidding documents, so a bidding enterprise cannot grasp industry business opportunities in a timely and accurate manner, which may cause it considerable economic loss.

Aiming at the problems in the related art, the application provides a bidding requirement hotspot identification method, which relates to the field of emerging information technologies.

Before the technical solutions in the embodiments of the present application are described in detail, terms that may be involved in the embodiments are explained first.

(1) TF-IDF: Term Frequency-Inverse Document Frequency, a common weighting technique used in information retrieval and data mining, where TF is the term frequency and IDF is the inverse document frequency.

(2) LDA topic model: the Latent Dirichlet Allocation topic model, used to infer the topic distribution of documents. It gives the topic of each document in a document set in the form of a probability distribution, so that after a set of documents has been analyzed and their topic distributions extracted, topic clustering or text classification can be performed according to those distributions.

(3) Tendering (bid invitation): an industry term in tendering and bidding that refers to the act whereby a tenderer (buyer) issues a tender notice or tender document in advance, setting out the variety, quantity, technical requirements, and related transaction conditions, and invites bidders (sellers) to bid at a specified time and place.

(4) Bidding: the counterpart term to tendering, referring to the act whereby a bidder submits a bid to the tenderer within the specified period according to the conditions set out in the tender notice or tender invitation.

After introducing possible technical terms involved in the embodiments of the present application, the method for identifying a hot spot of a bidding requirement in the present application will be described in detail.

Fig. 1 schematically shows an exemplary system architecture block diagram to which the technical solution of the present application is applied.

As shown in fig. 1, system architecture 100 may include terminal device 101, server 102, and network 103. The terminal device 101 may include various electronic devices with display screens, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart television, and an intelligent vehicle-mounted terminal. The server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing a cloud computing service. The network 103 may be a communication medium of various connection types capable of providing a communication link between the terminal device 101 and the server 102, and may be, for example, a wired communication link or a wireless communication link.

The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, according to implementation needs. For example, the server may be a server group consisting of a plurality of server devices.

The technical scheme provided by the embodiment of the application can be applied to the terminal device 101 or the server 102, and when the server 102 executes the bidding requirement hotspot identification method in the application, the server can be a cloud server providing cloud computing service.

Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear to be infinitely expandable and can be acquired at any time, used on demand, expanded at any time, and paid for according to use.

A basic capability provider of cloud computing establishes a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform for short), in which multiple types of virtual resources are deployed for external clients to select and use.

According to the division of logical functions, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS covers a variety of business software, such as web portals and bulk SMS senders. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

The following describes in detail the technical solutions of the bidding requirement hotspot identification method, the bidding requirement hotspot identification device, the computer readable medium, the electronic device, and the like provided in the present application with reference to specific embodiments.

Fig. 2 is a flowchart illustrating the bidding requirement hotspot identification method. As shown in Fig. 2, the bidding requirement hotspot identification method includes:

step S210: preprocessing each bidding requirement text in a bidding requirement text set corresponding to a target industry to obtain a keyword corresponding to the bidding requirement text, and generating a bidding requirement corpus according to the keyword;

step S220: inputting the bidding requirement corpus into a topic model, and extracting information of the bidding requirement corpus through the topic model to obtain topic-word probability;

step S230: determining a topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and determining a bidding requirement hotspot corresponding to the target industry according to the topic association matching degree.

The bidding requirement hotspot identification method comprises obtaining a bidding requirement text set corresponding to a target industry, preprocessing each bidding requirement text in the set to obtain the keywords corresponding to that text, generating a bidding requirement corpus according to the keywords, extracting information from the bidding requirement corpus through a topic model to obtain topic-word probabilities, calculating the topic association matching degree between topics and bidding texts according to the TF-IDF values corresponding to the keywords and the topic-word probabilities, and obtaining the bidding requirement hotspots according to the topic association matching degrees. On the one hand, the application effectively fuses keyword probability information with topic-word probability information to calculate the association matching degree between a topic and a bidding text, which improves the accuracy of bidding requirement hotspot identification compared with running topic model extraction directly on the original text; on the other hand, because hotspot identification relies only on the keyword probability information and the topic-word probability information, the method is convenient, simple, and quick to implement.

The following describes each step of the bidding requirement hotspot identification method shown in Fig. 2 in detail.

In step S210, each bidding requirement text in the bidding requirement text set corresponding to the target industry is preprocessed to obtain a keyword corresponding to the bidding requirement text, and a bidding requirement corpus is generated according to the keyword.

In an exemplary embodiment of the present application, in order to obtain bidding requirement hotspots, a bidding text set comprising one or more bidding texts is first obtained, and the bidding requirement hotspots are then extracted from each bidding text in the set. The bidding text set may be formed by logging in to publicly available bidding websites and downloading a large number of bidding texts from them; of course, bidding texts may also be collected by other methods, which is not specifically limited in this embodiment of the present application.

In an exemplary embodiment of the application, after the bidding text set is obtained, the bidding requirement text corresponding to the service requirements can be extracted from each bidding text in the set, a bidding requirement text set can then be constructed from the bidding requirement texts so extracted, and the bidding requirement hotspots can be identified from that set. Extracting only the text corresponding to the service requirements follows from the technical problem the application addresses: the tenderer's specific requirements on the bidder are usually recorded in detail in the project service requirement part of the bidding text, and the bidding requirement hotspots are precisely among those requirements. The bidding requirement text corresponding to the service requirements can therefore be extracted directly from the bidding text, which reduces the amount of information to be processed and improves the efficiency and accuracy of bidding requirement hotspot identification.
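The description does not say how the service requirement part is located. One simple possibility, sketched below under stated assumptions, is to split a bidding text on numbered section headings and keep the section whose heading mentions the service requirements. The heading format (`"2. Title"`), the default heading pattern, the function name, and the sample document are all hypothetical.

```python
import re

# Assumed heading layout: a line such as "2. Project service requirements".
HEADING_RE = re.compile(r"^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_requirement_text(bidding_text, heading_pattern=r"service requirements?"):
    """Return the body of the first section whose heading matches, or '' if none."""
    headings = list(HEADING_RE.finditer(bidding_text))
    for pos, match in enumerate(headings):
        if re.search(heading_pattern, match.group(1), re.IGNORECASE):
            # The section body runs until the next heading or the end of the text.
            if pos + 1 < len(headings):
                end = headings[pos + 1].start()
            else:
                end = len(bidding_text)
            return bidding_text[match.end():end].strip()
    return ""


# Made-up bidding text used only to exercise the function.
sample_bidding_text = """1. Project overview
Smart campus upgrade.
2. Project service requirements
Provide a cloud platform with 5G access.
Deliver within 90 days.
3. Bidder qualifications
Valid business license.
"""
requirement_text = extract_requirement_text(sample_bidding_text)
bidding_requirement_text_set = [requirement_text]
```

Real bidding documents vary widely in layout, so a production system would likely need per-site rules or a trained section classifier rather than a single pattern.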

In an exemplary embodiment of the present application, after the bidding requirement text set is obtained, each bidding requirement text contained in the set may be preprocessed to obtain the keywords corresponding to that bidding requirement text.

Fig. 3 schematically illustrates a flowchart of obtaining keywords. As shown in Fig. 3, in step S301, word segmentation and stop word removal are performed on the bidding requirement text to obtain a text to be processed; in step S302, the TF-IDF value of each segmented word in the text to be processed is calculated, and all segmented words contained in the bidding requirement text set are sorted in descending order of TF-IDF value to obtain a segmented-word sequence; in step S303, a preset number of segmented words are sequentially obtained from the segmented-word sequence as the keywords.

In step S301, the bidding requirement text may be segmented either according to a dictionary or by a statistics-based machine learning model. Dictionary-based segmentation may use a forward maximum matching method, a reverse maximum matching method, a bidirectional matching method, and the like. Statistics-based segmentation may use a word segmenter that combines a machine learning algorithm with a dictionary, where the machine learning algorithm may specifically be a hidden Markov model (HMM), a conditional random field (CRF), a support vector machine (SVM), a deep learning algorithm, or the like. Stop words are characters or words that are automatically filtered out before or after processing natural language data or text in information retrieval, in order to save storage space and improve search efficiency; stop words may be, for example, "for example", "for", "such as", and the like.
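As an illustration of the dictionary-based option, the following is a minimal sketch of forward maximum matching followed by stop word removal. The toy dictionary, the stop word list, and the unspaced sample string are hypothetical stand-ins for a real segmentation dictionary and a real bidding requirement text.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation.

    At each position the longest dictionary word is taken; if nothing matches,
    a single character is emitted so the scan always advances.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical toy data; a real system would load a full dictionary.
toy_dictionary = {"cloud", "cloudplatform", "platform", "for", "smart", "city"}
toy_stop_words = {"for"}

segmented = forward_max_match("cloudplatformforsmartcity", toy_dictionary)
text_to_process = remove_stop_words(segmented, toy_stop_words)
```

The sample shows the defining behaviour of maximum matching: "cloudplatform" is preferred over the shorter "cloud" because the longest dictionary entry is tried first.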

In step S302, after the segmented words corresponding to the bidding requirement text are obtained, the TF-IDF value corresponding to each segmented word may be calculated. The TF value is the proportion of a given term among all terms in a given text; the IDF value is obtained by dividing the total number of texts by the number of texts containing the term and then taking the base-10 logarithm of the quotient; the TF-IDF value of each segmented word is the product of its TF value and its IDF value. When a word occurs frequently in a given bidding requirement text but appears in few texts of the bidding requirement text set, its TF-IDF value is larger and it is more likely to be a keyword. Because keywords are related to the topic of the bidding text and to the bidding requirement hotspot, keywords can be determined from all segmented words in the bidding requirement text set according to their TF-IDF values, and a bidding requirement corpus can be constructed from the keywords.
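The calculation just described can be sketched in a few lines. The function name and the three toy texts are hypothetical; the arithmetic follows the description (TF as the share of a term in its text, IDF as the base-10 logarithm of total texts over texts containing the term).

```python
import math
from collections import Counter


def tf_idf(texts):
    """texts: list of token lists. Returns one {term: TF-IDF} dict per text."""
    total_texts = len(texts)
    # Document frequency: in how many texts each term occurs at least once.
    doc_freq = Counter(term for tokens in texts for term in set(tokens))
    results = []
    for tokens in texts:
        counts = Counter(tokens)
        length = len(tokens)
        results.append({
            term: (count / length) * math.log10(total_texts / doc_freq[term])
            for term, count in counts.items()
        })
    return results


# Hypothetical segmented texts (stop words already removed).
toy_texts = [
    ["cloud", "platform", "cloud"],
    ["platform", "network"],
    ["platform", "security"],
]
toy_scores = tf_idf(toy_texts)
```

In the toy data "platform" occurs in every text, so its IDF and therefore its TF-IDF is zero, while "cloud" is frequent in one text and absent elsewhere and so receives the highest score there.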

When keywords are determined according to the TF-IDF values of the segmented words, the segmented words may specifically be sorted in descending order of TF-IDF value to form a segmented-word sequence, and a preset number of segmented words are then taken in order from the start of the sequence as the keywords, where the preset number may be, for example, 50% or 60% of the total number of segmented words.
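A minimal sketch of this selection step follows. It assumes each segmented word has already been reduced to a single TF-IDF value for the whole text set (how values from different texts are merged is not specified in the description), rounds the preset number up, and breaks ties alphabetically; all three choices, and the function name, are assumptions.

```python
import math


def select_keywords(tfidf_by_word, fraction=0.5):
    """Keep the top `fraction` of words, ranked by descending TF-IDF value."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    # Sort by value descending; the word itself is a deterministic tie-breaker.
    ranked = sorted(tfidf_by_word.items(), key=lambda item: (-item[1], item[0]))
    keep = math.ceil(len(ranked) * fraction)
    return [word for word, _ in ranked[:keep]]


# Hypothetical TF-IDF values for four segmented words.
toy_values = {"cloud": 0.30, "network": 0.10, "security": 0.20, "platform": 0.00}
keywords = select_keywords(toy_values, fraction=0.5)
```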

In step S220, the bidding requirement corpus is input into a topic model, and information extraction is performed on the bidding requirement corpus through the topic model to obtain a topic-word probability.

In an exemplary embodiment of the present application, after obtaining the bidding requirement corpus, information extraction may be performed on the bidding requirement corpus through a trained topic model to obtain topic-word probabilities corresponding to the bidding requirement text, where the topic model is specifically an LDA topic model, and may of course be other topic models, which is not specifically limited in this embodiment of the present application.

In an exemplary embodiment of the application, the bidding requirement corpus includes all bidding requirement texts in the bidding requirement text set and the keywords contained in each bidding requirement text. When information is extracted from the bidding requirement corpus through the topic model, the bidding requirement texts and the keywords in the corpus are first converted into text vectors and word vectors, an input matrix is then constructed from the text vectors and the word vectors, and the topic model extracts information from the input matrix to obtain the probability that each keyword contained in each bidding requirement text corresponds to a preset topic.
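The description leaves the form of the text vectors and word vectors open. A common input for LDA-style topic models is a bag-of-words count matrix, so the sketch below assumes that representation: each row is a text, each column a keyword, and each cell the number of times the keyword occurs in the text. The function name and the toy corpus are hypothetical.

```python
def build_input_matrix(corpus):
    """corpus: list of keyword lists, one per bidding requirement text.

    Returns (vocabulary, matrix), where matrix[i][j] counts how often
    vocabulary[j] occurs in text i. This bag-of-words layout is an assumption.
    """
    vocabulary = sorted({keyword for keywords in corpus for keyword in keywords})
    column = {keyword: j for j, keyword in enumerate(vocabulary)}
    matrix = []
    for keywords in corpus:
        row = [0] * len(vocabulary)
        for keyword in keywords:
            row[column[keyword]] += 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical corpus of two texts, already reduced to keywords.
toy_corpus = [["cloud", "platform", "cloud"], ["platform", "security"]]
vocabulary, input_matrix = build_input_matrix(toy_corpus)
```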

In an exemplary embodiment of the present application, before the LDA topic model is used to process the bidding requirement corpus to obtain the topic-word probability, an optimal topic number may be determined by an optimal topic number selection method based on topic coherence or on perplexity, and the topic-word probability corresponding to each keyword in each bidding requirement text is then determined based on the determined optimal topic number. Fig. 4 schematically shows a flowchart for determining the optimal topic number based on perplexity. As shown in fig. 4, in step S401, the perplexity corresponding to the topic model when different topic numbers are set is calculated; in step S402, a topic number-perplexity curve is constructed according to the topic numbers and the perplexities, and the inflection point in the topic number-perplexity curve is obtained; in step S403, the topic number corresponding to the inflection point is used as the optimal topic number, and the topic-word probability is determined based on the optimal topic number.
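Steps S402 and S403 may be sketched as follows. The application does not state how the inflection point is located; this sketch assumes evenly spaced topic numbers and takes the interior point with the largest second difference of the perplexity as the inflection point. The function name and the perplexity values are made up for illustration.

```python
def pick_topic_number(topic_numbers, perplexities):
    """Return the topic number at the sharpest bend of the curve, taken here
    as the interior point with the largest second difference of perplexity."""
    if len(topic_numbers) != len(perplexities) or len(topic_numbers) < 3:
        raise ValueError("need at least three (topic number, perplexity) pairs")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(perplexities) - 1):
        bend = perplexities[i - 1] - 2 * perplexities[i] + perplexities[i + 1]
        if bend > best_bend:
            best_k, best_bend = topic_numbers[i], bend
    return best_k

topic_numbers = [2, 3, 4, 5, 6]
perplexities = [950.0, 700.0, 520.0, 505.0, 498.0]  # made-up values
best = pick_topic_number(topic_numbers, perplexities)
```

In this example the perplexity falls steeply up to four topics and flattens afterwards, so four is selected as the optimal topic number.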

In an exemplary embodiment of the present application, before extracting information from a bidding requirement corpus by using a trained topic model, a topic model to be trained needs to be trained to obtain a stable topic model. When the topic model to be trained is trained, firstly, a bidding requirement text sample set corresponding to a target industry needs to be obtained, a bidding requirement corpus sample library is constructed based on the bidding requirement text sample set, and then the topic model to be trained is trained according to the bidding requirement corpus sample library. The method for obtaining the bidding requirement corpus sample library is similar to the method for obtaining the bidding requirement corpus in the embodiment of the application, and the method comprises the steps of performing word segmentation, stop word removal and other processing on the bidding requirement text samples in the bidding requirement text sample set to obtain keywords corresponding to the bidding requirement text samples, and then generating the bidding requirement corpus sample library according to the keywords.

In step S230, a topic association matching degree is determined according to the TF-IDF value corresponding to the keyword and the topic-word probability, and a bidding requirement hotspot corresponding to the target industry is determined according to the topic association matching degree.

In an exemplary embodiment of the present application, after the topic-word probability is obtained, a matching degree of the topic related to the bid text, that is, a topic association matching degree, may be determined according to the TF-IDF value and the topic-word probability corresponding to the keyword. The TF-IDF value corresponding to the keyword reflects the particularity of the current keyword, namely the capability of distinguishing the bidding text where the current keyword is located from other bidding texts, the topic-word probability reflects the correlation between the current keyword and the current topic, and the correlation between the current topic and the bidding text containing the current keyword can be accurately determined by comprehensively considering the particularity of the current keyword and the correlation between the current keyword and the current topic.

In the exemplary embodiment of the present application, the particularity of the current keyword and the correlation between the current keyword and the current topic may be comprehensively considered in a weighted summation manner, specifically, the topic association matching degree may be calculated according to the topic weight, the TF-IDF value corresponding to the keyword and the topic-word probability, and the specific calculation formula is as shown in formula (1):

where p(T) is the topic association matching degree corresponding to the current topic T, v_i is the i-th keyword in the keyword set V corresponding to the current topic T, α is the topic weight with α ∈ [0, 1], TF_vi is the TF value corresponding to keyword v_i, IDF_vi is the IDF value corresponding to keyword v_i, and p(v_i | T) is the topic-word probability between keyword v_i and the current topic T.
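The expression of formula (1) is not reproduced in this text. One form that would be consistent with the weighted summation described in this embodiment, with the topic weight applied to the topic-word probability and its complement applied to the TF-IDF value, is given below. The exact expression, including any normalization over the keyword set V, is an assumption.

```latex
p(T) = \sum_{v_i \in V} \Bigl[ \alpha \, p(v_i \mid T) + (1 - \alpha) \, TF_{v_i} \cdot IDF_{v_i} \Bigr]
```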

Further, the value of α may be set according to business needs. For example, when the service requirements in the bidding requirement text are defined broadly, without more specific requirements, α may be set relatively large, for example 0.8, so that the topic association matching degree is determined mainly by the topic-word probability between the current topic and the current keyword; when the service requirements in the bidding requirement text are defined specifically, α may be set relatively small, for example 0.2, so that the topic association matching degree is determined mainly by the TF-IDF value of the current keyword.
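The effect of the topic weight can be illustrated with the sketch below. It assumes that the matching degree is the sum, over the keywords, of α times the topic-word probability plus (1 − α) times the TF-IDF value; this weighted-sum form, the function `topic_match` and all numeric values are assumptions made for illustration.

```python
def topic_match(keywords, tfidf, topic_word_prob, alpha):
    """Assumed form: sum over keywords v of
    alpha * p(v | T) + (1 - alpha) * TF-IDF(v)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum(
        alpha * topic_word_prob.get(v, 0.0) + (1.0 - alpha) * tfidf.get(v, 0.0)
        for v in keywords
    )

keywords = ["cloud", "security"]
tfidf = {"cloud": 0.10, "security": 0.30}            # made-up TF-IDF values
topic_word_prob = {"cloud": 0.40, "security": 0.05}  # made-up p(v | T)

broad = topic_match(keywords, tfidf, topic_word_prob, alpha=0.8)
specific = topic_match(keywords, tfidf, topic_word_prob, alpha=0.2)
```

With α = 0.8 the result is dominated by the topic-word probabilities, and with α = 0.2 it is dominated by the TF-IDF values; at α = 1 and α = 0 the result reduces to the plain sum of the topic-word probabilities and of the TF-IDF values respectively.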

In an exemplary embodiment of the present application, the bidding requirement text set corresponding to the target industry includes one or more bidding requirement texts, and the bidding requirement texts may correspond to the same topic or to different topics; therefore, the bidding requirement hotspot corresponding to the target industry needs to be determined according to the topic corresponding to each bidding requirement text. In the embodiment of the application, firstly, a plurality of topic association matching degrees corresponding to each bidding requirement text can be determined according to the TF-IDF value and the topic-word probability corresponding to each keyword in each bidding requirement text; then, the maximum topic association matching degree among the plurality of topic association matching degrees corresponding to each bidding requirement text is taken as the target topic association matching degree corresponding to that bidding requirement text; and finally, the topic corresponding to the target topic association matching degree is taken as a bidding requirement hotspot corresponding to the target industry.
For example, the bidding requirement text set includes bidding requirement texts A and B, the keywords contained in text A are w1 and w2, the keywords contained in text B are w3, w4 and w5, the optimal number of topics is 3, and the corresponding topics are a, b and c. The topic association matching degrees between the different topics and the different bidding requirement texts, calculated according to formula (1) and written as {topic, text, matching degree}, are: {a, A, 0.8}, {b, A, 0.5}, {c, A, 0.6}, {a, B, 0.3}, {b, B, 0.5}, {c, B, 0.6}. It can therefore be determined that the target topic association matching degree corresponding to bidding requirement text A is 0.8 and that corresponding to bidding requirement text B is 0.6; correspondingly, topics a and c are the bidding requirement hotspots corresponding to the target industry.
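The selection in the two-text, three-topic example can be reproduced with the following sketch, in which the function `find_hotspots` and the dictionary layout are hypothetical and the matching degrees are those of the example.

```python
def find_hotspots(match):
    """match: {text_id: {topic: topic association matching degree}}.
    Returns the target topic of each text and the sorted set of hotspots."""
    # For each text, keep the topic with the maximum matching degree.
    target = {text: max(scores, key=scores.get) for text, scores in match.items()}
    return target, sorted(set(target.values()))

match = {
    "A": {"a": 0.8, "b": 0.5, "c": 0.6},
    "B": {"a": 0.3, "b": 0.5, "c": 0.6},
}
target, hotspots = find_hotspots(match)
```

Here text A is assigned topic a and text B is assigned topic c, so topics a and c are reported as the hotspots.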

The bid inviting requirement hotspot identification method can be applied to any industry, such as the building industry, the medical industry, the education industry, the traffic industry and the like. In order to make the method for identifying the hot spots of the bidding requirements clearer, the construction industry is taken as an example to specifically describe the method for identifying the hot spots of the bidding requirements.

Fig. 5 schematically illustrates a flow chart of obtaining a building industry bid solicitation requirement hotspot, as shown in fig. 5, in step S501, a bid solicitation text set corresponding to the building industry is obtained; in step S502, extracting a target text corresponding to the service requirement from the bidding texts included in the bidding text set, and constructing a bidding requirement text set according to the target text; in step S503, preprocessing the bidding requirement texts in the bidding requirement text set to obtain keywords corresponding to each bidding requirement text; in step S504, a bidding requirement corpus is constructed according to the TF-IDF values corresponding to the keywords; in step S505, the bidding requirement corpus is input to the topic model with the set optimal topic number to obtain topic-word probabilities corresponding to the keywords in the bidding requirement texts; in step S506, calculating topic association matching degrees when each bidding requirement document corresponds to different topics according to the TF-IDF value and topic-word probability corresponding to each keyword in each bidding requirement document; in step S507, obtaining a maximum topic association matching degree corresponding to each bidding requirement document; in step S508, the topic corresponding to each maximum topic association matching degree is taken as a bidding requirement hotspot corresponding to the construction industry.

According to the bidding requirement hotspot identification method in the embodiment of the application, a bidding requirement text set corresponding to a target industry is obtained, each bidding requirement text in the bidding requirement text set is preprocessed to obtain the keywords corresponding to the bidding requirement text, a bidding requirement corpus is generated according to the keywords, information extraction is performed on the bidding requirement corpus through a topic model to obtain the topic-word probability, and finally the topic association matching degree between a topic and a bidding text is calculated according to the TF-IDF values corresponding to the keywords and the topic-word probability, and the bidding requirement hotspots are obtained according to the topic association matching degree. On the one hand, the method can effectively fuse the TF-IDF information of the keywords with the topic-word probability information to calculate the association matching degree between a topic and a bidding text, which improves the identification accuracy of bidding requirement hotspots compared with applying a topic model directly to the original text; on the other hand, bidding requirement hotspots are identified using only the TF-IDF information of the keywords and the topic-word probability information, so that the bidding requirement hotspot identification method is convenient, simple and quick to implement.

Fig. 6 shows a schematic structural diagram of the bidding requirement hotspot identification device. As shown in fig. 6, the bidding requirement hotspot identification device 600 may include a preprocessing module 601, a topic processing module 602, and a hotspot identification module 603. Wherein:

a preprocessing module 601, configured to preprocess each bidding requirement text in a bidding requirement text set corresponding to a target industry to obtain a keyword corresponding to the bidding requirement text, and to generate a bidding requirement corpus according to the keyword;

a topic processing module 602, configured to input the bidding requirement corpus into a topic model, and perform information extraction on the bidding requirement corpus through the topic model to obtain a topic-word probability;

and a hotspot identification module 603, configured to determine a topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and to determine a bidding requirement hotspot corresponding to the target industry according to the topic association matching degree.

In one embodiment of the present application, the preprocessing module 601 is configured to:

calculating TF-IDF values of all the participles in the text to be processed, and sequencing all the participles contained in the bidding requirement text set from large to small according to the TF-IDF values to obtain a participle sequence;

In one embodiment of the application, the bidding requirement corpus comprises a plurality of bidding requirement texts and keywords corresponding to the bidding requirement texts; the topic processing module 602 is configured to:

In one embodiment of the present application, the topic processing module 602 is further configured to:

calculating a perplexity corresponding to the topic model when setting different topic numbers before inputting the bidding requirement corpus into the topic model;

constructing a topic number-perplexity curve according to the topic numbers and the perplexities, and acquiring the inflection point in the topic number-perplexity curve;

In an embodiment of the present application, the bidding requirement hotspot identification device 600 further includes:

the sample acquisition module is used for acquiring a bidding requirement text sample set corresponding to the target industry before the bidding requirement corpus is input into the topic model;

and the training module is used for generating a bidding requirement corpus sample library according to the keyword sample, and training a to-be-trained topic model according to the bidding requirement corpus sample library to obtain the topic model.

In an exemplary embodiment of the present application, the hot spot identification module 603 includes:

and the determining unit is used for taking the maximum topic association matching degree in the plurality of topic association matching degrees corresponding to the bidding requirement texts as the target topic association matching degree corresponding to the bidding requirement texts.

calculating the topic association matching degree according to formula (1):

where p(T) is the topic association matching degree corresponding to the current topic T, v_i is the i-th keyword in the keyword set V corresponding to the current topic T, α is the topic weight with α ∈ [0, 1], TF_vi is the TF value corresponding to keyword v_i, IDF_vi is the IDF value corresponding to keyword v_i, and p(v_i | T) is the topic-word probability between keyword v_i and the current topic T.

In an exemplary embodiment of the present application, the hotspot identification module 603 is further configured to:

and taking the topic corresponding to the target topic association matching degree as the bidding requirement hotspot.

acquiring, before each bidding requirement text in the bidding requirement text set corresponding to the target industry is preprocessed, a bidding text set corresponding to the target industry, wherein the bidding text set comprises one or more bidding texts;

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods in this application are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.

Fig. 7 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application, where the electronic device may be disposed in a terminal device or a server.

It should be noted that the computer system 700 of the electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the random access memory 703, various programs and data necessary for system operation are also stored. The central processing unit 701, the read-only memory 702, and the random access memory 703 are connected to each other via a bus 704. An Input/Output interface 705 (i.e., I/O interface) is also connected to the bus 704.

In some embodiments, the following components are connected to the input/output interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a local area network card, a modem, and the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the input/output interface 705 as necessary. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit 701, various functions defined in the system of the present application are executed.

It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make an electronic device execute the method according to the embodiments of the present application.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A bidding requirement hotspot identification method, characterized by comprising the following steps:

inputting the bidding requirement corpus into a topic model, and extracting information of the bidding requirement corpus through the topic model to obtain topic-word probability;

and determining a topic association matching degree according to the TF-IDF value corresponding to the keyword and the topic-word probability, and determining a bidding requirement hotspot corresponding to the target industry according to the topic association matching degree.

2. The method according to claim 1, wherein preprocessing each bidding requirement text in the set of bidding requirement texts corresponding to the target industry to obtain keywords corresponding to the bidding requirement text comprises:

3. The method according to claim 1, wherein the bidding requirement corpus comprises a plurality of the bidding requirement texts and keywords corresponding to each of the bidding requirement texts;

the inputting the bidding requirement corpus into a topic model, and extracting information of the bidding requirement corpus through the topic model to obtain topic-word probability comprises:

4. The method of claim 3, further comprising:

5. The method of claim 1, wherein prior to inputting the bidding requirements corpus into a topic model, the method further comprises:

acquiring a bidding demand text sample set corresponding to the target industry;

preprocessing each bidding requirement text sample in the bidding requirement text sample set to obtain a keyword sample corresponding to the bidding requirement text sample;

and generating a bidding requirement corpus sample library according to the keyword sample, and training a to-be-trained topic model according to the bidding requirement corpus sample library to obtain the topic model.

6. The method of claim 1, wherein determining a topic association match based on the TF-IDF value corresponding to the keyword and the topic-word probability comprises:

determining a plurality of topic association matching degrees corresponding to each bidding requirement text according to the TF-IDF value corresponding to each keyword in each bidding requirement text and the topic-word probability;

and taking the maximum topic association matching degree in the plurality of topic association matching degrees corresponding to the bidding requirement texts as the target topic association matching degree corresponding to the bidding requirement texts.

7. The method of claim 6, wherein determining a plurality of topic association matching degrees corresponding to each of the bidding requirement texts according to the TF-IDF value corresponding to each of the keywords in each of the bidding requirement texts and the topic-word probability comprises:

calculating the topic association matching degree according to formula (1):

where p(T) is the topic association matching degree corresponding to the current topic T, v_i is the i-th keyword in the keyword set V corresponding to the current topic T, α is the topic weight with α ∈ [0, 1], TF_vi is the TF value corresponding to keyword v_i, IDF_vi is the IDF value corresponding to keyword v_i, and p(v_i | T) is the topic-word probability between keyword v_i and the current topic T.

8. The method according to claim 6, wherein the determining of the bidding requirement hotspot corresponding to the target industry according to the topic association matching degree comprises:

9. The method of claim 1, wherein prior to preprocessing each of the set of bidding requirement texts corresponding to the target industry, the method further comprises:

acquiring a bidding text set corresponding to the target industry, wherein the bidding text set comprises one or more bidding texts;

and extracting a target text corresponding to the service requirement in the bidding text, and constructing the bidding requirement text set according to the target text.

10. A bidding requirement hotspot identification device, comprising:

the topic processing module is used for inputting the bidding requirement corpus into a topic model and processing the bidding requirement corpus through the topic model to acquire topic-word probability;

11. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the bidding requirement hotspot identification method of any one of claims 1-9.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the method of any one of claims 1 to 9 via execution of the executable instructions.