CN112434158B

CN112434158B - Enterprise tag acquisition method, enterprise tag acquisition device, storage medium and computer equipment

Info

Publication number: CN112434158B
Application number: CN202011264990.1A
Authority: CN
Inventors: 柴源
Original assignee: Haichuanghui Technology Entrepreneurship Development Co ltd
Current assignee: Haichuanghui Technology Entrepreneurship Development Co ltd
Priority date: 2020-11-13
Filing date: 2020-11-13
Publication date: 2024-05-28
Anticipated expiration: 2040-11-13
Also published as: CN112434158A

Abstract

The invention discloses an enterprise tag acquisition method, an acquisition device, a storage medium and computer equipment, wherein the enterprise tag acquisition method not only extracts keywords based on enterprise basic information texts, enterprise financing texts and enterprise business model texts for describing enterprises, but also screens candidate keywords according to the positions, parts of speech, repetition times, independent ideographic capacity, heat and the like of the candidate keywords, and can take the candidate keywords which are more focused by investors as enterprise tags, so that the investors can quickly find target enterprises through the enterprise tags.

Description

Enterprise tag acquisition method, enterprise tag acquisition device, storage medium and computer equipment

Technical Field

The invention relates to the technical field of enterprise classification in the financial industry, in particular to an enterprise label acquisition method, an enterprise label acquisition device, a storage medium and computer equipment.

Background

With the progress of science and technology and rapid economic development, some enterprises need to bring in investors in order to expand. Investors, in turn, often have to pick out the content they are interested in from massive amounts of data when selecting enterprises, which greatly reduces their search efficiency.

Disclosure of Invention

The technical problem solved by the invention is to provide an enterprise tag acquisition method, an enterprise tag acquisition device, a storage medium and computer equipment, so that an investor can search an enterprise by using the enterprise tag, and the enterprise searching efficiency is improved.

The technical scheme adopted by the invention comprises the following specific contents:

An enterprise tag acquisition method comprises the following steps:

acquiring a text to be extracted, wherein the text to be extracted comprises at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determining the text type of the text to be extracted according to the content of the text to be extracted;

Segmenting the text to be extracted of each text type to obtain candidate keywords, and obtaining initial weights of the candidate keywords;

Obtaining similarity values of each candidate keyword and candidate keywords of other text types;

Acquiring a heat value of each candidate keyword;

Obtaining a weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword;

and determining the candidate keywords with the weight optimization values exceeding a preset threshold as enterprise tags.
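The final two steps (combining the three per-keyword values and applying the threshold) can be sketched as follows. This is a minimal illustration, not the patented formula: the combination rule is passed in as a parameter, and `select_tags`, `example_combine` and all sample values are hypothetical names and data introduced only for this example.

```python
from typing import Callable, Dict, List


def select_tags(
    initial_weight: Dict[str, float],
    similarity: Dict[str, float],
    heat: Dict[str, float],
    combine: Callable[[float, float, float], float],
    threshold: float,
) -> List[str]:
    """Return the keywords whose weight optimization value exceeds the threshold."""
    tags = []
    for word, w0 in initial_weight.items():
        # Missing similarity or heat values default to 0.0 (an assumption).
        optimized = combine(w0, similarity.get(word, 0.0), heat.get(word, 0.0))
        if optimized > threshold:
            tags.append(word)
    return tags


def example_combine(w0: float, sim: float, heat: float) -> float:
    # HYPOTHETICAL rule: scale the initial weight up by similarity and heat.
    # The source does not reproduce its own combination formula.
    return w0 * (1.0 + sim) * (1.0 + heat)


weights = {"robotics": 0.5, "growth": 0.2, "series-a": 0.3}
sims = {"robotics": 0.9, "growth": 0.1, "series-a": 0.6}
heats = {"robotics": 1.0, "growth": 0.0, "series-a": 0.5}
tags = select_tags(weights, sims, heats, example_combine, threshold=0.6)
print(tags)  # ['robotics', 'series-a']
```

Keeping the combination rule as an argument lets the thresholding logic stay unchanged whichever weighting formula is actually used.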

As a preferred form of the above scheme, obtaining the initial weight of each candidate keyword comprises the following steps:

Obtaining a position parameter r_i1 of the candidate keyword according to the position of the candidate keyword in the text to be extracted: when the candidate keyword appears in both the title and the body of the text to be extracted, r_i1 = 2; when the candidate keyword appears in only the title or only the body of the text to be extracted, r_i1 = 1;

Obtaining a repetition parameter r_i2 of the candidate keyword according to the number of times the candidate keyword is repeated in the text to be extracted, r_i2 being computed from a_i and n, wherein: a_i is the number of repetitions of the i-th candidate keyword, and n is the number of candidate keywords;

Obtaining an expression parameter r_i3 of the candidate keyword according to the independent ideographic capability of the candidate keyword in the text to be extracted: when the candidate keyword can express meaning independently, r_i3 = 1; when the candidate keyword cannot express meaning independently, r_i3 = 0;

Obtaining a part-of-speech parameter r_i4 of the candidate keyword according to the part of speech of the candidate keyword in the text to be extracted: when the candidate keyword is a verb, an adjective, a quantifier or a pronoun, r_i4 = 0; when the candidate keyword is a noun, r_i4 = 1;

Obtaining the initial weight ω_i0 of the candidate keyword from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter, wherein: n is the number of candidate keywords.
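The four parameters can be sketched in code as below. The values of r_i1, r_i3 and r_i4 follow the rules stated above. The formulas for r_i2 and ω_i0 are not reproduced in this text, so the sketch ASSUMES a relative frequency for r_i2 and a normalized sum of the four parameters for ω_i0; it also assumes that any part of speech other than a noun scores 0. The `Candidate` class and all sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    word: str
    in_title: bool
    in_body: bool
    count: int        # a_i: number of repetitions in the text
    standalone: bool  # True if the word can express meaning independently
    pos: str          # part of speech, e.g. "noun", "verb"


def parameters(c: Candidate, total_count: int) -> List[float]:
    """Return [r_i1, r_i2, r_i3, r_i4] for one candidate keyword."""
    r1 = 2.0 if (c.in_title and c.in_body) else 1.0
    r2 = c.count / total_count          # ASSUMPTION: relative frequency
    r3 = 1.0 if c.standalone else 0.0
    r4 = 1.0 if c.pos == "noun" else 0.0  # ASSUMPTION: non-nouns score 0
    return [r1, r2, r3, r4]


def initial_weights(cands: List[Candidate]) -> Dict[str, float]:
    """ASSUMPTION: initial weight = parameter sum, normalized over all candidates."""
    total = sum(c.count for c in cands)
    raw = {c.word: sum(parameters(c, total)) for c in cands}
    norm = sum(raw.values())
    return {word: value / norm for word, value in raw.items()}


cands = [
    Candidate("robot", in_title=True, in_body=True, count=3, standalone=True, pos="noun"),
    Candidate("develop", in_title=False, in_body=True, count=1, standalone=True, pos="verb"),
]
w0 = initial_weights(cands)
print(parameters(cands[0], 4))  # [2.0, 0.75, 1.0, 1.0]
```

Under these assumptions a noun that appears in both title and body and is repeated often receives the largest initial weight, which matches the screening intent described above.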

As a preferred form of the above scheme, obtaining the similarity value between each candidate keyword and the candidate keywords of other text types comprises the following steps:

Constructing a first vector A from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword, the first vector being A = (r_i1, r_i2, r_i3, r_i4), wherein: r_i1, r_i2, r_i3 and r_i4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the i-th candidate keyword;

Constructing a second vector B from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of an associated word, the second vector being B = (r_j1, r_j2, r_j3, r_j4), wherein: r_j1, r_j2, r_j3 and r_j4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the j-th candidate keyword, and the associated word is a candidate keyword of another text type;

Calculating the similarity value of the candidate keyword and the associated word from the first vector and the second vector by means of a similarity formula.
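The similarity formula itself is not reproduced in this text. A common choice for comparing two parameter vectors is cosine similarity, which the sketch below ASSUMES; the function name and the sample vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# A = (r_i1, r_i2, r_i3, r_i4) for a keyword from one text type,
# B = (r_j1, r_j2, r_j3, r_j4) for an associated word from another text type.
A = (2.0, 0.75, 1.0, 1.0)
B = (1.0, 0.25, 1.0, 0.0)
sim = cosine_similarity(A, B)
```

With non-negative parameters the result lies between 0 and 1, so it can be compared across keyword pairs without further scaling.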

As a preferred form of the above scheme, obtaining the heat value of each candidate keyword comprises the following steps:

Taking the candidate keywords as statistical items to count the vocabulary heat of the candidate keywords;

Taking each set of candidate keywords as a statistical item to count the aggregate heat, that is, the heat of several candidate keywords being attended to by investors at the same time;

Adding the vocabulary heat and the aggregate heat to obtain the retrieval heat of the candidate keywords.
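The two counts can be sketched with plain counters over search queries. The sketch below is illustrative only: the query data is invented, and how the aggregate heat is attributed to an individual keyword is an ASSUMPTION (here, a keyword receives the counts of every co-occurring set that contains it).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def count_heat(queries: Iterable[List[str]], candidates: Set[str]) -> Tuple[Counter, Counter]:
    """Count per-keyword hits (vocabulary heat) and per-set hits (aggregate heat)."""
    word_heat: Counter = Counter()
    set_heat: Counter = Counter()
    for terms in queries:
        hit = frozenset(t for t in terms if t in candidates)
        word_heat.update(hit)      # each keyword counted once per query
        if len(hit) > 1:
            set_heat[hit] += 1     # keywords attended to together
    return word_heat, set_heat


def retrieval_heat(word: str, word_heat: Counter, set_heat: Counter) -> int:
    """Vocabulary heat plus the heat of every set containing the word (assumed attribution)."""
    return word_heat[word] + sum(n for s, n in set_heat.items() if word in s)


queries = [["robot", "series-a"], ["robot"], ["robot", "series-a", "cheap"]]
word_heat, set_heat = count_heat(queries, {"robot", "series-a"})
print(retrieval_heat("robot", word_heat, set_heat))     # 5
print(retrieval_heat("series-a", word_heat, set_heat))  # 4
```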

As a preferred form of the above scheme, the vocabulary heat and the aggregate heat are counted by the same statistical method, which is as follows:

Setting a statistics start time, and dividing the duration between the statistics start time and the time at which the overall heat, the vocabulary heat or the aggregate heat is calculated into a plurality of time periods;

Weighting the overall heat, the vocabulary heat or the aggregate heat so that the farther a time period is from the current time, the lower its contribution to the heat value, wherein: λ_j is the weight corresponding to the j-th time period, and the closer a time period is to the time at which the heat value is calculated, the larger its weight; β_ij is the number of times the statistical item of the overall heat, the vocabulary heat or the aggregate heat is collected in the j-th time period.
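A time-weighted count of this kind can be sketched as follows. The weighting formula is not reproduced in this text, so the sketch ASSUMES the heat is the weighted sum of λ_j · β_ij over all periods, with equal-length periods indexed from oldest (0) to newest. Function names and timestamps are hypothetical.

```python
from datetime import datetime
from typing import List, Sequence


def period_counts(timestamps: Sequence[datetime], start: datetime,
                  now: datetime, periods: int) -> List[int]:
    """beta_ij: number of hits of one statistical item in each period (index 0 = oldest)."""
    if now <= start or periods < 1:
        raise ValueError("need start < now and at least one period")
    span = (now - start).total_seconds()
    counts = [0] * periods
    for ts in timestamps:
        if start <= ts <= now:
            j = min(int((ts - start).total_seconds() / span * periods), periods - 1)
            counts[j] += 1
    return counts


def weighted_heat(counts: Sequence[int], weights: Sequence[float]) -> float:
    """ASSUMED form: sum over j of lambda_j * beta_ij."""
    if len(counts) != len(weights):
        raise ValueError("one weight per period is required")
    return sum(lam * beta for lam, beta in zip(weights, counts))


start, now = datetime(2020, 1, 1), datetime(2020, 1, 5)
hits = [datetime(2020, 1, 1, 12), datetime(2020, 1, 4, 6), datetime(2020, 1, 4, 18)]
counts = period_counts(hits, start, now, periods=4)   # [1, 0, 0, 2]
heat = weighted_heat(counts, [0.1, 0.2, 0.3, 0.4])    # newest period weighs most
```

Because the newest period carries the largest weight, the two recent hits contribute far more to the heat than the single old one.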

As a preferred form of the above scheme, the weight optimization value of each candidate keyword is calculated by a formula from the similarity value, the heat value and the initial weight of that candidate keyword.

The invention also discloses an enterprise tag acquisition device, which comprises a first acquisition module, a second acquisition module, a third acquisition module, a fourth acquisition module, a calculation module and a determination module, wherein: the first acquisition module acquires a text to be extracted, the text to be extracted comprising at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determines the text type of the text to be extracted according to its content; the second acquisition module performs word segmentation on the text to be extracted of each text type to obtain candidate keywords, and obtains the initial weight of each candidate keyword; the third acquisition module obtains the similarity value between each candidate keyword and the candidate keywords of other text types; the fourth acquisition module obtains the heat value of each candidate keyword; the calculation module obtains the weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of that candidate keyword; and the determination module determines the candidate keywords whose weight optimization values exceed a preset threshold as enterprise tags.

The invention also discloses a computer device, which comprises a memory and a processor connected with the memory, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the steps of the enterprise tag acquisition method.

The invention also discloses a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method for acquiring enterprise labels.

Compared with the prior art, the invention has the beneficial effects that:

The enterprise tag acquisition method disclosed by the invention not only extracts keywords from the texts to be extracted that describe the enterprise, such as the enterprise basic information text, the enterprise financing text and the enterprise business model text, but also screens the candidate keywords according to their position, part of speech, number of repetitions, independent ideographic capability, heat and the like, so that the candidate keywords investors pay more attention to are taken as enterprise tags and investors can quickly find target enterprises through the enterprise tags.

The foregoing is only an overview of the technical scheme of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, preferred embodiments are described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is an application environment diagram of a method for acquiring enterprise tags in accordance with a preferred embodiment;

FIG. 2 is a flow chart of a method for acquiring an enterprise tag according to a preferred embodiment;

FIG. 3 is a block diagram of an enterprise tag acquisition apparatus in accordance with a preferred embodiment;

FIG. 4 is a block diagram illustrating a second acquisition module of FIG. 3;

FIG. 5 is a block diagram illustrating a third acquisition module of FIG. 3;

FIG. 6 is a block diagram illustrating a fourth acquisition module of FIG. 3;

FIG. 7 is a block diagram of the computer device of the preferred embodiment;

The reference signs are:

1. terminal; 2. server; 3. first acquisition module; 4. second acquisition module; 5. third acquisition module; 6. fourth acquisition module; 7. calculation module; 8. determination module; 9. first acquisition unit; 10. second acquisition unit; 11. third acquisition unit; 12. fourth acquisition unit; 13. first calculation unit; 14. first construction unit; 15. second construction unit; 16. second calculation unit; 17. first statistics unit; 18. second statistics unit; 19. third calculation unit.

Detailed Description

To further describe the technical means adopted by the invention to achieve its intended purpose and the effects thereof, the specific implementation, structure, features and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments:

Example 1

Fig. 1 shows the application environment of the enterprise tag acquisition method of the invention. The method is applied to an enterprise tag acquisition system comprising a terminal 1 and a server 2, which are connected through a network. The terminal 1 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, a portable wearable device and the like. The server 2 may be implemented as an independent server or as a server cluster formed by multiple servers.

As shown in fig. 2, in one embodiment, the present invention provides a method for obtaining an enterprise tag, which is described by taking the application of the method to the server 2 in fig. 1 as an example, and includes the following steps:

Word segmentation is performed on the text to be extracted of each text type to obtain candidate keywords, and the initial weight of each candidate keyword is obtained. Because the text types differ, the candidate keywords obtained by segmenting the text to be extracted of each text type include basic information keywords reflecting the basic information of the enterprise, financing keywords reflecting the financing information of the enterprise, and business model keywords reflecting the business model of the enterprise.
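Grouping candidate keywords by text type can be sketched as below. The source does not name a segmentation tool; for Chinese text a dedicated word segmenter would be needed, so the regular-expression tokenizer and the small stop-word list here are stand-ins chosen only to keep the example self-contained. The text-type keys and sample sentences are hypothetical.

```python
import re
from typing import Dict, List

# Stand-in stop-word list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}


def segment(text: str) -> List[str]:
    """Naive tokenizer standing in for a real word-segmentation step."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def candidates_by_type(texts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each text type to its de-duplicated candidate keywords, in first-seen order."""
    result: Dict[str, List[str]] = {}
    for text_type, docs in texts.items():
        seen: List[str] = []
        for doc in docs:
            for token in segment(doc):
                if token not in seen:
                    seen.append(token)
        result[text_type] = seen
    return result


texts = {
    "basic_info": ["Maker of warehouse robots"],
    "financing": ["Closed a Series-A round"],
    "business_model": ["Robots leased to warehouses"],
}
grouped = candidates_by_type(texts)
print(grouped["financing"])  # ['closed', 'series-a', 'round']
```

Keeping the keywords grouped by text type matters for the later similarity step, which compares a keyword only against keywords from other text types.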

Acquiring a heat value of each candidate keyword;

Obtaining a position parameter r_i1 of the candidate keyword according to the position of the candidate keyword in the text to be extracted: when the candidate keyword appears in both the title and the body of the text to be extracted, r_i1 = 2; when the candidate keyword appears in only the title or only the body of the text to be extracted, r_i1 = 1.

Obtaining a repetition parameter r_i2 of the candidate keyword according to the number of times the candidate keyword is repeated in the text to be extracted, r_i2 being computed from a_i and n, wherein: a_i is the number of repetitions of the i-th candidate keyword, and n is the number of candidate keywords.

Obtaining an expression parameter r_i3 of the candidate keyword according to the independent ideographic capability of the candidate keyword in the text to be extracted: when the candidate keyword can express meaning independently, r_i3 = 1; when the candidate keyword cannot express meaning independently, r_i3 = 0.

Obtaining a part-of-speech parameter r_i4 of the candidate keyword according to the part of speech of the candidate keyword in the text to be extracted: when the candidate keyword is a verb, an adjective, a quantifier or a pronoun, r_i4 = 0; when the candidate keyword is a noun, r_i4 = 1.

Obtaining the initial weight ω_i0 of the candidate keyword from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter, wherein: n is the number of candidate keywords.

It should be appreciated that the initial weight of each of the candidate keywords is determined based on the text to be extracted in which the candidate keyword is located.

Constructing a first vector A from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword, the first vector being A = (r_i1, r_i2, r_i3, r_i4), wherein: r_i1, r_i2, r_i3 and r_i4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the i-th candidate keyword.

Constructing a second vector B from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of an associated word, the second vector being B = (r_j1, r_j2, r_j3, r_j4), wherein: r_j1, r_j2, r_j3 and r_j4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the j-th candidate keyword, and the associated word is a candidate keyword of another text type, that is, the i-th candidate keyword and the j-th candidate keyword come from texts to be extracted of different text types.

The candidate keywords are taken as statistical items to count the vocabulary heat of the candidate keywords. The vocabulary heat reflects how much attention investors pay to each candidate keyword, so that the candidate keywords receiving more investor attention can be identified.

Each set of candidate keywords is taken as a statistical item to count the aggregate heat, which reflects how often investors pay attention to several candidate keywords at the same time.

The vocabulary heat and the aggregate heat are added to obtain the investors' retrieval heat for the candidate keywords. Counting both dimensions of the retrieval information that investors enter into a search engine when searching for enterprises makes the statistics more complete, since both the individual candidate keywords and the sets of candidate keywords entered by investors are counted.

It should be appreciated that the heat value of a candidate keyword should be counted from the vocabulary entered by investors when searching for enterprises or financing projects with a search engine.

As a preferred form of the above scheme, the vocabulary heat and the aggregate heat are counted by the same statistical method, and the concept of "time cooling" is introduced when counting them, that is, the farther a statistic is from the current time, the lower its contribution to the heat value. There are many hotspot enterprises at any given time, and once its hot period has passed a hotspot enterprise may quickly be displaced by others, so hotspot enterprises closer to the current time attract more investor attention. The statistical method therefore weights the counts of each time period by recency.

When the vocabulary heat and the aggregate heat are counted based on the consideration of time cooling, the candidate keywords of the hot spot can be ensured to have higher heat values.

In addition, λ_j can be valued in different ways. For example, the weights corresponding to the time periods may follow an arithmetic progression, or they may follow a geometric progression; alternatively, the value of λ_j may be determined according to the rate at which hotspot enterprises are replaced.

It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
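The arithmetic and geometric schemes for λ_j can be sketched as below. The exact weight formulas are not reproduced in this text, so the starting value, step, ratio and the normalization to a sum of 1 are all ASSUMPTIONS; periods are indexed from oldest (0) to newest so that the weights grow toward the present.

```python
from typing import List


def _normalize(raw: List[float]) -> List[float]:
    total = sum(raw)
    return [value / total for value in raw]


def arithmetic_weights(periods: int, first: float = 1.0, step: float = 1.0) -> List[float]:
    """Weights rising by a constant difference toward the newest period."""
    return _normalize([first + step * j for j in range(periods)])


def geometric_weights(periods: int, ratio: float = 2.0) -> List[float]:
    """Weights rising by a constant ratio (> 1) toward the newest period."""
    return _normalize([ratio ** j for j in range(periods)])


arith = arithmetic_weights(4)  # approximately [0.1, 0.2, 0.3, 0.4]
geo = geometric_weights(3)     # approximately [1/7, 2/7, 4/7]
```

The geometric scheme discounts old periods much faster than the arithmetic one, which suits fields where hotspot enterprises turn over quickly.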

Example 2

The invention also discloses an enterprise tag acquisition device, which comprises a first acquisition module 3, a second acquisition module 4, a third acquisition module 5, a fourth acquisition module 6, a calculation module 7 and a determination module 8, wherein: the first obtaining module 3 obtains a text to be extracted, wherein the text to be extracted comprises at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determines the text type of the text to be extracted according to the content of the text to be extracted; the second obtaining module 4 performs word segmentation on the text to be extracted of each text type to obtain candidate keywords, and obtains initial weights of each candidate keyword; the third obtaining module 5 obtains a similarity value of each candidate keyword and candidate keywords of other text types; the fourth obtaining module 6 obtains a heat value of each candidate keyword; the calculation module 7 obtains a weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword; the determining module 8 determines candidate keywords with weight optimization values exceeding a preset threshold value as enterprise tags.

Because the text types differ, the candidate keywords obtained by segmenting the text to be extracted of each text type include basic information keywords reflecting the basic information of the enterprise, financing keywords reflecting the financing information of the enterprise, and business model keywords reflecting the business model of the enterprise.

As a further preferable aspect, as shown in fig. 4, the second acquisition module 4 includes a first acquisition unit 9, a second acquisition unit 10, a third acquisition unit 11, a fourth acquisition unit 12, and a first calculation unit 13, wherein:

The first acquisition unit 9 obtains a position parameter r_i1 of the candidate keyword according to the position of the candidate keyword in the text to be extracted: when the candidate keyword appears in both the title and the body of the text to be extracted, r_i1 = 2; when the candidate keyword appears in only the title or only the body of the text to be extracted, r_i1 = 1;

The second acquisition unit 10 obtains a repetition parameter r_i2 of the candidate keyword according to the number of times the candidate keyword is repeated in the text to be extracted, r_i2 being computed from a_i and n, wherein: a_i is the number of repetitions of the i-th candidate keyword, and n is the number of candidate keywords;

The third acquisition unit 11 obtains an expression parameter r_i3 of the candidate keyword according to the independent ideographic capability of the candidate keyword in the text to be extracted: when the candidate keyword can express meaning independently, r_i3 = 1; when the candidate keyword cannot express meaning independently, r_i3 = 0;

The fourth acquisition unit 12 obtains a part-of-speech parameter r_i4 of the candidate keyword according to the part of speech of the candidate keyword in the text to be extracted: when the candidate keyword is a verb, an adjective, a quantifier or a pronoun, r_i4 = 0; when the candidate keyword is a noun, r_i4 = 1;

The first calculation unit 13 obtains the initial weight ω_i0 of the candidate keyword from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter, wherein: n is the number of candidate keywords.

As a further preferred solution, as shown in fig. 5, the third obtaining module 5 includes a first building unit 14, a second building unit 15, and a second calculating unit 16, where:

The first construction unit 14 constructs a first vector A from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword, the first vector being A = (r_i1, r_i2, r_i3, r_i4), wherein: r_i1, r_i2, r_i3 and r_i4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the i-th candidate keyword;

The second construction unit 15 constructs a second vector B from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of an associated word, the second vector being B = (r_j1, r_j2, r_j3, r_j4), wherein: r_j1, r_j2, r_j3 and r_j4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the j-th candidate keyword, and the associated word is a candidate keyword of another text type, that is, the i-th candidate keyword and the j-th candidate keyword come from texts to be extracted of different text types;

The second calculation unit 16 calculates the similarity value of the candidate keyword and the associated word from the first vector and the second vector by means of a similarity formula.

As a further preferable solution, as shown in fig. 6, the fourth obtaining module 6 includes a first statistics unit 17, a second statistics unit 18, and a third calculation unit 19, where:

The first statistics unit 17 takes the candidate keywords as statistical items to count the vocabulary heat of the candidate keywords;

The second statistics unit 18 takes each set of candidate keywords as a statistical item to count the aggregate heat, that is, the heat of several candidate keywords being attended to by investors at the same time;

The third calculation unit 19 adds the vocabulary heat and the aggregate heat to obtain the investors' retrieval heat for the enterprise.

In this embodiment, the statistical methods of the vocabulary heat and the aggregate heat are the same as those of the first embodiment.

As a further preferred form, the calculation module 7 calculates the weight optimization value of each candidate keyword by a formula from the similarity value, the heat value and the initial weight of that candidate keyword.

It should be noted that each module in the enterprise tag acquisition device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

Example 3

The invention also discloses a computer device, which can be a server, as shown in fig. 7, and comprises a processor, a memory, a network interface and a database which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing operation behavior data, commodity information data, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements the steps of the method of obtaining an enterprise tag.

It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In other embodiments, a computer device is provided, including a memory and a processor connected to the memory, where the memory stores a computer program, and the computer program when executed by the processor implements the steps of implementing the method for obtaining an enterprise tag, and specifically includes the following steps: acquiring a text to be extracted, wherein the text to be extracted comprises at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determining the text type of the text to be extracted according to the content of the text to be extracted; segmenting the text to be extracted of each text type to obtain candidate keywords, and obtaining initial weights of the candidate keywords; obtaining similarity values of each candidate keyword and candidate keywords of other text types; acquiring a heat value of each candidate keyword; obtaining a weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword; and determining the candidate keywords with the weight optimization values exceeding a preset threshold as enterprise tags.

In other embodiments, the step of obtaining the initial weight of each candidate keyword is implemented when the processor executes the computer program, and specifically includes the following steps: (1) Obtaining a position parameter r _i1 of the candidate keyword according to the position of the candidate keyword in the text to be extracted, and when the candidate keyword is simultaneously present in the title and the text of the text to be extracted, r _i1 =2; when the candidate keywords are simultaneously present in the title or the text of the text to be extracted, r _i1 =1; (2) Obtaining a repetition parameter r _i2 of the candidate keyword according to the repetition times of the candidate keyword in the text to be extracted, and Wherein: a _i is the repetition number of the ith candidate keyword, and n is the number of the candidate keywords; (3) Obtaining an expression parameter r _i3 of the candidate keyword according to the independent ideographic capability of the candidate keyword in the text to be extracted, and when the candidate keyword can be independently ideographic, r _i3 =1; when the candidate keywords cannot be ideogrammed independently, r _i3 =0; (4) Obtaining a part-of-speech parameter r _i4 of the candidate keyword according to the part of speech of the candidate keyword in the text to be extracted, and when the candidate keyword is a verb, an adjective, a quantity word and a pronoun, r _i4 =0; when the candidate keyword is a noun, r _i4 =1; (5) Obtaining initial weight omega _i0 of the candidate key words according to the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter, and then/>Wherein: n is the number of the candidate keywords.

In some other embodiments, the step of obtaining the similarity value of each candidate keyword and the candidate keywords of other text types is implemented when the processor executes the computer program, and specifically includes the following steps: (1) Constructing a first vector a according to the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword, wherein the first vector is a= (r _i1,r_i2,r_i3,r_i4): r _i1,r_i2,r_i3,r_i4 is the position parameter, repetition parameter, expression parameter and part-of-speech parameter of the i candidate keyword respectively; (2) Constructing a second vector B according to the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword of the associated word, and the first vector is b= (r _j1,r_j2,r_j3,r_j4), wherein: r _j1,r_j2,r_j3,r_j4 is a position parameter, a repetition parameter, an expression parameter and a part-of-speech parameter of the jth candidate keyword respectively, and the related words are candidate keywords of other text types, namely the i candidate keyword and the j candidate keyword are different in text types of texts to be extracted; (3) Calculating the similarity value of the candidate keywords and the associated words by using the first vector and the second vector, wherein the calculation formula of the similarity value is as follows:

In some other embodiments, when the processor executes the computer program, the step of obtaining the heat value of each candidate keyword specifically includes the following steps: acquiring the search information entered by investors when searching for enterprises; segmenting the search information with a word segmentation technique to obtain candidate keywords, and counting the vocabulary heat with each candidate keyword as a statistical item; counting the aggregate heat with each set of candidate keywords as a statistical item, the aggregate heat reflecting sets of several candidate keywords that investors are interested in at the same time; and adding the overall heat, the vocabulary heat and the aggregate heat to obtain the investors' retrieval heat for enterprises.
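The counting of statistical items described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: `segment` is a hypothetical stand-in that splits on whitespace where a real system would use a word segmentation tool, and treating the complete query string as the statistical item for the overall heat is an assumption based on the description of the overall heat as reflecting attention to the complete search information.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def segment(query: str) -> List[str]:
    # Stand-in for a real word segmenter (assumption): split on whitespace.
    return query.split()


def count_statistical_items(queries: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Count how often each statistical item occurs in investor queries."""
    overall = Counter()     # complete search strings
    vocabulary = Counter()  # individual candidate keywords
    aggregate = Counter()   # sets of keywords searched together
    for query in queries:
        words = segment(query)
        overall[query] += 1
        vocabulary.update(words)
        word_set = frozenset(words)
        # A set only counts when it holds more than one keyword.
        if len(word_set) > 1:
            aggregate[word_set] += 1
    return overall, vocabulary, aggregate
```

Using a `frozenset` as the key means the same keywords entered in a different order count toward the same set.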

In this embodiment, the statistical methods for the overall heat, the vocabulary heat and the aggregate heat are the same as those of the first embodiment. The overall heat mainly reflects how much attention investors pay to the complete search information.

Example IV

The invention also discloses a computer readable storage medium having stored thereon a computer program which when executed by a processor realizes the steps of: acquiring a text to be extracted, wherein the text to be extracted comprises at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determining the text type of the text to be extracted according to the content of the text to be extracted; segmenting the text to be extracted of each text type to obtain candidate keywords, and obtaining initial weights of the candidate keywords; obtaining similarity values of each candidate keyword and candidate keywords of other text types; acquiring a heat value of each candidate keyword; obtaining a weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword; and determining the candidate keywords with the weight optimization values exceeding a preset threshold as enterprise tags.
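The final two steps above (computing a weight optimization value and keeping the candidate keywords that exceed a preset threshold) can be sketched as follows. The formula that combines the initial weight, the similarity value and the heat value is not given in this passage, so the sketch takes it as a caller-supplied function `optimize`. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (initial weight, similarity value, heat value) for one candidate keyword
Scores = Tuple[float, float, float]


def select_tags(
    scores: Dict[str, Scores],
    optimize: Callable[[float, float, float], float],
    threshold: float,
) -> List[str]:
    """Return the candidate keywords whose optimized weight exceeds the threshold."""
    tags = []
    for word, (initial_weight, similarity, heat) in scores.items():
        # `optimize` is supplied by the caller because the combining
        # formula is not reproduced in this passage.
        if optimize(initial_weight, similarity, heat) > threshold:
            tags.append(word)
    return tags


# Usage with a placeholder combining rule (illustrative only):
example_scores = {"robotics": (0.4, 0.9, 0.8), "develop": (0.1, 0.2, 0.1)}
example_tags = select_tags(example_scores, lambda w, s, h: w * (1 + s + h), 0.5)
```

Passing the combining rule as a parameter keeps the selection step independent of whichever weighting formula is actually used.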

In other embodiments, when the computer program is executed by a processor, the step of obtaining the initial weight of each candidate keyword specifically includes the following steps: (1) obtaining a position parameter r_i1 of the candidate keyword according to its position in the text to be extracted: when the candidate keyword appears in both the title and the body of the text to be extracted, r_i1 = 2; when it appears in only the title or only the body, r_i1 = 1; (2) obtaining a repetition parameter r_i2 of the candidate keyword according to the number of times it is repeated in the text to be extracted [formula not reproduced in this text], wherein a_i is the number of repetitions of the i-th candidate keyword and n is the number of candidate keywords; (3) obtaining an expression parameter r_i3 of the candidate keyword according to its ability to convey meaning independently in the text to be extracted: when the candidate keyword can convey meaning on its own, r_i3 = 1; when it cannot, r_i3 = 0; (4) obtaining a part-of-speech parameter r_i4 of the candidate keyword according to its part of speech in the text to be extracted: when the candidate keyword is a verb, an adjective, a quantifier or a pronoun, r_i4 = 0; when it is a noun, r_i4 = 1; (5) obtaining the initial weight ω_i0 of the candidate keyword from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter [formula not reproduced in this text], wherein n is the number of candidate keywords.

In some other embodiments, when the computer program is executed by a processor, the step of obtaining the similarity value between each candidate keyword and the candidate keywords of other text types specifically includes the following steps: (1) constructing a first vector A according to the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the candidate keyword, the first vector being A = (r_i1, r_i2, r_i3, r_i4), wherein r_i1, r_i2, r_i3 and r_i4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the i-th candidate keyword; (2) constructing a second vector B according to the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the associated word, the second vector being B = (r_j1, r_j2, r_j3, r_j4), wherein r_j1, r_j2, r_j3 and r_j4 are respectively the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter of the j-th candidate keyword, and the associated word is a candidate keyword of another text type, that is, the i-th and j-th candidate keywords come from texts to be extracted of different text types; (3) calculating the similarity value between the candidate keyword and the associated word using the first vector and the second vector, according to the following formula: [formula not reproduced in this text]

In other embodiments, when the computer program is executed by a processor, the step of obtaining the heat value of each candidate keyword specifically includes the following steps: acquiring the search information entered by investors when searching for enterprises; segmenting the search information with a word segmentation technique to obtain candidate keywords, and counting the vocabulary heat with each candidate keyword as a statistical item; counting the aggregate heat with each set of candidate keywords as a statistical item, the aggregate heat reflecting sets of several candidate keywords that investors are interested in at the same time; and adding the overall heat, the vocabulary heat and the aggregate heat to obtain the investors' retrieval heat for enterprises.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include non-volatile memory and/or volatile memory, where: (1) non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory; (2) volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The above embodiments are only preferred embodiments of the present invention and do not limit its scope; any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention fall within the scope claimed by the present invention.

Claims

1. A method for acquiring an enterprise tag, characterized by comprising the following steps:

segmenting the text to be extracted of each text type to obtain candidate keywords, and obtaining an initial weight of each candidate keyword, comprising: obtaining a position parameter r_i1 of the candidate keyword according to its position in the text to be extracted: when the candidate keyword appears in both the title and the body of the text to be extracted, r_i1 = 2; when it appears in only the title or only the body, r_i1 = 1; obtaining a repetition parameter r_i2 of the candidate keyword according to the number of times it is repeated in the text to be extracted [formula not reproduced in this text], wherein a_i is the number of repetitions of the i-th candidate keyword and n is the number of candidate keywords; obtaining an expression parameter r_i3 of the candidate keyword according to its ability to convey meaning independently in the text to be extracted: when the candidate keyword can convey meaning on its own, r_i3 = 1; when it cannot, r_i3 = 0; obtaining a part-of-speech parameter r_i4 of the candidate keyword according to its part of speech in the text to be extracted: when the candidate keyword is a verb, an adjective, a quantifier or a pronoun, r_i4 = 0; when it is a noun, r_i4 = 1; and obtaining the initial weight ω_i0 of the candidate keyword from the position parameter, the repetition parameter, the expression parameter and the part-of-speech parameter [formula not reproduced in this text];

Acquiring a heat value of each candidate keyword;

2. The method for obtaining an enterprise tag as claimed in claim 1, wherein obtaining a similarity value between each candidate keyword and candidate keywords of other text types comprises the steps of:

3. The method for obtaining the enterprise tag according to claim 2, wherein obtaining the heat value of each candidate keyword comprises the steps of:

4. The method for obtaining an enterprise tag according to claim 3, wherein the statistical methods of the vocabulary heat and the aggregate heat are the same, and the statistical methods are:

setting a statistical starting time, and dividing the duration between the statistical starting time and the calculation time of the vocabulary heat or the collection heat into a plurality of time periods;

and weighting the vocabulary heat or the aggregate heat so that a time period contributes less to the heat value the further it is from the current time, namely: [formula not reproduced in this text], wherein: λ_j is the weight corresponding to the j-th time period, and the closer a time period is to the time at which the heat value is calculated, the larger its weight; β_ij is the number of times the statistical item of the vocabulary heat or the aggregate heat is collected in the j-th time period.
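The weighting formula is not reproduced in this text. One reading consistent with the definitions of λ_j and β_ij is a weighted sum over the time periods, i.e. the sum over j of λ_j · β_ij. The sketch below assumes that reading. The linearly increasing weights are a hypothetical choice that merely satisfies the requirement that more recent periods weigh more.

```python
from typing import List, Sequence


def linear_weights(periods: int) -> List[float]:
    """Hypothetical weights: period j (oldest first) gets weight proportional
    to j, normalized to sum to 1, so the most recent period weighs most."""
    total = periods * (periods + 1) / 2
    return [(j + 1) / total for j in range(periods)]


def weighted_heat(counts: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form of the heat value: sum over periods of lambda_j * beta_ij.

    counts[j] is beta_ij, the number of times the statistical item was
    collected in period j; weights[j] is lambda_j.
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must cover the same periods")
    return sum(w * c for w, c in zip(weights, counts))
```

With three periods, for instance, the same number of searches counts three times as much in the latest period as in the earliest under these example weights.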

5. The method for obtaining an enterprise tag according to claim 4, wherein the calculation formula for obtaining the weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword is as follows:

6. An enterprise tag acquisition apparatus for implementing the enterprise tag acquisition method according to any one of claims 1 to 5, characterized by comprising a first acquisition module, a second acquisition module, a third acquisition module, a fourth acquisition module, a calculation module, and a determination module, wherein: the first acquisition module acquires a text to be extracted, wherein the text to be extracted comprises at least one enterprise basic information text, at least one enterprise financing text and at least one enterprise business model text, and determines the text type of the text to be extracted according to its content; the second acquisition module segments the text to be extracted of each text type to obtain candidate keywords, and acquires the initial weight of each candidate keyword; the third acquisition module acquires the similarity value between each candidate keyword and the candidate keywords of other text types; the fourth acquisition module acquires the heat value of each candidate keyword; the calculation module obtains a weight optimization value of each candidate keyword according to the similarity value, the heat value and the initial weight of each candidate keyword; and the determination module determines the candidate keywords whose weight optimization values exceed a preset threshold as enterprise tags.

7. A computer device, characterized by: comprising a memory and a processor connected to the memory, the memory storing a computer program which, when executed by the processor, implements the steps of the method of obtaining an enterprise tag according to any one of claims 1-5.

8. A computer-readable storage medium, characterized by: a computer program stored thereon, which when executed by a processor, implements the steps of the method of obtaining an enterprise tag as claimed in any one of claims 1 to 5.