CN114780668A - Method and device for generating service label, computer storage medium and electronic terminal - Google Patents

Info

Publication number
CN114780668A
Authority
CN
China
Prior art keywords
word
determining
target
frequency
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210427937.1A
Other languages
Chinese (zh)
Other versions
CN114780668B (en)
Inventor
刘杰辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Tianyanchawei Technology Co ltd
Original Assignee
Yancheng Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Jindi Technology Co Ltd filed Critical Yancheng Jindi Technology Co Ltd
Priority to CN202210427937.1A priority Critical patent/CN114780668B/en
Publication of CN114780668A publication Critical patent/CN114780668A/en
Application granted granted Critical
Publication of CN114780668B publication Critical patent/CN114780668B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a method and a device for generating a service label, a computer storage medium and an electronic terminal, wherein the method for generating the service label comprises the following steps: performing word segmentation processing on the description text of each target company to obtain a plurality of characteristic words; determining the global word frequency of each feature word according to the occurrence frequency of each feature word in the description texts of all target companies; determining the local word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description text corresponding to the target company with the same statistical attribute; determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute; and determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, thereby realizing the effective generation of the service label or reducing the generation difficulty of the service label.

Description

Method and device for generating service label, computer storage medium and electronic terminal
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for generating a service tag, a computer storage medium, and an electronic terminal.
Background
In business scenarios involving enterprise information websites and CRM (Customer Relationship Management) systems, a business label needs to be generated for each target company in order to support interest recommendation matched to users, similarity calculation between companies, ranking of search results, and the like. However, the data from which a service tag is generated is unstructured (for example, an industry service tag is derived from the business scope data in company-side data), so it cannot be applied directly in actual production. As a result, it is difficult to generate service tags effectively, or generating them is highly difficult.
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for generating a service tag, a computer storage medium, and an electronic terminal, so as to overcome or alleviate the above technical problems in the prior art.
The technical scheme adopted by the application is as follows:
a method for generating a service tag comprises the following steps:
performing word segmentation processing on the description text of each target company to obtain a plurality of characteristic words;
determining the global word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description texts of all target companies;
determining the local word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description text corresponding to the target company with the same statistical attribute;
determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute;
and determining the service label of the target company belonging to the same statistical attribute according to the contribution degree.
Optionally, the determining, according to the global word frequency and the local word frequency of each feature word corresponding to the feature word, a contribution degree of each feature word to the generation of the service tag includes: and calculating the word score of each characteristic word according to the global word frequency and the local word frequency of each corresponding characteristic word, and taking the word score as the contribution degree of the characteristic word to each target company for generating the service label.
Optionally, the calculating a word score of each feature word according to the global word frequency and the local word frequency of each feature word corresponding to the feature word includes: calculating the word score of each characteristic word according to the global word frequency and the local word frequency of each corresponding characteristic word based on y = f2 × log(1/f1), wherein f1 is the global word frequency, f2 is the local word frequency, and y is the word score.
Optionally, the determining, according to the contribution degree, a service label of a target company belonging to the same statistical attribute includes:
according to the contribution degree, screening a plurality of characteristic words with large contribution degree from all the characteristic words corresponding to the target companies belonging to the same statistical attribute to be used as business key words of the industry or the same region;
and filtering the characteristic words of each target company based on the service key words, and only keeping the characteristic words which are the same as the service key words in the characteristic words of each target company as the service labels of the target companies.
Optionally, after determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, the method further includes:
determining a target company pointed by the behavior of the target user;
and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user.
Optionally, after determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, the method further includes:
calculating weights distributed to different behaviors of the target user;
the determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user comprises the following steps: and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user based on the weight corresponding to different behaviors.
Optionally, the calculating weights allocated to different behaviors of the target user includes:
determining the importance degree of the different behaviors and the timeliness of the different behaviors;
and calculating weights distributed to different behaviors of the target user based on the importance degree and the timeliness.
Optionally, the determining, based on the weights corresponding to the different behaviors, the interest tag of the target user according to the service tag of the target company to which the behavior of the target user points includes:
aiming at the same target company pointed by different behaviors of the target user, taking the weight corresponding to each behavior as the contribution degree of each business label of the target company to the determination of the interest label;
and sequencing the contribution degrees of all the service tags to the determined interest tags aiming at all target companies pointed by different behaviors of the target user, and determining a plurality of service tags with large contribution degrees as the interest tags of the target user.
An apparatus for generating a service tag, comprising:
the word segmentation unit is used for performing word segmentation processing on the description text of each target company to obtain a plurality of characteristic words;
the first word frequency statistical unit is used for determining the global word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description texts of all target companies;
the second word frequency statistical unit is used for determining the local word frequency of each characteristic word according to the frequency of occurrence of each characteristic word in the description text corresponding to the target company with the same statistical attribute;
the contribution degree determining unit is used for determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute;
and the service label determining unit is used for determining the service labels of the target companies belonging to the same statistical attribute according to the contribution degree.
A computer storage medium having stored thereon a computer executable program that is executed to implement a method as in any one of the embodiments of the present application.
An electronic terminal, comprising a memory and a processor, wherein the memory is used for storing a computer executable program, and the processor is used for running the computer executable program to implement the method of any one of the embodiments of the present application.
In the embodiment of the application, a plurality of feature words are obtained by performing word segmentation on the description text of each target company; determining the global word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description texts of all target companies; determining the local word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description text corresponding to the target company with the same statistical attribute; determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute; and determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, thereby realizing the effective generation of the service label or reducing the generation difficulty of the service label.
Drawings
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for generating a service tag according to an embodiment of the present application;
Fig. 3 is a schematic flow chart illustrating the process of determining a service tag in the embodiment of the present application;
Fig. 4 is a flowchart illustrating a method for generating an interest tag of a user according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a process for calculating weights according to an embodiment of the present application;
Fig. 6 is a flowchart illustrating a step of determining an interest tag of a target user according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a service tag generation apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic terminal in an embodiment of the present application.
Detailed Description
To make the technical problems, technical solutions and advantages to be solved by the present application clearer, the following detailed description is made with reference to the accompanying drawings and specific embodiments.
In the embodiment of the application, a plurality of feature words are obtained by performing word segmentation processing on the description text of each target company; determining the global word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description texts of all target companies; determining the local word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description text corresponding to the target company with the same statistical attribute; determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute; and determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, thereby realizing the effective generation of the service label or reducing the generation difficulty of the service label.
Optionally, the same statistical attribute includes at least one of the same industry and the same region.
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application; as shown in fig. 1, a background server 101 and a front-end application program are provided in the application scenario, where the background server is at least used to store the description text of each target company, and the front-end application program may be installed on an electronic terminal 102 to provide an enterprise information website or a CRM system during the use of the application program. In addition, the background server, by executing the method for generating a service label, performs word segmentation processing on the description text of each target company to obtain a plurality of characteristic words, and determines the service label of the target company belonging to the same statistical attribute according to the contribution degree of each feature word to the service label generation, thereby realizing the effective generation of the service label or reducing the generation difficulty of the service label.
Wherein the method further comprises: and determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute. This step may be included in the step of determining the business label of the target company belonging to the same statistical attribute, or may be performed before it, based on the contribution degree of each feature word to the business label generation.
The method further comprises the following steps:
determining the global word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description texts of all target companies;
and determining the local word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description text corresponding to the target company with the same statistical attribute.
The step of determining the global word frequency and the local word frequency may be performed in or before the step of determining, for each target company belonging to the same statistical attribute, the contribution degree of each feature word to the generation of the service tag according to the global word frequency and the local word frequency of each feature word corresponding to the target company.
In the embodiment of fig. 1, the background server may be an independent server, or one server or several servers in a server cluster. The electronic terminal can be a desktop computer, a tablet, a notebook computer or a mobile terminal.
Fig. 2 is a schematic flowchart of a method for generating a service tag according to an embodiment of the present application; as shown in fig. 2, it includes:
S201, performing word segmentation processing on the description text of each target company to obtain a plurality of feature words;
Optionally, in a specific application scenario, all companies on the background server may be target companies, and their corresponding description texts are the objects processed in step S201.
Optionally, in a specific application scenario, the description text of the target company may be obtained from a website of an industrial company, or may be obtained from other third-party channels, as long as the feature words are included. The format of the description text is not particularly limited.
Optionally, in a specific application scenario, the corresponding description text may be obtained from the background server through the ID of the target company.
Optionally, in a specific application scenario, the performing word segmentation processing on the description text of each target company to obtain a plurality of feature words may include: segmenting the description text of each target company against a customized word segmentation dictionary to generate a directed acyclic graph; then, according to the selected word segmentation mode, searching for the shortest path between characters on the directed acyclic graph according to the word segmentation dictionary, and cutting the sentences in the description text along that path to obtain a plurality of characteristic words. Because this approach combines a word segmentation dictionary, a directed acyclic graph, and shortest-path search, character strings that belong to a single word are not split by mistake, which ensures the accuracy of word segmentation and also keeps the word segmentation process fast.
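The dictionary, directed acyclic graph, and shortest-path steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names `build_dag` and `segment`, the `max_len` parameter, the sample dictionary, and the reading of "shortest path" as "the path with the fewest words" are all assumptions.

```python
from typing import Dict, List, Set


def build_dag(text: str, dictionary: Set[str], max_len: int = 8) -> Dict[int, List[int]]:
    """Map each start index to the end indices of the candidate words starting there."""
    dag: Dict[int, List[int]] = {}
    for i in range(len(text)):
        ends = [i + 1]  # a single character is always available as a fallback edge
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if text[i:j] in dictionary:
                ends.append(j)
        dag[i] = ends
    return dag


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Cut the text along the DAG path that uses the fewest words (assumed 'shortest path')."""
    n = len(text)
    dag = build_dag(text, dictionary)
    # best[i] = (number of words needed to cover text[i:], end index of the first word)
    best = {n: (0, n)}
    for i in range(n - 1, -1, -1):
        best[i] = min((best[j][0] + 1, j) for j in dag[i])
    words: List[str] = []
    i = 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


# Hypothetical custom dictionary and business-scope text
custom_dictionary = {"软件", "开发", "软件开发", "技术", "服务", "技术服务"}
feature_words = segment("软件开发技术服务", custom_dictionary)
```

With this sample dictionary, the longer entries are preferred over their shorter parts, which is the behavior the passage attributes to the dictionary-based approach.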
Alternatively, in another application scenario, a word segmentation model is trained by using a word segmentation training set and a test set, and when the word segmentation accuracy of the word segmentation model meets the requirement, the word segmentation model is used for performing word segmentation processing on the description text of each target company to obtain a plurality of feature words. The type of the word segmentation model can be flexibly selected according to the requirements of application scenes.
S202, determining the global word frequency of each feature word according to the occurrence frequency of each feature word in the description texts of all target companies;
Optionally, in a specific application scenario, all feature words extracted from each description text may be stored in the same data table, organized by target company, and step S202 is performed on that data table to determine the global word frequency (also referred to as the full word frequency) of each feature word.
S203, determining the local word frequency of each feature word according to the occurrence frequency of each feature word in the description text corresponding to the target company with the same statistical attribute;
optionally, in a specific application scenario, the data tables corresponding to the target companies may be divided based on industries (or regions), the data tables corresponding to all the target companies belonging to the same industry (or the same region) are divided into the same data subset, and the number of times that each feature word appears in the data subset is counted, so as to obtain the corresponding local word frequency.
Optionally, in another specific application scenario, the steps S202 and S203 have no strict timing requirement, and may be executed in parallel, or the step S203 may be executed before the step S202.
Here, when steps S202 and S203 are executed, repeated feature words may or may not be de-duplicated, depending on the requirements of the application scenario. For example, if sufficient computing power is available, de-duplication is not performed; if computing power is limited, de-duplication is performed.
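As a rough sketch of steps S202 and S203, the counting could look like the following. The `Company` tuple layout, the function name `word_frequencies`, and the sample data are hypothetical. Dividing each count by the total number of words, so that every frequency is a relative frequency in (0, 1], is also an assumption made here; the text itself only speaks of counting occurrences.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (company ID, statistical attribute, feature words)
Company = Tuple[str, str, List[str]]


def word_frequencies(companies: List[Company]):
    """Return (global frequency per word, local frequency per attribute and word)."""
    global_counts: Counter = Counter()
    local_counts: Dict[str, Counter] = defaultdict(Counter)
    for _company_id, attribute, words in companies:
        global_counts.update(words)            # S202: all target companies
        local_counts[attribute].update(words)  # S203: same industry or region only
    total = sum(global_counts.values())
    global_freq = {w: c / total for w, c in global_counts.items()}
    local_freq: Dict[str, Dict[str, float]] = {}
    for attribute, counts in local_counts.items():
        subtotal = sum(counts.values())
        local_freq[attribute] = {w: c / subtotal for w, c in counts.items()}
    return global_freq, local_freq


companies = [
    ("a", "software", ["software", "development", "service"]),
    ("b", "software", ["software", "service"]),
    ("c", "catering", ["catering", "service"]),
]
global_freq, local_freq = word_frequencies(companies)
```

Because both counters are filled in a single pass, the sketch also reflects the remark that S202 and S203 have no strict ordering.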
S204, determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute;
optionally, in a specific application scenario, in the step S204, determining the contribution degree of each feature word to the generation of the service tag according to the global word frequency and the local word frequency of each feature word corresponding to the feature word includes: and calculating the word score of each characteristic word according to the global word frequency and the local word frequency of each corresponding characteristic word, and taking the word score as the contribution degree of the characteristic word to each target company for generating the service label.
Namely, the word score of each characteristic word is calculated according to the global word frequency and the local word frequency of the characteristic words of the target company belonging to the same data subset.
Optionally, in a specific application scenario, the calculating a word score of each feature word according to the global word frequency and the local word frequency of each feature word corresponding to the feature word includes: calculating the word score of each characteristic word according to the global word frequency and the local word frequency of each corresponding characteristic word based on y = f2 × log(1/f1), wherein f1 is the global word frequency, f2 is the local word frequency, and y is the word score.
Referring to the formula y = f2 × log(1/f1): the local word frequency and the global word frequency both correspond to the target companies belonging to the same statistical attribute, y increases in proportion to f2 (the local word frequency), and y decreases as f1 (the global word frequency) increases. The formula therefore gives a higher word score to feature words that contribute more to the business label, namely those with a higher local word frequency and a lower global word frequency, and a lower word score to feature words that contribute less, namely those with a lower local word frequency and a higher global word frequency.
Optionally, in a specific application scenario, the base of the log may be any valid logarithm base, for example 10, 5, or 2, chosen according to actual conditions.
Optionally, the log function is used above only as an example. When calculating the word score, the method is not limited to the log function; other functions may be used as long as they give a higher word score to feature words with a higher local word frequency and a lower global word frequency (which contribute more to the business label) and a lower word score to feature words with a lower local word frequency and a higher global word frequency (which contribute less).
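The word score y = f2 × log(1/f1) can be illustrated with a small sketch. The function name `word_score`, the default base of 10, and the sample frequencies are assumptions. The sketch also assumes that f1 is a relative frequency in (0, 1], so that log(1/f1) is never negative; with raw counts of 1 or more the factor would be zero or negative.

```python
import math


def word_score(global_freq: float, local_freq: float, base: float = 10.0) -> float:
    """Compute y = f2 * log(1 / f1) with f1 = global_freq and f2 = local_freq."""
    if not 0.0 < global_freq <= 1.0:
        raise ValueError("global_freq is assumed to be a relative frequency in (0, 1]")
    return local_freq * math.log(1.0 / global_freq, base)


# Hypothetical frequencies: a generic word versus an industry-specific word
generic_score = word_score(global_freq=0.30, local_freq=0.30)
specific_score = word_score(global_freq=0.01, local_freq=0.20)
```

Here the industry-specific word scores higher than the generic word even though its local frequency is lower, because it is rare across all target companies.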
S205, determining the business label of the target company belonging to the same statistical attribute according to the contribution degree.
Optionally, in a specific application scenario, as shown in fig. 3, a schematic flow diagram for determining a service tag in the embodiment of the present application is shown; as shown in fig. 3, the determining the service label of the target company belonging to the same statistical attribute according to the contribution degree includes:
S215, screening a plurality of characteristic words with large contribution degrees from all the characteristic words corresponding to the target companies belonging to the same statistical attribute according to the contribution degrees, and using the characteristic words as business key words of the industry or the same region;
optionally, in a specific application scenario, for example, a contribution threshold is set, and accordingly, feature words with contribution degrees greater than the contribution threshold are screened from all feature words corresponding to target companies belonging to the same statistical attribute, and are used as business keywords of the industry or the same region, and the business keywords are added to a company-side keyword table for management.
S225, filtering the characteristic words of each target company based on the business key words, and only keeping the characteristic words which are the same as the business key words in the characteristic words of each target company as business labels of the target companies.
Optionally, in a specific application scenario, all feature words corresponding to target companies belonging to the same statistical attribute may be filtered based on the service keyword, or all feature words corresponding to all target companies may be filtered without considering an industry or a region.
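Steps S215 and S225 can be sketched as a threshold filter followed by an intersection with each company's feature words. The function names, the threshold value, and the sample scores are hypothetical; removing repeated labels while keeping their original order is a choice made for this sketch only.

```python
from typing import Dict, List, Set


def select_keywords(scores: Dict[str, float], threshold: float) -> Set[str]:
    """S215: keep the feature words whose contribution degree exceeds the threshold."""
    return {word for word, score in scores.items() if score > threshold}


def business_labels(company_words: Dict[str, List[str]], keywords: Set[str]) -> Dict[str, List[str]]:
    """S225: for each company, keep only the feature words that are business keywords."""
    labels: Dict[str, List[str]] = {}
    for company_id, words in company_words.items():
        kept: List[str] = []
        for word in words:
            if word in keywords and word not in kept:
                kept.append(word)
        labels[company_id] = kept
    return labels


# Hypothetical word scores for one industry and feature words for two companies
scores = {"software": 0.42, "development": 0.31, "service": 0.05}
keywords = select_keywords(scores, threshold=0.1)
labels = business_labels(
    {"a": ["software", "development", "service"], "b": ["software", "service", "software"]},
    keywords,
)
```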
Fig. 4 is a flowchart illustrating a method for generating an interest tag of a user according to an embodiment of the present application; as shown in fig. 4, it includes:
S401, determining a target company pointed by the behavior of the target user;
optionally, in a specific application scenario, the behavior of the target user may include browsing, searching, focusing, monitoring, and the like. The behavior may be historical behavior or real-time behavior.
S402, determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user.
Optionally, in a specific application scenario, the service tag of the target company to which the behavior of the target user points may be generated according to the method for generating the service tag provided in the embodiment of the present application. For example, steps S401 to S402 are executed after the service label of the target company belonging to the same statistical attribute has been determined according to the contribution degree.
For example, in a specific application scenario, the method provided in fig. 4 may also be a constituent step of a method for generating a service tag of a target company in the embodiment of the present application.
Optionally, in a specific application scenario, the method for generating an interest tag of a user in fig. 4 may further include: calculating weights distributed to different behaviors of the target user. Specifically, the step of calculating the weights may be performed after determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, for example before step S401.
Optionally, in a specific application scenario, the determining, according to the service tag of the target company to which the behavior of the target user points, the interest tag of the target user includes: and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user based on the weights corresponding to different behaviors.
Optionally, in a specific application scenario, fig. 5 is a schematic flowchart of a process for calculating a weight according to an embodiment of the present application; as shown in fig. 5, the calculating weights assigned to the different behaviors of the target user includes:
S400A, determining the importance degree of the different behaviors and the timeliness of the different behaviors;
for example, in a specific application scenario, the importance of monitoring, focusing, searching and browsing decreases in that order, whereas the timeliness of focusing, monitoring, searching and browsing decreases in that order.
Optionally, in a specific application scenario, after the timeliness of the different behaviors is determined, a weight attenuation value is determined according to the time span that the timeliness represents: the more recent the behavior, the smaller the weight attenuation value; the older the behavior, the larger the weight attenuation value.
Optionally, in a specific application scenario, the importance of the different behaviors and the timeliness of the different behaviors may be determined by analyzing the behavior log.
Optionally, after determining the importance degrees of different behaviors, determining an initial weight according to the importance degree, wherein the higher the importance degree is, the larger the initial weight is, and otherwise, the smaller the initial weight is.
S400B, calculating weights distributed to different behaviors of the target user based on the importance degree and the timeliness.
Specifically, the corresponding weight may be calculated based on the initial weight and the weight attenuation value of different behaviors, for example, directly performing a product operation on the initial weight and the weight attenuation value, and using the obtained operation result as the weight of the corresponding behavior.
Through the steps S400A and S400B, the importance degree and the timeliness of the behaviors are comprehensively considered, so that the accuracy of the weight is ensured.
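As an illustration of step S400B only, the product of the initial weight and the weight attenuation value could be sketched as follows. The numeric initial weights, the dictionary name `INITIAL_WEIGHTS` and the function name `behavior_weight` are hypothetical; the embodiment fixes only the ordering of importance, not concrete values, and how the attenuation value is derived from the timeliness is left to the caller here.

```python
# Hypothetical initial weights: the embodiment only states that importance
# decreases in the order monitoring > focusing > searching > browsing.
INITIAL_WEIGHTS = {
    "monitoring": 4.0,
    "focusing": 3.0,
    "searching": 2.0,
    "browsing": 1.0,
}


def behavior_weight(behavior: str, attenuation_value: float) -> float:
    """Weight of a behavior as initial weight times weight attenuation value.

    The attenuation value is assumed to have been determined beforehand
    from the timeliness of the behavior (step S400A).
    """
    return INITIAL_WEIGHTS[behavior] * attenuation_value


# Example: attenuation values supplied by some earlier timeliness analysis.
attenuation_values = {"monitoring": 0.5, "browsing": 1.0}
weights = {b: behavior_weight(b, a) for b, a in attenuation_values.items()}
```

Keeping the attenuation value as an input keeps the sketch neutral about the decay form, which the embodiment does not specify.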
FIG. 6 is a flowchart illustrating a step of determining an interest tag of a target user according to an embodiment of the present application; as shown in fig. 6, in a specific application scenario, the determining, based on the weights corresponding to the different behaviors, the interest tag of the target user according to the service tag of the target company to which the behavior of the target user points includes:
S412, for the same target company pointed to by different behaviors of the target user, taking the weight corresponding to each behavior as the contribution degree of each service tag of the target company to the determination of the interest tag;
S422, for all target companies pointed to by different behaviors of the target user, sorting the contribution degrees of all the service tags to the determination of the interest tag, and determining a plurality of service tags with large contribution degrees as the interest tags of the target user.
Optionally, several service tags obtained by executing step S422 are also added to the user-side keyword table to perform unified management on the interest tags of the target users.
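Steps S412 and S422 could be sketched roughly as below. This is a minimal sketch under stated assumptions: the function name `interest_tags`, the input shapes and the `top_n` cut-off are hypothetical, and summing the weights when several behaviors reach the same service tag is an assumption, since the embodiment only says that the behavior weight is taken as the contribution degree and that the contribution degrees are then sorted.

```python
from collections import defaultdict


def interest_tags(behaviors, company_tags, weights, top_n=3):
    """Return the top_n service tags ranked by accumulated behavior weight.

    behaviors:    list of (behavior_name, company_id) pairs for one user
    company_tags: dict mapping company_id -> list of service tags
    weights:      dict mapping behavior_name -> weight
    """
    contribution = defaultdict(float)
    for behavior, company in behaviors:
        # S412: the behavior's weight is the contribution of each service
        # tag of the company the behavior points to (summed: an assumption).
        for tag in company_tags.get(company, []):
            contribution[tag] += weights[behavior]
    # S422: sort by contribution, largest first; ties broken alphabetically.
    ranked = sorted(contribution.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


example = interest_tags(
    behaviors=[("monitoring", "A"), ("browsing", "B")],
    company_tags={"A": ["logistics", "cold chain"], "B": ["retail", "logistics"]},
    weights={"monitoring": 4.0, "browsing": 1.0},
    top_n=2,
)
```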
Optionally, the service tag of the target company and the interest tag of the target user are not fixed, and the method of the embodiment may be executed again to update according to a requirement, so as to ensure timeliness of the service tag of the target company and the interest tag of the target user.
Fig. 7 is a schematic structural diagram of a service tag generation apparatus according to an embodiment of the present application; as shown in fig. 7, it includes:
a word segmentation unit 701, configured to perform word segmentation processing on the description text of each target company to obtain a plurality of feature words;
a first word frequency statistics unit 702, configured to determine, according to the number of times that each feature word appears in the description texts of all target companies, a global word frequency of each feature word;
a second word frequency statistical unit 703, configured to determine a local word frequency of each feature word according to the number of times that each feature word appears in a description text corresponding to a target company with the same statistical attribute;
a contribution degree determining unit 704, configured to determine, for each target company belonging to the same statistical attribute, a contribution degree of each feature word to the generation of the service tag according to a global word frequency and a local word frequency of each feature word corresponding to the target company;
a service label determining unit 705, configured to determine, according to the contribution degree, a service label of a target company belonging to the same statistical attribute.
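The two word frequency statistics units could be sketched as follows. This assumes the description texts have already been segmented into feature words (unit 701) and that a word frequency is a relative frequency, that is, the occurrence count divided by the total number of feature words counted; the section shown here does not fix that normalization. The names `word_frequencies`, `global_and_local` and the `"group"` key standing for the statistical attribute are hypothetical.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Relative frequency of each feature word over the given token lists."""
    counts = Counter(word for tokens in token_lists for word in tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def global_and_local(companies, group):
    """Global frequencies over all companies, local ones over one group.

    companies: dict company_id -> {"group": str, "tokens": list of str}
    """
    global_tf = word_frequencies(c["tokens"] for c in companies.values())
    local_tf = word_frequencies(
        c["tokens"] for c in companies.values() if c["group"] == group
    )
    return global_tf, local_tf


companies = {
    "A": {"group": "logistics", "tokens": ["freight", "cloud"]},
    "B": {"group": "software", "tokens": ["cloud", "cloud"]},
}
global_tf, local_tf = global_and_local(companies, "logistics")
```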
Optionally, in a specific application scenario, the contribution degree determining unit 704 is specifically configured to calculate a word score of each feature word according to the global word frequency and the local word frequency of that feature word, and to use the word score as the contribution degree of the feature word to the generation of the service label of each target company.
Optionally, in a specific application scenario, the contribution degree determining unit 704 is specifically configured to calculate a word score of each feature word according to the global word frequency and the local word frequency of that feature word, based on y = f2 × log(1/f1), where f1 is the global word frequency, f2 is the local word frequency, and y is the word score.
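The word score above can be illustrated with a short sketch. The function name `word_score` is hypothetical, the natural logarithm is an assumption because the base of the logarithm is not stated, and the global word frequency is assumed to be a relative frequency in the interval (0, 1].

```python
import math


def word_score(global_tf: float, local_tf: float) -> float:
    """Word score y = f2 * log(1 / f1), with f1 global and f2 local frequency.

    Natural logarithm assumed; f1 must lie in (0, 1] so that the score is
    defined and non-negative.
    """
    if not 0.0 < global_tf <= 1.0:
        raise ValueError("global word frequency must be in (0, 1]")
    return local_tf * math.log(1.0 / global_tf)


# A word that is rare overall but common within the group scores higher
# than a word that is common everywhere.
rare_overall = word_score(global_tf=0.01, local_tf=0.5)
common_overall = word_score(global_tf=0.5, local_tf=0.5)
```

Under these assumptions the score behaves like a TF-IDF-style measure: the local frequency rewards words typical of the group, while the logarithmic factor penalizes words that are frequent across all target companies.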
Optionally, in a specific application scenario, the service tag determining unit 705 is specifically configured to: screen, according to the contribution degree, a plurality of feature words with large contribution degrees from all the feature words corresponding to the target companies belonging to the same statistical attribute, to serve as service keywords of the same industry or the same region; and filter the feature words of each target company based on the service keywords, keeping only those feature words of each target company that are the same as the service keywords as the service labels of that target company.
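The screening and filtering performed by unit 705 could look roughly like this. It is a sketch only: the names `business_keywords` and `service_tags`, the fixed `top_k` cut-off and the de-duplication of repeated feature words are assumptions, since the embodiment only speaks of "a plurality of feature words with large contribution degrees".

```python
def business_keywords(scores, top_k):
    """Pick the top_k feature words by contribution degree for one group.

    scores: dict mapping feature word -> contribution degree (word score)
    """
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return set(ranked[:top_k])


def service_tags(company_words, keywords):
    """Keep only a company's feature words that are also group keywords."""
    tags = []
    for word in company_words:
        # Preserve first-seen order and drop repeats (an assumption).
        if word in keywords and word not in tags:
            tags.append(word)
    return tags


keywords = business_keywords(
    {"freight": 0.9, "cold-chain": 0.7, "company": 0.1}, top_k=2
)
tags = service_tags(["company", "freight", "freight", "warehouse"], keywords)
```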
Optionally, in a specific application scenario, the system further includes an interest tag generating unit, configured to determine, after determining, according to the contribution degree, a service tag of a target company belonging to the same statistical attribute, a target company to which a behavior of a target user points; and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user.
Optionally, in a specific application scenario, the system further includes a weight calculation unit, configured to calculate weights assigned to different behaviors of the target user after the service tag of the target company belonging to the same statistical attribute is determined according to the contribution degree.
Optionally, in a specific application scenario, the interest tag generating unit is specifically configured to determine, based on the weights corresponding to the different behaviors, the interest tag of the target user according to the service tag of the target company to which the behavior of the target user points.
Optionally, in a specific application scenario, the weight calculating unit is specifically configured to determine the importance degree of the different behaviors and the timeliness of the different behaviors; and calculating weights distributed to different behaviors of the target user based on the importance degree and the timeliness.
The interest tag generating unit is specifically configured to: for the same target company pointed to by different behaviors of the target user, take the weight corresponding to each behavior as the contribution degree of each service tag of the target company to the determination of the interest tag; and, for all target companies pointed to by different behaviors of the target user, sort the contribution degrees of all the service tags to the determination of the interest tag, and determine a plurality of service tags with large contribution degrees as the interest tags of the target user.
The embodiment of the present application further provides a computer storage medium, where a computer executable program is stored on the computer storage medium, and the computer executable program is executed to implement any one of the service tag generation methods in the embodiments of the present application.
Fig. 8 is a schematic structural diagram of an electronic terminal in an embodiment of the present application; as shown in fig. 8, the electronic terminal includes: a memory 801 and a processor 802, wherein the memory stores thereon a computer executable program, and the processor is configured to run the computer executable program to implement the method for generating the service tag in any embodiment of the present application.
The above embodiments are only specific embodiments of the present application. They illustrate the technical solutions of the present application and do not limit them, and the scope of protection of the present application is not limited to these embodiments. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for generating a service label is characterized by comprising the following steps:
performing word segmentation processing on the description text of each target company to obtain a plurality of characteristic words;
determining the global word frequency of each characteristic word according to the occurrence frequency of each characteristic word in the description texts of all target companies;
determining the local word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description text corresponding to the target company with the same statistical attribute;
determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute;
and determining the service label of the target company belonging to the same statistical attribute according to the contribution degree.
2. The method of claim 1, wherein the determining the contribution degree of each feature word to the generation of the service tag according to the global word frequency and the local word frequency of each corresponding feature word comprises: and calculating the word score of each characteristic word according to the global word frequency and the local word frequency of each corresponding characteristic word, and taking the word score as the contribution degree of the characteristic word to each target company for generating the service label.
3. The method according to claim 2, wherein said calculating a word score of each feature word according to the global word frequency and the local word frequency of each corresponding feature word comprises: calculating the word score of each feature word according to the global word frequency and the local word frequency of each corresponding feature word based on y = f2 × log(1/f1), wherein f1 is the global word frequency, f2 is the local word frequency, and y is the word score.
4. The method according to claim 1, wherein the determining the service label of the target company belonging to the same statistical attribute according to the contribution degree comprises:
according to the contribution degree, screening a plurality of feature words with large contribution degrees from all the feature words corresponding to the target companies belonging to the same statistical attribute, to be used as service keywords of the same industry or the same region;
and filtering the characteristic words of each target company based on the service key words, and only keeping the characteristic words which are the same as the service key words in the characteristic words of each target company as the service labels of the target companies.
5. The method according to any one of claims 1 to 4, wherein after determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, the method further comprises:
determining a target company pointed by the behavior of the target user;
and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user.
6. The method according to claim 5, wherein after determining the service label of the target company belonging to the same statistical attribute according to the contribution degree, the method further comprises:
calculating weights distributed to different behaviors of the target user;
the determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user comprises the following steps: and determining the interest tag of the target user according to the business tag of the target company pointed by the behavior of the target user based on the weights corresponding to different behaviors.
7. The method of claim 6, wherein the calculating weights assigned to different behaviors of the target user comprises:
determining the importance degree of the different behaviors and the timeliness of the different behaviors;
and calculating the weight distributed to different behaviors of the target user based on the importance degree and the timeliness.
8. The method according to claim 6, wherein the determining the interest tag of the target user according to the business tag of the target company pointed to by the behavior of the target user based on the corresponding weights of the different behaviors comprises:
aiming at the same target company pointed by different behaviors of the target user, taking the weight corresponding to each behavior as the contribution degree of each business label of the target company to the determination of the interest label;
and sequencing the contribution degrees of all the service tags to the determined interest tags aiming at all target companies pointed by different behaviors of the target user, and determining a plurality of service tags with large contribution degrees as the interest tags of the target user.
9. An apparatus for generating a service tag, comprising:
the word segmentation unit is used for performing word segmentation processing on the description text of each target company to obtain a plurality of characteristic words;
the first word frequency statistical unit is used for determining the global word frequency of each characteristic word according to the frequency of the characteristic word appearing in the description texts of all target companies;
the second word frequency statistical unit is used for determining the local word frequency of each characteristic word according to the frequency of occurrence of each characteristic word in the description text corresponding to the target company with the same statistical attribute;
the contribution degree determining unit is used for determining the contribution degree of each feature word to the generation of the service label according to the global word frequency and the local word frequency of each corresponding feature word aiming at each target company belonging to the same statistical attribute;
and the service label determining unit is used for determining the service labels of the target companies belonging to the same statistical attribute according to the contribution degree.
10. A computer storage medium having stored thereon a computer-executable program that is executed to implement the method of any one of claims 1-8.
11. An electronic terminal, comprising a memory for storing a computer-executable program and a processor for executing the computer-executable program to perform the method of any of claims 1-8.
CN202210427937.1A 2022-04-22 2022-04-22 Service label generation method and device, computer storage medium and electronic terminal Active CN114780668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427937.1A CN114780668B (en) 2022-04-22 2022-04-22 Service label generation method and device, computer storage medium and electronic terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427937.1A CN114780668B (en) 2022-04-22 2022-04-22 Service label generation method and device, computer storage medium and electronic terminal

Publications (2)

Publication Number Publication Date
CN114780668A true CN114780668A (en) 2022-07-22
CN114780668B CN114780668B (en) 2024-04-09

Family

ID=82431408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427937.1A Active CN114780668B (en) 2022-04-22 2022-04-22 Service label generation method and device, computer storage medium and electronic terminal

Country Status (1)

Country Link
CN (1) CN114780668B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160239865A1 (en) * 2013-10-28 2016-08-18 Tencent Technology (Shenzhen) Company Limited Method and device for advertisement classification
CN109547863A (en) * 2018-10-22 2019-03-29 武汉斗鱼网络科技有限公司 A kind of labeling method of label, device, server and storage medium
CN111814486A (en) * 2020-07-10 2020-10-23 东软集团(上海)有限公司 Enterprise client tag generation method, system and device based on semantic analysis
CN112528022A (en) * 2020-12-09 2021-03-19 广州摩翼信息科技有限公司 Method for extracting characteristic words corresponding to theme categories and identifying text theme categories

Also Published As

Publication number Publication date
CN114780668B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN105989040B (en) Intelligent question and answer method, device and system
US9317613B2 (en) Large scale entity-specific resource classification
US20190370397A1 (en) Artificial intelligence based-document processing
US20070016581A1 (en) Category setting support method and apparatus
JP2013504118A (en) Information retrieval based on query semantic patterns
US20170242919A1 (en) Analysis of Unstructured Computer Text to Generate Themes and Determine Sentiment
WO2008098956A1 (en) Method and apparatus for automatically discovering features in free form heterogeneous data
CN110163376B (en) Sample detection method, media object identification method, device, terminal and medium
CN107291755B (en) Terminal pushing method and device
US20180081964A1 (en) Method and system for next word prediction
US20170228462A1 (en) Adaptive seeded user labeling for identifying targeted content
US20210097239A1 (en) System and method for solving text sensitivity based bias in language model
US20180329983A1 (en) Search apparatus and search method
US20180181658A1 (en) Method and apparatus for recognizing wifi names of points of interest
CN112966081A (en) Method, device, equipment and storage medium for processing question and answer information
US11403331B2 (en) Multi-term query subsumption for document classification
CN112395881A (en) Material label construction method and device, readable storage medium and electronic equipment
CN114222000B (en) Information pushing method, device, computer equipment and storage medium
CN110245357B (en) Main entity identification method and device
JP2017219899A (en) Knowledge search device, knowledge search method and knowledge search program
US20220050884A1 (en) Utilizing machine learning models to automatically generate a summary or visualization of data
JP2023145767A (en) Vocabulary extraction support system and vocabulary extraction support method
CN114780668B (en) Service label generation method and device, computer storage medium and electronic terminal
CN110647537A (en) Data searching method, device and storage medium
JP2015018372A (en) Expression extraction model learning device, expression extraction model learning method and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230731

Address after: Room 404-405, 504, Building B-17-1, Big data Industrial Park, Kecheng Street, Yannan High tech Zone, Yancheng, Jiangsu Province, 224000

Applicant after: Yancheng Tianyanchawei Technology Co.,Ltd.

Address before: 224000 room 501-503, building b-17-1, Xuehai road big data Industrial Park, Kecheng street, Yannan high tech Zone, Yancheng City, Jiangsu Province (CNK)

Applicant before: Yancheng Jindi Technology Co.,Ltd.

GR01 Patent grant