WO2020248377A1 - Information pushing method and apparatus, computer readable storage medium, and computer device - Google Patents

Information pushing method and apparatus, computer readable storage medium, and computer device

Info

Publication number
WO2020248377A1
WO2020248377A1 PCT/CN2019/103023 CN2019103023W WO2020248377A1 WO 2020248377 A1 WO2020248377 A1 WO 2020248377A1 CN 2019103023 W CN2019103023 W CN 2019103023W WO 2020248377 A1 WO2020248377 A1 WO 2020248377A1
Authority
WO
WIPO (PCT)
Prior art keywords
similarity
segmentation
candidate user
word
digital
Prior art date
Application number
PCT/CN2019/103023
Other languages
French (fr)
Chinese (zh)
Inventor
张二红
朱娜
郑哲青
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020248377A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services

Definitions

  • This application relates to the field of data analysis technology. Specifically, this application relates to an information push method, device, computer-readable storage medium, and computer equipment.
  • Information push is currently an active area of Internet technology.
  • Information push, sometimes called "web (World Wide Web) broadcasting", is a technology that reduces information overload by regularly transmitting the information users need over the Internet according to certain technical standards or protocols.
  • Push technology reduces the time users spend searching on the Internet by transmitting information to them automatically. It searches and filters information according to users' interests and pushes it to them regularly, helping users discover valuable information efficiently.
  • The server usually needs to push information based on the intimacy between colleagues, for example, pushing the information that one user follows to other users who are colleagues of that user and have a high intimacy with that user.
  • The conventional method matches the full company name of the target user against the full company name of the candidate user. If the two full company names are the same, the intimacy is determined to be high and the candidate user's information is pushed to the target user; otherwise the intimacy is determined to be low and the candidate user's information is not pushed. However, this method has the disadvantage of low computational efficiency.
  • To address this, this application proposes an information push method, device, computer-readable storage medium, and computer equipment that improve the computational efficiency of company name matching.
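  • An information pushing method is provided, including:
  • obtaining the full company name of the candidate user;
  • segmenting the full company name of the candidate user to obtain the word segmentation set of the candidate user;
  • converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user;
  • matching the digital identifier set of the candidate user with the digital identifier set of the target user, where the digital identifier set of the target user is obtained by processing the full company name of the target user in the same manner;
  • if the match is consistent, pushing the candidate user's information to the target user.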
  • An information pushing device is provided, including:
  • a full company name acquisition module, used to obtain the full company name of the candidate user;
  • a word segmentation module, used to segment the full company name of the candidate user to obtain the word segmentation set of the candidate user;
  • a digital identifier conversion module, configured to convert each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user;
  • a matching module, configured to match the digital identifier set of the candidate user with the digital identifier set of the target user, where the digital identifier set of the target user is obtained by processing the full company name of the target user in the same manner;
  • an information push module, configured to push the candidate user's information to the target user when the match is consistent.
  • the embodiments of the present application also provide a non-volatile computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, an information push method is implemented;
  • the information pushing method includes the following steps:
  • the digital identifier set of the target user is obtained by processing the full company name of the target user in the same manner;
  • The embodiments of the present application also provide a computer device, the computer device including:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors execute an information push method;
  • the information pushing method includes the following steps:
  • the digital identifier set of the target user is obtained by processing the full company name of the target user in the same manner;
  • The above information push method, device, computer-readable storage medium, and computer equipment convert the full company name into a set of digital identifiers, and replace matching of the full company name with matching of the digital identifiers of its word segments. Because matching digital identifiers is more efficient than matching Chinese characters, the computational efficiency of company name matching is greatly improved in this way.
  • FIG. 1 is a schematic diagram of an information pushing method according to an embodiment of the application;
  • FIG. 2 is a schematic diagram of an information pushing device according to an embodiment of the application;
  • FIG. 3 is a schematic diagram of a computer device according to an embodiment of the application.
  • As shown in FIG. 1, which is a schematic diagram of an information pushing method according to an embodiment, the method includes:
  • The target user is the user to whom information is to be pushed.
  • The candidate user is a user whose information may be pushed to the target user.
  • The data source of the full company name can be the information that the candidate user fills in on a job search website, the information that the candidate user enters directly in the interface, and so on.
  • Word Segmentation refers to the segmentation of a sequence of Chinese characters into individual words.
  • the full name of the company is segmented to obtain a number of individual words of the full name of the company.
  • the individual words constitute a word segmentation set.
  • S130 Convert each word segmentation in the word segmentation set of the candidate user into a corresponding digital identifier, and obtain the digital identifier set of the candidate user.
  • Each word segment in the word segmentation set is converted into a digital identifier in a certain way to obtain the digital identifier set of the candidate user.
  • The full company name of the target user is processed in the same way to obtain the digital identifier set of the target user. That is, the target user's full company name is segmented according to the same word segmentation rules to obtain the target user's word segmentation set, and each word segment in the target user's word segmentation set is converted into a corresponding digital identifier according to the same conversion rule to obtain the digital identifier set of the target user.
  • The digital identifier set of the candidate user is matched with the digital identifier set of the target user.
  • If the match is consistent, the candidate user and the target user belong to the same company, and the candidate user's information, such as the candidate user's identity information or the product information that the candidate user follows, is pushed to the target user. Otherwise, the candidate user's information is not pushed to the target user.
  • In this way, the full company name is converted into a set of digital identifiers, and matching of the full company name is replaced by matching of the digital identifiers of its word segments. Because matching digital identifiers is more efficient than matching Chinese characters, the computational efficiency is greatly improved.
  • The full company name generally consists of a region, a business name, an industry, and a closing word. Therefore, in one embodiment, segmenting the full company name of the candidate user to obtain the word segmentation set of the candidate user includes:
  • Region refers to geographic location information. Because the number of regions is limited, a region lexicon containing, for example, countries, provinces, and cities can be established in advance. The characters in the candidate user's full company name that match a word in the region lexicon are extracted, and these characters are the region word segment.
  • Industry refers to the organizational structure system of business units or individuals engaged in production of the same nature in the national economy or in other economic and social activities. Because the number of industries is limited, an industry lexicon containing, for example, food, communications, and finance can be established in advance. The characters in the candidate user's full company name that match a word in the industry lexicon are extracted, and these characters are the industry word segment.
  • According to a preset closing word lexicon, the closing word segment is extracted from the full company name of the candidate user; the closing word is used to describe the organizational form of the company.
  • The closing word describes the organizational form of the company and is usually the last few characters of the full company name. Because the number of closing words is limited, a closing word lexicon containing, for example, head office, group, branch, and limited company can be established in advance. The characters in the candidate user's full company name that match a word in the closing word lexicon are extracted, and these characters are the closing word segment.
  • The part that remains after removing the region word segment, the industry word segment, and the closing word segment from the full company name of the candidate user is used as the business name word segment.
  • A business name is a manifestation of the legal personality of the enterprise. Because business names are highly diverse, the part of the full company name that remains after the region word segment, the industry word segment, and the closing word segment are removed is used as the business name word segment.
  • the word segmentation set of the candidate user is formed by the geographical word segmentation, industry word segmentation, closing word word segmentation, and business name word segmentation of the candidate user.
  • the word segmentation set of the target user can also be divided according to the division rules of region, business name, industry, and conclusion.
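  • The lexicon-based segmentation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the three lexicons, the sample company name, the function names, and the longest-match rule are all assumptions introduced here.

```python
# Hypothetical lexicons for illustration; the source only says that such
# vocabularies are established in advance.
REGION_LEXICON = ["北京", "上海", "深圳"]
INDUSTRY_LEXICON = ["食品", "通信", "金融"]
CLOSING_LEXICON = ["总公司", "集团", "分公司", "有限公司"]


def longest_match(name, lexicon):
    """Return the longest lexicon word that occurs in name, or None."""
    hits = [word for word in lexicon if word in name]
    return max(hits, key=len) if hits else None


def segment_company_name(name):
    """Split a full company name into its four kinds of word segment."""
    region = longest_match(name, REGION_LEXICON)
    industry = longest_match(name, INDUSTRY_LEXICON)
    closing = longest_match(name, CLOSING_LEXICON)
    # Whatever is left after removing the three lexicon matches is treated
    # as the business name word segment.
    remainder = name
    for segment in (region, industry, closing):
        if segment:
            remainder = remainder.replace(segment, "", 1)
    return {
        "region": region,
        "business_name": remainder,
        "industry": industry,
        "closing": closing,
    }


# A made-up company name used only as sample input.
segments = segment_company_name("上海明华食品有限公司")
```

  • Applying the same function to the target user's full company name yields a word segmentation set with the same four keys, which is what allows the later same-category matching.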
  • the converting each word segment in the word segmentation set of the candidate user into a corresponding digital identity to obtain the digital identity set of the candidate user includes:
  • S1301 Combine the preset digital identifier of the region lexicon with the position number of the region word segment in the region lexicon to obtain the digital identifier of the region word segment.
  • The words in the region lexicon are arranged in a certain order, and each word has its own unique position number. The digital identifier of the region lexicon is placed before the position number to obtain the digital identifier of the region word segment.
  • For example, if the digital identifier of the region lexicon is 1 and the position number of the region word segment in the region lexicon is 13, the digital identifier of the region word segment is 113.
  • S1302 Combine the preset digital identifier of the word database that characterizes the industry with the position serial number of the industry word segmentation in the word database that characterizes the industry to obtain the digital identifier of the industry word segmentation.
  • A digital identifier is preset for the closing word lexicon and is used to uniquely identify the closing word lexicon.
  • The words in the closing word lexicon are arranged in a certain order, and each word has its own unique position number. The digital identifier of the closing word lexicon is placed before the position number to obtain the digital identifier of the closing word segment.
  • For example, if the digital identifier of the closing word lexicon is 3 and the position number of the closing word segment in the closing word lexicon is 13, the digital identifier of the closing word segment is 313.
  • A Hanyu Pinyin alphabet is preset.
  • The letters of the Hanyu Pinyin alphabet are arranged in a certain order, and each letter has its own unique position number.
  • Obtain the Hanyu Pinyin of the business name word segment, look up the position number of each of its letters in the Hanyu Pinyin alphabet, and combine these position numbers in the order in which the letters appear in the Hanyu Pinyin of the business name word segment to obtain the digital identifier of the business name word segment.
  • the digital identification set of the candidate user is composed of the digital identification of the geographical word segmentation, the digital identification of the industry word segmentation, the digital identification of the closing word participle, and the digital identification of the business name word segmentation.
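  • The conversion rules above can be sketched as follows. The lexicon identifiers 1 and 3 follow the examples in the text; the identifier 2 for the industry lexicon, the 1-based position numbers, the a to z letter order, and the function names are assumptions made for this sketch. The pinyin string is taken as already-romanized input, since the text does not name a romanization tool.

```python
import string

# 1 (region) and 3 (closing word) follow the examples in the text;
# 2 for the industry lexicon is an assumption.
LEXICON_IDS = {"region": "1", "industry": "2", "closing": "3"}


def lexicon_segment_id(kind, segment, lexicon):
    """Prefix the segment's position number with its lexicon's identifier."""
    position = lexicon.index(segment) + 1  # assumed to be 1-based
    return LEXICON_IDS[kind] + str(position)


def business_name_id(pinyin):
    """Concatenate the alphabet position numbers of the pinyin letters."""
    letters = string.ascii_lowercase  # assumed alphabet order: a=1 ... z=26
    return "".join(
        str(letters.index(ch) + 1) for ch in pinyin.lower() if ch in letters
    )


# Placeholder region lexicon with 13 entries, so the last word sits at
# position 13 as in the example in the text.
region_lexicon = ["region%d" % i for i in range(1, 14)]
region_id = lexicon_segment_id("region", "region13", region_lexicon)  # "113"
name_id = business_name_id("pingan")
```

  • Note that plain concatenation can be ambiguous (for example, identifier 1 with position 13 and identifier 11 with position 3 both give 113); the text does not say how, or whether, this is resolved.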
  • the matching the digital identity set of the candidate user with the digital identity set of the target user includes:
  • Same-category matching means: the digital identifier of the candidate user's region word segment is matched with the digital identifier of the target user's region word segment to obtain the similarity of the region word segments; the digital identifier of the candidate user's business name word segment is matched with the digital identifier of the target user's business name word segment to obtain the similarity of the business name word segments; the digital identifier of the candidate user's industry word segment is matched with the digital identifier of the target user's industry word segment to obtain the similarity of the industry word segments; and the digital identifier of the candidate user's closing word segment is matched with the digital identifier of the target user's closing word segment to obtain the similarity of the closing word segments.
  • S1402 Calculate the weighted sum of the similarity of the region word segments, the similarity of the industry word segments, the similarity of the closing word segments, and the similarity of the business name word segments; the weights corresponding to the similarity of the business name word segments, the similarity of the industry word segments, the similarity of the region word segments, and the similarity of the closing word segments decrease in that order.
  • Weighted sum = weight of business name word segment × similarity of business name word segments + weight of industry word segment × similarity of industry word segments + weight of region word segment × similarity of region word segments + weight of closing word segment × similarity of closing word segments.
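  • The weighted sum can be sketched as follows. The weight values, the threshold value, and the exact-match similarity (1 if two digital identifiers are equal, otherwise 0) are assumptions for illustration; the text only requires that the weights decrease in the order business name, industry, region, closing word, and that the weighted sum be compared with a first preset value.

```python
# Hypothetical weights, decreasing in the order required by the text.
WEIGHTS = {"business_name": 0.4, "industry": 0.3, "region": 0.2, "closing": 0.1}
FIRST_PRESET_VALUE = 0.6  # hypothetical threshold


def id_similarity(candidate_id, target_id):
    """Assumed similarity: 1.0 for identical identifiers, otherwise 0.0."""
    return 1.0 if candidate_id is not None and candidate_id == target_id else 0.0


def weighted_sum(candidate_ids, target_ids):
    """Same-category matching followed by the weighted sum."""
    return sum(
        weight * id_similarity(candidate_ids.get(kind), target_ids.get(kind))
        for kind, weight in WEIGHTS.items()
    )


def is_same_company(candidate_ids, target_ids):
    """The match is consistent when the weighted sum exceeds the threshold."""
    return weighted_sum(candidate_ids, target_ids) > FIRST_PRESET_VALUE


candidate = {"business_name": "169147114", "industry": "24", "region": "113", "closing": "31"}
target = {"business_name": "169147114", "industry": "24", "region": "113", "closing": "33"}
same_company = is_same_company(candidate, target)
```

  • With these assumed values, two names that differ only in the closing word still match, because the closing word carries the smallest weight.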
  • the colleague can be the colleague of the current employer or the colleague of the former company. Determine whether the full company name of the target user and the candidate user are the same, that is, compare the full company name of the target user with the full company name of the candidate user. If the two are consistent, they belong to the same company, otherwise they do not belong to the same company.
  • the pushing the candidate user information to the target user includes:
  • Colleague relationship intimacy is used to characterize the degree of intimacy between colleagues.
  • To determine this intimacy, this application uses work information: the work information of the candidate user and the work information of the target user are obtained.
  • the job information includes year of employment, position, project name, work experience, and work city, etc.
  • the obtained job information and the company name used in the above steps can be stored in the form of feature vectors, for example, [company name, year of employment, position, project name, work experience, work city].
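  • One possible way to hold such a feature vector is sketched below. The class name, field names, field types, and sample values are assumptions; the text only lists the order of the fields.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class WorkRecord:
    """Hypothetical container mirroring the feature vector in the text."""
    company_name: str
    employment_years: tuple  # e.g. (2016, 2017, 2018); representation assumed
    position: str
    project_name: str
    work_experience: str
    work_city: str

    def as_feature_vector(self):
        """Return the fields in the order listed in the text."""
        return list(astuple(self))


# Sample values are made up for illustration.
record = WorkRecord(
    "ExampleCo", (2016, 2017), "engineer", "Project A", "backend development", "Shenzhen"
)
vector = record.as_feature_vector()
```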
  • S1502. Calculate the similarity between the job information of the candidate user and the job information of the target user.
  • the similarity between the job information of the candidate user and the job information of the target user can be calculated using the existing method in the prior art.
  • The job information includes multiple items, for example, the year of employment, position, project name, work experience, and work city. In one embodiment, calculating the similarity between the candidate user's job information and the target user's job information includes:
  • Same-category matching means: the candidate user's years of employment are matched with the target user's years of employment to obtain the similarity of the years of employment; the candidate user's position is matched with the target user's position to obtain the similarity of the positions; the candidate user's project name is matched with the target user's project name to obtain the similarity of the project names; the candidate user's work experience is matched with the target user's work experience to obtain the similarity of the work experience; and the candidate user's work city is matched with the target user's work city to obtain the similarity of the work cities.
  • S1502b Calculate the weighted sum of the similarity of the years of employment, the similarity of the positions, the similarity of the project names, the similarity of the work experience, and the similarity of the work cities to obtain the similarity between the job information of the candidate user and the job information of the target user.
  • The similarity of the job information between the target user and the candidate user can be calculated as the sum, over each item of content contained in the job information, of the weight of that item × the similarity of that item.
  • When calculating the similarity between the years of employment, the similarity can be determined according to the number of overlapping years of employment. For example, if the target user and the candidate user have 3 overlapping years of employment, the similarity is 3.
  • When calculating the similarity of positions, a vector is generated for each position, and the similarity between the two position vectors is calculated.
  • The specific calculation can be implemented using existing methods in the prior art.
  • When calculating the similarity of project names, a vector can be generated for each project name and the similarity between the two vectors calculated; or the similarity of identical project names can be set to 1 and all others to 0; or the similarity of identical project names can be set to 1, and for the rest it is determined whether the project name is related to the target user's project name: if so, the similarity is set to a value less than 1 and greater than 0, otherwise it is set to 0; and so on.
  • When calculating the similarity of work cities, the similarity of identical work cities is set to 1, and all others are set to 0.
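  • The job information similarity can be sketched as follows. The weights and dictionary keys are assumptions. Exact matching is used for the position as a simple stand-in for the vector similarity mentioned in the text, and work experience is omitted because the text does not describe how its similarity is calculated.

```python
# Hypothetical weights for each item of job information.
WEIGHTS = {"years": 0.3, "position": 0.2, "project": 0.3, "city": 0.2}


def years_similarity(candidate_years, target_years):
    """Number of overlapping years of employment, as described in the text."""
    return float(len(set(candidate_years) & set(target_years)))


def exact_similarity(a, b):
    """1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def job_similarity(candidate, target):
    """Weighted sum of the per-item similarities."""
    sims = {
        "years": years_similarity(candidate["years"], target["years"]),
        "position": exact_similarity(candidate["position"], target["position"]),
        "project": exact_similarity(candidate["project"], target["project"]),
        "city": exact_similarity(candidate["city"], target["city"]),
    }
    return sum(WEIGHTS[item] * sims[item] for item in WEIGHTS)


# Sample records are made up for illustration.
candidate = {"years": {2016, 2017, 2018, 2019}, "position": "engineer",
             "project": "Project A", "city": "Shenzhen"}
target = {"years": {2017, 2018, 2019, 2020}, "position": "manager",
          "project": "Project A", "city": "Shenzhen"}
similarity = job_similarity(candidate, target)
```

  • Because the year similarity is a count rather than a value between 0 and 1, it can dominate the sum unless its weight is chosen with that in mind.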
  • The interactive relationship information describes other users who interact with a user, such as the user's job referrer or the user's background investigation certifier.
  • The interactive relationship information can be used together with the work information as a data source for determining whether to push information. Therefore, in an embodiment, calculating the similarity between the job information of the candidate user and the job information of the target user includes:
  • There are many ways to calculate the similarity of interactive relationship information. For example, when calculating the similarity of job referrers, if the candidate user's job referrer is the same as the target user's job referrer, the similarity is set to 1; otherwise it is set to 0. However, the target user may be the candidate user's job referrer, or the candidate user may be the target user's job referrer, in which case the intimacy between the two is relatively high. Therefore, when calculating the similarity, it can first be determined whether one of the users is the other user's job referrer. If so, the similarity is directly set to a larger value, for example 2. If not, it is determined whether the candidate user's job referrer and the target user's job referrer are the same: if they are, the similarity is set to 1; otherwise it is set to 0. The similarity of other items in the interactive relationship information is calculated in a similar way.
  • The final similarity can be obtained in several ways: for example, the similarity between the work information and the similarity between the interactive relationship information are added to obtain the final similarity, or the two are multiplied to obtain the final similarity, and so on.
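  • The referrer rule and the two ways of combining similarities can be sketched as follows. The user identifiers, function names, and the threshold value are assumptions; the values 2, 1, and 0 follow the example in the text.

```python
SECOND_PRESET_VALUE = 2.0  # hypothetical threshold for pushing


def referrer_similarity(candidate_id, candidate_referrer, target_id, target_referrer):
    """Similarity of job referrers using the 2 / 1 / 0 rule from the text."""
    # One user referred the other directly: the closest relationship.
    if candidate_referrer == target_id or target_referrer == candidate_id:
        return 2.0
    # Both users were referred by the same third person.
    if candidate_referrer is not None and candidate_referrer == target_referrer:
        return 1.0
    return 0.0


def final_similarity(job_sim, interaction_sim, mode="add"):
    """Combine the two similarities by addition or multiplication."""
    if mode == "add":
        return job_sim + interaction_sim
    if mode == "multiply":
        return job_sim * interaction_sim
    raise ValueError("mode must be 'add' or 'multiply'")


def should_push(similarity):
    """Push only when the similarity exceeds the second preset value."""
    return similarity > SECOND_PRESET_VALUE


direct = referrer_similarity("u1", "u2", "u2", "u9")
shared = referrer_similarity("u1", "u3", "u2", "u3")
```

  • With multiplication, an interaction similarity of 0 suppresses the push regardless of the job information similarity, whereas addition does not; the text leaves the choice open.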
  • this application also provides an information push device.
  • the specific implementation of the device of this application will be described in detail below with reference to the accompanying drawings.
  • As shown in FIG. 2, which is a schematic diagram of an information pushing device of an embodiment, the device includes:
  • the full company name obtaining module 210 is used to obtain the full company name of the candidate user
  • the word segmentation module 220 is configured to segment the full name of the candidate user's company to obtain the word segmentation set of the candidate user;
  • the digital identity conversion module 230 is configured to convert each word segment in the word segmentation set of the candidate user into a corresponding digital identity to obtain the digital identity set of the candidate user;
  • the matching module 240 is configured to match the digital identity set of the candidate user with the digital identity set of the target user; the digital identity set of the target user is obtained by processing the full company name of the target user in the same manner;
  • the information push module 250 is configured to push the candidate user's information to the target user when the match is consistent.
  • the word segmentation module 220 includes:
  • a region word segment acquisition module, used to extract the region word segment from the full company name of the candidate user according to the preset region lexicon;
  • an industry word segment acquisition module, used to extract the industry word segment from the full company name of the candidate user according to the preset industry lexicon;
  • a closing word segment acquisition module, used to extract the closing word segment from the full company name of the candidate user according to the preset closing word lexicon; the closing word is used to describe the organizational form of the company;
  • a business name word segment acquisition module, used to take the part that remains after removing the region word segment, the industry word segment, and the closing word segment from the full company name of the candidate user as the business name word segment;
  • a combination module, used to form the candidate user's word segmentation set from the candidate user's region word segment, industry word segment, closing word segment, and business name word segment.
  • the matching module 240 includes:
  • a matching unit, used to perform same-category matching between the digital identifiers of the region word segment, the industry word segment, the closing word segment, and the business name word segment in the digital identifier set of the candidate user and the corresponding digital identifiers in the digital identifier set of the target user, to obtain the similarity of the region word segments, the similarity of the industry word segments, the similarity of the closing word segments, and the similarity of the business name word segments;
  • a weighted-sum calculation unit, used to calculate the weighted sum of the similarity of the region word segments, the similarity of the industry word segments, the similarity of the closing word segments, and the similarity of the business name word segments; the weights corresponding to the similarity of the business name word segments, the similarity of the industry word segments, the similarity of the region word segments, and the similarity of the closing word segments decrease in that order;
  • a judging unit, used to determine that the match is consistent when the weighted sum is greater than a first preset value, and otherwise to determine that the match is inconsistent.
  • the digital identity conversion module 230 includes:
  • a region word segment digital identifier obtaining unit, configured to combine the preset digital identifier of the region lexicon with the position number of the region word segment in the region lexicon to obtain the digital identifier of the region word segment;
  • an industry word segment digital identifier obtaining unit, used to combine the preset digital identifier of the industry lexicon with the position number of the industry word segment in the industry lexicon to obtain the digital identifier of the industry word segment;
  • a closing word segment digital identifier obtaining unit, used to combine the preset digital identifier of the closing word lexicon with the position number of the closing word segment in the closing word lexicon to obtain the digital identifier of the closing word segment;
  • a business name word segment digital identifier obtaining unit, used to obtain the Hanyu Pinyin of the business name word segment and combine the position numbers, in the Hanyu Pinyin alphabet, of the letters in that pinyin to obtain the digital identifier of the business name word segment;
  • a combination unit, configured to form the digital identifier set of the candidate user from the digital identifier of the region word segment, the digital identifier of the industry word segment, the digital identifier of the closing word segment, and the digital identifier of the business name word segment.
  • the information push module 250 includes:
  • a work information obtaining unit, configured to obtain the work information of the candidate user in the company indicated by the full company name of the candidate user, and the work information of the target user in the company indicated by the full company name of the target user;
  • a similarity calculation unit, configured to calculate the similarity between the work information of the candidate user and the work information of the target user;
  • the pushing unit is configured to push the candidate user's information to the target user when the similarity is greater than a second preset value.
  • the similarity calculation unit includes:
  • the interactive information acquisition subunit is used to acquire the interactive relationship information of the candidate user and the interactive relationship information of the target user;
  • the first similarity calculation subunit is used to calculate the similarity between the interactive relationship information of the candidate user and the interactive relationship information of the target user;
  • the second similarity calculation subunit is used to obtain the final similarity between the candidate user and the target user according to the similarity between the work information and the similarity between the interactive relationship information.
  • the work information includes year of employment, position, project name, work experience, and work city;
  • the similarity calculation unit includes:
  • a matching subunit, used to perform same-category matching between the year of employment, position, project name, work experience, and work city of the candidate user and the year of employment, position, project name, work experience, and work city of the target user, to obtain the similarity of the years of employment, the similarity of the positions, the similarity of the project names, the similarity of the work experience, and the similarity of the work cities;
  • a weighted-sum calculation subunit, used to calculate the weighted sum of the similarity of the years of employment, the similarity of the positions, the similarity of the project names, the similarity of the work experience, and the similarity of the work cities, to obtain the similarity between the job information of the candidate user and the job information of the target user.
  • The embodiment of the present application also provides a non-volatile computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, an information pushing method is implemented, wherein the information pushing method includes the following steps: obtaining the full company name of the candidate user; segmenting the full company name of the candidate user to obtain the word segmentation set of the candidate user; converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user; matching the digital identifier set of the candidate user with the digital identifier set of the target user, where the digital identifier set of the target user is obtained by processing the full company name of the target user in the same way; and, if the match is consistent, pushing the candidate user's information to the target user.
  • The storage medium includes, but is not limited to, any type of disk (including floppy disk, hard disk, optical disk, CD-ROM, and magneto-optical disk), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic card, or optical card. That is, the storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer). It can be a read-only memory, magnetic disk, optical disk, etc.
  • An embodiment of the present application also provides a computer device, which includes:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement an information pushing method, wherein the information pushing method includes the following steps: obtaining the full company name of the candidate user; performing word segmentation on the full company name of the candidate user to obtain the word segmentation set of the candidate user; converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user; matching the digital identifier set of the candidate user with the digital identifier set of the target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same way; and, if the match is consistent, pushing the candidate user's information to the target user.
  • FIG. 3 is a schematic diagram of the structure of the computer equipment of this application, including a processor 320, a storage device 330, an input unit 340, a display unit 350 and other devices.
  • the storage device 330 may be used to store the application program 310 and various functional modules.
  • the processor 320 runs the application program 310 stored in the storage device 330 to execute various functional applications and data processing of the device.
  • the storage device 330 may be an internal memory or an external memory, or include both internal memory and external memory.
  • the internal memory may include read-only memory, programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
  • the external memory may include hard disks, floppy disks, ZIP disks, USB flash drives, tapes, etc.
  • the storage devices disclosed in this application include but are not limited to these types of storage devices.
  • the storage device 330 disclosed in this application is merely an example and not a limitation.
  • the input unit 340 is used to receive the input of the signal and the full name of the company.
  • the input unit 340 may include a touch panel and other input devices.
  • the touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program; other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control buttons and switch buttons), a trackball, a mouse, and a joystick.
  • the display unit 350 may be used to display information input by the user or information provided to the user and various menus of the computer device.
  • the display unit 350 may take the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the processor 320 is the control center of the computer device. It uses various interfaces and lines to connect the parts of the entire computer, and performs various functions and processes data by running or executing the software programs and/or modules stored in the storage device 330 and calling data stored in the storage device.
  • the computer device includes one or more processors 320, one or more storage devices 330, and one or more application programs 310, where the one or more application programs 310 are stored in the storage device 330 and configured to be executed by the one or more processors 320, and the one or more application programs 310 are configured to execute the information pushing method described in the above embodiments.

Abstract

An information pushing method and apparatus, a computer readable storage medium, and a computer device, which are applied to the technical field of data analysis and improve the computational efficiency of company name matching. The method comprises: obtaining a company full name of a candidate user (S110); performing word segmentation on the company full name of the candidate user to obtain a word segmentation set of the candidate user (S120); converting each word segmentation in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user (S130); matching the digital identifier set of the candidate user with a digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the company full name of the target user in the same approach (S140); and if matching is successful, pushing the information of the candidate user to the target user (S150).

Description

Information pushing method and apparatus, computer-readable storage medium, and computer device
This application claims priority to Chinese patent application No. 201910517834.2, filed with the Chinese Patent Office on June 14, 2019 and entitled "Information pushing method, apparatus, computer-readable storage medium, and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data analysis technology, and in particular to an information pushing method, an apparatus, a computer-readable storage medium, and a computer device.
Background
Information push is currently a very active technical direction on the Internet. Information push, also described as "web (World Wide Web) broadcasting", is a technology that reduces information overload by periodically delivering the information users need over the Internet according to certain technical standards or protocols. By delivering information to users automatically, push technology reduces the time spent searching the network. It searches and filters information according to users' interests and pushes it to them periodically, helping users discover valuable information efficiently.
Because information push needs to be accurate, the closeness of relationships between users, and especially the closeness between colleagues, has become increasingly important. A server usually needs to push information based on the closeness between colleagues, for example by pushing information that one user follows to other users who are colleagues of that user and have a high closeness to that user.
The inventors realized that closeness between colleagues is currently determined by matching the full company name of the target user against the full company name of the candidate user. If the full company names are identical, closeness is judged to be high and the candidate user's information is pushed to the target user; otherwise, closeness is judged to be low and the candidate user's information is not pushed. This approach, however, suffers from low computational efficiency.
Summary of the Invention
In view of the shortcomings of existing approaches, this application proposes an information pushing method, an apparatus, a computer-readable storage medium, and a computer device to improve the computational efficiency of company name matching.
According to a first aspect, the embodiments of this application provide an information pushing method, including:
obtaining the full company name of a candidate user;
performing word segmentation on the full company name of the candidate user to obtain a word segmentation set of the candidate user;
converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
matching the digital identifier set of the candidate user with the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
if the match is consistent, pushing the candidate user's information to the target user.
According to a second aspect, the embodiments of this application also provide an information pushing apparatus, including:
a full company name acquisition module, configured to obtain the full company name of a candidate user;
a word segmentation module, configured to perform word segmentation on the full company name of the candidate user to obtain a word segmentation set of the candidate user;
a digital identifier conversion module, configured to convert each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
a matching module, configured to match the digital identifier set of the candidate user with the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
an information pushing module, configured to push the candidate user's information to the target user when the match is consistent.
According to a third aspect, the embodiments of this application also provide a non-volatile computer-readable storage medium on which a computer program is stored, the program implementing an information pushing method when executed by a processor;
wherein the information pushing method includes the following steps:
obtaining the full company name of a candidate user;
performing word segmentation on the full company name of the candidate user to obtain a word segmentation set of the candidate user;
converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
matching the digital identifier set of the candidate user with the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
if the match is consistent, pushing the candidate user's information to the target user.
According to a fourth aspect, the embodiments of this application also provide a computer device, the computer device including:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute an information pushing method;
wherein the information pushing method includes the following steps:
obtaining the full company name of a candidate user;
performing word segmentation on the full company name of the candidate user to obtain a word segmentation set of the candidate user;
converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
matching the digital identifier set of the candidate user with the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
if the match is consistent, pushing the candidate user's information to the target user.
With the above information pushing method, apparatus, computer-readable storage medium, and computer device, the full company name is converted into a digital identifier set, and matching the digital identifiers of the word segments replaces matching the full company name. Because digital identifiers can be matched more efficiently than Chinese character strings, this greatly improves the computational efficiency of company name matching.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an information pushing method according to an embodiment of this application;
FIG. 2 is a schematic diagram of an information pushing apparatus according to an embodiment of this application;
FIG. 3 is a schematic diagram of a computer device according to an embodiment of this application.
Detailed Description
FIG. 1 is a schematic diagram of an information pushing method according to an embodiment. The method includes:
S110. Obtain the full company name of a candidate user.
In this application, the target user is the user to whom information is pushed, and a candidate user is a user whose information may be pushed to the target user. When it needs to be determined whether a candidate user's information can be pushed to the target user, the full company name of the candidate user is obtained first. The full company name may come from information the candidate user filled in on a job-search website, from information the candidate user entered directly in an interface, and so on.
S120. Perform word segmentation on the full company name of the candidate user to obtain a word segmentation set of the candidate user.
Word segmentation refers to splitting a sequence of Chinese characters into individual words. Segmenting the full company name yields several individual words, which together form the word segmentation set.
S130. Convert each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user.
Because digital identifiers can generally be matched more efficiently than Chinese character strings, once the word segmentation set of the candidate user has been obtained, each word segment in the set is converted into a digital identifier according to a given rule, yielding the digital identifier set of the candidate user.
S140. Match the digital identifier set of the candidate user with the digital identifier set of the target user; the digital identifier set of the target user is obtained by processing the full company name of the target user in the same manner.
The full company name of the target user is processed in the same manner to obtain the digital identifier set of the target user. That is, the full company name of the target user is segmented according to the same segmentation rule to obtain the word segmentation set of the target user, and each word segment in that set is converted into a corresponding digital identifier according to the same conversion rule to obtain the digital identifier set of the target user. To determine whether the target user and the candidate user belong to the same company, the digital identifier set of the candidate user is matched with the digital identifier set of the target user.
S150. If the match is consistent, push the candidate user's information to the target user.
A consistent match indicates that the candidate user and the target user belong to the same company, and the candidate user's information, such as the candidate user's identity information or information on products the candidate user follows, is pushed to the target user. Otherwise, the candidate user's information is not pushed to the target user.
In this embodiment, the full company name is converted into a digital identifier set, and matching the digital identifiers of the word segments replaces matching the full company name. Because digital identifiers can be matched more efficiently than Chinese character strings, this greatly improves computational efficiency.
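As a non-limiting illustration, steps S110 to S150 can be sketched in Python as follows. The sketch assumes that a "consistent match" means the two digital identifier sets are equal, which this embodiment does not specify. The names `to_id_set`, `should_push`, `toy_segment`, and `vocab`, the "|"-delimited toy company names, and the rule of assigning the next free integer to each new word segment are all hypothetical and serve only to make the example runnable.

```python
from typing import Callable, Dict, FrozenSet, List

Segmenter = Callable[[str], List[str]]


def to_id_set(full_name: str, segment: Segmenter, vocab: Dict[str, int]) -> FrozenSet[int]:
    """S120 + S130: segment the name, then map each word segment to an integer."""
    ids = set()
    for word in segment(full_name):
        # Assumption: a word segment not seen before gets the next free integer.
        ids.add(vocab.setdefault(word, len(vocab) + 1))
    return frozenset(ids)


def should_push(candidate_name: str, target_name: str,
                segment: Segmenter, vocab: Dict[str, int]) -> bool:
    """S140 + S150: push only if both names yield the same identifier set."""
    return to_id_set(candidate_name, segment, vocab) == to_id_set(target_name, segment, vocab)


def toy_segment(name: str) -> List[str]:
    """Hypothetical segmenter: the toy names are pre-split with '|'."""
    return name.split("|")


vocab: Dict[str, int] = {}
print(should_push("Shenzhen|Xingchen|Technology|Limited",
                  "Shenzhen|Xingchen|Technology|Limited", toy_segment, vocab))  # True
print(should_push("Shenzhen|Xingchen|Technology|Limited",
                  "Shanghai|Xingchen|Technology|Limited", toy_segment, vocab))  # False
```

Both names are passed through the same segmenter and the same `vocab`, mirroring the requirement that the target user's name be processed "in the same manner".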
The applicant has found through research that a full company name generally consists of a region, a trade name, an industry, and a closing term. Therefore, in one embodiment, performing word segmentation on the full company name of the candidate user to obtain the word segmentation set of the candidate user includes:
S1201. Extract the region word segment from the full company name of the candidate user according to a preset lexicon of regions.
A region refers to geographic location information. Because the number of regions is limited, a lexicon of regions, such as countries, provinces, and cities, can be built in advance. The characters in the candidate user's full company name that match a word in the region lexicon are extracted as the region word segment.
S1202. Extract the industry word segment from the full company name of the candidate user according to a preset lexicon of industries.
An industry is the organizational system of business units or individuals engaged in production of the same nature, or in other economic and social activities, within the national economy. Because the number of industries is limited, a lexicon of industries, such as food, communications, and finance, can be built in advance. The characters in the candidate user's full company name that match a word in the industry lexicon are extracted as the industry word segment.
S1203. Extract the closing-term word segment from the full company name of the candidate user according to a preset lexicon of closing terms; the closing term describes the organizational form of the company.
The closing term describes the organizational form of the company and is generally the last few characters of the full company name. Because the number of closing terms is limited, a lexicon of closing terms, such as head office, group, branch, and limited company, can be built in advance. The characters in the candidate user's full company name that match a word in the closing-term lexicon are extracted as the closing-term word segment.
S1204. Take what remains of the candidate user's full company name after the region word segment, the industry word segment, and the closing-term word segment are removed as the trade name word segment.
A trade name is an enterprise's distinguishing mark and an expression of its legal personality. Because trade names are highly varied, the portion of the full company name that remains after the region, industry, and closing-term word segments are removed is taken as the trade name word segment.
S1205. Form the word segmentation set of the candidate user from the candidate user's region word segment, industry word segment, closing-term word segment, and trade name word segment.
Similarly, the word segmentation set of the target user can be obtained according to the same division into region, trade name, industry, and closing term.
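Steps S1201 to S1205 can be illustrated with the following minimal Python sketch. It assumes simple substring lookup against each lexicon, trying longer lexicon words first; the application does not prescribe a particular lookup strategy. The lexicon contents, the sample company name, and the names `extract` and `segment_company_name` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical preset lexicons (a real system would hold many more entries).
REGIONS = ["深圳", "上海", "北京"]
INDUSTRIES = ["科技", "金融", "食品"]
CLOSINGS = ["有限公司", "集团", "分公司", "总公司"]


def extract(name: str, lexicon: List[str]) -> Tuple[Optional[str], str]:
    """Return the first lexicon word found in `name`, and `name` with it removed."""
    # Longer words are tried first so that a longer entry wins over a shorter overlap.
    for word in sorted(lexicon, key=len, reverse=True):
        if word in name:
            return word, name.replace(word, "", 1)
    return None, name


def segment_company_name(full_name: str) -> Dict[str, Optional[str]]:
    """S1201 to S1205: split a full company name into its four kinds of word segment."""
    region, rest = extract(full_name, REGIONS)      # S1201
    industry, rest = extract(rest, INDUSTRIES)      # S1202
    closing, rest = extract(rest, CLOSINGS)         # S1203
    trade_name = rest or None                       # S1204: whatever is left over
    return {"region": region, "industry": industry,
            "closing": closing, "trade_name": trade_name}  # S1205


print(segment_company_name("深圳星辰科技有限公司"))
```

Because the trade name is defined as the remainder, no trade name lexicon is needed, which matches the reasoning given for S1204.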
To improve the efficiency of digital identifier conversion, in one embodiment, converting each word segment in the word segmentation set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user includes:
S1301. Combine the preset digital identifier of the region lexicon with the position number of the region word segment in the region lexicon to obtain the digital identifier of the region word segment.
Respective digital identifiers can be set in advance for the region lexicon, the industry lexicon, and the closing-term lexicon to uniquely identify each lexicon. For example, the digital identifier of the region lexicon is set to 1, that of the industry lexicon to 2, and that of the closing-term lexicon to 3. The words in the region lexicon are arranged in a fixed order, so each word has its own unique position number; prefixing that position number with the digital identifier of the region lexicon yields the digital identifier of the region word segment. For example, if the digital identifier of the region lexicon is 1 and the position number of the region word segment in the region lexicon is 13, the digital identifier of the region word segment is 113.
S1302. Combine the preset digital identifier of the industry lexicon with the position number of the industry word segment in the industry lexicon to obtain the digital identifier of the industry word segment.
A digital identifier is set in advance for the industry lexicon to uniquely identify it. The words in the industry lexicon are arranged in a fixed order, so each word has its own unique position number; prefixing that position number with the digital identifier of the industry lexicon yields the digital identifier of the industry word segment. For example, if the digital identifier of the industry lexicon is 2 and the position number of the industry word segment in the industry lexicon is 13, the digital identifier of the industry word segment is 213.
S1303. Combine the preset digital identifier of the closing-term lexicon with the position number of the closing-term word segment in the closing-term lexicon to obtain the digital identifier of the closing-term word segment.
A digital identifier is set in advance for the closing-term lexicon to uniquely identify it. The words in the closing-term lexicon are arranged in a fixed order, so each word has its own unique position number; prefixing that position number with the digital identifier of the closing-term lexicon yields the digital identifier of the closing-term word segment. For example, if the digital identifier of the closing-term lexicon is 3 and the position number of the closing-term word segment in the closing-term lexicon is 13, the digital identifier of the closing-term word segment is 313.
S1304. Obtain the Hanyu Pinyin of the trade name word segment, and combine the position numbers of its letters in the Hanyu Pinyin alphabet to obtain the digital identifier of the trade name word segment.
A Hanyu Pinyin alphabet is set in advance, with its letters arranged in a fixed order so that each letter has its own unique position number. The Hanyu Pinyin of the trade name word segment is obtained, the position number of each of its letters is looked up in the Hanyu Pinyin alphabet, and these position numbers are combined in the order in which the letters appear in the Pinyin, yielding the digital identifier of the trade name word segment.
S1305. Form the digital identifier set of the candidate user from the digital identifier of the region word segment, the digital identifier of the industry word segment, the digital identifier of the closing-term word segment, and the digital identifier of the trade name word segment.
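The conversion rules in S1301 to S1304 can be sketched in Python as below. The sketch assumes that "combining" means decimal concatenation, consistent with the 1 and 13 giving 113 example, and that the Hanyu Pinyin alphabet is ordered a to z; it also takes the Pinyin string as given, since the application does not say how it is obtained. The function names and the placeholder lexicon are hypothetical.

```python
from typing import List


def lexicon_segment_id(lexicon_id: int, lexicon: List[str], word: str) -> int:
    """S1301 to S1303: prefix the word's 1-based position with the lexicon identifier."""
    position = lexicon.index(word) + 1  # raises ValueError if the word is absent
    return int(f"{lexicon_id}{position}")


def trade_name_id(pinyin: str) -> int:
    """S1304: concatenate each letter's position in an a-to-z alphabet (assumed order)."""
    positions = [str(ord(ch) - ord("a") + 1) for ch in pinyin.lower() if "a" <= ch <= "z"]
    if not positions:
        raise ValueError("pinyin contains no letters")
    # Plain concatenation follows the text; note it is not guaranteed to be
    # reversible (for example "1" + "6" and "16" both start with the digits 16).
    return int("".join(positions))


# Placeholder region lexicon with 13 entries so the example from the text can be reproduced.
region_lexicon = [f"region{i}" for i in range(1, 14)]

print(lexicon_segment_id(1, region_lexicon, "region13"))  # 113
print(trade_name_id("pingan"))                            # 169147114
```

A fixed-width, zero-padded position number would remove the ambiguity noted in the comment, but that is a design variation rather than something stated in this embodiment.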
在一个实施例中,所述将所述候选用户的数字标识集合与目标用户的数字标识集合进行匹配,包括:In an embodiment, the matching the digital identity set of the candidate user with the digital identity set of the target user includes:
S1401、将所述候选用户的数字标识集合中的地域分词的数字标识、行业分词的数字标识、结束语分词的数字标识、商号分词的数字标识,与所述目标用户的数字标识集合中的地域分词的数字标识、行业分词的数字 标识、结束语分词的数字标识、商号分词的数字标识进行同类匹配,获得地域分词的相似度、行业分词的相似度、结束语分词的相似度和商号分词的相似度。S1401. Combine the digital identifiers of the geographic segmentation, the digital identifiers of the industry segmentation, the digital identifiers of the closing word segmentation, and the digital identifiers of the business name segmentation in the digital identifier set of the candidate user with the geographic segmentation in the digital identifier set of the target user Perform similar matching of the digital identifier of the digital identifier, the digital identifier of the industry word segmentation, the digital identifier of the closing word segmentation, and the digital identifier of the business name segmentation to obtain the similarity of the geographical word segmentation, the similarity of the industry word segmentation, the similarity of the ending word segmentation and the similarity of the business name segmentation.
Same-category matching means the following: the digital identifier of the candidate user's geographic word segment is matched against the digital identifier of the target user's geographic word segment to obtain the similarity of the geographic word segment; the digital identifier of the candidate user's business-name word segment is matched against that of the target user's business-name word segment to obtain the similarity of the business-name word segment; the digital identifier of the candidate user's industry word segment is matched against that of the target user's industry word segment to obtain the similarity of the industry word segment; and the digital identifier of the candidate user's closing word segment is matched against that of the target user's closing word segment to obtain the similarity of the closing word segment.
S1402. Calculate the weighted sum of the similarity of the geographic word segment, the similarity of the industry word segment, the similarity of the closing word segment, and the similarity of the business-name word segment; the weights corresponding to the similarity of the business-name word segment, the similarity of the industry word segment, the similarity of the geographic word segment, and the similarity of the closing word segment decrease in that order.
To improve the accuracy of information pushing, the weights are ordered as follows: business-name word segment > industry word segment > geographic word segment > closing word segment. The weighted sum is calculated according to the following formula: weighted sum = weight of the business-name word segment × similarity of the business-name word segment + weight of the industry word segment × similarity of the industry word segment + weight of the geographic word segment × similarity of the geographic word segment + weight of the closing word segment × similarity of the closing word segment.
S1403. If the weighted sum is greater than a first preset value, the match is determined to be consistent; otherwise, it is determined not to match.
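Steps S1402 and S1403 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the weight values, the threshold, the identifier strings, and the rule that a category's similarity is 1 when the two identifiers are equal and 0 otherwise are all assumptions. The source fixes only the ordering of the weights and the comparison against a first preset value.

```python
from typing import Dict

# Hypothetical weights. The source only requires the ordering
# business name > industry > geographic > closing.
WEIGHTS: Dict[str, float] = {
    "business_name": 0.5,
    "industry": 0.25,
    "geographic": 0.15,
    "closing": 0.10,
}
FIRST_PRESET_VALUE = 0.7  # hypothetical threshold


def category_similarity(candidate_id: str, target_id: str) -> float:
    """Assumed rule: 1.0 when the two digital identifiers are equal, else 0.0."""
    return 1.0 if candidate_id == target_id else 0.0


def weighted_sum(candidate: Dict[str, str], target: Dict[str, str]) -> float:
    """S1402: sum of weight * similarity over the four word-segment categories."""
    return sum(
        weight * category_similarity(candidate[category], target[category])
        for category, weight in WEIGHTS.items()
    )


def is_match(candidate: Dict[str, str], target: Dict[str, str]) -> bool:
    """S1403: the match is consistent only if the weighted sum exceeds the threshold."""
    return weighted_sum(candidate, target) > FIRST_PRESET_VALUE


# Illustrative identifier sets (the identifier format is made up).
target = {"business_name": "24-9-14", "industry": "2-7", "geographic": "1-3", "closing": "3-1"}
same_company = dict(target, closing="3-2")            # differs only in the closing term
other_company = dict(target, business_name="16-9-14")  # different business name
```

With these assumed values, a pair that differs only in the lowest-weighted category still matches, while a pair with different business names does not, which is the behaviour the weight ordering is meant to produce.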
For two users to be regarded as colleagues, their company names must first be the same; on that basis, intimacy is determined from additional information to further improve the accuracy of information pushing. A colleague here may be a colleague at the current employer or a colleague at a former company. To determine whether the target user and the candidate user have the same full company name, the two full company names are compared: if they are identical, the users belong to the same company; otherwise, they do not.
Therefore, in one embodiment, pushing the information of the candidate user to the target user includes:
S1501. Obtain the work information of the candidate user at the company indicated by the candidate user's full company name, and the work information of the target user at the company indicated by the target user's full company name.
Colleague intimacy characterizes how close colleagues are to each other. This application approaches it through work information, obtaining the work information of the candidate user and of the target user. Optionally, the work information includes years of employment, position, project name, work experience, work city, and so on. The obtained work information and the company name used in the steps above may be stored as a feature vector, for example [company name, years of employment, position, project name, work experience, work city].
S1502. Calculate the similarity between the work information of the candidate user and the work information of the target user.
This similarity may be calculated in any manner known in the prior art.
S1503. If the similarity is greater than a second preset value, push the information of the candidate user to the target user.
If the work information includes multiple items, for example years of employment, position, project name, work experience, and work city, then in one embodiment calculating the similarity between the work information of the candidate user and the work information of the target user includes:
S1502a. Perform same-category matching between the candidate user's years of employment, position, project name, work experience, and work city and the target user's years of employment, position, project name, work experience, and work city, to obtain the similarity of the years of employment, the similarity of the position, the similarity of the project name, the similarity of the work experience, and the similarity of the work city.
Same-category matching means the following: the candidate user's years of employment are matched against the target user's years of employment to obtain the similarity of the years of employment; the candidate user's position is matched against the target user's position to obtain the similarity of the position; the candidate user's project name is matched against the target user's project name to obtain the similarity of the project name; the candidate user's work experience is matched against the target user's work experience to obtain the similarity of the work experience; and the candidate user's work city is matched against the target user's work city to obtain the similarity of the work city.
S1502b. Calculate the weighted sum of the similarity of the years of employment, the similarity of the position, the similarity of the project name, the similarity of the work experience, and the similarity of the work city, to obtain the similarity between the work information of the candidate user and the work information of the target user.
A single weight may be set for the work information as a whole, or different weights may be set for the different items it contains, for example for years of employment, position, project name, work experience, and work city. The similarity of the work information between the target user and the candidate user can be calculated by the formula: Σ (weight of an item of the work information × similarity of that item).
Optionally, the similarity of the years of employment may be determined from the number of overlapping years of employment; for example, if the target user and the candidate user overlap for 3 years, the similarity is 3.
Optionally, to calculate the similarity of the position, a vector is generated for each position and the similarity between the two position vectors is calculated; the specific calculation may use any method known in the prior art.
Optionally, to calculate the similarity of the project name, a vector is generated for each project name and the similarity between the two vectors is calculated; or the similarity is set to 1 for identical project names and to 0 for all others; or the similarity is set to 1 for identical project names and, for the others, it is determined whether the project name is associated with the target user's project name, in which case the similarity is set to a value greater than 0 and less than 1, and otherwise to 0; and so on.
Optionally, to calculate the similarity of the work experience, the job responsibilities are extracted from the work experience, a vector is generated for each set of job responsibilities, and the similarity between the two job-responsibility vectors is calculated.
Optionally, to calculate the similarity of the work city, the similarity is set to 1 for identical work cities and to 0 for all others.
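The per-item rules and the weighted sum of S1502b can be sketched as follows. The weights are hypothetical, and the position and work-experience similarities are passed in as already computed numbers because the source leaves their vector-based calculation to existing methods. The overlapping-year count and the exact-match rules follow the optional rules described above.

```python
from typing import Dict, Iterable


def years_similarity(a: Iterable[int], b: Iterable[int]) -> float:
    """Similarity of years of employment: the number of overlapping years."""
    return float(len(set(a) & set(b)))


def exact_match(a: str, b: str) -> float:
    """1.0 for identical values, 0.0 otherwise (project name, work city)."""
    return 1.0 if a == b else 0.0


def work_similarity(sims: Dict[str, float], weights: Dict[str, float]) -> float:
    """S1502b: sum over items of weight * similarity."""
    return sum(weights[item] * sims[item] for item in weights)


# Hypothetical per-item weights.
WORK_WEIGHTS = {
    "years": 0.2,
    "position": 0.2,
    "project": 0.3,
    "experience": 0.2,
    "city": 0.1,
}

sims = {
    # Overlapping years are 2016, 2017, 2018, so this item is 3.0.
    "years": years_similarity([2015, 2016, 2017, 2018], [2016, 2017, 2018, 2019]),
    "position": 0.8,     # assumed to come from a vector similarity method
    "project": exact_match("Project A", "Project A"),
    "experience": 0.5,   # assumed to come from a vector similarity method
    "city": exact_match("Shenzhen", "Shanghai"),
}
score = work_similarity(sims, WORK_WEIGHTS)
```

Because the overlapping-year count is unbounded while the other items lie between 0 and 1, an implementation would probably choose the weights, or normalize the year count, with that difference in scale in mind; the source does not specify this.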
Interaction relationship information describes other users who have interacted with a user, such as the user's internal job referrer or the user's background-check referee. To further improve the accuracy of information pushing, the interaction relationship information may be combined with the work information as the data source for deciding whether to push information. Therefore, in one embodiment, calculating the similarity between the work information of the candidate user and the work information of the target user includes:
S15021. Obtain the interaction relationship information of the candidate user and the interaction relationship information of the target user.
S15022. Calculate the similarity between the interaction relationship information of the candidate user and the interaction relationship information of the target user.
The similarity of interaction relationship information can be calculated in several ways. For example, for the internal job referrer, the similarity is set to 1 if the candidate user's referrer is the same as the target user's referrer, and to 0 otherwise. However, the target user may be the candidate user's referrer, or the candidate user may be the target user's referrer, and in that case the two are relatively close. It is therefore possible to check first whether one of the users is the other's referrer: if so, the similarity is set directly to a larger value, for example 2; if not, it is then determined whether the candidate user's referrer and the target user's referrer are the same, and the similarity is set to 1 if they are and to 0 otherwise. The similarity of other items of interaction relationship information is calculated in a similar way.
S15023. Obtain the final similarity between the candidate user and the target user from the similarity between the work information and the similarity between the interaction relationship information.
The final similarity can be derived from these two similarities in many ways; for example, the two may be added to obtain the final similarity, or they may be multiplied to obtain the final similarity, and so on.
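The referrer rule of S15022 and the combination in S15023 can be sketched as follows. The user identifiers, function names, and the `mode` parameter are illustrative assumptions; the values 2, 1, and 0 and the add or multiply options come from the description above.

```python
from typing import Optional


def referrer_similarity(
    candidate_id: str,
    candidate_referrer: Optional[str],
    target_id: str,
    target_referrer: Optional[str],
) -> float:
    """Similarity of the internal job referrer item."""
    # First check whether one user referred the other: the closest relationship.
    if candidate_referrer == target_id or target_referrer == candidate_id:
        return 2.0
    # Otherwise check whether both users were referred by the same person.
    if candidate_referrer is not None and candidate_referrer == target_referrer:
        return 1.0
    return 0.0


def final_similarity(work_sim: float, interaction_sim: float, mode: str = "add") -> float:
    """S15023: combine the two similarities by addition or by multiplication."""
    if mode == "add":
        return work_sim + interaction_sim
    if mode == "multiply":
        return work_sim * interaction_sim
    raise ValueError(f"unsupported mode: {mode}")
```

Note that with multiplication an interaction similarity of 0 forces the final similarity to 0 regardless of the work information, so addition may be the safer choice when interaction data is sparse; the source leaves the choice open.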
Based on the same inventive concept, this application also provides an information pushing apparatus. Specific embodiments of the apparatus are described in detail below with reference to the accompanying drawings.
FIG. 2 is a schematic diagram of an information pushing apparatus according to one embodiment. The apparatus includes:
a full company name obtaining module 210, configured to obtain the full company name of a candidate user;
a word segmentation module 220, configured to perform word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user;
a digital identifier conversion module 230, configured to convert each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
a matching module 240, configured to match the digital identifier set of the candidate user against the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
an information pushing module 250, configured to push the information of the candidate user to the target user when the match is consistent.
In one embodiment, the word segmentation module 220 includes:
a geographic word segment obtaining module, configured to screen out the geographic word segment from the full company name of the candidate user according to a preset lexicon of geographic terms;
an industry word segment obtaining module, configured to screen out the industry word segment from the full company name of the candidate user according to a preset lexicon of industry terms;
a closing word segment obtaining module, configured to screen out the closing word segment from the full company name of the candidate user according to a preset lexicon of closing terms, a closing term describing the organizational form of a company;
a business-name word segment obtaining module, configured to take as the business-name word segment the word segment that remains after the geographic word segment, the industry word segment, and the closing word segment are removed from the full company name of the candidate user;
a combination module, configured to form the word segment set of the candidate user from the candidate user's geographic word segment, industry word segment, closing word segment, and business-name word segment.
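The lexicon-based segmentation performed by these modules can be sketched as follows. The three miniature lexicons, the example company name, and the longest-entry-first lookup are assumptions made for illustration; the source states only that the geographic, industry, and closing word segments are screened out with preset lexicons and that the remainder is the business-name word segment.

```python
from typing import Dict, List, Optional

# Hypothetical miniature lexicons; real lexicons would be far larger.
GEOGRAPHIC_LEXICON: List[str] = ["北京", "上海", "深圳"]
INDUSTRY_LEXICON: List[str] = ["科技", "保险", "银行"]
CLOSING_LEXICON: List[str] = ["有限公司", "股份有限公司", "集团"]


def find_in_lexicon(name: str, lexicon: List[str]) -> Optional[str]:
    """Return the longest lexicon entry contained in the name, if any."""
    for word in sorted(lexicon, key=len, reverse=True):
        if word in name:
            return word
    return None


def segment_company_name(full_name: str) -> Dict[str, Optional[str]]:
    """Split a full company name into the four word-segment categories."""
    segments: Dict[str, Optional[str]] = {}
    remainder = full_name
    for category, lexicon in (
        ("geographic", GEOGRAPHIC_LEXICON),
        ("industry", INDUSTRY_LEXICON),
        ("closing", CLOSING_LEXICON),
    ):
        word = find_in_lexicon(remainder, lexicon)
        segments[category] = word
        if word is not None:
            remainder = remainder.replace(word, "", 1)
    # Whatever is left after removing the three segments is the business name.
    segments["business_name"] = remainder
    return segments


# Made-up company name used only as an example.
example = segment_company_name("深圳星辰科技有限公司")
```

Checking the longest entries first keeps a name ending in 股份有限公司 from being split at the shorter entry 有限公司. Names that contain brackets or other punctuation would need extra cleanup that this sketch omits.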
In one embodiment, the matching module 240 includes:
a matching unit, configured to perform same-category matching between the digital identifiers of the geographic word segment, the industry word segment, the closing word segment, and the business-name word segment in the digital identifier set of the candidate user and the digital identifiers of the geographic word segment, the industry word segment, the closing word segment, and the business-name word segment in the digital identifier set of the target user, to obtain the similarity of the geographic word segment, the similarity of the industry word segment, the similarity of the closing word segment, and the similarity of the business-name word segment;
a weighted sum calculation unit, configured to calculate the weighted sum of the similarity of the geographic word segment, the similarity of the industry word segment, the similarity of the closing word segment, and the similarity of the business-name word segment, where the weights corresponding to the similarity of the business-name word segment, the similarity of the industry word segment, the similarity of the geographic word segment, and the similarity of the closing word segment decrease in that order;
a judging unit, configured to determine that the match is consistent when the weighted sum is greater than a first preset value, and otherwise to determine that it does not match.
In one embodiment, the digital identifier conversion module 230 includes:
a geographic word segment digital identifier obtaining unit, configured to combine the preset digital identifier of the lexicon of geographic terms with the position number of the geographic word segment in that lexicon to obtain the digital identifier of the geographic word segment;
an industry word segment digital identifier obtaining unit, configured to combine the preset digital identifier of the lexicon of industry terms with the position number of the industry word segment in that lexicon to obtain the digital identifier of the industry word segment;
a closing word segment digital identifier obtaining unit, configured to combine the preset digital identifier of the lexicon of closing terms with the position number of the closing word segment in that lexicon to obtain the digital identifier of the closing word segment;
a business-name word segment digital identifier obtaining unit, configured to obtain the Hanyu Pinyin of the business-name word segment and combine the position numbers of its letters in the Hanyu Pinyin alphabet to obtain the digital identifier of the business-name word segment;
a combination unit, configured to form the digital identifier set of the candidate user from the digital identifier of the geographic word segment, the digital identifier of the industry word segment, the digital identifier of the closing word segment, and the digital identifier of the business-name word segment.
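The two conversion rules used by these units can be sketched as follows. Several details are assumptions, because the source says only that the numbers are "combined": the hyphen separator, the 1-based position numbers, the use of the 26-letter Latin alphabet order for the Hanyu Pinyin alphabet, and the fact that the pinyin string is supplied by the caller rather than derived from the Chinese characters.

```python
import string
from typing import List


def lexicon_identifier(lexicon_id: int, word: str, lexicon: List[str]) -> str:
    """Combine a lexicon's digital identifier with the word's position in it.

    Positions are 1-based and joined with a hyphen (both assumptions).
    Raises ValueError if the word is not in the lexicon.
    """
    position = lexicon.index(word) + 1
    return f"{lexicon_id}-{position}"


def business_name_identifier(pinyin: str) -> str:
    """Combine the alphabet positions of the letters of a pinyin string.

    Assumes plain Latin letters a to z; tone marks and the letter ü are
    not handled in this sketch.
    """
    alphabet = string.ascii_lowercase
    letters = [ch for ch in pinyin.lower() if ch in alphabet]
    return "-".join(str(alphabet.index(ch) + 1) for ch in letters)


# Hypothetical geographic lexicon with digital identifier 1.
geographic_lexicon = ["北京", "上海", "深圳"]
geo_id = lexicon_identifier(1, "深圳", geographic_lexicon)
name_id = business_name_identifier("xingchen")
```

A separator, or fixed-width zero padding, avoids the ambiguity of plain concatenation, where for example lexicon 1 with position 12 and lexicon 11 with position 2 would both yield 112.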
In one embodiment, the information pushing module 250 includes:
a work information obtaining unit, configured to obtain the work information of the candidate user at the company indicated by the candidate user's full company name, and the work information of the target user at the company indicated by the target user's full company name;
a similarity calculation unit, configured to calculate the similarity between the work information of the candidate user and the work information of the target user;
a pushing unit, configured to push the information of the candidate user to the target user when the similarity is greater than a second preset value.
In one embodiment, the similarity calculation unit includes:
an interaction information obtaining subunit, configured to obtain the interaction relationship information of the candidate user and the interaction relationship information of the target user;
a first similarity calculation subunit, configured to calculate the similarity between the interaction relationship information of the candidate user and the interaction relationship information of the target user;
a second similarity calculation subunit, configured to obtain the final similarity between the candidate user and the target user from the similarity between the work information and the similarity between the interaction relationship information.
In one embodiment, the work information includes years of employment, position, project name, work experience, and work city, and the similarity calculation unit includes:
a matching subunit, configured to perform same-category matching between the candidate user's years of employment, position, project name, work experience, and work city and the target user's years of employment, position, project name, work experience, and work city, to obtain the similarity of the years of employment, the similarity of the position, the similarity of the project name, the similarity of the work experience, and the similarity of the work city;
a weighted sum calculation subunit, configured to calculate the weighted sum of the similarity of the years of employment, the similarity of the position, the similarity of the project name, the similarity of the work experience, and the similarity of the work city, to obtain the similarity between the work information of the candidate user and the work information of the target user.
The other technical features of the information pushing apparatus are the same as those of the information pushing method described above and are not repeated here.
An embodiment of this application also provides a non-volatile computer-readable storage medium storing a computer program which, when executed by a processor, implements an information pushing method, the information pushing method including the following steps: obtaining the full company name of a candidate user; performing word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user; converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user; matching the digital identifier set of the candidate user against the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner; and, if the match is consistent, pushing the information of the candidate user to the target user. The storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage medium includes any medium in which a device (for example, a computer) stores or transmits information in a readable form. It may be a read-only memory, a magnetic disk, an optical disc, or the like.
An embodiment of this application also provides a computer device, which includes:
one or more processors;
a storage device for storing one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement an information pushing method, the information pushing method including the following steps: obtaining the full company name of a candidate user; performing word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user; converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user; matching the digital identifier set of the candidate user against the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner; and, if the match is consistent, pushing the information of the candidate user to the target user.
FIG. 3 is a schematic structural diagram of the computer device of this application, which includes a processor 320, a storage device 330, an input unit 340, a display unit 350, and other components. Those skilled in the art will understand that the components shown in FIG. 3 do not limit all computer devices, which may include more or fewer components than shown or combine certain components. The storage device 330 may be used to store an application program 310 and the functional modules; the processor 320 runs the application program 310 stored in the storage device 330 and thereby executes the various functional applications and data processing of the device. The storage device 330 may be an internal memory, an external memory, or both. The internal memory may include read-only memory, programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, and the like. The storage devices disclosed in this application include, but are not limited to, these types. The storage device 330 disclosed in this application is given only as an example and not as a limitation.
The input unit 340 is used to receive signal input and to receive the full company name and similar data. The input unit 340 may include a touch panel and other input devices. The touch panel collects the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. The other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick. The display unit 350 may be used to display information input by the user, information provided to the user, and the various menus of the computer device. The display unit 350 may take the form of a liquid crystal display, an organic light-emitting diode display, or the like. The processor 320 is the control center of the computer device: it connects the parts of the whole computer through various interfaces and lines, and it performs the various functions and processes data by running or executing the software programs and/or modules stored in the storage device 330 and by calling the data stored in the storage device.
In one embodiment, the computer device includes one or more processors 320, one or more storage devices 330, and one or more application programs 310, where the one or more application programs 310 are stored in the storage device 330 and configured to be executed by the one or more processors 320, and the one or more application programs 310 are configured to perform the information pushing method described in the embodiments above.
Compared with the prior art, the information pushing method, apparatus, computer-readable storage medium, and computer device described above have the following advantages:
1. Matching the digital identifiers of the word segments replaces matching the full company names. Because digital identifiers are matched more efficiently than Chinese characters, this greatly improves computational efficiency.
2. On top of full-company-name matching, work-information dimensions such as years of employment, position, project name, work experience, and work city, together with interaction relationship information such as internal job referrals and referees, are introduced and given different weights. This accurately quantifies the intimacy of former-colleague relationships and thereby improves the accuracy of information pushing.
It should be understood that, although the steps in the flowcharts of the drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they need not be executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It should be understood that the functional units in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or may be integrated two or more to a module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

Claims (20)

  1. An information pushing method, comprising:
    obtaining the full company name of a candidate user;
    performing word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user;
    converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
    matching the digital identifier set of the candidate user against the digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner;
    if the match is consistent, pushing the information of the candidate user to the target user.
  2. The information push method according to claim 1, wherein performing word segmentation on the full company name of the candidate user to obtain the word segment set of the candidate user comprises:
    selecting a region segment from the full company name of the candidate user according to a preset lexicon of region terms;
    selecting an industry segment from the full company name of the candidate user according to a preset lexicon of industry terms;
    selecting a closing-word segment from the full company name of the candidate user according to a preset lexicon of closing words, a closing word describing the organizational form of a company;
    taking the segment that remains after the region segment, the industry segment and the closing-word segment are removed from the full company name of the candidate user as a business name segment; and
    forming the word segment set of the candidate user from the region segment, the industry segment, the closing-word segment and the business name segment of the candidate user.
  3. The information push method according to claim 2, wherein matching the digital identifier set of the candidate user against the digital identifier set of the target user comprises:
    matching, category by category, the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the candidate user against the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the target user, to obtain a region segment similarity, an industry segment similarity, a closing-word segment similarity and a business name segment similarity;
    calculating a weighted sum of the region segment similarity, the industry segment similarity, the closing-word segment similarity and the business name segment similarity, wherein the weights corresponding to the business name segment similarity, the industry segment similarity, the region segment similarity and the closing-word segment similarity decrease in that order; and
    if the weighted sum is greater than a first preset value, determining that the two sets match, and otherwise determining that they do not match.
  4. The information push method according to claim 2, wherein converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user comprises:
    combining a preset digital identifier of the region lexicon with the position number of the region segment within the region lexicon to obtain the digital identifier of the region segment;
    combining a preset digital identifier of the industry lexicon with the position number of the industry segment within the industry lexicon to obtain the digital identifier of the industry segment;
    combining a preset digital identifier of the closing-word lexicon with the position number of the closing-word segment within the closing-word lexicon to obtain the digital identifier of the closing-word segment;
    obtaining the Hanyu Pinyin of the business name segment, and combining the position numbers, in the Hanyu Pinyin alphabet, of the letters of that pinyin to obtain the digital identifier of the business name segment; and
    forming the digital identifier set of the candidate user from the digital identifier of the region segment, the digital identifier of the industry segment, the digital identifier of the closing-word segment and the digital identifier of the business name segment.
  5. The information push method according to any one of claims 1 to 4, wherein pushing the information of the candidate user to the target user comprises:
    obtaining work information of the candidate user at the company indicated by the candidate user's full company name, and work information of the target user at the company indicated by the target user's full company name;
    calculating a similarity between the work information of the candidate user and the work information of the target user; and
    if the similarity is greater than a second preset value, pushing the information of the candidate user to the target user.
  6. The information push method according to claim 5, wherein calculating the similarity between the work information of the candidate user and the work information of the target user comprises:
    obtaining interaction relationship information of the candidate user and interaction relationship information of the target user;
    calculating a similarity between the interaction relationship information of the candidate user and the interaction relationship information of the target user; and
    obtaining a final similarity between the candidate user and the target user according to the similarity between the work information and the similarity between the interaction relationship information.
  7. The information push method according to claim 5, wherein the work information includes years of employment, position, project name, work experience and work city; and
    calculating the similarity between the work information of the candidate user and the work information of the target user comprises:
    matching, category by category, the years of employment, position, project name, work experience and work city of the candidate user against the years of employment, position, project name, work experience and work city of the target user, to obtain a years-of-employment similarity, a position similarity, a project name similarity, a work experience similarity and a work city similarity; and
    calculating a weighted sum of the years-of-employment similarity, the position similarity, the project name similarity, the work experience similarity and the work city similarity to obtain the similarity between the work information of the candidate user and the work information of the target user.
  8. An information push apparatus, comprising:
    a full company name obtaining module, configured to obtain the full company name of a candidate user;
    a word segmentation module, configured to perform word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user;
    a digital identifier conversion module, configured to convert each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
    a matching module, configured to match the digital identifier set of the candidate user against a digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner; and
    an information push module, configured to push information of the candidate user to the target user when the two sets match.
  9. A non-volatile computer-readable storage medium storing a computer program which, when executed by a processor, implements an information push method;
    wherein the information push method comprises the following steps:
    obtaining the full company name of a candidate user;
    performing word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user;
    converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
    matching the digital identifier set of the candidate user against a digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner; and
    if the two sets match, pushing information of the candidate user to the target user.
  10. The non-volatile computer-readable storage medium according to claim 9, wherein performing word segmentation on the full company name of the candidate user to obtain the word segment set of the candidate user comprises:
    selecting a region segment from the full company name of the candidate user according to a preset lexicon of region terms;
    selecting an industry segment from the full company name of the candidate user according to a preset lexicon of industry terms;
    selecting a closing-word segment from the full company name of the candidate user according to a preset lexicon of closing words, a closing word describing the organizational form of a company;
    taking the segment that remains after the region segment, the industry segment and the closing-word segment are removed from the full company name of the candidate user as a business name segment; and
    forming the word segment set of the candidate user from the region segment, the industry segment, the closing-word segment and the business name segment of the candidate user.
  11. The non-volatile computer-readable storage medium according to claim 10, wherein matching the digital identifier set of the candidate user against the digital identifier set of the target user comprises:
    matching, category by category, the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the candidate user against the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the target user, to obtain a region segment similarity, an industry segment similarity, a closing-word segment similarity and a business name segment similarity;
    calculating a weighted sum of the region segment similarity, the industry segment similarity, the closing-word segment similarity and the business name segment similarity, wherein the weights corresponding to the business name segment similarity, the industry segment similarity, the region segment similarity and the closing-word segment similarity decrease in that order; and
    if the weighted sum is greater than a first preset value, determining that the two sets match, and otherwise determining that they do not match.
  12. The non-volatile computer-readable storage medium according to claim 10, wherein converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user comprises:
    combining a preset digital identifier of the region lexicon with the position number of the region segment within the region lexicon to obtain the digital identifier of the region segment;
    combining a preset digital identifier of the industry lexicon with the position number of the industry segment within the industry lexicon to obtain the digital identifier of the industry segment;
    combining a preset digital identifier of the closing-word lexicon with the position number of the closing-word segment within the closing-word lexicon to obtain the digital identifier of the closing-word segment;
    obtaining the Hanyu Pinyin of the business name segment, and combining the position numbers, in the Hanyu Pinyin alphabet, of the letters of that pinyin to obtain the digital identifier of the business name segment; and
    forming the digital identifier set of the candidate user from the digital identifier of the region segment, the digital identifier of the industry segment, the digital identifier of the closing-word segment and the digital identifier of the business name segment.
  13. The non-volatile computer-readable storage medium according to any one of claims 9 to 12, wherein pushing the information of the candidate user to the target user comprises:
    obtaining work information of the candidate user at the company indicated by the candidate user's full company name, and work information of the target user at the company indicated by the target user's full company name;
    calculating a similarity between the work information of the candidate user and the work information of the target user; and
    if the similarity is greater than a second preset value, pushing the information of the candidate user to the target user.
  14. A computer device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform an information push method, the information push method comprising the following steps:
    obtaining the full company name of a candidate user;
    performing word segmentation on the full company name of the candidate user to obtain a word segment set of the candidate user;
    converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain a digital identifier set of the candidate user;
    matching the digital identifier set of the candidate user against a digital identifier set of a target user, the digital identifier set of the target user being obtained by processing the full company name of the target user in the same manner; and
    if the two sets match, pushing information of the candidate user to the target user.
  15. The computer device according to claim 14, wherein performing word segmentation on the full company name of the candidate user to obtain the word segment set of the candidate user comprises:
    selecting a region segment from the full company name of the candidate user according to a preset lexicon of region terms;
    selecting an industry segment from the full company name of the candidate user according to a preset lexicon of industry terms;
    selecting a closing-word segment from the full company name of the candidate user according to a preset lexicon of closing words, a closing word describing the organizational form of a company;
    taking the segment that remains after the region segment, the industry segment and the closing-word segment are removed from the full company name of the candidate user as a business name segment; and
    forming the word segment set of the candidate user from the region segment, the industry segment, the closing-word segment and the business name segment of the candidate user.
  16. The computer device according to claim 15, wherein matching the digital identifier set of the candidate user against the digital identifier set of the target user comprises:
    matching, category by category, the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the candidate user against the digital identifiers of the region segment, the industry segment, the closing-word segment and the business name segment in the digital identifier set of the target user, to obtain a region segment similarity, an industry segment similarity, a closing-word segment similarity and a business name segment similarity;
    calculating a weighted sum of the region segment similarity, the industry segment similarity, the closing-word segment similarity and the business name segment similarity, wherein the weights corresponding to the business name segment similarity, the industry segment similarity, the region segment similarity and the closing-word segment similarity decrease in that order; and
    if the weighted sum is greater than a first preset value, determining that the two sets match, and otherwise determining that they do not match.
  17. The computer device according to claim 15, wherein converting each word segment in the word segment set of the candidate user into a corresponding digital identifier to obtain the digital identifier set of the candidate user comprises:
    combining a preset digital identifier of the region lexicon with the position number of the region segment within the region lexicon to obtain the digital identifier of the region segment;
    combining a preset digital identifier of the industry lexicon with the position number of the industry segment within the industry lexicon to obtain the digital identifier of the industry segment;
    combining a preset digital identifier of the closing-word lexicon with the position number of the closing-word segment within the closing-word lexicon to obtain the digital identifier of the closing-word segment;
    obtaining the Hanyu Pinyin of the business name segment, and combining the position numbers, in the Hanyu Pinyin alphabet, of the letters of that pinyin to obtain the digital identifier of the business name segment; and
    forming the digital identifier set of the candidate user from the digital identifier of the region segment, the digital identifier of the industry segment, the digital identifier of the closing-word segment and the digital identifier of the business name segment.
  18. The computer device according to any one of claims 14 to 17, wherein pushing the information of the candidate user to the target user comprises:
    obtaining work information of the candidate user at the company indicated by the candidate user's full company name, and work information of the target user at the company indicated by the target user's full company name;
    calculating a similarity between the work information of the candidate user and the work information of the target user; and
    if the similarity is greater than a second preset value, pushing the information of the candidate user to the target user.
  19. The computer device according to claim 18, wherein calculating the similarity between the work information of the candidate user and the work information of the target user comprises:
    obtaining interaction relationship information of the candidate user and interaction relationship information of the target user;
    calculating a similarity between the interaction relationship information of the candidate user and the interaction relationship information of the target user; and
    obtaining a final similarity between the candidate user and the target user according to the similarity between the work information and the similarity between the interaction relationship information.
  20. The computer device according to claim 18, wherein the work information includes years of employment, position, project name, work experience and work city; and
    calculating the similarity between the work information of the candidate user and the work information of the target user comprises:
    matching, category by category, the years of employment, position, project name, work experience and work city of the candidate user against the years of employment, position, project name, work experience and work city of the target user, to obtain a years-of-employment similarity, a position similarity, a project name similarity, a work experience similarity and a work city similarity; and
    calculating a weighted sum of the years-of-employment similarity, the position similarity, the project name similarity, the work experience similarity and the work city similarity to obtain the similarity between the work information of the candidate user and the work information of the target user.
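
By way of illustration only, the segmentation, identifier conversion and weighted matching recited in claims 2 to 4 might be sketched in Python as follows. The lexicon contents and lexicon identifiers, the small pinyin table, the weights, the threshold, the use of `difflib.SequenceMatcher` for the business name similarity, the use of exact equality for the other three categories, and all function names are assumptions made for this sketch; the application does not specify them.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicons: kind -> (lexicon identifier, ordered word list).
LEXICONS = {
    "region": (1, ["深圳", "北京", "上海"]),
    "industry": (2, ["科技", "保险", "银行"]),
    "closing": (3, ["有限公司", "股份有限公司"]),
}
# Hypothetical pinyin table; a real system would use a full pinyin converter.
PINYIN = {"平": "ping", "安": "an", "腾": "teng", "讯": "xun"}
# Assumed weights, decreasing in the order business name, industry, region, closing word.
WEIGHTS = {"business": 0.5, "industry": 0.25, "region": 0.15, "closing": 0.10}
FIRST_PRESET_VALUE = 0.8  # assumed threshold


def segment(full_name):
    """Split a full company name into region, industry, closing-word and business name segments."""
    parts, rest = {}, full_name
    for kind, (_, words) in LEXICONS.items():
        hits = [w for w in words if w in rest]
        word = max(hits, key=len) if hits else ""  # prefer the longest lexicon entry
        parts[kind] = word
        if word:
            rest = rest.replace(word, "", 1)
    parts["business"] = rest  # whatever remains is treated as the business name
    return parts


def to_ids(parts):
    """Convert each segment into a digital identifier, represented here as a tuple of integers."""
    ids = {}
    for kind, (lex_id, words) in LEXICONS.items():
        word = parts[kind]
        # (lexicon identifier, 1-based position in the lexicon); position 0 means "absent".
        ids[kind] = (lex_id, words.index(word) + 1 if word else 0)
    # Characters missing from the toy pinyin table are skipped.
    letters = "".join(PINYIN.get(ch, "") for ch in parts["business"])
    # Position of each pinyin letter in the alphabet: a -> 1, ..., z -> 26.
    ids["business"] = tuple(ord(c) - ord("a") + 1 for c in letters)
    return ids


def similarity(kind, a, b):
    """Category-by-category similarity between two digital identifiers."""
    if kind == "business":
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def is_match(name_a, name_b):
    """Return True when the weighted sum of similarities exceeds the first preset value."""
    ids_a, ids_b = to_ids(segment(name_a)), to_ids(segment(name_b))
    score = sum(w * similarity(k, ids_a[k], ids_b[k]) for k, w in WEIGHTS.items())
    return score > FIRST_PRESET_VALUE
```

With these assumed weights, two names that share the business name, industry and closing word but differ in region score 0.85 and are treated as matching, whereas names that differ in business name fall below the threshold unless their pinyin is very similar.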
PCT/CN2019/103023 2019-06-14 2019-08-28 Information pushing method and apparatus, computer readable storage medium, and computer device WO2020248377A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910517834.2A CN110381115B (en) 2019-06-14 2019-06-14 Information pushing method and device, computer readable storage medium and computer equipment
CN201910517834.2 2019-06-14

Publications (1)

Publication Number Publication Date
WO2020248377A1 true WO2020248377A1 (en) 2020-12-17

Family

ID=68250434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103023 WO2020248377A1 (en) 2019-06-14 2019-08-28 Information pushing method and apparatus, computer readable storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN110381115B (en)
WO (1) WO2020248377A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079434B (en) * 2019-12-05 2023-10-20 企查查科技股份有限公司 Method, equipment and storage medium for automatically corresponding company name for company short
CN111800513B (en) * 2020-07-09 2022-09-27 北京字节跳动网络技术有限公司 Method and device for pushing information and computer readable medium of electronic equipment
CN111898378B (en) * 2020-07-31 2023-09-19 中国联合网络通信集团有限公司 Industry classification method and device for government enterprise clients, electronic equipment and storage medium
CN112182140A (en) * 2020-08-17 2021-01-05 北京来也网络科技有限公司 Information input method and device combining RPA and AI, computer equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493841A (en) * 2009-02-23 2009-07-29 深圳市中科新业信息科技发展有限公司 Searching method and device
CN103838789A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Text similarity computing method
WO2016141075A1 (en) * 2015-03-02 2016-09-09 Ajuba, Llc Push notification system for advertising
CN106095867A (en) * 2016-06-03 2016-11-09 北京奇虎科技有限公司 A kind of book recommendation method based on industry analysis and device
CN106446100A (en) * 2016-09-13 2017-02-22 乐视控股(北京)有限公司 Content recommendation method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302104B (en) * 2015-06-26 2020-01-21 阿里巴巴集团控股有限公司 User relationship identification method and device
CN105956192A (en) * 2016-06-15 2016-09-21 中国互联网络信息中心 Method and system for acquiring shortened form of organization name based on website homepage information
CN107357916A (en) * 2017-07-19 2017-11-17 北京金堤科技有限公司 Data processing method and system
CN108460014B (en) * 2018-02-07 2022-02-25 百度在线网络技术(北京)有限公司 Enterprise entity identification method and device, computer equipment and storage medium
CN109561132B (en) * 2018-10-23 2022-08-16 深圳平安医疗健康科技服务有限公司 Information pushing method and device, server and terminal
CN109522417A (en) * 2018-10-26 2019-03-26 浪潮软件股份有限公司 A kind of trading company's abstracting method of company name

Also Published As

Publication number Publication date
CN110381115B (en) 2022-03-11
CN110381115A (en) 2019-10-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932594

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932594

Country of ref document: EP

Kind code of ref document: A1