CN115905696A - Method, system, electronic device and storage medium for generating HCP image based on big data screening - Google Patents

Info

Publication number
CN115905696A
Authority
CN
China
Prior art keywords
user
data
generating
hcp
attribute
Legal status
Pending
Application number
CN202211441094.7A
Other languages
Chinese (zh)
Inventor
黄振
杨贵龙
马超
Current Assignee
Shanghai Lingli Health Management Co ltd
Original Assignee
Shanghai Lingli Health Management Co ltd
Priority date
2022-11-17
Filing date
2022-11-17
Publication date
2023-04-04
Application filed by Shanghai Lingli Health Management Co ltd
Priority to CN202211441094.7A
Publication of CN115905696A
Legal status: Pending (current)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, system, electronic device and storage medium for generating an HCP (healthcare professional) portrait based on big data screening, and belongs to the technical field of computers. To address the absence of HCP portraits, the method comprises: acquiring user data, where the user is an HCP and the user data comprises user identity information and user browsing behavior data; generating basic attributes, interest attributes and relationship attributes of the user from the user data; and generating a user portrait for the user from those basic, interest and relationship attributes. The invention thus generates HCP portraits. Rather than collecting user data only from a single website, as is traditional, it also collects user data from multiple medical information websites, giving rich data sources, and it captures that data on a schedule, keeping the data timely.

Description

Method, system, electronic device and storage medium for generating HCP image based on big data screening
Technical Field
The invention relates to the technical field of computers, in particular to a method, a system, an electronic device and a storage medium for generating an HCP image based on big data screening.
Background
Before 2015, traditional offline marketing was the main form of marketing and promotion for pharmaceutical companies in the global pharmaceutical market. With the implementation of national medical reform policies, the traditional offline model has come under pressure, while digital marketing technology has gained unprecedented room to develop. Marketing, however, presupposes accurately identifying target customer groups. Finding one's own target customers in a vast population has become the greatest problem in the current market, and failing to do so wastes a large amount of manpower and money.
A user portrait structures and labels user information: by describing the user along dimensions such as demographic attributes, social attributes and interests and preferences, it accurately describes and analyzes each aspect of the user and mines potential value. In the related art, a user portrait is generally generated by extracting portrait tags from user behavior data, counting how often each tag occurs, scoring each user's tags according to those counts, and deriving the user portrait from the tag scores. However, no HCP portrait is currently available.
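The related-art approach just described (extract tags, count them, score each user's tags by frequency) can be sketched as follows. This is a minimal illustration of that background technique, not the invention's method. The input format, a list of `(user_id, tag)` events, is an assumption, and so is the choice to normalize each score by the user's most frequent tag.

```python
from collections import Counter, defaultdict

def score_portrait_tags(events):
    """Score each user's portrait tags by how often they occur.

    events: iterable of (user_id, tag) pairs extracted from behavior data
    (assumed input format). Returns {user_id: {tag: score}}, where a tag's
    score is its count divided by the count of that user's most frequent tag.
    """
    counts = defaultdict(Counter)
    for user_id, tag in events:
        counts[user_id][tag] += 1
    portraits = {}
    for user_id, tag_counts in counts.items():
        top = max(tag_counts.values())
        portraits[user_id] = {tag: n / top for tag, n in tag_counts.items()}
    return portraits

events = [("u1", "oncology"), ("u1", "oncology"), ("u1", "cardiology"), ("u2", "pediatrics")]
print(score_portrait_tags(events))
# {'u1': {'oncology': 1.0, 'cardiology': 0.5}, 'u2': {'pediatrics': 1.0}}
```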
Therefore, a further solution is needed to address at least one of the above problems.
Disclosure of Invention
To address the lack of HCP portraits, the present invention provides a method, system, electronic device and storage medium for generating HCP portraits based on big data screening. The technical solution is as follows:
a method of generating an HCP image based on big data screening, comprising:
acquiring user data, wherein the user is a healthcare professional (HCP), and the user data comprises user identity information and user browsing behavior data;
generating basic attributes, interest attributes and relationship attributes of the user based on the user data;
and generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relation attribute of the user.
In a preferred embodiment of the present invention, obtaining user data comprises:
acquiring user data based on the UID of the user, the URL of the browsed webpage, the IP address, the entry time, the leaving time and the access source; and
generating a target medical information website set based on the medical information website ranking;
and periodically crawling articles from the target medical information websites in the set, and acquiring user data based on the articles.
In a preferred embodiment of the present invention, the obtaining user data based on the article includes:
extracting nouns in the article;
performing word cloud analysis on the nouns;
and generating user data according to the word cloud analysis result.
In a preferred embodiment of the present invention, the generating the target medical information website set based on the ranking of the medical information websites includes:
according to the ranking order, dividing a plurality of medical information websites into m subsets, and randomly selecting a different number of websites from each of the m subsets to generate the set.
In a preferred embodiment of the present invention, the periodically crawling the articles in the target medical information websites in the set includes:
using a different crawling mode for the articles according to the ranking of each subset.
In a preferred embodiment of the invention, HDFS is used to store the user data.
The other technical scheme is as follows:
a system for generating an HCP image based on big data screening, comprising:
the data acquisition module is used for acquiring user data, and the user data comprises user identity information and user browsing behavior data;
the data analysis module is used for generating basic attributes, interest attributes and relation attributes of the user;
and the portrait generation module is used for generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relationship attribute of the user.
The other technical scheme is as follows:
an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the above methods when the program is executed by the processor.
The other technical scheme is as follows:
a computer-readable storage medium storing a computer program for performing any one of the methods described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention generates the basic attribute, the interest attribute and the relation attribute of the user based on the user data belonging to the HCP, and generates the user portrait of the corresponding user based on the basic attribute, the interest attribute and the relation attribute of the user, thereby realizing the generation of the HCP portrait.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a partial flow chart of user data acquisition in the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1:
As shown in FIG. 1, a method for generating an HCP image based on big data screening comprises:
acquiring user data, wherein the user is an HCP, and the user data comprises user identity information and user browsing behavior data.
In the method, user data comes from two sources. The first source acquires user data based on the user's UID, the URL of the browsed webpage, the IP address, the entry time, the leaving time and the access source.
This part of the data is generally obtained from the user log system of the company's own platform (i.e., the HCP registration data and system behavior log data in FIG. 1). A conventional user log system identifies users by IP address; this embodiment instead identifies users by UID (User Identification), a numeric value generated automatically by the system when the user registers on the network platform. With the UID, the system can record how long a given user browsed a given website URL at a given time. A URL (Uniform Resource Locator), also called a web address, is the address of a standard resource on the internet.
More specifically:
S1: A long-connection communication service is built using GatewayWorker.
S2: When the user clicks to visit a URL, a site-wide statistics script is triggered immediately and sends 5 fields to the server: UID, URL, IP address, entry time and access source.
S3: On receiving the data, the server stores it in a temporary table.
S4: When the user closes the webpage or the browser, the long-connection service detects that the user has left and computes online time = current time - entry time.
S5: The 5 fields from S2, together with the leaving time and online time obtained in S4 (7 fields in total), are written to a log table.
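The bookkeeping in steps S2 to S5 can be sketched as follows. This is a minimal sketch under stated assumptions. GatewayWorker itself is a PHP long-connection framework and is not modeled here. The `SessionLogger` class, its in-memory temporary and log tables, and the choice to key open visits by `(uid, url)` are all hypothetical. In a real deployment, `on_leave` would be driven by the long-connection service's disconnect event.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogRecord:
    # The 7 fields written to the log table in step S5.
    uid: int
    url: str
    ip: str
    entry_time: datetime
    source: str
    leave_time: datetime
    online_seconds: float

class SessionLogger:
    """Hypothetical in-memory stand-in for the temporary table and log table."""

    def __init__(self):
        self.temp_table = {}  # (uid, url) -> the 5 fields sent in S2
        self.log_table = []

    def on_enter(self, uid, url, ip, entry_time, source):
        # S2/S3: the page script reports 5 fields; the server stores them temporarily.
        self.temp_table[(uid, url)] = (uid, url, ip, entry_time, source)

    def on_leave(self, uid, url, leave_time):
        # S4: the user has left; online time = leave time - entry time.
        uid, url, ip, entry_time, source = self.temp_table.pop((uid, url))
        online = (leave_time - entry_time).total_seconds()
        # S5: persist the 5 original fields plus leave time and online time.
        record = LogRecord(uid, url, ip, entry_time, source, leave_time, online)
        self.log_table.append(record)
        return record

logger = SessionLogger()
logger.on_enter(1001, "https://example.com/a", "10.0.0.1", datetime(2022, 11, 17, 9, 0, 0), "search")
rec = logger.on_leave(1001, "https://example.com/a", datetime(2022, 11, 17, 9, 5, 30))
print(rec.online_seconds)  # 330.0
```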
The method records logs site-wide: user behavior statistics code is added to the common header of every page to record which user entered, when the user left, what content was accessed, and similar information. To handle concurrent access, the method collects the data into HDFS, which copes well with high concurrency and can support petabyte-scale data.
As shown in FIG. 2, the second source is medical information websites. It is used to avoid relying on a single data source and to enrich the data. Because this part requires massive, real-time data, the medical information websites are first classified so that blind crawling does not waste resources. The target medical information website set is generated based on the ranking of medical information websites. The ranking can be obtained from Alexa, a website that publishes global website rankings. According to ranking order, the medical information websites are divided into m subsets, and a different number of websites is randomly selected from each subset to form the set. For example, 500 websites are randomly selected from those ranked in the top 1000, 300 from those ranked 1000 to 10000, and 200 from those ranked below 10000, which ensures the diversity of data sources.
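The ranked, per-subset random selection described above can be sketched as stratified sampling. This is a minimal sketch, assuming the ranking is available as a plain list of site names ordered by rank. The function name `build_target_set` and the exact stratum boundaries (1-1000, 1001-10000, 10001 and beyond) are illustrative assumptions.

```python
import random

def build_target_set(ranked_sites, strata, seed=None):
    """Stratified random selection of target medical information websites.

    ranked_sites: site names ordered by rank (index 0 = rank 1).
    strata: list of (start_rank, end_rank, count) with 1-based inclusive ranks;
    end_rank=None means "every site ranked at or below start_rank".
    Returns {stratum_index: [selected sites]}.
    """
    rng = random.Random(seed)
    target = {}
    for i, (start, end, count) in enumerate(strata):
        subset = ranked_sites[start - 1:end]  # a slice end of None runs to the end
        # random.sample raises ValueError if count > len(subset), so cap it.
        target[i] = rng.sample(subset, min(count, len(subset)))
    return target

sites = [f"site{r}" for r in range(1, 20001)]  # hypothetical ranked list
target = build_target_set(sites, [(1, 1000, 500), (1001, 10000, 300), (10001, None, 200)], seed=42)
print([len(target[i]) for i in range(3)])  # [500, 300, 200]
```

Fixing `seed` makes a crawl round reproducible. Omitting it draws a fresh sample each round, which spreads coverage over time.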
Articles on the target medical information websites in the set are then crawled periodically to keep the data timely, with a different crawling mode for each subset according to its ranking. For example, the first 500 websites are key crawling targets, the next 300 are mid-level targets and the remaining 200 are non-key targets. Key websites are crawled site-wide (all articles). Mid-level websites are partially crawled, obtaining 50% of the site's content. Non-key websites are crawled lightly, obtaining 20% of the site's content.
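The tiered crawling modes can be sketched as a per-site plan for each crawl round. This is a minimal sketch under several assumptions. The tier names, the integer percentages (100%, 50% and 20%, taken from the example above), and the existence of a known list of each site's article URLs (for instance from a sitemap) are all assumed. The source does not say which articles make up the 50% or 20%, so random selection is also an assumption.

```python
import random

# Percentage of each site's articles to crawl per tier, from the example above.
CRAWL_PERCENT = {"key": 100, "mid": 50, "non_key": 20}

def plan_crawl(article_urls, tier, seed=None):
    """Pick which of one site's article URLs to fetch in this crawl round.

    article_urls: all article URLs known for the site (assumed available).
    tier: "key", "mid" or "non_key" (hypothetical tier names).
    """
    pct = CRAWL_PERCENT[tier]
    n = (len(article_urls) * pct + 99) // 100  # integer ceiling of len * pct / 100
    if n >= len(article_urls):
        return list(article_urls)  # full-site crawl
    return random.Random(seed).sample(article_urls, n)

urls = [f"https://example.org/a/{i}" for i in range(10)]
print([len(plan_crawl(urls, t, seed=1)) for t in ("key", "mid", "non_key")])  # [10, 5, 2]
```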
Acquiring user data based on the articles specifically comprises:
Extracting nouns from the articles: mainly the nouns that appear in the articles are extracted, and meaningless verbs, particles and similar words are removed.
Performing word cloud analysis on the nouns: this is mainly word frequency analysis, in which nouns are sorted by number of occurrences from high to low and high-frequency nouns are selected as the word cloud analysis result, for example the top 3 nouns, each occurring more than 5 times.
Generating user data from the word cloud analysis result. These steps prevent the data dimensions from becoming fixed and allow the portrait's dimensions to be expanded dynamically.
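The noun filtering and frequency-based selection above can be sketched as follows. This is a minimal sketch that assumes the article has already been tokenized and part-of-speech tagged. The assumption is that nouns carry tags beginning with "n", which is the convention of common Chinese taggers such as jieba's `posseg` module. The thresholds `top_k=3` and `min_count=5` come from the example in the text. Tokens are shown in English for readability.

```python
from collections import Counter

def word_cloud_nouns(tagged_tokens, top_k=3, min_count=5):
    """Select high-frequency nouns from POS-tagged article tokens.

    tagged_tokens: list of (word, pos) pairs; nouns are assumed to have a POS
    tag starting with "n". Verbs, particles and other words are discarded.
    Returns up to top_k (noun, count) pairs with count > min_count, sorted by
    count from high to low.
    """
    counts = Counter(word for word, pos in tagged_tokens if pos.startswith("n"))
    return [(w, c) for w, c in counts.most_common(top_k) if c > min_count]

tokens = ([("insulin", "n")] * 7 + [("diabetes", "n")] * 6
          + [("physician", "n")] * 3 + [("treat", "v")] * 9)
print(word_cloud_nouns(tokens))  # [('insulin', 7), ('diabetes', 6)]
```

The verb "treat" is the most frequent token overall but is excluded by the noun filter. "physician" makes the top 3 nouns but fails the more-than-5 threshold.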
In the method, the user data from both sources can be stored in HDFS, which is distributed, fault-tolerant, highly available, high-throughput, scalable and inexpensive.
The basic, interest and relationship attributes of the user are generated from the user data and can be stored in a Hive data warehouse. Hive is chosen because it provides a powerful SQL-like query language, so that people familiar with relational databases can quickly query the related information.
The basic attributes comprise the name, the mobile phone number, the gender, the age and the native place of the user, and the information of the five dimensions can be obtained through the registration information of the user.
The interest attributes comprise the user's preferred field and preferred access time. Precise marketing requires knowing users' interests accurately and recommending academic content they care about at the right time. The interest attributes are therefore abstracted into these two dimensions: preferred field and preferred access time.
The preferred field is modeled from two factors: the number of visits and the total duration of visits the user makes to each category of medical website within one day. Assuming that visit count and visit duration are equally important to the user's preference, their sum is used as the weighted interest score for a field. The field category with the highest weighted score for that day is taken as the dimension index.
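The preferred-field model (weight = visit count + visit duration, then take the highest-weighted field) can be sketched as follows. This is a minimal sketch. The input format of one user-day of `(field, duration)` visits is an assumption. The text also does not state the duration unit, and because the raw sum mixes a count with a duration, the unit affects the result; minutes are assumed here.

```python
from collections import defaultdict

def preferred_field(visits):
    """Return (best_field, weights) for one user on one day.

    visits: list of (field, duration_minutes) pairs, one per website visit
    (assumed format; minutes is an assumed unit).
    Weight of a field = number of visits + total visit duration, per the
    equal-importance assumption in the text.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for field, minutes in visits:
        counts[field] += 1
        durations[field] += minutes
    weights = {f: counts[f] + durations[f] for f in counts}
    return max(weights, key=weights.get), weights

best, weights = preferred_field([("oncology", 10), ("oncology", 5), ("cardiology", 20)])
print(best, weights)  # cardiology {'oncology': 17.0, 'cardiology': 21.0}
```

Oncology gets more visits, but cardiology's longer single visit gives it the higher combined weight.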
The user's daily online time is divided into one-hour slots, the user's online duration within each hour is totaled, and the hour with the longest online duration is selected as the user's time dimension index. This index reflects the user's browsing habits and is calculated as:
$$T_{\text{pref}} = \arg\max_{h \in \{0, 1, \dots, 23\}} D_h$$
where $D_h$ is the user's total access duration within hour $h$ of the day.
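The hourly bucketing and $\arg\max$ can be sketched as follows. This is a minimal sketch, assuming sessions are available as `(start, end)` datetime pairs from the log table. Splitting sessions that cross an hour boundary is an assumption, because the text does not say how such sessions are attributed. Ties resolve to the earliest hour, and a user with no sessions gets hour 0.

```python
from datetime import datetime, timedelta

def preferred_access_hour(sessions):
    """Return (best_hour, seconds_per_hour) for a user's sessions.

    sessions: list of (start, end) datetimes. A session spanning an hour
    boundary is split so each hour is credited only with time spent in it.
    """
    seconds = [0.0] * 24
    for start, end in sessions:
        t = start
        while t < end:
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(next_hour, end)
            seconds[t.hour] += (chunk_end - t).total_seconds()
            t = chunk_end
    best = max(range(24), key=lambda h: seconds[h])  # argmax over D_h
    return best, seconds

sessions = [
    (datetime(2022, 11, 17, 8, 50), datetime(2022, 11, 17, 9, 30)),
    (datetime(2022, 11, 17, 20, 0), datetime(2022, 11, 17, 20, 25)),
]
best, seconds = preferred_access_hour(sessions)
print(best)  # 9
```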
A relationship attribute describes how a thing relates to other things. Both a thing's intrinsic properties and its relations to other things are attributes of that thing. For example, the color, smell, hardness and softness of an apple are attributes concerning the apple itself, while the fact that apple 1 is larger or smaller than apple 2 is also an attribute.
The method uses supervised relation extraction. A Bi-LSTM (bidirectional long short-term memory network) serves as the sentence encoder. It combines a forward LSTM and a backward LSTM, is commonly used to model context in natural language processing tasks, and takes the word embeddings of medical sentences as input. A CNN then serves as the feature extractor, and a softmax layer finally outputs the probabilities of N relation types. Compared with conventional methods, this omits the manual feature-construction step and so avoids the errors it introduces, improving relation-extraction accuracy. The extracted relations finally form a relationship network graph. For example, if an article states that Zhang San, chief physician at XX Hospital, began working in 1980, the relation (Zhang San, is, chief physician) can be extracted.
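The encoder-extractor-classifier pipeline above can be sketched as a NumPy forward pass. This is a minimal sketch and not the patent's model. The weights are random and untrained, and training on labeled sentences (e.g. with cross-entropy loss) is omitted. All dimensions are hypothetical, as are the convolution width and the use of max-pooling over time. A production system would use a deep learning framework. Sentences must contain at least `width` tokens.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class _LSTM:
    """Single-direction LSTM, forward pass only."""

    def __init__(self, d_in, d_h, rng):
        self.d_h = d_h
        # One stacked matrix for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * d_h, d_in + d_h))
        self.b = np.zeros(4 * d_h)

    def run(self, xs):
        h, c, outputs = np.zeros(self.d_h), np.zeros(self.d_h), []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
            h = _sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)  # (T, d_h)

class BiLSTMCNNRelationClassifier:
    def __init__(self, vocab_size, n_relations, d_emb=32, d_h=16,
                 n_filters=8, width=3, seed=0):
        rng = np.random.default_rng(seed)
        self.embedding = rng.normal(0.0, 0.1, (vocab_size, d_emb))
        self.fwd = _LSTM(d_emb, d_h, rng)
        self.bwd = _LSTM(d_emb, d_h, rng)
        self.filters = rng.normal(0.0, 0.1, (n_filters, width, 2 * d_h))
        self.width = width
        self.W_out = rng.normal(0.0, 0.1, (n_relations, n_filters))
        self.b_out = np.zeros(n_relations)

    def predict_proba(self, token_ids):
        x = self.embedding[np.asarray(token_ids)]  # word embeddings, (T, d_emb)
        # Bi-LSTM encoder: forward states plus re-reversed backward states.
        h = np.concatenate([self.fwd.run(x), self.bwd.run(x[::-1])[::-1]], axis=1)
        # CNN feature extractor: 1-D convolution over time, ReLU, max-pooling.
        windows = np.stack([h[t:t + self.width] for t in range(len(h) - self.width + 1)])
        features = np.maximum(np.einsum("nwd,kwd->nk", windows, self.filters), 0.0)
        pooled = features.max(axis=0)  # (n_filters,)
        # Softmax layer over the N relation types.
        logits = self.W_out @ pooled + self.b_out
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

model = BiLSTMCNNRelationClassifier(vocab_size=100, n_relations=4)
p = model.predict_proba([5, 17, 42, 8, 99])  # hypothetical token ids
print(p.shape)  # (4,)
```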
And generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relation attribute of the user.
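Combining the three attribute groups into one portrait record can be sketched with dataclasses. This is a minimal sketch. The class and field names are hypothetical, and so are the sample values, including the placeholder phone number. The basic-attribute fields mirror the five registration dimensions listed above. Relationships are held as `(subject, relation, object)` triples like the extraction example.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class BasicAttributes:
    # The five dimensions taken from the user's registration information.
    name: str
    phone: str
    gender: str
    age: int
    native_place: str

@dataclass
class InterestAttributes:
    preferred_field: str        # field with the highest daily interest weight
    preferred_access_hour: int  # hour of day (0-23) with the longest online time

@dataclass
class HCPPortrait:
    uid: int
    basic: BasicAttributes
    interest: InterestAttributes
    # Relationship attributes as (subject, relation, object) triples.
    relations: list = field(default_factory=list)

portrait = HCPPortrait(
    uid=1001,
    basic=BasicAttributes("Zhang San", "138xxxxxxxx", "male", 45, "Shanghai"),
    interest=InterestAttributes("cardiology", 9),
    relations=[("Zhang San", "is", "chief physician")],
)
record = asdict(portrait)  # nested dict, e.g. for loading into a warehouse table
print(record["interest"])  # {'preferred_field': 'cardiology', 'preferred_access_hour': 9}
```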
The method may also implement a data visualization module using a general web framework based on PHP, CodeIgniter and MyBatis, with layers divided according to the MVC design pattern:
- The Model layer consists mainly of entity objects derived from business analysis.
- The View layer is the user interface, which mainly provides input and calls the control layer.
- The Controller layer separates out the business-logic program. It receives requests from the front end, performs business-logic processing by calling methods of the Service layer, and passes the results to the View layer.
- The Service layer provides service method interfaces to the Controller layer and implements specific business logic by calling DAO-layer methods.
- The DAO (persistence) layer interacts with databases such as MySQL and Elasticsearch.
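The request flow through these layers (View/front end to Controller to Service to DAO, and back to the View) can be sketched as follows. This is a language-neutral illustration written in Python; it is not CodeIgniter's or MyBatis's API. All class and method names are hypothetical, and an in-memory dict stands in for MySQL or Elasticsearch.

```python
class PortraitDAO:
    """Persistence layer; an in-memory dict stands in for MySQL/Elasticsearch."""

    def __init__(self, rows):
        self._rows = rows  # uid -> stored portrait fields

    def find_by_uid(self, uid):
        return self._rows.get(uid)

class PortraitService:
    """Service layer: business logic built on DAO calls."""

    def __init__(self, dao):
        self._dao = dao

    def get_interest_summary(self, uid):
        portrait = self._dao.find_by_uid(uid)
        if portrait is None:
            return None
        return {"uid": uid, "field": portrait["preferred_field"],
                "hour": portrait["preferred_access_hour"]}

class PortraitController:
    """Controller: takes a front-end request, calls the service, hands off to a view."""

    def __init__(self, service, view):
        self._service = service
        self._view = view

    def handle(self, request):
        summary = self._service.get_interest_summary(int(request["uid"]))
        return self._view(summary)

def text_view(summary):
    # View layer: render the result for display.
    if summary is None:
        return "No portrait found"
    return f"User {summary['uid']}: {summary['field']}, most active at {summary['hour']:02d}:00"

dao = PortraitDAO({1001: {"preferred_field": "cardiology", "preferred_access_hour": 9}})
controller = PortraitController(PortraitService(dao), text_view)
print(controller.handle({"uid": "1001"}))  # User 1001: cardiology, most active at 09:00
```

Keeping the DAO behind the Service layer means the storage backend can be swapped without touching controllers or views.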
Example 2:
a system for generating HCP images based on big data screening, for implementing the method of Example 1, the system comprising:
and the data acquisition module is used for acquiring user data, and the user data comprises user identity information and user browsing behavior data.
And the data analysis module is used for generating the basic attribute, the interest attribute and the relation attribute of the user.
And the portrait generation module is used for generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relationship attribute of the user.
Example 3:
an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of Example 1 when executing the program.
Example 4:
a computer-readable storage medium storing a computer program for executing the method of Example 1.
In summary, the invention generates the basic attribute, the interest attribute and the relationship attribute of the user based on the user data belonging to the HCP, and then generates the user portrait corresponding to the user based on the basic attribute, the interest attribute and the relationship attribute of the user, thereby realizing the generation of the HCP portrait.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may exist separately as a unit, or two or more units may be integrated into one unit. The integrated unit can be implemented in hardware, or in hardware plus software functional units.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A method for generating an HCP image based on big data screening, comprising:
acquiring user data, wherein the user is a healthcare professional (HCP), and the user data comprises user identity information and user browsing behavior data;
generating basic attributes, interest attributes and relationship attributes of the user based on the user data;
and generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relationship attribute of the user.
2. The method of generating an HCP image based on big data screening of claim 1, wherein obtaining user data comprises:
acquiring user data based on the user UID, the URL of the browsed webpage, the IP address, the entry time, the leaving time and the access source; and
generating a target medical information website set based on the medical information website ranking;
and capturing articles in the target medical information websites in the set at regular time, and acquiring user data based on the articles.
3. The method of claim 2, wherein obtaining user data based on the article comprises:
extracting nouns in the article;
performing word cloud analysis on the nouns;
and generating user data according to the word cloud analysis result.
4. The method of claim 2, wherein generating the set of target medical information web sites based on the ranking of medical information web sites comprises:
according to the ranking order, dividing a plurality of medical information websites into m subsets, and randomly grabbing different numbers of websites from the m subsets to generate the set.
5. The method of generating an HCP image based on big data screening of claim 4, wherein periodically crawling the articles in the target medical information websites in the set comprises:
and adopting different crawling modes for the articles according to the ranking of different subsets.
6. A method of generating an HCP image based on big data screening according to claim 1, wherein HDFS is used to store the user data.
7. A system for generating an HCP image based on big data screening, comprising:
the data acquisition module is used for acquiring user data, and the user data comprises user identity information and user browsing behavior data;
the data analysis module is used for generating basic attributes, interest attributes and relationship attributes of the user;
and the portrait generation module is used for generating a user portrait corresponding to the user based on the basic attribute, the interest attribute and the relationship attribute of the user.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-6 when executing the program.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1-6.
CN202211441094.7A 2022-11-17 2022-11-17 Method, system, electronic device and storage medium for generating HCP image based on big data screening Pending CN115905696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211441094.7A CN115905696A (en) 2022-11-17 2022-11-17 Method, system, electronic device and storage medium for generating HCP image based on big data screening

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211441094.7A CN115905696A (en) 2022-11-17 2022-11-17 Method, system, electronic device and storage medium for generating HCP image based on big data screening

Publications (1)

Publication Number Publication Date
CN115905696A true CN115905696A (en) 2023-04-04

Family

ID=86480920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211441094.7A Pending CN115905696A (en) 2022-11-17 2022-11-17 Method, system, electronic device and storage medium for generating HCP image based on big data screening

Country Status (1)

Country Link
CN (1) CN115905696A (en)

Similar Documents

Publication Publication Date Title
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
Alam et al. Processing social media images by combining human and machine computing during crises
JP6419905B2 (en) Using inverse operators on queries
JP6435307B2 (en) Search intent for queries
US10728203B2 (en) Method and system for classifying a question
US11899681B2 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
US9754210B2 (en) User interests facilitated by a knowledge base
JP6001809B2 (en) Search query interaction on online social networks
JP2021108183A (en) Method, apparatus, device and storage medium for intention recommendation
JP6196316B2 (en) Adjusting content distribution based on user posts
US10380249B2 (en) Predicting future trending topics
US8725717B2 (en) System and method for identifying topics for short text communications
US9858308B2 (en) Real-time content recommendation system
US11675824B2 (en) Method and system for entity extraction and disambiguation
US20170097984A1 (en) Method and system for generating a knowledge representation
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
US9898519B2 (en) Systems and methods of enriching CRM data with social data
US9846751B2 (en) Takepart action platform for websites
US20210256221A1 (en) System and method for automatic summarization of content with event based analysis
US20140289268A1 (en) Systems and methods of rationing data assembly resources
US20160086499A1 (en) Knowledge brokering and knowledge campaigns
US11657228B2 (en) Recording and analyzing user interactions for collaboration and consumption
WO2014107150A1 (en) Inferring facts from online user activity
US20160085850A1 (en) Knowledge brokering and knowledge campaigns
CN115905696A (en) Method, system, electronic device and storage medium for generating HCP image based on big data screening

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication