CN114297341A - Public opinion popularity determination method, device, equipment and storage medium - Google Patents

Publication number
CN114297341A
Authority
CN
China
Prior art keywords
behavior data
behavior
data set
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111494031.3A
Other languages
Chinese (zh)
Other versions
CN114297341B (en)
Inventor
王云云
晁昆
程新洲
关键
张恒
曹丽娟
张帆
张晴晴
成晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd
Priority to CN202111494031.3A
Publication of CN114297341A
Application granted
Publication of CN114297341B

Abstract

The invention discloses a method, a device, equipment and a storage medium for determining public opinion popularity, which are used to improve the accuracy with which public opinion popularity is determined. The method comprises the following steps: acquiring a plurality of behavior data sets for a target event in each preset area, wherein each behavior data set comprises behavior data of a plurality of target search behaviors, and the behavior data of each target search behavior comprises the cell position corresponding to that target search behavior and its search starting time; determining the local heat corresponding to each behavior data set according to the behavior data included in that behavior data set, wherein the local heat is positively correlated with both the cell density and the time density of the behavior data set; and determining the sum of the local heat corresponding to the behavior data sets in each preset area as the heat of the target event.

Description

Public opinion popularity determination method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of communication, in particular to a method, a device, equipment and a storage medium for determining public opinion popularity.
Background
With the development of communication technology and the Internet, public opinion events are increasingly common, and how to quantitatively evaluate the popularity of such events has become a research hotspot.
Existing technical schemes for determining public opinion popularity focus only on quantitative analysis of the text content, event type, publishing channel and volume of media coverage of a target event. They do not reflect the range of influence of the public opinion or the length of time over which it has developed (its fermentation time), so the popularity they determine may not be accurate enough.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for determining public opinion popularity, which are used to improve the accuracy with which public opinion popularity is determined. To achieve this purpose, the invention adopts the following technical schemes:
In a first aspect, a method for determining public opinion popularity is provided, which includes: acquiring a plurality of behavior data sets for a target event in each preset area, wherein each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is larger than a first threshold value, the behavior data of each target search behavior comprises the cell position corresponding to that target search behavior and its search starting time, and the distance between the cell positions included in any two pieces of behavior data in each behavior data set is smaller than a second threshold value; determining the local heat corresponding to each behavior data set according to the behavior data included in that behavior data set, wherein the local heat is positively correlated with both the cell density and the time density of the behavior data set, the cell density reflects how densely the cell positions included in the behavior data set are distributed, and the time density reflects how densely the search starting times included in the behavior data set are distributed; and determining the sum of the local heat corresponding to the behavior data sets in each preset area as the heat of the target event.
The technical scheme provided by the invention at least has the following beneficial effects: because the behavior data comprises the search starting time and the cell position, the server can cluster the behavior data by the places where the target search behaviors occurred, so that the plurality of behavior data sets obtained by clustering reflect how densely the searches for the target event are distributed over the cells in the preset area. Further, the server determines the local heat corresponding to each behavior data set according to the search starting times and the cell positions in that behavior data set, and finally determines the sum of the local heat corresponding to all the behavior data sets as the heat of the target event. Since the local heat corresponding to each behavior data set is positively correlated with both the cell density and the time density of that behavior data set, the denser the cell positions and the denser the search starting times, the greater the determined heat of the target event. Therefore, the technical scheme provided by the invention determines public opinion popularity by combining the temporal and spatial characteristics of the users' target search behaviors for the target event, and can improve the accuracy of determining public opinion popularity to a certain extent.
Optionally, the obtaining of multiple behavior data sets for the target event in each preset region includes: determining behavior data of target search behaviors included in each preset area; determining a plurality of clustering center cells in each preset region according to the cell positions corresponding to the target searching behaviors in each preset region; the distance between any two clustering center cells is larger than a third threshold value; and clustering the behavior data of the target search behavior included in each preset region according to a plurality of clustering center cells in each preset region to obtain a plurality of behavior data sets.
The technical scheme provided by the invention at least has the following beneficial effects: because the distance between any two clustering center cells is larger than the third threshold, the clustering center cells are determined to be relatively independent and dispersed. Therefore, a plurality of behavior data sets obtained based on clustering of a plurality of clustering center cells are more accurate.
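The clustering step described above can be sketched as follows. This is only a minimal illustration: assigning each record to its nearest clustering center cell is an assumption (the text does not fix the assignment rule), and the names `cluster_by_center`, `records` and `centers`, as well as the planar coordinates, are hypothetical.

```python
import math
from collections import defaultdict

def cluster_by_center(records, centers):
    """Group behavior records around clustering center cells.

    records: list of dicts, each with a 'pos' key holding an (x, y) cell position.
    centers: dict mapping a center cell id to its (x, y) position.
    Assumption: each record joins the nearest center cell.
    """
    groups = defaultdict(list)
    for rec in records:
        # Pick the center cell with the smallest Euclidean distance to the record.
        nearest = min(centers, key=lambda cid: math.dist(rec["pos"], centers[cid]))
        groups[nearest].append(rec)
    return dict(groups)

records = [{"pos": (0, 1)}, {"pos": (9, 0)}, {"pos": (1, 1)}]
centers = {"B": (1, 0), "C": (10, 0)}
behavior_sets = cluster_by_center(records, centers)
```

Each value in `behavior_sets` then plays the role of one behavior data set.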
Optionally, the determining the behavior data of the target search behavior included in each preset region includes: acquiring search texts of original search behaviors included in each preset area; determining the correlation degree between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event; and determining the original searching behavior with the correlation degree larger than a first threshold value as a target searching behavior based on the determined correlation degree, and acquiring behavior data of the target searching behavior.
The technical scheme provided by the invention at least has the following beneficial effects: because the description information and the keywords are adopted to determine the correlation degree between the original searching behavior and the target event, the determined target searching behavior can be more accurate, and the accuracy of subsequently acquiring the behavior data of the target searching behavior is ensured.
Optionally, the determining the correlation degree between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event includes: determining a first subject word, a second subject word and an emotional tendency word in the description information, wherein the correlation degree between the first subject word and the target event is greater than that between the second subject word and the target event; and if at least one of the first subject word and the second subject word exists in the search text, determining the correlation degree between the original search behavior and the target event according to a preset rule. The preset rule includes: the correlation degree between the original search behavior and the target event is positively correlated with both the number of subject words and the number of emotional tendency words in the search text.
The technical scheme provided by the invention at least has the following beneficial effects: the description information of the target event is divided into subject words with different correlation degrees and emotional tendency words, and the correlation degree between the original search behavior and the target event is determined according to the number of subject words and emotional tendency words contained in the search text, which provides one implementation for determining the correlation degree.
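One possible preset rule of this kind is sketched below. The weights, the whitespace tokenisation and the function name `relevance` are illustrative assumptions and do not come from the patent text; the sketch only preserves the stated properties that first subject words count for more than second subject words and that the score grows with the number of subject words and emotional tendency words.

```python
def relevance(search_text, first_subject_words, second_subject_words,
              emotion_words, w_first=2.0, w_second=1.0, w_emotion=0.5):
    """Hypothetical correlation degree between a search text and a target event."""
    tokens = search_text.lower().split()
    n_first = sum(tokens.count(w) for w in first_subject_words)
    n_second = sum(tokens.count(w) for w in second_subject_words)
    # Without any subject word the search is treated as unrelated to the event.
    if n_first + n_second == 0:
        return 0.0
    n_emotion = sum(tokens.count(w) for w in emotion_words)
    # The score increases with every subject word and emotional tendency word found.
    return w_first * n_first + w_second * n_second + w_emotion * n_emotion

score = relevance("flood city angry", ["flood"], ["city"], ["angry"])
unrelated = relevance("angry weather", ["flood"], ["city"], ["angry"])
```

An original search behavior would then be kept as a target search behavior only if its score exceeds the first threshold.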
Optionally, the determining, according to the cell position corresponding to the target search behavior included in each preset region, a plurality of clustering center cells in each preset region includes: determining a plurality of candidate clustering center cells in each preset area; each candidate cluster center cell comprises at least one candidate search behavior within the coverage area; the degree of correlation between each candidate search behavior and the target event is greater than a fourth threshold; the fourth threshold is greater than the first threshold; from the plurality of candidate cluster center cells, a plurality of cluster center cells are determined.
The technical scheme provided by the invention at least has the following beneficial effects: the clustering center cell is set to be the cell where the target searching behavior with the correlation degree larger than the fourth threshold value is located, the determined behavior data set can be centered on the target searching behavior with high correlation degree, and then the subsequent cell density and time density can be more accurate, and the accuracy of determining the public opinion heat is improved.
Optionally, the determining a plurality of clustering center cells from the plurality of candidate clustering center cells includes: determining the earliest starting time corresponding to each candidate clustering center cell; the earliest starting time is the earliest starting search time in the starting search times of the candidate search behaviors included in each candidate clustering center cell; sequencing the candidate clustering center cells based on the sequence of the earliest starting time to obtain a sequencing result; and determining the plurality of clustering center cells based on the cell positions of the plurality of candidate clustering center cells and the sequence of each candidate clustering center cell in the sequencing result.
The technical scheme provided by the invention at least has the following beneficial effects: the candidate clustering center cells are sorted by earliest starting time and judged one by one in the sorted order, so that the determined clustering center cells are the cells in which the target search behavior was performed earliest, and the time density of the determined behavior data sets can subsequently be reflected accurately.
Optionally, the number of the clustering center cells is less than or equal to the maximum clustering number, and the maximum clustering number is positively correlated with the area of each preset region.
The technical scheme provided by the invention at least has the following beneficial effects: the corresponding maximum clustering number can be determined according to the area size of the preset area, so that the number of the subsequent behavior data sets of each preset area is related to the area size, and the subsequent local heat summation process can be more accurate.
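The selection of clustering center cells can be sketched as a greedy pass over the candidates in order of earliest starting time. The greedy acceptance test is an assumption about how the sorted candidates are judged one by one; `select_center_cells` and its parameters are hypothetical names, and positions are treated as planar coordinates for simplicity.

```python
import math

def select_center_cells(candidates, third_threshold, max_clusters):
    """candidates: list of (cell_id, (x, y), earliest_start_time) tuples."""
    # Visit candidate cells from the earliest to the latest first search.
    ordered = sorted(candidates, key=lambda c: c[2])
    centers = []
    for cell_id, pos, _ in ordered:
        if len(centers) >= max_clusters:
            break  # respect the maximum clustering number of the preset area
        # Accept only cells farther than the third threshold from every chosen center.
        if all(math.dist(pos, p) > third_threshold for _, p in centers):
            centers.append((cell_id, pos))
    return centers

candidates = [("A", (0, 0), 10), ("B", (1, 0), 5), ("C", (10, 0), 7), ("D", (20, 0), 8)]
two_centers = select_center_cells(candidates, third_threshold=5, max_clusters=2)
all_centers = select_center_cells(candidates, third_threshold=5, max_clusters=10)
```

In this toy input, cell A is never selected because it lies within the third threshold of the earlier cell B.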
Optionally, the determining the local heat corresponding to each behavior data set according to the behavior data included in each behavior data set includes: determining the time density value of each behavior data set according to the starting search time included in each behavior data set; the time density value is inversely related to the time density; determining the cell density value of each behavior data set according to the cell position included in each behavior data set; the cell density value is positively correlated with the cell density; and determining the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
Optionally, the determining the time density value of each behavior data set according to the search starting time included in each behavior data set includes: determining a plurality of duration times corresponding to each behavior data set and an average duration time corresponding to each behavior data set; each duration corresponds to one target searching behavior, and each duration is the time interval between the starting searching time and the current time of one target searching behavior; the average duration is an average of a plurality of durations; for each target search behavior in each behavior data set, determining the square of the difference value between the duration corresponding to each target search behavior and the average duration, and taking the square as the sub-time density value of each target search behavior; and determining the sum of the sub-time density values of the plurality of target search behaviors included in each behavior data set as the time density value of each behavior data set.
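The time density value described above is the sum of squared deviations of the durations from their mean, which can be sketched as follows. The function name and the use of plain numbers for times are assumptions made for illustration.

```python
def time_density_value(start_times, now):
    """Sum of squared deviations of durations (now - start time) from their mean."""
    durations = [now - t for t in start_times]
    mean = sum(durations) / len(durations)
    # Each squared deviation is the sub-time density value of one search behavior.
    return sum((d - mean) ** 2 for d in durations)

spread = time_density_value([0, 2, 4], now=10)
bunched = time_density_value([3, 3, 3], now=10)
```

Searches that start close together in time give a small value, which is why the time density value is inversely related to the time density.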
Optionally, the determining the cell density value of each behavior data set according to the cell position included in each behavior data set includes: determining a cluster center cell of the cell positions of the plurality of target searching behaviors included in each behavior data set, and determining the distance between the cell position of each target searching behavior and the cluster center cell to obtain a target distance corresponding to each target searching behavior; and determining the cell density value of each behavior data set according to the target distance of the target search behaviors included in each behavior data set.
Optionally, the determining, according to the target distances of the target search behaviors included in each behavior data set, the cell density value of each behavior data set includes: determining the ratio of the correlation degree between each target search behavior and the target event to the target distance corresponding to that target search behavior, to obtain the sub-cell density value of each target search behavior; and determining the sum of the sub-cell density values of the plurality of target search behaviors included in each behavior data set as the cell density value of that behavior data set.
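A minimal sketch of the cell density value follows, reading the sub-cell density value as the ratio of a search behavior's correlation degree to its target distance (an interpretation of the passage above). The `min_distance` floor, which avoids division by zero for searches made in the center cell itself, and the function name are assumptions.

```python
import math

def cell_density_value(behaviors, center_pos, min_distance=1.0):
    """behaviors: list of ((x, y), correlation_degree) pairs for one behavior data set."""
    total = 0.0
    for pos, rel in behaviors:
        # Target distance from the search's cell to the cluster center cell,
        # floored at min_distance (an assumption, not from the source).
        d = max(math.dist(pos, center_pos), min_distance)
        total += rel / d  # sub-cell density value of this search behavior
    return total

density = cell_density_value([((0, 0), 2.0), ((3, 4), 5.0)], center_pos=(0, 0))
```

Searches close to the center cell and strongly related to the event contribute the most to the value.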
Optionally, the determining the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set includes: determining the local heat corresponding to each behavior data set according to the target number of the target search behaviors in each behavior data set, the cell density value of each behavior data set and the time density value of each behavior data set; the local heat is positively correlated with the target number.
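The passage above states only the direction of each dependency, not a formula. The sketch below uses one hypothetical combination that respects those directions: the local heat grows with the target number and the cell density value, and shrinks as the time density value grows. The functional form, the smoothing term `eps` and both function names are assumptions.

```python
def local_heat(n_behaviors, cell_density_val, time_density_val, eps=1.0):
    """Hypothetical local heat of one behavior data set."""
    # eps keeps the denominator positive when all searches start at the same time.
    return n_behaviors * cell_density_val / (time_density_val + eps)

def event_heat(local_heats):
    """Heat of the target event: the sum of the local heat of all behavior data sets."""
    return sum(local_heats)

heat_a = local_heat(3, 3.0, 8.0)
total_heat = event_heat([heat_a, 2.5])
```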
In a second aspect, a device for determining public opinion popularity is provided, which includes an obtaining unit and a determining unit; the obtaining unit is used for acquiring a plurality of behavior data sets for a target event in each preset area, wherein each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is larger than a first threshold value, the behavior data of each target search behavior comprises the cell position corresponding to that target search behavior and its search starting time, and the distance between the cell positions included in any two pieces of behavior data in each behavior data set is smaller than a second threshold value; the determining unit is used for determining the local heat corresponding to each behavior data set according to the behavior data included in that behavior data set, wherein the local heat is positively correlated with both the cell density and the time density of the behavior data set, the cell density reflects how densely the cell positions included in the behavior data set are distributed, and the time density reflects how densely the search starting times included in the behavior data set are distributed; and the determining unit is further used for determining the sum of the local heat corresponding to the behavior data sets in each preset area as the heat of the target event.
Optionally, the obtaining unit is specifically configured to: determining behavior data of target search behaviors included in each preset area; determining a plurality of clustering center cells in each preset region according to the cell positions corresponding to the target searching behaviors in each preset region; the distance between any two clustering center cells is larger than a third threshold value; and clustering the behavior data of the target search behavior included in each preset region according to a plurality of clustering center cells in each preset region to obtain a plurality of behavior data sets.
Optionally, the obtaining unit is specifically configured to: acquiring search texts of original search behaviors included in each preset area; determining the correlation degree between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event; and determining the original searching behavior with the correlation degree larger than a first threshold value as a target searching behavior based on the determined correlation degree, and acquiring behavior data of the target searching behavior.
Optionally, the obtaining unit is specifically configured to: determine a first subject word, a second subject word and an emotional tendency word in the description information, wherein the correlation degree between the first subject word and the target event is greater than that between the second subject word and the target event; and if at least one of the first subject word and the second subject word exists in the search text, determine the correlation degree between the original search behavior and the target event according to a preset rule. The preset rule includes: the correlation degree between the original search behavior and the target event is positively correlated with both the number of subject words and the number of emotional tendency words in the search text.
Optionally, the obtaining unit is specifically configured to: determining a plurality of candidate clustering center cells in each preset area; each candidate cluster center cell comprises at least one candidate search behavior within the coverage area; the degree of correlation between each candidate search behavior and the target event is greater than a fourth threshold; the fourth threshold is greater than the first threshold; from the plurality of candidate cluster center cells, a plurality of cluster center cells are determined.
Optionally, the obtaining unit is specifically configured to: determining the earliest starting time corresponding to each candidate clustering center cell; the earliest starting time is the earliest starting search time in the starting search times of the candidate search behaviors included in each candidate clustering center cell; sequencing the candidate clustering center cells based on the sequence of the earliest starting time to obtain a sequencing result; and determining the plurality of clustering center cells based on the cell positions of the plurality of candidate clustering center cells and the sequence of each candidate clustering center cell in the sequencing result.
Optionally, the number of the clustering center cells is less than or equal to the maximum clustering number, and the maximum clustering number is positively correlated with the area of each preset region.
Optionally, the determining unit is specifically configured to: determining the time density value of each behavior data set according to the starting search time included in each behavior data set; the time density value is inversely related to the time density; determining the cell density value of each behavior data set according to the cell position included in each behavior data set; the cell density value is positively correlated with the cell density; and determining the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
Optionally, the determining unit is specifically configured to: determining a plurality of duration times corresponding to each behavior data set and an average duration time corresponding to each behavior data set; each duration corresponds to one target searching behavior, and each duration is the time interval between the starting searching time and the current time of one target searching behavior; the average duration is an average of a plurality of durations; for each target search behavior in each behavior data set, determining the square of the difference value between the duration corresponding to each target search behavior and the average duration, and taking the square as the sub-time density value of each target search behavior; and determining the sum of the sub-time density values of the plurality of target search behaviors included in each behavior data set as the time density value of each behavior data set.
Optionally, the determining unit is specifically configured to: determining a cluster center cell of the cell positions of the plurality of target searching behaviors included in each behavior data set, and determining the distance between the cell position of each target searching behavior and the cluster center cell to obtain a target distance corresponding to each target searching behavior; and determining the cell density value of each behavior data set according to the target distance of the target search behaviors included in each behavior data set.
Optionally, the determining unit is specifically configured to: determine the ratio of the correlation degree between each target search behavior and the target event to the target distance corresponding to that target search behavior, to obtain the sub-cell density value of each target search behavior; and determine the sum of the sub-cell density values of the plurality of target search behaviors included in each behavior data set as the cell density value of that behavior data set.
Optionally, the determining unit is specifically configured to: determine the local heat corresponding to each behavior data set according to the target number of the target search behaviors in each behavior data set, the cell density value of each behavior data set and the time density value of each behavior data set; the local heat is positively correlated with the target number.
In a third aspect, a server is provided, the server comprising a memory and a processor; the memory is coupled to the processor and is used for storing computer program code comprising computer instructions; when the computer instructions are executed by the processor, the server performs the method for determining public opinion popularity provided by the first aspect or any one of its possible implementations.
In a fourth aspect, a computer-readable storage medium is provided, in which instructions are stored, and when the instructions are executed on a server, the server is caused to execute the method for determining public opinion popularity provided by the first aspect or any possible implementation manner thereof.
The invention provides a method, a device, equipment and a storage medium for determining public opinion popularity. Because the behavior data involved in the invention comprises the search starting time and the cell position, the server can cluster the behavior data by the places where the target search behaviors occurred, so that the plurality of behavior data sets obtained by clustering reflect how densely the searches for the target event are distributed over the cells in the preset area. Further, the server determines the local heat corresponding to each behavior data set according to the search starting times and the cell positions in that behavior data set, and finally determines the sum of the local heat corresponding to all the behavior data sets as the heat of the target event. Since the local heat corresponding to each behavior data set is positively correlated with both the cell density and the time density of that behavior data set, the denser the cell positions and the denser the search starting times, the greater the determined heat of the target event. Therefore, the technical scheme provided by the invention determines public opinion popularity by combining the temporal and spatial characteristics of the users' target search behaviors for the target event, and can improve the accuracy of determining public opinion popularity to a certain extent.
Drawings
Fig. 1 is a schematic structural diagram of a system for determining public opinion popularity according to an embodiment of the present invention;
Fig. 2 is a first schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 3 is a second schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 4 is a third schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 5 is a fourth schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 6 is a fifth schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 7 is a sixth schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 8 is a seventh schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 9 is an eighth schematic flowchart of a method for determining public opinion popularity according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for determining public opinion popularity according to an embodiment of the present invention;
Fig. 11 is a first schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 12 is a second schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or descriptions. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
In the description of the present invention, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like do not limit the number or the execution order, nor do they necessarily indicate that the items they describe are different.
The method for determining public opinion popularity provided by the embodiment of the invention is applicable to a system for determining public opinion popularity. Fig. 1 is a schematic structural diagram of such a system. As shown in fig. 1, the system 10 for determining public opinion popularity is used to determine the popularity of a target event and includes a device 11 for determining public opinion popularity (hereinafter referred to as the determining device) and a server 12. The determining device 11 is connected with the server 12, either in a wired manner or in a wireless manner, which is not limited in the embodiment of the present invention.
The server 12 may be located in a machine room on the core network side of the operator, and is used to acquire Internet access data of the user plane of the core network and engineering parameters of the base stations or cells in each area.
The server 12 may be configured to obtain behavior data of the search behaviors that users in each area perform for the target event.
For example, the behavior data may include search text of the search behavior, search start time, cell location corresponding to the search behavior, cell identification, and other information.
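A behavior data record with these fields could be represented as below. This is an illustrative data structure only: the class name, the field names and the sample values are hypothetical, and the cell position is assumed to be a pair of coordinates.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

@dataclass(frozen=True)
class BehaviorRecord:
    """One search behavior as collected by the server (hypothetical field names)."""
    search_text: str                     # text the user searched for
    start_time: datetime                 # search starting time
    cell_id: str                         # cell identification
    cell_position: Tuple[float, float]   # cell position, assumed (longitude, latitude)

record = BehaviorRecord(
    search_text="flood news",
    start_time=datetime(2021, 12, 8, 9, 0),
    cell_id="cell-001",
    cell_position=(116.4, 39.9),
)
```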
Further, the server may further cluster the behavior data in each region based on the obtained behavior data to obtain a plurality of behavior data sets, and further determine the local heat corresponding to each behavior data set according to the cell density and the time density corresponding to each behavior data set. Finally, the server determines the sum of the local heat corresponding to each behavior data set in each area as the heat of the target event.
In some embodiments, the determining device 11 and the server 12 may be independent devices or may be integrated in the same device, which is not particularly limited in the embodiments of the present invention.
When the determining device 11 and the server 12 are integrated in the same device, data is transmitted between them as between internal modules of that device. In this case, the data transmission flow between the two is the same as when the determining device 11 and the server 12 are independent of each other.
In the following embodiments provided by the embodiments of the present invention, the embodiments of the present invention are described by taking an example in which the determination device 11 of public sentiment popularity and the server 12 are set independently of each other.
Fig. 2 is a flowchart illustrating a method for determining popularity in accordance with some exemplary embodiments. In some embodiments, the method for determining public sentiment popularity may be applied to the device for determining public sentiment popularity shown in fig. 1, or may be applied to a server or other similar devices including the device for determining public sentiment popularity. Hereinafter, the present invention will be described with reference to the following embodiments, which take an example of a method for determining popularity applied to a server.
Referring to fig. 2, the method for determining popularity according to the embodiment of the present invention includes the following steps S201 to S203.
S201, a server acquires a plurality of behavior data sets aiming at a target event in each preset area;
wherein each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is greater than a first threshold, and the behavior data of each target search behavior comprises the cell position corresponding to that target search behavior and its search starting time; the distance between the cell positions included in any two pieces of behavior data in each behavior data set is smaller than a second threshold;
as a possible implementation manner, the server may determine behavior data of the target search behavior included in each preset region, and determine a plurality of clustering center cells in each preset region according to a cell position corresponding to the target search behavior included in each preset region.
Further, the server may cluster the behavior data of the target search behavior included in each preset region according to a plurality of clustering center cells in each preset region to obtain a plurality of behavior data sets.
The clustering operation may be based on the location of the cell, or may be based on the location of the user equipment when the user performs the target search behavior through a search engine of the user equipment.
The preset areas may be any administrative areas, such as a province or a city, and the present application is not limited thereto. The first threshold value and the second threshold value may be set in the server by the operation and maintenance personnel in advance.
The target event may also be referred to as a public sentiment event or a public sentiment in different scenes.
It can be understood that a degree of correlation between the target search behavior and the target event greater than the first threshold indicates that the search text entered by the user when performing the target search behavior relates to the target event.
In some embodiments, the server may obtain the search text of the target search behavior from a Uniform Resource Identifier (URI) of a search request initiated by the user through the user equipment.
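As a minimal sketch of reading the search text out of a request URI, the following Python function uses the standard library only. The query-parameter names in SEARCH_PARAMS and the example host are assumptions made for illustration; the embodiment does not specify which parameter carries the search text.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical query-parameter names that may carry the search text.
# Real search engines differ; this tuple is an assumption for illustration.
SEARCH_PARAMS = ("q", "wd", "query")


def extract_search_text(uri):
    """Return the decoded search text from a request URI, or None if absent."""
    query = urlsplit(uri).query
    params = parse_qs(query)  # percent-decodes values and maps '+' to a space
    for name in SEARCH_PARAMS:
        values = params.get(name)
        if values:
            return values[0]
    return None
```

A URI without any of the assumed parameters yields None, so such records can be skipped before the correlation degree is evaluated.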
Illustratively, taking any preset area in the national range as Beijing city and taking the target event as 'electronic driving license enforcement' as an example, the server acquires behavior data of all target search behaviors with the correlation degree between the search behavior and the 'electronic driving license enforcement' being greater than a first threshold value in the Beijing city within a preset time period, and clusters the acquired behavior data to obtain a plurality of behavior data sets.
The specific implementation manner of this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
S202, the server determines the local heat corresponding to each behavior data set according to the behavior data included in each behavior data set;
the local heat corresponding to each behavior data set is positively correlated with the cell density and the time density of each behavior data set respectively, the cell density is used for reflecting the density of the cell positions included in each behavior data set, and the time density is used for reflecting the density of the search starting time included in each behavior data set;
as a possible implementation manner, the server may determine, according to the search starting time included in each behavior data set, a time density value that is inversely related to the time density of each behavior data set; meanwhile, the server also determines a cell density value of each behavior data set, which is inversely related to the cell density, according to the cell position included in each behavior data set.
Further, the server may determine the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
It can be understood that, in the area corresponding to each behavior data set, the denser the cells where the target search behavior occurs, the higher the local heat corresponding to the behavior data set is; in the area corresponding to each behavior data set, the more concentrated the search starting time of the target search behavior occurs, the higher the local heat corresponding to the behavior data set.
The specific implementation manner of this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
S203, the server determines the sum of the local heat corresponding to each behavior data set in each preset area as the heat of the target event.
As a possible implementation manner, the server determines the sum of the local heat corresponding to each behavior data set in each preset area as the heat of the target event.
For example, taking the case where the preset regions are the provinces of the country, after the server calculates the local heat corresponding to each behavior data set in each province, the server sums the local heat corresponding to all the behavior data sets in all the provinces and cities to obtain the nationwide heat of the target event.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the behavior data in the embodiment of the invention comprises the search starting time and the cell position, so the server can cluster the behavior data by the place where the target search behavior occurred, based on the cell position. The behavior data sets obtained by clustering can therefore reflect how densely the cells in which the target event is searched are distributed in the preset area. Further, the server determines the local heat corresponding to each behavior data set according to the search starting time and the cell position in that set, and finally determines the sum of the local heat corresponding to all the behavior data sets as the heat of the target event. Since the local heat corresponding to each behavior data set is positively correlated with the cell density and the time density of that set, the denser the cell positions and the more concentrated the search starting times, the greater the determined heat of the target event. Therefore, the technical solution provided by the embodiment of the present invention determines public opinion popularity by combining the temporal and spatial characteristics of the target search behavior of users for the target event, and can improve the accuracy of the determination to a certain extent.
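The summation in S203 can be sketched in Python as follows. The dictionary layout (region identifier mapped to a list of local heat values, one per behavior data set) is an assumption made for illustration.

```python
def event_heat(local_heat_by_region):
    """Sum the local heat of every behavior data set in every preset region."""
    return sum(
        heat
        for set_heats in local_heat_by_region.values()
        for heat in set_heats
    )
```

For instance, two regions with local heat values [3.0, 1.5] and [2.0] give an event heat of 6.5.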
In one design, in order to obtain multiple behavior data sets for a target event in each preset area, as shown in fig. 3, S201 provided in the embodiment of the present invention specifically includes the following S2011-S2013.
S2011, the server determines behavior data of target searching behaviors included in each preset area;
as a possible implementation manner, the server acquires search texts of original search behaviors included in each preset area, and determines the correlation degree between the original search behaviors and a target event according to the acquired search texts;
the specific implementation manner of determining the correlation degree in this step may refer to the subsequent description of the embodiment of the present invention, and is not described herein again.
Further, the server determines the original search behavior with the correlation degree larger than the first threshold as the target search behavior based on the determined correlation degree, and acquires behavior data of the target search behavior.
It should be noted that the server obtains the behavior data of the target search behavior, and may associate the core network user plane internet data corresponding to the target search behavior with the engineering parameters of the cell to obtain the behavior data of each target search behavior.
In some embodiments, the core network user plane internet data corresponding to the target search behavior may include a user identifier, a cell identifier, a search start time, a URI, and an identifier of a preset area. The engineering parameters of the cell may include a cell identifier, a base station identifier, an identifier of a preset area, and a cell location.
Illustratively, the cell location can be the longitude and latitude of the base station.
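The association between user plane records and cell engineering parameters can be pictured as a join on the cell identifier. The sketch below is illustrative only; the field names (cell_id, start_time, lat, lon, and so on) are hypothetical and not taken from the embodiment.

```python
def build_behavior_data(user_plane_records, cell_parameters):
    """Attach each record's cell location by joining on the cell identifier."""
    cell_by_id = {cell["cell_id"]: cell for cell in cell_parameters}
    behavior_data = []
    for record in user_plane_records:
        cell = cell_by_id.get(record["cell_id"])
        if cell is None:
            continue  # no engineering parameters known for this cell; skip it
        behavior_data.append({
            "user_id": record["user_id"],
            "cell_id": record["cell_id"],
            "start_time": record["start_time"],
            "uri": record["uri"],
            "region_id": record["region_id"],
            # Cell location taken as the latitude and longitude of the base station.
            "cell_location": (cell["lat"], cell["lon"]),
        })
    return behavior_data
```

Records whose cell identifier has no matching engineering parameters are dropped here; whether to drop or keep them is a design choice the embodiment leaves open.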
The specific implementation manner of this step may refer to the following description of the embodiment of the present invention, and is not described herein again.
S2012, the server determines a plurality of clustering center cells in each preset area according to the cell positions corresponding to the target searching behaviors included in each preset area;
the distance between any two clustering center cells is larger than a third threshold value;
as a possible implementation manner, the server selects a plurality of clustering center cells from the cells of all target search behaviors in the preset area according to a preset selection rule.
It can be understood that, because the distance between any two clustering center cells is greater than the third threshold, the behavior data sets obtained by subsequent clustering are more accurate.
Illustratively, the third threshold may be 200 km.
The specific implementation manner of this step may refer to the following description of the embodiment of the present invention, and is not described herein again.
S2013, the server clusters the behavior data of the target search behavior included in each preset area according to the clustering center cells in each preset area to obtain a plurality of behavior data sets.
As a possible implementation manner, after determining a plurality of clustering center cells, the server clusters the behavior data of all target search behaviors in a preset area to obtain a plurality of behavior data sets.
In one embodiment, the clustering action may be performed using a k-means clustering algorithm, in which the plurality of cluster center cells may be initial cluster centers in the clustering algorithm.
In this case, the second threshold may be set in advance in the k-means clustering algorithm.
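A minimal k-means sketch that starts from given clustering center cells is shown below. It assumes cell positions have already been projected to planar coordinates so that Euclidean distance is meaningful, and it runs a fixed maximum number of iterations; both are simplifications for illustration. The second threshold is not enforced in this sketch.

```python
import math


def kmeans_from_centers(points, initial_centers, iterations=10):
    """Cluster points with k-means, seeded with the given initial centers."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[k]
            for k, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return clusters, centers
```

Seeding the algorithm with well-separated center cells, rather than random points, makes the result deterministic for a given input.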
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: because the distance between any two clustering center cells is larger than the third threshold, the clustering center cells are determined to be relatively independent and dispersed. Therefore, a plurality of behavior data sets obtained based on clustering of a plurality of clustering center cells are more accurate.
In one design, in order to determine behavior data of target search behaviors included in each preset region, as shown in fig. 4, S2011 provided in the embodiment of the present invention specifically includes following S301 to S304.
S301, the server acquires search texts of original search behaviors included in each preset area;
as a possible implementation manner, the server may associate internet data of the core network user plane within a preset time period in a preset area with an engineering parameter of a cell to determine original search behaviors included in the preset area, and obtain a search text of each original search behavior.
The search text may be located in the URI of the internet data of the core network user plane.
It should be noted that the original search behavior may be any search behavior within a preset time period in a preset area.
S302, the server determines the correlation between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event;
as a possible implementation manner, the server determines the description information of the target event, simultaneously acquires the keywords in the search text, and determines the correlation degree between the original search behavior and the target event based on the determined description information and the keywords.
For example, taking the target event as "electronic driving license enforcement", the description information of the target event may include "electronic driver's license", "driver's license reform", "driver's license convenience" or "driver's license carelessness", and the search text of a user performing the original search behavior may include keywords such as "drive without driver's license", "driver's license damage", "driver's license subsidizing", and the like.
In some embodiments, the server may determine a correlation between the keywords in the search text and the description information based on a preset text processing model, and take this correlation as the correlation between the original search behavior and the target event.
It should be noted that, in the embodiment of the present invention, the text processing model is not limited, and for specific use of the text processing model, reference may be made to the prior art, which is not described herein again.
For another possible implementation manner in this step, reference may be made to the subsequent description of the embodiment of the present invention, and details are not described here.
S303, the server determines the original searching behavior with the correlation degree larger than a first threshold value as a target searching behavior based on the determined correlation degree;
as a possible implementation manner, the server determines whether each original search behavior is a target search behavior based on the determined correlation and the first threshold.
S304, the server acquires behavior data of the target searching behavior.
As a possible implementation manner, after determining the target search behavior, the server determines a cell location of the target search behavior based on the cell identifier of the target search behavior, and determines a search start time of the target search behavior based on the core network user plane data of the target search behavior.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: because the description information and the keywords are adopted to determine the correlation degree between the original searching behavior and the target event, the determined target searching behavior can be more accurate, and the accuracy of subsequently acquiring the behavior data of the target searching behavior is ensured.
In one design, in order to determine the correlation between the original search behavior and the target event based on the description information and the keyword, as shown in fig. 5, S302 provided in the embodiment of the present invention may specifically include S3021 to S3023 described below.
S3021, the server determines a first subject word, a second subject word and an emotional tendency word in the description information;
the relevancy of the first subject word and the target event is greater than the relevancy of the second subject word and the target event;
as a possible implementation manner, the server may obtain the first subject word, the second subject word, and the emotional tendency word in the description information from a preset database.
It should be noted that the first subject word, the second subject word, and the emotional tendency word in the preset database may be preset in the server by the operation and maintenance staff.
As another possible implementation manner, the server may divide the description information into the first subject word, the second subject word, and the emotional tendency word based on a pre-trained semantic analysis model.
For example, taking the target event "electronic driving license enforcement" as an example, the first subject word may be "electronic driving license", the second subject word may be "driving license", "driving license reissue", and the like, and the emotional tendency word may be "driving convenience", "simple procedure", and the like.
S3022, the server judges whether at least one of the first subject word and the second subject word exists in the search text of the original search behavior.
As a possible implementation manner, the server performs semantic analysis on the search text, and determines whether the search text contains at least one of the first subject word and the second subject word.
It is to be understood that, in the case that the search text includes at least one of these subject words, the server may determine that the original search behavior corresponding to the search text is executed for the target event.
S3023, if at least one of the first subject word and the second subject word exists in the search text, the server determines the correlation between the original search behavior and the target event according to a preset rule;
wherein the preset rule comprises: the correlation degree between the original search behavior and the target event is positively correlated with the number of subject words and with the number of emotional tendency words in the search text, respectively.
As a possible implementation manner, the server determines the number of first subject words, second subject words and emotional tendency words in the search text, and determines the correlation degree between the original search behavior and the target event based on the determined numbers.
Illustratively, table 1 shows an illustration of the preset rule described above.
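TABLE 1 Preset rules
Content of the search text | Correlation degree between the original search behavior and the target event
At least one second subject word | Greater than the first threshold and smaller than the fifth threshold
At least one second subject word and at least one emotional tendency word | Greater than the fifth threshold and smaller than the fourth threshold
At least one first subject word | Greater than the fourth threshold and smaller than the sixth threshold
At least one first subject word and at least one emotional tendency word | Greater than the sixth threshold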
As shown in table 1, the fifth threshold value is greater than the first threshold value and less than the fourth threshold value in the correlation, and the fourth threshold value is less than the sixth threshold value. The higher the degree of correlation, the more relevant the original search behavior is to the target event.
If the search text comprises at least one second subject word, the server determines that the correlation degree between the original search behavior and the target event is greater than the first threshold and smaller than the fifth threshold.
If the search text comprises at least one second subject word and at least one emotional tendency word, the server determines that the correlation degree between the original search behavior and the target event is greater than the fifth threshold and smaller than the fourth threshold.
If the search text comprises at least one first subject word, the server determines that the correlation degree between the original search behavior and the target event is greater than the fourth threshold and smaller than the sixth threshold.
If the search text comprises at least one first subject word and at least one emotional tendency word, the server determines that the correlation degree between the original search behavior and the target event is greater than the sixth threshold.
In some embodiments, the correlation degrees may be further differentiated by a correlation level, for example, a first threshold corresponds to a first-level correlation, a fifth threshold corresponds to a second-level correlation, a fourth threshold corresponds to a third-level correlation, and a sixth threshold corresponds to a fourth-level correlation.
In other embodiments, the server may determine the correlation degree between the original search behavior and the target event according to a preset rule in order from the highest level to the lowest level.
For example, for any original search behavior, if the server determines that the original search behavior has a fourth-level correlation with the target event, the server moves on to determine the degree of correlation between the next original search behavior and the target event. Conversely, if the server determines that the original search behavior does not have a fourth-level correlation with the target event, it further determines whether the original search behavior has a third-level correlation with the target event, and so on down to the first level. If the original search behavior does not have even a first-level correlation, the server determines that the original search behavior is not related to the target event and discards the behavior data of the original search behavior.
It can be understood that the server sequentially judges the correlation degree according to the correlation degree from the highest level to the lowest level, and after the correlation degree is determined to be the high level, the subsequent determination on the original search behavior is not performed any more, so that the corresponding computing resources can be saved.
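The top-down level check can be sketched as a short Python function. Plain substring matching stands in for the semantic analysis mentioned in the embodiment, which is an assumption made for illustration; the returned integers 4 to 1 correspond to the fourth-level to first-level correlations, and 0 means the behavior is treated as unrelated.

```python
def correlation_level(search_text, first_words, second_words, sentiment_words):
    """Return the correlation level (4 highest, 0 unrelated) of a search text."""
    has_first = any(word in search_text for word in first_words)
    has_second = any(word in search_text for word in second_words)
    has_sentiment = any(word in search_text for word in sentiment_words)
    # Check from the highest level down and stop at the first match, so
    # lower-level rules are never evaluated once a higher level applies.
    if has_first and has_sentiment:
        return 4
    if has_first:
        return 3
    if has_second and has_sentiment:
        return 2
    if has_second:
        return 1
    return 0
```

Because a first subject word such as "electronic driving license" may itself contain a second subject word, checking the higher levels first also prevents a text from being assigned the lower level by accident.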
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the description information of the target event can be divided into subject words with different degrees of correlation and emotional tendency words, and the degree of correlation between the original search behavior and the target event is determined according to the number of subject words and emotional tendency words contained in the search text, which provides one implementation for determining the degree of correlation.
In one design, in order to determine multiple clustering center cells from the cells corresponding to the multiple target search behaviors, as shown in fig. 6, S2012 provided in the embodiment of the present invention may specifically include the following S401 to S402.
S401, a server determines a plurality of candidate clustering center cells in each preset area;
wherein the coverage area of each candidate clustering center cell comprises at least one candidate searching behavior; the degree of correlation between each candidate search behavior and the target event is greater than a fourth threshold; the fourth threshold is greater than the first threshold;
as a possible implementation manner, the server determines a plurality of candidate search behaviors from the plurality of target search behaviors in the preset area according to the correlation between the target search behavior and the target event and a fourth threshold, and determines a plurality of candidate clustering center cells from the cells corresponding to the plurality of candidate search behaviors.
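A sketch of S401 in Python follows: target search behaviors whose correlation degree exceeds the fourth threshold are kept as candidate search behaviors, and their cells become candidate clustering center cells. The record fields (correlation, cell_id) are hypothetical names used for illustration.

```python
def candidate_center_cells(target_behaviors, fourth_threshold):
    """Return cell identifiers that cover at least one candidate search behavior."""
    cells = []
    seen = set()
    for behavior in target_behaviors:
        if behavior["correlation"] > fourth_threshold and behavior["cell_id"] not in seen:
            seen.add(behavior["cell_id"])
            cells.append(behavior["cell_id"])  # keep first-seen order, no duplicates
    return cells
```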
S402, the server determines a plurality of clustering center cells from the candidate clustering center cells.
As a possible implementation manner, for each candidate clustering center cell, the server determines a distance between the candidate clustering center cell and another candidate clustering center cell in the plurality of candidate clustering center cells, and determines that the candidate clustering center cell is a clustering center cell when the distance is greater than a third threshold. Conversely, if the distance is less than or equal to the third threshold, the server does not take the candidate as a clustering center cell and goes on to determine whether the next candidate clustering center cell belongs to the clustering center cells, until a plurality of clustering center cells are determined.
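One way to read S402 is as a greedy selection: a candidate is accepted only if it lies farther than the third threshold from every center accepted so far. Comparing against already accepted centers is an assumption of this sketch, as is the use of planar coordinates in kilometres.

```python
import math


def select_centers(candidate_positions, third_threshold):
    """Greedily keep candidates farther than the threshold from all chosen centers."""
    centers = []
    for candidate in candidate_positions:
        if all(math.dist(candidate, center) > third_threshold for center in centers):
            centers.append(candidate)
    return centers
```

With a threshold of 200, candidates at (0, 0), (50, 0) and (300, 0) yield the centers (0, 0) and (300, 0); the second candidate is rejected because it is only 50 away from the first.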
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the clustering center cell is set to be the cell where the target searching behavior with the correlation degree larger than the fourth threshold value is located, the determined behavior data set can be centered on the target searching behavior with high correlation degree, and then the subsequent cell density and time density can be more accurate, and the accuracy of determining the public opinion heat is improved.
In one design, in order to determine multiple clustering center cells from multiple candidate clustering center cells, as shown in fig. 7, S402 provided in the embodiment of the present invention specifically includes the following S4021 to S4023.
S4021, the server determines the earliest starting time corresponding to each candidate clustering center cell;
the earliest starting time is the earliest starting search time in the starting search times of the candidate search behaviors included in each candidate clustering center cell;
S4022, the server sorts the candidate clustering center cells based on the sequence of the earliest starting time to obtain a sorting result;
Illustratively, 20 candidate clustering center cells are included in the preset area, and the server may rank the 20 candidate clustering center cells based on the order of the earliest starting time of each candidate clustering center cell to obtain a ranking result.
S4023, the server determines a plurality of clustering center cells based on the cell positions of the plurality of candidate clustering center cells and the sequence of each candidate clustering center cell in the sequencing result.
As a possible implementation manner, in the process of judging whether each candidate clustering center cell is a clustering center cell, the server sequentially judges whether each candidate clustering center cell is a clustering center cell according to the sequence of the earliest starting time in the sequencing result until determining a plurality of clustering center cells.
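S4021 and S4022 can be sketched as follows: the earliest search starting time is computed per candidate cell, and the cells are then sorted by that time. The record fields (cell_id, start_time) are hypothetical, and start times are assumed to be comparable values such as epoch seconds.

```python
def rank_candidates_by_earliest_start(candidate_behaviors):
    """Return candidate cell identifiers ordered by their earliest start time."""
    earliest = {}
    for behavior in candidate_behaviors:
        cell = behavior["cell_id"]
        start = behavior["start_time"]
        if cell not in earliest or start < earliest[cell]:
            earliest[cell] = start
    # Sorting the dictionary keys by their recorded earliest start time.
    return sorted(earliest, key=earliest.get)
```

The resulting order can then be fed to the distance check of S4023, so that cells where the target search behavior began earliest are considered first.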
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the candidate clustering center cells are sorted by earliest starting time and judged in the order of the sorting result, so that the determined clustering center cells are the cells in which the target search behavior was performed earliest, and the time density of the determined behavior data sets can subsequently be reflected accurately.
Because the areas of the preset regions are different in size, the number of the clustering center points possibly determined by different preset regions is different, and further, the local heat standards of different preset regions are different. Therefore, in one design, in order to improve the accuracy of the behavior data sets, in the method for determining public opinion popularity provided by the embodiment of the present invention, the number of the clustering center cells is less than or equal to the maximum clustering number;
wherein the maximum clustering number is positively correlated with the area of each preset region.
As a possible implementation manner, the server may determine the maximum number of clusters in the preset region according to the area of the preset region and a preset formula:
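[Formula provided as an image in the original publication: the maximum clustering number N expressed in terms of the area S and the first coefficient a.]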
wherein, N is the maximum clustering number, S is the area of the preset region, and a is the first coefficient.
Illustratively, a may be 200.
In this case, in S402 and S4023, if the number of clustering center cells determined so far is smaller than the maximum clustering number, the server takes no additional action and continues the determination; if the number of clustering center cells is equal to the maximum clustering number, the server ends the process of determining clustering center cells.
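The stopping condition can be sketched as a cap on a greedy selection loop. In this illustration the maximum clustering number is passed in as a precomputed value, because the formula that derives it from the area is not reproduced in the text; the is_acceptable callback is a hypothetical stand-in for the distance check against the third threshold.

```python
def select_with_cap(ranked_candidates, is_acceptable, max_clusters):
    """Accept candidates in order until the maximum clustering number is reached."""
    centers = []
    for candidate in ranked_candidates:
        if len(centers) >= max_clusters:
            break  # cap reached: end the process of determining center cells
        if is_acceptable(candidate, centers):
            centers.append(candidate)
    return centers
```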
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the corresponding maximum clustering number can be determined according to the area size of the preset area, so that the number of the subsequent behavior data sets of each preset area is related to the area size, and the subsequent local heat summation process can be more accurate.
In one design, in order to determine the local heat corresponding to each behavior data set, as shown in fig. 8, S202 provided in the embodiment of the present invention may specifically include the following S2021 to S2023.
S2021, the server determines a time density value of each behavior data set according to the starting search time included in each behavior data set;
wherein the time density value is inversely related to the time density;
as a possible implementation manner, the server may determine a plurality of durations corresponding to each behavior data set and an average duration corresponding to each behavior data set;
each duration corresponds to one target searching behavior, and each duration is a time interval between the starting searching time and the current time of one target searching behavior; the average duration is an average of a plurality of durations;
further, for each target search behavior in each behavior data set, the server determines the square of the difference between the duration corresponding to each target search behavior and the average duration, and the square is the sub-time density value of each target search behavior;
finally, the server determines the sum of the sub-time density values of the plurality of target search behaviors included in each behavior data set as the time density value of each behavior data set.
In some embodiments, the temporal density value of each behavioral data set satisfies the following formula:
$$T_i = \sum_{j=1}^{N_i} \left(t_{i,j} - \bar{t}_i\right)^2$$
wherein $T_i$ is the time density value of the i-th behavior data set in the preset area, j is any target search behavior in the i-th behavior data set, $N_i$ is the number of target search behaviors included in the i-th behavior data set, $t_{i,j}$ is the duration of the j-th target search behavior in the i-th behavior data set, and $\bar{t}_i$ is the average duration of the i-th behavior data set.
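Following the description in S2021, the time density value is the sum of squared differences between each duration and the average duration. A minimal Python sketch is given below; it assumes start times and the current time are numeric values in the same unit (for example epoch seconds) and that the behavior data set is not empty.

```python
def time_density_value(start_times, now):
    """Sum of squared deviations of the durations from their mean."""
    durations = [now - start for start in start_times]
    mean_duration = sum(durations) / len(durations)
    return sum((d - mean_duration) ** 2 for d in durations)
```

Start times of 100, 110 and 120 with a current time of 200 give durations of 100, 90 and 80, an average of 90, and a time density value of 200. Identical start times give 0, which reflects that a smaller value means more concentrated search times.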
S2022, the server determines the cell density value of each behavior data set according to the cell position included in each behavior data set;
wherein the cell density value is positively correlated with the cell density;
as a possible implementation manner, the server may determine a cluster center cell of the cell positions of the plurality of target search behaviors included in each behavior data set, and determine a distance between the cell position of each target search behavior and the cluster center cell to obtain a target distance corresponding to each target search behavior;
further, the server determines the cell density value of each behavior data set according to the target distance of the target search behaviors included in each behavior data set.
In some embodiments, the cell density value for each behavior data set satisfies the following formula:
Figure BDA0003399474600000191
where $M_i$ is the cell density value of the i-th behavior data set in the preset area, j is any target search behavior in the i-th behavior data set, $N_i$ is the number of target search behaviors included in the i-th behavior data set, and $D_{(j,C_i)}$ is the target distance corresponding to the j-th target search behavior.
S2023, the server determines the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
As a possible implementation manner, the local heat corresponding to each behavior data set satisfies the following formula:
Figure BDA0003399474600000192
where $P_i$ is the local heat of the i-th behavior data set in the preset area, $T_i$ is the time density value of the i-th behavior data set in the preset area, and $M_i$ is the cell density value of the i-th behavior data set in the preset area.
In one design, as shown in fig. 9, an embodiment of the present invention further provides another specific implementation manner in S2022, where the specific implementation manner of determining the cell density value for each behavior data set according to the target distances of the target search behaviors included in each behavior data set includes the following steps S501 to S502:
S501, the server determines the ratio of the correlation degree between each target search behavior and the target event to the target distance corresponding to that target search behavior, to obtain the sub-cell density value of each target search behavior;
S502, the server determines the sum of the sub-cell density values of the target search behaviors included in each behavior data set as the cell density value of that behavior data set.
As a possible implementation, the cell density value of each behavior data set further satisfies the following formula:
Figure BDA0003399474600000201
where $M_i$ is the cell density value of the i-th behavior data set in the preset area, j is any target search behavior in the i-th behavior data set, $N_i$ is the number of target search behaviors included in the i-th behavior data set, $D_{(j,C_i)}$ is the target distance corresponding to the j-th target search behavior, $Q_{i,j}$ is the correlation degree between the j-th target search behavior in the i-th behavior data set and the target event, and b is a second coefficient.
For example, the second coefficient b may be 100.
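Steps S501 to S502 can be sketched as below. This is an illustration under stated assumptions only: cell positions are taken to be planar (x, y) coordinates, distances are Euclidean, and the `min_distance` floor is a hypothetical guard against division by zero for behaviors located in the cluster center cell itself. The second coefficient b is deliberately left out, because its position in the formula cannot be recovered from the text.

```python
import math


def cell_density_value(positions, center, relevances, min_distance=1.0):
    """Sum over behaviors of relevance / distance-to-cluster-center (S501-S502).

    positions:    (x, y) cell position of each target search behavior (assumed planar).
    center:       (x, y) position of the cluster center cell.
    relevances:   correlation degree of each behavior with the target event.
    min_distance: hypothetical floor so that a zero distance does not divide by zero.
    """
    total = 0.0
    for (x, y), q in zip(positions, relevances):
        d = math.hypot(x - center[0], y - center[1])  # target distance
        total += q / max(d, min_distance)             # sub-cell density value
    return total


print(cell_density_value([(3, 4), (0, 10)], (0, 0), [0.5, 1.0]))
```

Behaviors that are closer to the cluster center cell, or more relevant to the target event, contribute more, so the value rises with the cell density as the description requires.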
In one design, in order to determine the local heat corresponding to each behavior data set more accurately, S2023 provided in the embodiment of the present invention may specifically include the following steps:
the server determines the local heat corresponding to each behavior data set according to the target number of the target search behaviors in each behavior data set, the cell density value of each behavior data set and the time density value of each behavior data set;
wherein the local heat is positively correlated with the target number.
As a possible implementation manner, in this case, the local heat corresponding to each behavior data set satisfies the following formula:
Figure BDA0003399474600000202
where $P_i$ is the local heat of the i-th behavior data set in the preset area, $N_i$ is the number of target search behaviors included in the i-th behavior data set, j is any target search behavior in the i-th behavior data set, $t_{i,j}$ is the duration of the j-th target search behavior in the i-th behavior data set, $\bar{t}$ is the average duration of the i-th behavior data set, $D_{(j,C_i)}$ is the target distance corresponding to the j-th target search behavior, $Q_{i,j}$ is the correlation degree between the j-th target search behavior in the i-th behavior data set and the target event, and b is the second coefficient.
It is understood that, in this case, in S203 provided by the embodiment of the present invention, the heat of the target event may satisfy the following formula:
Figure BDA0003399474600000203
where R is the heat of the target event, k is the number of behavior data sets in all the preset areas, i denotes the i-th behavior data set in all the preset areas, $N_i$ is the number of target search behaviors included in the i-th behavior data set, j is any target search behavior in the i-th behavior data set, $t_{i,j}$ is the duration of the j-th target search behavior in the i-th behavior data set, $\bar{t}$ is the average duration of the i-th behavior data set, $D_{(j,C_i)}$ is the target distance corresponding to the j-th target search behavior, $Q_{i,j}$ is the correlation degree between the j-th target search behavior in the i-th behavior data set and the target event, and b is the second coefficient.
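The aggregation in S203 can be sketched as follows. Only the final summation over behavior data sets is taken from the description. The body of `local_heat` is an assumed stand-in, chosen solely because it rises with the target number and the cell density value and falls with the time density value; it is not the formula of this embodiment. The `eps` term is a hypothetical guard against a zero time density value.

```python
def local_heat(n_behaviors, cell_density, time_density, eps=1.0):
    """ASSUMED illustrative form, not the formula of the embodiment.

    Increases with the number of target search behaviors and with the cell
    density value; decreases as the time density value grows.
    """
    return n_behaviors * cell_density / (time_density + eps)


def event_heat(data_sets):
    """Heat of the target event: sum of local heats over all behavior data sets.

    data_sets: iterable of (n_behaviors, cell_density, time_density) tuples,
               one per behavior data set across all preset areas.
    """
    return sum(local_heat(n, m, t) for n, m, t in data_sets)


print(event_heat([(2, 3.0, 1.0), (1, 4.0, 3.0)]))
```

Keeping the local heat in its own function means the assumed expression can be swapped for the actual one without touching the summation.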
Fig. 10 is a schematic structural diagram of an apparatus for determining public sentiment popularity according to an embodiment of the present invention. As shown in fig. 10, the device 60 for determining popularity may be deployed in the server for executing the method for determining popularity.
As shown in fig. 10, the apparatus 60 for determining popularity includes an obtaining unit 601 and a determining unit 602;
an obtaining unit 601, configured to obtain multiple behavior data sets for a target event in each preset region; each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is larger than a first threshold value, and the behavior data of each target search behavior comprises a cell position corresponding to each target search behavior and a search starting time of each target search behavior; the distance between the cell positions included by any two behavior data in each behavior data set is smaller than a second threshold value;
a determining unit 602, configured to determine, according to the behavior data included in each behavior data set, a local heat corresponding to each behavior data set; the local heat corresponding to each behavior data set is positively correlated with the cell density and the time density of each behavior data set respectively, the cell density is used for reflecting the density of the cell positions included in each behavior data set, and the time density is used for reflecting the density of the search starting time included in each behavior data set; the determining unit 602 is further configured to determine, as the heat of the target event, a sum of local heats corresponding to each behavior data set in each preset region.
Optionally, as shown in fig. 10, the obtaining unit 601 provided in the embodiment of the present invention is specifically configured to: determining behavior data of target search behaviors included in each preset area; determining a plurality of clustering center cells in each preset region according to the cell positions corresponding to the target searching behaviors in each preset region; the distance between any two clustering center cells is larger than a third threshold value; and clustering the behavior data of the target search behavior included in each preset region according to a plurality of clustering center cells in each preset region to obtain a plurality of behavior data sets.
Optionally, as shown in fig. 10, the obtaining unit 601 provided in the embodiment of the present invention is specifically configured to: acquiring search texts of original search behaviors included in each preset area; determining the correlation degree between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event; and determining the original searching behavior with the correlation degree larger than a first threshold value as a target searching behavior based on the determined correlation degree, and acquiring behavior data of the target searching behavior.
Optionally, as shown in fig. 10, the obtaining unit 601 provided in the embodiment of the present invention is specifically configured to:
determining a first subject word, a second subject word and an emotional tendency word in the description information; the relevancy of the first subject word and the target event is greater than the relevancy of the second subject word and the target event;
if at least one of the first subject word and the second subject word exists in the search text, determining the correlation degree between the original search behavior and the target event according to a preset rule; the preset rules include: the correlation degree between the original search behavior and the target event is positively correlated with the number of the at least one subject word and the number of the emotional tendency words in the search text respectively.
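One way to realize such a preset rule is a weighted count, sketched below. The function name `relevance`, the three weights, and the whitespace tokenization are all assumptions made for illustration. The description only requires that the correlation degree grow with the number of subject words and emotional tendency words found in the search text; Chinese search text would need a word segmenter in place of `split()`.

```python
def relevance(search_text, first_subject_words, second_subject_words,
              sentiment_words, w_first=2.0, w_second=1.0, w_sentiment=0.5):
    """Count-based correlation degree between a search text and a target event.

    The weights are hypothetical; w_first > w_second mirrors the statement that
    the first subject word is more relevant to the event than the second.
    Word lists are expected in lower case.
    """
    tokens = search_text.lower().split()  # assumed tokenization
    n_first = sum(tokens.count(w) for w in first_subject_words)
    n_second = sum(tokens.count(w) for w in second_subject_words)
    # The rule only applies if at least one subject word is present.
    if n_first + n_second == 0:
        return 0.0
    n_sentiment = sum(tokens.count(w) for w in sentiment_words)
    return w_first * n_first + w_second * n_second + w_sentiment * n_sentiment


print(relevance("flood city terrible", ["flood"], ["city"], ["terrible"]))
```

A search whose score exceeds the first threshold would then be kept as a target search behavior.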
Optionally, as shown in fig. 10, the obtaining unit 601 provided in the embodiment of the present invention is specifically configured to: determining a plurality of candidate clustering center cells in each preset area; each candidate cluster center cell comprises at least one candidate search behavior within the coverage area; the degree of correlation between each candidate search behavior and the target event is greater than a fourth threshold; the fourth threshold is greater than the first threshold;
from the plurality of candidate cluster center cells, a plurality of cluster center cells are determined.
Optionally, as shown in fig. 10, the obtaining unit 601 provided in the embodiment of the present invention is specifically configured to:
determining the earliest starting time corresponding to each candidate clustering center cell; the earliest starting time is the earliest starting search time in the starting search times of the candidate search behaviors included in each candidate clustering center cell;
sorting the candidate clustering center cells in order of earliest starting time to obtain a ranking result;
and determining the plurality of clustering center cells based on the cell positions of the plurality of candidate clustering center cells and the rank of each candidate clustering center cell in the ranking result.
Optionally, as shown in fig. 10, in the determining apparatus 60 provided in the embodiment of the present invention, the number of the multiple clustering center cells is less than or equal to the maximum clustering number, and the maximum clustering number is positively correlated with the area of each preset region.
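The selection of clustering center cells described above can be sketched as a greedy pass over the ranked candidates. The greedy strategy, the planar (x, y) coordinates, and the tuple layout are assumptions; the description only states that the selection depends on the candidates' cell positions and rank, that any two selected centers are farther apart than the third threshold, and that their number does not exceed the maximum clustering number.

```python
import math


def select_cluster_centers(candidates, third_threshold, max_clusters):
    """Pick clustering center cells from ranked candidates.

    candidates: list of (cell_id, (x, y), earliest_start) tuples, where
                earliest_start is any sortable timestamp (assumed layout).
    Returns the ids of the selected clustering center cells.
    """
    # Rank candidates by the earliest search starting time observed in each cell.
    ordered = sorted(candidates, key=lambda c: c[2])
    centers = []
    for cell_id, pos, _ in ordered:
        if len(centers) >= max_clusters:
            break
        # Keep a candidate only if it is far enough from every chosen center.
        if all(math.dist(pos, p) > third_threshold for _, p in centers):
            centers.append((cell_id, pos))
    return [cell_id for cell_id, _ in centers]


cells = [("A", (0, 0), 3), ("B", (1, 0), 1), ("C", (10, 0), 2), ("D", (20, 0), 4)]
print(select_cluster_centers(cells, third_threshold=5, max_clusters=3))
```

In this example cell A is skipped because it lies within the third threshold of the earlier-ranked cell B.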
Optionally, as shown in fig. 10, the determining unit 602 provided in the embodiment of the present invention is specifically configured to:
determining the time density value of each behavior data set according to the starting search time included in each behavior data set; the time density value is inversely related to the time density;
determining the cell density value of each behavior data set according to the cell position included in each behavior data set; the cell density value is positively correlated with the cell density;
and determining the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
Optionally, as shown in fig. 10, the determining unit 602 provided in the embodiment of the present invention is specifically configured to:
determining a plurality of duration times corresponding to each behavior data set and an average duration time corresponding to each behavior data set; each duration corresponds to one target searching behavior, and each duration is the time interval between the starting searching time and the current time of one target searching behavior; the average duration is an average of a plurality of durations;
for each target search behavior in each behavior data set, determining the square of the difference value between the duration corresponding to each target search behavior and the average duration, and taking the square as the sub-time density value of each target search behavior;
and determining the sum of the sub-time density values of the plurality of target search behaviors included in each behavior data set as the time density value of each behavior data set.
Optionally, as shown in fig. 10, the determining unit 602 provided in the embodiment of the present invention is specifically configured to:
determining a cluster center cell of the cell positions of the plurality of target searching behaviors included in each behavior data set, and determining the distance between the cell position of each target searching behavior and the cluster center cell to obtain a target distance corresponding to each target searching behavior;
and determining the cell density value of each behavior data set according to the target distance of the target search behaviors included in each behavior data set.
Optionally, as shown in fig. 10, the determining unit 602 provided in the embodiment of the present invention is specifically configured to:
determining the ratio of the correlation degree between each target search behavior and the target event to the target distance corresponding to that target search behavior, to obtain the sub-cell density value of each target search behavior;
and determining the sum of the sub-cell density values of the plurality of target search behaviors included in each behavior data set as the cell density value of each behavior data set.
The invention provides a method, a device, equipment and a storage medium for determining public sentiment popularity. Because the behavior data involved in the invention includes the search starting time and the cell position, the server can cluster the behavior data of the target search behaviors by the places where they occur, based on the cell positions, so that the plurality of behavior data sets obtained by clustering reflect how densely the target search behaviors for the target event are distributed over the cells of the preset area. Further, the server determines the local heat corresponding to each behavior data set according to the search starting times and the cell positions in that behavior data set, and finally determines the sum of the local heats corresponding to all the behavior data sets as the heat of the target event. Since the local heat corresponding to each behavior data set is positively correlated with both the cell density and the time density of that behavior data set, the denser the cell positions and the denser the search starting times, the greater the determined heat of the target event. Therefore, the technical scheme provided by the invention determines the public opinion popularity by combining the temporal and spatial characteristics of users' target search behaviors for the target event, and can improve the accuracy of determining the public opinion popularity to a certain extent.
In the case of implementing the functions of the integrated modules in the form of hardware, the embodiment of the present invention provides a possible structural diagram of a server. The server is used for executing the method for determining the popularity in the embodiment. As shown in fig. 11, the server 70 includes a processor 701, a memory 702, and a bus 703. The processor 701 and the memory 702 may be connected by a bus 703.
The processor 701 is a control center of the communication apparatus, and may be a single processor or a collective term for a plurality of processing elements. For example, the processor 701 may be a Central Processing Unit (CPU), other general-purpose processors, or the like. Wherein a general purpose processor may be a microprocessor or any conventional processor or the like.
In one embodiment, the processor 701 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 11.
The memory 702 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
As a possible implementation, the memory 702 may be present separately from the processor 701, and the memory 702 may be connected to the processor 701 via the bus 703 for storing instructions or program code. The processor 701 may call and execute the instructions or program codes stored in the memory 702 to implement the method for determining the popularity according to the embodiment of the present invention.
In another possible implementation, the memory 702 may also be integrated with the processor 701.
The bus 703 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
Note that the configuration shown in fig. 11 does not constitute a limitation on the server 70. In addition to the components shown in FIG. 11, the server 70 may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
As an example, in conjunction with fig. 10, the functions implemented by the obtaining unit 601 and the determining unit 602 in the device 60 for determining popularity are the same as those of the processor 701 in fig. 11.
Optionally, as shown in fig. 11, the server provided in the embodiment of the present invention may further include a communication interface 704.
A communication interface 704 for connecting with other devices through a communication network. The communication network may be an ethernet network, a radio access network, a Wireless Local Area Network (WLAN), etc. The communication interface 704 may include a receiving unit for receiving data, and a transmitting unit for transmitting data.
In one design, in the server provided in the embodiment of the present invention, the communication interface may be further integrated in the processor.
Fig. 12 shows another hardware configuration of the server in the embodiment of the present invention. As shown in fig. 12, server 80 may include a processor 801 and a communication interface 802. The processor 801 is coupled to a communication interface 802.
For the functions of the processor 801, reference may be made to the description of the processor 701 above. The processor 801 also has a memory function, for which reference may be made to the description of the memory 702.
The communication interface 802 is used to provide data to the processor 801. The communication interface 802 may be an internal interface of the communication device, or may be an external interface (corresponding to the communication interface 704) of the communication device.
It should be noted that the configuration shown in fig. 12 does not constitute a limitation on the server, and that the server 80 may include more or less components than those shown in fig. 12, or combine some components, or a different arrangement of components than those shown in fig. 12.
Through the above description of the embodiments, it is clear for a person skilled in the art that, for convenience and simplicity of description, only the division of the above functional units is illustrated. In practical applications, the above function allocation can be performed by different functional units according to needs, that is, the internal structure of the device is divided into different functional units to perform all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by a computer, the computer executes each step in the method flow shown in the above method embodiment.
Embodiments of the present invention provide a computer program product including instructions, which when executed on a computer, cause the computer to execute the method for determining popularity in the above method embodiments.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a register, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, any suitable combination of the foregoing, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). In embodiments of the invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Since the apparatus, the device, the computer-readable storage medium, and the computer program product in the embodiments of the present invention may be applied to the method described above, for the technical effects that they can obtain, reference may also be made to the method embodiments described above, and details are not repeated herein.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions within the technical scope of the present invention are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

1. A method for determining public opinion popularity, which is characterized by comprising the following steps:
acquiring a plurality of behavior data sets aiming at a target event in each preset area; each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is larger than a first threshold value, and the behavior data of each target search behavior comprises a cell position corresponding to each target search behavior and a search starting time of each target search behavior; the distance between the cell positions included by any two behavior data in each behavior data set is smaller than a second threshold value;
determining the local heat corresponding to each behavior data set according to the behavior data included in each behavior data set; the local heat corresponding to each behavior data set is positively correlated with the cell density and the time density of each behavior data set respectively, the cell density is used for reflecting the density degree of the cell positions included in each behavior data set, and the time density is used for reflecting the density degree of the search starting time included in each behavior data set;
and determining the sum of the local heat corresponding to each behavior data set in each preset area as the heat of the target event.
2. The method of claim 1, wherein the obtaining of the behavior data sets for the target events in the preset areas comprises:
determining behavior data of the target search behavior included in each preset region;
determining a plurality of clustering center cells in each preset region according to the cell position corresponding to the target searching behavior included in each preset region; the distance between any two clustering center cells is larger than a third threshold value;
and clustering the behavior data of the target search behavior included in each preset region according to a plurality of clustering center cells in each preset region to obtain a plurality of behavior data sets.
3. The method of claim 2, wherein the determining of the behavior data of the target search behavior included in each of the preset regions includes:
acquiring a search text of the original search behavior included in each preset area;
determining the correlation degree between the original search behavior and the target event based on the keywords in the search text of the original search behavior and the description information of the target event;
and determining the original searching behavior with the correlation degree larger than the first threshold value as the target searching behavior based on the determined correlation degree, and acquiring behavior data of the target searching behavior.
4. The method for determining public opinion popularity according to claim 3, wherein the determining of the correlation between the original search behavior and the target event based on the keyword in the search text of the original search behavior and the description information of the target event comprises:
determining a first subject word, a second subject word and an emotional tendency word in the description information; the relevancy of the first subject word and the target event is greater than the relevancy of the second subject word and the target event;
if at least one of the first subject word and the second subject word exists in the search text, determining the correlation degree between the original search behavior and the target event according to a preset rule; the preset rules include: the correlation degree between the original search behavior and the target event is positively correlated with the number of the at least one subject word and the number of the emotional tendency words in the search text respectively.
5. The method of claim 2, wherein the determining a plurality of cluster center cells in each of the preset regions according to the cell location corresponding to the target search behavior included in each of the preset regions comprises:
determining a plurality of candidate clustering center cells in each preset area; each candidate cluster center cell comprises at least one candidate search behavior within the coverage area; a degree of correlation between each candidate search behavior and the target event is greater than a fourth threshold; the fourth threshold is greater than the first threshold;
determining the plurality of cluster center cells from the plurality of candidate cluster center cells.
6. The method of claim 5, wherein the determining the plurality of cluster center cells from the plurality of candidate cluster center cells comprises:
determining the earliest starting time corresponding to each candidate clustering center cell; the earliest starting time is the earliest starting search time in the starting search times of the candidate search behaviors included in each candidate clustering center cell;
sorting the candidate clustering center cells in order of earliest starting time to obtain a ranking result;
determining the plurality of clustering center cells based on the cell positions of the plurality of candidate clustering center cells and the rank of each candidate clustering center cell in the ranking result.
7. The method for determining public opinion popularity degree according to any one of claims 2-6, wherein the number of the plurality of clustering center cells is less than or equal to a maximum clustering number, and the maximum clustering number is positively correlated with the area of each preset area.
8. The method for determining public opinion popularity according to claim 1, wherein the determining the local popularity corresponding to each behavior data set according to the behavior data included in each behavior data set comprises:
determining a time density value of each behavior data set according to the starting search time included in each behavior data set; the temporal density value is inversely related to the temporal density;
determining a cell density value of each behavior data set according to the cell position included in each behavior data set; the cell density value is positively correlated with the cell density;
determining the local heat corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set.
9. The method of claim 8, wherein the determining the time density value of each behavior data set according to the search starting time included in each behavior data set comprises:
determining a plurality of duration corresponding to each behavior data set and an average duration corresponding to each behavior data set; each duration corresponds to one target searching behavior, and each duration is the time interval between the starting searching time and the current time of the target searching behavior; the average duration is an average of the plurality of durations;
for each target search behavior in each behavior data set, determining a square of a difference value between the duration corresponding to each target search behavior and the average duration, and taking the square as a sub-time density value of each target search behavior;
determining a sum of sub-temporal density values of the plurality of target search behaviors included in each behavior data set as the temporal density value of each behavior data set.
10. The method of claim 8, wherein the determining the cell density value of each behavior data set according to the cell location included in each behavior data set comprises:
determining a cluster center cell of the cell positions of the target search behaviors included in each behavior data set, and determining the distance between the cell position of each target search behavior and the cluster center cell to obtain a target distance corresponding to each target search behavior;
and determining the cell density value of each behavior data set according to the target distance of the target search behaviors included in each behavior data set.
11. The method of claim 10, wherein the determining a cell density value for each behavior data set according to the target distance of the target search behaviors included in the behavior data set comprises:
determining the ratio of the correlation degree between each target search behavior and the target event to the target distance corresponding to that target search behavior, to obtain the sub-cell density value of each target search behavior;
determining a sum of sub-cell density values of the plurality of target search behaviors included in each behavior data set as the cell density value of each behavior data set.
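Claims 10 and 11 can be illustrated with the sketch below. It reads the sub-cell density value as the correlation degree divided by the target distance, which is one interpretation of the claim wording. Several details are assumptions made only for illustration: cell positions are treated as planar (x, y) coordinates, the cluster center is approximated by the arithmetic centroid rather than an actual cell, distances are Euclidean, and a hypothetical `min_distance` floor avoids division by zero when a cell coincides with the center.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cell_density_value(
    positions: Sequence[Point],
    relevances: Sequence[float],
    min_distance: float = 1.0,
) -> float:
    """Sum over all searches of (correlation degree / distance to cluster center)."""
    if len(positions) != len(relevances):
        raise ValueError("positions and relevances must have the same length")
    if not positions:
        return 0.0  # assumption: an empty behavior data set contributes nothing
    # Assumption: the cluster center is approximated by the centroid of the positions.
    cx = sum(x for x, _ in positions) / len(positions)
    cy = sum(y for _, y in positions) / len(positions)
    total = 0.0
    for (x, y), relevance in zip(positions, relevances):
        # Target distance, floored at min_distance (an assumption) to avoid dividing by zero.
        distance = max(math.hypot(x - cx, y - cy), min_distance)
        # Sub-cell density value of this target search behavior.
        total += relevance / distance
    return total


# Hypothetical data: two cells 8 units apart, so each lies 4 units from the centroid.
density = cell_density_value([(0.0, 0.0), (8.0, 0.0)], [0.8, 0.4])
print(density)  # approximately 0.3
```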
12. The method for determining public opinion popularity according to any one of claims 8-11, wherein the determining the local popularity corresponding to each behavior data set according to the time density value of each behavior data set and the cell density value of each behavior data set comprises:
determining the local heat corresponding to each behavior data set according to the target number of the target search behaviors in each behavior data set, the cell density value of each behavior data set and the time density value of each behavior data set; the local heat is positively correlated with the target number.
13. A public opinion popularity determination device, comprising an acquisition unit and a determination unit;
the acquisition unit is configured to acquire a plurality of behavior data sets for a target event in each preset area; each behavior data set comprises behavior data of a plurality of target search behaviors, the correlation degree between each target search behavior and the target event is greater than a first threshold value, and the behavior data of each target search behavior comprises a cell position corresponding to the target search behavior and a search starting time of the target search behavior; the distance between the cell positions included in any two pieces of behavior data in each behavior data set is smaller than a second threshold value;
the determining unit is configured to determine, according to the behavior data included in each behavior data set, a local heat corresponding to each behavior data set; the local heat corresponding to each behavior data set is positively correlated with the cell density and the time density of each behavior data set respectively, the cell density is used for reflecting the density degree of the cell positions included in each behavior data set, and the time density is used for reflecting the density degree of the search starting time included in each behavior data set;
the determining unit is further configured to determine, as the heat of the target event, a sum of local heats corresponding to each behavior data set in each of the preset regions.
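The final aggregation step, in which the heat of the target event is the sum of the local heats of all behavior data sets across all preset areas, can be sketched as below. The claims do not give a closed-form expression for the local heat at this point, so the sketch takes it as a caller-supplied function. The names `event_heat` and `local_heat`, the dictionary layout, and the placeholder scoring used in the example are all hypothetical.

```python
from typing import Callable, Dict, Sequence, TypeVar

BehaviorSet = TypeVar("BehaviorSet")


def event_heat(
    sets_by_area: Dict[str, Sequence[BehaviorSet]],
    local_heat: Callable[[BehaviorSet], float],
) -> float:
    """Sum the local heat of every behavior data set in every preset area."""
    return float(
        sum(
            local_heat(behavior_set)
            for behavior_sets in sets_by_area.values()
            for behavior_set in behavior_sets
        )
    )


# Hypothetical data: each behavior data set is just a list of search identifiers,
# and the placeholder local heat is the number of searches in the set.
sets_by_area = {
    "area_a": [["s1", "s2", "s3"], ["s4"]],
    "area_b": [["s5", "s6"]],
}
heat = event_heat(sets_by_area, local_heat=len)
print(heat)  # 6.0
```

Passing the local heat as a function keeps the aggregation independent of how the time density value, cell density value and target number are combined.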
14. A server, comprising a memory and a processor;
the memory and the processor are coupled;
the memory for storing computer program code, the computer program code comprising computer instructions;
when the processor executes the computer instructions, the server performs the method for determining public opinion popularity according to any one of claims 1 to 12.
15. A computer-readable storage medium having stored therein instructions that, when executed on a server, cause the server to execute the method for determining public opinion popularity according to any one of claims 1-12.
CN202111494031.3A 2021-12-08 2021-12-08 Public opinion popularity determination method, device, equipment and storage medium Active CN114297341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111494031.3A CN114297341B (en) 2021-12-08 2021-12-08 Public opinion popularity determination method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111494031.3A CN114297341B (en) 2021-12-08 2021-12-08 Public opinion popularity determination method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114297341A (en) 2022-04-08
CN114297341B (en) 2023-01-24

Family

ID=80964895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111494031.3A Active CN114297341B (en) 2021-12-08 2021-12-08 Public opinion popularity determination method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114297341B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885873A (en) * 2017-11-28 2018-04-06 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN109657116A (en) * 2018-11-12 2019-04-19 平安科技(深圳)有限公司 A kind of public sentiment searching method, searcher, storage medium and terminal device
CN109684481A (en) * 2019-01-04 2019-04-26 深圳壹账通智能科技有限公司 The analysis of public opinion method, apparatus, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114297341B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
JP7343568B2 (en) Identifying and applying hyperparameters for machine learning
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
WO2020253350A1 (en) Network content publication auditing method and apparatus, computer device and storage medium
CN107784010B (en) Method and equipment for determining popularity information of news theme
CN109033075B (en) Intention matching method and device, storage medium and terminal equipment
US11451927B2 (en) Positioning method, positioning apparatus, server, and computer-readable storage medium
CN105022761A (en) Group search method and apparatus
CN111163072B (en) Method and device for determining characteristic value in machine learning model and electronic equipment
CN111259952A (en) Abnormal user identification method and device, computer equipment and storage medium
CN105893351A (en) Speech recognition method and device
CN116244513B (en) Random group POI recommendation method, system, equipment and storage medium
CN110727769A (en) Corpus generation method and device, and man-machine interaction processing method and device
CN111708890B (en) Search term determining method and related device
CN114297341B (en) Public opinion popularity determination method, device, equipment and storage medium
CN110442696B (en) Query processing method and device
CN112579422A (en) Scheme testing method and device, server and storage medium
CN110209804B (en) Target corpus determining method and device, storage medium and electronic device
CN114970559B (en) Intelligent response method and device
CN113656575B (en) Training data generation method and device, electronic equipment and readable medium
CN115098655A (en) Common question answering method, system, equipment and medium
CN111475409B (en) System test method, device, electronic equipment and storage medium
CN110096649B (en) Post extraction method, device, equipment and storage medium
CN110856253B (en) Positioning method, positioning device, server and storage medium
CN116306622B (en) AIGC comment system for improving public opinion atmosphere
CN110717011B (en) Session message processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant