CN113779239A - Hotspot information acquisition method and device - Google Patents
- Publication number
- CN113779239A; CN202110105380.5A
- Authority
- CN
- China
- Prior art keywords
- information
- clustering
- data
- cluster
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a device for acquiring hotspot information, and relates to the technical field of computers. One embodiment of the method comprises: preprocessing information input by a user in a stream computing manner to extract feature data of the information, wherein the feature data comprise background features, semantic features and clustering features of the information; dividing the information into data sets based on a preset clustering dimension and the feature data; grouping the data sets into clusters according to the feature data; and ranking the clusters by their total amount of information to obtain hotspot information. In this embodiment, text features and key information are extracted quickly through stream computing, and the text is stored in a structured form so that clustering is fast. Computation over multi-dimensional and long-term data is supported by isolating a fast store from a slow store, so that massive volumes of consultation text can be analyzed quickly and the application scenarios for acquiring hotspot information are broadened.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a hotspot information acquisition method and device.
Background
An online customer service system is a general term for web-based instant messaging software. By analyzing the structured and unstructured data that arise in a consultation scenario, various marketing-related data can be obtained and applied to pre-sales services, which can effectively improve a merchant's conversion rate. Hotspot tracking, in turn, is a common marketing operations tool that is widely applied in scenarios such as community networks, e-commerce, the financial industry and the news industry.
In the course of implementing the invention, the inventors found that existing intelligent hotspot analysis systems have the following defects: the generated results are overly simple, public sentiment is difficult to analyze, the systems cannot be effectively reused for user consultations in the e-commerce field, the computation is complex and time-consuming, and the clustering similarity is too high.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for acquiring hotspot information, which extract text features and key information quickly through stream computing, store the text in a structured form so that clustering is fast, and support computation over multi-dimensional and long-term data by isolating a fast store from a slow store, so that a large volume of consultation text can be analyzed quickly and the application scenarios for acquiring hotspot information are broadened.
In order to achieve the above object, according to a first aspect of the embodiments of the present invention, there is provided a hotspot information acquiring method, including:
preprocessing information input by a user in a stream computing manner to extract feature data of the information, wherein the feature data comprise background features, semantic features and clustering features of the information;
dividing the information into data sets based on a preset clustering dimension and the feature data, and grouping the data sets into clusters according to the feature data; and
ranking the clusters by their total amount of information to obtain the hotspot information.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
the data sets are grouped into clusters according to the clustering features.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
before the information is preprocessed, the information is filtered according to the scenario in which it occurs, so as to reduce the amount of data to be preprocessed.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
key entity extraction is performed on the information to obtain the clustering features,
background attributes of the information are collected as the background features, and
semantic recognition is performed on the information to extract the semantic features.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
the background features include merchants, stock keeping units (SKUs), categories, and brands,
the semantic features include text emotion and text intent,
the clustering features include users' problem points, commodity parameters, operation methods, and user demands, and
the clustering dimension includes at least one of the background features and the semantic features.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
the preprocessed information that falls within a specific time threshold is stored in a first storage device in a structured manner, and
the preprocessed information that exceeds the specific time threshold is migrated to a second storage device in a structured manner.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
the hotspot information is acquired from the first storage device and/or the second storage device, selected according to a preset clustering period.
Optionally, the method according to the first aspect of embodiments of the invention, wherein,
when the clustering period is greater than the specific time threshold, the information in the second storage device that falls within the clustering period is pre-clustered according to text content, the pre-clustered information is merged with the information in the first storage device, and the data sets are then divided.
Optionally, the method according to the first aspect of the embodiments of the present invention further includes:
in each cluster that serves as hotspot information, selecting specific information in the cluster as representative information of the cluster, and
using a hash of the dimension and the clustering features of the cluster as the identifier of the cluster.
According to a second aspect of the embodiments of the present invention, there is provided a hotspot information acquiring device, including:
a feature data acquisition module, configured to preprocess information input by a user in a stream computing manner to extract feature data of the information, wherein the feature data comprise background features, semantic features and clustering features of the information;
an information clustering module, configured to divide the information into data sets based on a preset clustering dimension and the feature data, and to group the data sets into clusters according to the feature data; and
a hotspot information acquisition module, configured to rank the clusters by their total amount of information to obtain the hotspot information.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
the data sets are grouped into clusters according to the clustering features.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
before the information is preprocessed, the information is filtered according to the scenario in which it occurs, so as to reduce the amount of data to be preprocessed.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
key entity extraction is performed on the information to obtain the clustering features,
background attributes of the information are collected as the background features, and
semantic recognition is performed on the information to extract the semantic features.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
the background features include merchants, stock keeping units (SKUs), categories, and brands,
the semantic features include text emotion and text intent,
the clustering features include users' problem points, commodity parameters, operation methods, and user demands, and
the clustering dimension includes at least one of the background features and the semantic features.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
the preprocessed information that falls within a specific time threshold is stored in a first storage device in a structured manner, and
the preprocessed information that exceeds the specific time threshold is migrated to a second storage device in a structured manner.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
the hotspot information is acquired from the first storage device and/or the second storage device, selected according to a preset clustering period.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
when the clustering period is greater than the specific time threshold, the information in the second storage device that falls within the clustering period is pre-clustered according to text content, the pre-clustered information is merged with the information in the first storage device, and the data sets are then divided.
Optionally, the apparatus according to the second aspect of the embodiments of the invention, wherein,
in each cluster that serves as hotspot information, specific information in the cluster is selected as representative information of the cluster, and
a hash of the dimension and the clustering features of the cluster is used as the identifier of the cluster.
According to a third aspect of the embodiments of the present invention, there is provided an electronic device for acquiring hotspot information, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of embodiments of the invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to the first aspect of embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: text features and key information are extracted quickly through stream computing, and the text is stored in a structured form so that clustering is fast; computation over multi-dimensional and long-term data is supported by isolating a fast store from a slow store, so that massive volumes of consultation text can be analyzed quickly and the application scenarios for acquiring hotspot information are broadened.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
Fig. 1 is a schematic diagram of a main flow of a hotspot information acquisition method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of a main flow of a feature data acquisition step of a hotspot information acquisition method according to an embodiment of the invention;
Fig. 3 is a schematic diagram of main modules of a hotspot information acquisition device according to an embodiment of the invention;
Fig. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
Fig. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a hotspot information acquisition method according to an embodiment of the present invention. As shown in fig. 1, the method includes a feature data acquisition step S101, an information clustering step S102 and a hotspot information acquisition step S103.
Feature data acquisition step S101
In the feature data acquisition step S101, information input by a user is preprocessed to acquire feature data of the information.
As shown in fig. 2, the preprocessing includes a clustering feature extraction step S1011, a semantic feature recognition step S1012, and a background feature collection step S1013, which obtain the clustering feature, the semantic feature (i.e., a dynamic feature), and the background feature (i.e., a static feature) of the information, respectively.
Clustering features refer to the key entities of the information. Specifically, taking the case where the information input by the user is text as an example, in the clustering feature extraction step S1011 the key entities of the text are extracted as clustering features by keyword extraction or by a pre-trained feature recognition model. The clustering features can include the user's problem points, commodity parameters, operation methods, user demands, and the like.
Semantic features refer to features that change as the information content changes, such as emotion, intent, need, and the like. In the semantic feature recognition step S1012, a text emotion, a text intent, or the like is recognized by a semantic recognition model (for example, an emotion model, an intent model, or another model) and used as a semantic feature of the text. Text emotion may include anxiety, anger, satisfaction, and the like. Text intent may include price, activity, logistics, and the like.
Background features refer to inherent features of the information that do not change as the information content changes. In the background feature collection step S1013, background attributes of the information, such as the scenario in which the conversation occurs (merchant, category, brand, and the like), are collected as background features of the text.
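By way of non-limiting illustration only, the three feature types can be pictured as fields of a single structured record. The sketch below assumes simple keyword lookup tables in place of the keyword extraction and semantic recognition models; the names `Record`, `preprocess` and the contents of the keyword tables are hypothetical and are not part of the described embodiment.

```python
from dataclasses import dataclass

# Hypothetical keyword tables standing in for trained models.
ENTITY_KEYWORDS = {"refund": "refund", "battery": "battery life", "delivery": "delivery time"}
EMOTION_KEYWORDS = {"angry": "anger", "worried": "anxiety", "great": "satisfaction"}
INTENT_KEYWORDS = {"price": "price", "discount": "activity", "delivery": "logistics"}


@dataclass(frozen=True)
class Record:
    text: str
    background: tuple   # static features: (merchant, category, brand)
    emotion: str        # semantic (dynamic) feature
    intent: str         # semantic (dynamic) feature
    clustering: tuple   # key entities found in the text


def _first_match(lowered, table, default):
    # Return the label of the first keyword that occurs in the text.
    for keyword, label in table.items():
        if keyword in lowered:
            return label
    return default


def preprocess(text, merchant, category, brand):
    lowered = text.lower()
    entities = tuple(sorted({label for kw, label in ENTITY_KEYWORDS.items() if kw in lowered}))
    return Record(
        text=text,
        background=(merchant, category, brand),
        emotion=_first_match(lowered, EMOTION_KEYWORDS, "neutral"),
        intent=_first_match(lowered, INTENT_KEYWORDS, "other"),
        clustering=entities,
    )
```

In a real deployment each lookup would presumably be replaced by the corresponding model; the record layout is the point of the sketch.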
Alternatively, when the information input by the user is not text (e.g., speech), the non-text information may first be converted into text and then preprocessed.
Alternatively, as shown in fig. 2, before the information input by the user is preprocessed, the information may be filtered according to the scenario in which it occurs (for example, the business features of that scenario), by means such as rule matching or a text classification model. For example, when a conversation occurs in an e-commerce scenario, only the question sentences in the message text need to be retained. This step reduces the amount of data to be processed in the preprocessing step.
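The rule-matching variant of this filter could look roughly as follows. This is a minimal sketch under stated assumptions: the sentence splitter, the `QUESTION_WORDS` list and the function name `keep_questions` are all hypothetical, and a text classification model could be substituted for the rules.

```python
import re

# Hypothetical rule set: words that typically open a question.
QUESTION_WORDS = ("how", "what", "when", "where", "why", "which", "can", "does", "is")


def keep_questions(message):
    """Return only the sentences of `message` that look like questions."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", message) if s.strip()]
    kept = []
    for sentence in sentences:
        first_word = sentence.split()[0].lower().strip(",.!?")
        if sentence.endswith("?") or first_word in QUESTION_WORDS:
            kept.append(sentence)
    return kept
```

Only the retained sentences would then be passed on to preprocessing.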
Alternatively, as shown in fig. 2, after the user inputs the information, the information may be streamed to the information gateway through the message middleware so that stream computing can be performed on it.
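As a rough illustration of message-at-a-time processing, the sketch below uses the standard library `queue.Queue` as a stand-in for the message middleware. The function name `consume_stream`, the `None` end-of-stream sentinel and the handler are assumptions made for the example; the embodiment does not name a particular middleware product or protocol.

```python
import queue


def consume_stream(middleware, handler):
    """Process messages one at a time as they arrive, until a None sentinel."""
    results = []
    while True:
        message = middleware.get()   # blocks until a message is available
        if message is None:          # hypothetical end-of-stream marker
            break
        results.append(handler(message))
    return results
```

Because each message is handled as soon as it is dequeued, preprocessing cost is spread over time rather than paid in a batch when clustering is requested.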
Alternatively, as shown in fig. 2, after the information input by the user is preprocessed, the clustering feature, the semantic feature and the background feature of the information may be attached to the text, and the result may be stored in a distributed, structured manner.
Alternatively, as shown in fig. 2, the preprocessed information that falls within a specific time threshold may be stored in a short-term memory cache to facilitate fast data access. Examples of the short-term memory cache include a Remote Dictionary Server (Redis), a relational database management system (MySQL), and the like.
Optionally, as shown in fig. 2, the preprocessed information that exceeds the specific time threshold may be migrated to long-term mass storage, so that massive amounts of data can be stored. Examples of long-term mass storage include a distributed file system (HDFS), a distributed document database (Elasticsearch), and the like.
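The division into a fast store and a slow store can be sketched as below. Two in-memory lists stand in for the short-term cache and the long-term mass storage; the class `TieredStore` and its methods are hypothetical, and no real Redis, MySQL, HDFS or Elasticsearch calls are shown.

```python
from datetime import datetime, timedelta


class TieredStore:
    """Keeps recent records in a fast tier and migrates older ones to a slow tier."""

    def __init__(self, threshold):
        self.threshold = threshold  # the "specific time threshold"
        self.fast = []              # stand-in for the short-term cache
        self.slow = []              # stand-in for long-term mass storage

    def put(self, timestamp, record):
        self.fast.append((timestamp, record))

    def migrate(self, now):
        # Move every record older than the threshold from fast to slow.
        cutoff = now - self.threshold
        still_fresh = []
        for timestamp, record in self.fast:
            if timestamp >= cutoff:
                still_fresh.append((timestamp, record))
            else:
                self.slow.append((timestamp, record))
        self.fast = still_fresh
```

Running `migrate` periodically keeps the fast tier small, which is what allows short clustering periods to be served from it directly.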
In the feature data acquisition step S101, the text of the information is preprocessed by stream computing as it is generated, and the separation into fast and slow storage databases supports fast clustering of the text.
Information clustering step S102
In the information clustering step S102, a clustering dimension (hereinafter sometimes simply referred to as a "dimension") is set in advance. The dimension determines the range and volume of the data selected from the database. In particular, the dimension may include at least one of the semantic features and the background features of the information. For example, "merchant" in the background features may be selected as the dimension, a text emotion (e.g., "angry") in the semantic features may be selected as the dimension, or a combination of background and semantic features may be selected, e.g., "merchant" + "brand" + "angry".
The data sets are divided according to the preset dimension. For example, when the dimension is the merchant, the data of each merchant is placed in that merchant's data set. Each data set is then grouped into clusters according to the clustering features of the information.
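The two-stage grouping described above can be sketched as follows. Records are assumed to be plain dictionaries, and clusters are formed by exact match on the set of clustering features; that matching rule is an assumption made for the example, since the text does not fix a particular grouping algorithm. The function names `partition` and `cluster` are hypothetical.

```python
from collections import defaultdict


def partition(records, dimension):
    """Split records into data sets keyed by their values for the chosen dimension."""
    datasets = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in dimension)
        datasets[key].append(record)
    return dict(datasets)


def cluster(dataset):
    """Group one data set into clusters keyed by its sorted clustering features."""
    clusters = defaultdict(list)
    for record in dataset:
        clusters[tuple(sorted(record["clustering"]))].append(record)
    return dict(clusters)
```

A combined dimension such as "merchant" + "brand" + "angry" corresponds to passing several field names in `dimension`, for example `("merchant", "brand", "emotion")`.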
Alternatively, in the information clustering step S102, a clustering period may be preset in addition to the preset dimension. The clustering period determines which database and which data partitions are queried when the data is acquired.
For example, when the set clustering period is short, for example, smaller than a specific time threshold, the required data set may be directly obtained from the memory cache according to the dimension.
On the other hand, when the set clustering period is long, for example, greater than a specific time threshold, the historical data may be acquired from the long-term mass storage, then merged with the data in the cache, and then the data set is divided.
Optionally, after obtaining the historical data, the historical data may be pre-clustered according to information content, and then the pre-clustered historical data is merged with the data in the cache to reduce the amount of subsequent calculation.
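The choice of data source by clustering period, together with pre-clustering of historical data, might be sketched as below. Records are assumed to be `(age, text)` pairs, pre-clustering is reduced to counting identical texts, and the boundary case where the period equals the threshold is treated as "short"; all three are assumptions made for the example, as is the function name `select_records`.

```python
from collections import Counter
from datetime import timedelta


def select_records(cache, history, period, threshold):
    """Return text -> count for all records whose age falls within `period`.

    cache:   (age, text) pairs held in the fast store (ages <= threshold)
    history: (age, text) pairs held in the slow store (ages > threshold)
    """
    if period <= threshold:
        # Short period: the fast store alone covers it.
        return Counter(text for age, text in cache if age <= period)
    # Long period: pre-cluster matching history by text content first...
    merged = Counter(text for age, text in history if age <= period)
    # ...then merge in everything from the fast store.
    merged.update(text for _, text in cache)
    return merged
```

Collapsing the historical records into counts before the merge is what reduces the amount of data the later clustering step has to touch.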
Hotspot information acquisition step S103
In the hotspot information acquisition step S103, the number of pieces of information in each cluster is summed, and the clusters are then ranked by their total amount of information to obtain the hotspot data. For example, the N clusters (N being a natural number) with the largest totals may be taken as the hotspot information.
In preliminary tests, the hotspot information acquisition method completed hotspot information acquisition tasks on the order of 70 million pieces of information and 200,000 dimensions within 5 minutes.
Optionally, in each cluster that serves as hotspot information, specific information in the cluster, for example the piece of information that occurs most often, is selected as the representative information of the cluster. In addition, a hash of the cluster's dimension and clustering features is used as the identifier of the cluster, which facilitates tracking and tracing of the hotspot information.
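The summing and ranking just described can be sketched in a few lines. The sketch assumes the clusters are held as a mapping from a cluster key to its list of members; the function name `top_hotspots` is hypothetical.

```python
import heapq


def top_hotspots(clusters, n):
    """Return the n largest clusters as (cluster_key, total) pairs, largest first."""
    totals = {key: len(members) for key, members in clusters.items()}
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids a full sort when N is small relative to the number of clusters.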
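One possible way to derive a stable cluster identifier and a representative text is sketched below. SHA-256 and the `|` / `#` separators are assumptions chosen for the example, since the text does not specify a hash function or an encoding; the function names `cluster_id` and `representative` are hypothetical.

```python
import hashlib
from collections import Counter


def cluster_id(dimension_values, clustering_features):
    """Hash the dimension values and sorted clustering features into a hex identifier."""
    payload = "|".join(dimension_values) + "#" + "|".join(sorted(clustering_features))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def representative(texts):
    """Pick the text that occurs most often in the cluster."""
    return Counter(texts).most_common(1)[0][0]
```

Because the features are sorted before hashing, the same cluster receives the same identifier across runs, which is what makes tracking a hotspot over time possible.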
Fig. 3 shows an apparatus for implementing the hotspot information acquisition method of the present invention. As shown in fig. 3, the hotspot information acquiring device 300 according to the embodiment of the present invention includes:
a feature data acquisition module 301, configured to preprocess information input by a user in a stream computing manner to extract feature data of the information, where the feature data includes a background feature, a semantic feature, and a clustering feature of the information;
an information clustering module 302, configured to divide the information into data sets based on a preset clustering dimension and the feature data, and to group the data sets into clusters according to the feature data; and
a hotspot information acquisition module 303, configured to rank the clusters by their total amount of information to obtain the hotspot information.
Optionally, the hotspot information acquisition device of the present invention further includes an information gateway module. The information gateway module can filter the information according to the business features, by means such as rule matching or a text classification model. For example, when a conversation occurs in an e-commerce scenario, only the question sentences in the message text need to be retained. This step reduces the amount of data to be processed in the preprocessing step.
Optionally, the hotspot information acquiring device of the present invention further includes a message middleware. After the user inputs the information, the information can be transmitted to the information gateway in a streaming way through the message middleware so as to carry out streaming calculation on the information.
Fig. 4 shows an exemplary system architecture 400 to which the hotspot information acquisition method or the hotspot information acquisition device of the embodiments of the invention can be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) that supports shopping websites browsed by users using the terminal devices 401, 402, 403. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example only, targeted push information or product information) to the terminal device.
It should be noted that the hot spot information obtaining method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the hot spot information obtaining apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, or the like. The communication portion 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage portion 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a feature data acquisition module, an information clustering module, and a hotspot information acquisition module. The names of these modules do not, in some cases, limit the modules themselves; for example, the feature data acquisition module may also be described as a "module that preprocesses information input by a user".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: preprocessing information input by a user in a streaming computing mode to extract feature data of the information, wherein the feature data comprises background features, semantic features, and clustering features of the information; dividing the information into data sets based on preset dimensions and the feature data, and grouping the data sets into clusters according to the feature data; and sorting the total information in the clusters to obtain the hotspot information.
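The three steps listed above can be illustrated with a minimal Python sketch. It is not the implementation described in this disclosure: the `Message` record, its field names, the `find_hotspots` function, and the use of exact-match grouping on a single clustering feature are all assumptions made for illustration, since the text does not specify how feature data is represented or how similarity is measured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical preprocessed record; field names are illustrative only."""
    text: str
    merchant: str          # background feature
    intent: str            # semantic feature
    cluster_feature: str   # clustering feature, e.g. an extracted problem point


def find_hotspots(messages, dimension="merchant", top_n=3):
    """Partition by a preset dimension, group by clustering feature, rank by size."""
    # Step 1: divide messages into data sets by the chosen dimension.
    data_sets = defaultdict(list)
    for msg in messages:
        data_sets[getattr(msg, dimension)].append(msg)

    # Step 2: within each data set, group messages sharing a clustering feature.
    clusters = defaultdict(list)
    for dim_value, subset in data_sets.items():
        for msg in subset:
            clusters[(dim_value, msg.cluster_feature)].append(msg)

    # Step 3: rank clusters by how many messages they hold, largest first.
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(key, len(members)) for key, members in ranked[:top_n]]
```

Calling `find_hotspots(messages, dimension="intent")` would partition by the semantic feature instead of the merchant. A real system would likely replace the exact-match grouping in step 2 with a text similarity measure.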
According to the technical solution of the embodiments of the present invention, text features and key information can be extracted quickly through streaming computation, and the text is stored in a structured form, which makes clustering fast. Isolating a fast database from a slow database supports computation over multi-dimensional and long-term data, so that massive volumes of consultation text can be analyzed quickly, which in turn supports innovative design of merchants' marketing capabilities.
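The fast and slow storage isolation mentioned above corresponds to the first and second storage devices of claims 6 to 8. The following is a minimal sketch of that idea, assuming two in-memory lists stand in for the storage devices; the `TieredStore` class, its method names, and the use of seconds as the time unit are hypothetical. The pre-clustering of older data described in claim 8 is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Hypothetical two-tier store: recent records in `fast`, older ones in `slow`."""
    threshold: float                          # age limit for the fast tier, in seconds
    fast: list = field(default_factory=list)
    slow: list = field(default_factory=list)

    def add(self, timestamp, record):
        """New preprocessed records always enter the fast tier."""
        self.fast.append((timestamp, record))

    def migrate(self, now):
        """Move records older than the threshold from the fast tier to the slow tier."""
        keep = []
        for ts, rec in self.fast:
            (keep if now - ts <= self.threshold else self.slow).append((ts, rec))
        self.fast = keep

    def read(self, now, period):
        """Return records from the last `period` seconds, reading `slow` only if needed."""
        sources = self.fast if period <= self.threshold else self.slow + self.fast
        return [rec for ts, rec in sources if now - ts <= period]
```

With this split, a short clustering period touches only the small fast tier, while a period longer than the threshold also reads the slow tier.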
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (12)
1. A hotspot information acquisition method, comprising:
preprocessing information input by a user in a streaming computing mode to extract feature data of the information, wherein the feature data comprises background features, semantic features, and clustering features of the information;
dividing the information into data sets based on preset clustering dimensions and the feature data, and grouping the data sets into clusters according to the feature data; and
sorting the total information in the clusters to obtain the hotspot information.
2. The method of claim 1, wherein
the data sets are grouped into clusters according to the clustering features.
3. The method of claim 1, wherein
before the information is preprocessed, the information is filtered according to the scenario in which it occurs, so as to reduce the amount of data to be preprocessed.
4. The method of claim 1, wherein extracting the feature data comprises:
performing key entity extraction on the information to obtain the clustering features,
collecting background attributes of the information as the background features, and
performing semantic recognition on the information to extract the semantic features.
5. The method of claim 1, wherein
the background features include merchants, stock keeping units (SKUs), categories, and brands,
the semantic features include text emotion and text intent,
the clustering features include problem points, commodity parameters, operation methods, and user demands, and
the clustering dimensions include at least one of the background features and the semantic features.
6. The method of claim 1, further comprising:
storing the preprocessed information that is within a specific time threshold in a first storage device in a structured storage manner, and
migrating the preprocessed information that exceeds the specific time threshold to a second storage device in a structured storage manner.
7. The method of claim 6, further comprising:
selecting, according to a preset clustering period, to acquire the hotspot information from the first storage device and/or the second storage device.
8. The method of claim 7, wherein when the clustering period is greater than the specific time threshold, the information in the second storage device corresponding to the clustering period is pre-clustered according to text content, and the pre-clustered information is merged with the information in the first storage device before the data sets are divided.
9. The method of claim 1, further comprising:
selecting specific information in each cluster as representative information of the cluster, to serve as the hotspot information, and
taking the dimension and the hash sequence of the clustering features as the identifier of the cluster.
10. A hotspot information acquisition device, comprising:
a feature data acquisition module for preprocessing information input by a user in a streaming computing mode to extract feature data of the information, wherein the feature data comprises background features, semantic features, and clustering features of the information;
an information clustering module for dividing the information into data sets based on preset clustering dimensions and the feature data, and grouping the data sets into clusters according to the feature data; and
a hotspot information acquisition module for sorting the total information in the clusters to obtain the hotspot information.
11. An electronic device for hotspot information acquisition, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.
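Claim 9 identifies each cluster by its dimension together with a hash of the clustering features, and selects one piece of information to represent the cluster. The sketch below shows one possible reading, under stated assumptions: the dimension value and the features are hashed together with SHA-256 from Python's standard `hashlib`, and the shortest text is used as the representative. Both choices, and the function names `cluster_id` and `representative`, are hypothetical; the claims do not name a hash function or a selection rule.

```python
import hashlib


def cluster_id(dimension_value, cluster_features):
    """Build a stable identifier from a dimension value and clustering features."""
    # Sort the features so the identifier does not depend on their order.
    parts = [dimension_value, *sorted(cluster_features)]
    # Join with the ASCII unit separator so that ("ab", "c") and ("a", "bc") differ.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def representative(texts):
    """Pick one text to stand for the cluster (placeholder rule: the shortest)."""
    return min(texts, key=len)
```

Because the identifier depends only on the dimension value and the features, the same cluster produced in different runs or from different storage tiers would map to the same key.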
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110105380.5A CN113779239A (en) | 2021-01-26 | 2021-01-26 | Hotspot information acquisition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113779239A (en) | 2021-12-10 |
Family
ID=78835455
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110105380.5A Pending CN113779239A (en) | 2021-01-26 | 2021-01-26 | Hotspot information acquisition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113779239A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116340991A (en) * | 2023-02-02 | 2023-06-27 | 魔萌动漫文化传播(深圳)有限公司 | Big data management method and device for IP gallery material resources and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776713A (en) * | 2016-11-03 | 2017-05-31 | 中山大学 | Massive short-text clustering method based on word-vector semantic analysis |
CN107038156A (en) * | 2017-04-28 | 2017-08-11 | 北京清博大数据科技有限公司 | Public opinion hotspot prediction method based on big data |
WO2018086401A1 (en) * | 2016-11-14 | 2018-05-17 | 平安科技(深圳)有限公司 | Cluster processing method and device for questions in automatic question and answering system |
CN108959484A (en) * | 2018-06-21 | 2018-12-07 | 中国人民解放军战略支援部队信息工程大学 | Multi-strategy media data filtering method and device for event detection |
CN109492109A (en) * | 2018-11-22 | 2019-03-19 | 北京神州泰岳软件股份有限公司 | Information hotspot mining method and device |
CN110111084A (en) * | 2019-05-16 | 2019-08-09 | 上饶市中科院云计算中心大数据研究院 | Government service hotline analysis method and system |
CN110210557A (en) * | 2019-05-31 | 2019-09-06 | 南京工程学院 | Online incremental clustering method for unknown text in real-time stream processing mode |
CN110297988A (en) * | 2019-07-06 | 2019-10-01 | 四川大学 | Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116340991A (en) * | 2023-02-02 | 2023-06-27 | 魔萌动漫文化传播(深圳)有限公司 | Big data management method and device for IP gallery material resources and electronic equipment |
CN116340991B (en) * | 2023-02-02 | 2023-11-07 | 魔萌动漫文化传播(深圳)有限公司 | Big data management method and device for IP gallery material resources and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||