CN110297995B - Method and device for collecting information - Google Patents

Publication number
CN110297995B
Authority
CN
China
Prior art keywords
user
identity
terminal
unregistered user
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910597659.2A
Other languages
Chinese (zh)
Other versions
CN110297995A (en)
Inventor
俞明轩
尹路
吴志殿
于复淮
韩艺兰
王敏
秦帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority to CN201910597659.2A
Publication of CN110297995A
Application granted
Publication of CN110297995B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562: Bookmark management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the present disclosure disclose a method and a device for collecting information. One embodiment of the method comprises: receiving a request for collecting a search result, sent by a terminal of an unregistered user; detecting whether an identity identifier for identifying the unregistered user exists in a cache of the terminal; if it does not exist, inserting an identity identifier for identifying the unregistered user into the cache of the terminal; and storing the search result, as collected content of the unregistered user, in association with the identity identifier of the unregistered user. This embodiment provides a collection service for unregistered users, keeps the cloud collection function running efficiently under massive data (tens of millions to hundreds of millions of users, each with thousands to tens of thousands of collected items), and has no substantial impact on the operating efficiency of the online search engine.

Description

Method and device for collecting information
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and apparatus for collecting information.
Background
Internet collection (bookmarking) technology is increasingly widely used in the IT industry. Current implementations fall into two types: local collection and cloud collection.
Local collection is generally implemented by storing the URL of a webpage, or the service content information, that the user wishes to collect on a local hard disk. The drawback is obvious: when the user leaves their own computer or mobile phone and uses a third party's device, the previously collected data is unavailable. In addition, local data is easily damaged and is difficult to recover once damaged.
Cloud collection means that the user's collection data is stored over a network on a cloud collection server provided by a service provider. This scheme currently has two bottlenecks:
1. The collection function depends strongly on a user login account; that is, only a user who owns an account for the application can collect data.
2. When the collection function is used with massive data, the large number of users significantly increases the time the cloud collection server spends organizing, maintaining, searching, and operating on user collection data, which seriously degrades the quality of service provided to users.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for collecting information.
In a first aspect, an embodiment of the present disclosure provides a method for collecting information, including: receiving a request for collecting a search result, sent by a terminal of an unregistered user; detecting whether an identity identifier for identifying the unregistered user exists in a cache of the terminal; if it does not exist, inserting an identity identifier for identifying the unregistered user into the cache of the terminal; and storing the search result, as collected content of the unregistered user, in association with the identity identifier of the unregistered user.
In some embodiments, the method further comprises: in response to receiving a request sent by a terminal for viewing the collected content by an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; searching for collection content stored in association with the identity of the unregistered user; and returning the searched collection content to the terminal.
In some embodiments, the method further comprises: in response to receiving a request sent by a terminal for deleting target collection content from an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; and deleting the target collection content stored in association with the identity of the unregistered user.
In some embodiments, the method further comprises: in response to receiving a login request of a registered user sent by the terminal, acquiring the identity of the unregistered user from the cache of the terminal; storing the collected content that is stored in association with the identity of the unregistered user in association with the identity of the registered user; and deleting the collected content stored in association with the identity of the unregistered user.
In some embodiments, the method further comprises: in response to detecting that the favorite content stored in association with the identity of the unregistered user is not updated within a predetermined time, deleting the favorite content stored in association with the identity of the unregistered user.
In some embodiments, the method further comprises: in response to receiving a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request; matching the at least one search result against the collected content stored in association with the identity of the unregistered user or the registered user, and marking each successfully matched search result as collected; and returning the at least one search result, together with the collected marks, to the terminal, so that the collected search results are displayed on the terminal in a predetermined manner.
In a second aspect, embodiments of the present disclosure provide a system for collecting information, comprising: a collection server configured to perform the method of any implementation of the first aspect; and a search server configured to, in response to receiving a search request sent by a terminal of an unregistered user or a registered user, return search results to the terminal.
In some embodiments, the search server is further configured to: in response to receiving a search request sent by a terminal of an unregistered user or a registered user, acquire at least one search result according to the search request; acquire, from the collection server, the collected content stored in association with the identity of the unregistered user or the registered user; store the acquired collected content in the memory of the search server in the form of a hash table; match the at least one search result against the collected content, and mark each successfully matched search result as collected; and return the at least one search result, together with the collected marks, to the terminal, so that the collected search results are displayed on the terminal in a predetermined manner.
In some embodiments, the collection server selects a key-value database in the non-relational database for data storage.
In some embodiments, the database of the collection server includes a plurality of database server clusters distributed in different regions, the clusters having a real-time cross-write function so that data updated by a user in one database server cluster is shared with the other database server clusters.
In a third aspect, an embodiment of the present disclosure provides an apparatus for collecting information, including: a receiving unit configured to receive a request for collecting a search result, sent by a terminal of an unregistered user; a detection unit configured to detect whether an identity identifier for identifying the unregistered user exists in a cache of the terminal; an inserting unit configured to insert an identity identifier for identifying the unregistered user into the cache of the terminal if the identity identifier does not exist; and a storage unit configured to store the search result, as collected content of the unregistered user, in association with the identity identifier of the unregistered user.
In some embodiments, the apparatus further comprises a viewing unit configured to: in response to receiving a request sent by a terminal for viewing the collected content by an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; searching for collection content stored in association with the identity of the unregistered user; and returning the searched collection content to the terminal.
In some embodiments, the apparatus further comprises a deletion unit configured to: in response to receiving a request sent by a terminal for deleting target collection content from an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; and deleting the target collection content stored in association with the identity of the unregistered user.
In some embodiments, the apparatus further comprises a merging unit configured to: in response to receiving a login request of a registered user sent by the terminal, acquire the identity of the unregistered user from the cache of the terminal; store the collected content that is stored in association with the identity of the unregistered user in association with the identity of the registered user; and delete the collected content stored in association with the identity of the unregistered user.
In some embodiments, the apparatus further comprises a cleaning unit configured to: in response to detecting that the favorite content stored in association with the identity of the unregistered user is not updated within a predetermined time, deleting the favorite content stored in association with the identity of the unregistered user.
In some embodiments, the apparatus further comprises a marking unit configured to: in response to receiving a search request sent by a terminal of an unregistered user or a registered user, acquire at least one search result according to the search request; match the at least one search result against the collected content stored in association with the identity of the unregistered user or the registered user, and mark each successfully matched search result as collected; and return the at least one search result, together with the collected marks, to the terminal, so that the collected search results are displayed on the terminal in a predetermined manner.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fifth aspect, embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method as in any one of the first aspect.
With the method and the device for collecting information provided by the embodiments of the present disclosure, a user can, whether or not logged in to a search engine account, collect searched content to the cloud collection server through different requests, or view and delete the collection information stored on the cloud collection server, and the existing search engine can mark the content the user has collected. The architecture of the cloud server keeps the cloud collection server running efficiently under massive data (tens of millions to hundreds of millions of users, each with thousands to tens of thousands of collected items), without substantially affecting the operating efficiency of the online search engine.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for collecting information, according to the present disclosure;
FIGS. 3a, 3b, and 3c are schematic diagrams of one application scenario of a method for collecting information in accordance with the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for collecting information in accordance with the present disclosure;
FIG. 5 is a schematic diagram illustrating one embodiment of an apparatus for collecting information according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the disclosed method for collecting information or apparatus for collecting information may be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal 101, a favorites server 102, and a search server 103. The terminal 101, the favorite server 102, and the search server 103 are connected in communication via a network. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal 101 to interact with the favorites server 102 and the search server 103 over a network to receive or send messages, etc. The terminal 101 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal 101 may be hardware or software. When the terminal 101 is hardware, it may be any of various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like. When the terminal 101 is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The collection server 102 consists of multiple layers of reverse proxy servers, a server function code module on each server in the collection server cluster, and a database. In this embodiment, the server function code module is implemented jointly in PHP and C++.
The remote database of the collection server 102 uses a key-value store, a type of non-relational database, to store data. The advantage of this type of database is that lookup time does not grow as the amount of stored data grows; therefore, even with massive data, the database maintains very high lookup efficiency, which in turn ensures the operating efficiency of the collection server. In this embodiment, Redis is selected as the remote database of the cloud server.
The database of the collection server 102 is formed by a plurality of server clusters distributed throughout the country. To ensure that users in different regions obtain consistent data, the database server clusters write to one another in real time, so that data updated by a user in one database server cluster is shared with all other database server clusters. In this embodiment, three database server clusters may be deployed in different regions of the country, for example in the north and in the south.
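The real-time cross-write behavior between database server clusters can be illustrated with a minimal sketch. The class name DatabaseCluster, the helper link_clusters, and the synchronous fan-out are assumptions made for illustration; the source does not describe the replication protocol, and conflict resolution and network failures are ignored here.

```python
class DatabaseCluster:
    """Hypothetical stand-in for one regional key-value database cluster."""

    def __init__(self, name):
        self.name = name
        self.data = {}    # identity identifier -> collected content
        self.peers = []   # the other clusters that receive cross-writes

    def write(self, key, value, _replicated=False):
        # Apply the write locally.
        self.data[key] = value
        # Forward it once to every peer; replicated writes are not forwarded
        # again, which prevents an endless write loop between clusters.
        if not _replicated:
            for peer in self.peers:
                peer.write(key, value, _replicated=True)


def link_clusters(clusters):
    """Make every cluster a peer of every other cluster."""
    for cluster in clusters:
        cluster.peers = [p for p in clusters if p is not cluster]
```

A write accepted by any one cluster then becomes readable from the others, which is the consistency property the paragraph above describes.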
The search server 103 provides a search engine that returns search results according to a user's search request. In addition, while the user searches and browses, the search engine can mark the content that the user has already collected. The principle is as follows: while providing search results to a user, the search engine fetches the user's collected content from the collection server database using the identity of the unregistered user or of the registered user, stores that content in the memory of the search server in the form of a hash table, compares the data to be returned by the search engine against the collected content, marks the returned items that the user has collected, and displays them to the user.
The collection data is stored in the memory of the search server for the following reason: under this scheme, a single access to the collection server replaces frequent accesses by the search server, which reduces the load on both the search server and the collection server.
The collection data is stored in the memory of the search server as a hash table for the following reason: under this scheme, the time complexity of comparing the returned data with the collected data is minimized, so the impact of the collection display function on the operating efficiency of the search engine is also minimized.
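As a rough illustration of the hash table comparison described above, the following sketch marks search results against a user's collected content. The function name mark_collected and the use of URLs as the matching key are assumptions; a Python set stands in for the hash table held in the search server's memory.

```python
def mark_collected(search_results, collected_urls):
    """Return (url, is_collected) pairs for each search result.

    collected_urls is assumed to have been fetched from the collection
    server in a single access; building the set costs O(m), and each of
    the n membership checks is O(1) on average.
    """
    collected = set(collected_urls)  # hash table kept in memory
    return [(url, url in collected) for url in search_results]
```

With a list instead of a set, each check would scan the whole collection, so the comparison would cost O(n * m) rather than roughly O(n + m).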
It should be noted that the method for collecting information provided by the embodiment of the disclosure is generally executed by the collection server 102, and accordingly, the device for collecting information is generally disposed in the collection server 102. The search server 103 may also be used as a whole with the collection server 102, so that the collection server has not only a collection function but also a search function.
It should be understood that the number of terminals, favorites servers, and search servers in FIG. 1 are merely illustrative. There may be any number of terminals, favorites servers, and search servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for collecting information in accordance with the present disclosure is shown. The method for collecting information comprises the following steps:
step 201, receiving a request for collecting search results sent by a terminal of an unregistered user.
In the present embodiment, an executing body of the method for collecting information (e.g., the collection server shown in fig. 1) may receive, through a wired or wireless connection, a request for collecting search results from the terminal with which the user browses web pages. An unregistered user is a user who is not logged in to the search engine. The search results may include web addresses, pictures, and the like. If the user is logged in to a search engine account, the user can be identified by a permanent encrypted field that uniquely identifies the user, and this field is used to look up or operate on the corresponding user information in the database of the collection server. The search server and the collection server may share the user's login information: logging in to the search engine is equivalent to logging in to the collection server, and the favorites of the logged-in user can be found directly using the login identity.
Step 202, detecting whether an identity used for identifying an unregistered user exists in a cache of the terminal.
In this embodiment, a website stores data (typically encrypted), called a cookie, on the user's local terminal for the purposes of identifying the user and tracking the session. When a user uses the search engine anonymously, the collection server inserts a cookie on the terminal; the cookie records information such as the user ID, password, webpages browsed, and dwell time. When the user uses the search engine again, the collection server reads the cookie to obtain the user's information and can act accordingly, for example by allowing the unregistered user to collect content and by finding the webpages, pictures, and other items that the unregistered user has collected.
Step 203, if the identity identifier does not exist, inserting an identity identifier for identifying the unregistered user into the cache of the terminal.
In this embodiment, when the user is not logged in to the search server, the collection server inserts a unique tag field (the flag bit shown in fig. 3a) into the browser cache (cookie) of the terminal to identify the individual user; if the field already exists in the browser cache, no insertion is performed. The search service provider can thus identify the individual user by the identifier of the unregistered user even though the user has not logged in to a search engine account. The collection server can therefore still identify the user, and it uses the identity of the unregistered user to look up or operate on the corresponding user information in the collection database.
Step 204, storing the search result, as collected content of the unregistered user, in association with the identity of the unregistered user.
In this embodiment, the correspondence between the identity of the unregistered user and the collected content is stored as a key-value pair. That is, a favorites folder is created for the unregistered user and the search result is added to it.
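Steps 201 to 204 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the cookie name anon_uid, the function handle_collect_request, and the in-memory dictionary standing in for the key-value database are all hypothetical, and cookies are modeled as a plain dictionary rather than real HTTP headers.

```python
import uuid

# Hypothetical in-memory stand-in for the collection server's key-value
# database: identity identifier -> list of collected search results.
collection_store = {}

COOKIE_NAME = "anon_uid"  # assumed cookie name, not given in the source


def handle_collect_request(cookies, search_result):
    """Handle a collect request (step 201) and return the updated cookies."""
    cookies = dict(cookies)  # do not mutate the caller's dictionary
    # Step 202: detect whether the terminal cache already holds an identifier.
    if COOKIE_NAME not in cookies:
        # Step 203: insert a new unique identifier for the unregistered user.
        cookies[COOKIE_NAME] = uuid.uuid4().hex
    # Step 204: store the search result in association with the identifier.
    items = collection_store.setdefault(cookies[COOKIE_NAME], [])
    if search_result not in items:
        items.append(search_result)
    return cookies
```

Calling the function twice with the cookies returned by the first call reuses the same identifier, so both results end up under one key.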
In some optional implementations of this embodiment, the method further includes: in response to receiving a request, sent by the terminal, for the unregistered user to view the collected content, acquiring the identity of the unregistered user from the cache of the terminal; searching for the collected content stored in association with the identity of the unregistered user; and returning the found collected content to the terminal. That is, the favorites of the unregistered user are viewed. Once an identity has been assigned to an unregistered user, that user can collect content, view favorites, cancel collected items, and so on, just like a logged-in user.
In some optional implementations of this embodiment, the method further includes: in response to receiving a request, sent by the terminal, for the unregistered user to delete target collected content, acquiring the identity of the unregistered user from the cache of the terminal; and deleting the target collected content stored in association with the identity of the unregistered user. That is, content that is no longer needed is deleted from the favorites of the unregistered user.
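The view and delete operations for an unregistered user might look like the sketch below. The store contents, the cookie name anon_uid, and both function names are illustrative assumptions; the identifier is read from a dictionary that models the terminal's cookie cache.

```python
# Hypothetical store: identity identifier -> set of collected items.
collection_store = {
    "anon-123": {"https://example.com/a", "https://example.com/b"},
}

COOKIE_NAME = "anon_uid"  # assumed cookie name


def view_collection(cookies):
    """Return the collected content for the identifier found in the cookies."""
    uid = cookies.get(COOKIE_NAME)
    return sorted(collection_store.get(uid, set()))


def delete_from_collection(cookies, target):
    """Delete one target item; return True if it was present and removed."""
    uid = cookies.get(COOKIE_NAME)
    items = collection_store.get(uid)
    if items is None or target not in items:
        return False
    items.remove(target)
    return True
```

Both operations start from the same lookup of the identifier in the terminal cache, which is why an unregistered user can be served in the same way as a logged-in one.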
In some optional implementations of this embodiment, the method further includes: in response to detecting that the collected content stored in association with the identity of the unregistered user has not been updated within a predetermined time, deleting the collected content stored in association with the identity of the unregistered user. A data timeout is set in the database of the collection server for unregistered users. If an unregistered user does not update the collected content within a period of time, it is assumed by default that the user no longer uses the browser or that the browser cache has been cleared, so the user can no longer access the anonymous account through the unique tag field previously provided by the search engine. In this case, all data of the unregistered user in the collection server database is cleared, relieving the database of the burden of maintaining unnecessary data.
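A timeout-based cleanup of this kind could be sketched as follows. The 30-day value, the class name AnonymousCollections, and the explicit purge_stale sweep are assumptions; the source does not state the timeout or how expiry is triggered.

```python
import time

TIMEOUT_SECONDS = 30 * 24 * 3600  # assumed 30 days; the source gives no value


class AnonymousCollections:
    """Hypothetical store that tracks the last update time per identifier."""

    def __init__(self):
        self._items = {}        # identifier -> set of collected items
        self._last_update = {}  # identifier -> time of the last update

    def add(self, uid, item, now=None):
        now = time.time() if now is None else now
        self._items.setdefault(uid, set()).add(item)
        self._last_update[uid] = now  # every update resets the timeout

    def get(self, uid):
        return set(self._items.get(uid, set()))

    def purge_stale(self, now=None):
        """Delete all data of identifiers not updated within the timeout."""
        now = time.time() if now is None else now
        stale = [u for u, t in self._last_update.items()
                 if now - t > TIMEOUT_SECONDS]
        for uid in stale:
            del self._items[uid]
            del self._last_update[uid]
        return stale
```

In a key-value store with per-key expiry, such as the EXPIRE command in Redis, resetting the expiry on each update might achieve the same effect without a sweep; whether the embodiment does so is not stated.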
In some optional implementations of this embodiment, the method further includes: in response to receiving a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request; matching the at least one search result against the collected content stored in association with the identity of the unregistered user or the registered user, and marking each successfully matched search result as collected; and returning the at least one search result, together with the collected marks, to the terminal, so that the collected search results are displayed on the terminal in a predetermined manner. If the collection server also provides the search engine function, each search result can be matched against the content in the favorites (by similarity calculation) when the user runs a query; if the degree of matching exceeds a preset value, the search result is considered to have been collected by the user and is given a collected mark. The browser of the terminal can display the marked search results in a predetermined manner, for example by showing a collected mark on the page or by highlighting collected pictures.
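The similarity-based matching against a preset value can be illustrated with the sketch below. The source does not name a similarity measure, so the character-level ratio from Python's difflib is swapped in purely for illustration, and the threshold of 0.9 and both function names are assumptions.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.9  # assumed preset value; not given in the source


def is_collected(result, collection):
    """True if the result is similar enough to any collected item."""
    return any(
        SequenceMatcher(None, result, item).ratio() >= MATCH_THRESHOLD
        for item in collection
    )


def mark_results(results, collection):
    """Attach a 'collected' mark to every search result."""
    return [
        {"result": r, "collected": is_collected(r, collection)}
        for r in results
    ]
```

This compares every result with every collected item, so it trades the constant-time lookup of an exact hash match for tolerance to small differences between a result and the stored item.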
In some optional implementations of this embodiment, if the collection server and the search server are independent servers, then while providing search results to the user the search server may fetch the user's collected content from the collection server database using the unique tag field of the unregistered user or of the logged-in user, store the collected content in the memory of the search server in the form of a hash table, compare the data to be returned by the search server against the collected content, mark the returned items that the user has collected, and present them to the user. Storing the collection data in the memory of the search server replaces frequent accesses to the cloud collection server with a single access, reducing the load on both the search server and the collection server. Storing the collection data as a hash table minimizes the time complexity of comparing the returned data with the collected data, so the impact of the collection display function on the operating efficiency of the search engine is minimized. Therefore, even with massive data, the cloud collection server continues to run efficiently and the operating efficiency of the online search engine is not substantially affected.
With continuing reference to FIGS. 3a-3c, FIGS. 3a-3c are diagrams of an application scenario of the method for collecting information according to the present embodiment. In the application scenario of fig. 3a, when a user finds content worth collecting in the search results, the user's terminal sends a collection service request (including but not limited to adding an item, canceling an item, and viewing the favorites) to a multi-tier proxy server. The proxy server selects, through a load balancing policy, a collection server to process the request; that server processes the request and then interacts with the database cluster at the back end. Because the back end uses a key-value (k-v) database, its operating efficiency is not affected by the total amount of data stored, so it can run efficiently even with massive data. After the database operation completes, the data is returned to the user's terminal along the same path by which the request arrived.
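The proxy's choice of a collection server can be sketched with a simple policy. The source only says "a load balancing policy", so round-robin selection is an assumption here, as are the class name ReverseProxy and the server labels.

```python
from itertools import cycle


class ReverseProxy:
    """Hypothetical proxy that hands requests to servers in rotation."""

    def __init__(self, servers):
        # cycle() yields the servers in order, repeating indefinitely.
        self._next_server = cycle(servers)

    def route(self, request):
        """Pick the next collection server and pair it with the request."""
        server = next(self._next_server)
        return server, request
```

For example, with two servers the first three requests go to the first, the second, and then the first server again.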
As shown in fig. 3b, the data of the database clusters in different regions is updated and shared in real time; for example, when a new collection record is added to the collection database in North China, it is also written to the southern server cluster in real time.
As shown in fig. 3c, when the user searches for information, the search engine fetches the user's collected data from the database, compares it with the search results, marks the search results that the user has collected, and shows them to the user.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for collecting information is illustrated. The flow 400 of the method for collecting information includes the following steps:
step 401, receiving a request for collecting search results sent by a terminal of an unregistered user.
Step 402, detecting whether an identity mark for marking the user who does not log in exists in a cache of the terminal.
And step 403, if the user identity does not exist, inserting an identity for identifying the user who does not log in into the cache of the terminal.
And step 404, storing the search result as the collected content of the unregistered user and the identity of the unregistered user in a correlated manner.
The steps 401 and 404 are substantially the same as the steps 201 and 204, and therefore, the description thereof is omitted.
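Steps 401-404 can be sketched as a single handler. This is an illustration under stated assumptions: the terminal cache and the collection store are modeled as dictionaries, the cache key `"anon_id"` and the function name are hypothetical, and generating the identity with `uuid.uuid4()` is an assumption, since this section does not say how the identity is produced.

```python
import uuid


def collect_for_unregistered_user(terminal_cache, store, search_result):
    """Handle a collection request from a user who is not logged in."""
    # Step 402: look for an existing anonymous identity in the cache.
    anon_id = terminal_cache.get("anon_id")
    if anon_id is None:
        # Step 403: none found, so create one and insert it into the cache.
        anon_id = uuid.uuid4().hex
        terminal_cache["anon_id"] = anon_id
    # Step 404: store the search result in association with the identity.
    favorites = store.setdefault(anon_id, [])
    if search_result not in favorites:
        favorites.append(search_result)
    return anon_id


cache, store = {}, {}
first_id = collect_for_unregistered_user(cache, store, "https://example.com/a")
second_id = collect_for_unregistered_user(cache, store, "https://example.com/b")
```

The second request reuses the identity written by the first, so both search results end up under the same anonymous key.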
Step 405, in response to receiving a login request of a registered user sent by a terminal, acquiring an identity of an unregistered user in a cache of the terminal.
In this embodiment, if a login request of a registered user sent by the terminal is received, it may be detected whether an identity of an unregistered user exists in the cache of the terminal. If such an identity exists, the user used the collection server anonymously before logging in; that is, the unregistered user and the registered user are the same person.
Step 406, storing the collected content that is stored in association with the identity of the unregistered user in association with the identity of the registered user.
In this embodiment, when it is detected that the unregistered user and the registered user are the same person, the favorites under the anonymous account may be merged into the favorites of the registered user; that is, the collected content under the unregistered user's account is added to the registered user's favorites.
Step 407, deleting the collected content stored in association with the identity of the unregistered user.
In the embodiment, all information of the unregistered user is deleted, and the pressure of maintaining unnecessary data in the database of the collection server is relieved on the premise of ensuring good user experience.
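Steps 405-407 can be sketched as a merge followed by a delete. The sketch assumes the same dictionary-based cache and store as a stand-in for the real terminal cache and database; the function name `merge_on_login`, the cache key `"anon_id"`, and the choice to skip duplicates during the merge are illustrative assumptions.

```python
def merge_on_login(terminal_cache, store, registered_id):
    """Merge anonymous favorites into the registered user's favorites."""
    # Step 405: read the anonymous identity from the terminal cache.
    anon_id = terminal_cache.get("anon_id")
    merged = store.setdefault(registered_id, [])
    if anon_id is None or anon_id not in store:
        return merged  # nothing was collected anonymously
    # Step 406: re-associate the anonymous favorites with the
    # registered user, skipping items already present.
    for item in store[anon_id]:
        if item not in merged:
            merged.append(item)
    # Step 407: delete the data kept under the anonymous identity.
    del store[anon_id]
    return merged


cache = {"anon_id": "anon-1"}
store = {"anon-1": ["a", "b"], "alice": ["b", "c"]}
result = merge_on_login(cache, store, "alice")
```

After the call, only the registered user's key remains in the store, which reflects the stated goal of not maintaining unnecessary anonymous data.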
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for collecting information in the present embodiment highlights the step of merging the collected content of the unregistered user with that of the registered user. Therefore, the scheme described in this embodiment can reduce the pressure of maintaining unnecessary data in the database of the collection server while still ensuring a good user experience. The method can also ensure very high data lookup efficiency in a mass-data scenario, thereby ensuring the operating efficiency of the cloud collection server.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for collecting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for collecting information of the present embodiment includes: a receiving unit 501, a detecting unit 502, an inserting unit 503 and a storing unit 504. The receiving unit 501 is configured to receive a request for collecting search results sent by a terminal of an unregistered user; a detecting unit 502 configured to detect whether an identity identifier for identifying an unregistered user exists in a cache of a terminal; an inserting unit 503 configured to insert, if the identifier does not exist, an identity identifier for identifying the unregistered user into the cache of the terminal; a storage unit 504 configured to store the search result as the favorite content of the unregistered user in association with the identity of the unregistered user.
In this embodiment, the specific processing of the receiving unit 501, the detecting unit 502, the inserting unit 503 and the storing unit 504 of the device 500 for collecting information may refer to step 201, step 202, step 203, step 204 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the apparatus 500 further comprises a viewing unit (not shown in the drawings) configured to: in response to receiving a request sent by a terminal for viewing the collected content by an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; searching for the collection content stored in association with the identity of the unregistered user; and returning the searched collection content to the terminal.
In some optional implementations of this embodiment, the apparatus 500 further comprises a deletion unit (not shown in the drawings) configured to: in response to receiving a request sent by a terminal for deleting target collection content from an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal; and deleting the target collection content stored in association with the identity of the unregistered user.
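The viewing unit and the deletion unit both start by reading the anonymous identity from the terminal cache. A minimal sketch, assuming dictionaries for the cache and the store and hypothetical names (`view_favorites`, `delete_favorite`, the cache key `"anon_id"`):

```python
def view_favorites(terminal_cache, store):
    """Return the favorites stored under the cached anonymous identity."""
    anon_id = terminal_cache.get("anon_id")
    return list(store.get(anon_id, []))


def delete_favorite(terminal_cache, store, target):
    """Delete one target favorite; return True if something was removed."""
    anon_id = terminal_cache.get("anon_id")
    favorites = store.get(anon_id, [])
    if target in favorites:
        favorites.remove(target)
        return True
    return False


cache = {"anon_id": "anon-1"}
store = {"anon-1": ["https://example.com/a", "https://example.com/b"]}
before = view_favorites(cache, store)
removed = delete_favorite(cache, store, "https://example.com/a")
after = view_favorites(cache, store)
```

A terminal with no cached identity simply sees an empty list, and a delete request for content that was never collected is a no-op.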
In some optional implementations of this embodiment, the apparatus 500 further comprises a merging unit (not shown in the drawings) configured to: in response to receiving a login request of a registered user sent by a terminal, acquiring an identity of an unregistered user from a cache of the terminal; storing the collected content that is stored in association with the identity of the unregistered user in association with the identity of the registered user; and deleting the collected content stored in association with the identity of the unregistered user.
In some optional implementations of this embodiment, the apparatus 500 further comprises a cleaning unit (not shown in the drawings) configured to: in response to detecting that the favorite content stored in association with the identity of the unregistered user is not updated within a predetermined time, deleting the favorite content stored in association with the identity of the unregistered user.
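The cleaning unit can be sketched as a periodic sweep over last-update timestamps. The function name `purge_stale`, the separate `last_updated` mapping, and the use of seconds as the time unit are assumptions made for illustration; the source only specifies "a predetermined time".

```python
def purge_stale(store, last_updated, now, max_idle_seconds):
    """Delete favorites of anonymous identities that have not been
    updated within max_idle_seconds; return the purged identities."""
    stale = [
        anon_id
        for anon_id, timestamp in last_updated.items()
        if now - timestamp > max_idle_seconds
    ]
    for anon_id in stale:
        store.pop(anon_id, None)
        del last_updated[anon_id]
    return stale


store = {"anon-1": ["a"], "anon-2": ["b"]}
last_updated = {"anon-1": 0, "anon-2": 900}
purged = purge_stale(store, last_updated, now=1000, max_idle_seconds=500)
```

Collecting the stale identities first and deleting afterwards avoids modifying the timestamp mapping while iterating over it.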
In some optional implementations of this embodiment, the apparatus 500 further comprises a marking unit (not shown in the drawings) configured to: in response to receiving a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request; matching the at least one search result with the collected content stored in association with the identity of the unregistered user or the registered user, and marking each successfully matched search result as collected; and returning the at least one search result to the terminal together with the collected marks, so that the collected search results are displayed on the terminal in a predetermined manner.
Referring now to FIG. 6, a block diagram of an electronic device (e.g., the favorites server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The favorites server illustrated in FIG. 6 is merely an example, and should not impose any limitations on the functionality or scope of use of embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a request for collecting a search result sent by a terminal of an unregistered user; detect whether an identity identifier for identifying the unregistered user exists in a cache of the terminal; if no such identity identifier exists, insert an identity identifier for identifying the unregistered user into the cache of the terminal; and store the search result, as collected content of the unregistered user, in association with the identity of the unregistered user.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a receiving unit, a detecting unit, an inserting unit, and a storing unit. The names of these units do not constitute a limitation to the unit itself in some cases, and for example, the receiving unit may also be described as "a unit that receives a request for collecting search results sent by a terminal of an unregistered user".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept. One example is a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (16)

1. A method for collecting information, comprising:
receiving a request for collecting search results, which is sent by a terminal of an unregistered user;
detecting whether an identity identifier for identifying the unregistered user exists in a cache of the terminal;
if no such identity identifier exists, inserting an identity identifier for identifying the unregistered user into the cache of the terminal;
storing the search result, as collected content of the unregistered user, in association with the identity of the unregistered user;
in response to detecting that the favorite content stored in association with the identity of the unregistered user is not updated within a predetermined time, deleting the favorite content stored in association with the identity of the unregistered user.
2. The method of claim 1, wherein the method further comprises:
in response to receiving a request of an unregistered user for viewing collection content sent by the terminal, acquiring an identity of the unregistered user from a cache of the terminal;
searching for the collection content stored in association with the identity of the unregistered user;
and returning the searched collection content to the terminal.
3. The method of claim 1, wherein the method further comprises:
in response to receiving a request sent by the terminal for deleting target collection content from an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal;
deleting the target collection content stored in association with the identity of the unregistered user.
4. The method of claim 1, wherein the method further comprises:
responding to a login request of a registered user sent by the terminal, and acquiring an identity of an unregistered user in a cache of the terminal;
storing the collection content that is stored in association with the identity of the unregistered user in association with the identity of the registered user;
and deleting the collection content stored in association with the identity of the unregistered user.
5. The method of claim 1, wherein the method further comprises:
in response to a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request;
matching the at least one search result with the collected content stored in association with the identity of the unregistered user or the registered user, and marking the search result which is successfully matched as collected;
returning the at least one search result to the terminal along with the favorite indicia such that the favorite search results are displayed on the terminal in a predetermined manner.
6. A system for collecting information, comprising:
a collection server configured to implement the method of any one of claims 1-4;
a search server configured to, in response to a search request sent by a terminal of an unregistered user or a registered user, return search results to the terminal.
7. The system of claim 6, wherein the search server is further configured to:
in response to a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request;
acquiring collection content stored in association with the identity of the unregistered user or the registered user from the collection server;
storing the acquired collection content in a memory of the search server in a hash table mode;
matching the at least one search result with the collected content stored in association with the identity of the unregistered user or the registered user, and marking the search result which is successfully matched as collected;
returning the at least one search result to the terminal along with the favorite indicia such that the favorite search results are displayed on the terminal in a predetermined manner.
8. The system of claim 6, wherein the collection server uses a key-value database, which is a type of non-relational database, for data storage.
9. The system of claim 6, wherein the database of the collection server comprises a plurality of database server clusters distributed across different locations, the plurality of database server clusters having a real-time cross-write function such that data updated by a user under one database server cluster can be shared to other database server clusters.
10. An apparatus for collecting information, comprising:
a receiving unit configured to receive a request for collecting search results transmitted by a terminal of an unregistered user;
a detecting unit configured to detect whether an identity identifier for identifying the unregistered user exists in a cache of the terminal;
an inserting unit configured to insert an identity identifier for identifying the unregistered user into the cache of the terminal if no such identity identifier exists;
a storage unit configured to store the search result as a favorite content of the unregistered user in association with an identity of the unregistered user;
a deleting unit configured to delete the favorite content stored in association with the identification of the unregistered user in response to detection that the favorite content stored in association with the identification of the unregistered user is not updated within a predetermined time.
11. The apparatus of claim 10, wherein the apparatus further comprises a viewing unit configured to:
in response to receiving a request of an unregistered user for viewing collection content sent by the terminal, acquiring an identity of the unregistered user from a cache of the terminal;
searching for the collection content stored in association with the identity of the unregistered user;
and returning the searched collection content to the terminal.
12. The apparatus of claim 10, wherein the deletion unit is further configured to:
in response to receiving a request sent by the terminal for deleting target collection content from an unregistered user, acquiring an identity of the unregistered user from a cache of the terminal;
deleting the target collection content stored in association with the identity of the unregistered user.
13. The apparatus of claim 10, wherein the apparatus further comprises a merging unit configured to:
responding to a login request of a registered user sent by the terminal, and acquiring an identity of an unregistered user in a cache of the terminal;
storing the collection content that is stored in association with the identity of the unregistered user in association with the identity of the registered user;
and deleting the collection content stored in association with the identity of the unregistered user.
14. The apparatus of claim 10, wherein the apparatus further comprises a marking unit configured to:
in response to a search request sent by a terminal of an unregistered user or a registered user, acquiring at least one search result according to the search request;
matching the at least one search result with the collected content stored in association with the identity of the unregistered user or the registered user, and marking the search result which is successfully matched as collected;
returning the at least one search result to the terminal along with the favorite indicia such that the favorite search results are displayed on the terminal in a predetermined manner.
15. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
16. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN201910597659.2A 2019-07-04 2019-07-04 Method and device for collecting information Active CN110297995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910597659.2A CN110297995B (en) 2019-07-04 2019-07-04 Method and device for collecting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910597659.2A CN110297995B (en) 2019-07-04 2019-07-04 Method and device for collecting information

Publications (2)

Publication Number Publication Date
CN110297995A CN110297995A (en) 2019-10-01
CN110297995B true CN110297995B (en) 2022-06-14

Family

ID=68030235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910597659.2A Active CN110297995B (en) 2019-07-04 2019-07-04 Method and device for collecting information

Country Status (1)

Country Link
CN (1) CN110297995B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111371953A (en) * 2020-03-10 2020-07-03 北京字节跳动网络技术有限公司 Tag data processing method, device and storage medium
CN112333148B (en) * 2020-09-30 2023-03-07 深圳市彬讯科技有限公司 User label determination method, user label determination device, computer equipment and storage medium
CN112488803A (en) * 2020-12-16 2021-03-12 广州华多网络科技有限公司 Favorite storage access method and device, equipment and medium thereof
CN114461119A (en) * 2022-02-11 2022-05-10 北京百度网讯科技有限公司 Method and device for processing application page information, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103841188A (en) * 2014-02-27 2014-06-04 北京奇虎科技有限公司 Cookie information processing method and device in browser
CN106168967A (en) * 2016-07-04 2016-11-30 北京金山安全软件有限公司 Webpage information collection method, webpage information protection device and mobile terminal
US9858439B1 (en) * 2017-06-16 2018-01-02 OneTrust, LLC Data processing systems for identifying whether cookies contain personally identifying information

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8230089B2 (en) * 2009-03-25 2012-07-24 Digital River, Inc. On-site dynamic personalization system and method
CN101764841A (en) * 2009-12-16 2010-06-30 中兴通讯股份有限公司 Method and device for synchronizing user data
CN103186666B (en) * 2013-03-01 2017-02-08 北京百度网讯科技有限公司 Method, device and equipment for searching based on favorites
US9529928B2 (en) * 2013-03-05 2016-12-27 International Business Machines Corporation Intelligent categorization of bookmarks
CN103279527B (en) * 2013-05-30 2019-04-26 百度在线网络技术(北京)有限公司 A kind of user interest network address method for digging and device
CN103763304B (en) * 2013-12-20 2017-07-04 百度在线网络技术(北京)有限公司 A kind of method and apparatus of submission information
CN104484165B (en) * 2014-11-24 2017-10-10 北京奇虎科技有限公司 A kind of browser collection folder data processing method, browser client and system
CN104392378B (en) * 2014-12-10 2018-02-27 北京京东尚科信息技术有限公司 A kind of article that adds is to the method and system of shopping cart

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
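Cache cookies for browser authentication; A. Juels et al.; 2006 IEEE Symposium on Security and Privacy (S&P'06); 20060619; 1-5 *
Design and Implementation of a Multi-Agent-Based Network Forensics Data Collection System; Song Yanfang (宋艳芳); China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series; 20140815 (No. 8); I140-543 *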

Also Published As

Publication number Publication date
CN110297995A (en) 2019-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant