Information identification method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information identification method and apparatus.
Background
With the increasing popularity of networks, more and more people query information over networks. In the related art, whether query information input by a user is hot information is generally identified based on how often a plurality of users input that query information within a certain period. Because time is needed to accumulate user behavior, identification lags behind. Moreover, when a piece of real-time query information is queried only rarely and its query volume has not accumulated sufficiently, its timeliness and popularity cannot be identified, so the result the user needs cannot be returned.
Disclosure of Invention
In view of this, the present disclosure provides an information identification method and apparatus, which can quickly identify query information input by a user.
According to an aspect of the present disclosure, there is provided an information identifying method including:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
For the above method, in one possible implementation, the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
For the above method, in one possible implementation, the method further includes:
acquiring first resource information of all resources;
acquiring second resource information of the resources of the first category set;
obtaining the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For the above method, in one possible implementation, the method further includes:
under the condition that the query information is first-class query information, acquiring resources queried according to the query information;
obtaining the relevance between the queried resources and the query information;
sorting the queried resources according to the relevance;
and establishing a resource recommendation list according to the sorting result.
For the above method, in a possible implementation manner, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
For the above method, in one possible implementation, the resources of the first category set are news-like video resources.
According to another aspect of the present disclosure, there is provided an information identifying apparatus including:
the information decomposition module is used for decomposing the query information input by the user into one or more pieces of basic information;
a query information probability obtaining module, configured to obtain a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and the information identification module is used for identifying the query information as first-class query information under the condition that the fourth occurrence probability is greater than or equal to a first threshold value.
For the above apparatus, in one possible implementation manner, the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
the first resource information acquisition module is used for acquiring first resource information of all resources;
the second resource information acquisition module is used for acquiring second resource information of the resources in the first category set;
a resource information probability obtaining module, configured to obtain the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
the resource acquisition module is used for acquiring, under the condition that the query information is first-class query information, the resources queried according to the query information;
a relevancy obtaining module, configured to obtain the relevance between the queried resources and the query information;
the resource sorting module is used for sorting the queried resources according to the relevance;
and the list establishing module is used for establishing a resource recommendation list according to the sorting result.
For the above apparatus, in a possible implementation manner, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
For the above apparatus, in one possible implementation, the resources of the first category set are news-like video resources.
According to another aspect of the present disclosure, there is provided an information identifying apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a terminal and/or a server, enable the terminal and/or the server to perform an information recognition method, the method including:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to the information identification method and device, the query information input by the user can be decomposed into the basic information, the occurrence probability of the query information for the resources of the first category set is further obtained, and the query information is identified as the first category query information under the condition that the occurrence probability is larger than or equal to the first threshold value, so that the query information input by the user can be quickly identified.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 2 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 3 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 4 is a block diagram illustrating an information recognition apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an information recognition apparatus according to an example embodiment.
Fig. 6 is a block diagram illustrating an information recognition apparatus according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Example 1
Fig. 1 is a flow chart illustrating an information identification method according to an example embodiment. The method can be applied to a server. As shown in fig. 1, an information identification method according to an embodiment of the present disclosure includes:
step S101, decomposing the query information input by a user into one or more pieces of basic information;
step S102, acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
step S103, under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to embodiments of the present disclosure, the query information input by the user can be decomposed into basic information, the occurrence probability of the query information for the resources of the first category set is then obtained, and the query information is identified as first-class query information when that occurrence probability is greater than or equal to the first threshold, so that the query information input by the user can be identified quickly.
For example, for query information (e.g., query term) input by a user, the query information may be cut into terms, and the query information may be decomposed into one or more basic information (e.g., atomic terms). For example, the query information input by the user is "XX divorce event", where XX may be a person name, and may be decomposed into three basic information of "XX", "divorce", "event". Thus, query information x can be expressed as:
$$x = w_1 + w_2 + \cdots + w_k + \cdots + w_{n-1} + w_n \qquad (1)$$
where $x$ denotes the query information input by the user, $n$ denotes the number of pieces of basic information constituting $x$, and $w_k$ denotes the $k$-th piece of basic information constituting $x$, with $k$ a natural number from 1 to $n$. For the query information $x$ in formula (1), the probability of occurrence of $x$ for the resources of the first category set (the fourth occurrence probability) may be calculated. The resources of the first category set may be resources of a specified category; for example, they may be news video resources. Denoting the first category set as set N and the set of all resources as set V, we have N ⊆ V.
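The text does not specify how query information is segmented into basic information. As a minimal sketch, assuming a hypothetical vocabulary of known basic information, forward maximum matching (a common dictionary-based segmentation technique, swapped in here for illustration only) can produce the pieces $w_1, \ldots, w_n$ of formula (1):

```python
def decompose(query: str, vocabulary: set, max_len: int = 8) -> list:
    """Split query information x into basic information w_1..w_n (formula (1)).

    Forward maximum matching: within each whitespace-separated chunk, greedily
    take the longest vocabulary entry; unmatched characters become single pieces.
    `vocabulary` and `max_len` are illustrative assumptions, not from the source.
    """
    pieces = []
    for chunk in query.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first and shrink until a match is found.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + size]
                if size == 1 or candidate in vocabulary:
                    pieces.append(candidate)
                    i += size
                    break
    return pieces


if __name__ == "__main__":
    vocab = {"XX", "divorce", "event"}
    print(decompose("XX divorce event", vocab))  # ['XX', 'divorce', 'event']
```

Greedy matching also splits unspaced input such as "XXdivorce" into "XX" and "divorce" when both are in the vocabulary; a production system would more likely use a statistical segmenter.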
In a possible implementation manner, a parameter $y$ may be set: $y$ takes the value 1 if the query information $x$ is first-class query information, and 0 otherwise. Therefore, the probability that $x$ is first-class query information (i.e., the probability of occurrence of $x$ for the resources of the first category set) may be represented as $p(y=1 \mid x)$. By the conditional probability formula, $p(y=1 \mid x)$ can be calculated indirectly by modeling $p(x \mid y=1)$, as shown in formula (2):
$$p(y=1 \mid x) = \frac{p(x \mid y=1)\, p(y=1)}{p(x)} \qquad (2)$$
In formula (2), $p(x \mid y=1)$ denotes the probability of $x$ given that $y$ is true (equal to 1); $p(y=1)$ denotes the probability of occurrence of the resources of the first category set among all resources (the set V), i.e., the third occurrence probability of the resources of the first category set for all resources; and $p(x)$ denotes the probability of occurrence of the query information $x$ for all resources. Substituting formula (1) into the Bayes formula, $p(y=1 \mid x)$ can be calculated as shown in formula (3):
$$p(y=1 \mid x) = \frac{p(w_1, w_2, \ldots, w_n \mid y=1)\, p(y=1)}{p(w_1, w_2, \ldots, w_n)} \qquad (3)$$
In one possible implementation, based on the attribute conditional independence assumption, the influences of different pieces of basic information on the classification of the query information can be regarded as mutually independent. This assumption is used to approximate and model $p(w_1, \ldots, w_n \mid y=1)$ and $p(w_1, \ldots, w_n)$ as products over the individual pieces of basic information. Thus, combining formulas (2) and (3) yields formula (4):
$$p(y=1 \mid x) \approx p(y=1) \prod_{k=1}^{n} \frac{p(w_k \mid y=1)}{p(w_k)} \qquad (4)$$
where $p(w_k \mid y=1)$ may represent the second occurrence probability of the $k$-th piece of basic information $w_k$ of the query information $x$ for the resources of the first category set; $p(w_k)$ may represent the first occurrence probability of $w_k$ for all resources; and $p(y=1)$ may represent the probability of occurrence of the resources of the first category set among all resources (i.e., the third occurrence probability of the resources of the first category set for all resources). $p(y=1)$ can be expressed as formula (5):
$$p(y=1) = \frac{1}{m} \sum_{i=1}^{m} y^{(i)} \qquad (5)$$
In formula (5), $m$ denotes the number of all resources; $y^{(i)} = 1$ when the $i$-th resource among all resources belongs to the first category set, and $y^{(i)} = 0$ otherwise; $i$ ranges from 1 to $m$.
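Formula (5) is simply the share of resources in V that also belong to N. A minimal sketch, assuming resources are identified by hypothetical string IDs:

```python
def third_occurrence_probability(all_ids, category_ids):
    """Formula (5): p(y=1) = (1/m) * sum_{i=1..m} y(i).

    y(i) is 1 when the i-th resource of V belongs to the first category set N,
    otherwise 0; m is the number of resources in V.
    """
    m = len(all_ids)
    if m == 0:
        raise ValueError("the set of all resources V is empty")
    n_ids = set(category_ids)
    labels = [1 if rid in n_ids else 0 for rid in all_ids]  # y(1) .. y(m)
    return sum(labels) / m


if __name__ == "__main__":
    print(third_occurrence_probability(["v1", "v2", "v3", "v4"], ["v2", "v4"]))  # 0.5
```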
In a possible implementation manner, according to the attribute conditional independence assumption, the influences of different pieces of basic information on the classification of the query information can be regarded as mutually independent. Denoting the first category set as set N and the set of all resources as set V: $p(w_k \mid y=1)$ represents the probability of occurrence of the basic information $w_k$ in set N (the second occurrence probability); $p(w_k)$ represents the probability of occurrence of $w_k$ in set V (the first occurrence probability); and $p(y=1)$ represents the probability of occurrence of set N in set V (the third occurrence probability). Therefore, based on the first occurrence probability of the basic information for all resources, the second occurrence probability of the basic information for the resources of the first category set, and the third occurrence probability of the resources of the first category set for all resources, the fourth occurrence probability $p(y=1 \mid x)$ of the query information for the resources of the first category set may be obtained according to formula (4). Those skilled in the art will appreciate that, because of the above approximation, the value calculated by formula (4) may fall outside [0, 1].
Those skilled in the art should understand that the first occurrence probability, the second occurrence probability and the third occurrence probability can express the fourth occurrence probability, but the relationship between them is not limited to that shown in formula (4), and those skilled in the art can also adopt other ways to calculate the fourth occurrence probability according to the first occurrence probability, the second occurrence probability and the third occurrence probability, which is not limited by the present disclosure.
In one possible implementation, if the fourth occurrence probability $p(y=1 \mid x)$ of the query information for the resources of the first category set (also referred to as the query information score, or query score) is greater than or equal to a first threshold, the query information may be identified as first-class query information, where the first threshold may be a preset threshold (e.g., 0.005). The first-class query information may be associated with the resources of the first category set. For example, when the resources of the first category set are news video resources and the fourth occurrence probability of the query information for the news video resources is greater than or equal to the first threshold, the query information may be identified as news query information associated with the news video resources.
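Formula (4) and the threshold test can be sketched together. This is an illustration under stated assumptions, not the disclosed implementation: the probability tables are plain dictionaries, unseen basic information is clamped to a small hypothetical `floor`, and the product is accumulated in log space so long queries do not underflow. The numbers in the usage example are made up.

```python
import math


def fourth_occurrence_probability(words, p_all, p_cat, prior, floor=1e-9):
    """Formula (4): p(y=1|x) ~= p(y=1) * prod_k p(w_k|y=1) / p(w_k).

    p_all maps w_k -> first occurrence probability p(w_k),
    p_cat maps w_k -> second occurrence probability p(w_k|y=1),
    prior is the third occurrence probability p(y=1).
    Missing entries are clamped to `floor` (an assumption) to avoid log(0).
    """
    log_score = math.log(prior)
    for w in words:
        log_score += math.log(max(p_cat.get(w, 0.0), floor))
        log_score -= math.log(max(p_all.get(w, 0.0), floor))
    return math.exp(log_score)  # may exceed 1 because of the approximation


def is_first_class(score, first_threshold=0.005):
    """Identify x as first-class query information when score >= threshold."""
    return score >= first_threshold


if __name__ == "__main__":
    # Hypothetical probability tables for the query "XX divorce event".
    p_all = {"XX": 0.01, "divorce": 0.02, "event": 0.1}
    p_cat = {"XX": 0.05, "divorce": 0.04, "event": 0.1}
    score = fourth_occurrence_probability(["XX", "divorce", "event"], p_all, p_cat, prior=0.01)
    print(round(score, 6), is_first_class(score))  # 0.1 True
```

A word that appears equally often in N and V contributes a factor of 1, while words concentrated in N raise the score; that is the intuition behind treating high scores as first-class query information.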
By the method, the query information is identified without depending on accumulated user behaviors, and whether the query information input by the user is the query information of the specific category can be judged quickly, accurately and in real time by estimating the probability that the query information is the query information of the specific category, so that the related query result is provided for the user more accurately. By taking news query information as an example, according to the embodiment of the disclosure, whether the keyword input by the user is a news query word or not can be quickly identified in real time, and the matched latest and hottest news message or news video and other query results are recommended to the user.
In one possible embodiment, the resources of the first category set may be resources uploaded by users related to the resources of the first category set during a first time interval.
The users related to the resources of the first category set may be, for example, users who frequently upload resources of the first category.
For example, a user list (user_list) may be obtained as the seed user set U according to whether each user frequently uploads resources of the first category (e.g., news video resources). Using the seed user set U, the resources of the first category set (set N) uploaded by the users in U within the first time interval may be acquired in real time (e.g., by crawling data every 10 minutes). The first time interval may be preset. For news video resources, to keep the news videos up to date and to identify time-sensitive news terms (query information) promptly, the first time interval may be set relatively short, for example, 48 hours.
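The seed-user selection and time-window filtering above can be sketched as follows. The `Upload` record, the category label, and the `min_uploads` cutoff for "frequently uploads" are hypothetical; the 48-hour default mirrors the example interval in the text.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Upload:
    resource_id: str
    uploader: str
    category: str          # hypothetical label, e.g. "news_video"
    uploaded_at: datetime


def select_seed_users(uploads, category, min_uploads):
    """Seed user set U: users whose uploads of `category` reach `min_uploads`."""
    counts = Counter(u.uploader for u in uploads if u.category == category)
    return {user for user, n in counts.items() if n >= min_uploads}


def first_category_set(uploads, seed_users, now, interval=timedelta(hours=48)):
    """Set N: resources uploaded by users in U within the first time interval."""
    start = now - interval
    return {u.resource_id for u in uploads
            if u.uploader in seed_users and start <= u.uploaded_at <= now}


if __name__ == "__main__":
    now = datetime(2020, 1, 3, 12)
    uploads = [
        Upload("r1", "alice", "news_video", now - timedelta(hours=1)),
        Upload("r2", "alice", "news_video", now - timedelta(hours=72)),
        Upload("r3", "bob", "news_video", now - timedelta(hours=2)),
    ]
    seeds = select_seed_users(uploads, "news_video", min_uploads=2)
    print(seeds, first_category_set(uploads, seeds, now))  # {'alice'} {'r1'}
```

In a real-time setting, `first_category_set` would be re-run on each crawl (e.g., every 10 minutes) so that set N slides forward with the window.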
Fig. 2 is a flow chart illustrating an information identification method according to an example embodiment. As shown in fig. 2, in one possible implementation, the method further includes:
step S104, acquiring first resource information of all resources;
step S105, acquiring second resource information of the resources of the first category set;
step S106, obtaining the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For example, for all resources (set V) in the database, first resource information of all resources may be obtained, and the first resource information may include one or more of a resource identification ID, a title, and a tag.
In a possible implementation manner, as described above, the resource (set N) of the first category set uploaded by the seed user set U in the first time interval may be obtained in real time, and the second resource information of the resource of the first category set may be obtained, where the second resource information may include one or more of a resource identification ID, a title, and a tag.
In a possible implementation manner, for all basic information that may occur, from the first resource information and the second resource information, an occurrence probability of the basic information in the first resource information (e.g., resource identification ID, title, and/or label) of all resources (set V) may be calculated as a first occurrence probability of the basic information for all resources (set V); an occurrence probability of the basic information in the second resource information (e.g., resource identification ID, title, and/or label) of the resource (set N) of the first category set may be calculated as the second occurrence probability of the basic information for the resource (set N) of the first category set. When all the resources (set V) and the resource (set N) of the first category set are acquired, the occurrence probability of the second resource information of the resource (set N) of the first category set in the first resource information of all the resources (set V) may be further calculated as a third occurrence probability of the resource (set N) of the first category set for all the resources (set V), and the first occurrence probability, the second occurrence probability, and the third occurrence probability may be stored. Therefore, when the query information is identified on line in real time, the query information input by the user can be directly searched after being decomposed into the basic information, and the fourth occurrence probability of the query information can be obtained by performing simple mathematical operation according to, for example, the formula (4), so that whether the query information is the first-class query information is identified.
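The offline precomputation described above can be sketched as building three lookup tables. This sketch makes two assumptions the text leaves open: "occurrence" is measured as the share of resources whose title or tags contain the basic information (document frequency), and titles are tokenized by whitespace. The resource identification ID is used only as a dictionary key.

```python
from collections import Counter


def occurrence_probabilities(resources, tokenize=str.split):
    """Share of resources whose title or tags contain each piece of basic information.

    `resources` maps resource ID -> (title, tags). Document frequency is an
    assumed way to measure "occurrence"; raw token counts would be another option.
    """
    if not resources:
        raise ValueError("no resources given")
    doc_freq = Counter()
    for title, tags in resources.values():
        doc_freq.update(set(tokenize(title)) | set(tags))  # count once per resource
    total = len(resources)
    return {w: n / total for w, n in doc_freq.items()}


def build_probability_tables(all_resources, category_ids):
    """Precompute first, second, and third occurrence probabilities for online lookup."""
    category = {rid: all_resources[rid] for rid in category_ids}
    p_all = occurrence_probabilities(all_resources)   # first: p(w) over V
    p_cat = occurrence_probabilities(category)        # second: p(w|y=1) over N
    prior = len(category) / len(all_resources)        # third: p(y=1)
    return p_all, p_cat, prior


if __name__ == "__main__":
    all_resources = {
        "v1": ("XX divorce news", ["event"]),
        "v2": ("cooking tips", []),
        "v3": ("XX concert", ["music"]),
        "v4": ("divorce event update", ["event"]),
    }
    p_all, p_cat, prior = build_probability_tables(all_resources, ["v1", "v4"])
    print(p_all["divorce"], p_cat["divorce"], prior)  # 0.5 1.0 0.5
```

With these document-frequency estimates, formula (4) for a single-word query reduces to the fraction of resources containing that word that lie in N, which stays within [0, 1]; multi-word queries can still exceed 1, consistent with the caveat about the approximation.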
By the method, the first resource information and the second resource information can be obtained, the first occurrence probability and the second occurrence probability are further obtained, and the identification efficiency of the query information is improved.
Fig. 3 is a flow chart illustrating an information identification method according to an example embodiment. As shown in fig. 3, in one possible implementation, the method further includes:
step S107, under the condition that the query information is the first-class query information, acquiring resources queried according to the query information;
step S108, obtaining the relevance between the queried resources and the query information;
step S109, sorting the queried resources according to the relevance;
and step S110, establishing a resource recommendation list according to the sorting result.
For example, if the query information input by the user has been identified as the first category of query information, the associated resources may be recommended to the user. The degree of correlation between the resource and the query information may be predetermined. For example, for a certain query information in the query log, a resource (e.g., a video resource) queried by using the query information may be manually labeled, and the degree of correlation between the resource and the query information may be labeled, for example, each video may be scored, with a score of 1-5, which indicates that the degree of correlation between the resource and the query information is from low to high. When the manually labeled resources reach a certain amount (for example, 800-. The present disclosure does not limit the categories and specific models of the features of the extracted annotated resources.
In one possible implementation manner, the queried resources may be sorted from high to low according to their relevance to the query information input by the user. Where the relevance is the same or similar, the newest resource may be ranked first in the resource recommendation list. A resource recommendation list may then be established according to the sorting result and displayed to the user, so that the resources matching the query information are returned for the user to view.
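The ranking rule (relevance first, recency as tie-breaker) maps onto a single tuple sort key. A minimal sketch, assuming each queried resource already carries a relevance score and an upload time; treating scores that round to the same value as "similar" is a hypothetical choice controlled by `precision`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueriedResource:
    resource_id: str
    relevance: float        # e.g. a predicted score on the 1-5 scale
    uploaded_at: datetime


def build_recommendation_list(resources, precision=1, top_k=None):
    """Sort by relevance (high to low); among equal or similar relevance,
    put the newest resource first. Relevance values that round to the same
    value at `precision` decimals count as "similar" (an assumption)."""
    ranked = sorted(
        resources,
        key=lambda r: (round(r.relevance, precision), r.uploaded_at),
        reverse=True,
    )
    ids = [r.resource_id for r in ranked]
    return ids if top_k is None else ids[:top_k]


if __name__ == "__main__":
    items = [
        QueriedResource("a", 4.52, datetime(2020, 1, 1)),
        QueriedResource("b", 4.49, datetime(2020, 1, 2)),
        QueriedResource("c", 3.0, datetime(2020, 1, 3)),
    ]
    print(build_recommendation_list(items))  # ['b', 'a', 'c']
```

Here "a" and "b" both round to 4.5, so the newer "b" is placed first even though its raw score is slightly lower.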
By the method, the relevance between the inquired resources and the inquired information can be obtained, the inquired resources are sequenced according to the relevance, and the resource recommendation list is established, so that the user experience is improved.
Example 2
Fig. 4 is a block diagram illustrating an information recognition apparatus according to an exemplary embodiment. As shown in fig. 4, the information recognition apparatus includes: an information decomposition module 401, a query information probability acquisition module 402, and an information identification module 403.
An information decomposition module 401, configured to decompose query information input by a user into one or more pieces of basic information;
a query information probability obtaining module 402, configured to obtain a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
an information identifying module 403, configured to identify the query information as first category query information when the fourth probability of occurrence is greater than or equal to a first threshold.
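The three modules can be pictured as pluggable stages of one pipeline. A minimal sketch, assuming each module is a plain Python callable; the class and field names are hypothetical and not part of the disclosure, and the lambda in the usage example is a toy stand-in for a real formula (4) scorer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class InformationIdentificationApparatus:
    # Module 401: query information -> basic information w_1..w_n
    decompose: Callable[[str], Sequence[str]]
    # Module 402: basic information -> fourth occurrence probability p(y=1|x)
    fourth_probability: Callable[[Sequence[str]], float]
    # Threshold used by module 403 (0.005 is the example value from the text)
    first_threshold: float = 0.005

    def identify(self, query: str) -> bool:
        """Module 403: first-class query information iff score >= threshold."""
        words = self.decompose(query)
        return self.fourth_probability(words) >= self.first_threshold


if __name__ == "__main__":
    # Toy scorer: 0.01 per word, standing in for formula (4).
    app = InformationIdentificationApparatus(str.split, lambda words: 0.01 * len(words))
    print(app.identify("XX divorce event"))  # True
```

Keeping the stages as injected callables lets the segmenter and the probability tables be replaced independently, which matches the module-by-module description of the apparatus.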
In one possible implementation, the resources of the first category set are resources uploaded by users related to the resources of the first category set during a first time interval.
Fig. 5 is a block diagram illustrating an information recognition apparatus according to an example embodiment. As shown in fig. 5, in a possible implementation manner, the apparatus further includes:
a first resource information obtaining module 404, configured to obtain first resource information of all resources;
a second resource information obtaining module 405, configured to obtain second resource information of resources in the first category set;
a resource information probability obtaining module 406, configured to obtain the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
As shown in fig. 5, in a possible implementation manner, the apparatus further includes:
a resource obtaining module 407, configured to obtain a resource queried according to the query information when the query information is first-class query information;
a relevancy obtaining module 408, configured to obtain the relevance between the queried resources and the query information;
a resource sorting module 409, configured to sort the queried resources according to the relevance;
and a list establishing module 410, configured to establish a resource recommendation list according to the sorting result.
In one possible implementation, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
In one possible implementation, the resources of the first category set are news-like video resources.
Example 3
Fig. 6 is a block diagram illustrating an information recognition apparatus 1900 according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 6, the device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 1932 that includes instructions, which are executable by the processing component 1922 of the apparatus 1900 to perform the above-described method.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenarios, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA) is personalized by using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and the disclosure is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.