CN106897407B - Information identification method and device - Google Patents

Publication number
CN106897407B
Authority
CN
China
Prior art keywords
resources
information
resource
occurrence probability
query information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710081967.0A
Other languages
Chinese (zh)
Other versions
CN106897407A (en)
Inventor
司竹月
王志勇
刘尚堃
王建宇
潘柏宇
项青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN201710081967.0A
Publication of CN106897407A
Application granted
Publication of CN106897407B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to an information identification method and device. The method comprises the following steps: decomposing query information input by a user into basic information; acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set and a third occurrence probability of the resources of the first category set for all the resources; and under the condition that the fourth occurrence probability is greater than or equal to the first threshold value, identifying the query information as the first category query information. According to the method and the device for identifying the query information, the query information input by the user can be decomposed into the basic information, the occurrence probability of the query information for the resources of the first category set is further obtained, and the query information is identified as the first category query information when the occurrence probability is larger than or equal to the first threshold, so that the query information input by the user can be identified quickly.

Description

Information identification method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information identification method and apparatus.
Background
With the increasing popularity of networks, people increasingly query information through networks. In the related art, whether query information input by a user is hot information is generally identified based on how frequently multiple users input that query information within a certain period. This requires time to accumulate user behavior, so identification lags behind. Moreover, when the query volume of a piece of real-time query information remains small and too little has accumulated, neither the timeliness nor the popularity of the query information can be identified, so the required result cannot be returned to the user.
Disclosure of Invention
In view of this, the present disclosure provides an information identification method and apparatus, which can quickly identify query information input by a user.
According to an aspect of the present disclosure, there is provided an information identifying method including:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
For the above method, in one possible implementation, the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
For the above method, in one possible implementation, the method further includes:
acquiring first resource information of all resources;
acquiring second resource information of the resources of the first category set;
obtaining the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For the above method, in one possible implementation, the method further includes:
under the condition that the query information is first-class query information, acquiring resources queried according to the query information;
obtaining the correlation degree of the inquired resources and the inquired information;
sequencing the inquired resources according to the relevance;
and establishing a resource recommendation list according to the sequencing result.
For the above method, in a possible implementation manner, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
For the above method, in one possible implementation, the resources of the first category set are news-like video resources.
According to another aspect of the present disclosure, there is provided an information identifying apparatus including:
the information decomposition module is used for decomposing the query information input by the user into one or more pieces of basic information;
a query information probability obtaining module, configured to obtain a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and the information identification module is used for identifying the query information as the first type query information under the condition that the fourth occurrence probability is greater than or equal to a first threshold value.
For the above apparatus, in one possible implementation manner, the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
the first resource information acquisition module is used for acquiring first resource information of all resources;
the second resource information acquisition module is used for acquiring second resource information of the resources in the first category set;
a resource information probability obtaining module, configured to obtain the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
the resource acquisition module is used for acquiring the resources inquired according to the inquiry information under the condition that the inquiry information is the first-class inquiry information;
a relevancy obtaining module, configured to obtain relevancy between the queried resource and the queried information;
the resource sorting module is used for sorting the inquired resources according to the relevance;
and the list establishing module is used for establishing a resource recommendation list according to the sequencing result.
For the above apparatus, in a possible implementation manner, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
For the above apparatus, in one possible implementation, the resources of the first category set are news-like video resources.
According to another aspect of the present disclosure, there is provided an information identifying apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a terminal and/or a server, enable the terminal and/or the server to perform an information recognition method, the method including:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
and under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to the information identification method and device, the query information input by the user can be decomposed into the basic information, the occurrence probability of the query information for the resources of the first category set is further obtained, and the query information is identified as the first category query information under the condition that the occurrence probability is larger than or equal to the first threshold value, so that the query information input by the user can be quickly identified.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 2 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 3 is a flow chart illustrating an information identification method according to an example embodiment.
Fig. 4 is a block diagram illustrating an information recognition apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an information recognition apparatus according to an example embodiment.
Fig. 6 is a block diagram illustrating an information recognition apparatus according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Example 1
Fig. 1 is a flow chart illustrating an information identification method according to an example embodiment. The method can be applied to a server. As shown in fig. 1, an information identification method according to an embodiment of the present disclosure includes:
step S101, decomposing the query information input by a user into one or more pieces of basic information;
step S102, acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
step S103, under the condition that the fourth occurrence probability is greater than or equal to a first threshold value, identifying the query information as first-class query information.
According to the method and the device for identifying the query information, the query information input by the user can be decomposed into the basic information, the occurrence probability of the query information for the resources of the first category set is further obtained, and the query information is identified as the first category query information under the condition that the occurrence probability is larger than or equal to the first threshold value, so that the query information input by the user can be quickly identified.
For example, query information (e.g., a query term) input by a user may be segmented into words, so that it is decomposed into one or more pieces of basic information (e.g., atomic terms). For example, if the query information input by the user is "XX divorce event", where XX may be a person's name, it may be decomposed into three pieces of basic information: "XX", "divorce", and "event". Thus, query information x can be expressed as:
$$x = w_1 + w_2 + \dots + w_k + \dots + w_{n-1} + w_n \tag{1}$$
where x represents the query information input by a user, n represents the number of pieces of basic information constituting the query information x, and $w_k$ represents the kth piece of basic information constituting the query information x, with k a natural number between 1 and n. For the query information x in formula (1), the occurrence probability (fourth occurrence probability) of the query information x for the resources of the first category set may be calculated. The resources of the first category set may be resources of a specified category; for example, they may be news video resources. The first category set may be represented as set N and the set of all resources as set V, so that $N \subseteq V$.
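The decomposition in formula (1) can be sketched as follows. This is a minimal illustration only: it assumes the query is already whitespace-delimited, and the function name `decompose_query` is hypothetical. The disclosure does not specify a word segmenter, and Chinese-language queries would need one in practice.

```python
def decompose_query(query: str) -> list[str]:
    """Split query information x into basic information w_1..w_n.

    Assumption: terms are separated by whitespace. This stands in for
    whatever word segmentation a real system would use.
    """
    return query.split()


# The example query from the text yields three pieces of basic information.
terms = decompose_query("XX divorce event")
print(terms)
```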
In a possible implementation manner, a parameter y may be set: if the query information x is first-category query information, y takes the value 1; otherwise, y takes the value 0. Therefore, the probability that the query information x is first-category query information (i.e., the occurrence probability of the query information x for the resources of the first category set) may be represented as $p(y = 1 \mid x)$. With the conditional probability formula, $p(y = 1 \mid x)$ can be indirectly calculated by modeling $p(x \mid y = 1)$, as shown in formula (2).
$$p(y = 1 \mid x) = \frac{p(x \mid y = 1)\, p(y = 1)}{p(x)} \tag{2}$$
In formula (2), $p(x \mid y = 1)$ may represent the probability of x given that y is true (equal to 1), $p(y = 1)$ may represent the occurrence probability of the resources of the first category set among all the resources (the set of all the resources may be represented as the set V), that is, the third occurrence probability of the resources of the first category set for all the resources, and $p(x)$ may represent the occurrence probability of the query information x for all the resources. Using the Bayes formula, the ratio
$$\frac{p(x \mid y = 1)}{p(x)}$$
can be calculated as shown in formula (3):
$$\frac{p(x \mid y = 1)}{p(x)} \approx \prod_{k=1}^{n} \frac{p(w_k \mid y = 1)}{p(w_k)} \tag{3}$$
In one possible implementation, based on the attribute conditional independence assumption, the influences of different pieces of basic information on the classification of the query information can be considered mutually independent. This assumption is used to approximately model
$$\frac{p(x \mid y = 1)}{p(x)}$$
Thus, combining formulas (2) and (3) yields formula (4):
$$p(y = 1 \mid x) \approx p(y = 1) \prod_{k=1}^{n} \frac{p(w_k \mid y = 1)}{p(w_k)} \tag{4}$$
where $p(w_k \mid y = 1)$ may represent the second occurrence probability of the kth piece of basic information $w_k$ of the query information x for the resources of the first category set, $p(w_k)$ may represent the first occurrence probability of the basic information $w_k$ for all the resources, and $p(y = 1)$ may represent the occurrence probability of the resources of the first category set among all the resources (i.e., the third occurrence probability of the resources of the first category set for all the resources). Here $p(y = 1)$ can be expressed as formula (5):
$$p(y = 1) = \frac{1}{m} \sum_{i=1}^{m} y(i) \tag{5}$$
In formula (5), m represents the total number of resources; y(i) = 1 when the resource information of the ith resource among all the resources is the resource information of a resource of the first category set, and y(i) = 0 otherwise; i ranges from 1 to m.
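Going by the definitions of m and y(i) above, formula (5) amounts to the fraction of all resources that belong to the first category set. A minimal sketch follows; the function name `third_occurrence_probability` and the list-of-flags input are illustrative assumptions, not part of the disclosure.

```python
def third_occurrence_probability(labels):
    """Estimate p(y = 1) as the mean of the indicators y(1)..y(m).

    labels: sequence of 0/1 flags, one per resource, where 1 means the
    resource belongs to the first category set.
    """
    m = len(labels)
    if m == 0:
        raise ValueError("need at least one resource")
    return sum(labels) / m


# Two of four resources are in the first category set.
p_y = third_occurrence_probability([1, 0, 0, 1])
print(p_y)
```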
In a possible implementation manner, according to the attribute conditional independence assumption, the influences of different pieces of basic information on the classification of the query information can be considered mutually independent. Representing the first category set as set N and the set of all resources as set V, one can consider that: $p(w_k \mid y = 1)$ represents the probability of occurrence of the basic information $w_k$ in set N (second occurrence probability); $p(w_k)$ represents the probability of occurrence of the basic information $w_k$ in set V (first occurrence probability); and $p(y = 1)$ represents the probability of occurrence of set N in set V (third occurrence probability). Therefore, based on the first occurrence probability of the basic information for all resources, the second occurrence probability of the basic information for the resources of the first category set, and the third occurrence probability of the resources of the first category set for all resources, the fourth occurrence probability $p(y = 1 \mid x)$ of the query information for the resources of the first category set may be obtained according to formula (4). It will be appreciated by those skilled in the art that, due to the above approximation, the result of formula (4) may fall outside [0, 1].
Those skilled in the art should understand that the first occurrence probability, the second occurrence probability and the third occurrence probability can express the fourth occurrence probability, but the relationship between them is not limited to that shown in formula (4), and those skilled in the art can also adopt other ways to calculate the fourth occurrence probability according to the first occurrence probability, the second occurrence probability and the third occurrence probability, which is not limited by the present disclosure.
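As one illustration of combining the three probabilities, the sketch below assumes that formula (4) takes the naive Bayes form p(y=1) multiplied by the product over k of p(w_k | y=1) / p(w_k). The function name, the dictionary inputs, the numeric values, and the policy of skipping terms never seen in any resource are all assumptions made for this example.

```python
def fourth_occurrence_probability(terms, p_all, p_cat, p_y):
    """Score a query as p(y=1) * prod_k p(w_k | y=1) / p(w_k).

    terms: basic information w_1..w_n of the query
    p_all: dict mapping term -> first occurrence probability p(w_k)
    p_cat: dict mapping term -> second occurrence probability p(w_k | y=1)
    p_y:   third occurrence probability p(y=1)
    """
    score = p_y
    for w in terms:
        first = p_all.get(w, 0.0)
        if first == 0.0:
            # Assumption: a term absent from every resource carries no
            # evidence either way, so it leaves the score unchanged.
            continue
        score *= p_cat.get(w, 0.0) / first
    return score


# Made-up probabilities for the "XX divorce event" example.
p_all = {"XX": 0.01, "divorce": 0.02, "event": 0.05}
p_cat = {"XX": 0.05, "divorce": 0.04, "event": 0.10}
score = fourth_occurrence_probability(["XX", "divorce", "event"], p_all, p_cat, 0.1)
print(score)
```

With these made-up inputs the score comes out at roughly 2.0, which shows how the approximation can produce a value outside [0, 1].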
In one possible implementation, if the fourth occurrence probability $p(y = 1 \mid x)$ of the query information for the resources of the first category set (also referred to as the query information score, or query score) is greater than or equal to a first threshold, the query information may be identified as first-category query information, where the first threshold may be a preset threshold (e.g., 0.005). The first-category query information may be associated with the resources of the first category set. For example, when the resources of the first category set are news video resources, if the occurrence probability (fourth occurrence probability) of the query information for the news video resources is greater than or equal to the first threshold, the query information may be identified as news query information associated with the news video resources.
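The threshold test itself is a single comparison. The sketch below uses the example value 0.005 from the text; the constant and function names are hypothetical.

```python
FIRST_THRESHOLD = 0.005  # example preset value mentioned in the text


def is_first_category_query(score, threshold=FIRST_THRESHOLD):
    """Return True when the query score is greater than or equal to the threshold."""
    return score >= threshold


print(is_first_category_query(0.02))   # score above the threshold
print(is_first_category_query(0.001))  # score below the threshold
```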
By the method, the query information is identified without depending on accumulated user behaviors, and whether the query information input by the user is the query information of the specific category can be judged quickly, accurately and in real time by estimating the probability that the query information is the query information of the specific category, so that the related query result is provided for the user more accurately. By taking news query information as an example, according to the embodiment of the disclosure, whether the keyword input by the user is a news query word or not can be quickly identified in real time, and the matched latest and hottest news message or news video and other query results are recommended to the user.
In one possible embodiment, the resources of the first category set may be resources uploaded by users related to the resources of the first category set during a first time interval.
The users related to the resources of the first category set may be, for example, users who frequently upload resources of the first category.
For example, a user list (user_list) may be obtained as the seed user set U according to whether each user frequently uploads resources of the first category (e.g., news video resources). With the seed user set U, the resources of the first category set (set N) uploaded by the users in set U within the first time interval may be acquired in real time (e.g., by crawling data every 10 minutes). The first time interval may be a preset time interval. For news video resources, to keep the news videos up to date and to identify timely news terms (query information) promptly, the first time interval may be set relatively short, for example 48 hours.
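Selecting set N from the seed users and the first time interval can be sketched as a simple filter. The record layout (dictionaries with `uploader` and `uploaded_at` keys), the function name, and the sample data are assumptions for illustration; the 48-hour default mirrors the example in the text.

```python
from datetime import datetime, timedelta


def first_category_resources(resources, seed_users, now, window=timedelta(hours=48)):
    """Return resources uploaded by seed users within the window ending at `now`.

    resources: iterable of dicts with hypothetical keys "uploader" and
    "uploaded_at" (a datetime).
    seed_users: set of user identifiers forming the seed user set U.
    """
    cutoff = now - window
    return [
        r for r in resources
        if r["uploader"] in seed_users and cutoff <= r["uploaded_at"] <= now
    ]


now = datetime(2017, 2, 15, 12, 0)
resources = [
    {"id": 1, "uploader": "u1", "uploaded_at": now - timedelta(hours=1)},
    {"id": 2, "uploader": "u1", "uploaded_at": now - timedelta(hours=72)},  # too old
    {"id": 3, "uploader": "u9", "uploaded_at": now - timedelta(hours=1)},   # not a seed user
]
set_n = first_category_resources(resources, {"u1"}, now)
print([r["id"] for r in set_n])
```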
Fig. 2 is a flow chart illustrating an information identification method according to an example embodiment. As shown in fig. 2, in one possible implementation, the method further includes:
step S104, acquiring first resource information of all resources;
step S105, acquiring second resource information of the resources of the first category set;
step S106, obtaining the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
For example, for all resources (set V) in the database, first resource information of all resources may be obtained, and the first resource information may include one or more of a resource identification ID, a title, and a tag.
In a possible implementation manner, as described above, the resource (set N) of the first category set uploaded by the seed user set U in the first time interval may be obtained in real time, and the second resource information of the resource of the first category set may be obtained, where the second resource information may include one or more of a resource identification ID, a title, and a tag.
In a possible implementation manner, for every piece of basic information that may occur, the following may be calculated from the first resource information and the second resource information: the occurrence probability of the basic information in the first resource information (e.g., resource identification ID, title, and/or label) of all resources (set V), taken as the first occurrence probability of the basic information for all resources (set V); and the occurrence probability of the basic information in the second resource information (e.g., resource identification ID, title, and/or label) of the resources of the first category set (set N), taken as the second occurrence probability of the basic information for the resources of the first category set (set N). Once all resources (set V) and the resources of the first category set (set N) have been acquired, the occurrence probability of the second resource information of the resources of the first category set (set N) in the first resource information of all resources (set V) may further be calculated as the third occurrence probability of the resources of the first category set (set N) for all resources (set V). The first, second, and third occurrence probabilities may then be stored. Therefore, when query information is identified online in real time, the query information input by the user can be decomposed into basic information, the stored probabilities can be looked up directly, and the fourth occurrence probability of the query information can be obtained by a simple mathematical operation according to, for example, formula (4), thereby identifying whether the query information is first-category query information.
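The offline precomputation described above can be sketched as follows. The disclosure does not fix an estimator, so this example assumes each occurrence probability is the fraction of resources whose title or tags contain the term; the function name and the sample data are hypothetical.

```python
from collections import Counter


def occurrence_probabilities(resource_terms):
    """Map each term to the fraction of resources whose info contains it.

    resource_terms: list of term lists, one per resource (for example the
    segmented title and tags). Assumption: probability is estimated as a
    per-resource document frequency.
    """
    total = len(resource_terms)
    if total == 0:
        return {}
    counts = Counter()
    for terms in resource_terms:
        counts.update(set(terms))  # count each term at most once per resource
    return {term: n / total for term, n in counts.items()}


# Set V (all resources) with a flag marking membership in set N.
all_terms = [["XX", "divorce"], ["cooking", "show"], ["XX", "concert"], ["travel"]]
in_set_n = [1, 0, 0, 0]

p_all = occurrence_probabilities(all_terms)                       # first probability
p_cat = occurrence_probabilities(
    [t for t, y in zip(all_terms, in_set_n) if y]                 # second probability
)
p_y = sum(in_set_n) / len(in_set_n)                               # third probability
print(p_all["XX"], p_cat["divorce"], p_y)
```

Storing `p_all`, `p_cat`, and `p_y` ahead of time leaves only dictionary lookups and a few multiplications for the online step.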
By the method, the first resource information and the second resource information can be obtained, the first occurrence probability and the second occurrence probability are further obtained, and the identification efficiency of the query information is improved.
Fig. 3 is a flow chart illustrating an information identification method according to an example embodiment. As shown in fig. 3, in one possible implementation, the method further includes:
step S107, under the condition that the query information is the first-class query information, acquiring resources queried according to the query information;
step S108, obtaining the correlation degree of the inquired resources and the inquired information;
step S109, sorting the inquired resources according to the relevance;
and step S110, establishing a resource recommendation list according to the sequencing result.
For example, if the query information input by the user has been identified as the first category of query information, the associated resources may be recommended to the user. The degree of correlation between the resource and the query information may be predetermined. For example, for a certain query information in the query log, a resource (e.g., a video resource) queried by using the query information may be manually labeled, and the degree of correlation between the resource and the query information may be labeled, for example, each video may be scored, with a score of 1-5, which indicates that the degree of correlation between the resource and the query information is from low to high. When the manually labeled resources reach a certain amount (for example, 800-. The present disclosure does not limit the categories and specific models of the features of the extracted annotated resources.
In one possible implementation manner, the queried resources may be sorted from high to low according to the degree of relevance between each resource found for the query information input by the user and that query information. Where relevance is the same or similar, the most recent resource may be ranked higher in the resource recommendation list. A resource recommendation list can then be established according to the sorting result and displayed to the user, so that the resources matching the query information are returned for the user to view.
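The ordering rule can be sketched with a compound sort key. The keys `relevance` and `uploaded_at` and the function name are hypothetical, and this sketch breaks ties only when relevance values are exactly equal; handling "similar" relevance would need a bucketing rule that the text does not specify.

```python
def build_recommendation_list(resources):
    """Sort by relevance from high to low; on ties, newer resources first.

    resources: list of dicts with hypothetical keys "relevance" (a number)
    and "uploaded_at" (any comparable timestamp, larger means newer).
    """
    return sorted(
        resources,
        key=lambda r: (r["relevance"], r["uploaded_at"]),
        reverse=True,
    )


queried = [
    {"id": "a", "relevance": 3, "uploaded_at": 10},
    {"id": "b", "relevance": 5, "uploaded_at": 1},
    {"id": "c", "relevance": 5, "uploaded_at": 7},
]
ranked = build_recommendation_list(queried)
print([r["id"] for r in ranked])
```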
By the method, the relevance between the inquired resources and the inquired information can be obtained, the inquired resources are sequenced according to the relevance, and the resource recommendation list is established, so that the user experience is improved.
Example 2
Fig. 4 is a block diagram illustrating an information recognition apparatus according to an exemplary embodiment. As shown in fig. 4, the information recognition apparatus includes: an information decomposition module 401, a query information probability acquisition module 402, and an information identification module 403.
An information decomposition module 401, configured to decompose query information input by a user into one or more pieces of basic information;
a query information probability obtaining module 402, configured to obtain a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
an information identifying module 403, configured to identify the query information as first category query information when the fourth probability of occurrence is greater than or equal to a first threshold.
In one possible implementation, the resources of the first category set are resources uploaded by users related to the resources of the first category set during a first time interval.
Fig. 5 is a block diagram illustrating an information recognition apparatus according to an example embodiment. As shown in fig. 5, in a possible implementation manner, the apparatus further includes:
a first resource information obtaining module 404, configured to obtain first resource information of all resources;
a second resource information obtaining module 405, configured to obtain second resource information of resources in the first category set;
a resource information probability obtaining module 406, configured to obtain the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
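One way modules 404 to 406 might derive the occurrence probabilities from resource information is sketched below. It assumes that each resource's title and labels have already been split into terms, and that an occurrence probability is estimated as the fraction of resources whose information contains the term. That estimator, and all names in the code, are assumptions made for illustration rather than details given in this section.

```python
from collections import Counter


def term_document_frequency(resources):
    """Fraction of resources whose information contains each term."""
    counts = Counter()
    for terms in resources:
        counts.update(set(terms))  # count each term once per resource
    n = len(resources)
    return {t: c / n for t, c in counts.items()} if n else {}


def estimate_probabilities(resource_terms, category_ids):
    """resource_terms: dict mapping resource id -> list of terms
    category_ids:   ids of the resources in the first category set
    """
    all_docs = list(resource_terms.values())
    cat_docs = [resource_terms[i] for i in category_ids if i in resource_terms]
    p_all = term_document_frequency(all_docs)   # first occurrence probability
    p_cat = term_document_frequency(cat_docs)   # second occurrence probability
    # Share of the category among all resources (third occurrence probability).
    p_category = len(cat_docs) / len(all_docs) if all_docs else 0.0
    return p_all, p_cat, p_category
```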
As shown in fig. 5, in a possible implementation manner, the apparatus further includes:
a resource obtaining module 407, configured to obtain the resources queried according to the query information when the query information is first category query information;
a relevance obtaining module 408, configured to obtain the relevance between the queried resources and the query information;
a resource sorting module 409, configured to sort the queried resources according to the relevance;
and a list establishing module 410, configured to establish a resource recommendation list according to the sorting result.
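The ranking path through modules 407 to 410 can be sketched as follows. This section does not define how relevance is measured, so the sketch substitutes a simple stand-in: the fraction of query terms that appear in a resource's terms. The function name, the `top_n` parameter, and the relevance measure are hypothetical.

```python
def build_recommendation_list(query_terms, resources, top_n=10):
    """resources: dict mapping resource id -> list of terms (title, labels)."""
    q = set(query_terms)

    def relevance(terms):
        # Stand-in relevance: share of query terms found in the resource.
        return len(q & set(terms)) / len(q) if q else 0.0

    scored = [(rid, relevance(terms)) for rid, terms in resources.items()]
    # Sort by descending relevance; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [rid for rid, _ in scored[:top_n]]
```

The returned list of resource ids plays the role of the resource recommendation list; a production system would likely use a richer relevance score.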
In one possible implementation, the first resource information and the second resource information respectively include one or more of the following: resource identification, title, and label.
In one possible implementation, the resources of the first category set are news-like video resources.
Example 3
Fig. 6 is a block diagram illustrating an information recognition apparatus 1900 according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 6, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The apparatus 1900 may also include a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 1932 that includes instructions, which are executable by the processing component 1922 of the apparatus 1900 to perform the above-described method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the scenario involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. An information identification method, comprising:
decomposing query information input by a user into one or more pieces of basic information;
acquiring a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
identifying the query information as first category query information if the fourth occurrence probability is greater than or equal to a first threshold,
wherein the all resources are a plurality of resources uploaded by all users, and the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
2. The method of claim 1, further comprising:
acquiring first resource information of all resources;
acquiring second resource information of the resources of the first category set;
obtaining the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
3. The method of claim 1, further comprising:
acquiring resources queried according to the query information if the query information is first category query information;
obtaining the relevance between the queried resources and the query information;
sorting the queried resources according to the relevance;
and establishing a resource recommendation list according to the sorting result.
4. The method of claim 2, wherein the first resource information and the second resource information respectively comprise one or more of: resource identification, title, and label.
5. The method of claim 1, wherein the resources of the first category set are news-like video resources.
6. An information identifying apparatus, comprising:
an information decomposition module, configured to decompose query information input by a user into one or more pieces of basic information;
a query information probability obtaining module, configured to obtain a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
an information identification module, configured to identify the query information as first category query information if the fourth occurrence probability is greater than or equal to a first threshold,
wherein the all resources are a plurality of resources uploaded by all users, and the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
7. The apparatus of claim 6, further comprising:
a first resource information obtaining module, configured to obtain first resource information of all resources;
a second resource information obtaining module, configured to obtain second resource information of the resources of the first category set;
a resource information probability obtaining module, configured to obtain the first occurrence probability and the second occurrence probability based on the first resource information and the second resource information.
8. The apparatus of claim 6, further comprising:
a resource obtaining module, configured to obtain the resources queried according to the query information if the query information is first category query information;
a relevance obtaining module, configured to obtain the relevance between the queried resources and the query information;
a resource sorting module, configured to sort the queried resources according to the relevance;
and a list establishing module, configured to establish a resource recommendation list according to the sorting result.
9. The apparatus of claim 7, wherein the first resource information and the second resource information respectively comprise one or more of: resource identification, title, and label.
10. The apparatus of claim 6, wherein the resources of the first category set are news-like video resources.
11. An information identifying apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
decompose query information input by a user into one or more pieces of basic information;
acquire a fourth occurrence probability of the query information for the resources of the first category set based on a first occurrence probability of the basic information for all the resources, a second occurrence probability of the basic information for the resources of the first category set, and a third occurrence probability of the resources of the first category set for all the resources;
identify the query information as first category query information if the fourth occurrence probability is greater than or equal to a first threshold,
wherein the all resources are a plurality of resources uploaded by all users, and the resources of the first category set are resources uploaded by users related to the resources of the first category set in a first time interval.
CN201710081967.0A 2017-02-15 2017-02-15 Information identification method and device Active CN106897407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710081967.0A CN106897407B (en) 2017-02-15 2017-02-15 Information identification method and device

Publications (2)

Publication Number Publication Date
CN106897407A CN106897407A (en) 2017-06-27
CN106897407B true CN106897407B (en) 2020-06-12

Family

ID=59198259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081967.0A Active CN106897407B (en) 2017-02-15 2017-02-15 Information identification method and device

Country Status (1)

Country Link
CN (1) CN106897407B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101175002A (en) * 2006-11-02 2008-05-07 株式会社理光 Method for discovering network resource
CN101520784A (en) * 2008-02-29 2009-09-02 富士通株式会社 Information issuing system and information issuing method
CN101996191A (en) * 2009-08-14 2011-03-30 北京大学 Method and system for searching for two-dimensional cross-media element
CN104484431A (en) * 2014-12-19 2015-04-01 合肥工业大学 Multi-source individualized news webpage recommending method based on field body

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160098737A1 (en) * 2014-10-06 2016-04-07 International Business Machines Corporation Corpus Management Based on Question Affinity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Applicant after: Youku network technology (Beijing) Co.,Ltd.

Address before: 100080 area a and C, 5 / F, block a, Sinosteel International Plaza, No. 8, Haidian Street, Haidian District, Beijing

Applicant before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20200522

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100080 area a and C, 5 / F, block a, Sinosteel International Plaza, No. 8, Haidian Street, Haidian District, Beijing

Applicant before: Youku network technology (Beijing) Co.,Ltd.