WO2019134554A1 - Content recommendation method and apparatus - Google Patents

Content recommendation method and apparatus

Info

Publication number
WO2019134554A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
feature
keyword
determining
domain
Prior art date
Application number
PCT/CN2018/123283
Other languages
English (en)
French (fr)
Inventor
刘阳阳
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to SG11202006532QA
Publication of WO2019134554A1
Priority to US16/911,000 (US11720572B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/282 Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Definitions

  • the embodiments disclosed in the present specification relate to the field of Internet technologies, and in particular, to a content recommendation method and apparatus.
  • the present specification describes a content recommendation method and apparatus that determine a feature tag of content information according to the specific domain corresponding to the content information and the domain knowledge information corresponding to that domain, and then recommend content information to a user by combining the feature tag with the attribute tag of the user.
  • in a first aspect, a content recommendation method is provided, including:
  • the content recommendation information recommended to the user is determined from the content information library according to the attribute tag of the user and the feature tag.
  • the domain knowledge information includes domain level knowledge
  • the domain level knowledge includes a domain name, a category name corresponding to the domain name, and a feature word corresponding to the category name.
  • the feature word is obtained based on content corpus training in a content corpus.
  • determining the feature tag of the content information from the keyword information according to the domain knowledge information includes:
  • the content information includes a category label
  • the determining the specific category corresponding to the content information includes:
  • a specific category corresponding to the content information is determined according to the category tag.
  • determining the feature tag of the content information from the keyword information according to the feature word includes:
  • the keyword information in the keyword information that matches the feature word is used as the feature tag.
  • the keyword information includes a plurality of keywords and ranking information of each keyword
  • determining the feature tag of the content information from the keyword information according to the feature word includes:
  • the keyword information after the reordering and located within a predetermined order range is used as the feature tag of the content information.
  • the domain knowledge information includes a domain knowledge map, where the domain knowledge map includes an entity word corresponding to the domain in its first layer and a related word corresponding to the entity word in its second layer; the combination of the entity word and the related word constitutes a feature combination word.
  • determining the feature tag of the content information from the keyword information according to the domain knowledge information includes:
  • determining the feature tag of the content information from the keyword information according to the feature combination word includes:
  • the keyword information in the keyword information that matches the feature combination word is used as the feature tag.
  • the keyword information includes a plurality of keywords and ranking information of each keyword, and determining the feature tag of the content information from the keyword information according to the feature combination word includes:
  • the keyword information after the reordering and located within a predetermined order range is used as the feature tag of the content information.
  • the attribute tag is determined based on historical browsing content of the user.
  • determining, from the content information base, the content recommendation information recommended to the user includes:
  • Each content information in the candidate content information is ranked according to a preset rule, and content information whose ranking is within a preset range is used as the content recommendation information.
  • in a second aspect, a content recommendation device is provided, including:
  • a first obtaining module configured to acquire content information in a content information base
  • a first determining module configured to determine keyword information of the content information
  • a second determining module configured to determine a specific domain corresponding to the content information
  • a second acquiring module configured to acquire domain knowledge information corresponding to the specific domain
  • a third determining module configured to determine, according to the domain knowledge information, a feature tag of the content information from the keyword information
  • a processing module configured to determine content recommendation information recommended to the user from the content information library according to the attribute tag of the user and the feature tag.
  • the domain knowledge information acquired by the second acquiring module includes domain level knowledge, where the domain level knowledge includes a domain name, a category name corresponding to the domain name, and feature words corresponding to the category name.
  • the feature words acquired by the second obtaining module are obtained based on content corpus training in a content corpus.
  • the third determining module specifically includes:
  • a first determining submodule configured to determine a specific category corresponding to the content information
  • a second determining submodule configured to determine, in the domain level knowledge, a specific category name corresponding to the specific category, and a feature word corresponding to the specific category name;
  • a third determining submodule configured to determine a feature tag of the content information from the keyword information according to the feature word.
  • the content information acquired by the first obtaining module includes a category label
  • the first determining sub-module is specifically configured to:
  • a specific category corresponding to the content information is determined according to the category tag.
  • the third determining submodule is specifically configured to:
  • the keyword information in the keyword information that matches the feature word is used as the feature tag.
  • the keyword information determined by the first determining module includes a plurality of keywords and ranking information of each keyword
  • the third determining sub-module is specifically configured to:
  • the keyword information after the reordering and located within a predetermined order range is used as the feature tag of the content information.
  • the domain knowledge information acquired by the second acquiring module includes a domain knowledge map, where the domain knowledge map includes entity words corresponding to the domain in its first layer and related words corresponding to the entity words in its second layer; the combination of an entity word and a related word constitutes a feature combination word.
  • the third determining module specifically includes:
  • a second determining submodule configured to determine a feature combination word included in a domain knowledge map corresponding to the specific domain
  • a third determining submodule configured to determine, according to the feature combination word, a feature tag of the content information from the keyword information.
  • the third determining submodule is specifically configured to:
  • the keyword information in the keyword information that matches the feature combination word is used as the feature tag.
  • the keyword information determined by the first determining module includes a plurality of keywords and ranking information of each keyword
  • the third determining sub-module is specifically configured to:
  • the keyword information after the reordering and located within a predetermined order range is used as the feature tag of the content information.
  • the attribute tag used by the processing module is determined based on historical browsing content of the user.
  • the processing module is specifically configured to:
  • Each content information in the candidate content information is ranked according to a preset rule, and content information whose ranking is within a preset range is used as the content recommendation information.
  • a computer readable storage medium having a computer program stored thereon is provided.
  • when the computer program is executed in a computer, the computer is caused to perform the method provided by any implementation of the above first aspect.
  • a computing device is provided, including a memory and a processor.
  • executable code is stored in the memory, and when the processor executes the executable code, the method provided by any implementation of the above first aspect is implemented.
  • the content recommendation method and apparatus provided by the present specification first acquire content information in a content information base and determine keyword information of the content information. Next, a specific domain corresponding to the content information is determined, and domain knowledge information corresponding to the specific domain is acquired. The feature tag of the content information is then determined from the keyword information based on the domain knowledge information. Finally, based on the feature tag and the attribute tag of the user, the content recommendation information recommended to the user is determined from the content information base. In this way, more accurate content information can be recommended to the user.
  • FIG. 1 is a flowchart of a content recommendation method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of domain level knowledge provided by an embodiment disclosed in the present specification
  • FIG. 3 is a schematic diagram of a domain knowledge map provided by an embodiment of the present disclosure.
  • FIG. 4 is a structural diagram of a content recommendation apparatus according to an embodiment of the disclosure.
  • FIG. 1 is a flowchart of a content recommendation method according to an embodiment of the disclosure.
  • the execution subject of the method may be any device having processing capabilities, such as a server, a system, or an apparatus. As shown in FIG. 1, the method specifically includes:
  • Step S110: acquire content information in the content information base, and determine keyword information of the content information.
  • the content information library may include content information within the validity period.
  • the validity period may be set according to the service attribute of the service corresponding to the content information (for example, the requirement for timeliness). For example, the validity period of the content information corresponding to the news service can be set to one day. For another example, the validity period of the content information corresponding to the popular science knowledge service can be set to one month.
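  • The validity-period filtering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `VALIDITY` table, the item fields `service` and `published`, and the reading of "one month" as 30 days are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical validity periods per service type, following the examples in the text
# (news: one day; popular science: one month, approximated here as 30 days).
VALIDITY = {
    "news": timedelta(days=1),
    "popular_science": timedelta(days=30),
}

def within_validity(items, now):
    """Keep only items whose age does not exceed the validity period of their service."""
    return [
        item for item in items
        if now - item["published"] <= VALIDITY[item["service"]]
    ]

now = datetime(2024, 1, 31)
items = [
    {"id": 1, "service": "news", "published": datetime(2024, 1, 30, 12)},
    {"id": 2, "service": "news", "published": datetime(2024, 1, 28)},
    {"id": 3, "service": "popular_science", "published": datetime(2024, 1, 10)},
]
valid_ids = [item["id"] for item in within_validity(items, now)]
```

  Here the stale news item (id 2) is dropped, while the recent news item and the popular science item remain in the content information base.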
  • the content information may include graphic information (e.g., pictures, articles, etc.) or audio and video information (e.g., audio, video advertisements, etc.).
  • Determining the keyword information of the content information may include: determining text information of the content information, and determining keyword information according to the text information.
  • for example, when the content information includes a video advertisement, text information may be extracted from the video and the audio information therein may be converted into text information, and the keyword information of the video advertisement is then determined according to that text information; alternatively, when the content information includes a video advertisement together with text introducing the video advertisement, the keyword information of the video advertisement can be determined according to the introductory text.
  • as another example, when the content information includes an article, the text information in the article can be determined directly.
  • determining the keyword information according to the text information may include: performing at least one kind of preprocessing on the text information, such as structural analysis, word segmentation, stop-word removal, part-of-speech tagging, and named entity recognition; and using a keyword extraction algorithm to determine keyword information from the preprocessed text information.
  • the structural analysis may include analysis of the paragraph structure in the text information, for example, determining the title and body text in the text information, and the paragraph structure in the body text;
  • the word segmentation may include unigram, bigram, and trigram segmentation, etc.;
  • the stop-word removal may include removing stop words in the text information according to a preset stop word list (e.g., function words having no practical meaning, such as "this" and "that");
  • the part-of-speech tagging may include labeling the part of speech (e.g., noun, adverb, adjective, etc.) of the words in the text information;
  • named entity recognition (NER)
  • the keyword extraction algorithm may include the TextRank algorithm and the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.
  • the preprocessed text information includes a plurality of words, as well as the location of each word in the text information (e.g., in the title or in the body), its part-of-speech tag, and the like.
  • the keyword extraction algorithm may be used to identify the keyword information from the pre-processed text information.
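  • The preprocessing and keyword extraction in step S110 can be illustrated with a small TF-IDF sketch. It is a simplified stand-in under stated assumptions: whitespace tokenization replaces real word segmentation, the stop word list `STOP_WORDS` is hypothetical, a smoothed IDF is used, and structural analysis, part-of-speech tagging, and NER are omitted.

```python
import math
from collections import Counter

# Hypothetical stop word list; a real system would load a preset list.
STOP_WORDS = {"this", "that", "the", "a", "of", "to", "and", "is", "for", "your"}

def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stop words."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def tfidf_keywords(doc, corpus, top_k=5):
    """Return the top_k (word, weight) pairs of doc, weighted by TF-IDF over corpus."""
    docs = [preprocess(d) for d in corpus]
    tokens = preprocess(doc)
    tf = Counter(tokens)
    n = len(docs)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in docs if word in d)      # document frequency
        idf = math.log((1 + n) / (1 + df)) + 1      # smoothed inverse document frequency
        scores[word] = (count / len(tokens)) * idf
    # Sort by descending weight, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

corpus = [
    "car wash and maintenance tips",
    "train timetable update",
    "subway line opening",
]
keywords = tfidf_keywords("Car wash discount: wash your car this weekend", corpus, top_k=2)
top_words = [word for word, _ in keywords]
```

  The returned weights can serve as the keyword weight or ranking information that later steps adjust against the domain knowledge.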
  • Step S120: determine a specific domain corresponding to the content information.
  • a domain tag may be included in the content information.
  • determining the specific domain corresponding to the content information may include: determining a specific domain corresponding to the content information according to the domain tag.
  • the domain tag may be defined and generated by the creator of the content information for facilitating user search for the content information.
  • for example, the domain tag included in the content information is "travel service", and accordingly, it can be determined that the specific domain corresponding to the content information is the travel service.
  • the specific domain corresponding to the content information may be further determined according to the keyword information determined in step S110.
  • the keyword information includes domain information, and accordingly, a specific domain corresponding to the content information may be determined based on the domain information.
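  • The two ways of determining the specific domain in step S120 (from a domain tag, or from domain information found among the keywords) might be combined as in the following sketch. The field names `domain_tag` and `keywords` and the set `KNOWN_DOMAINS` are hypothetical, and the tag-first order is an assumption; the text does not prescribe one.

```python
# Hypothetical set of domain names known to the server.
KNOWN_DOMAINS = {"travel service"}

def determine_domain(content):
    """Prefer the creator-supplied domain tag; otherwise look for a domain name among the keywords."""
    tag = content.get("domain_tag")
    if tag:
        return tag
    for keyword in content.get("keywords", []):
        if keyword in KNOWN_DOMAINS:
            return keyword
    return None  # no domain could be determined

tagged = {"domain_tag": "travel service", "keywords": ["car wash"]}
untagged = {"keywords": ["driver's license", "travel service"]}
```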
  • Step S130: acquire domain knowledge information corresponding to the specific domain.
  • the server may store preset domain knowledge information, where the domain knowledge information may include at least one of domain level knowledge and domain knowledge map.
  • the domain level knowledge may include a domain name, a category name corresponding to the domain name, and a feature word corresponding to the category name;
  • the domain knowledge map may include an entity word corresponding to the domain in its first layer and related words corresponding to the entity word in its second layer, and the entity word and each of its corresponding related words can be combined to form a feature combination word.
  • the domain name and category name included in the domain level knowledge can be set based on the current general knowledge system (for example, the knowledge system can include the division of fields and disciplines).
  • the feature words included in the domain level knowledge can be obtained by training on a large amount of content corpus.
  • the domain knowledge map can be obtained by processing a large amount of content corpus. More specifically, first, an entity word (for example, a proper noun) corresponding to the domain can be identified by NER; for example, the proper noun "driver's license" corresponding to the domain "travel service" can be identified. Then, the related words corresponding to the entity word can be determined by means of template extraction, inter-word correlation, or mutual information entropy.
  • the template extraction method may include setting a template (for example, "the issuance of the driver's license XX"), and then extracting related words (for example, "new rules") from the content corpus by using the template;
  • the inter-word correlation method may include using a sliding window whose length is a predetermined number of characters (e.g., 5 characters), extracting the words that appear in the sliding window at the same time as the entity word, and using the words whose co-occurrence frequency reaches a predetermined number of times (e.g., 10 times) as related words;
  • the mutual information entropy method may include determining a similarity between the words included in the content corpus and the entity word, and using the words whose similarity is higher than a preset value (e.g., 0.6) as related words.
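  • The inter-word correlation method can be sketched as a sliding-window co-occurrence count. The sketch makes two simplifying assumptions that differ from the text: the window is measured in tokens rather than characters, and the corpus is passed in already segmented; the function name `related_words` and its parameters are illustrative only.

```python
from collections import Counter

def related_words(sentences, entity, window=5, min_count=10):
    """Count words that co-occur with `entity` within `window` tokens on either side,
    and keep those seen at least `min_count` times."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != entity:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return {word for word, c in counts.items() if c >= min_count and word != entity}

# Tiny segmented corpus; a low min_count is used only to keep the example small.
sentences = [
    ["license", "new", "rules"],
    ["new", "rules", "license"],
    ["license", "annual", "review"],
]
found = related_words(sentences, "license", window=5, min_count=2)
```

  With the example values from the text (a window of 5 and a threshold of 10 occurrences), the same function would be called with the defaults over a much larger corpus.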
  • the domain level knowledge and/or domain knowledge map corresponding to the particular domain determined in step S120 may be acquired.
  • the acquired domain level knowledge corresponding to the specific domain may include a plurality of category names corresponding to the specific domain, and a plurality of feature words corresponding to each of the plurality of category names.
  • for example, the specific domain determined in step S120 is the travel service, so the domain level knowledge corresponding to the travel service, as shown in FIG. 2, can be acquired. In FIG. 2, the domain name is travel service, and the category names corresponding to the travel service include: car, airplane, train, and subway. The feature words corresponding to the car include: maintenance, refueling, car wash, etc.; the feature words corresponding to the airplane include: mileage, economy class, first class, etc. (the feature words corresponding to other categories, such as subway and train, are not shown in FIG. 2).
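  • One plausible in-memory representation of the domain level knowledge of FIG. 2 is a nested mapping from domain name to category name to feature words. The structure and the helper `feature_words` are assumptions for illustration; the specification does not state how the server stores this knowledge.

```python
# Domain name -> category name -> feature words, using the values given for FIG. 2.
DOMAIN_LEVEL_KNOWLEDGE = {
    "travel service": {
        "car": ["maintenance", "refueling", "car wash"],
        "airplane": ["mileage", "economy class", "first class"],
        "train": [],   # feature words not shown in FIG. 2
        "subway": [],  # feature words not shown in FIG. 2
    }
}

def feature_words(domain, category):
    """Look up the feature words for a category within a domain (empty list if unknown)."""
    return DOMAIN_LEVEL_KNOWLEDGE.get(domain, {}).get(category, [])

car_words = feature_words("travel service", "car")
```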
  • a plurality of entity words may be included in a single domain, and accordingly, there may be multiple domain knowledge maps corresponding to the domain.
  • Each domain knowledge map may include an entity word at its first level and a plurality of associated words corresponding to the entity word at the second level.
  • for example, the specific domain determined in step S120 is the travel service, so a plurality of domain knowledge maps corresponding to the travel service can be acquired.
  • the acquired domain knowledge maps may include the domain knowledge map shown in FIG. 3, in which the entity word is driver's license, and the related words corresponding to the entity word include deduction points, new rules, query violations, renewal, annual review, and so on.
  • Step S140: determine the feature tag of the content information from the keyword information according to the domain knowledge information.
  • the keyword information in the keyword information that matches the domain knowledge information is used as the feature tag of the content information.
  • the keyword information is ranked according to the domain knowledge information, and the keyword information ranked within the preset range is used as the feature tag of the content information.
  • the domain knowledge information acquired in step S130 may include at least domain level knowledge, and determining the feature tag of the content information from the keyword information according to the domain level knowledge may include: determining a specific category corresponding to the content information; determining, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and determining the feature tag of the content information according to the feature words.
  • the content information can include a category tag.
  • determining the specific category corresponding to the content information may include: determining a specific category corresponding to the content information according to the category label.
  • the category tag can be defined and generated by the creator of the content information for facilitating the search for the content information.
  • the category tag included in the content information is "car", and accordingly, it can be determined that the specific category corresponding to the content information is a car.
  • the specific category corresponding to the content information may be further determined according to the keyword information determined in step S110.
  • the keyword information includes category information, and accordingly, a specific category corresponding to the content information may be determined based on the category information.
  • for example, the determined specific category corresponding to the content information is car, and the domain level knowledge acquired in step S130 is as shown in FIG. 2. Accordingly, it can be determined in the domain level knowledge that the specific category name corresponding to the specific category (car) is car, and that the feature words corresponding to the category name (car) include: maintenance, refueling, and car wash.
  • determining the feature tag of the content information according to the feature word may include: using the keyword information in the keyword information that matches the feature word as the feature tag.
  • for example, the determined feature words include: maintenance, refueling, car wash, etc., so the feature tags maintenance and car wash can be determined from keyword information (e.g., including: maintenance, car wash, etc.).
  • the keyword information may include a plurality of keywords and weight information of the respective keywords.
  • determining the feature tag of the content information according to the feature word may include: updating the weights of the plurality of keywords according to how each keyword matches the feature words; and using the keywords whose updated weight is greater than a preset threshold as the feature tags of the content information.
  • for example, when a keyword fully matches a feature word, the weight value of the keyword may be increased by a first preset value (e.g., 0.1); when a keyword (e.g., automatic car wash) partially matches a feature word (car wash), the weight value of the keyword may be increased by a second preset value (e.g., 0.05); when a keyword does not match any feature word, the original weight value of the keyword may be maintained. In this way, the weight values of the respective keywords can be updated. For each keyword whose weight has been updated, it is determined whether the final weight is greater than a preset threshold (for example, 0.5), and the keywords whose weight values are greater than the preset threshold are used as the feature tags.
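  • The weight update and thresholding described above can be sketched as follows, using the example values 0.1, 0.05, and 0.5 from the text. Treating "partial match" as substring containment of a feature word in the keyword is an assumption, as are the function names and the sample weights.

```python
def update_weights(weights, feature_words, full_bonus=0.1, partial_bonus=0.05):
    """Raise a keyword's weight by full_bonus on an exact match with a feature word,
    by partial_bonus when it merely contains a feature word, and leave it unchanged otherwise."""
    updated = {}
    for keyword, weight in weights.items():
        if keyword in feature_words:
            weight += full_bonus
        elif any(fw in keyword for fw in feature_words):
            weight += partial_bonus
        updated[keyword] = round(weight, 6)  # rounding avoids float noise in comparisons
    return updated

def select_tags(weights, threshold=0.5):
    """Keep the keywords whose updated weight is greater than the threshold."""
    return [keyword for keyword, weight in weights.items() if weight > threshold]

feature_words = {"maintenance", "refueling", "car wash"}
weights = {"car wash": 0.45, "automatic car wash": 0.47, "weekend": 0.49}
updated = update_weights(weights, feature_words)
tags = select_tags(updated)
```

  In this example "weekend" starts with the highest weight but matches no feature word, so only the two domain-relevant keywords pass the threshold.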
  • the keyword information may further include a plurality of keywords and ranking information corresponding to each keyword (e.g., the keywords may be sorted according to their weights).
  • determining the feature tag of the content information according to the feature word may include: reordering the plurality of keywords according to how each keyword matches the feature words and according to the original ranking information; and using the reordered keyword information located within a predetermined order range as the feature tag of the content information.
  • the predetermined sequence range may be set in advance or modified in real time.
  • the ranking information may include weight values of the respective keywords, and the weight values of the keywords may be updated according to how the respective keywords match the feature words (e.g., whether they match).
  • the way to update the weight value can be as described in the previous example.
  • the plurality of keywords are sorted according to the updated weight values, and the keyword information located in the predetermined order range (e.g., the top ten) is used as the feature tag.
  • the domain knowledge information acquired in step S130 may include at least a domain knowledge map, and determining the feature tag of the content information from the keyword information according to the domain knowledge map may include: determining the feature combination words included in the domain knowledge map; and determining the feature tag of the content information from the keyword information according to the feature combination words.
  • for example, domain knowledge maps including the one shown in FIG. 3 can be acquired in step S130.
  • the entity word in FIG. 3 is driver's license, and the related words corresponding to driver's license include: deduction points, new rules, query violations, renewal, and annual review.
  • accordingly, the feature combination words include: "driver's license - deduction points", "driver's license - new rules", "driver's license - query violations", "driver's license - renewal", "driver's license - annual review", and so on.
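  • Forming feature combination words from a two-layer domain knowledge map can be sketched as below. Representing the map as a dictionary from entity word to related words, and joining the pair with " - ", are assumptions chosen to mirror the driver's license example; the related-word spellings are likewise illustrative.

```python
# First layer: entity word; second layer: related words (values from the driver's license example).
KNOWLEDGE_MAP = {
    "driver's license": [
        "deduction points",
        "new rules",
        "query violations",
        "renewal",
        "annual review",
    ]
}

def feature_combinations(knowledge_map):
    """Pair each entity word with each of its related words to form feature combination words."""
    return [
        f"{entity} - {related}"
        for entity, related_words in knowledge_map.items()
        for related in related_words
    ]

combos = feature_combinations(KNOWLEDGE_MAP)
```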
  • determining the feature tag of the content information according to the feature combination word may include: using the keyword information in the keyword information that matches the feature combination word as the feature tag.
  • for example, the determined feature combination words include: driver's license - deduction points, driver's license - new rules, driver's license - query violations, driver's license - renewal, driver's license - annual review, etc. Accordingly, the feature tags driver's license - deduction points, driver's license - new rules, and driver's license - annual review can be determined from keyword information (e.g., including: driver's license, deduction points, new rules, driver's license annual review, etc.).
  • the keyword information may include a plurality of keywords and weight information of the respective keywords.
  • determining the feature tag of the content information according to the feature combination words may include: updating the weights of the plurality of keywords according to how each keyword matches the feature combination words; and using the keywords whose updated weight is greater than a preset threshold as feature tags of the content information.
  • the keyword information may include a plurality of keywords and ranking information corresponding to each keyword (e.g., the keywords may be ranked by weight).
  • determining the feature tag of the content information according to the feature combination words may include: reordering the plurality of keywords according to how each keyword matches the feature combination words and according to the original ranking information; and using the keyword information that falls within the predetermined order range after reordering as the feature tag of the content information.
  • the ranking information may include the weight value of each keyword, and the weight value of a keyword may be updated according to how it matches the feature combination words (e.g., whether it matches). For example, when a keyword (e.g., driver's license - deduction points) fully matches a feature combination word (driver's license - deduction points), the weight value of the keyword may be increased by a first preset value (e.g., 0.1). When a keyword (e.g., driver's license) partially matches a feature combination word (driver's license - deduction points), the weight value of the keyword may be increased by a second preset value (e.g., 0.05). When a keyword matches none of the feature combination words, its original weight value may be kept. The plurality of keywords are then sorted according to the updated weight values, and the keyword information within the predetermined order range (e.g., the top five) is used as the feature tag.
  • in step S150, the content recommendation information to be recommended to the user is determined from the content information base according to the attribute tag of the user and the feature tag.
  • the attribute tag of the user may be determined based on the history browsing content of the user.
  • the attribute tag of the user may be determined based on the feature tag of the user's history browsing content.
  • the feature tag can be obtained by performing the above steps S110-S140, except that the content information in the content information library is not acquired in step S110, but the history browsing content of the user is acquired.
  • determining the content recommendation information to be recommended to the user from the content information base may include: using the content information whose feature tag matches the attribute tag as the content recommendation information.
  • alternatively, the content information whose feature tags match the attribute tag is further filtered according to a preset rule to determine the final content recommendation information.
  • determining, from the content information base, the content recommendation information to be recommended to the user includes: using the content information whose feature tag matches the attribute tag as candidate content information to be recommended to the user; and ranking each piece of content information in the candidate content information according to the preset rule, and using the content information ranked within the preset range as the content recommendation information.
  • the preset rule may include weight values of the feature tags, and ranking each piece of content information in the candidate content information according to the preset rule may include: determining a score for each piece of content information by weighted summation, according to the feature tags corresponding to that piece of content information and the weight values of those feature tags, and ranking the pieces of content information by score.
  • the preset range may be set according to a business rule related to the content information; for example, if the content information belongs to technology news, the business rule may include recommending the top five ranked pieces of content information to the user.
  • as another example, if the content information belongs to a music column, the business rule may include recommending the top ten ranked pieces of content information to the user.
  • after the attribute tag of the user is determined according to steps S110-S140, the content recommendation information to be recommended to the user may also be determined directly from the content information base.
  • specifically, the similarity between users may be determined according to the attribute tags of a plurality of users including a first user, and a plurality of second users whose similarity to the first user is within a preset threshold range are determined from the plurality of users. Then, based on the second users' browsing records of the content information in the content information base, the content recommendation information to be recommended to the first user is determined from the content information base.
  • a single piece of content information can be associated with multiple domains.
  • accordingly, a plurality of specific domains corresponding to the content information may be determined in step S120, and domain knowledge information corresponding to each specific domain (e.g., domain level knowledge and/or a domain knowledge map) may be acquired in step S130.
  • the domain level knowledge corresponding to each specific domain may be acquired in step S130; then, in step S140, the specific category corresponding to the content information in each specific domain and the feature words corresponding to each specific category are determined, and the feature tag of the content information is determined according to the feature words.
  • alternatively, the domain knowledge map corresponding to each specific domain may be acquired in step S130; then, in step S140, the feature combination words included in each domain knowledge map are determined, and the feature tag of the content information is determined according to the feature combination words.
  • the domain level knowledge and the domain knowledge map differ mainly in two respects. First, the feature words in the domain level knowledge are single words, whereas the feature combination words in the domain knowledge map are combinations of at least two words.
  • second, the feature words in the domain level knowledge are mainly words that are strongly related to a category, that is, words from which one or a few categories can be clearly inferred (e.g., the feature word "car wash" usually belongs to the car category). Words that occur in all categories but have different meanings under different categories (e.g., new rules) may not be set as feature words.
  • based on the domain knowledge map, feature information with clear semantics in the domain can be determined by extracting combination words (e.g., driver's license - new rules).
  • the domain level knowledge or the domain knowledge map may be used alone, or the two may be combined, to determine the feature tag of the content information and thereby the content recommendation information to be recommended to the user.
  • in summary, content information in the content information base is acquired, and keyword information related to the content information is determined. A specific domain corresponding to the content information is determined, and domain knowledge information corresponding to the specific domain is acquired.
  • the feature tag of the content information is then determined from the keyword information based on the domain knowledge information.
  • finally, the content recommendation information to be recommended to the user is determined from the content information base according to the feature tag and the attribute tag of the user.
  • the embodiments disclosed in the present specification further provide a content recommendation device.
  • the device 400 includes:
  • the first obtaining module 410 is configured to obtain content information in the content information base
  • a first determining module 420 configured to determine keyword information of the content information
  • a second determining module 430 configured to determine a specific domain corresponding to the content information
  • a second obtaining module 440 configured to acquire domain knowledge information corresponding to a specific domain
  • a third determining module 450 configured to determine, according to domain knowledge information, a feature tag of the content information from the keyword information
  • the processing module 460 is configured to determine content recommendation information recommended to the user from the content information base according to the attribute tag and the feature tag of the user.
  • the keyword extraction algorithm in the determining submodule includes at least one of a TF-IDF algorithm and a TextRank algorithm.
  • the content information acquired by the first obtaining module 410 includes a domain label
  • the second determining module 430 is specifically configured to:
  • a specific field corresponding to the content information is determined based on the domain tag.
  • the domain knowledge information acquired by the second obtaining module 440 includes domain level knowledge, and the domain level knowledge includes a domain name, a category name corresponding to the domain name, and a feature word corresponding to the category name.
  • the feature words acquired by the second obtaining module 440 are obtained by training on content corpus data in a content corpus.
  • the third determining module 450 specifically includes:
  • a first determining submodule 451, configured to determine a specific category corresponding to the content information
  • a second determining submodule 452 configured to determine, in the domain level knowledge, a specific category name corresponding to the specific category, and a feature word corresponding to the specific category name;
  • the third determining sub-module 453 is configured to determine the feature tag of the content information from the keyword information according to the feature word.
  • the content information acquired by the first obtaining module 410 includes a category label
  • the first determining submodule 451 is specifically configured to:
  • a specific category corresponding to the content information is determined according to the category tag.
  • the third determining submodule 453 is specifically configured to:
  • the keyword information in the keyword information that matches the feature word is used as the feature tag.
  • the keyword information determined by the first determining module 420 includes a plurality of keywords and ranking information of each keyword
  • the third determining sub-module 453 is specifically configured to:
  • reorder the plurality of keywords according to how each keyword in the keyword information matches the feature word and according to the ranking information; and use the keyword information that falls within a predetermined order range after reordering as a feature tag of the content information.
  • the domain knowledge information acquired by the second obtaining module 440 includes a domain knowledge map, where the domain knowledge map includes, in its first layer, entity words corresponding to the domain and, in its second layer, related words corresponding to the entity words; an entity word combined with a related word constitutes a feature combination word.
  • the third determining module 450 specifically includes:
  • a second determining submodule 452 configured to determine a feature combination word included in a domain knowledge map corresponding to a specific domain
  • the third determining sub-module 453 is configured to determine a feature tag of the content information from the keyword information according to the feature combination word.
  • the third determining submodule 453 is specifically configured to:
  • the keyword information in the keyword information that matches the feature combination word is used as the feature tag.
  • the keyword information determined by the first determining module 420 includes a plurality of keywords and ranking information of each keyword
  • the third determining sub-module 453 is specifically configured to:
  • reorder the plurality of keywords according to how each keyword in the keyword information matches the feature combination word and according to the ranking information; and use the keyword information that falls within a predetermined order range after reordering as a feature tag of the content information.
  • the attribute tag included in the processing module 460 is determined based on the user's historical browsing content.
  • processing module 460 is specifically configured to:
  • the content information corresponding to the feature tag matching the attribute tag is used as candidate content information recommended to the user;
  • Each piece of content information in the candidate content information is ranked according to a preset rule, and content information whose ranking is within a preset range is used as the content recommendation information.
  • the first obtaining module 410 acquires content information in the content information base, and the first determining module 420 determines keyword information related to the content information.
  • the second determining module 430 determines a specific domain corresponding to the content information, and the second obtaining module 440 acquires domain knowledge information corresponding to the specific domain.
  • the third determining module 450 determines the feature tag of the content information from the keyword information according to the domain knowledge information.
  • the processing module 460 determines the content recommendation information to be recommended to the user from the content information base according to the feature tag and the attribute tag of the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

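A content recommendation method, comprising: acquiring content information from a content information base, and determining keyword information related to the content information (S110); determining a specific domain corresponding to the content information (S120); acquiring domain knowledge information corresponding to the specific domain (S130); determining a feature tag of the content information from the keyword information according to the domain knowledge information (S140); and determining, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag (S150).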

Description

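Content Recommendation Method and Apparatus
Technical Field
The embodiments disclosed in this specification relate to the field of Internet technologies, and in particular to a content recommendation method and apparatus.
Background
With the development of Internet technologies, people browse content information provided by network platforms more and more frequently, for example, browsing product information on an online shopping platform, trending news on a news platform, wealth management information on a wealth management platform, or payment service information on a payment platform.
Different users of the same network platform have more or less different needs for the content information it provides. In addition, the massive growth of information on network platforms often makes it hard for users to choose. Content information currently recommended to users is not accurate enough and therefore struggles to meet users' personalized needs. A reasonable method is therefore needed to meet users' varied needs when browsing the content information provided on network platforms.
Summary
This specification describes a content recommendation method and apparatus that determine a specific domain corresponding to content information and domain knowledge information corresponding to that specific domain, then determine a feature tag of the content information, and combine it with an attribute tag of a user to recommend more accurate content information to the user.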
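According to a first aspect, a content recommendation method is provided. The method includes:
acquiring content information from a content information base, and determining keyword information of the content information;
determining a specific domain corresponding to the content information;
acquiring domain knowledge information corresponding to the specific domain;
determining a feature tag of the content information from the keyword information according to the domain knowledge information; and
determining, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag.
In a possible implementation, the domain knowledge information includes domain level knowledge, and the domain level knowledge includes a domain name, category names corresponding to the domain name, and feature words corresponding to the category names.
In a possible implementation, the feature words are obtained by training on content corpus data in a content corpus.
In a possible implementation, determining the feature tag of the content information from the keyword information according to the domain knowledge information includes:
determining a specific category corresponding to the content information;
determining, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and
determining the feature tag of the content information from the keyword information according to the feature words.
In a possible implementation, the content information includes a category tag, and determining the specific category corresponding to the content information includes:
determining the specific category corresponding to the content information according to the category tag.
In a possible implementation, determining the feature tag of the content information from the keyword information according to the feature words includes:
using the keyword information that matches the feature words as the feature tag.
In a possible implementation, the keyword information includes a plurality of keywords and ranking information of each keyword, and determining the feature tag of the content information from the keyword information according to the feature words includes:
reordering the plurality of keywords according to how each keyword in the keyword information matches the feature words and according to the ranking information; and
using the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
In a possible implementation, the domain knowledge information includes a domain knowledge map; the domain knowledge map includes, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word; and the entity word combined with a related word constitutes a feature combination word.
In a possible implementation, determining the feature tag of the content information from the keyword information according to the domain knowledge information includes:
determining the feature combination words included in the domain knowledge map corresponding to the specific domain; and
determining the feature tag of the content information from the keyword information according to the feature combination words.
In a possible implementation, determining the feature tag of the content information from the keyword information according to the feature combination words includes:
using the keyword information that matches the feature combination words as the feature tag.
In a possible implementation, the keyword information includes a plurality of keywords and ranking information of each keyword, and determining the feature tag of the content information from the keyword information according to the feature combination words includes:
reordering the plurality of keywords according to how each keyword in the keyword information matches the feature combination words and according to the ranking information; and
using the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
In a possible implementation, the attribute tag is determined based on the user's historical browsing content.
In a possible implementation, determining, from the content information base, the content recommendation information to be recommended to the user includes:
using the content information whose feature tag matches the attribute tag as candidate content information to be recommended to the user; and
ranking each piece of content information in the candidate content information according to a preset rule, and using the content information ranked within a preset range as the content recommendation information.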
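According to a second aspect, a content recommendation apparatus is provided. The apparatus includes:
a first obtaining module, configured to acquire content information from a content information base;
a first determining module, configured to determine keyword information of the content information;
a second determining module, configured to determine a specific domain corresponding to the content information;
a second obtaining module, configured to acquire domain knowledge information corresponding to the specific domain;
a third determining module, configured to determine a feature tag of the content information from the keyword information according to the domain knowledge information; and
a processing module, configured to determine, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag.
In a possible implementation, the domain knowledge information acquired by the second obtaining module includes domain level knowledge, and the domain level knowledge includes a domain name, category names corresponding to the domain name, and feature words corresponding to the category names.
In a possible implementation, the feature words acquired by the second obtaining module are obtained by training on content corpus data in a content corpus.
In a possible implementation, the third determining module specifically includes:
a first determining submodule, configured to determine a specific category corresponding to the content information;
a second determining submodule, configured to determine, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and
a third determining submodule, configured to determine the feature tag of the content information from the keyword information according to the feature words.
In a possible implementation, the content information acquired by the first obtaining module includes a category tag, and the first determining submodule is specifically configured to:
determine the specific category corresponding to the content information according to the category tag.
In a possible implementation, the third determining submodule is specifically configured to:
use the keyword information that matches the feature words as the feature tag.
In a possible implementation, the keyword information determined by the first determining module includes a plurality of keywords and ranking information of each keyword, and the third determining submodule is specifically configured to:
reorder the plurality of keywords according to how each keyword in the keyword information matches the feature words and according to the ranking information; and
use the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
In a possible implementation, the domain knowledge information acquired by the second obtaining module includes a domain knowledge map; the domain knowledge map includes, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word; and the entity word combined with a related word constitutes a feature combination word.
In a possible implementation, the third determining module specifically includes:
a second determining submodule, configured to determine the feature combination words included in the domain knowledge map corresponding to the specific domain; and
a third determining submodule, configured to determine the feature tag of the content information from the keyword information according to the feature combination words.
In a possible implementation, the third determining submodule is specifically configured to:
use the keyword information that matches the feature combination words as the feature tag.
In a possible implementation, the keyword information determined by the first determining module includes a plurality of keywords and ranking information of each keyword, and the third determining submodule is specifically configured to:
reorder the plurality of keywords according to how each keyword in the keyword information matches the feature combination words and according to the ranking information; and
use the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
In a possible implementation, the attribute tag used by the processing module is determined based on the user's historical browsing content.
In a possible implementation, the processing module is specifically configured to:
use the content information whose feature tag matches the attribute tag as candidate content information to be recommended to the user; and
rank each piece of content information in the candidate content information according to a preset rule, and use the content information ranked within a preset range as the content recommendation information.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to perform the method provided by any implementation of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor. The memory stores executable code, and when the processor executes the executable code, the method provided by any implementation of the first aspect is implemented.
In the content recommendation method and apparatus provided in this specification, content information is first acquired from a content information base, and keyword information related to the content information is determined. A specific domain corresponding to the content information is determined, and domain knowledge information corresponding to the specific domain is acquired. Next, a feature tag of the content information is determined from the keyword information according to the domain knowledge information. Then, content recommendation information to be recommended to a user is determined from the content information base according to the feature tag and an attribute tag of the user. In this way, more accurate content information is recommended to the user.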
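Brief Description of the Drawings
To describe the technical solutions of the embodiments disclosed in this specification more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Clearly, the drawings described below are merely some of the embodiments disclosed in this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a content recommendation method according to an embodiment disclosed in this specification;
FIG. 2 is a schematic diagram of domain level knowledge according to an embodiment disclosed in this specification;
FIG. 3 is a schematic diagram of a domain knowledge map according to an embodiment disclosed in this specification;
FIG. 4 is a structural diagram of a content recommendation apparatus according to an embodiment disclosed in this specification.
Detailed Description
The embodiments disclosed in this specification are described below with reference to the accompanying drawings.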
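FIG. 1 is a flowchart of a content recommendation method according to an embodiment disclosed in this specification. The method may be performed by any device with processing capability, such as a server, a system, or an apparatus. As shown in FIG. 1, the method specifically includes the following steps.
Step S110: acquire content information from a content information base, and determine keyword information of the content information.
Specifically, the content information base may include content information that is within its validity period. The validity period may be set according to the business attributes (e.g., timeliness requirements) of the service to which the content information belongs. For example, the validity period of content information for a news service may be set to 1 day, and the validity period of content information for a popular-science service may be set to 1 month.
The content information may include image-and-text information (e.g., pictures, articles) or audio and video information (e.g., audio, video advertisements).
Determining the keyword information of the content information may include: determining text information of the content information, and determining the keyword information according to the text information.
In one embodiment, the content information includes a video advertisement. In this case, text may be extracted from the video, the audio in the video may be converted to text, and the keyword information of the video advertisement is determined from that text. Alternatively, when the content information includes a video advertisement and a text introduction of the video advertisement, the keyword information of the video advertisement may be determined from the text introduction.
In another embodiment, the content information includes an article, in which case the text information in the article can be determined directly.
Further, determining the keyword information according to the text information may include: performing at least one kind of preprocessing among structural analysis, word segmentation, stop-word removal, part-of-speech tagging, and named entity recognition on the text information; and using a keyword extraction algorithm to determine the keyword information from the preprocessed text information.
Structural analysis may include analyzing the paragraph structure of the text information, for example identifying the title and the body, and the paragraph structure within the body. Word segmentation may include unigram, bigram, and trigram segmentation. Stop-word removal may include removing stop words (e.g., function words with no real meaning, such as 这 (this), 那 (that), 的 (of)) from the text information according to a preset stop-word list. Part-of-speech tagging may include tagging the part of speech (e.g., noun, adverb, adjective) of the words in the text information. Named entity recognition (NER) may include recognizing entities with specific meaning in the text information (e.g., person names, place names, organization names, proper nouns). The keyword extraction algorithm may include the TextRank algorithm, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, and so on.
In one example, the preprocessed text information includes a plurality of words, together with each word's position in the text information (e.g., in the title or in the body), its tagged part of speech, and so on. Accordingly, a keyword extraction algorithm can be used to identify the keyword information from the preprocessed text information in a weighted manner.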
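The position-weighted keyword extraction described for step S110 can be sketched as follows. This is a minimal illustration rather than the method of the specification: the whitespace tokenizer, the stop-word list, the smoothed IDF formula, and the `title_boost` factor are all assumptions introduced here, and a production system would use a proper word segmenter for Chinese text.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the specification only says a preset list is used.
STOP_WORDS = {"the", "a", "of", "and", "to", "for", "is", "in"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tfidf_keywords(title, body, corpus, top_k=3, title_boost=2.0):
    """Rank the words of `body` by TF-IDF against `corpus`.

    Words that also occur in `title` have their score multiplied by
    `title_boost` (an assumed way to weight by position).
    """
    tokens = tokenize(body)
    if not tokens:
        return []
    title_words = set(tokenize(title))
    corpus_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus_sets)
    scores = {}
    for word, count in Counter(tokens).items():
        df = sum(1 for doc in corpus_sets if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumption)
        score = (count / len(tokens)) * idf
        if word in title_words:
            score *= title_boost
        scores[word] = score
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


corpus = [
    "car wash and car maintenance tips",
    "flight mileage for economy class",
    "train schedule and metro map",
]
keywords = tfidf_keywords("discount coupons", "car wash car wash discount", corpus)
```

With the title boost, a rare word that also appears in the title ("discount") outranks the more frequent body words; without a title, the frequent body words come first.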
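Step S120: determine a specific domain corresponding to the content information.
In one embodiment, the content information may include a domain tag. Accordingly, determining the specific domain corresponding to the content information may include: determining the specific domain according to the domain tag. The domain tag may be defined by the creator of the content information so that users can find the content information by search.
In one example, the domain tag included in the content information is "travel services"; accordingly, the specific domain corresponding to the content information can be determined to be travel services.
In another embodiment, the specific domain corresponding to the content information may be further determined according to the keyword information determined in step S110. In one example, the keyword information includes domain information, and the specific domain corresponding to the content information can accordingly be determined from the domain information.
After the specific domain corresponding to the content information is determined in step S120, domain knowledge information corresponding to the specific domain is acquired in step S130.
Specifically, the server may store preset domain knowledge information, which may include at least one of domain level knowledge and a domain knowledge map (that is, a domain knowledge graph). The domain level knowledge may include a domain name, category names corresponding to the domain name, and feature words corresponding to the category names. The domain knowledge map may include, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word; an entity word and a corresponding related word may be combined to form a feature combination word.
It should be noted that the domain names and category names in the domain level knowledge may be set based on currently common knowledge systems (e.g., a knowledge system may include a division into fields and disciplines). In addition, the feature words in the domain level knowledge may be obtained by training on a large amount of content corpus data in a content corpus.
The domain knowledge map may be obtained by processing a large amount of content corpus data in the content corpus. More specifically, entity words corresponding to the domain (e.g., proper nouns) may first be recognized through NER; for example, the proper noun "driver's license" corresponding to the domain "travel services" may be recognized. Related words corresponding to the entity word may then be determined by template extraction, inter-word correlation, mutual information entropy, and similar approaches. Template extraction may include setting a template (e.g., "the introduction of driver's license XX") and then using the template to extract related words (e.g., new rules) from the content corpus. The inter-word correlation approach may include using a sliding window whose length is a predetermined number of characters (e.g., 5 characters) to extract the words that co-occur with the entity word within the window, and taking those words whose occurrence count reaches a predetermined number (e.g., 10) as related words. The mutual information entropy approach may include determining the similarity between each word in the content corpus and the entity word, and taking the words whose similarity is higher than a preset value (e.g., 0.6) as related words.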
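The inter-word correlation approach can be illustrated with a small sketch. It makes two simplifying assumptions that are not in the specification: the window is measured in tokens rather than characters, and the corpus is already tokenized. The function name and parameters are hypothetical.

```python
from collections import Counter


def related_words(token_docs, entity, window=5, min_count=10):
    """Collect words that co-occur with `entity` inside a sliding window.

    A word is counted once for every occurrence of `entity` that lies fewer
    than `window` tokens away from it. Words whose count reaches `min_count`
    are returned as related words, sorted alphabetically.
    """
    counts = Counter()
    for tokens in token_docs:
        for i, token in enumerate(tokens):
            if token != entity:
                continue
            lo = max(0, i - window + 1)
            hi = min(len(tokens), i + window)
            for other in tokens[lo:hi]:
                if other != entity:
                    counts[other] += 1
    return sorted(word for word, c in counts.items() if c >= min_count)


docs = [
    ["driver_license", "new_rules", "announced"],
    ["new_rules", "for", "driver_license"],
    ["driver_license", "annual_review"],
]
candidates = related_words(docs, "driver_license", window=3, min_count=2)
```

In this toy corpus only `new_rules` co-occurs with the entity word often enough to pass the threshold; the thresholds in the specification (a 5-character window and 10 occurrences) are meant for a much larger corpus.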
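In one embodiment, the domain level knowledge and/or the domain knowledge map corresponding to the specific domain determined in step S120 may be acquired.
In one embodiment, the acquired domain level knowledge corresponding to the specific domain may include a plurality of category names corresponding to the specific domain, and a plurality of feature words corresponding to each of the category names.
In one example, the specific domain determined in step S120 is travel services, and the domain level knowledge corresponding to travel services, as shown in FIG. 2, can be acquired accordingly. In FIG. 2, the domain name is travel services, and the category names corresponding to travel services include car, airplane, train, subway, and so on. The feature words corresponding to car include maintenance, refueling, car wash, and so on, and the feature words corresponding to airplane include mileage, economy class, first class, and so on (the feature words corresponding to other categories such as subway and train are not shown in FIG. 2).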
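One possible in-memory representation of the domain level knowledge in this example is a nested mapping from domain name to category name to feature words. The structure and the helper below are illustrative assumptions; the specification does not prescribe a storage format.

```python
# Domain name -> category name -> feature words, following the FIG. 2 example.
# Categories whose feature words are not shown in FIG. 2 are left empty.
DOMAIN_LEVEL_KNOWLEDGE = {
    "travel services": {
        "car": ["maintenance", "refueling", "car wash"],
        "airplane": ["mileage", "economy class", "first class"],
        "train": [],
        "subway": [],
    }
}


def feature_words(knowledge, domain, category):
    """Return the feature words for a category, or an empty list if unknown."""
    return list(knowledge.get(domain, {}).get(category, []))


car_words = feature_words(DOMAIN_LEVEL_KNOWLEDGE, "travel services", "car")
```

Returning an empty list for an unknown domain or category keeps later matching steps simple, since a keyword then matches nothing and keeps its original weight.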
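In one embodiment, a single domain may include a plurality of entity words, and accordingly there may be a plurality of domain knowledge maps corresponding to the domain. Each domain knowledge map may include an entity word in its first layer and a plurality of related words corresponding to that entity word in its second layer.
In one example, the specific domain determined in step S120 is travel services, and a plurality of domain knowledge maps corresponding to travel services can be acquired accordingly. For example, the acquired domain knowledge maps may include the domain knowledge map shown in FIG. 3. In FIG. 3, the entity word is driver's license, and the related words corresponding to the entity word include deduction points, new rules, violation query, license renewal, annual review, and so on.
After the keyword information of the content information is determined in step S110 and the domain knowledge information corresponding to the specific domain is acquired in step S130, the feature tag of the content information is determined from the keyword information according to the domain knowledge information in step S140.
Specifically, the keyword information that matches the domain knowledge information is used as the feature tag of the content information. Alternatively, the keyword information is ranked according to the domain knowledge information, and the keyword information ranked within a preset range is used as the feature tag of the content information.
In one embodiment, the domain knowledge information acquired in step S130 may include at least domain level knowledge, and determining the feature tag of the content information from the keyword information according to the domain level knowledge may include: determining a specific category corresponding to the content information; determining, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and determining the feature tag of the content information according to the feature words.
In one example, the content information may include a category tag. Accordingly, determining the specific category corresponding to the content information may include: determining the specific category according to the category tag. The category tag may be defined by the creator of the content information so that users can find the content information by search. For example, if the category tag included in the content information is "car", the specific category corresponding to the content information can be determined to be car.
In another example, the specific category corresponding to the content information may be further determined according to the keyword information determined in step S110. In one example, the keyword information includes category information, and the specific category corresponding to the content information can accordingly be determined from the category information.
In one example, the specific category determined for the content information is car, and the domain level knowledge acquired in step S130 is as shown in FIG. 2. Accordingly, it can be determined in the domain level knowledge that the specific category name corresponding to the specific category (car) is car, and that the feature words corresponding to the category name (car) include maintenance, refueling, car wash, and so on.
In one example, determining the feature tag of the content information according to the feature words may include: using the keyword information that matches the feature words as the feature tag. For example, the determined feature words include maintenance, refueling, car wash, and so on; accordingly, feature tags including maintenance and car wash can be determined from the keyword information (e.g., including maintenance and car wash).
In another example, the keyword information may include a plurality of keywords and weight information of each keyword. Accordingly, determining the feature tag of the content information according to the feature words may include: updating the weights of the plurality of keywords according to how each keyword matches the feature words; and using the keywords whose updated weight is greater than a preset threshold as feature tags of the content information. For instance, when a keyword (e.g., car wash) fully matches a feature word (car wash), the weight value of the keyword may be increased by a first preset value (e.g., 0.1); when a keyword (e.g., automatic car wash) partially matches a feature word (car wash), the weight value of the keyword may be increased by a second preset value (e.g., 0.05); when a keyword matches none of the feature words, its original weight value may be kept. The weight value of each keyword can be updated in this way. For the keywords with updated weights, it is determined whether the final weight is greater than the preset threshold (e.g., 0.5), and the keywords whose weight value is greater than the preset threshold are used as feature tags.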
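The weight update and threshold test described above can be sketched as follows. The preset values 0.1, 0.05, and 0.5 come from the example in the text; treating a partial match as a substring relation between keyword and feature word is an assumption made here, since the specification does not define partial matching. The function names are hypothetical.

```python
def match_bonus(keyword, feature_words, full=0.1, partial=0.05):
    """Return the weight increment for a keyword.

    Full match: the keyword equals a feature word.
    Partial match (assumed): one string contains the other.
    """
    if keyword in feature_words:
        return full
    if any(fw in keyword or keyword in fw for fw in feature_words):
        return partial
    return 0.0


def select_feature_tags(weighted_keywords, feature_words, threshold=0.5):
    """Update keyword weights and keep those strictly above the threshold."""
    updated = {
        kw: weight + match_bonus(kw, feature_words)
        for kw, weight in weighted_keywords.items()
    }
    return {kw: weight for kw, weight in updated.items() if weight > threshold}


tags = select_feature_tags(
    {"car wash": 0.45, "automatic car wash": 0.46, "discount": 0.48},
    ["maintenance", "refueling", "car wash"],
)
```

Here "car wash" (full match) and "automatic car wash" (partial match) are lifted above the 0.5 threshold, while "discount" keeps its original weight and is dropped.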
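Further, in another example, the keyword information may also include a plurality of keywords and ranking information corresponding to each keyword (e.g., the keywords may be ranked by weight). Accordingly, determining the feature tag of the content information according to the feature words may include: reordering the plurality of keywords according to how each keyword matches the feature words and according to the original ranking information; and using the keyword information that falls within a predetermined order range after reordering as the feature tag of the content information. The predetermined order range may be set in advance or modified in real time.
For example, the ranking information may include the weight value of each keyword, and the weight value of a keyword may be updated according to how it matches the feature words (e.g., whether it matches). The weight value may be updated as described in the previous example. The plurality of keywords are then sorted according to the updated weight values, and the keyword information within the predetermined order range (e.g., the top ten) is used as the feature tag.
In another embodiment, the domain knowledge information acquired in step S130 may include at least a domain knowledge map, and determining the feature tag of the content information from the keyword information according to the domain knowledge map may include: determining the feature combination words included in the domain knowledge map; and determining the feature tag of the content information from the keyword information according to the feature combination words. In one example, a domain knowledge map such as the one shown in FIG. 3 may be acquired in step S130. The entity word in FIG. 3 is driver's license, and the related words corresponding to driver's license include: deduction points, new rules, violation query, license renewal, annual review, and so on. Accordingly, the feature combination words can be determined to include "driver's license-deduction points", "driver's license-new rules", "driver's license-violation query", "driver's license-license renewal", "driver's license-annual review", and so on.
In one example, determining the feature tag of the content information according to the feature combination words may include: using the keyword information that matches the feature combination words as the feature tag. For example, the determined feature combination words include driver's license-deduction points, driver's license-new rules, driver's license-violation query, driver's license-license renewal, driver's license-annual review, and so on; accordingly, feature tags including driver's license-deduction points, driver's license-new rules, and driver's license-annual review can be determined from the keyword information (e.g., including: driver's license, deduction points, new rules, driver's license annual review).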
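The driver's license example can be reproduced with a short sketch. The matching rule used here is an assumption chosen so that the example's result follows: a feature combination word is considered matched when both its entity word and its related word appear somewhere in the keyword list, either as a keyword or inside one. The two-layer map is represented as a plain dictionary, which is also an assumption.

```python
def combination_words(knowledge_map):
    """Expand a two-layer map {entity word: [related words]} into pairs."""
    return [
        (entity, related)
        for entity, related_list in knowledge_map.items()
        for related in related_list
    ]


def matched_combinations(keywords, knowledge_map):
    """Return 'entity-related' tags whose two parts both occur in the keywords."""

    def present(term):
        return any(term in keyword for keyword in keywords)

    return [
        f"{entity}-{related}"
        for entity, related in combination_words(knowledge_map)
        if present(entity) and present(related)
    ]


knowledge_map = {
    "driver's license": [
        "deduction points",
        "new rules",
        "violation query",
        "license renewal",
        "annual review",
    ]
}
keywords = [
    "driver's license",
    "deduction points",
    "new rules",
    "driver's license annual review",
]
feature_tags = matched_combinations(keywords, knowledge_map)
```

Under this rule, "violation query" and "license renewal" produce no tag because neither appears among the keywords, which agrees with the three tags listed in the example.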
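In another example, the keyword information may include a plurality of keywords and weight information of each keyword. Accordingly, determining the feature tag of the content information according to the feature combination words may include: updating the weights of the plurality of keywords according to how each keyword matches the feature combination words; and using the keywords whose updated weight is greater than a preset threshold as feature tags of the content information.
In another example, the keyword information may include a plurality of keywords and ranking information corresponding to each keyword (e.g., the keywords may be ranked by weight). Accordingly, determining the feature tag of the content information according to the feature combination words may include: reordering the plurality of keywords according to how each keyword matches the feature combination words and according to the original ranking information; and using the keyword information that falls within a predetermined order range after reordering as the feature tag of the content information.
For example, the ranking information may include the weight value of each keyword, and the weight value of a keyword may be updated according to how it matches the feature combination words (e.g., whether it matches). For instance, when a keyword (e.g., driver's license-deduction points) fully matches a feature combination word (driver's license-deduction points), the weight value of the keyword may be increased by a first preset value (e.g., 0.1); when a keyword (e.g., driver's license) partially matches a feature combination word (driver's license-deduction points), the weight value of the keyword may be increased by a second preset value (e.g., 0.05); when a keyword matches none of the feature combination words, its original weight value may be kept. The plurality of keywords are then sorted according to the updated weight values, and the keyword information within the predetermined order range (e.g., the top five) is used as the feature tag.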
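A sketch of the reordering variant follows. It assumes that feature combination words are written as "entity-related" strings and that a partial match means the keyword equals one of the two parts; both are illustrative choices, and the function name is hypothetical. The increments 0.1 and 0.05 and the top-five cut-off come from the example in the text.

```python
def rerank_keywords(weighted_keywords, combos, top_k=5, full=0.1, partial=0.05):
    """Update weights by match type, then return the top_k keywords.

    Full match: the keyword equals a combination word.
    Partial match (assumed): the keyword equals one part of a combination word.
    """

    def bonus(keyword):
        if keyword in combos:
            return full
        if any(keyword in combo.split("-") for combo in combos):
            return partial
        return 0.0

    updated = {kw: w + bonus(kw) for kw, w in weighted_keywords.items()}
    # Sort by descending updated weight, breaking ties alphabetically.
    ranked = sorted(updated, key=lambda kw: (-updated[kw], kw))
    return ranked[:top_k]


top = rerank_keywords(
    {
        "driver's license-deduction points": 0.30,
        "driver's license": 0.32,
        "weather": 0.36,
    },
    ["driver's license-deduction points"],
    top_k=2,
)
```

Before the update "weather" has the highest weight; after the update the fully matched and partially matched keywords overtake it, so it falls outside the top two.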
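After the feature tag of the content information is determined in step S140, content recommendation information to be recommended to the user is determined from the content information base in step S150 according to the attribute tag of the user and the feature tag.
Specifically, the attribute tag of the user may be determined based on the user's historical browsing content. In one embodiment, the attribute tag of the user may be determined according to the feature tags of the user's historical browsing content. These feature tags can be obtained by performing steps S110 to S140 above, except that in step S110 the user's historical browsing content is acquired instead of content information from the content information base.
In one embodiment, determining, from the content information base, the content recommendation information to be recommended to the user may include: using the content information whose feature tag matches the attribute tag as the content recommendation information.
In another embodiment, the content information whose feature tags match the attribute tag is further filtered according to a preset rule to determine the final content recommendation information. Specifically, in one example, determining, from the content information base, the content recommendation information to be recommended to the user includes: using the content information whose feature tag matches the attribute tag as candidate content information to be recommended to the user; and ranking each piece of content information in the candidate content information according to the preset rule, and using the content information ranked within a preset range as the content recommendation information.
In one example, the preset rule may include weight values of the feature tags, and ranking each piece of content information in the candidate content information according to the preset rule may include: determining a score for each piece of content information by weighted summation, according to the feature tags corresponding to that piece of content information and the weight values of those feature tags, and ranking the pieces of content information by score.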
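The candidate selection and weighted-sum ranking of step S150 can be sketched as follows. The specification does not say which tags enter the sum; this sketch assumes that only the feature tags matching the user's attribute tags contribute, and all names are hypothetical.

```python
def recommend(contents, attribute_tags, tag_weights, top_n=5):
    """Rank candidate content by the summed weights of its matching tags.

    contents:       {content id: [feature tags]}
    attribute_tags: the user's attribute tags
    tag_weights:    {feature tag: weight value}
    """
    attribute_tags = set(attribute_tags)
    scored = []
    for content_id, feature_tags in contents.items():
        matched = set(feature_tags) & attribute_tags
        if not matched:
            continue  # not a candidate: no feature tag matches the user
        score = sum(tag_weights.get(tag, 0.0) for tag in matched)
        scored.append((content_id, score))
    # Sort by descending score, breaking ties by content id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [content_id for content_id, _ in scored[:top_n]]


recommended = recommend(
    {"a": ["car wash", "maintenance"], "b": ["mileage"], "c": ["car wash"]},
    {"car wash", "maintenance"},
    {"car wash": 0.6, "maintenance": 0.3, "mileage": 0.5},
    top_n=2,
)
```

The `top_n` parameter plays the role of the preset range, which the text says may be set per business rule (for example, top five for technology news).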
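In one example, the preset range may be set according to a business rule related to the content information. For example, if the content information belongs to technology news, the business rule may include recommending the top five ranked pieces of content information to the user. As another example, if the content information belongs to a music column, the business rule may include recommending the top ten ranked pieces of content information to the user.
It should be noted that, after the attribute tag of the user is determined according to steps S110 to S140, the content recommendation information to be recommended to the user may also be determined directly from the content information base. Specifically, the similarity between users may be determined according to the attribute tags of a plurality of users including a first user, and a plurality of second users whose similarity to the first user is within a preset threshold range are determined from the plurality of users. Then, according to the second users' browsing records of the content information in the content information base, the content recommendation information to be recommended to the first user is determined from the content information base.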
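The user-similarity variant can be sketched as below. The specification does not name a similarity measure, so Jaccard similarity over attribute-tag sets is used here as an assumption, and "within a preset threshold range" is simplified to "at least `min_sim`". Ranking by how many similar users browsed an item is likewise an illustrative choice.

```python
from collections import Counter


def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag collections (0.0 if both are empty)."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_similar_users(first_user, user_tags, browse_log, min_sim=0.5):
    """Recommend content browsed by users similar to `first_user`.

    user_tags:  {user id: attribute tags}
    browse_log: {user id: [content ids browsed]}
    """
    similar = [
        user
        for user in user_tags
        if user != first_user
        and jaccard(user_tags[first_user], user_tags[user]) >= min_sim
    ]
    seen = set(browse_log.get(first_user, []))
    counts = Counter(
        content
        for user in similar
        for content in set(browse_log.get(user, []))
        if content not in seen
    )
    # Most widely browsed first, ties broken by content id.
    return [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


suggestions = recommend_from_similar_users(
    "u1",
    {
        "u1": {"car wash", "maintenance"},
        "u2": {"car wash", "maintenance", "refueling"},
        "u3": {"mileage"},
    },
    {"u1": ["a"], "u2": ["a", "b"], "u3": ["c"]},
)
```

In this data only `u2` is similar enough to `u1`, so the only suggestion is the item `u2` browsed that `u1` has not yet seen.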
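In addition, a single piece of content information may be related to multiple domains. Accordingly, a plurality of specific domains corresponding to the content information may be determined in step S120, and domain knowledge information corresponding to each specific domain (e.g., domain level knowledge and/or a domain knowledge map) may then be acquired in step S130.
In one embodiment, the domain level knowledge corresponding to each specific domain may be acquired in step S130; then, in step S140, the specific category corresponding to the content information in each specific domain and the feature words corresponding to each specific category are determined, and the feature tag of the content information is determined according to the feature words.
In another embodiment, the domain knowledge map corresponding to each specific domain may be acquired in step S130; then, in step S140, the feature combination words included in each domain knowledge map are determined, and the feature tag of the content information is determined according to the feature combination words.
Furthermore, the domain level knowledge and the domain knowledge map included in the domain knowledge information differ mainly in two respects. First, the feature words in the domain level knowledge are single words, whereas the feature combination words in the domain knowledge map are combinations of at least two words. Second, the feature words in the domain level knowledge are mainly words that are strongly related to a category, that is, words from which one or a few categories can be clearly inferred (e.g., the feature word "car wash" usually belongs to the car category); words that occur in all categories but have different meanings under different categories (e.g., new rules) may not be set as feature words. Based on the domain knowledge map, by contrast, feature information with clear semantics in the domain can be determined by extracting combination words (e.g., driver's license-new rules). Clearly, in the above method the domain level knowledge or the domain knowledge map may be used alone, or the two may be combined, to determine the feature tag of the content information and thereby the content recommendation information to be recommended to the user.
As can be seen from the above, in the content recommendation method provided by the embodiments disclosed in this specification, content information is first acquired from a content information base, and keyword information related to the content information is determined. A specific domain corresponding to the content information is determined, and domain knowledge information corresponding to the specific domain is acquired. Next, a feature tag of the content information is determined from the keyword information according to the domain knowledge information. Then, content recommendation information to be recommended to the user is determined from the content information base according to the feature tag and the attribute tag of the user. With this method, more accurate content information is recommended to the user.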
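Corresponding to the content recommendation method, the embodiments disclosed in this specification further provide a content recommendation apparatus. As shown in FIG. 4, the apparatus 400 includes:
a first obtaining module 410, configured to acquire content information from a content information base;
a first determining module 420, configured to determine keyword information of the content information;
a second determining module 430, configured to determine a specific domain corresponding to the content information;
a second obtaining module 440, configured to acquire domain knowledge information corresponding to the specific domain;
a third determining module 450, configured to determine a feature tag of the content information from the keyword information according to the domain knowledge information; and
a processing module 460, configured to determine, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag.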
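In a possible implementation, the keyword extraction algorithm in the determining submodule includes at least one of the TF-IDF algorithm and the TextRank algorithm.
In a possible implementation, the content information acquired by the first obtaining module 410 includes a domain tag, and the second determining module 430 is specifically configured to:
determine the specific domain corresponding to the content information according to the domain tag.
In a possible implementation, the domain knowledge information acquired by the second obtaining module 440 includes domain level knowledge, and the domain level knowledge includes a domain name, category names corresponding to the domain name, and feature words corresponding to the category names.
In a possible implementation, the feature words acquired by the second obtaining module 440 are obtained by training on content corpus data in a content corpus.
In a possible implementation, the third determining module 450 specifically includes:
a first determining submodule 451, configured to determine a specific category corresponding to the content information;
a second determining submodule 452, configured to determine, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and
a third determining submodule 453, configured to determine the feature tag of the content information from the keyword information according to the feature words.
In a possible implementation, the content information acquired by the first obtaining module 410 includes a category tag, and the first determining submodule 451 is specifically configured to:
determine the specific category corresponding to the content information according to the category tag.
In a possible implementation, the third determining submodule 453 is specifically configured to:
use the keyword information that matches the feature words as the feature tag.
In a possible implementation, the keyword information determined by the first determining module 420 includes a plurality of keywords and ranking information of each keyword, and the third determining submodule 453 is specifically configured to:
reorder the plurality of keywords according to how each keyword in the keyword information matches the feature words and according to the ranking information; and
use the keyword information that falls within a predetermined order range after reordering as the feature tag of the content information.
In a possible implementation, the domain knowledge information acquired by the second obtaining module 440 includes a domain knowledge map; the domain knowledge map includes, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word; and the entity word combined with a related word constitutes a feature combination word.
In a possible implementation, the third determining module 450 specifically includes:
a second determining submodule 452, configured to determine the feature combination words included in the domain knowledge map corresponding to the specific domain; and
a third determining submodule 453, configured to determine the feature tag of the content information from the keyword information according to the feature combination words.
In a possible implementation, the third determining submodule 453 is specifically configured to:
use the keyword information that matches the feature combination words as the feature tag.
In a possible implementation, the keyword information determined by the first determining module 420 includes a plurality of keywords and ranking information of each keyword, and the third determining submodule 453 is specifically configured to:
reorder the plurality of keywords according to how each keyword in the keyword information matches the feature combination words and according to the ranking information; and
use the keyword information that falls within a predetermined order range after reordering as the feature tag of the content information.
In a possible implementation, the attribute tag used by the processing module 460 is determined based on the user's historical browsing content.
In a possible implementation, the processing module 460 is specifically configured to:
use the content information whose feature tag matches the attribute tag as candidate content information to be recommended to the user; and
rank each piece of content information in the candidate content information according to a preset rule, and use the content information ranked within a preset range as the content recommendation information.
As can be seen from the above, in the content recommendation apparatus provided by the embodiments disclosed in this specification, the first obtaining module 410 first acquires content information from the content information base, the first determining module 420 determines keyword information related to the content information, the second determining module 430 determines a specific domain corresponding to the content information, and the second obtaining module 440 acquires domain knowledge information corresponding to the specific domain. Next, the third determining module 450 determines the feature tag of the content information from the keyword information according to the domain knowledge information. Then, the processing module 460 determines, from the content information base, the content recommendation information to be recommended to the user according to the feature tag and the attribute tag of the user. With this apparatus, more accurate content information is recommended to the user.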
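A person skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments disclosed in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of the embodiments disclosed in this specification. It should be understood that the above are merely specific implementations of the embodiments disclosed in this specification and are not intended to limit their scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the embodiments disclosed in this specification shall fall within their scope of protection.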

Claims (20)

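  1. A content recommendation method, comprising:
    acquiring content information from a content information base, and determining keyword information of the content information;
    determining a specific domain corresponding to the content information;
    acquiring domain knowledge information corresponding to the specific domain;
    determining a feature tag of the content information from the keyword information according to the domain knowledge information; and
    determining, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag.
  2. The method according to claim 1, wherein the domain knowledge information comprises domain level knowledge, and the domain level knowledge comprises a domain name, category names corresponding to the domain name, and feature words corresponding to the category names.
  3. The method according to claim 2, wherein determining the feature tag of the content information from the keyword information according to the domain knowledge information comprises:
    determining a specific category corresponding to the content information;
    determining, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and
    determining the feature tag of the content information from the keyword information according to the feature words.
  4. The method according to claim 3, wherein the content information comprises a category tag, and determining the specific category corresponding to the content information comprises:
    determining the specific category corresponding to the content information according to the category tag.
  5. The method according to claim 3, wherein determining the feature tag of the content information from the keyword information according to the feature words comprises:
    using the keyword information that matches the feature words as the feature tag.
  6. The method according to claim 3, wherein the keyword information comprises a plurality of keywords and ranking information of each keyword, and determining the feature tag of the content information from the keyword information according to the feature words comprises:
    reordering the plurality of keywords according to how each keyword in the keyword information matches the feature words and according to the ranking information; and
    using the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
  7. The method according to claim 1, wherein the domain knowledge information comprises a domain knowledge map, the domain knowledge map comprises, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word, and the entity word combined with a related word constitutes a feature combination word.
  8. The method according to claim 7, wherein determining the feature tag of the content information from the keyword information according to the domain knowledge information comprises:
    determining the feature combination words included in the domain knowledge map corresponding to the specific domain; and
    determining the feature tag of the content information from the keyword information according to the feature combination words.
  9. The method according to claim 8, wherein determining the feature tag of the content information from the keyword information according to the feature combination words comprises:
    using the keyword information that matches the feature combination words as the feature tag.
  10. The method according to claim 8, wherein the keyword information comprises a plurality of keywords and ranking information of each keyword, and determining the feature tag of the content information from the keyword information according to the feature combination words comprises:
    reordering the plurality of keywords according to how each keyword in the keyword information matches the feature combination words and according to the ranking information; and
    using the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.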
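  11. A content recommendation apparatus, comprising:
    a first obtaining module, configured to acquire content information from a content information base;
    a first determining module, configured to determine keyword information of the content information;
    a second determining module, configured to determine a specific domain corresponding to the content information;
    a second obtaining module, configured to acquire domain knowledge information corresponding to the specific domain;
    a third determining module, configured to determine a feature tag of the content information from the keyword information according to the domain knowledge information; and
    a processing module, configured to determine, from the content information base, content recommendation information to be recommended to a user according to an attribute tag of the user and the feature tag.
  12. The apparatus according to claim 11, wherein the domain knowledge information acquired by the second obtaining module comprises domain level knowledge, and the domain level knowledge comprises a domain name, category names corresponding to the domain name, and feature words corresponding to the category names.
  13. The apparatus according to claim 12, wherein the third determining module specifically comprises:
    a first determining submodule, configured to determine a specific category corresponding to the content information;
    a second determining submodule, configured to determine, in the domain level knowledge, a specific category name corresponding to the specific category and the feature words corresponding to the specific category name; and
    a third determining submodule, configured to determine the feature tag of the content information from the keyword information according to the feature words.
  14. The apparatus according to claim 13, wherein the content information acquired by the first obtaining module comprises a category tag, and the first determining submodule is specifically configured to:
    determine the specific category corresponding to the content information according to the category tag.
  15. The apparatus according to claim 13, wherein the third determining submodule is specifically configured to:
    use the keyword information that matches the feature words as the feature tag.
  16. The apparatus according to claim 13, wherein the keyword information determined by the first determining module comprises a plurality of keywords and ranking information of each keyword, and the third determining submodule is specifically configured to:
    reorder the plurality of keywords according to how each keyword in the keyword information matches the feature words and according to the ranking information; and
    use the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.
  17. The apparatus according to claim 11, wherein the domain knowledge information acquired by the second obtaining module comprises a domain knowledge map, the domain knowledge map comprises, in its first layer, an entity word corresponding to the domain and, in its second layer, related words corresponding to the entity word, and the entity word combined with a related word constitutes a feature combination word.
  18. The apparatus according to claim 11, wherein the third determining module specifically comprises:
    a second determining submodule, configured to determine the feature combination words included in the domain knowledge map corresponding to the specific domain; and
    a third determining submodule, configured to determine the feature tag of the content information from the keyword information according to the feature combination words.
  19. The apparatus according to claim 18, wherein the third determining submodule is specifically configured to:
    use the keyword information that matches the feature combination words as the feature tag.
  20. The apparatus according to claim 18, wherein the keyword information determined by the first determining module comprises a plurality of keywords and ranking information of each keyword, and the third determining submodule is specifically configured to:
    reorder the plurality of keywords according to how each keyword in the keyword information matches the feature combination words and according to the ranking information; and
    use the keyword information that falls within a predetermined order range after the reordering as the feature tag of the content information.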
PCT/CN2018/123283 2018-01-08 2018-12-25 内容推荐方法及装置 WO2019134554A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11202006532QA SG11202006532QA (en) 2018-01-08 2018-12-25 Content recommendation method and apparatus
US16/911,000 US11720572B2 (en) 2018-01-08 2020-06-24 Method and system for content recommendation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810015028.0 2018-01-08
CN201810015028.0A CN108268619B (zh) 2018-01-08 2018-01-08 内容推荐方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/911,000 Continuation-In-Part US11720572B2 (en) 2018-01-08 2020-06-24 Method and system for content recommendation

Publications (1)

Publication Number Publication Date
WO2019134554A1 true WO2019134554A1 (zh) 2019-07-11

Family

ID=62773196

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123283 WO2019134554A1 (zh) 2018-01-08 2018-12-25 内容推荐方法及装置

Country Status (5)

Country Link
US (1) US11720572B2 (zh)
CN (1) CN108268619B (zh)
SG (1) SG11202006532QA (zh)
TW (1) TWI687823B (zh)
WO (1) WO2019134554A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159420A (zh) * 2019-12-12 2020-05-15 西安交通大学 一种基于属性计算与知识模板的实体优化方法
CN111259659A (zh) * 2020-01-14 2020-06-09 北京百度网讯科技有限公司 信息处理方法和装置
CN112686043A (zh) * 2021-01-12 2021-04-20 武汉大学 一种基于词向量的企业所属新兴产业分类方法

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
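CN108268619B (zh) * 2018-01-08 2020-06-30 阿里巴巴集团控股有限公司 Content recommendation method and apparatus
CN110570316A (zh) 2018-08-31 2019-12-13 阿里巴巴集团控股有限公司 Method and apparatus for training a damage recognition model
CN113272803A (zh) * 2019-01-04 2021-08-17 三星电子株式会社 Method and device for retrieving intelligent information from an electronic apparatus
CN110516030B (zh) * 2019-08-26 2022-11-01 北京百度网讯科技有限公司 Method, apparatus, and device for determining intent words, and computer-readable storage medium
CN110706021A (zh) * 2019-09-12 2020-01-17 微梦创科网络科技(中国)有限公司 Advertisement delivery method and system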
US10943072B1 (en) * 2019-11-27 2021-03-09 ConverSight.ai, Inc. Contextual and intent based natural language processing system and method
US11386463B2 (en) * 2019-12-17 2022-07-12 At&T Intellectual Property I, L.P. Method and apparatus for labeling data
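TWI800743B (zh) * 2020-07-17 2023-05-01 開曼群島商粉迷科技股份有限公司 Personalized content recommendation method, graphical user interface, and system thereof
CN112417202B (zh) * 2020-09-04 2023-06-30 上海哔哩哔哩科技有限公司 Content screening method and apparatus
CN112328832B (zh) * 2020-10-27 2022-08-09 内蒙古大学 Movie recommendation method fusing tags and a knowledge graph
CN112348638B (zh) * 2020-11-09 2024-02-20 上海秒针网络科技有限公司 Activity copy recommendation method and apparatus, electronic device, and storage medium
CN112380339A (zh) * 2020-11-23 2021-02-19 北京达佳互联信息技术有限公司 Hot event mining method, apparatus, and server
CN112685645A (zh) * 2021-01-13 2021-04-20 敖客星云(北京)科技发展有限公司 Knowledge-graph-based intelligent education recommendation method, system, device, and medium
CN113157857B (zh) * 2021-03-13 2023-06-02 中国科学院新疆理化技术研究所 News-oriented hot topic detection method, apparatus, and device
CN113032671B (zh) * 2021-03-17 2024-02-23 北京百度网讯科技有限公司 Content processing method and apparatus, electronic device, and storage medium
CN113076428A (zh) * 2021-03-19 2021-07-06 北京沃东天骏信息技术有限公司 Book list generation method and apparatus
CN113177170B (zh) * 2021-04-12 2023-05-23 维沃移动通信有限公司 Comment display method and apparatus, and electronic device
CN115248890B (zh) * 2021-04-27 2024-04-05 百度国际科技(深圳)有限公司 Method and apparatus for generating a user interest profile, electronic device, and storage medium
CN112988979B (zh) * 2021-04-29 2021-10-08 腾讯科技(深圳)有限公司 Entity recognition method and apparatus, computer-readable medium, and electronic device
CN113704614A (zh) * 2021-08-30 2021-11-26 康键信息技术(深圳)有限公司 User-profile-based page generation method, apparatus, device, and medium
CN113806561A (zh) * 2021-10-11 2021-12-17 中国人民解放军国防科技大学 Knowledge graph fact completion method based on entity attributes
TWI800982B (zh) 2021-11-16 2023-05-01 宏碁股份有限公司 Apparatus and method for generating article tagging data
CN113936765A (zh) * 2021-12-17 2022-01-14 北京因数健康科技有限公司 Method and apparatus for generating a periodic behavior report, storage medium, and electronic device
CN114722147A (zh) * 2022-03-31 2022-07-08 长沙博为软件技术股份有限公司 Quality control method, system, device, and medium for infectious disease history in electronic medical records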

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103442001A (zh) * 2013-08-22 2013-12-11 百度在线网络技术(北京)有限公司 Information recommendation method, apparatus, and server
US20160205427A1 (en) * 2015-01-14 2016-07-14 Samsung Electronics Co., Ltd. User terminal apparatus, system, and control method thereof
CN106156204A (zh) * 2015-04-23 2016-11-23 深圳市腾讯计算机系统有限公司 Text label extraction method and apparatus
CN106897273A (zh) * 2017-04-12 2017-06-27 福州大学 Knowledge-graph-based dynamic early-warning method for network security
CN108268619A (zh) * 2018-01-08 2018-07-10 阿里巴巴集团控股有限公司 Content recommendation method and apparatus

Family Cites Families (77)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7031901B2 (en) 1998-05-13 2006-04-18 Abu El Ata Nabil A System and method for improving predictive modeling of an information system
US6397334B1 (en) 1998-12-17 2002-05-28 International Business Machines Corporation Method and system for authenticating objects and object data
US6644973B2 (en) 2000-05-16 2003-11-11 William Oster System for improving reading and speaking
US6925452B1 (en) 2000-05-22 2005-08-02 International Business Machines Corporation Method and system for recognizing end-user transactions
US7093129B1 (en) 2000-06-19 2006-08-15 International Business Machines Corporation Secured encrypted communications in a voice browser
JP3846851B2 2001-02-01 2006-11-15 松下電器産業株式会社 Image matching processing method and apparatus therefor
US7565537B2 (en) 2002-06-10 2009-07-21 Microsoft Corporation Secure key exchange with mutual authentication
US20040196363A1 (en) 2003-04-01 2004-10-07 Gary Diamond Video identification verification system
US7466824B2 (en) 2003-10-09 2008-12-16 Nortel Networks Limited Method and system for encryption of streamed data
US7401012B1 (en) 2005-04-20 2008-07-15 Sun Microsystems, Inc. Method and apparatus for characterizing computer system workloads
US8448226B2 (en) 2005-05-13 2013-05-21 Sarangan Narasimhan Coordinate based computer authentication system and methods
US7536304B2 (en) 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
KR101426870B1 2007-03-06 2014-09-19 스미토모덴키고교가부시키가이샤 Image processing method, computer-readable recording medium storing a computer program, and image inspection method
US7872584B2 (en) 2007-04-09 2011-01-18 Honeywell International Inc. Analyzing smoke or other emissions with pattern recognition
US8280106B2 (en) 2007-09-29 2012-10-02 Samsung Electronics Co., Ltd. Shadow and highlight detection system and method of the same in surveillance camera and recording medium thereof
US9298979B2 (en) 2008-01-18 2016-03-29 Mitek Systems, Inc. Systems and methods for mobile image capture and content processing of driver's licenses
US8180629B2 (en) 2008-07-10 2012-05-15 Trigent Softward Ltd. Automatic pattern generation in natural language processing
DE102008046254A1 2008-09-08 2010-03-11 Giesecke & Devrient Gmbh Value document processing apparatus and a method for reducing dust in the value document processing apparatus
KR101556654B1 2008-11-28 2015-10-02 삼성전자주식회사 Method and apparatus for performing a video call
US8121400B2 (en) 2009-09-24 2012-02-21 Huper Laboratories Co., Ltd. Method of comparing similarity of 3D visual objects
US9253167B2 (en) 2011-04-19 2016-02-02 Apriva, Llc Device and system for facilitating communication and networking within a secure mobile environment
US9082235B2 (en) 2011-07-12 2015-07-14 Microsoft Technology Licensing, Llc Using facial data for device authentication or subject identification
US8966613B2 (en) 2011-09-30 2015-02-24 Microsoft Technology Licensing, Llc Multi-frame depth image information identification
US9165188B2 (en) 2012-01-12 2015-10-20 Kofax, Inc. Systems and methods for mobile image capture and processing
US9066125B2 (en) 2012-02-10 2015-06-23 Advanced Biometric Controls, Llc Secure display
JP6052657B2 2012-03-13 2016-12-27 パナソニックIpマネジメント株式会社 Object verification apparatus, object verification program, and object verification method
US9117318B2 (en) 2012-03-14 2015-08-25 Flextronics Ap, Llc Vehicle diagnostic detection through sensitive vehicle skin
US8705836B2 (en) 2012-08-06 2014-04-22 A2iA S.A. Systems and methods for recognizing information in objects using a mobile device
US9582843B2 (en) 2012-08-20 2017-02-28 Tautachrome, Inc. Authentication and validation of smartphone imagery
US9036943B1 (en) 2013-03-14 2015-05-19 Amazon Technologies, Inc. Cloud-based image improvement
US10475014B1 (en) 2013-03-15 2019-11-12 Amazon Technologies, Inc. Payment device security
US9147127B2 (en) 2013-03-15 2015-09-29 Facebook, Inc. Verification of user photo IDs
US9723251B2 (en) 2013-04-23 2017-08-01 Jaacob I. SLOTKY Technique for image acquisition and management
CN104142940B (zh) * 2013-05-08 2017-11-17 华为技术有限公司 Information recommendation processing method and apparatus
US9268823B2 (en) * 2013-05-10 2016-02-23 International Business Machines Corporation Partial match derivation using text analysis
US10319035B2 (en) 2013-10-11 2019-06-11 Ccc Information Services Image capturing and automatic labeling system
US9202119B2 (en) 2013-10-18 2015-12-01 Daon Holdings Limited Methods and systems for determining user liveness
JP6287047B2 2013-10-22 2018-03-07 富士通株式会社 Image processing apparatus, image processing method, and image processing program
US9607138B1 (en) 2013-12-18 2017-03-28 Amazon Technologies, Inc. User authentication and verification through video analysis
CA2883010A1 (en) 2014-02-25 2015-08-25 Sal Khan Systems and methods relating to the authenticity and verification of photographic identity documents
US20150293982A1 (en) * 2014-04-14 2015-10-15 International Business Machines Corporation Displaying a representative item for a collection of items
US9646227B2 (en) 2014-07-29 2017-05-09 Microsoft Technology Licensing, Llc Computerized machine learning of interesting video sections
US9258303B1 (en) 2014-08-08 2016-02-09 Cellcrypt Group Limited Method of providing real-time secure communication between end points in a network
CA2902093C (en) 2014-08-28 2023-03-07 Kevin Alan Tussy Facial recognition authentication system including path parameters
CN106033415B (zh) * 2015-03-09 2020-07-03 深圳市腾讯计算机系统有限公司 Text content recommendation method and apparatus
US9619696B2 (en) 2015-04-15 2017-04-11 Cisco Technology, Inc. Duplicate reduction for face detection
TWI556123B (zh) * 2015-08-06 2016-11-01 News tracking and recommendation method
US9794260B2 (en) 2015-08-10 2017-10-17 Yoti Ltd Liveness detection
US20170060867A1 (en) 2015-08-31 2017-03-02 Adfamilies Publicidade, SA Video and image match searching
US10065441B2 (en) 2015-09-01 2018-09-04 Digimarc Corporation Counterfeiting detection using machine readable indicia
US10706266B2 (en) 2015-09-09 2020-07-07 Nec Corporation Guidance acquisition device, guidance acquisition method, and program
US11868354B2 (en) * 2015-09-23 2024-01-09 Motorola Solutions, Inc. Apparatus, system, and method for responding to a user-initiated query with a context-based response
GB201517462D0 (en) 2015-10-02 2015-11-18 Tractable Ltd Semi-automatic labelling of datasets
WO2017059576A1 (en) 2015-10-09 2017-04-13 Beijing Sensetime Technology Development Co., Ltd Apparatus and method for pedestrian detection
US20170147990A1 (en) 2015-11-23 2017-05-25 CSI Holdings I LLC Vehicle transactions using objective vehicle data
CN105719188B (zh) 2016-01-22 2017-12-26 平安科技(深圳)有限公司 Method and server for insurance claim anti-fraud based on consistency of multiple pictures
US10242048B2 (en) * 2016-01-26 2019-03-26 International Business Machines Corporation Dynamic question formulation to query data sources
US10692050B2 (en) 2016-04-06 2020-06-23 American International Group, Inc. Automatic assessment of damage and repair costs in vehicles
US11144889B2 (en) 2016-04-06 2021-10-12 American International Group, Inc. Automatic assessment of damage and repair costs in vehicles
US20170293620A1 (en) * 2016-04-06 2017-10-12 International Business Machines Corporation Natural language processing based on textual polarity
US10789545B2 (en) 2016-04-14 2020-09-29 Oath Inc. Method and system for distributed machine learning
JP6235082B1 2016-07-13 2017-11-22 ヤフー株式会社 Data classification apparatus, data classification method, and program
US10055882B2 (en) 2016-08-15 2018-08-21 Aquifi, Inc. System and method for three-dimensional scanning and for capturing a bidirectional reflectance distribution function
GB2554361B8 (en) 2016-09-21 2022-07-06 Emergent Network Intelligence Ltd Automatic image based object damage assessment
WO2018120013A1 (en) 2016-12-30 2018-07-05 Nokia Technologies Oy Artificial neural network
WO2018165753A1 (en) 2017-03-14 2018-09-20 University Of Manitoba Structure defect detection using machine learning algorithms
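CN107015963A (zh) * 2017-03-22 2017-08-04 重庆邮电大学 Natural language semantic analysis system and method based on a deep neural network
KR102334575B1 2017-07-31 2021-12-03 삼성디스플레이 주식회사 Mura detection apparatus and detection method of the mura detection apparatus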
US11087292B2 (en) 2017-09-01 2021-08-10 Allstate Insurance Company Analyzing images and videos of damaged vehicles to determine damaged vehicle parts and vehicle asymmetries
US11586875B2 (en) 2017-11-22 2023-02-21 Massachusetts Institute Of Technology Systems and methods for optimization of a data model network architecture for target deployment
CN109919308B (zh) 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 Neural network model deployment method, prediction method, and related device
US10942767B2 (en) 2018-02-27 2021-03-09 Microsoft Technology Licensing, Llc Deep neural network workload scheduling
US10554738B1 (en) 2018-03-02 2020-02-04 Syncsort Incorporated Methods and apparatus for load balance optimization based on machine learning
US10997413B2 (en) 2018-03-23 2021-05-04 NthGen Software Inc. Method and system for obtaining vehicle target views from a video stream
GB2573809B (en) 2018-05-18 2020-11-04 Emotech Ltd Speaker Recognition
US10832065B1 (en) 2018-06-15 2020-11-10 State Farm Mutual Automobile Insurance Company Methods and systems for automatically predicting the repair costs of a damaged vehicle from images
US11030735B2 (en) 2018-08-09 2021-06-08 Exxonmobil Upstream Research Company Subterranean drill bit management system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
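CN111159420A (zh) * 2019-12-12 2020-05-15 西安交通大学 Entity optimization method based on attribute computation and knowledge templates
CN111159420B (zh) * 2019-12-12 2023-04-28 西安交通大学 Entity optimization method based on attribute computation and knowledge templates
CN111259659A (zh) * 2020-01-14 2020-06-09 北京百度网讯科技有限公司 Information processing method and apparatus
CN111259659B (zh) * 2020-01-14 2023-07-04 北京百度网讯科技有限公司 Information processing method and apparatus
CN112686043A (zh) * 2021-01-12 2021-04-20 武汉大学 Word-vector-based method for classifying the emerging industry to which an enterprise belongs
CN112686043B (zh) * 2021-01-12 2024-02-06 武汉大学 Word-vector-based method for classifying the emerging industry to which an enterprise belongs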

Also Published As

Publication number Publication date
CN108268619B (zh) 2020-06-30
CN108268619A (zh) 2018-07-10
TWI687823B (zh) 2020-03-11
US20200320086A1 (en) 2020-10-08
TW201931170A (zh) 2019-08-01
US11720572B2 (en) 2023-08-08
SG11202006532QA (en) 2020-08-28

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18898645

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18898645

Country of ref document: EP

Kind code of ref document: A1