CN110717029A - Information processing method and system - Google Patents


Publication number
CN110717029A
Authority
CN
China
Prior art keywords
keyword
keywords
original
interest
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910976749.2A
Other languages
Chinese (zh)
Inventor
康潮明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201910976749.2A
Publication of CN110717029A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One embodiment relates to an information processing method and system. The method comprises the following steps: acquiring an original keyword; obtaining a keyword set according to the original keywords, wherein the keyword set comprises at least one keyword including the original keywords; retrieving texts corresponding to one or more keywords from a preset text library based on the keywords in the keyword set; processing texts corresponding to one or more keywords by using a machine learning model to obtain interest components of the one or more keywords; and counting the occurrence frequency of each interest component in the interest components of the one or more keywords, and determining the interest components of the original keywords according to the counting result.

Description

Information processing method and system
Technical Field
The embodiment of the specification relates to the field of big data, in particular to an information processing method and system.
Background
Big data brings convenience to people's life and work. For example, a traditional marketing document has to be written by dedicated staff, whereas in some existing scenarios a user only needs to input a keyword for a machine to automatically generate a marketing document from that keyword. However, because the user inputs only the keyword and provides no other related information, it is difficult for the machine to recognize the real interest (or intention) of the user and to generate a document that meets that interest.
Therefore, it is desirable to provide a technical solution that can accurately identify the user interest from the keywords input by the user to automatically generate a document meeting the user interest.
Disclosure of Invention
Some embodiments in this specification provide an information processing method including: acquiring an original keyword; obtaining a keyword set according to the original keywords, wherein the keyword set comprises at least one keyword including the original keywords; retrieving texts corresponding to one or more keywords from a preset text library based on the keywords in the keyword set; processing texts corresponding to one or more keywords by using a machine learning model to obtain interest components of the one or more keywords; and counting the occurrence frequency of each interest component in the interest components of the one or more keywords, and determining the interest components of the original keywords according to the counting result.
Still other embodiments in this specification provide an information processing system comprising: the original keyword acquisition module is used for acquiring an original keyword; a keyword set obtaining module, configured to obtain a keyword set according to the original keyword, where the keyword set includes at least one keyword including the original keyword; the text retrieval module is used for retrieving texts corresponding to one or more keywords from a preset text library based on the keywords in the keyword set; the keyword interest component acquisition module is used for processing texts corresponding to one or more keywords by using a machine learning model to obtain interest components of the one or more keywords; and the keyword interest component determining module is used for counting the occurrence frequency of each interest component in the interest components of the one or more keywords and determining the interest components of the original keywords according to the counting result.
Other embodiments in this specification provide an information processing apparatus comprising at least one processor and at least one memory; the at least one memory is for storing computer instructions; the at least one processor is configured to execute at least a portion of the computer instructions to implement the information processing method as described above.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is an exemplary flow diagram of a method of generating a document according to some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of an information processing method shown in accordance with some embodiments of the present description;
FIG. 3 is a schematic diagram of the composition of a set of keywords shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary flow diagram of a method for counting frequency of occurrence of a component of interest in accordance with some embodiments described herein;
FIG. 5 is an exemplary block diagram of an information handling system shown in accordance with some embodiments of the present description; and
FIG. 6 is an exemplary block diagram of a keyword set acquisition module in an information handling system in accordance with some embodiments of the present description.
Detailed Description
In order to illustrate the technical solutions of the embodiments of the present specification more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings in the following description are only some examples or embodiments of the present specification, and a person skilled in the art can apply the present specification to other similar scenarios according to these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein are terms for distinguishing different components, elements, parts, portions or assemblies at different levels. However, these words may be replaced by other expressions that accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
Flow charts are used in this specification to illustrate operations performed by a system according to embodiments of the present specification. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Other operations may also be added to the processes, or one or more operations may be removed from them.
Embodiments in this specification provide an information processing method and system. With the information processing method, the information processing system can determine the interest component of a keyword after the user inputs the keyword, so that a document meeting the interest of the user can be generated automatically.
FIG. 1 is an exemplary flow diagram of a method of generating a document according to some embodiments of the present description. In some embodiments, the present exemplary process may be implemented on a server or on a user terminal. In some embodiments, the present exemplary process may also be implemented by a server interacting with a user terminal. For example, a user can input a keyword through the user terminal, the user terminal sends the keyword to a server, and the server generates a corresponding document and returns it to the user terminal.
As shown in FIG. 1, the system first obtains a keyword input by a user, then determines the interest component corresponding to the keyword, and finally generates a document according to the keyword and its interest component, so that the generated document caters to the interest of the user.
In some embodiments, an interest component may indicate the class of information of interest (or intent) to the user that a keyword carries. For example, the keywords "red envelope," "transfer," "coupon," etc. may each correspond to a benefit component, which indicates that the respective keyword carries benefit information of interest to the user. As another example, the keywords "basketball," "football," "world cup," "olympic games," etc. may each correspond to a sports component, which indicates that the respective keyword carries sports information of interest to the user. As another example, the keywords "barbecue," "hot pot," "buffet," "seafood," etc. may each correspond to a dietary component, which indicates that the respective keyword carries dietary information of interest to the user. The above are merely examples, and the correspondence between keywords and interest components is not particularly limited in the embodiments of the present specification.
In some embodiments, a vocabulary recording the correspondence between keywords and interest components may be predefined. The vocabulary may contain a plurality of two-tuples, each consisting of a term and the interest component corresponding to that term. In this way, if the keyword input by the user is found in a two-tuple of the vocabulary, the interest component corresponding to the keyword can be determined from that two-tuple.
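By way of illustration only, the vocabulary lookup described above might be sketched as follows. The vocabulary entries, the component names, and the function name `lookup_interest_component` are assumptions made for this example and do not come from the specification.

```python
from typing import Optional

# Hypothetical vocabulary of (term, interest component) two-tuples.
# The entries and component names are illustrative only.
VOCABULARY = [
    ("red envelope", "benefit"),
    ("coupon", "benefit"),
    ("basketball", "sports"),
    ("hot pot", "diet"),
]

# Index the two-tuples by term so that each lookup is one dictionary access.
VOCAB_INDEX = {term: component for term, component in VOCABULARY}


def lookup_interest_component(keyword: str) -> Optional[str]:
    """Return the interest component of `keyword`, or None if it is not listed."""
    return VOCAB_INDEX.get(keyword)
```

A keyword that is absent from the vocabulary returns `None`, which is the case the keyword expansion described in the following paragraphs is meant to handle.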
In some embodiments, the original keywords input by the user may be expanded to obtain a keyword set, the interest components of the keywords in the keyword set are determined, the occurrence frequency of different interest components in the interest components of the keywords is counted, and the interest components of the original keywords are determined according to the statistical result.
In some embodiments, a predetermined document template may be used to generate the document. Specifically, the document template may include fixed content and content to be filled or adjusted, and the content to be filled or adjusted may be generated or updated according to the keyword input by the user and the interest component corresponding to the keyword, so as to generate a document that includes the keyword and meets the interest of the user. In some embodiments, a document generation model may also be used to generate the document. Specifically, a large number of existing documents can be collected, and keywords and their interest components can be extracted from the collected documents. The extracted keywords, the interest components corresponding to the keywords, and the documents themselves are then used as samples to train the document generation model. Finally, the keyword input by the user and its interest component are fed into the trained document generation model to obtain the document output by the model.
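As a minimal sketch of the template-based approach, assuming one template per interest component, the fixed content can be stored with a placeholder that is filled with the keyword. The template wording, the `TEMPLATES` mapping, and the function name `generate_copy` are hypothetical.

```python
from string import Template

# Hypothetical templates, one per interest component. The fixed content is the
# literal text; $keyword marks the content to be filled in.
TEMPLATES = {
    "benefit": Template("Limited-time offer: claim your $keyword today!"),
    "sports": Template("Game on: follow every $keyword moment with us."),
}

# Fallback used when no template exists for the given interest component.
DEFAULT_TEMPLATE = Template("Discover more about $keyword.")


def generate_copy(keyword: str, interest_component: str) -> str:
    """Fill the template selected by the interest component with the keyword."""
    template = TEMPLATES.get(interest_component, DEFAULT_TEMPLATE)
    return template.substitute(keyword=keyword)
```

Selecting the template by interest component is one simple way to make the same keyword yield different text for different interests; the specification does not prescribe how templates are organized.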
FIG. 2 is an exemplary flow diagram of an information processing method shown in accordance with some embodiments of the present description. The process 200 may be performed by the information processing system shown in FIG. 5. The process 200 includes:
Step 210, obtaining an original keyword.
The original keyword refers to a word input by the user. In some embodiments, the user may input the original keyword through a user terminal having an input function. For example, a user terminal having an input function may include one or more of a touch screen, a tablet, a microphone, a keyboard, and the like. In some embodiments, the manner in which the user enters the keywords includes, but is not limited to, any combination of one or more of typing, handwriting, selection, voice, scanning, and the like.
In some embodiments, the system may obtain the keywords input by the user directly from the user terminal. In some embodiments, the user terminal may upload the keywords input by the user to a storage device communicatively connected to the user terminal and the system, and the system may retrieve the keywords input by the user from the storage device.
In some embodiments, because subsequent processing expects keywords in a standard format, the system may preprocess the obtained original keyword. For example, preprocessing may include case conversion, removing illegal characters, removing symbols, and the like.
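The preprocessing step might look like the sketch below. The specific rules (Unicode NFKC normalization, lowercasing, treating non-printable characters as illegal, stripping punctuation) are assumptions chosen for illustration; the specification only names the general categories.

```python
import re
import unicodedata


def preprocess_keyword(raw: str) -> str:
    """Normalize a user-entered keyword into a standard form (illustrative rules)."""
    # Fold full-width and other compatibility forms into their canonical equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Case conversion.
    text = text.lower()
    # Treat control and other non-printable characters as illegal and drop them.
    text = "".join(ch for ch in text if ch.isprintable())
    # Remove symbols and punctuation; keep letters, digits, underscores, whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```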
Step 220, obtaining a keyword set according to the original keywords.
The keyword set comprises at least one keyword including the original keyword. In some embodiments, the original keywords may be expanded to obtain a set of keywords.
In some embodiments, the system may determine synonyms of the original keyword and construct the keyword set from the original keyword and its synonyms. In some embodiments, the system may look up synonyms of the original keyword in a synonym library, where the synonym library may include a number of synonym groups, each consisting of several words that are synonyms of one another. The synonyms can come from various sources; for example, they can be gathered from existing sources (e.g., the web, books, dictionaries, etc.) or constructed manually. In some embodiments, the synonym library may be updated periodically or aperiodically.
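A synonym library of this kind could be sketched as a list of word groups, as below. The groups shown and the function name `find_synonyms` are hypothetical.

```python
from typing import List, Set

# Hypothetical synonym library: each group holds words that are synonyms of one another.
SYNONYM_GROUPS: List[Set[str]] = [
    {"red envelope", "red packet"},
    {"coupon", "voucher"},
]


def find_synonyms(keyword: str) -> Set[str]:
    """Return every synonym of `keyword` in the library, excluding the keyword itself."""
    synonyms: Set[str] = set()
    for group in SYNONYM_GROUPS:
        if keyword in group:
            synonyms |= group - {keyword}
    return synonyms
```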
In some embodiments, words whose similarity to the original keyword satisfies a set condition may be determined by comparing word vectors, and the keyword set may be constructed from the original keyword and these words (hereinafter referred to as "similar words"). Specifically, a plurality of candidate words is first obtained, and a word vector is determined for the original keyword and for each candidate word. For example, the word vector of each word may be determined by a Word2Vec model. The similarity between each candidate word and the original keyword is then determined from their word vectors. In some embodiments, the similarity may be determined according to the distance between the word vector of the original keyword and that of the candidate word, where a smaller distance indicates a higher similarity. Finally, at least one candidate word whose similarity to the original keyword satisfies the set condition is selected from the candidate words, and the keyword set is constructed from the original keyword and the selected candidate words. In some embodiments, the set condition may be that the candidate word ranks within a preset top number or top ratio by similarity to the original keyword, for example within the top 1, 2, 3, 4, or 5, or within the top 1%, 2%, 3%, 4%, or 5%.
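The selection of similar words can be sketched as follows, assuming the word vectors are already available. The three-dimensional vectors in `TOY_VECTORS` are made up for the example (a real system would obtain them from a trained model such as Word2Vec), Euclidean distance is one possible choice of distance, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two word vectors; a smaller value means higher similarity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_similar_words(
    original: str,
    candidates: List[str],
    vectors: Dict[str, Sequence[float]],
    top_n: int,
) -> List[str]:
    """Return the `top_n` candidates closest to the original keyword."""
    origin_vec = vectors[original]
    ranked = sorted(
        (c for c in candidates if c != original and c in vectors),
        key=lambda c: euclidean_distance(origin_vec, vectors[c]),
    )
    return ranked[:top_n]


# Made-up vectors for illustration only.
TOY_VECTORS = {
    "red envelope": [0.9, 0.1, 0.0],
    "coupon": [0.8, 0.2, 0.1],
    "discount": [0.7, 0.3, 0.0],
    "basketball": [0.0, 0.1, 0.9],
}
```

With these toy vectors, `select_similar_words("red envelope", ["coupon", "discount", "basketball"], TOY_VECTORS, 2)` keeps "coupon" and "discount" and drops "basketball". A top-ratio condition could be implemented by deriving `top_n` from the number of candidates.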
FIG. 3 is a schematic diagram of the composition of a set of keywords shown in accordance with some embodiments of the present description. In some embodiments, the keyword set may be obtained by combining the above methods of determining synonyms and/or similar words. The keyword set may include the original keyword, synonyms of the original keyword, and/or similar words. In some embodiments, the keyword set may further include expanded words obtained by methods other than the above methods of determining synonyms and similar words. For example, a synonym of the original keyword may be obtained and added to the keyword set. The present specification does not limit the manner of expanding the keyword. As shown in FIG. 3, the keyword set 300 includes the original keyword, synonyms of the original keyword, similar words, and other expanded words. In some embodiments, expanded words obtained in different ways may overlap; for example, a word in the keyword set may be both a synonym and a similar word of the original keyword. In that case, the duplicate words may be removed.
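Merging the expansion results and removing duplicates might be sketched as follows. The function name `build_keyword_set` and its parameters are hypothetical; an ordered list is used instead of a Python `set` only so that the original keyword stays first.

```python
from typing import Iterable, List


def build_keyword_set(
    original: str,
    synonyms: Iterable[str],
    similar_words: Iterable[str],
    other_expanded: Iterable[str] = (),
) -> List[str]:
    """Merge all expansion results into one deduplicated, order-preserving list."""
    keyword_set: List[str] = []
    seen = set()
    for word in [original, *synonyms, *similar_words, *other_expanded]:
        if word not in seen:  # skip words already contributed by another method
            seen.add(word)
            keyword_set.append(word)
    return keyword_set
```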
Step 230, retrieving one or more texts corresponding to the keywords from a preset text library based on the keywords in the keyword set.
In some embodiments, the preset text library may include a large number of sentences. The sentences can come from various sources; for example, they can be collected through existing channels (such as the network, books, posters, audio, video, pictures, etc.) or constructed manually. Taking a marketing scenario as an example, the preset text library may include a marketing corpus whose sentences are extracted from marketing documents, for example from existing marketing posters, such as "you get a million red packs", "discount over value", "one-line exhaustion", "national celebration special welfare dispatch", "make a financial account for more than one opportunity of living", and the like. In some embodiments, the preset text library may be updated periodically or aperiodically.
Each keyword is searched for in the preset text library to obtain the text corresponding to that keyword. In some embodiments, the text corresponding to a keyword may be a sentence containing the keyword. For example, the original keyword input by the user is "red envelope", and the keyword set obtained through expansion includes the following keywords: red envelope, coupons, discounts, rewards. Each of these keywords is then searched for in the preset text library. For instance, searching for "red envelope" may return the text "you get a million red packs", and searching for "discount" may return the texts "discount over value" and "one-line exhaustion".
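A minimal retrieval sketch, assuming the text library is an in-memory list of sentences and that "corresponding" means the sentence contains the keyword as a substring, is given below. The sample sentences and the function name `retrieve_texts` are hypothetical; a production system would more likely use an inverted index or a search engine.

```python
from typing import Dict, Iterable, List

# Hypothetical marketing text library.
TEXT_LIBRARY = [
    "A million red envelopes are waiting for you",
    "Unbeatable discount this weekend",
    "Holiday discount for everyone",
]


def retrieve_texts(keywords: Iterable[str], library: List[str]) -> Dict[str, List[str]]:
    """Map each keyword to the sentences that contain it (case-insensitive)."""
    results: Dict[str, List[str]] = {}
    for keyword in keywords:
        needle = keyword.lower()
        results[keyword] = [s for s in library if needle in s.lower()]
    return results
```

Keywords with no matching sentence map to an empty list, so they simply contribute nothing to the later frequency count.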
It should be noted that the type of the preset text library can be selected according to the nature of the document to be generated. For example, when marketing copy needs to be generated, a text library containing marketing texts may be employed. As another example, when public welfare copy needs to be generated, a text library containing public welfare texts may be employed.
Step 240, processing the texts corresponding to the one or more keywords by using a machine learning model to obtain the interest components of the one or more keywords.
The interest component may indicate information carried by the keyword that is of interest to the user. In some embodiments, the interest component may include a benefit component (also referred to as a "benefit point") for indicating benefit information carried by the keyword that is of interest to the user, such as offer information, rebate information, price information, and the like. The interest component may be represented in various forms. For example, it may take the form of a tag, that is, a keyword may carry a corresponding interest tag. By way of example only, when the keyword "Red packet" is identified as having a benefit component, it is written in the form "< Red packet @ BEN >", where the angle brackets enclose the keyword and its corresponding interest tag.
For the text corresponding to each keyword, the text corresponding to the keyword can be processed by a machine learning model to obtain the interest components of the keyword. In some embodiments, the machine learning model for identifying interest components of keywords in the text corresponding to the keywords may be trained as follows:
First, a large number of texts can be obtained, and the keywords in the texts and the interest components of those keywords are determined. The texts and the interest components of the corresponding keywords are then used as samples to train the model. Specifically, the text is used as the input of the model, the interest component of the corresponding keyword is used as the reference standard (ground truth), and the model is trained in a supervised manner. Training stops when a certain condition is met (for example, the number of training samples reaches a certain value, or the value of the loss function falls below a certain value), yielding the trained model. For example only, if the text is "new year red packet large-scale delivery" and "red packet" is determined to be a keyword in the text that has a benefit component, the text is labeled as "new year < | red packet @ BEN | > large-scale delivery", and the text together with its labeled version can be used as a sample pair. In some embodiments, the samples may be partitioned into a training set and a test set. The training set is used to train the model, and the test set is used to check whether the prediction accuracy of the trained model reaches the required standard. In some embodiments, a validation set may further be partitioned from the samples and used to validate the trained model.
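The preparation of sample pairs can be sketched as follows. The regular expression assumes the labeled form shown in the example above (a keyword and a tag separated by "@" inside "<|" and "|>", with optional spaces); the 80/20 split ratio, the fixed seed, and the function names are assumptions for illustration.

```python
import random
import re
from typing import List, Sequence, Tuple

# Matches "<|keyword@TAG|>", tolerating optional spaces around the delimiters.
TAG_PATTERN = re.compile(r"<\s*\|\s*(?P<keyword>[^@|]+?)\s*@\s*(?P<label>\w+)\s*\|\s*>")


def parse_labeled_text(labeled: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the plain text and the (keyword, interest tag) pairs found in it."""
    annotations = [(m.group("keyword"), m.group("label")) for m in TAG_PATTERN.finditer(labeled)]
    # Replace each tagged span with the bare keyword to recover the model input.
    plain = TAG_PATTERN.sub(lambda m: m.group("keyword"), labeled)
    return plain, annotations


def split_samples(samples: Sequence, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle the samples reproducibly and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

For the example sentence, `parse_labeled_text` recovers the input text "new year red packet large-scale delivery" together with the annotation ("red packet", "BEN").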
In some embodiments, the machine learning model may include a Long Short-Term Memory-Conditional Random Field (LSTM-CRF) model, a Conditional Random Field (CRF) model, or a Hidden Markov Model (HMM). The LSTM-CRF model comprises an LSTM model and a CRF model connected to each other. The LSTM model generalizes the character features of the input text well, so that character features with similar semantics can still be related to each other after generalization. The CRF model, by modeling long-distance dependencies, strengthens the association between the current character and its context characters. Combining the LSTM model and the CRF model can therefore achieve a better recognition effect.
Step 250, counting the occurrence frequency of each interest component in the interest components of the one or more keywords, and determining the interest components of the original keywords according to the counting result.
In some embodiments, the results produced by the machine learning model for the texts corresponding to the one or more keywords may be collected to obtain the interest components of the respective keywords. The interest components of all these keywords are then aggregated and counted, and the interest component that occurs most frequently among them is determined as the interest component of the original keyword.
FIG. 4 is an exemplary flow diagram of a method for counting frequency of occurrence of a component of interest in accordance with some embodiments described herein. As shown in FIG. 4, consider the case where the results obtained by the machine learning model from the texts corresponding to the one or more keywords cover the original keyword, its synonyms, and its similar words, together with their interest components. The statistical process 400 may then include the following: for a first interest component, count the frequency p with which it occurs among the interest components of the original keyword, the frequency q with which it occurs among the interest components of the synonyms, and the frequency r with which it occurs among the interest components of the similar words, and then calculate the sum of p, q and r to obtain the frequency with which the first interest component occurs among the interest components of all keywords in the keyword set. The frequencies of the second interest component, the third interest component, and so on can be determined in the same way. Finally, the interest component with the highest frequency among all counted interest components is determined as the interest component of the original keyword.
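The counting procedure can be sketched with a frequency counter, as below. The input layout (one list of recognized interest components per keyword source), the component names, and the function names are assumptions for this example. The specification does not say how ties are broken, so the sketch leaves the tie-breaking behavior unspecified.

```python
from collections import Counter
from typing import Dict, List, Optional


def total_frequencies(components_by_source: Dict[str, List[str]]) -> Counter:
    """Sum the per-source counts (p, q, r, ...) of every interest component."""
    total: Counter = Counter()
    for components in components_by_source.values():
        total.update(components)  # adds this source's count for each component
    return total


def determine_interest_component(components_by_source: Dict[str, List[str]]) -> Optional[str]:
    """Return the interest component with the highest total frequency, if any."""
    total = total_frequencies(components_by_source)
    if not total:
        return None
    return total.most_common(1)[0][0]


# Hypothetical model outputs grouped by where the keyword came from.
EXAMPLE = {
    "original": ["benefit", "benefit"],   # p = 2 for "benefit"
    "synonyms": ["benefit", "finance"],   # q = 1 for "benefit"
    "similar": ["finance", "benefit"],    # r = 1 for "benefit"
}
```

Here "benefit" occurs p + q + r = 2 + 1 + 1 = 4 times and "finance" occurs twice, so "benefit" is selected as the interest component of the original keyword.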
It should be noted that the above description related to the flow 200 is only for illustration and explanation, and does not limit the applicable scope of the present technical solution. Various modifications and alterations to flow 200 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description.
FIG. 5 is an exemplary block diagram of an information handling system shown in accordance with some embodiments of the present description. The system 500 includes: an original keyword acquisition module 510, a keyword set acquisition module 520, a text retrieval module 530, a keyword interest component acquisition module 540, and a keyword interest component determination module 550.
The original keyword acquisition module 510 may be used to acquire an original keyword.
The keyword set obtaining module 520 may be configured to obtain a keyword set according to the original keyword, where the keyword set includes at least one keyword including the original keyword. FIG. 6 is an exemplary block diagram of the keyword set acquisition module 520. In some embodiments, as shown in FIG. 6, the keyword set acquisition module 520 may further include a synonym determination unit 521. The synonym determining unit 521 may be configured to determine at least one synonym of the original keyword. The keyword set obtaining module 520 may further include a candidate word obtaining unit 522, a word vector determining unit 523, a similarity determining unit 524, a candidate word screening unit 525, and a keyword set construction unit 526. The candidate word obtaining unit 522 may be configured to obtain a plurality of candidate words. The word vector determining unit 523 may be configured to determine the word vectors of the original keyword and of each candidate word. The similarity determining unit 524 may be configured to determine the similarity between each candidate word and the original keyword according to their word vectors. The candidate word screening unit 525 may be configured to select, from the plurality of candidate words, at least one candidate word whose similarity to the original keyword satisfies a set condition. The keyword set construction unit 526 may be used to construct the keyword set. In some embodiments, the keyword set constructing unit 526 constructs the keyword set according to the original keyword, the synonym, and the selected at least one candidate word.
The text retrieval module 530 may be configured to retrieve one or more texts corresponding to the keywords from a preset text library based on the keywords in the keyword set.
The keyword interest component obtaining module 540 may be configured to process the text corresponding to the one or more keywords by using a machine learning model to obtain interest components of the one or more keywords. In some embodiments, the machine learning model may include a long-short term memory and conditional random field model, a conditional random field model, or a hidden markov model.
The keyword interest component determining module 550 may be configured to count occurrence frequency of each interest component in the interest components of the one or more keywords, and determine the interest component of the original keyword according to a statistical result. In some embodiments, the keyword interest component determination module 550 may be further configured to determine an interest component with the largest frequency of occurrence among the interest components of the one or more keywords as the interest component of the original keyword.
It should be understood that the system and its modules shown in FIG. 5 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-ROM or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the information processing system 500 and the modules thereof is merely for convenience of description and is not intended to limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, the modules may be combined in any way, or configured as sub-systems connected to other modules, without departing from such teachings. For example, in some embodiments, the original keyword obtaining module 510, the keyword set obtaining module 520, the text retrieving module 530, the keyword interest component obtaining module 540 and the keyword interest component determining module 550 disclosed in FIG. 5 may be different modules in a system, or a single module may implement the functions of two or more of the above modules. For example, in some embodiments, the original keyword obtaining module 510 and the keyword set obtaining module 520 may be combined into one module. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) by expanding the original keyword, a plurality of interest components corresponding to a plurality of keywords can be obtained, which increases the likelihood of determining an interest component that matches the real interest or intention of the user; (2) the interest components of the keywords are identified in texts containing the keywords, and the context of the keywords in those texts is utilized, so that the identified interest components are more reliable and accurate; (3) because the interest components of the original keyword can be better identified, a file of higher quality can be generated. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the embodiments herein. Various modifications, improvements and adaptations to the embodiments described herein may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the embodiments of the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the embodiments of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of embodiments of the present description may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the embodiments of the present specification may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for operation of various portions of the embodiments of the present description may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences, use of numbers and letters, or use of other names in the embodiments of the present specification are not intended to limit the order of the processes and methods in the embodiments of the present specification. While various presently contemplated embodiments have been discussed in the foregoing disclosure by way of example, it should be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments; it is to be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are possible within the scope of the embodiments of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (13)

1. An information processing method comprising:
acquiring an original keyword;
obtaining a keyword set according to the original keyword, wherein the keyword set comprises at least one keyword including the original keyword;
retrieving texts corresponding to one or more keywords from a preset text library based on the keywords in the keyword set;
processing the texts corresponding to the one or more keywords by using a machine learning model to obtain interest components of the one or more keywords;
and counting the occurrence frequency of each interest component in the interest components of the one or more keywords, and determining the interest component of the original keyword according to the statistical result.
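The five steps of claim 1 can be sketched end to end as follows. This is an illustrative sketch, not the claimed implementation: the function `process_keyword`, the stand-in helpers `expand`, `retrieve`, and `tag`, and all sample data are hypothetical. In particular, `tag` replaces the machine learning model with a plain string lookup so that the example stays self-contained.

```python
from collections import Counter


def process_keyword(original, expand, retrieve, tag):
    """Run the claimed steps with pluggable expansion, retrieval and tagging."""
    # Step 2: the keyword set always includes the original keyword.
    keyword_set = [original] + [k for k in expand(original) if k != original]
    counts = Counter()
    for keyword in keyword_set:
        # Step 3: retrieve texts for the keyword from the text library.
        for text in retrieve(keyword):
            # Step 4: obtain interest components from each text.
            counts.update(tag(text))
    # Step 5: choose the component with the highest occurrence frequency.
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical stand-ins for the synonym source, text library and model.
SYNONYMS = {"coupon": ["voucher"]}
LIBRARY = [
    "use a coupon for a discount",
    "voucher gives a discount today",
    "coupon with free shipping",
]
KNOWN_COMPONENTS = ["discount", "free shipping"]


def expand(keyword):
    return SYNONYMS.get(keyword, [])


def retrieve(keyword):
    return [text for text in LIBRARY if keyword in text]


def tag(text):
    # Placeholder for the machine learning model: simple substring lookup.
    return [c for c in KNOWN_COMPONENTS if c in text]


result = process_keyword("coupon", expand, retrieve, tag)
```

Passing the three helpers as arguments keeps the counting logic independent of how expansion, retrieval, and tagging are actually performed.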
2. The information processing method of claim 1, wherein the obtaining a keyword set according to the original keyword comprises:
determining at least one synonym of the original keyword;
and constructing the keyword set according to the original keyword and the at least one synonym.
3. The information processing method according to claim 1 or 2, wherein the obtaining a keyword set from the original keyword comprises:
acquiring a plurality of candidate words;
determining a word vector of the original keyword and each candidate word;
determining the similarity between each candidate word and the original keyword according to the original keyword and the word vector of each candidate word;
selecting at least one candidate word with the similarity meeting a set condition with the original keyword from the candidate words;
and constructing the keyword set according to the original keyword and the selected at least one candidate word.
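The candidate selection in claim 3 can be sketched as below. The claim does not name a similarity measure or the set condition; this sketch assumes cosine similarity and a fixed threshold of 0.8, and the function names, toy two-dimensional vectors, and sample words are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_keyword_set(original, candidates, vectors, threshold=0.8):
    """Keep candidates whose similarity to the original meets the threshold.

    The threshold stands in for the "set condition" of the claim; its value
    here is an assumption chosen for illustration.
    """
    selected = [
        word
        for word in candidates
        if word != original
        and cosine_similarity(vectors[original], vectors[word]) >= threshold
    ]
    return [original] + selected


# Toy word vectors; real embeddings would have many more dimensions.
vectors = {
    "coupon": [1.0, 0.0],
    "voucher": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
keyword_set = build_keyword_set("coupon", ["voucher", "weather"], vectors)
```

Other set conditions, such as keeping the top-k most similar candidates, would fit the same structure.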
4. The information processing method of claim 1, the machine learning model being one of:
a long short-term memory and conditional random field model;
a conditional random field model; or
a hidden Markov model.
5. The information processing method according to claim 1, wherein the interest component includes a benefit component indicating that its corresponding keyword carries benefit information.
6. The information processing method of claim 1, wherein the determining the interest component of the original keyword according to the statistical result comprises:
and determining the interest component with the largest frequency in the interest components of the one or more keywords as the interest component of the original keyword.
7. An information processing system comprising:
the original keyword acquisition module is used for acquiring an original keyword;
a keyword set obtaining module, configured to obtain a keyword set according to the original keyword, where the keyword set includes at least one keyword including the original keyword;
the text retrieval module is used for retrieving texts corresponding to one or more keywords from a preset text library based on the keywords in the keyword set;
the keyword interest component acquisition module is used for processing texts corresponding to one or more keywords by using a machine learning model to obtain interest components of the one or more keywords;
and the keyword interest component determining module is used for counting the occurrence frequency of each interest component in the interest components of the one or more keywords and determining the interest component of the original keyword according to the statistical result.
8. The information processing system of claim 7, the keyword set acquisition module comprising:
a synonym determining unit, configured to determine at least one synonym of the original keyword;
and the keyword set constructing unit is used for constructing the keyword set according to the original keyword and the at least one synonym.
9. The information processing system according to claim 7 or 8, the keyword set acquisition module comprising:
the candidate word acquisition unit is used for acquiring a plurality of candidate words;
the word vector determining unit is used for determining the word vectors of the original keyword and each candidate word;
the similarity determining unit is used for determining the similarity between each candidate word and the original keyword according to the original keyword and the word vector of each candidate word;
the candidate word screening unit is used for selecting at least one candidate word of which the similarity with the original keyword meets a set condition from the plurality of candidate words;
and the keyword set constructing unit is used for constructing the keyword set according to the original keyword and the selected at least one candidate word.
10. The information processing system of claim 7, the machine learning model being one of:
a long short-term memory and conditional random field model;
a conditional random field model; or
a hidden Markov model.
11. The information processing system of claim 7, the interest component comprising a benefit component indicating that its corresponding keyword carries benefit information.
12. The information processing system of claim 7, the keyword interest component determination module further to:
and determining the interest component with the largest frequency in the interest components of the one or more keywords as the interest component of the original keyword.
13. An information processing apparatus, the apparatus comprising at least one processor and at least one memory;
the at least one memory is for storing computer instructions;
the at least one processor is configured to execute at least a part of the computer instructions to implement the information processing method according to any one of claims 1 to 6.
CN201910976749.2A 2019-10-15 2019-10-15 Information processing method and system Pending CN110717029A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910976749.2A CN110717029A (en) 2019-10-15 2019-10-15 Information processing method and system

Publications (1)

Publication Number Publication Date
CN110717029A true CN110717029A (en) 2020-01-21

Family

ID=69211662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910976749.2A Pending CN110717029A (en) 2019-10-15 2019-10-15 Information processing method and system

Country Status (1)

Country Link
CN (1) CN110717029A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883725A (en) * 2020-12-29 2021-06-01 上海讯飞瑞元信息技术有限公司 File generation method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326386A (en) * 2016-08-16 2017-01-11 百度在线网络技术(北京)有限公司 Search result displaying method and device
CN109388742A (en) * 2017-08-09 2019-02-26 阿里巴巴集团控股有限公司 A kind of searching method, search server and search system
CN110019888A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of searching method and device
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system
CN110309114A (en) * 2018-02-28 2019-10-08 腾讯科技(深圳)有限公司 Processing method, device, storage medium and the electronic device of media information

Similar Documents

Publication Publication Date Title
AU2019263758B2 (en) Systems and methods for generating a contextually and conversationally correct response to a query
CN106649818B (en) Application search intention identification method and device, application search method and server
CN105989040B (en) Intelligent question and answer method, device and system
CN112035653B (en) Policy key information extraction method and device, storage medium and electronic equipment
US9286290B2 (en) Producing insight information from tables using natural language processing
CN109815487B (en) Text quality inspection method, electronic device, computer equipment and storage medium
JP5356197B2 (en) Word semantic relation extraction device
US8073877B2 (en) Scalable semi-structured named entity detection
CN110263248B (en) Information pushing method, device, storage medium and server
US8452772B1 (en) Methods, systems, and articles of manufacture for addressing popular topics in a socials sphere
CN112035730B (en) Semantic retrieval method and device and electronic equipment
US20180181544A1 (en) Systems for Automatically Extracting Job Skills from an Electronic Document
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN112818093A (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN112036184A (en) Entity identification method, device, computer device and storage medium based on BilSTM network model and CRF model
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN112805715A (en) Identifying entity attribute relationships
CN109472022A (en) New word identification method and terminal device based on machine learning
CN107844531B (en) Answer output method and device and computer equipment
CN111199151A (en) Data processing method and data processing device
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN107291686B (en) Method and system for identifying emotion identification
CN111708870A (en) Deep neural network-based question answering method and device and storage medium
CN110717029A (en) Information processing method and system
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200121