WO2019080910A1 - Information processing system and method for implementing information processing thereof - Google Patents

Information processing system and method for implementing information processing thereof

Info

Publication number
WO2019080910A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information
answer
question
information processing
Prior art date
Application number
PCT/CN2018/111962
Other languages
English (en)
French (fr)
Inventor
陆艳
黄震江
高洪
刘勇
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2019080910A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • The present application relates to, but is not limited to, computer technology, for example to an information processing system and a method for implementing information processing thereof.
  • After receiving a user's question, an intelligent question answering system performs format standardization, semantic analysis, question retrieval, similarity calculation and similar processing on the question, obtains a matching or recommended answer from the intelligent question answering database, and returns it to the user.
  • The data in the intelligent question answering database comes from manual addition, user questions, and online learning.
  • Here, online learning refers to a generalized network environment and does not include corpus data from the social network environment related to the individual user. The corpus in the intelligent question answering database is therefore very broad, contains very little content personalized or customized for the user, and has essentially no correlation with the user's own circle of life. As a result, when different users raise the same question, the intelligent question answering system gives the same standard answer.
  • Although such an answer is correct from a question-and-answer point of view, it is not specific enough for the user who asks the question and does not fit the user's needs closely. The answer finally returned is therefore very likely not the information the user most wants; that is, current intelligent question answering systems cannot provide recommendation information to the user well.
  • the present application provides an information processing system and a method for implementing information processing thereof, which can provide recommendation information for a user.
  • The present application provides an information processing system including a data collection unit, a learning unit, a first storage unit, a transceiver unit, and an information processing unit. The data collection unit is configured to collect information associated with a user identification (ID). The learning unit is configured to process the collected information to form user ID-based data information and store it in the first storage unit. The transceiver unit is configured to receive a question raised by the user and to return the answer to that question to the user. The information processing unit is configured to preprocess the received question according to the user ID-based data information stored in the first storage unit, so as to obtain an answer to the question raised by the user.
  • The information associated with the user ID comes from at least one of a social network and a social platform; there is at least one social network and at least one social platform.
  • The present application further provides a method for implementing information processing, including: collecting information associated with a user ID; processing the collected information to form user ID-based data information and storing it; and obtaining a question raised by the user, preprocessing the question according to the stored user ID-based data information, and obtaining an answer to the question raised by the user.
  • The information associated with the user ID comes from at least one of a social network and a social platform; the number of social networks is at least one, and the number of social platforms is at least one.
  • Processing the collected information to form the user ID-based data information comprises: generating a temporary file according to the collected information; and, each time a temporary file is generated, annotating it and saving the annotated temporary file information in a temporary element table.
  • The method further includes: periodically reading existing corpus information; and comparing the data in the temporary element table with the existing corpus information that was read, and storing the temporary elements that do not exist in that corpus information.
  • The method further includes: preprocessing the question raised by the user to obtain to-be-processed answers; performing a similarity comparison according to related information of the to-be-processed answers and the information associated with the user ID; and taking the answer with the highest similarity as the answer to the question raised by the user.
  • the present application further provides a computer readable storage medium storing computer executable instructions arranged to perform any of the methods of implementing information processing described above.
  • The present application further provides an apparatus for implementing information processing, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program being configured to perform a method comprising: collecting information associated with a user ID; processing the collected information to form user ID-based data information and storing it; and obtaining a question raised by the user, preprocessing the question according to the stored user ID-based data information, and obtaining an answer to the question raised by the user.
  • The application further provides an information processing apparatus including a generating module, an annotation module, and a temporary element table. The generating module is configured to generate a temporary file according to the collected data. The annotation module is configured to annotate each temporary file when the generating module generates it, and to save the annotated temporary file information in the temporary element table.
  • The learning unit further includes an obtaining module and a comparison module. The obtaining module is configured to periodically read corpus information from the first storage unit. The comparison module is configured to compare the data in the temporary element table with the corpus information obtained by the obtaining module, and to store the temporary elements that do not exist in the first storage unit into the second storage unit.
  • The present application further provides a method for implementing information processing, including: generating a temporary file according to the collected information; and, each time a temporary file is generated, annotating it and saving the annotated temporary file information in the temporary element table.
  • The method further includes: periodically reading existing corpus information; and comparing the data in the temporary element table with the existing corpus information that was read, and storing the temporary elements that do not exist in that corpus information.
  • An embodiment of the present application provides a computer readable storage medium storing computer executable instructions, the instructions being configured to perform any of the foregoing methods for implementing information processing.
  • FIG. 1 is a schematic structural diagram of an information processing system of the present application.
  • FIG. 2 is a schematic structural diagram of a learning unit in an information processing system of the present application.
  • FIG. 3 is a flowchart of a method for implementing information processing according to the present application.
  • FIG. 4 is a schematic diagram of a networking architecture in the first embodiment of the present application.
  • FIG. 5 is a schematic flowchart of implementing information processing in the first embodiment of the present application.
  • FIG. 6 is a schematic diagram of a networking architecture in a second embodiment of the present application.
  • FIG. 7 is a schematic flowchart of implementing information processing in a second embodiment of the present application.
  • FIG. 8 is a schematic diagram of a networking architecture in a third embodiment of the present application.
  • FIG. 9 is a schematic flowchart of implementing information processing in a third embodiment of the present application.
  • FIG. 10 is a schematic diagram of a networking architecture in a fourth embodiment of the present application.
  • FIG. 11 is a schematic flowchart of implementing information processing in a fourth embodiment of the present application.
  • The smart question answering system can recommend nearby restaurants according to the user's current location, usually listed from the highest to the lowest review rating. Because the system does not know the user's personal taste preferences, it can only recommend restaurants according to preset rules, such as by rating, and cannot recommend restaurants that match the user's taste.
  • Social networks emphasize an open network environment.
  • The relationship between members of a social network is one-way, consisting of following and being followed.
  • Information published by each member of a social network can be seen by strangers, and followers can choose to passively receive content newly posted by the accounts they follow.
  • Common social networks include Weibo and Twitter.
  • A social platform emphasizes a relatively closed network environment.
  • The relationship between members of a social platform is two-way: once members have added each other as friends, each can see the information published by the other.
  • Common social platforms include WeChat and Facebook.
  • the information processing system at least includes: a data collection unit, a learning unit, a first storage unit, a transceiver unit, and an information processing unit.
  • The data collection unit in the information processing system of the present application is configured to collect information associated with a user identity (ID), such as information from at least one of a social network and a social platform.
  • The data collection unit may collect user ID-based data from at least one of the social network and the social platform through at least one open interface provided by the third-party social network or social platform.
  • The data includes, but is not limited to: personal information registered by the user; any combination of the user's own original friend-circle information, posts, pictures, audio, video, and forwarded posts; any combination of the original friend-circle information, posts, pictures, audio, video, and forwarded posts of the user's friends, followers, or fans; and any combination of the posts, pictures, audio, and video in the groups, communities, and public accounts of the user or of the user's friends, followers, or fans.
  • Static data includes, but is not limited to, the user's gender, occupation, city of work, and school of graduation.
  • Dynamic data includes, but is not limited to, the time at which the user answered a question, gave an answer, or posted an article, how long the user stayed on an answer page, and how long the user spent on comments. Deeper dynamic data may include content written by the user, such as articles and speeches, that expresses the user's thoughts.
  • the corpus database provided in the related art does not involve data related to the user ID.
  • In the present application, user ID-based data is collected so that the user's own information and ideas can be analyzed, which allows answers matching the user's personal needs to be provided subsequently.
  • The social network in the present application may be at least one social network that has an interface open to the information processing system of the present application, or a social network from which the information processing system obtains information through web crawling or searching.
  • Likewise, the social platform in the present application may be at least one social platform that has an interface open to the information processing system of the present application, or a social platform from which the information processing system obtains information through web crawling or searching.
  • Data related to the user ID can be obtained through interaction over the interface IF1.
  • The learning unit in the information processing system of the present application is configured to process the collected information to form user ID-based data information, convert it into the format required by the information processing system, and store it in the first storage unit.
  • Processing the collected information may include, but is not limited to, machine learning processing such as keyword extraction, domain classification, similarity calculation, and normalization, to form the user ID-based data information.
  • The learning unit includes at least a generating module, a labeling module, and a temporary element table.
  • The generating module is configured to generate a temporary file based on data from the data collection unit.
  • The temporary file may be generated according to a preset rule.
  • For example, the data collection unit collects data through the interface, and the returned response message includes information such as the user name, the publication time, and the published content. This information is written to a file, which requires a rule for generating the file name, for example "username + timestamp". Within the file, each published item occupies one line and a carriage return starts the next item; alternatively, a rule may separate items by various punctuation marks. These rules are set before the temporary file is generated.
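  • The file rule described above can be sketched as follows. This is only an illustrative sketch: the `Post` record, the function name `write_temp_file`, the underscore-joined file name, and the `.txt` suffix are assumptions, since the application names "username + timestamp" only as an example rule.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Post:
    # Hypothetical record for one item returned by the collection interface.
    username: str
    published_at: int  # publication time as a Unix timestamp (assumed representation)
    content: str


def write_temp_file(posts: List[Post], out_dir: Optional[Path] = None) -> Path:
    """Write the posts to a file named '<username>_<timestamp>.txt', one post per line."""
    if not posts:
        raise ValueError("at least one post is required")
    directory = out_dir if out_dir is not None else Path(tempfile.mkdtemp())
    first = posts[0]
    path = directory / f"{first.username}_{first.published_at}.txt"
    # Line breaks inside a post are flattened to spaces so that a newline
    # always marks the start of the next published item.
    lines = [" ".join(p.content.splitlines()) for p in posts]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

  • A punctuation-based rule could replace the newline separator without changing the rest of the sketch.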
  • The labeling module is configured to annotate each temporary file when the generating module generates it, and to save the annotated temporary file information in the temporary element table.
  • Annotating a temporary file produces temporary file information such as words, phrases, and sentences; the annotated temporary file information stored in the temporary element table refers to these elements.
  • An automatic data annotation tool may be used, such as a model trained on historical corpus annotation records, and the temporary file is annotated in combination with manual review.
  • The automatic data annotation tool is mainly configured to automatically annotate various types of data, such as text, pictures, and video; its specific implementation is not intended to limit the scope of protection of the present application.
  • The accuracy of the automatic annotation tool depends on the completeness of the relevant training data set and on the algorithm model.
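  • As a rough illustration of how annotated elements might reach the temporary element table, the sketch below splits file content into sentence and word elements. The rule-based splitter is only a stand-in for the trained annotation model, which the application does not specify; `TempElement` and `annotate` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TempElement:
    # Hypothetical row of the temporary element table.
    kind: str         # "sentence" or "word"
    text: str
    source_file: str  # temporary file the element was extracted from


def annotate(file_name: str, body: str) -> List[TempElement]:
    """Split file content into sentence and word elements (stand-in annotator)."""
    table: List[TempElement] = []
    for line in body.splitlines():
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", line.strip()):
            if not sentence:
                continue
            table.append(TempElement("sentence", sentence, file_name))
            for word in re.findall(r"[A-Za-z']+", sentence):
                table.append(TempElement("word", word.lower(), file_name))
    return table
```

  • In a real deployment the rows produced here would still go through the manual review mentioned above before being trusted.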
  • The learning unit in the information processing system of the present application further includes an obtaining module and a comparison module.
  • The obtaining module is configured to periodically read various corpus information, such as frequently asked questions (FAQ), slang, and equivalent sentences, from an existing database of the information processing system of the present application, such as the second storage unit.
  • The comparison module is configured to compare the data in the temporary element table with the corpus information in the existing database obtained by the obtaining module, and to store the temporary elements that do not exist in the existing database into the first storage unit through the management portal.
  • The administrator can review the temporary elements through the management portal, and temporary elements that pass the review are written into the first storage unit.
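  • The comparison and review steps can be sketched as a set-difference followed by an approval filter. This is a minimal sketch under stated assumptions: elements are plain strings, matching is case-insensitive with whitespace collapsed, and the names `find_new_elements` and `store_approved` are hypothetical.

```python
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def find_new_elements(temp_table: Iterable[str], existing_corpus: Iterable[str]) -> List[str]:
    """Return temporary elements that are absent from the existing corpus."""
    known: Set[str] = {normalize(e) for e in existing_corpus}
    pending: List[str] = []
    for element in temp_table:
        key = normalize(element)
        if key and key not in known:
            known.add(key)  # also drops duplicates inside the temporary table
            pending.append(element)
    return pending


def store_approved(pending: Iterable[str], approved: Set[str], storage: List[str]) -> None:
    """Write only the elements the administrator approved into storage."""
    for element in pending:
        if element in approved:
            storage.append(element)
```

  • Here `storage` stands in for the first storage unit and `approved` for the administrator's decisions made through the management portal.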
  • The transceiver unit in the information processing system of the present application is configured to receive a question raised by a user through, for example, a client, a World Wide Web (WEB) page, SMS, MMS, or Interactive Voice Response (IVR), and to return the resulting answer to the user.
  • It can be connected to various third-party applications (APPs), WeChat, or websites, or to the operator's SMS/MMS/voice center, to obtain the questions raised by users.
  • the answer corresponding to the question posed by the user can be obtained through the interface IF2 between the information processing unit and the first storage unit in the information processing system of the present application.
  • The information processing unit in the information processing system of the present application is configured to perform preprocessing, such as sensitive word filtering and standardization, on the received user question according to the user ID-based data information stored in the first storage unit, and to obtain an answer to the question raised by the user.
  • The information processing unit is further configured to perform a similarity comparison according to related information of the obtained answers, such as content, source channel, permission, and weight, and to take the answer that has the highest score and satisfies the threshold preset by the information processing system of the present application as the answer to the question raised by the user.
  • It should be emphasized that factors such as source channel, permission, and weight are taken into account in the similarity calculation. The similarity calculation itself can be implemented by various related technologies and is not intended to limit the scope of protection of the present application.
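  • One way to fold source channel, permission, and weight into a score is sketched below. The application leaves the similarity calculation open, so the token-level Jaccard similarity, the multiplicative weighting, and all names (`Candidate`, `best_answer`, `channel_weights`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    content: str
    source_channel: str    # e.g. "social_network" or "general_corpus" (assumed labels)
    visible_to_user: bool  # outcome of the permission check for the asking user
    weight: float = 1.0


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a simple stand-in for the real measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_answer(question: str, candidates: List[Candidate],
                channel_weights: Dict[str, float], threshold: float) -> Optional[Candidate]:
    """Return the highest-scoring permitted candidate whose score meets the threshold."""
    scored = []
    for c in candidates:
        if not c.visible_to_user:
            continue  # permission excludes this candidate entirely
        score = jaccard(question, c.content) * c.weight * channel_weights.get(c.source_channel, 1.0)
        if score >= threshold:
            scored.append((score, c))
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

  • Returning `None` when nothing reaches the threshold leaves room for a fallback, such as the general corpus or the synchronization to social networks described later.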
  • The information processing system of the present application further includes a synchronization unit configured to synchronize a question that the user wishes to synchronize to the first storage unit; the information processing unit then invokes an interface with at least one of the social network and the social platform to publish the question there. If the user chooses to publish a question raised to the information processing system to at least one of the social network and the social platform at the same time, then even if the information processing system does not give a fully satisfactory answer, the user may still obtain a solution from friends and relatives on the social network or social platform.
  • The solution given on at least one of the social network and the social platform may also be learned by the information processing system of the present application and stored in the first storage unit, to supplement and improve subsequent answer queries.
  • The question raised by the user can be synchronized between the synchronization unit and the first storage unit through the interface IF3.
  • For example, the question is "recommend a good restaurant nearby" or "recommend a restaurant that suits my taste".
  • If the information processing system of the present application determines, from posts published by the user on a social network or from personal taste information exchanged with friends and relatives on a social platform, that the user likes Cantonese cuisine, it combines this with the user's current location and returns the Cantonese restaurant that is closest to the user and has a high rating.
  • As another example, the question is "how do I get from my home to Nanjing South Railway Station".
  • Based on location information in posts the user publishes on a social network in the morning or evening, or on the home address and residential-community information exchanged with friends and relatives on a social platform, the information processing system of the present application returns a route map from the user's home to Nanjing South Railway Station.
  • The information processing system of the present application further includes a management unit configured to configure a timed task and to trigger the data collection unit, at the times set by the timed task, to collect data from at least one of the social network and the social platform.
  • the system administrator can configure the scheduled task through the management portal.
  • the management unit in addition to managing the corpus in the general question and answer database, is further configured to: manage and maintain the user ID-based data information stored in the first storage unit, for example, perform at least one of the following management and maintenance: setting the user based The weight of the ID data information, set the permissions when the user queries the data information based on the user ID, and guarantee the privacy of the user.
  • the system administrator can manage and maintain data based on the user ID through the management portal.
  • The management unit is further configured to perform addition, deletion, and modification operations on the user ID-based data information in the first storage unit.
  • The system administrator may perform addition, deletion, and modification operations on the user ID-based data information through the management portal.
  • The system administrator may also set permissions for different types of user ID-based data information through the management portal, specifying different access rights such as public, visible to friends, or visible only to the owner.
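  • The three access levels mentioned above can be illustrated with a small check. This is a sketch only; the enum values and the `can_view` function are hypothetical and do not reflect a data model defined in the application.

```python
from enum import Enum
from typing import Set


class Visibility(Enum):
    PUBLIC = "public"
    FRIENDS = "friends"
    ONLY_ME = "only_me"


def can_view(requester: str, owner: str, visibility: Visibility,
             friends_of_owner: Set[str]) -> bool:
    """Decide whether `requester` may use a piece of the owner's ID-based data."""
    if requester == owner:
        return True  # owners always see their own data
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.FRIENDS:
        return requester in friends_of_owner
    return False     # ONLY_ME and any unknown level default to deny
```

  • Defaulting to deny keeps a misconfigured record from leaking, which fits the stated aim of protecting user privacy.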
  • weights may also be set for user ID-based data information of different sources and types.
  • Multiple candidate answers can then be sorted in combination with the weights, thereby obtaining the answer that has the highest score and satisfies the threshold set in advance by the information processing system of the present application.
  • the information processing system of the present application further includes: a second storage unit configured to store the existing corpus information.
  • The information processing unit is configured to combine the user ID-based data information stored in the first storage unit with the existing corpus information stored in the second storage unit, perform preprocessing such as sensitive word filtering and standardization on the received user question, and obtain an answer to the question raised by the user.
  • The second storage unit and the first storage unit may be implemented as the same database.
  • FIG. 3 is a flowchart of a method for implementing information processing according to the present application. As shown in FIG. 3, the method includes steps 300 to 302.
  • In step 300, information associated with the user ID is collected, such as user ID-based data from at least one of a social network and a social platform.
  • the collection may acquire data related to the user ID by using the first data request message and the first data response message, where the first data request message includes but is not limited to the following fields shown in Table 1:
  • the first data response message includes but is not limited to the following fields shown in Table 2:
  • The content field of the first data response message in Table 2 includes the parameters shown in Table 3:
  • In step 301, the collected data is processed to form user ID-based data information, which is then stored.
  • The user ID-based data information may be formed by machine learning processing and converted into a preset required format, and the resulting user ID-based data information is stored.
  • The machine learning processing may include, but is not limited to, keyword extraction, domain classification, similarity calculation, and normalization.
  • the user ID-based data information stored in this step is stored in the first storage unit shown in FIG. 1, and includes fields as shown in Table 5:
  • Step 301 includes: generating a temporary file according to a preset rule; and, each time a temporary file is generated, annotating it and saving the annotated temporary file information in the temporary element table.
  • Annotation produces temporary file information such as words, phrases, and sentences; the annotated temporary file information stored in the temporary element table refers to these elements.
  • For example, the data collection unit collects data through the interface, and the returned response message includes information such as the user name, the publication time, and the published content. This information is written to a file, which requires a rule for generating the file name, for example "username + timestamp". Within the file, each published item occupies one line and a carriage return starts the next item; alternatively, a rule may separate items by various punctuation marks. These rules are set before the temporary file is generated.
  • An automatic data annotation tool may be used, such as a model trained on historical corpus annotation records, and the temporary file is annotated in combination with manual review.
  • The automatic data annotation tool is mainly configured to automatically annotate various types of data, such as text, pictures, and video; its specific implementation is not intended to limit the scope of protection of the present application.
  • The accuracy of the automatic annotation tool depends on the completeness of the relevant training data set and on the algorithm model.
  • The method further includes: periodically reading various corpus information, such as FAQs, slang, and equivalent sentences, from an existing database such as the second storage unit shown in FIG. 1; and comparing the data in the temporary element table with the corpus information read from the existing database, and storing the temporary elements that do not exist in the existing database.
  • In step 302, the question raised by the user is obtained and preprocessed according to the stored user ID-based data information, and an answer to the question raised by the user is obtained.
  • The user's question can be received through, for example, a client, a World Wide Web (WEB) page, SMS, MMS, or Interactive Voice Response (IVR).
  • the answer corresponding to the question posed by the user may be obtained through the second data request message and the second data response message, wherein the second data request message includes but is not limited to the following fields shown in Table 6:
  • the second data response message includes but is not limited to the following fields shown in Table 7:
  • The preprocessing of the user's question in this step includes, but is not limited to, sensitive word filtering and standardization.
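  • A minimal sketch of such preprocessing is shown below. The concrete choices are assumptions: standardization is taken to mean Unicode NFKC normalization, whitespace collapsing, and lowercasing; sensitive words are masked rather than removed; and standardization runs first so that filtering sees a uniform form. The function name `preprocess` is hypothetical.

```python
import re
import unicodedata
from typing import Set


def preprocess(question: str, sensitive_words: Set[str], mask: str = "*") -> str:
    """Standardize a question, then mask any sensitive words it contains."""
    # Standardization: fold full-width characters, collapse whitespace, lowercase.
    text = unicodedata.normalize("NFKC", question)
    text = " ".join(text.split()).lower()
    # Sensitive word filtering: longest words first so overlapping entries mask fully.
    for word in sorted(sensitive_words, key=len, reverse=True):
        text = re.sub(re.escape(word.lower()), mask * len(word), text)
    return text
```

  • Masking keeps the sentence length and structure intact, which may help later retrieval steps; dropping the words outright would be an equally plausible reading of the text.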
  • The step further includes: preprocessing the question raised by the user to obtain to-be-processed answers; performing a similarity comparison according to related information of the to-be-processed answers, such as content, source channel, permission, and weight; and taking the answer that has the highest score and satisfies the preset threshold as the answer returned to the user who raised the question.
  • It should be emphasized that factors such as source channel, permission, and weight are taken into account in the similarity calculation. The similarity calculation itself can be implemented by various related technologies and is not intended to limit the scope of protection of the present application.
  • The method shown in FIG. 3 of the present application further includes: synchronizing a question that the user wishes to synchronize, and publishing it to at least one of a social network and a social platform.
  • The solution given on at least one of the social network and the social platform to the synchronized question can also be learned and stored in the first storage unit shown in FIG. 1, to supplement and improve subsequent answer queries.
  • The question raised by the user can be synchronized by a third data request message and a third data response message, wherein the third data request message includes, but is not limited to, the fields shown in Table 9:
  • the third data response message includes but is not limited to the following fields shown in Table 10:
  • the method shown in FIG. 3 of the present application further includes: configuring a scheduled task, and periodically triggering, according to the scheduled task, the collection of data from at least one of a social network and a social platform.
  • the system administrator can configure the scheduled task through the management portal.
  • the method further includes: managing and maintaining the stored user ID-based data information. For example, at least one of the following management and maintenance can be performed: setting the weight of the data information based on the user ID, setting the authority when the user queries the data information based on the user ID, and ensuring the privacy of the user.
  • the system administrator can manage and maintain data based on the user ID through the management portal.
  • the method further includes: performing add, delete, modify, and query operations on the stored user ID-based data information.
  • the system administrator can add, delete, modify, and query data information based on the user ID through the management portal.
  • the system administrator may also set permissions on different types of user ID-based data information through the management portal, specifying the access rights of the different user ID-based data information, such as public, visible to friends, or visible only to the owner.
  • weights may also be set for user ID-based data information of different sources and types.
  • the multiple answers can be sorted in combination with the weights, thereby obtaining the answer with the highest score and satisfying the threshold set in advance by the information processing system of the present application.
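Before answers are ranked by weight, the access rights mentioned above (public, friends, or owner only) would have to be respected. A minimal sketch of such a check follows, assuming a hypothetical `Permission` enum and a hypothetical `can_view` helper; the source does not describe how permissions are stored or evaluated.

```python
from enum import Enum
from typing import Set


class Permission(Enum):
    """Access levels named in the text; the enum itself is an illustrative assumption."""
    PUBLIC = "public"
    FRIENDS = "friends"
    SELF = "self"


def can_view(owner_id: str, viewer_id: str, permission: Permission,
             friends_of_owner: Set[str]) -> bool:
    """Decide whether viewer_id may use a data item owned by owner_id."""
    if viewer_id == owner_id:
        return True  # owners can always see their own data
    if permission is Permission.PUBLIC:
        return True
    if permission is Permission.FRIENDS:
        return viewer_id in friends_of_owner
    return False  # Permission.SELF and the viewer is not the owner
```

Items for which `can_view` returns `False` would simply be left out of the candidate list before the weighted sort.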
  • the method shown in FIG. 3 of the present application further includes: combining the stored user ID-based data information with the stored existing corpus information, and performing pre-processing such as sensitive word filtering and standardization on the user's question, to obtain the answer to the question raised by the user.
  • the application further provides a computer readable storage medium storing computer executable instructions arranged to perform any of the methods of implementing information processing of the present application.
  • the present application also provides an apparatus for implementing information processing, including a processor, a memory, and a computer program stored on the memory and operable on the processor, the computer program performing: collecting information associated with the user identifier (ID), such as user ID-based information from at least one of a social network and a social platform; processing the collected information to form user ID-based data information and storing it; and obtaining a question raised by a user and, based on the stored user ID-based data information, pre-processing the question to obtain an answer to the question raised by the user.
  • the information processing system is a corpus management and intelligent question answering system independent of the social network, and the corpus is obtained from a social network through an open interface.
  • FIG. 4 is a schematic diagram of a networking architecture in the first embodiment of the present application.
  • the corpus management module includes at least the management unit, the data collection unit, the learning unit, and the first storage unit in FIG. 1;
  • the logic processing module includes at least the transceiver unit and the information processing unit in FIG. 1.
  • FIG. 5 is a schematic flowchart of implementing information processing in the first embodiment of the present application. As shown in FIG. 5, the method includes steps 500 to 515.
  • the corpus management module periodically invokes an open interface of the social network according to the scheduled task configured by the administrator, and initiates a user corpus request to the social network.
  • the user corpus request obtains the user's personal information, the content posted by the user and by the accounts the user follows and the user's fans, the content of comments, and so on, as the corpus material of the system.
  • the corpus management module invokes the corpus query interface IF1 to query user data from the social network.
  • the corpus management module sends the first data request message as shown in Table 11 to the social network:
  • after receiving the first data request message shown in Table 11, the social network returns a first data response message in the format shown in Table 12 to the corpus management module:
  • the corpus management module writes the obtained corpus material into a social corpus temporary file in a certain format, automatically performs corpus annotation each time a temporary file is completed, and writes the annotated data into the temporary element table.
  • the scheduled task set by the administrator reads the corpus from the existing corpus database and compares it with the corpus in the temporary element table. Any new corpus that does not exist in the existing corpus is written into the social corpus database in the first storage unit of the present application.
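The comparison between the temporary element table and the existing corpus amounts to keeping only the elements that are not yet known. A minimal sketch, assuming both sides can be represented as plain strings (the function name `find_new_elements` is hypothetical):

```python
from typing import Iterable, List, Set


def find_new_elements(temp_elements: Iterable[str],
                      existing_corpus: Set[str]) -> List[str]:
    """Return temporary elements absent from the existing corpus.

    Order of first appearance is kept, and blanks and repeats are skipped.
    """
    seen: Set[str] = set()
    new_items: List[str] = []
    for element in temp_elements:
        key = element.strip()
        if not key or key in existing_corpus or key in seen:
            continue
        seen.add(key)
        new_items.append(key)
    return new_items
```

In the flow described here, the returned elements are the ones that would then go through the administrator's new-corpus test before being written to the social corpus database.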
  • according to the new-corpus test rules preset by the administrator, new corpus that meets the rules and reaches the required score is automatically saved in the social corpus database, or is uploaded by the administrator and stored in the database.
  • the saved data carries attributes such as the channel, permission, weight, and generation time of the corpus. All social corpora associated with the same user ID constitute the user's personal corpus data set.
  • the new corpus test rule may include, but is not limited to, selecting a new word as a keyword, selecting 100 user questions for each new word, and testing the influence of the new word on the accuracy of the question and answer. If the accuracy rate meets the required score, the new word is added to the social corpus database.
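The test rule above can be sketched as an accuracy check over a batch of sample questions. The names `accuracy` and `accept_new_word`, the representation of a sample as a (question, expected answer) pair, and the exact-match comparison are all assumptions made for illustration; the source only states that about 100 user questions are selected per new word and that the accuracy must reach a required score.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[str, str]  # (user question, expected answer)


def accuracy(answer_fn: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Fraction of sample questions for which answer_fn returns the expected answer."""
    if not samples:
        return 0.0
    correct = sum(1 for question, expected in samples if answer_fn(question) == expected)
    return correct / len(samples)


def accept_new_word(answer_with_word: Callable[[str], str],
                    samples: Sequence[Sample],
                    required_accuracy: float) -> bool:
    """Accept the new word only if Q&A accuracy with the word enabled meets the bar."""
    return accuracy(answer_with_word, samples) >= required_accuracy
```

Here `answer_with_word` represents the question answering pipeline with the candidate word added as a keyword; a stricter variant could also compare against the accuracy measured without the word.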
  • for example, the geographical location keywords analyzed by the corpus management module are "Guangzhou City", "Yuexiu District", and "North Garden Restaurant"; from the post "Having been to so many places, my favorite is still the morning tea and rice noodle rolls back home", the keywords "morning tea" and "rice noodle rolls" are extracted, and the keywords "Cantonese cuisine" and "restaurant" can be obtained from them by association and inference.
  • association and inference can be implemented in the system as follows.
  • the inclusion relationship between entity words and superordinate words is preset, that is, each entity word belongs to a subclass of its superordinate word, so that the superordinate word can be found from the entity word. For the morning tea and rice noodle rolls in this embodiment, the superordinate words are Cantonese cuisine and restaurant.
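The preset inclusion relationship can be pictured as a lookup table from entity words to their superordinate words. The sketch below is illustrative only: the table `HYPERNYMS` and the function `infer_hypernyms` are hypothetical, and the two entity words are rendered here as "morning tea" and "rice noodle rolls".

```python
from typing import Dict, Iterable, List

# Hypothetical preset table: entity word -> superordinate words it belongs to.
HYPERNYMS: Dict[str, List[str]] = {
    "morning tea": ["Cantonese cuisine", "restaurant"],
    "rice noodle rolls": ["Cantonese cuisine", "restaurant"],
}


def infer_hypernyms(entity_words: Iterable[str],
                    table: Dict[str, List[str]] = HYPERNYMS) -> List[str]:
    """Collect the superordinate words of the given entity words, without repeats."""
    result: List[str] = []
    for word in entity_words:
        for parent in table.get(word, []):
            if parent not in result:
                result.append(parent)
    return result
```

Unknown entity words contribute nothing, so the function returns an empty list when no preset relationship applies.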
  • the keywords are not present in the existing corpus after the comparison, and the new keywords meet the new corpus test rules set by the administrator. Therefore, the keywords are written into the social corpus database.
  • the keyword attributes are marked, for example, the channel is "microblog", the permission is "open", the weight is "70%", and the generation time is "20170603000125", wherein the weight can be set manually by the administrator based on experience, historical test data, and so on.
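A stored keyword with these attributes might be represented as a small record. In the sketch below, the class `CorpusEntry` and the helpers `parse_weight` and `make_entry` are hypothetical, and reading the generation time "20170603000125" as year, month, day, hour, minute, second is an assumption based on its 14-digit form.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CorpusEntry:
    """One annotated keyword; field names are illustrative, not taken from the source."""
    keyword: str
    channel: str
    permission: str
    weight: float           # 0.0 .. 1.0
    generated_at: datetime


def parse_weight(text: str) -> float:
    """Convert a percentage string such as '70%' into a fraction such as 0.7."""
    return float(text.strip().rstrip("%")) / 100.0


def make_entry(keyword: str, channel: str, permission: str,
               weight_text: str, time_text: str) -> CorpusEntry:
    """Build an entry, assuming the time string uses the layout YYYYMMDDhhmmss."""
    generated_at = datetime.strptime(time_text, "%Y%m%d%H%M%S")
    return CorpusEntry(keyword, channel, permission, parse_weight(weight_text), generated_at)
```

For the values quoted above, `make_entry("Cantonese cuisine", "microblog", "open", "70%", "20170603000125")` yields a weight of 0.7 and a generation time of 3 June 2017, 00:01:25.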
  • step 504 to step 506 the user opens the interactive web page of the information processing system provided by the application, such as the Q&A page, and can log in with a personal account of one of the social networks that have been connected to the information processing system of the present application, so that the user's personal information and corpus data are obtained according to the user ID.
  • the user can also register an account through the Q&A page and bind personal accounts of the social networks that have been connected to the information processing system of this application.
  • step 507 the user asks a question on the Q&A page.
  • when asking a question, the user can specify the scope of the search, such as the corpus of the general channel (that is, the existing data information held in the second storage unit of the present application), the corpus of a certain social network, or the corpus of all channels.
  • in this embodiment, the question is: "Recommend a restaurant that suits my taste."
  • the specified search scope is: the corpus of the general channel + the corpus of the microblog channel.
  • step 508 the Q&A page initiates a query answer request to the logic processing module.
  • step 509 the logic processing module normalizes the question, for example by removing special symbols and filtering out sensitive words.
  • sensitive words usually refer to words related to pornography, gambling, drugs, and the like.
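The normalization in step 509 can be sketched with the standard `re` module. The function name `normalize_question` and the choice to delete sensitive words outright (rather than mask them or reject the question) are assumptions; the source only says that special symbols are removed and sensitive words are filtered.

```python
import re
from typing import Iterable


def normalize_question(text: str, sensitive_words: Iterable[str]) -> str:
    """Strip special symbols and sensitive words, then collapse whitespace."""
    # Replace anything that is not a letter, digit, underscore, or whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # \w also matches the underscore, so treat it as a symbol explicitly.
    cleaned = cleaned.replace("_", " ")
    for word in sensitive_words:
        if word:
            cleaned = re.sub(re.escape(word), " ", cleaned, flags=re.IGNORECASE)
    return " ".join(cleaned.split())
```

Because `\w` is Unicode-aware in Python 3, Chinese characters are kept while full-width punctuation such as "？" is removed.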
  • the logic processing module calls the interface IF2 of the corpus module to issue a query answer request to the corpus module; after processing such as equivalent-sentence lookup, synonym replacement, keyword processing, and FAQ lookup, it is assumed that several answers are found in this embodiment:
  • step 512 the corpus module returns the answers found for the question to the logic processing module.
  • step 513 the logic processing module performs similarity calculation on all the candidate answers obtained, taking into account parameters such as channel, permission, and weight; it is assumed that the highest-scoring answer is "Recommended local Cantonese restaurants: xxx, yyy, zzz."
  • step 514 the logic processing module returns the answer "Recommended local Cantonese restaurants: xxx, yyy, zzz." to the question and answer page.
  • step 515 the Q&A page presents the results to the user.
  • the method further includes: comparing the pre-processed information about the question raised by the user with the information associated with the user ID for similarity, to obtain the question with the highest similarity.
  • the answer to that most similar question is then queried and used as the answer to the question asked by the user.
  • the functions of the logic processing module and the corpus module may be adaptively adjusted according to the descriptions in steps 510 to 513, and details are not described herein.
  • the question and answer page is integrated in a social platform, and the corpus is obtained from the social platform interface.
  • FIG. 6 is a schematic diagram of a networking architecture in a second embodiment of the present application.
  • the corpus management module includes at least the management unit, the data collection unit, the learning unit, and the first storage unit in FIG. 1;
  • the logic processing module includes at least the transceiver unit and the information processing unit in FIG. 1.
  • FIG. 7 is a schematic flowchart of implementing information processing in the second embodiment of the present application. As shown in FIG. 7, the method includes steps 700 to 717.
  • step 700 the corpus management module periodically synchronizes user data from the social platform database.
  • according to the user ID, the user's personal information and the content posted by at least one of the user and the user's friends, the content of comments, or chat content are obtained as the corpus material.
  • the corpus management module self-learns the acquired corpus material with a machine learning algorithm and automatically triggers the corpus test according to the rules preset by the administrator; corpus whose score meets the requirement is automatically saved in the corpus management module, or is saved to the corpus management module by the administrator.
  • the saved data carries the channel attributes of the corpus.
  • steps 703 to 707 the user logs in to the social platform.
  • the method generally includes: the user logs in to the portal of the social platform, the portal queries the user information in the database, authenticates the user, and then returns the authentication result to the user in the login response.
  • step 708 the user logs in to the Q&A page integrated in the social platform and asks questions through the Q&A page.
  • step 709 the Q&A page will initiate a query answer request to the logic processing module.
  • step 710 the logic processing module pre-processes the question, for example by normalization and sensitive word filtering.
  • step 711 the logic processing module initiates a query answer request to the corpus management module via interface IF2.
  • the corpus management module queries the database to obtain an answer to the question.
  • step 714 the corpus management module returns the query results to the logic processing module via interface IF2.
  • step 715 the logic processing module performs similarity calculation on all the answers to be processed, and combines the parameters of the channels, permissions, weights and the like to obtain the highest score answer.
  • step 716 the logic processing module returns the results to the question and answer page.
  • step 717 the Q&A page presents the results to the user.
  • the method further includes: comparing the pre-processed information about the question raised by the user with the information associated with the user ID for similarity, to obtain the question with the highest similarity.
  • the answer to that most similar question is then queried and used as the answer to the question asked by the user.
  • the functions of the logic processing module and the corpus module may be adaptively adjusted according to the descriptions in steps 711 to 715, and details are not described herein.
  • FIG. 8 is a schematic diagram of a networking architecture in a third embodiment of the present application.
  • the corpus management module includes at least the management unit, the data collection unit, the learning unit, and the first storage unit in FIG. 1;
  • the logic processing module includes at least the transceiver unit and the information processing unit in FIG. 1.
  • FIG. 9 is a schematic flowchart of implementing information processing in the third embodiment of the present application. As shown in FIG. 9, the method includes steps 900 to 924.
  • step 900 the user opens the question and answer web page of the information processing system of the present application and logs in with a personal account of one of the social networks that have been connected to the information processing system, so that the user's personal information and corpus data are obtained according to the user ID.
  • step 901 the Q&A page authenticates the account information.
  • step 902 the question and answer page returns the login result to the user via the login response.
  • step 903 the user asks a question through the question and answer page.
  • the question and answer page can display a multiple-selection list at the bottom of the page.
  • the user can tick entries in the list to choose whether to synchronize the question to a social network.
  • the multiple selection list will not be displayed.
  • step 904 the Q&A page saves the synchronization options for the questions the user needs to synchronize.
  • step 905 the question and answer page initiates a query answer request to the logic processing module.
  • step 906 the logic processing module pre-processes the question, for example by normalization and sensitive word filtering.
  • step 907 the logic processing module invokes the interface IF2 to initiate a query answer request to the corpus management module.
  • step 908 the corpus management module queries the corpus database for answers to questions.
  • step 909 the corpus management module returns the answer (or list of answers) found for the question to the logic processing module in the query answer response.
  • step 910 the logic processing module performs similarity calculation on all the answers to be processed, and combines the channels, permissions, weights, etc. of the query to obtain the highest score answer.
  • step 911 the logic processing module returns the result in the query answer response back to the question and answer page.
  • step 912 the Q&A page presents the results to the user.
  • step 913 if the user has chosen to publish the question to the social network at the same time, the information processing system of the present application posts the question to the social network on the user's behalf; the question and answer page synchronizes the question to the logic processing module.
  • step 914 the logic processing module invokes interface IF3 to synchronize the question to the corpus management module.
  • step 915 the corpus management module initiates a question posting request to the social network via a third party interface open to the social network.
  • step 916 the social network will post questions posed by the user.
  • step 917 the social network returns a posting result to the corpus management module via the question posting response.
  • the corpus management module returns the question synchronization response to the logic processing module, which finally returns it to the question and answer page.
  • step 920 after the user's friend sees the posted question, a comment or private message can be given.
  • the user can thus both get the answer given by the information processing system of the present application and receive the comments given by friends or fans in his or her social circle.
  • the corpus management module of the information processing system of the present application periodically invokes a third-party interface that is open to the social network, synchronizes the user corpus, and also obtains the content of the comments as a corpus material.
  • step 922 the social network queries the user data.
  • step 923 the social network returns the user data to the corpus management module.
  • step 924 the corpus management module re-learns this portion of the content based on the machine learning algorithm and saves it to the corpus database.
  • the user's personal corpus data set can be continuously corrected.
  • the information processing system of the present application has a richer corpus as a reference, giving an answer that is closer to the user's needs and has higher accuracy.
  • the method further includes: comparing the pre-processed information about the question raised by the user with the information associated with the user ID for similarity, to obtain the question with the highest similarity.
  • the answer to that most similar question is then queried and used as the answer to the question asked by the user.
  • the functions of the logic processing module and the corpus module may be adaptively adjusted according to the descriptions in steps 907 to 910, and details are not described herein.
  • the corpus management module is applied in the information processing system of the present application.
  • FIG. 10 is a schematic diagram of a networking architecture in a fourth embodiment of the present application.
  • the corpus management module includes at least the management unit, the data collection unit, the learning unit, and the first storage unit in FIG. 1;
  • the logic processing module includes at least the transceiver unit and the information processing unit in FIG. 1.
  • FIG. 11 is a schematic flowchart of implementing information processing in the fourth embodiment of the present application. As shown in FIG. 11, the method includes steps 1100 to 1113.
  • step 1100 the user's friends interact with the user on the posts the user has published on the social network, for example by commenting on them.
  • the corpus management module periodically issues a user data query request to the social network to obtain the text published by the user, and simultaneously obtains the comment content as a corpus material.
  • the social network returns the comment content of the user's text to the corpus management module.
  • the corpus management module learns the data according to the machine learning algorithm, analyzes new corpus such as keywords, questions, and corresponding answers of each user, and saves the attributes of the corpus such as permissions and weights into the corpus database.
  • step 1105 the user opens a portal website that is connected to the information processing system of the present application and logs in with a personal account of one of the social networks that have been connected to the information processing system, so that the user's personal information and corpus data are obtained according to the user ID.
  • the user can also register a new account through the portal and bind personal accounts of the social networks that have been connected to the information processing system of the present application.
  • step 1106 the portal invokes the relevant interface of the information processing system of the present application to query the database and authenticate the user.
  • step 1107 if the authentication fails, the portal returns a login failure response to the user; if the authentication succeeds, the portal sends a request to query the user's hot words to the logic processing module in the information processing system of the present application.
  • step 1108 the logic processing module forwards the request to query the user's hot words to the corpus management module.
  • the corpus management module queries the corpus database and, after comprehensive scoring, takes the top-ranked keywords as the user's hot word list according to the preset rules.
  • the corpus management module returns the user's hot word list to the logic processing module in the query user hot words response.
  • step 1111 the logic processing module obtains the recommended content list after comprehensive processing according to the business needs.
  • the comprehensive processing according to business needs mainly means taking into account the business characteristics of the recommending website. For a shopping website, for example, the hot words obtained in the previous step may be filtered to keep those related to daily life, goods, and shopping. For an app download site, hot words related to games and entertainment may be kept, and so on.
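The hot word ranking and the business-specific filtering can be sketched as two small functions. The names `top_hot_words` and `filter_for_site`, the per-word numeric score, and the word-to-category mapping are all illustrative assumptions; the source does not say how the comprehensive score or the categories are obtained.

```python
from typing import Dict, List, Set


def top_hot_words(scores: Dict[str, float], limit: int) -> List[str]:
    """Return up to `limit` keywords, highest score first (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def filter_for_site(hot_words: List[str], categories: Dict[str, str],
                    wanted: Set[str]) -> List[str]:
    """Keep only the hot words whose category matches the site's business."""
    return [word for word in hot_words if categories.get(word) in wanted]
```

A shopping site would call `filter_for_site` with categories such as daily life, goods, and shopping, while an app download site would pass games and entertainment instead.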
  • step 1112 the logic processing module returns a list of recommended content to the portal.
  • step 1113 the portal returns a login success response and presents the recommended content to the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

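The present application discloses an information processing system and a method for implementing information processing thereof, including: collecting user identifier (ID)-based data from at least one of a social network and a social platform; processing the collected data to form user ID-based data information and storing it; and acquiring a question raised by a user and, according to the stored user ID-based data information, pre-processing the acquired question to obtain an answer to the question raised by the user.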

Description

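Information processing system and method for implementing information processing thereof
This application claims priority to Chinese patent application No. 201711010979.0, filed with the Chinese Patent Office on October 25, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to, but is not limited to, computer technology, and for example to an information processing system and a method for implementing information processing thereof.
Background
At present, after receiving a user's question, an intelligent question answering system processes the question through format standardization, semantic analysis, question retrieval, similarity calculation, and so on, obtains a matching or recommended answer from an intelligent question answering database, and returns it to the user. The data in the intelligent question answering database comes from several channels: manual addition, user questions, and network learning. Here, network learning refers to the network environment in a broad sense and does not include corpus data from the social network environment related to the individual user. It follows that the corpus in the intelligent question answering database is very broad, while content personalized and customized for the user is scarce and has essentially no connection with the user's own circle of life. Therefore, for the same question raised by different users, the intelligent question answering system gives the same standard answer. Although such an answer is correct from a question-and-answer point of view, it is not specific enough for the user who asked and is not close to the user's needs. As a result, the answer finally returned to the user is to a large extent not the information the user most wants; in other words, current intelligent question answering systems cannot provide recommendation information that is well targeted to the user.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The present application provides an information processing system and a method for implementing information processing thereof, which can provide recommendation information targeted to the user.
The present application provides an information processing system, including: a data collection unit, a learning unit, a first storage unit, a transceiver unit, and an information processing unit; wherein the data collection unit is configured to collect information associated with a user identifier (ID); the learning unit is configured to process the collected information to form user ID-based data information and store it in the first storage unit; the transceiver unit is configured to receive a question raised by a user and to return to the user the answer obtained for that question; and the information processing unit is configured to pre-process the acquired question according to the user ID-based data information stored in the first storage unit, to obtain an answer to the question raised by the user.
In an embodiment, the information associated with the user ID comes from at least one of a social network and a social platform; there is at least one social network and at least one social platform.
The present application further provides a method for implementing information processing, including: collecting information associated with a user ID; processing the collected information to form user ID-based data information and storing it; and acquiring a question raised by a user and, according to the stored user ID-based data information, pre-processing the acquired question to obtain an answer to the question raised by the user.
In an embodiment, the information associated with the user ID comes from at least one of a social network and a social platform; the number of social networks is at least one, and the number of social platforms is at least one.
In an embodiment, processing the collected information to form the user ID-based data information includes: generating a temporary file from the collected information; and, each time a temporary file is generated, annotating the temporary file and saving the annotated temporary file information in a temporary element table.
In an embodiment, the method further includes: periodically reading the existing corpus information; comparing the data in the temporary element table with the existing corpus information that was read; and storing the temporary elements that do not exist in the existing corpus information that was read.
In an embodiment, the method further includes: pre-processing the question raised by the user to obtain candidate answers (answers to be processed); comparing the information about the candidate answers with the information associated with the user ID for similarity; and taking the answer with the highest similarity as the answer to the question raised by the user. The present application also provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform any of the above methods for implementing information processing.
The present application further provides an apparatus for implementing information processing, including a processor, a memory, and a computer program stored on the memory and runnable on the processor, the computer program being configured to perform a method including the following steps: collecting information associated with a user ID; processing the collected information to form user ID-based data information and storing it; and acquiring a question raised by a user and, according to the stored user ID-based data information, pre-processing the acquired question to obtain an answer to the question raised by the user.
The present application further provides an information processing apparatus, including a generation module, an annotation module, and a temporary element table; wherein the generation module is configured to generate a temporary file from the collected data; and the annotation module is configured to annotate the temporary file each time the generation module generates one, and to save the annotated temporary file information in the temporary element table.
In an embodiment, the learning unit further includes an acquisition module and a comparison module; wherein the acquisition module is configured to periodically read corpus information from the first storage unit; and the comparison module is configured to compare the data in the temporary element table with the corpus information obtained by the acquisition module, and to store into the second storage unit the temporary elements that do not exist in the first storage unit.
The present application also provides a method for implementing information processing, including: generating a temporary file from the collected information; and, each time a temporary file is generated, annotating the temporary file and saving the annotated temporary file information in a temporary element table.
In an embodiment, the method further includes: periodically reading existing corpus information; comparing the data in the temporary element table with the existing corpus information that was read; and storing the temporary elements that do not exist in the existing corpus information that was read.
An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer program being configured to perform any of the above methods for implementing information processing.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.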
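Brief Description of the Drawings
The drawings are provided for an understanding of the technical solutions of the present application and form a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solutions of the present application and do not limit them.
FIG. 1 is a schematic structural diagram of the information processing system of the present application;
FIG. 2 is a schematic structural diagram of the learning unit in the information processing system of the present application;
FIG. 3 is a flowchart of the method for implementing information processing of the present application;
FIG. 4 is a schematic diagram of the networking architecture in the first embodiment of the present application;
FIG. 5 is a schematic flowchart of implementing information processing in the first embodiment of the present application;
FIG. 6 is a schematic diagram of the networking architecture in the second embodiment of the present application;
FIG. 7 is a schematic flowchart of implementing information processing in the second embodiment of the present application;
FIG. 8 is a schematic diagram of the networking architecture in the third embodiment of the present application;
FIG. 9 is a schematic flowchart of implementing information processing in the third embodiment of the present application;
FIG. 10 is a schematic diagram of the networking architecture in the fourth embodiment of the present application;
FIG. 11 is a schematic flowchart of implementing information processing in the fourth embodiment of the present application.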
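Detailed Description
Embodiments of the present application are described in detail below with reference to the drawings.
Take as an example a user who wants an intelligent question answering system to recommend a good restaurant nearby. For instance, when the user asks, through the user interface, "Recommend a good restaurant nearby", the system can recommend nearby restaurants according to the user's current location. As another example, when the user asks "Recommend a restaurant that suits my taste", the system can recommend nearby restaurants according to the user's current location, and the recommendations are usually listed in descending order of review rating; because the system does not know the user's personal taste, it can only recommend nearby restaurants according to preset rules such as rating, and cannot recommend restaurants that suit the user's taste. As yet another example, when the user asks "How do I get from my home to Nanjing South Railway Station", the user wants the route from home to the station, but the system can only give the route from the user's current location to the station.
Friends in a user's own circle are generally much more likely to have similar tastes, and close friends also know more about where the user works and lives. If the user has discussed these topics or shared location information with relatives and friends on at least one of a social network and a social platform, or friends have published related content, and if a question answering system based on an intelligent recommendation engine can combine this corpus when giving suggestions and replies, the answer will be closer to what the user wants. For the user who asked the question, the answer is specific enough and closer to the user's needs, so recommendation information targeted to the user can be provided better.
Herein, a social network emphasizes an open network environment. Relationships between members of a social network are one-way, including following and being followed. Information published by any member of a social network can be seen by strangers, and fans can be set to passively receive content newly published by those they follow. Common social networks include Weibo and Twitter. A social platform emphasizes a relatively closed network environment. Relationships between members of a social platform are two-way: once members add each other as friends, each can see the information the other publishes. Common social platforms include WeChat and Facebook.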
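FIG. 1 is a schematic structural diagram of the information processing system of the present application. As shown in FIG. 1, the information processing system includes at least: a data collection unit, a learning unit, a first storage unit, a transceiver unit, and an information processing unit.
The data collection unit of the information processing system is configured to collect information associated with a user identifier (IDentity, ID), for example information from at least one of a social network and a social platform.
The data collection unit may collect user ID-based data from at least one of a social network and a social platform through interfaces opened by at least one of the third-party social network and social platform. The data includes, but is not limited to: personal information registered by the user; any combination of the user's own original Moments posts, posts, pictures, audio, video, and reposted posts; any combination of original Moments posts, posts, pictures, audio, video, and reposted posts from the user's friends, followed accounts, or fans; and any combination of posts, pictures, audio, and video in the groups, communities, and followed official accounts that the user or the user's friends, followed accounts, or fans have joined.
The data associated with the user ID in the present application can be divided into two types: static data and dynamic data. Static data includes, but is not limited to, the user's gender, occupation, city of work, and school of graduation. Dynamic data includes, but is not limited to: answering a question at a certain time, liking an answer, publishing an article, how long the user stayed on an answer page, and how long the user spent commenting. Deeper dynamic data may also include content-based data such as articles and remarks published by the user, as well as data that expresses the user's thinking.
The corpus databases provided in the related art do not involve data related to user IDs. In the embodiments of the present application, user ID-based data is collected, so that the user's own information and ideas can be analyzed, providing a basis for subsequently giving answers that meet the user's personal needs.
The social network in the present application may be at least one social network that has opened an interface to the information processing system, and the social platform may be at least one social platform that has opened an interface to the information processing system. The social network in the present application may also be information from at least one of a social network and a social platform that the information processing system obtains through a web crawler or search, and the social platform may also be information from a social platform that the information processing system obtains through a web crawler or search.
The information processing system and the social network, the information processing system and the social platform, and the information processing system and both the social network and the social platform may interact through interface IF1 to obtain data related to the user ID.
The learning unit of the information processing system is configured to process the collected information to form user ID-based data information, convert it into the format required by the information processing system, and store it in the first storage unit.
In an embodiment, processing the collected information may include, but is not limited to, machine learning processing such as keyword extraction, domain classification, similarity calculation, and normalization, to form the user ID-based data information.
FIG. 2 is a schematic structural diagram of the learning unit in the information processing system of the present application. As shown in FIG. 2, it includes at least: a generation module, an annotation module, and a temporary element table.
The generation module is configured to generate a temporary file from the data from the data collection unit.
In an embodiment, the temporary file may be generated according to preset rules.
The data collection unit collects data through an interface, and the returned response message contains information such as the user name, publication time, and published content. To turn this information into a file, a rule for generating the file name must be defined, for example "user name + timestamp". Rules for the file content, such as one published item per line with a carriage return starting a new line, or separation by various punctuation marks, are likewise set before the temporary file is generated.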
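The file naming and layout rules above can be sketched as follows. This is only an illustration: the function names `temp_file_name` and `write_temp_file`, the underscore separator, the `.txt` extension, and the timestamp layout are assumptions, since the source only gives "user name + timestamp" and "one published item per line" as examples of such rules.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable


def temp_file_name(user_name: str, published_at: datetime) -> str:
    """Build a file name from the user name and a timestamp (layout is an assumption)."""
    return f"{user_name}_{published_at.strftime('%Y%m%d%H%M%S')}.txt"


def write_temp_file(directory: Path, user_name: str, published_at: datetime,
                    posts: Iterable[str]) -> Path:
    """Write one published item per line and return the path of the temporary file."""
    path = directory / temp_file_name(user_name, published_at)
    # Collapse whitespace inside each item so that one item occupies exactly one line.
    lines = [" ".join(post.split()) for post in posts]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Collapsing line breaks inside an item is a design choice made here so that the line-per-item rule holds even when a post itself spans several lines.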
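The annotation module is configured to annotate the temporary file each time the generation module generates one, and to save the annotated temporary file information in the temporary element table.
It should be noted that, once annotated, a temporary file yields temporary file information consisting of elements such as words, phrases, and sentences; the annotated temporary file stored in the temporary element table refers to these elements.
In an embodiment, the temporary text may be annotated with an automatic data annotation tool, for example a model trained on historical corpus annotation records, combined with manual review. The automatic data annotation tool is mainly configured to annotate various kinds of data automatically, such as text, pictures, and video; its specific implementation is not intended to limit the scope of protection of the present application. The accuracy of the automatic annotation tool is determined jointly by the completeness of the relevant training data set and the algorithm model.
In an embodiment, the learning unit of the information processing system further includes: an acquisition module and a comparison module.
The acquisition module is configured to periodically read various corpus information, such as Frequently Asked Questions (FAQ), greetings, and equivalent sentences, from an existing database of the information processing system, such as the second storage unit.
The comparison module is configured to compare the data in the temporary element table with the corpus information from the existing database obtained by the acquisition module, and to store the temporary elements that do not exist in the existing database into the first storage unit through the management portal.
Here, administrators can review the temporary elements through the management portal, and temporary elements that pass the review are written into the first storage unit.
The transceiver unit of the information processing system is configured to receive questions raised by users through, for example, a client, a World Wide Web (WEB) page, SMS, MMS, or Interactive Voice Response (IVR), and to return to the user the answer obtained for the question raised by the user.
In an embodiment, the system may be integrated into various third-party applications (Application, APP), WeChat, or websites, and may also be connected to an operator's SMS, MMS, or voice center, to obtain questions raised by users.
The information processing unit and the first storage unit of the information processing system may obtain the answer corresponding to the question raised by the user through interface IF2.
The information processing unit of the information processing system is configured to perform pre-processing such as sensitive word filtering and standardization on the acquired question according to the user ID-based data information stored in the first storage unit, to obtain an answer to the question raised by the user.
In an embodiment, the information processing unit is further configured to compare similarity according to information about the answers obtained, such as content, source channel, permission, and weight, and to take the answer that scores highest and satisfies a threshold preset by the information processing system as the answer to the question raised by the user. It is emphasized here that factors such as source channel, permission, and weight are taken into account in the similarity calculation; the similarity calculation itself can be implemented by various methods in the related art and is not intended to limit the scope of protection of the present application.
In an embodiment, the information processing system further includes a synchronization unit, configured to synchronize the user's questions that need to be synchronized to the first storage unit; the information processing unit then calls the interface of at least one of the social network and the social platform to publish them on at least one of the social network and the social platform. If the user chooses to publish a question raised to the information processing system on at least one of the social network and the social platform at the same time, then even if the information processing system does not give a fully satisfactory answer, the user may still obtain solutions from relatives and friends on at least one of the social network and the social platform. In an embodiment, the solutions given by relatives and friends on at least one of the social network and the social platform can likewise be learned by the information processing system and stored in the first storage unit, supplementing and improving subsequent answer queries.
In this way, people in the user's social circle can give suggestions and answers to the published question. These answers are subsequently collected again by the data collection unit and learned by the learning unit.
The synchronization unit and the first storage unit of the information processing system may synchronize the question raised by the user through interface IF3.
For example, suppose the user asks the information processing system, through the user interface, to recommend a good restaurant nearby or to recommend a restaurant that suits the user's taste. Based on posts the user has published on a social network, or on information about personal taste exchanged with relatives and friends on a social platform, the information processing system determines that the user likes Cantonese cuisine; it then combines the user's current location to give the highly rated Cantonese restaurants closest to the user. As another example, suppose the user asks the information processing system, through the user interface, how to get from home to Nanjing South Railway Station. Based on posts with location information that the user has published on a social network during rest hours such as mornings or evenings, or on the home address and residential community information exchanged with relatives and friends on a social platform, the information processing system gives a route map from the user's home to Nanjing South Railway Station.
In an embodiment, the information processing system further includes a management unit, configured to configure a scheduled task and, according to the scheduled task, periodically trigger the data collection unit to collect data from at least one of the social network and the social platform. In an embodiment, the system administrator can configure the scheduled task through the management portal.
Besides managing the corpus in the general question answering database, the management unit is also configured to manage and maintain the user ID-based data information stored in the first storage unit, for example by performing at least one of the following: setting the weight of the user ID-based data information, setting the permissions that apply when users query the user ID-based data information, and ensuring the privacy and security of users. In an embodiment, the system administrator can manage and maintain the user ID-based data information through the management portal.
The management unit is also configured to perform add, delete, modify, and query operations on the user ID-based data information in the first storage unit. In an embodiment, the system administrator can perform add, delete, modify, and query operations on the user ID-based data information through the management portal. In an embodiment, the system administrator can also set permissions on different types of user ID-based data information through the management portal, specifying the access rights of the different user ID-based data information, such as public, visible to friends, or visible only to the owner. In an embodiment, weights can also be set for user ID-based data information of different sources and types. In this way, when the information processing system retrieves answers and performs similarity calculation, the multiple answers can be sorted in combination with the weights, so as to obtain the answer that scores highest and satisfies the threshold preset by the information processing system.
In an embodiment, the information processing system further includes a second storage unit, configured to store existing corpus information.
The information processing unit is configured to combine the user ID-based data information stored in the first storage unit with the existing corpus information stored in the second storage unit, and to perform pre-processing such as sensitive word filtering and standardization on the acquired question, to obtain an answer to the question raised by the user.
It should be noted that the second storage unit and the first storage unit may be implemented as the same database.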
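FIG. 3 is a flowchart of the method for implementing information processing of the present application. As shown in FIG. 3, the method includes steps 300 to 302.
In step 300, information associated with the user ID is collected, for example user ID-based data from at least one of a social network and a social platform.
In this step, the data related to the user ID may be obtained through a first data request message and a first data response message, where the first data request message includes, but is not limited to, the following fields shown in Table 1:
Figure PCTCN2018111962-appb-000001
Figure PCTCN2018111962-appb-000002
Table 1
The first data response message includes, but is not limited to, the following fields shown in Table 2:
Figure PCTCN2018111962-appb-000003
Table 2
The published data content (content) field of the first data response message in Table 2 includes the following parameters shown in Table 3:
Figure PCTCN2018111962-appb-000004
Table 3
The geographic location parameter in the published data content (content) field of the first data response message includes the following parameters shown in Table 4:
Figure PCTCN2018111962-appb-000005
Table 4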
在步骤301中,对采集到的数据进行处理以形成基于用户ID的数据信息并存储。
在一实施例中,本步骤中可以通过机器学习算法的处理并转换为预先要求的格式后形成基于用户ID的数据信息,再存储得到的基于用户ID的数据信息。
在一实施例中,机器学习算法可以包括但不限于:关键词提取、领域分类、相似度计算,以及归一化等机器学习算法处理,形成基于用户ID的数据信息。
在一实施例中,本步骤中存储的基于用户ID的数据信息如语料存储在图1所示的第一存储单元中,包括如表5所示的字段:
Figure PCTCN2018111962-appb-000006
Figure PCTCN2018111962-appb-000007
表5
在一实施例中,步骤301包括:按照预先设置的规则将采集到的数据生成临时文件;在每生成一个临时文件时,对该临时文件进行标注,并将标注后的临时文件信息保存在临时元素表中。
需要说明的是,临时文件被标注后,则形成如词、短语以及句子等元素的临时文件信息,临时元素表中存储的标注后的临时文件指的是这些元素。
数据采集单元是通过接口采集数据的,返回的响应消息中会包含如用户名、发表时间、发表内容等信息,将这些信息形成文件,需要制定生成文件名的规则。比如“用户名+时间戳”,文件内容每行发表一条内容,回车表示另起一行,或者也可用各种标点符号分隔等规则,这些规则都是生成临时文件前设置好的。
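In an embodiment, the temporary text can be annotated with an automatic data annotation tool, for example a model trained on historical corpus annotation records, combined with manual review. The automatic annotation tool is mainly configured to annotate various kinds of data, such as text, images, and video, automatically; its specific implementation does not limit the scope of protection of this application. The accuracy of the automatic annotation tool is determined jointly by the completeness of the relevant training data set and by the algorithm model.
In an embodiment, the method further includes: periodically reading corpus information of various kinds, such as FAQs, greetings, and equivalent sentences, from an existing database, such as the second storage unit shown in FIG. 1; comparing the data in the temporary element table with the corpus information that the acquisition module obtained from the existing database; and storing the temporary elements that do not exist in the existing database.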
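The comparison step can be pictured as a set-difference over elements. The sketch below is an assumption-laden illustration: it treats both the temporary element table and the existing corpus as plain lists of strings and compares them by exact match, whereas the application does not specify the matching rule. The name `select_new_elements` is hypothetical.

```python
def select_new_elements(temp_elements, existing_corpus):
    """Return temporary elements absent from the existing corpus.

    Order of first appearance is preserved and duplicates are dropped.
    Exact string equality is an assumed matching rule.
    """
    existing = set(existing_corpus)
    seen = set()
    new_elements = []
    for element in temp_elements:
        if element not in existing and element not in seen:
            seen.add(element)
            new_elements.append(element)
    return new_elements


if __name__ == "__main__":
    temp_table = ["morning tea", "hello", "rice noodle rolls", "morning tea"]
    corpus = ["hello", "goodbye"]
    print(select_new_elements(temp_table, corpus))
```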
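In step 302, a question raised by the user is obtained; based on the stored user-ID-based data information, the question is preprocessed and an answer to it is obtained.
In an embodiment, the user's question can be received through channels such as a client application, a World Wide Web (WEB) page, SMS, MMS, or Interactive Voice Response (IVR).
The answer corresponding to the user's question can be obtained through a second data request message and a second data response message. The second data request message includes, but is not limited to, the fields shown in Table 6:
Figure PCTCN2018111962-appb-000008
Table 6
The second data response message includes, but is not limited to, the fields shown in Table 7:
Figure PCTCN2018111962-appb-000009
Table 7
The published answer list (answerList) field of the second data response message in Table 7 includes the parameters shown in Table 8:
Figure PCTCN2018111962-appb-000010
Table 8
In an embodiment, the preprocessing of the user's question in this step includes, but is not limited to, sensitive-word filtering and normalization.
This step further includes: after the user's question has been preprocessed, obtaining candidate answers; comparing similarity based on information about the candidate answers, such as content, source channel, permission, and weight; and returning to the user who asked the question the answer that scores highest and meets a preset threshold. The point emphasized here is that the similarity calculation takes factors such as source channel, permission, and weight into account. The similarity calculation itself can be implemented with any of various methods from the related art and does not limit the scope of protection of this application.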
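One way to picture the weighted, permission-aware selection described above is the sketch below. It is not the application's method: the text leaves the similarity measure open, so Jaccard overlap between keyword sets is swapped in purely for illustration, and the score is assumed to be similarity multiplied by weight. The names `Candidate`, `jaccard`, and `best_answer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    keywords: frozenset
    weight: float       # e.g. 0.7 for a weight of "70%"
    permission: str     # "public", "friends", or "private"


def jaccard(a, b):
    """Jaccard overlap of two sets; an assumed stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_answer(candidates, user_keywords, allowed_permissions, threshold):
    """Return the text of the top-scoring permitted candidate, or None.

    Score = similarity * weight (an assumed combination rule). Candidates
    whose permission is not allowed for the asker are skipped entirely.
    """
    scored = [
        (jaccard(c.keywords, user_keywords) * c.weight, c)
        for c in candidates
        if c.permission in allowed_permissions
    ]
    if not scored:
        return None
    score, top = max(scored, key=lambda pair: pair[0])
    return top.text if score >= threshold else None


if __name__ == "__main__":
    user = frozenset({"cantonese", "restaurant", "morning tea"})
    options = [
        Candidate("Sichuan restaurants", frozenset({"sichuan", "restaurant"}), 0.7, "public"),
        Candidate("Cantonese restaurants", frozenset({"cantonese", "restaurant"}), 0.7, "public"),
    ]
    print(best_answer(options, user, {"public"}, 0.3))
```

Filtering on permission before scoring means a private entry can never outrank a public one for a caller who is not entitled to see it.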
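The method shown in FIG. 3 further includes: synchronizing the user's questions that need to be synchronized and publishing them to at least one of a social network and a social platform.
In an embodiment, the suggestions that friends and family give in response to the synchronized question on at least one of the social network and the social platform can likewise be learned and stored in the first storage unit shown in FIG. 1, supplementing and improving subsequent answer queries.
The user's question can be synchronized through a third data request message and a third data response message. The third data request message includes, but is not limited to, the fields shown in Table 9:
Figure PCTCN2018111962-appb-000011
Table 9
The third data response message includes, but is not limited to, the fields shown in Table 10:
Figure PCTCN2018111962-appb-000012
Table 10
The method shown in FIG. 3 further includes: configuring scheduled tasks and, according to those tasks, periodically triggering the collection of data from at least one of a social network and a social platform. In an embodiment, a system administrator can configure the scheduled tasks through a management portal.
In an embodiment, the method further includes managing and maintaining the stored user-ID-based data information, for example by performing at least one of the following: setting weights for the user-ID-based data information, setting permissions for users querying that data, and protecting user privacy. In an embodiment, a system administrator can manage and maintain the user-ID-based data information through the management portal.
In an embodiment, the method further includes creating, deleting, modifying, and querying the stored user-ID-based data information. For example, a system administrator can perform these operations through the management portal. In an embodiment, the system administrator can also use the management portal to set permissions for different types of user-ID-based data information, specifying the access level of each, such as public, visible to friends, or visible only to the owner. In an embodiment, weights can also be set for user-ID-based data information of different sources and types. Then, when the information processing system retrieves answers and computes similarity, it can use the weights to rank multiple answers and obtain the answer that scores highest and meets the threshold preset in the system.
The method shown in FIG. 3 further includes: combining the stored user-ID-based data information with stored existing corpus information, preprocessing the user's question (for example, sensitive-word filtering and normalization), and obtaining an answer to it.
This application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform any of the methods for implementing information processing of this application.
This application further provides an apparatus for implementing information processing, including a processor, a memory, and a computer program stored in the memory and executable on the processor, the program performing: collecting information associated with a user identifier (ID), such as user-ID-based information from at least one of a social network and a social platform; processing the collected information to form user-ID-based data information and storing it; and obtaining a question raised by the user, preprocessing the question based on the stored user-ID-based data information, and obtaining an answer to it.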
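The embodiments of this application are described in detail below with reference to specific examples.
In the first embodiment, the information processing system is assumed to be a corpus management and intelligent question-answering system that is independent of the social network, with the corpus obtained from a social network through an open interface.
FIG. 4 is a schematic diagram of the network architecture in the first embodiment. As shown in FIG. 4, the corpus management module includes at least the management unit, data collection unit, learning unit, and first storage unit of FIG. 1, and the logic processing module includes at least the transceiver unit and information processing unit of FIG. 1. FIG. 5 is a schematic flowchart of information processing in the first embodiment. As shown in FIG. 5, the flow includes steps 500 to 515.
In step 500, according to the scheduled task configured by the administrator, the corpus management module periodically calls the interface opened by the social network and sends it a request for the user's corpus.
The user corpus request retrieves the user's personal information, content published by the user, by accounts the user follows, and by the user's followers, as well as comments, all of which serve as corpus material for the system.
In steps 501 to 502, the corpus management module calls the corpus query interface IF1 to query user data from the social network.
In this embodiment, the corpus management module sends the social network the first data request message shown in Table 11:
Figure PCTCN2018111962-appb-000013
Table 11
After receiving the first data request message shown in Table 11, the social network returns to the corpus management module a first data response message in the format shown in Table 12:
Figure PCTCN2018111962-appb-000014
Figure PCTCN2018111962-appb-000015
Table 12
The content of the published data content parameter in Table 12 is shown in Table 13:
Figure PCTCN2018111962-appb-000016
Table 13
The content of the geographic location in Table 13 is shown in Table 14:
Figure PCTCN2018111962-appb-000017
Figure PCTCN2018111962-appb-000018
Table 14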
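In step 503, the corpus management module writes the obtained corpus material into social corpus temporary files in a set format. Each time a temporary file is completed, it is annotated automatically, and the annotated data is written to the temporary element table.
The scheduled task set up by the administrator then reads corpus entries from the existing corpus and compares them with those in the temporary element table. New entries that do not exist in the existing corpus are written to the first storage unit of this application, that is, the social corpus database.
According to the new-corpus test rules preset by the administrator, new corpus entries that satisfy the rules and reach the required score are saved to the social corpus database automatically, or are stored after review by the administrator. The saved data carries attributes such as the entry's channel, permission, weight, and creation time. All social corpus entries associated with the same user ID form that user's personal corpus data set. The new-corpus test rules may include, but are not limited to, the following: select new words as keywords, choose 100 user questions for each new word, and test how adding the new word affects question-answering accuracy; if the accuracy reaches the required score, the new word is added to the social corpus database.
For the data shown in Tables 11 to 14, the corpus management module extracts the geographic location keywords "广州市" (Guangzhou), "越秀区" (Yuexiu District), and "北园酒家" (Beiyuan Restaurant), and, from the post "走过这么多地方,最爱的还是家乡的早茶和肠粉" ("Of all the places I have been, I still love my hometown's morning tea and rice noodle rolls best"), the keywords "早茶" (morning tea) and "肠粉" (rice noodle rolls). From these it can associate and infer the keywords "粤菜" (Cantonese cuisine) and "餐馆" (restaurant). This association and inference can be implemented by presetting in the system the containment relationship between entity words and hypernyms, in which an entity word belongs to a subclass of its hypernym. The system can then look up the parent class, that is, the hypernym, of entity words such as 早茶 and 肠粉 in this embodiment, and find 粤菜 or 餐馆. In this embodiment, assume the comparison shows that these keywords do not yet exist for this user in the existing corpus and that the new keywords satisfy the new-corpus test rules set by the administrator. The keywords are therefore written to the social corpus database and tagged with attributes, for example: channel "微博" (Weibo), permission "public", weight "70%", and creation time "20170603000125". The weight can be set manually by the administrator based on experience, historical test data, and the like.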
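The entity-word-to-hypernym lookup described above can be sketched with a preset dictionary. This is a minimal illustration under stated assumptions: the table contents mirror the morning tea (早茶) and rice noodle rolls (肠粉) example, while the dictionary layout and the names `HYPERNYMS` and `infer_hypernyms` are hypothetical and not taken from the application.

```python
# Preset containment relation: entity word -> hypernyms (parent classes).
# The entries mirror the example in the text; the structure is an assumption.
HYPERNYMS = {
    "morning tea": ["Cantonese cuisine", "restaurant"],
    "rice noodle rolls": ["Cantonese cuisine", "restaurant"],
}


def infer_hypernyms(entity_words, table):
    """Collect the hypernyms of the given entity words, without duplicates.

    Words with no entry in the table (for example place names) contribute
    nothing. Order of first appearance is preserved.
    """
    inferred = []
    for word in entity_words:
        for hypernym in table.get(word, []):
            if hypernym not in inferred:
                inferred.append(hypernym)
    return inferred


if __name__ == "__main__":
    extracted = ["Guangzhou", "morning tea", "rice noodle rolls"]
    print(infer_hypernyms(extracted, HYPERNYMS))
```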
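In steps 504 to 506, the user opens an interactive web page of the information processing system, such as the Q&A page. The user can log in with a personal account from one of the social networks already connected to the system, so that the user's personal information and corpus data can be retrieved by user ID. Alternatively, the user can register an account through the Q&A page and bind personal accounts from several of the social networks connected to the system.
In step 507, the user asks a question on the Q&A page.
In this embodiment, assume that a user who has not logged in to the Q&A page can search only the general-channel corpus, that is, the existing data information held in the second storage unit of this application. After logging in, the user or the system administrator can specify the search scope, for example the general-channel corpus, the corpora of certain specified social networks, or the corpora of all channels.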
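The scope rule in this paragraph can be sketched as a small function. The sketch rests on assumptions not stated in the application: channels are represented as strings, a logged-in user who requests nothing gets all linked channels, and requested channels that the user has not linked are silently dropped. The names `GENERAL` and `resolve_search_scope` are hypothetical.

```python
GENERAL = "general"  # label for the general-channel corpus (assumed name)


def resolve_search_scope(logged_in, requested, linked_channels):
    """Decide which corpora a query may search.

    - Not logged in: only the general-channel corpus.
    - Logged in, nothing requested: general plus every linked channel
      (an assumed default).
    - Logged in with a request: the requested channels, limited to the
      general corpus and channels the user has actually linked.
    """
    if not logged_in:
        return [GENERAL]
    if not requested:
        return [GENERAL] + sorted(linked_channels)
    return [c for c in requested if c == GENERAL or c in linked_channels]


if __name__ == "__main__":
    print(resolve_search_scope(False, ["weibo"], {"weibo"}))
    print(resolve_search_scope(True, ["general", "weibo"], {"weibo", "wechat"}))
```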
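Continuing the example above, suppose that after logging in the user enters the question "符合我口味的餐馆推荐下?" ("Can you recommend a restaurant that suits my taste?") and specifies the search scope as the general-channel corpus plus the Weibo-channel corpus.
In step 508, the Q&A page sends an answer query request to the logic processing module.
In step 509, the logic processing module preprocesses the question, including normalization to remove special symbols and filtering out sensitive words.
Here, special symbols usually means punctuation marks of all kinds, and sensitive words are words relating to topics such as politics, pornography, gambling, and drugs.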
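The preprocessing described here, removing punctuation and filtering sensitive words, can be sketched with the standard `re` module. This is an illustrative sketch only: the application does not say how sensitive words are handled, so simply deleting them is an assumption, and the function name `preprocess` and the sample word list are hypothetical.

```python
import re


def preprocess(question, sensitive_words):
    """Normalize a question: drop punctuation, then delete sensitive words.

    In Python 3, \\w matches Unicode letters and digits (including CJK
    characters), so [^\\w\\s] removes both ASCII and full-width punctuation.
    Deleting sensitive words outright is an assumed policy.
    """
    text = re.sub(r"[^\w\s]", "", question)
    for word in sensitive_words:
        text = text.replace(word, "")
    # Collapse the whitespace left behind by removed words.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("符合我口味的餐馆推荐下?", ["赌博"]))
    print(preprocess("推荐 赌博 网站!!", ["赌博"]))
```

A production filter would more likely reject or flag a question containing sensitive words than silently strip them; the sketch only shows the mechanics.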
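In steps 510 to 511, for the question "recommend a restaurant that suits my taste", the logic processing module calls its interface IF2 with the corpus management module and sends the corpus management module an answer query request. After equivalent-sentence lookup, synonym replacement and keyword lookup, FAQ lookup, and similar processing, assume that several answers are found in this embodiment:
Recommended local Sichuan restaurants include aaa, bbb, ccc...
Recommended local Jiangsu restaurants include eee, fff, ggg...
Recommended local Hunan restaurants include hhh, iii, jjj...
Recommended local Cantonese restaurants include xxx, yyy, zzz...
In step 512, the corpus management module returns the several answers found to the logic processing module.
In step 513, the logic processing module computes similarity for all candidate answers obtained and, taking into account the retrieved parameters such as channel, permission, and weight, arrives at the highest-scoring answer, assumed here to be "Recommended local Cantonese restaurants include xxx, yyy, zzz...".
In step 514, the logic processing module returns the answer "Recommended local Cantonese restaurants include xxx, yyy, zzz..." to the Q&A page.
In step 515, the Q&A page displays the result to the user.
In an embodiment, the method further includes: comparing the similarity between information about the user's preprocessed question and the information associated with the user identifier (ID) to find the most similar question, querying the answer to that most similar question, and using that answer as the answer to the user's question. The functions of the logic processing module and the corpus management module can be adapted from the description of steps 510 to 513 and are not repeated here.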
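The question-to-question variant can be sketched as follows. The application does not fix a similarity measure, so the standard library's `difflib.SequenceMatcher` is swapped in here purely as a stand-in; the minimum-similarity `threshold` parameter and the names `answer_via_similar_question` and `qa_pairs` are assumptions introduced for the illustration.

```python
from difflib import SequenceMatcher


def answer_via_similar_question(question, qa_pairs, threshold):
    """Find the stored question most similar to `question` and return its answer.

    `qa_pairs` maps known questions to answers. Returns None when no stored
    question reaches the (assumed) similarity threshold.
    """
    best_question, best_ratio = None, 0.0
    for known in qa_pairs:
        ratio = SequenceMatcher(None, question, known).ratio()
        if ratio > best_ratio:
            best_question, best_ratio = known, ratio
    if best_question is None or best_ratio < threshold:
        return None
    return qa_pairs[best_question]


if __name__ == "__main__":
    stored = {
        "how do I get home": "route A",
        "recommend a restaurant": "restaurant X",
    }
    print(answer_via_similar_question("recommend a restaurant", stored, 0.8))
```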
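In the second embodiment, the Q&A page is assumed to be integrated into a social platform, with the corpus obtained through the social platform's interface.
FIG. 6 is a schematic diagram of the network architecture in the second embodiment. As shown in FIG. 6, the corpus management module includes at least the management unit, data collection unit, learning unit, and first storage unit of FIG. 1, and the logic processing module includes at least the transceiver unit and information processing unit of FIG. 1. FIG. 7 is a schematic flowchart of information processing in the second embodiment. As shown in FIG. 7, the flow includes steps 700 to 717.
In step 700, the corpus management module periodically synchronizes user data from the social platform database.
In this step, through the social platform's internal interface, the user's personal information can be obtained by user ID, together with content published, comments made, or chat messages sent by at least one of the user and the user's friends, all of which serve as corpus material.
In steps 701 to 702, the corpus management module performs self-learning on the obtained corpus material using machine learning algorithms and automatically triggers corpus testing according to rules preset by the administrator. Entries that reach the required score are saved to the corpus management module automatically, or are saved after review by the administrator. The saved data carries the channel attribute of each entry.
In steps 703 to 707, the user logs in to the social platform.
Broadly, this includes: the user logs in to the social platform's portal; the portal queries the user's information in the database and authenticates the user; and the portal then returns the authentication result to the user in the login response.
In step 708, the user logs in with one click to the Q&A page integrated into the social platform and asks a question through it.
In step 709, the Q&A page sends an answer query request to the logic processing module.
In step 710, the logic processing module preprocesses the question, including normalization and sensitive-word filtering.
In step 711, the logic processing module sends an answer query request to the corpus management module through interface IF2.
In steps 712 to 713, the corpus management module queries the database to obtain answers to the question.
In step 714, the corpus management module returns the query result to the logic processing module through interface IF2.
In step 715, the logic processing module computes similarity for all candidate answers found and, taking into account the retrieved parameters such as channel, permission, and weight, arrives at the highest-scoring answer.
In step 716, the logic processing module returns the result to the Q&A page.
In step 717, the Q&A page displays the result to the user.
In an embodiment, the method further includes: comparing the similarity between information about the user's preprocessed question and the information associated with the user identifier (ID) to find the most similar question, querying the answer to that most similar question, and using that answer as the answer to the user's question. The functions of the logic processing module and the corpus management module can be adapted from the description of steps 711 to 715 and are not repeated here.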
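In the third embodiment, the question raised by the user is assumed to be synchronized to a social network, and the answers are learned again by the information processing system.
FIG. 8 is a schematic diagram of the network architecture in the third embodiment. As shown in FIG. 8, the corpus management module includes at least the management unit, data collection unit, learning unit, and first storage unit of FIG. 1, and the logic processing module includes at least the transceiver unit and information processing unit of FIG. 1. FIG. 9 is a schematic flowchart of information processing in the third embodiment. As shown in FIG. 9, the flow includes steps 900 to 924.
In step 900, the user opens the Q&A web page of the information processing system. The user can log in with a personal account from one of the social networks already connected to the system, so that the user's personal information and corpus data can be retrieved by user ID.
Alternatively, the user can register an account in the information processing system and bind personal accounts from several of the social networks connected to the system.
In step 901, the Q&A page authenticates the account information.
In step 902, the Q&A page returns the login result to the user in a login response.
In step 903, the user asks a question through the Q&A page. Depending on which accounts the user has linked, the Q&A page can display a multiple-choice list below the question, and the user can, for example by ticking entries, decide whether to synchronize the question to a social network.
If the user has not logged in to the information processing system or has not linked a social network account, the page does not display the multiple-choice list.
In step 904, the Q&A page saves the synchronization options for the question the user wants to synchronize.
In step 905, the Q&A page sends an answer query request to the logic processing module.
In step 906, the logic processing module preprocesses the question, including normalization and sensitive-word filtering.
In step 907, the logic processing module calls interface IF2 to send an answer query request to the corpus management module.
In step 908, the corpus management module queries the corpus database for answers to the question.
In step 909, the corpus management module returns the answer (list) found for the question to the logic processing module in an answer query response.
In step 910, the logic processing module computes similarity for all candidate answers found and, taking into account the retrieved channel, permission, weight, and so on, arrives at the highest-scoring answer.
In step 911, the logic processing module returns the result to the Q&A page in an answer query response.
In step 912, the Q&A page displays the result to the user.
In step 913, if the user chose to publish the question to a social network as well, the information processing system publishes the question there; the Q&A page synchronizes the question to the logic processing module.
In step 914, the logic processing module calls interface IF3 to synchronize the question to the corpus management module.
In step 915, the corpus management module sends a question publication request to the social network through the third-party interface the social network exposes.
In step 916, the social network publishes the user's question.
In step 917, the social network returns the publication result to the corpus management module in a question publication response.
In steps 918 to 919, the corpus management module returns a question synchronization response to the logic processing module, which finally returns it to the Q&A page.
In step 920, after seeing the published question, the user's friends can comment or send private messages.
In this way, the user gets the answer given by the information processing system and also receives comments from friends or followers in the user's own social circle.
In step 921, the corpus management module of the information processing system periodically calls the third-party interface exposed by the social network to synchronize the user's corpus, and in doing so also obtains these comments as corpus material.
In step 922, the social network queries the user data.
In step 923, the social network returns the user data to the corpus management module.
In step 924, the corpus management module relearns this content using machine learning algorithms and saves it to the corpus database.
Cycling through the third embodiment in this way, the user's personal corpus data set is continually refined. When the user later asks a similar or related question, the information processing system has a richer corpus to draw on and can give an answer that is closer to the user's needs and more accurate.
In an embodiment, the method further includes: comparing the similarity between information about the user's preprocessed question and the information associated with the user identifier (ID) to find the most similar question, querying the answer to that most similar question, and using that answer as the answer to the user's question. The functions of the logic processing module and the corpus management module can be adapted from the description of steps 907 to 910 and are not repeated here.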
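In the fourth embodiment, the corpus management module is assumed to be applied within the information processing system of this application.
FIG. 10 is a schematic diagram of the network architecture in the fourth embodiment. As shown in FIG. 10, the corpus management module includes at least the management unit, data collection unit, learning unit, and first storage unit of FIG. 1, and the logic processing module includes at least the transceiver unit and information processing unit of FIG. 1. FIG. 11 is a schematic flowchart of information processing in the fourth embodiment. As shown in FIG. 11, the flow includes steps 1100 to 1113.
In step 1100, the user's friends interact with the user by commenting on posts the user has published on a social network.
In step 1101, the corpus management module periodically sends a user data query request to the social network to obtain the text the user has published, and at the same time obtains these comments, as corpus material.
In steps 1102 to 1103, the social network returns the friends' comments on the user's text to the corpus management module.
In step 1104, the corpus management module learns from this data using machine learning algorithms, extracts new corpus entries such as each user's keywords, questions, and corresponding answers, and saves the entries' attributes, such as permission and weight, to the corpus database.
In step 1105, the user opens a portal website connected to the information processing system and logs in with a personal account from one of the social networks already connected to the system, so that the user's personal information and corpus data can be retrieved by user ID.
The user can also register a new account through the portal website and bind personal accounts from several of the social networks connected to the system.
In step 1106, the portal website calls the relevant interface of the information processing system to query the database and authenticate the user.
In step 1107, if authentication fails, the portal returns a login failure response to the user; if authentication succeeds, the portal sends a request to query the user's hot words to the logic processing module of the information processing system.
In step 1108, the logic processing module sends the request to query the user's hot words to the corpus management module.
In step 1109, the corpus management module queries the corpus database and, after overall scoring, takes the top-ranked keywords according to preset rules as the user's hot-word list.
In step 1110, the corpus management module returns the user's hot-word list to the logic processing module in a hot-word query response.
In step 1111, the logic processing module derives a list of recommended content after comprehensive processing according to business needs.
Here, comprehensive processing according to business needs mainly means taking the business characteristics of the recommending website into account. A shopping website, for example, might select from the hot words obtained in the previous steps those related to daily life, goods, and shopping. An app download website might instead select hot words related to games and entertainment.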
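Steps 1109 and 1111 together amount to scoring keywords, filtering them by the site's business categories, and keeping the top few. The sketch below illustrates that idea under assumptions the application does not state: each hot word is a `(word, category, score)` tuple, categories are plain strings, and the function name `recommend_hot_words` is hypothetical.

```python
def recommend_hot_words(scored_words, site_categories, top_n):
    """Pick the top-scoring hot words relevant to a site's business.

    `scored_words` is a list of (word, category, score) tuples; the tuple
    layout and category labels are illustrative assumptions.
    """
    relevant = [(word, score) for word, category, score in scored_words
                if category in site_categories]
    # Highest score first; sorted() is stable, so ties keep input order.
    ranked = sorted(relevant, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    hot_words = [
        ("running shoes", "shopping", 0.90),
        ("mobile game", "games", 0.80),
        ("rice cooker", "shopping", 0.95),
        ("novel", "reading", 0.50),
    ]
    # A shopping site keeps only shopping-related hot words.
    print(recommend_hot_words(hot_words, {"shopping"}, 2))
    # An app download site might keep games and entertainment instead.
    print(recommend_hot_words(hot_words, {"games", "entertainment"}, 2))
```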
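In step 1112, the logic processing module returns the list of recommended content to the portal.
In step 1113, the portal returns a login success response and displays the recommended content to the user.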

Claims (31)

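  1. An information processing system, comprising: a data collection unit, a learning unit, a first storage unit, a transceiver unit, and an information processing unit; wherein
    the data collection unit is configured to collect information associated with a user identifier (ID);
    the learning unit is configured to process the collected information to form user-ID-based data information and store it in the first storage unit;
    the transceiver unit is configured to receive a question raised by a user, and to return the obtained answer to the question raised by the user to that user;
    the information processing unit is configured to preprocess the obtained question raised by the user according to the user-ID-based data information stored in the first storage unit, to obtain an answer to the question raised by the user.
  2. The information processing system according to claim 1, wherein the information associated with the user identifier (ID) comes from at least one of a social network and a social platform;
    the number of social networks is at least one, and the number of social platforms is at least one.
  3. The information processing system according to claim 2, further comprising:
    a synchronization unit, configured to synchronize the question raised by the user that needs to be synchronized to the first storage unit; and to call the interface between the information processing system and at least one of the social network and the social platform, to publish the question raised by the user to at least one of the social network and the social platform.
  4. The information processing system according to claim 2, further comprising: a management unit, configured to set up a scheduled task and periodically trigger the data collection unit according to the scheduled task, so as to collect data from at least one of the social network and the social platform.
  5. The information processing system according to claim 4, wherein the management unit is further configured to manage and maintain the user-ID-based data information stored in the first storage unit.
  6. The information processing system according to claim 4, wherein the management unit is further configured to create, delete, modify, and query the user-ID-based data information in the first storage unit.
  7. The information processing system according to claim 4, wherein the management unit is further configured to set permissions for different types of the user-ID-based data information.
  8. The information processing system according to any one of claims 1 to 4, further comprising:
    a second storage unit, configured to store existing corpus information;
    the information processing unit being configured to combine the user-ID-based data information stored in the first storage unit with the existing corpus information stored in the second storage unit and preprocess the obtained question raised by the user, to obtain an answer to the question raised by the user.
  9. The information processing system according to claim 8, wherein the learning unit comprises: a generation module, an annotation module, and a temporary element table; wherein
    the generation module is configured to generate temporary files from the data from the data collection unit;
    the annotation module is configured to annotate each temporary file when the generation module generates it, and to save the annotated temporary file information in the temporary element table.
  10. The information processing system according to claim 9, wherein the learning unit further comprises: an acquisition module and a comparison module; wherein
    the acquisition module is configured to periodically read the existing corpus information from the second storage unit;
    the comparison module is configured to compare the data in the temporary element table with the corpus information obtained by the acquisition module, and to store in the first storage unit the temporary elements that do not exist in the second storage unit.
  11. The information processing system according to any one of claims 1 to 4, wherein the information processing unit is further configured to: obtain candidate answers after preprocessing the question raised by the user, compare the similarity between information about the obtained candidate answers and the information associated with the user identifier (ID), and use the answer with the highest similarity as the answer to the question raised by the user.
  12. The information processing system according to any one of claims 1 to 4, wherein the information processing unit is further configured to: compare the similarity between information about the preprocessed question raised by the user and the information associated with the user identifier (ID) to obtain the question with the highest similarity, query an answer to the question with the highest similarity, and use the answer to the question with the highest similarity as the answer to the question raised by the user.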
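  13. A method for implementing information processing, comprising:
    collecting information associated with a user identifier (ID);
    processing the collected information to form user-ID-based data information and storing it;
    obtaining a question raised by a user, and preprocessing the obtained question raised by the user according to the stored user-ID-based data information, to obtain an answer to the question raised by that user.
  14. The method according to claim 13, wherein the information associated with the user identifier (ID) comes from at least one of a social network and a social platform;
    the number of social networks is at least one, and the number of social platforms is at least one.
  15. The method according to claim 14, further comprising:
    synchronizing and storing the question raised by the user that needs to be synchronized;
    publishing the synchronized question raised by the user to at least one of the social network and the social platform.
  16. The method according to claim 14, further comprising:
    configuring a scheduled task, and periodically triggering the collecting according to the scheduled task.
  17. The method according to claim 16, further comprising: managing and maintaining the stored user-ID-based data information.
  18. The method according to claim 16, further comprising: creating, deleting, modifying, and querying the stored user-ID-based data information.
  19. The method according to claim 16, further comprising: setting permissions for different types of the user-ID-based data information.
  20. The method according to any one of claims 13 to 16, further comprising:
    storing existing corpus information;
    combining the stored user-ID-based data information with the stored existing corpus information and preprocessing the obtained question raised by the user, to obtain an answer to the question raised by the user.
  21. The method according to claim 20, wherein processing the collected information to form the user-ID-based data information comprises:
    generating temporary files from the collected information;
    each time a temporary file is generated, annotating the temporary file and saving the annotated temporary file information in a temporary element table.
  22. The method according to claim 21, further comprising:
    periodically reading the existing corpus information;
    comparing the data in the temporary element table with the existing corpus information that was read, and storing the temporary elements that do not exist in the existing corpus information that was read.
  23. The method according to any one of claims 13 to 16, further comprising: obtaining candidate answers after preprocessing the question raised by the user, comparing the similarity between information about the obtained candidate answers and the information associated with the user identifier (ID), and using the answer with the highest similarity as the answer to the question raised by the user.
  24. The method according to any one of claims 13 to 16, further comprising: comparing the similarity between information about the preprocessed question raised by the user and the information associated with the user identifier (ID) to obtain the question with the highest similarity, querying an answer to the question with the highest similarity, and using the answer to the question with the highest similarity as the answer to the question raised by the user.
  25. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method for implementing information processing according to any one of claims 13 to 24.
  26. An apparatus for implementing information processing, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the computer program being configured to perform a method comprising the following steps:
    collecting information associated with a user identifier (ID); processing the collected information to form user-ID-based data information and storing it; obtaining a question raised by a user, and preprocessing the obtained question raised by the user according to the stored user-ID-based data information, to obtain an answer to the question raised by that user.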
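  27. An information processing apparatus, comprising: a generation module, an annotation module, and a temporary element table; wherein
    the generation module is configured to generate temporary files from collected data;
    the annotation module is configured to annotate each temporary file when the generation module generates it, and to save the annotated temporary file information in the temporary element table.
  28. The information processing system according to claim 27, wherein the learning unit further comprises: an acquisition module and a comparison module; wherein
    the acquisition module is configured to periodically read corpus information from a first storage unit;
    the comparison module is configured to compare the data in the temporary element table with the corpus information obtained by the acquisition module, and to store in a second storage unit the temporary elements that do not exist in the first storage unit.
  29. A method for implementing information processing, comprising: generating temporary files from collected information;
    each time a temporary file is generated, annotating the temporary file and saving the annotated temporary file information in a temporary element table.
  30. The method according to claim 29, further comprising:
    periodically reading existing corpus information;
    comparing the data in the temporary element table with the existing corpus information that was read, and storing the temporary elements that do not exist in the existing corpus information that was read.
  31. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method for implementing information processing according to claim 29 or 30.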
PCT/CN2018/111962 2017-10-25 2018-10-25 Information processing system and method for implementing information processing WO2019080910A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711010979.0A CN107992513B (zh) 2017-10-25 2017-10-25 Information processing system and method for implementing information processing
CN201711010979.0 2017-10-25

Publications (1)

Publication Number Publication Date
WO2019080910A1 true WO2019080910A1 (zh) 2019-05-02

Family

ID=62030034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111962 WO2019080910A1 (zh) 2017-10-25 2018-10-25 Information processing system and method for implementing information processing

Country Status (2)

Country Link
CN (1) CN107992513B (zh)
WO (1) WO2019080910A1 (zh)

Also Published As

Publication number Publication date
CN107992513A (zh) 2018-05-04
CN107992513B (zh) 2021-07-13


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18870930

Country of ref document: EP

Kind code of ref document: A1