CN107623621B - Chat corpus collection method and device - Google Patents

Chat corpus collection method and device

Info

Publication number
CN107623621B
Authority
CN
China
Prior art keywords
user
session
service system
intelligent service
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610556147.8A
Other languages
Chinese (zh)
Other versions
CN107623621A (en)
Inventor
路彦雄
刘秋阁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201610556147.8A priority Critical patent/CN107623621B/en
Priority to EP17826971.8A priority patent/EP3487128B1/en
Priority to PCT/CN2017/092485 priority patent/WO2018010635A1/en
Publication of CN107623621A publication Critical patent/CN107623621A/en
Priority to US16/201,415 priority patent/US11294962B2/en
Application granted granted Critical
Publication of CN107623621B publication Critical patent/CN107623621B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The disclosure provides a method and a device for collecting chat corpora. The method comprises the following steps: an intelligent service system acquires a user identifier corresponding to a target user; session initiation content is dynamically generated according to the user portrait information corresponding to the user identifier and the target corpus information; the session initiation content is pushed to a session system accessed by the intelligent service system; and the intelligent service system conducts, through the session system, a session related to the session initiation content with the target user, and saves the respective reply information of the intelligent service system and the target user in the session to form a chat corpus. Because the chat corpus of the intelligent service system is collected through the participation of many users, that is, in a crowdsourcing manner, chat corpora can be obtained widely and easily, and their use is no longer hindered by privacy concerns.

Description

Chat corpus collection method and device
Technical Field
The present disclosure relates to the field of internet application technologies, and in particular, to a method and an apparatus for collecting chat corpuses.
Background
With the rapid development of internet application technology, chat robots have also developed rapidly and are expected to be applied in various scenarios to realize conversations between chat robots and users.
However, major obstacles often arise in actual conversations between a chat robot and a user. For example, the chat robot has difficulty understanding the context of a conversation accurately, and because it does not remember factors such as user preferences, it cannot hold a smooth and sensible conversation with the user.
These obstacles exist because enough real chat corpora covering enough fields cannot be collected.
In practice, real chat corpora are scarce and cannot conveniently be used directly for privacy reasons, so chat corpus collection has become a bottleneck in the development of chat robots.
Disclosure of Invention
To solve the technical problems in the related art that chat corpora are difficult to obtain and inconvenient to use directly for privacy reasons, the present disclosure provides a chat corpus collection method and device.
A chat corpus collection method is applied to a session between an intelligent service system and a target user, and comprises the following steps:
the intelligent service system acquires a user identifier corresponding to a target user;
dynamically generating session initiation content according to the user portrait information and the target corpus information corresponding to the user identification;
pushing the session initiation content to a session system accessed by the intelligent service system;
and the intelligent service system carries out conversation related to the conversation initiating content between the intelligent service system and the target user through the conversation system, and saves respective reply information of the intelligent service system and the target user in the conversation to form a chat corpus.
A chat corpus collection method is applied to a session system in which an intelligent service system participates in sessions, and comprises the following steps:
the session system selects a target user from among the users initiating session requests to obtain a user identifier corresponding to the target user;
acquiring session initiation content generated for the target user by the intelligent service system;
pushing the session initiation content through a friend relationship established between the intelligent service system and the target user;
and initiating the session between the intelligent service system and the target user in the session system by pushing the session initiation content.
A method for collecting chat corpuses, the method comprising:
acquiring a broadcast message generated by the intelligent service system according to the target corpus information;
broadcasting the broadcast message generated by the intelligent service system to a user through a session system accessed by the intelligent service system;
and carrying out conversation between the intelligent service system and the user through the broadcast message, and storing respective reply information of the intelligent service system and the user in the conversation to form a chat corpus.
A chat corpus collection device applied to an intelligent service system, the device comprising:
the target user identification acquisition module is used for acquiring a user identification corresponding to a target user;
the content generation module is used for dynamically generating session initiation content according to the user portrait information and the target corpus information corresponding to the user identification;
the session initiation content pushing module is used for pushing the session initiation content to a session system accessed by the intelligent service system;
and the corpus processing module is used for carrying out conversation related to the conversation initiating content between the intelligent service system and the target user through the conversation system and storing respective reply information of the intelligent service system and the target user in the conversation to form a chat corpus.
A chat corpus collection device applied to a conversation system in which an intelligent service system participates in a conversation, the device comprising:
the target user selection module is used for selecting a target user from among the users initiating session requests to obtain a user identifier corresponding to the target user;
the content acquisition module is used for acquiring the session initiation content generated by the intelligent service system for the target user;
the content pushing module is used for pushing the session initiation content through a friend relationship established between the intelligent service system and a target user;
and the session initiating module is used for initiating the session between the intelligent service system and the target user in the session system by pushing the session initiating content.
A chat corpus collection apparatus, the apparatus comprising:
the broadcast acquisition module is used for acquiring a broadcast message generated by the intelligent service system according to the target corpus information;
the broadcasting module is used for broadcasting the broadcast message generated by the intelligent service system to the user through the session system accessed by the intelligent service system;
and the session processing module is used for carrying out the session between the intelligent service system and the user through the broadcast message, and respective reply information of the intelligent service system and the user in the session is stored to form a chat corpus.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In chat corpus collection, the intelligent service system acquires the user identifier corresponding to the target user, dynamically generates session initiation content according to the user portrait information corresponding to the user identifier and the target corpus information, and pushes the session initiation content to a session system accessed by the intelligent service system. The session initiation content initiates a session between the intelligent service system and the target user; the intelligent service system conducts this session through the session system and stores the respective reply information of the intelligent service system and the target user to form the chat corpus. Because the chat corpus of the intelligent service system is collected through the participation of many users, that is, in a crowdsourcing manner, chat corpora can be obtained widely and easily, and their use is no longer hindered by privacy concerns.
In addition, because chat corpus collection is based on the user portrait information and the target corpus information, the sessions conducted during collection are targeted, real conversations. The resulting chat corpus matches what the target intelligent service system requires and reflects real conversation, so the performance of the chat robot can be improved as much as possible.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic illustration of an implementation environment according to the present disclosure;
FIG. 2 is a block diagram illustrating an apparatus in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method for collecting chat corpuses applied to an intelligent service system in accordance with an illustrative embodiment;
FIG. 4 is a flowchart illustrating a step of dynamically generating session initiation content according to user portrait information and target corpus information corresponding to a user identifier according to the corresponding embodiment in FIG. 3;
FIG. 5 is a flowchart illustrating the step, in the embodiment corresponding to FIG. 3, in which the intelligent service system conducts a session related to the session initiation content with the target user through the session system and stores the respective reply information of the intelligent service system and the target user in the session to form a chat corpus;
FIG. 6 is a block diagram illustrating a process by which an intelligent service system processes user reply information and obtains personalized reply information through semantic retrieval in accordance with an exemplary embodiment;
FIG. 7 is a flow diagram illustrating a method for chat corpus collection applied to a conversation system in accordance with an illustrative embodiment;
FIG. 8 is a flowchart illustrating the step, in the embodiment corresponding to FIG. 7, in which the session system selects a target user from among the users initiating session requests to obtain a user identifier corresponding to the target user;
FIG. 9 is a flowchart illustrating a method of collecting chat corpuses applied to a user terminal according to an exemplary embodiment;
FIG. 10 is a flowchart illustrating the step, in the embodiment corresponding to FIG. 9, in which, in the session between the user and the intelligent service system initiated in the session system by the session initiation content, user reply information is returned to the intelligent service system through the session system according to the session initiation content or the personalized reply information of the intelligent service system;
FIG. 11 is a diagram illustrating a chat corpus collection framework implemented by a user through a drift bottle plug-in, according to an example embodiment;
FIG. 12 is a session flow diagram illustrating an anonymous session system in accordance with an exemplary embodiment;
FIG. 13 is a flow diagram illustrating the operation of an anonymous session system in accordance with an exemplary embodiment;
FIG. 14 is a flow diagram illustrating a method of chat corpus collection in accordance with another illustrative embodiment;
FIG. 15 is a flowchart illustrating a step of conducting a conversation between the smart service system and the user through a broadcast message and saving respective reply messages of the smart service system and the user in the conversation to form a chat corpus in accordance with the corresponding embodiment of FIG. 14;
FIG. 16 is a block diagram illustrating a chat corpus collection apparatus for use on the machine side, in accordance with an illustrative embodiment;
FIG. 17 is a block diagram of a content generation module of the corresponding embodiment of FIG. 16;
FIG. 18 is a block diagram of a corpus processing module of the corresponding embodiment of FIG. 16;
FIG. 19 is a block diagram illustrating a chat corpus collection apparatus for use in a conversation system in accordance with an exemplary embodiment;
FIG. 20 is a block diagram of a target user selection module of the corresponding embodiment of FIG. 19;
FIG. 21 is a block diagram illustrating a chat corpus collecting apparatus applied to a user terminal according to an exemplary embodiment;
FIG. 22 is a block diagram of a chat reply module of the corresponding embodiment of FIG. 21;
FIG. 23 is a block diagram illustrating a chat corpus collection apparatus according to another exemplary embodiment;
FIG. 24 is a block diagram of the session processing module of the embodiment corresponding to FIG. 23.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a schematic illustration of an implementation environment according to the present disclosure. The implementation environment includes: an intelligent service system 110, a session system 130 and a user terminal 150.
The intelligent service system 110 and the user terminal 150 both access the session system 130 to enable a session between the intelligent service system 110 and the user terminal 150.
FIG. 2 is a block diagram illustrating an apparatus 200 according to an example embodiment. For example, the apparatus 200 may be the intelligent service system or the session system in the implementation environment shown in FIG. 1. The intelligent service system may be, for example, a server running robot software. The session system is a server or a server cluster for implementing the session function.
Referring to FIG. 2, the apparatus 200 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 222 (e.g., one or more processors), a memory 232, and one or more storage media 230 (e.g., one or more mass storage devices) storing an application 242 or data 244. The memory 232 and the storage media 230 may be transient or persistent storage. The program stored in the storage media 230 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 222 may be configured to communicate with the storage media 230 and to execute, on the apparatus 200, the series of instruction operations in the storage media 230. The apparatus 200 may further include one or more power supplies 226, one or more wired or wireless network interfaces 250, one or more input/output interfaces 258, and/or one or more operating systems 241, such as Windows Server™, Mac OS X™ or Unix™. The steps of the exemplary embodiments shown in the following figures, such as FIG. 7, FIG. 8 and FIG. 14, may be performed on the basis of the apparatus structure shown in FIG. 2.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Fig. 3 is a flow diagram illustrating a method for chat corpus collection in accordance with an exemplary embodiment. The method for collecting the chat corpus is applied to the intelligent service system 110 in the implementation environment shown in fig. 1, and as shown in fig. 3, the method for collecting the chat corpus, which may be executed by the intelligent service system 110, may include the following steps.
In step 310, the intelligent service system obtains a user identifier corresponding to the target user.
The intelligent service system is a machine which runs chat robot software and has a conversation function. The target user is a user who is about to have a conversation with the intelligent service system, and the target user can be selected from a plurality of users.
In step 330, a session initiation content is dynamically generated according to the user portrait information and the target corpus information corresponding to the user identifier.
The user identifier uniquely identifies the user. The user portrait information includes at least user attributes, and each user has user portrait information uniquely corresponding to that user's identifier. The user portrait information characterizes the user in various respects, so that the user is depicted in the form of information.
The target corpus information is preset by the intelligent service system according to the chat corpus content it requires. In an exemplary embodiment, the target corpus information may be a target corpus type, for example a sports type or a food type.
The intelligent service system generates the session initiation content according to the user portrait information and the target corpus information. The generated session initiation content therefore matches both the user portrait information and the target corpus information.
On the one hand, the session initiation content generated in this way fits the target user's situation, which makes the subsequent push more effective: the content pushed to the target user is likely to resonate with or interest the target user, which greatly increases the likelihood of receiving a user reply and thus promotes chat corpus collection.
On the other hand, the session initiation content generated in this way is consistent with the target corpus information, which ensures that the chat corpus finally collected is what the intelligent service system actually requires, so that targeted chat corpus collection can be realized.
In step 350, the session initiation content is pushed to the session system accessed by the intelligent service system itself.
It should be noted that, the session system is a system having a session function, and can implement a session between users, or even between a user and an intelligent service system. In one exemplary embodiment, this session system may be an anonymous session system.
The conversation system is connected with a plurality of users, and in order to realize the corpus collection of the intelligent service system, the intelligent service system is also connected with the conversation system.
The session initiation content generated by the intelligent service system for the target user will be used to start conducting a session between the intelligent service system and the target user.
Specifically, the intelligent service system pushes the session initiation content generated by the intelligent service system to the session system, and the session system is used as a relay to push the session initiation content to the terminal where the target user is located, so that the session between the intelligent service system and the target user is initiated.
In step 370, the intelligent service system performs a session related to the session initiation content between itself and the target user through the session system, and stores the respective reply information of the intelligent service system and the target user in the session to form a chat corpus.
As the session proceeds, the reply information contains the intelligent service system's replies to the target user, namely the personalized reply information generated by the intelligent service system and returned to the target user, as well as the user reply information sent by the target user in response to the personalized reply information. The reply information therefore includes personalized reply information and user reply information.
Through the session system, the intelligent service system conducts the session with the target user: it continuously receives the target user's user reply information, generates personalized reply information according to it, and sends the personalized reply information to the target user.
The session initiation content, the user reply information of the target user and the personalized reply information of the intelligent service system, arranged in chronological order, form the context of the session between the intelligent service system and the target user, namely a chat record, and the chat record can be used as the chat corpus of the intelligent service system.
Through the above process, a session between the intelligent service system and the target user is realized, and the chat corpus of the intelligent service system is collected through that session. Because the session system has numerous users, multiple users can be selected as target users and sessions can be conducted with each of them, so chat corpora are collected quickly and collection efficiency is effectively improved.
It should be particularly noted that the session initiation content and the reply information may be in text form, but are not limited to text; they may also be in voice or picture form.
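The flow of steps 310 to 370 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the disclosure: the `SessionSystem` class, the `collect_corpus` function, the opening template and the `max_turns` limit are all hypothetical names and choices introduced here, and each user is simulated by a callable that returns a reply.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SessionSystem:
    """Hypothetical relay between the intelligent service system and users."""
    # Each user is simulated by a callable mapping an incoming message to a reply.
    users: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def push(self, user_id: str, message: str) -> str:
        # Relay the message to the user's terminal and return the user's reply.
        return self.users[user_id](message)


def collect_corpus(session: SessionSystem,
                   user_id: str,
                   portraits: Dict[str, dict],
                   target_topic: str,
                   reply_fn: Callable[[str], str],
                   max_turns: int = 2) -> List[Tuple[str, str]]:
    """Run one session and return the chat record as (speaker, text) pairs."""
    # Step 310: the user identifier of the target user is already known here.
    portrait = portraits[user_id]
    # Step 330: generate an opening from the portrait and the target corpus type
    # (the template is an assumption made for this sketch).
    opening = f"Hi! Any {target_topic} news from {portrait['city']} today?"
    record: List[Tuple[str, str]] = [("system", opening)]
    # Steps 350 and 370: push through the session system, then alternate replies.
    message = opening
    for _ in range(max_turns):
        user_reply = session.push(user_id, message)
        record.append(("user", user_reply))
        message = reply_fn(user_reply)  # personalized reply information
        record.append(("system", message))
    return record
```

The returned list keeps the opening, user replies and system replies in the order they occurred, which corresponds to the chat record that the text treats as the chat corpus.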
The details of step 310 are described below according to an exemplary embodiment. Step 310 may include the following step.
The intelligent service system receives the user identifier that the session system returns after performing target user selection; the user identifier corresponds to the target user.
After completing target user selection, the session system returns the user identifier corresponding to the target user to the intelligent service system.
The intelligent service system receives the user identifier returned by the session system, and can then collect a chat corpus from the target user corresponding to that user identifier.
In this way, the intelligent service system can collect its own chat corpus by means of the session system, and can thus conveniently collect enough real chat corpora in enough fields.
FIG. 4 is a depiction of the details of step 330, shown in accordance with an exemplary embodiment. As shown in fig. 4, the step 330 may include the following steps.
In step 331, user portrait information is extracted based on the user identifier.
As mentioned above, the user portrait information at least includes user attribute information, such as a city, a gender, and an age of the user. The user portrait information is stored with the user identification as an index.
Therefore, after the user identification corresponding to the target user returned by the session system is obtained, the user portrait information is searched according to the user identification, and the user portrait information corresponding to the user identification is extracted.
In step 333, session initiation content associated with the target corpus information and matching the user portrait information is generated.
Session initiation content is generated dynamically so that it is related to the target corpus information and matches the user portrait information; fixed, unchanging session initiation content is not used, which makes chat corpus collection more adaptive.
For example, for a female target user under the age of 35, session initiation content on emotional or work-related topics is generated.
For another example, the preset target corpus information includes a food category and a sports category, i.e., it is desirable to collect chat corpuses related to food and chat corpuses related to sports.
In this case, for a target user whose user portrait information indicates female, session initiation content related to food is generated; for a target user whose user portrait information indicates male, session initiation content related to sports is generated.
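Steps 331 and 333 can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `PORTRAITS` store, the `TEMPLATES` table, and the functions `pick_corpus_type` and `generate_opening` do not come from the disclosure, and the gender-based choice simply mirrors the food and sports example given in the text.

```python
from typing import Dict, List

# Hypothetical portrait store indexed by user identifier (step 331).
PORTRAITS: Dict[str, dict] = {
    "u1": {"city": "Shenzhen", "gender": "female", "age": 28},
    "u2": {"city": "Beijing", "gender": "male", "age": 40},
}

# Hypothetical opening templates, one per target corpus type.
TEMPLATES: Dict[str, str] = {
    "food": "What is the best thing to eat in {city}?",
    "sports": "Did you catch any good matches in {city} lately?",
}


def pick_corpus_type(portrait: dict, target_types: List[str]) -> str:
    """Choose a target corpus type that suits the portrait."""
    # Mirrors the example in the text: food for female users, sports for male users.
    preferred = "food" if portrait.get("gender") == "female" else "sports"
    # Fall back to the first required type when the preferred one is not wanted.
    return preferred if preferred in target_types else target_types[0]


def generate_opening(user_id: str, target_types: List[str]) -> str:
    """Step 333: build session initiation content matching portrait and corpus type."""
    portrait = PORTRAITS[user_id]  # step 331: look up the portrait by identifier
    corpus_type = pick_corpus_type(portrait, target_types)
    return TEMPLATES[corpus_type].format(**portrait)
```

For instance, `generate_opening("u1", ["food", "sports"])` produces a food-related opening that mentions the user's city, while the same call for `"u2"` produces a sports-related one.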
FIG. 5 is a depiction of details of step 370, shown in accordance with an exemplary embodiment. As shown in fig. 5, this step 370 may include the following steps.
In step 371, in the session related to the session initiation content that is initiated by the session initiation content between the intelligent service system and the target user, the intelligent service system receives, through the session system, the user reply information returned by the target user in response to the session initiation content or to the personalized reply information generated by the intelligent service system.
In a session continuously carried out by the intelligent service system and the target user, the intelligent service system receives user reply information returned by the target user, wherein the user reply information corresponds to session initiation content or personalized reply information.
In step 373, the intelligent service system generates a personalized reply message corresponding to the user reply message and returns the personalized reply message to the target user through the session system.
The intelligent service system processes the user reply information and performs semantic retrieval to generate personalized reply information corresponding to the user reply information.
Specifically, FIG. 6 is a block diagram illustrating a process by which the intelligent service system processes user reply information and obtains personalized reply information through semantic retrieval, according to an exemplary embodiment.
As shown in FIG. 6, after receiving through the session system the user reply information returned by the target user, the intelligent service system first processes the user reply information (step 410), which specifically includes query (retrieval keyword) analysis 411, query inference 413 and intention identification 415. The processing result is then used to perform semantic retrieval (step 420) so as to retrieve an answer to the user reply information.
Query analysis refers to processing the user reply information by word segmentation, synonym expansion, filtering of meaningless words, and the like. Query inference makes basic judgments such as whether the user reply information is a negative sentence or an interrogative sentence; this can be implemented by keyword matching. Intention identification determines the category, field and the like of the user reply information; this can be implemented with preset rules and a classifier, where the preset rules are regular-expression templates against which matching is attempted, and the classifier is a model trained by a machine learning method.
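The three processing stages of step 410 can be sketched as below. This is only an illustration under several assumptions: the word lists, the regular-expression rules and the function names are hypothetical, simple lowercase tokenization stands in for real word segmentation (Chinese text would need a proper segmenter), and trying the rules before the classifier is one possible ordering that the text does not prescribe.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical resources; real systems would use much larger dictionaries.
STOPWORDS = {"the", "a", "an", "um", "well"}
SYNONYMS: Dict[str, List[str]] = {"soccer": ["football"]}
NEGATION_WORDS = {"not", "no", "never", "don't"}
QUESTION_WORDS = {"what", "why", "how", "who", "where", "when"}
INTENT_RULES = [
    ("sports", re.compile(r"\b(match|soccer|football|basketball)\b")),
    ("food", re.compile(r"\b(eat|food|restaurant|noodles)\b")),
]


def analyze_query(text: str) -> List[str]:
    """Query analysis: segmentation, meaningless-word filtering, synonym expansion."""
    tokens = re.findall(r"[a-z']+", text.lower())  # stand-in for word segmentation
    tokens = [t for t in tokens if t not in STOPWORDS]
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


def infer_query(text: str, tokens: List[str]) -> Dict[str, bool]:
    """Query inference by keyword matching: negative or interrogative sentence."""
    return {
        "negative": any(t in NEGATION_WORDS for t in tokens),
        "question": text.strip().endswith("?")
        or any(t in QUESTION_WORDS for t in tokens),
    }


def identify_intent(text: str,
                    classifier: Optional[Callable[[str], str]] = None) -> Optional[str]:
    """Intention identification: regular-expression templates, then a classifier."""
    lowered = text.lower()
    for label, pattern in INTENT_RULES:
        if pattern.search(lowered):
            return label
    # The classifier would be a trained model; here it is any callable, if given.
    return classifier(text) if classifier else None
```

The outputs of these three functions together form the processing result that the subsequent semantic retrieval step consumes.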
After the processing of the user reply information (step 410) is completed, the semantic retrieval step (step 420) is performed to search the dialog knowledge base for a corresponding answer. The search may return multiple answers, in which case each answer is scored and the answers are ranked to obtain the answer with the highest score. That answer is used for the personalized reply (step 430), that is, personalized reply information is generated and returned to the target user.
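Scoring and ranking candidate answers can be illustrated as follows. The disclosure does not specify a scoring function, so this sketch assumes a simple Jaccard overlap between query tokens and the keywords attached to each knowledge-base entry; the `KNOWLEDGE_BASE` contents and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical dialog knowledge base: each entry has keywords and an answer.
KNOWLEDGE_BASE: List[Dict] = [
    {"keywords": {"soccer", "match", "football"}, "answer": "Which team do you support?"},
    {"keywords": {"noodles", "eat", "food"}, "answer": "What is your favorite dish?"},
    {"keywords": {"match"}, "answer": "Was it a close game?"},
]


def score(query_tokens: Set[str], entry: Dict) -> float:
    """Jaccard overlap between query tokens and entry keywords (an assumed metric)."""
    keywords = entry["keywords"]
    union = query_tokens | keywords
    return len(query_tokens & keywords) / len(union) if union else 0.0


def retrieve_answer(tokens: Iterable[str],
                    knowledge_base: List[Dict] = KNOWLEDGE_BASE) -> Optional[str]:
    """Score every candidate, rank by score, and return the top answer."""
    query = set(tokens)
    ranked = sorted(
        ((score(query, entry), entry["answer"]) for entry in knowledge_base),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Return nothing when no candidate overlaps the query at all.
    if not ranked or ranked[0][0] == 0:
        return None
    return ranked[0][1]
```

A production system would likely replace the overlap metric with a learned or embedding-based relevance score, but the score, sort and take-the-best structure stays the same.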
In the implementation of this process, the data module 450 composed of user portrait information, user behavior, dialog management and context management will be used as a support to ensure the continuity of the dialog.
In step 375, the user reply message and the personalized reply message are sequentially saved to form a chat corpus of the session between the intelligent service system and the target user, and the chat corpus is stored.
The user reply information and the personalized reply information which are sequentially obtained according to a certain time sequence are sequentially stored to form a complete chat record, and the chat record can be stored as a chat corpus.
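As a rough sketch of step 375, the replies can be kept as timestamped utterances and sorted before storage. The `Utterance` type, the function names and the JSON serialization are assumptions made for illustration; the disclosure only states that the replies are saved in order to form the chat record.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Utterance:
    timestamp: float  # time at which the reply was sent or received
    speaker: str      # "system" for personalized replies, "user" for user replies
    text: str


def build_chat_corpus(utterances: List[Utterance]) -> List[Dict]:
    """Order replies by time so the record reflects the real conversation flow."""
    ordered = sorted(utterances, key=lambda u: u.timestamp)
    return [asdict(u) for u in ordered]


def to_json(record: List[Dict]) -> str:
    """Serialize the chat record; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(record, ensure_ascii=False, indent=2)
```

The string returned by `to_json` could then be written to whatever storage the intelligent service system uses.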
FIG. 7 is a flow diagram illustrating a method for chat corpus collection in accordance with an exemplary embodiment. The chat corpus collection method is applied to the session system 130 in the implementation environment shown in FIG. 1. As shown in FIG. 7, the method may be performed by the session system 130 and may include the following steps.
In step 610, the session system selects a target user from among the users initiating session requests to obtain a user identifier corresponding to the target user.
The session request refers to a session request initiated by a user to the session system through a terminal. For example, in an anonymous session system, a user may invoke an anonymous session plug-in and initiate a session request by triggering a preset button in the plug-in, wherein the session request carries a user identifier.
For the session system, the received session requests come from multiple users; in other words, multiple users all initiate session requests to the session system.
At this point, the session system may select target users from among these users.
In step 630, the session initiation content generated by the intelligent service system for the target user is obtained.
After the selection of the target users is completed, the session system provides the user identification corresponding to the target users to the intelligent service system so as to obtain the session initiation content of each target user from the intelligent service system.
In step 650, the session initiation content is pushed through the friend relationship established between the intelligent service system and the target user.
The session system pushes the session initiation content, that is, the session initiation content dynamically generated by the intelligent service system is respectively pushed to the corresponding target users.
The push of the session initiation content to the target user, which the session system carries out for the intelligent service system, is based on the friend relationship established between the intelligent service system and the target user. That is, in this process, the intelligent service system establishes a friend relationship with the target user.
In step 670, a session between the intelligent service system and the target user in the session system is initiated by a push of the session initiation content.
Through the above process, the session between the intelligent service system and the target user in the session system is realized, and sessions between the intelligent service system and a large number of target users can likewise be realized, so that chat corpus collection is carried out across a large user base. In other words, the session system is used to distribute the chat corpus collection task to many target users so that they all participate in it, and privacy reasons need not be considered. In addition, because the collection task is carried out by having the intelligent service system access the session system, the cost is greatly reduced.
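Steps 610 to 670 can be summarized in a short sketch. The `StubIntelligentService` class, its `generate_session_initiation` method, and the in-memory `inboxes` dictionary are stand-ins assumed for illustration; the disclosure does not define a concrete interface between the two systems.

```python
class StubIntelligentService:
    """Stand-in for the intelligent service system; the interface is hypothetical."""

    def generate_session_initiation(self, user_id):
        return f"Hi {user_id}, what did you do today?"


class SessionSystem:
    """Walks each session request through steps 610, 630, 650 and 670."""

    def __init__(self, service, select_target):
        self.service = service
        self.select_target = select_target  # predicate implementing step 610
        self.inboxes = {}  # user_id -> messages pushed to that user

    def handle_session_requests(self, requesting_user_ids):
        started = []
        for user_id in requesting_user_ids:
            # Step 610: select target users among the requesting users.
            if not self.select_target(user_id):
                continue
            # Step 630: obtain the content generated for this target user.
            content = self.service.generate_session_initiation(user_id)
            # Step 650: push the content to the target user.
            self.inboxes.setdefault(user_id, []).append(content)
            # Step 670: the push starts the session with this user.
            started.append(user_id)
        return started
```

Users that are not selected would continue into the ordinary user-to-user session flow, which this sketch leaves out.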
It should be noted here that, no matter what kind of session system, the intelligent service system can be accessed according to the actual operation requirement to realize the collection of the chat corpus, so the scheme provided by the present disclosure has very high versatility, and can be adapted to the collection of the chat corpus in various scenes, thereby breaking the dilemma of the existing collection of the chat corpus.
FIG. 8 is a depiction of the details of step 610, shown in accordance with an exemplary embodiment. As shown in fig. 8, the method for collecting chat corpuses may include the following steps.
In step 611, when chat corpus collection is started, it is determined whether the user initiating the session request is to have a session with the intelligent service system this time; if so, step 613 is executed, and if not, step 615 is executed.
The conversation system is provided with a chat corpus collection switch so as to be opened or closed according to project requirements, and when the chat corpus collection switch is opened, the chat corpus collection can be started.
When chat corpus collection has been started and a session request is received, it is determined whether the user initiating the session request is matched with the intelligent service system this time, that is, whether the user converses with the robot. If so, the user is taken as a target user; if not, a session flow between the user and other users is executed.
Specifically, the process of determining whether the user has a conversation with the robot at this time may be implemented by a preset standard. Wherein, the preset standard includes but not limited to:
(1) preventing a user from repeatedly having sessions with the intelligent service system within a short time, so as to avoid harassing the user;
(2) selecting target users from the users according to the target corpus information and the user portrait information.
For example, during a period of time, a user will only have one session with the intelligent service system;
for another example, if it is desired to collect chat corpuses about female users, male users are not selected as target users, but only female users are selected as target users.
In step 613, the user is taken as a target user, and a user identifier corresponding to the target user is obtained.
In step 615, a session flow with other users in the session system is performed.
Through the process, the target user selection suitable for the collection of the chat linguistic data is realized, and great convenience is further provided for the accurate collection of the chat linguistic data carried out subsequently.
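The two preset standards described above, together with the collection switch, can be sketched as a single selector. The cooldown length, the class name, and the representation of the target corpus requirement as a dictionary of required portrait values are assumptions for illustration.

```python
COOLDOWN_SECONDS = 24 * 60 * 60  # hypothetical length of the "period of time"


class TargetUserSelector:
    """Decides whether a requesting user talks to the intelligent service system."""

    def __init__(self, collection_enabled, target_criteria, cooldown=COOLDOWN_SECONDS):
        self.collection_enabled = collection_enabled  # chat corpus collection switch
        self.target_criteria = target_criteria  # e.g. {"gender": "female"}
        self.cooldown = cooldown
        self.last_session_at = {}  # user_id -> time of last session with the service

    def select(self, user_id, portrait, now):
        if not self.collection_enabled:
            return False
        # Standard (1): at most one session per user within the cooldown window.
        last = self.last_session_at.get(user_id)
        if last is not None and now - last < self.cooldown:
            return False
        # Standard (2): the user portrait must match the target corpus requirement.
        if any(portrait.get(k) != v for k, v in self.target_criteria.items()):
            return False
        self.last_session_at[user_id] = now
        return True
```

A `False` result corresponds to step 615, in which the user proceeds to a session with other users.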
The method for collecting the chat corpus according to an exemplary embodiment may further include the following steps.
The session system updates the friend relationship corresponding to the intelligent service system and the friend relationship corresponding to the target user respectively, and establishes the friend relationship between the intelligent service system and the target user through these updates.
The session system stores the friend relationship corresponding to the intelligent service system and the friend relationship corresponding to the target user. In an exemplary embodiment, the friend relationship corresponding to the intelligent service system may be in the form of a list, that is, a friend list corresponding to the intelligent service system; correspondingly, the friend relationship corresponding to the target user may also be in the form of a list, that is, a friend list corresponding to the target user.
Updating the friend relationship corresponding to the intelligent service system means adding the user identifier of the target user to the friend list of the intelligent service system; the friend relationship corresponding to the target user is updated in a similar way.
Therefore, the friend state between the intelligent service system and the target user is established in the session system, and the session between the intelligent service system and the target user in the session system can be carried out on the basis.
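A minimal sketch of the two-sided friend list update follows. The class and method names are hypothetical, and the friend lists are held in memory purely for illustration.

```python
from collections import defaultdict


class FriendRelations:
    """Friend lists kept by the session system, one per participant."""

    def __init__(self):
        self.friend_lists = defaultdict(set)

    def establish(self, service_id, target_user_id):
        """Update both friend lists so that each side lists the other."""
        self.friend_lists[service_id].add(target_user_id)
        self.friend_lists[target_user_id].add(service_id)

    def are_friends(self, a, b):
        """True only when the relationship is recorded on both sides."""
        return b in self.friend_lists.get(a, set()) and a in self.friend_lists.get(b, set())
```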
Fig. 9 is a flow diagram illustrating a method for collecting chat corpuses according to an example embodiment. The method for collecting chat corpus is applied to the user terminal 150 in the embodiment shown in fig. 1. As shown in fig. 9, the method for collecting the chat corpus may be performed by the user terminal 150 and may include the following steps.
In step 710, a session request is initiated to the session system by the anonymous session plugin that it invokes.
Wherein, in an exemplary embodiment, the session system referred to is an anonymous session system. The user terminal 150 is configured with an anonymous session plug-in. The anonymous session plug-in is used for enabling the user terminal to access the anonymous session system so as to realize the session request initiation and subsequent sessions of the user in the anonymous session system.
Specifically, the anonymous session page can be accessed by calling the anonymous session plug-in, and a session request can be initiated to the anonymous session system in the anonymous session page through triggering of a certain button.
In step 730, the session initiation content, which the session system pushes through the friend relationship established between the user and the intelligent service system, is received.
When the user is selected as the target user for collecting the chat corpus, the session initiation content which is returned by the session system and dynamically generated by the intelligent service system is received.
In step 750, a session between the user and the intelligent service system is initiated through the session initiation content, and in the session, the user reply information is returned to the intelligent service system through the session system according to the session initiation content or the personalized reply information of the intelligent service system.
In the anonymous session page entered by invoking the anonymous session plug-in, the user can reply to the session initiation content, or to the personalized reply information that the intelligent service system returns in response to the target user's reply, so as to return the corresponding user reply information.
Through the process, the chat corpus collection based on the user terminal is realized, and the process is not different from a real conversation process, namely, the chat corpus is a real chat scene corresponding to a target user, so that the authenticity of the chat corpus is ensured.
A method for collecting chat corpus according to an exemplary embodiment may further include the following steps.
And displaying the session initiating content or the personalized reply information through the anonymous session plug-in called by the user.
When session initiation content or personalized reply information returned by the session system is received, the anonymous session plug-in invoked by the user terminal displays it on the anonymous session page that the plug-in has entered, so that the user can conveniently view it and reply.
FIG. 10 is a depiction of the details of step 750, shown in accordance with an exemplary embodiment. As shown in fig. 10, the step 750 may include the following steps.
In step 751, in a session between the user and the intelligent service system in the session system that is initiated through the session initiation content, user reply information corresponding to the session initiation content or the personalized reply content of the intelligent service system is acquired through the invoked anonymous session plug-in.
The user reply information entered by the user can be obtained through the anonymous session page of the anonymous session plug-in. It corresponds to the session initiation content or the personalized reply content of the intelligent service system, and may be text information, voice information, or even picture information.
In step 753, a user reply message is returned to the intelligent service system via the session system.
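The terminal-side steps 710 to 753 can be sketched as follows. Both classes, their method names, and the `read_input` callback that stands in for the user typing a reply are assumptions for illustration, not part of the disclosure.

```python
class InMemorySessionSystem:
    """Stand-in for the session system; the interface is hypothetical."""

    def __init__(self, opening):
        self.opening = opening
        self.forwarded = []  # replies relayed to the intelligent service system

    def request_session(self, user_id):
        return self.opening

    def forward_reply(self, user_id, reply):
        self.forwarded.append((user_id, reply))


class AnonymousSessionPlugin:
    """Terminal-side plug-in covering steps 710, 730, 751 and 753."""

    def __init__(self, user_id, session_system, read_input):
        self.user_id = user_id
        self.session_system = session_system
        self.read_input = read_input  # callback that yields the user's reply
        self.displayed = []

    def start(self):
        # Steps 710 and 730: request a session and receive the pushed content.
        content = self.session_system.request_session(self.user_id)
        self.display(content)
        return content

    def display(self, message):
        # Show session initiation content or a personalized reply to the user.
        self.displayed.append(message)

    def reply(self):
        # Step 751: acquire the user's reply to the most recently shown message.
        text = self.read_input(self.displayed[-1])
        # Step 753: return the reply to the intelligent service system.
        self.session_system.forward_reply(self.user_id, text)
        return text
```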
The chat corpus collection method is described by taking a user terminal as an intelligent mobile phone and a conversation system as an anonymous conversation system as an example and combining a specific application scene. The anonymous session plug-in configured in the smart phone is a drift bottle plug-in corresponding to the anonymous session system.
FIG. 11 illustrates a chat corpus collection framework implemented by a user through a drift bottle plug-in, according to an example embodiment.
Under this framework, a bottle pool 810 is implemented by the anonymous session system, which stores various session initiation content, which may be from other users or from the intelligent service system 820.
A user initiates a conversation request by calling a bottle picking process initiated by the drift bottle plug-in; the intelligent service system 820 also sends the session initiation content dynamically generated for the user to the bottle pool 810 of the anonymous session system by invoking a bottle dropping process initiated by the self-configured drift bottle plugin, and further the session initiation content is sent to the user by the bottle pool 810.
Thereby enabling a session between the intelligent service system 820 and the user.
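The bottle pool can be pictured with the following sketch, in which ordinary users drop bottles into a shared queue while the intelligent service system drops a bottle addressed to one user. The `recipient_id` parameter and the rule that a user never picks up their own bottle are assumptions made for the sketch.

```python
from collections import deque


class BottlePool:
    """Holds session initiation content from users and from the intelligent service system."""

    def __init__(self):
        self.general = deque()  # bottles dropped by ordinary users
        self.for_user = {}  # user_id -> bottle generated for that user

    def drop(self, sender_id, content, recipient_id=None):
        """Bottle dropping; a recipient is given only for dynamically generated content."""
        bottle = {"sender": sender_id, "content": content}
        if recipient_id is None:
            self.general.append(bottle)
        else:
            self.for_user[recipient_id] = bottle

    def pick(self, user_id):
        """Bottle picking: prefer a bottle addressed to this user, else another user's bottle."""
        if user_id in self.for_user:
            return self.for_user.pop(user_id)
        for _ in range(len(self.general)):
            bottle = self.general.popleft()
            if bottle["sender"] != user_id:
                return bottle
            self.general.append(bottle)  # put the user's own bottle back
        return None
```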
In a specific implementation, the intelligent service system and the session realized by the user in the anonymous session system through the drift bottle plug-in must follow the original flow of the anonymous session system.
Fig. 12 is a session flow of an anonymous session system, shown in accordance with an example embodiment. It should be noted that in the anonymous session system, a quota is set for bottle dropping and for bottle picking respectively, so as to limit the number of times a user can drop and pick up bottles. The drift bottle index 910 is an implementation of the bottle pool 810 shown in fig. 11.
For the process of the user dropping and picking up the bottle as shown in fig. 12, since the chat corpus collection of the present disclosure only relates to the process of the user picking up the bottle, the process of the user picking up the bottle is explained herein.
Please refer to the execution process of S920 to S930 shown in fig. 12. When the user invokes the drift bottle plug-in to enter the anonymous session page, namely the drift bottle page, the user can initiate the bottle dropping and bottle picking processes on the drift bottle page.
While the drift bottle page is being entered, the user attributes are checked to obtain attribute information such as the user's city and gender and the number of bottles the user has picked up within a limited time range, and a quota check is performed to obtain the bottle-picking quota currently available to the user; that is, the processes of S920 and S930 are executed.
When a quota of bottle picking up exists, a user can initiate a bottle picking up process on a drift bottle page, namely a conversation request is initiated to the anonymous conversation system, and at the moment, as the user is selected as a target user for chat corpus collection by the anonymous conversation system, the friend relationship of the user in the anonymous conversation system is updated, and conversation initiating content dynamically generated by the intelligent service system is obtained. Correspondingly, attributes such as quota are also updated, i.e. the process of S960 is executed.
FIG. 13 illustrates the operation of the anonymous session system in accordance with an exemplary embodiment. As shown in fig. 13, when the user initiates the process of picking up the bottle, the anonymous session system will obtain the user attributes and check the quota of this user, i.e., execute S1010 and S1020.
When the user attribute is obtained and the user is confirmed to have a quota for picking up the bottle, it is determined whether chat corpus collection is required at present, that is, S1030 is performed.
When it is determined that the chat corpus collection is required, it is further determined whether the user has picked up the bottle of the intelligent service system this time, i.e., S1040 is performed.
If the user does not pick up the bottle of the intelligent service system at this time, a normal bottle picking process is executed, namely, bottles of other users are obtained from the drift bottle index, the session initiating content in the bottles of the other users is obtained by updating the friend relationship, and attributes such as quota are updated correspondingly, namely, the processes from S1050 to S1070 are directly executed.
If the user is judged to pick up the bottle of the intelligent service system this time, the session initiation content is obtained from the intelligent service system, and the step S1080 is executed.
At this time, the session initiation content generated by the intelligent service system is also sent to the user through the update of the friend relationship, and the collection of the chat linguistic data can be realized through the continuous reply of the user and the intelligent service system.
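The branching from S1010 to S1080 can be condensed into one function. The function signature, the dictionary-based user record, and the callbacks `picks_robot_bottle` and `robot_content` are hypothetical; the step labels in the comments follow the description of fig. 13.

```python
def pick_bottle(user, collecting, picks_robot_bottle, robot_content, other_bottles, friends):
    """Handle one bottle-picking request; returns the bottle or None."""
    # S1010 and S1020: read the user attributes and check the picking quota.
    if user["pick_quota"] <= 0:
        return None
    # S1030 and S1040: is collection on, and does this pick go to the service?
    if collecting and picks_robot_bottle(user):
        # S1080: obtain session initiation content from the intelligent service system.
        sender, content = "intelligent_service", robot_content(user)
    else:
        # Normal flow (S1050 to S1070): take another user's bottle from the index.
        if not other_bottles:
            return None
        sender, content = other_bottles.pop(0)
    # The content reaches the user through an update of the friend relationship.
    friends.setdefault(user["id"], set()).add(sender)
    friends.setdefault(sender, set()).add(user["id"])
    # Attributes such as the quota are updated accordingly.
    user["pick_quota"] -= 1
    return {"from": sender, "content": content}
```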
Fig. 14 is a flow diagram illustrating a method for chat corpus collection in accordance with an exemplary embodiment. The method for collecting chat corpuses can be used in the conversation system 130 of the embodiment shown in fig. 1. As shown in fig. 14, the method for collecting chat corpuses may include the following steps.
In step 1110, a broadcast message generated by the intelligent service system according to the target corpus information is obtained.
The intelligent service system generates target corpus information according to the chat corpus collection currently to be performed, and then generates a broadcast message corresponding to the target corpus information.
The broadcast message is used for broadcasting to all or a large number of users by the intelligent service system in the session system accessed by the intelligent service system so as to initiate wide chat corpus collection.
In step 1130, the broadcasting message generated by the smart service system is broadcasted to the user through the session system accessed by the smart service system.
In step 1150, a conversation between the smart service system and the user is performed through the broadcast message, and respective reply messages of the smart service system and the user in the conversation are saved to form a chat corpus.
Wherein, the conversation between the intelligent service system and the user is initiated through the broadcast message, and the conversation between the user and the intelligent service system is carried out along with the response of the user to the broadcast message.
As the session proceeds, on the one hand, the intelligent service system first receives the user reply information returned by the user in response to the broadcast message;
in response, the intelligent service system acquires the user portrait information of the user and generates personalized reply information corresponding to the user reply information according to the user portrait information, so that the session between the intelligent service system and the user can continue.
In subsequent ongoing sessions, the intelligent service system will continuously generate personalized reply information for the received user reply information.
On the other hand, the conversation system is used as a transfer channel between the intelligent service system and the user to realize chat corpus collection for the accessed intelligent service system.
FIG. 15 is a depiction of details of step 1150, shown in accordance with an exemplary embodiment. This step 1150, as shown in fig. 15, may include the following steps.
In step 1151, in a session between the smart service system and the user initiated through the broadcast message, user reply information returned by the user is received, where the user reply information corresponds to the broadcast message or personalized reply information of the smart service system.
The personalized reply information is generated by the intelligent service system in the session according to the user portrait information aiming at the user reply information returned by the user and is used for replying the user.
And as the session system sends out the broadcast message and the personalized reply of the intelligent service system in sequence, the session system also receives the user reply information returned by the user in sequence.
In step 1153, the personalized reply information that the intelligent service system generates for the user reply information according to the user portrait is obtained and returned to the user.
In step 1155, the broadcast message, the user reply message, and the personalized reply message are sequentially saved to form a chat corpus of the session between the intelligent service system and the user, and the chat corpus is stored.
Through the process, targeted collection of the chat linguistic data can be realized in a wide range of users, and therefore the efficiency of collection of the chat linguistic data is improved to the maximum extent.
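The broadcast variant can be sketched as below, with one ordered log per user serving as that user's chat corpus. The class name, the `generate_reply` callback, and the in-memory `corpora` dictionary are illustrative assumptions.

```python
class BroadcastSession:
    """Session system side of the broadcast flow (steps 1130 and 1151 to 1155)."""

    def __init__(self, broadcast_message, generate_reply):
        self.broadcast_message = broadcast_message
        self.generate_reply = generate_reply  # (user_reply, portrait) -> reply text
        self.corpora = {}  # user_id -> ordered list of (speaker, text)

    def broadcast(self, user_ids):
        # Step 1130: send the broadcast message to every listed user.
        for user_id in user_ids:
            self.corpora[user_id] = [("service", self.broadcast_message)]

    def on_user_reply(self, user_id, text, portrait):
        log = self.corpora[user_id]
        # Step 1151: receive the user reply information.
        log.append(("user", text))
        # Step 1153: obtain the personalized reply generated from the user portrait.
        reply = self.generate_reply(text, portrait)
        # Step 1155: the log keeps broadcast, user reply and personalized reply in order.
        log.append(("service", reply))
        return reply
```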
The following are apparatus embodiments of the present disclosure, which may be used to execute, for example, the embodiments of the chat corpus collection method performed by the intelligent service system 110 according to the present disclosure. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the chat corpus collection method of the present disclosure.
Fig. 16 is a block diagram illustrating a chat corpus collection apparatus according to an example embodiment. The apparatus for collecting chat corpus can be used in the intelligent service system 110 of the implementation environment shown in fig. 1 to perform all the steps of the method for collecting chat corpus shown in fig. 3. As shown in fig. 16, the apparatus for collecting chat corpus includes, but is not limited to: a target user identification obtaining module 1210, a content generating module 1230, a session initiating content pushing module 1250, and a corpus processing module 1270.
A target user identifier obtaining module 1210, configured to obtain a user identifier corresponding to a target user.
And a content generating module 1230, configured to dynamically generate session initiation content according to the user portrait information and the target corpus information corresponding to the user identifier.
The session initiation content pushing module 1250 is configured to push the session initiation content to the session system accessed by the intelligent service system itself.
And the corpus processing module 1270 is configured to perform a session related to the session initiation content between the intelligent service system itself and the target user through the session system, and store respective reply information of the intelligent service system and the target user in the session to form a chat corpus.
Optionally, the target user identifier obtaining module 1210 is further configured to receive a user identifier returned by the session system to the intelligent service system through target user selection, where the user identifier corresponds to the target user.
FIG. 17 is a depiction of details of the content generation module 1230 shown in accordance with an exemplary embodiment. As shown in fig. 17, the content generation module 1230 includes, but is not limited to: a portrait extraction unit 1231 and a content generation executing unit 1233.
A portrait extraction unit 1231, configured to extract user portrait information based on the user identification.
And a content generation executing unit 1233, configured to generate the session initiation content related to the target corpus information and matching with the user profile.
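How the two units might cooperate is sketched below. The portrait store, the template table keyed by target corpus type, and the `city` field are all hypothetical; the disclosure states only that the content relates to the target corpus information and matches the user portrait.

```python
# Hypothetical portrait store keyed by user identifier.
PORTRAITS = {"u1": {"gender": "female", "city": "Shenzhen"}}

# Hypothetical opening templates, one per target corpus type.
OPENING_TEMPLATES = {
    "travel": "Where would you go for a weekend trip from {city}?",
    "food": "What is the one dish from {city} everyone should try?",
}


def extract_portrait(user_id):
    """Role of the portrait extraction unit: look up the portrait by user identifier."""
    return PORTRAITS.get(user_id, {})


def generate_session_initiation(user_id, target_corpus_type):
    """Role of the content generation executing unit: fill the template from the portrait."""
    portrait = extract_portrait(user_id)
    template = OPENING_TEMPLATES[target_corpus_type]
    return template.format(city=portrait.get("city", "your city"))
```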
FIG. 18 is a depiction of details of the corpus processing module 1270 shown in accordance with an exemplary embodiment. As shown in fig. 18, the corpus processing module 1270 includes but is not limited to: a user reply information receiving unit 1271, a personalized reply generating unit 1273 and a corpus storing unit 1275.
The user reply information receiving unit 1271 is configured to receive, through the session system, user reply information returned by the target user according to the session initiation content or the personalized reply information generated by the intelligent service system, in a session between the intelligent service system itself and the target user initiated by the session initiation content.
And the personalized reply generation unit 1273 is used for generating personalized reply information corresponding to the user reply information and returning the personalized reply information to the target user through the session system.
And the corpus storage unit 1275 is used for sequentially storing the user reply information and the personalized reply information to form a chat corpus between the intelligent service system and the target user and storing the chat corpus.
Fig. 19 is a block diagram illustrating a chat corpus collection apparatus according to an example embodiment. The chat corpus collecting device includes but is not limited to: a target user selecting module 1310, a content obtaining module 1330, a content pushing module 1350 and a session initiating module 1370.
The target user selecting module 1310 is configured to select a target user from a user initiating the session request, so as to obtain a user identifier corresponding to the target user.
The content obtaining module 1330 is configured to obtain session initiation content generated by the intelligent service system for the target user.
The content push module 1350 is configured to push the session initiation content through the friend relationship established between the intelligent service system and the target user.
A session initiating module 1370, configured to initiate a session between the intelligent service system and the target user in the session system by pushing the session initiation content.
Optionally, as shown in fig. 20, the target user selecting module 1310 includes a session determining unit 1311 and an identifier obtaining unit 1313.
The session determining unit 1311 is configured to determine, when starting to perform chat corpus collection, whether the user initiating the session request performs a session with the intelligent service system this time, if so, notify the identifier obtaining unit 1313, and if not, perform a session procedure with another user in the session system.
An identifier obtaining unit 1313, configured to take the user as a target user, and obtain a user identifier corresponding to the target user.
According to another exemplary embodiment, a chat corpus collecting device is shown, which further includes a relationship updating module.
And the relationship updating module is used for respectively updating the friend relationship corresponding to the intelligent service system and the friend relationship corresponding to the target user, and establishing the friend relationship between the intelligent service system and the target user through the updating of the friend relationship.
Fig. 21 illustrates a chat corpus collecting apparatus applied to a session between a user and an intelligent service system according to an exemplary embodiment, where the chat corpus collecting apparatus includes, but is not limited to: a request initiation module 1410, a session initiation content receiving module 1430, and a chat reply module 1450.
A request initiating module 1410 configured to initiate a session request to the session system through the called anonymous session plugin.
A session initiation content receiving module 1430, configured to receive session initiation content pushed by the session system by establishing a friend relationship between the user and the intelligent service system.
The chat reply module 1450 is configured to initiate a session between the user and the intelligent service system in the session system through the session initiation content, and return, in the session, the user reply information to the intelligent service system through the session system according to the session initiation content or the personalized reply information of the intelligent service system.
Optionally, the apparatus for collecting chat corpus further includes a display module. The display module is used for displaying the session initiation content or the personalized reply information through the called anonymous session plug-in.
FIG. 22 is a depiction of details of a chat reply module, shown in accordance with an exemplary embodiment. The chat reply module 1450 includes, but is not limited to: a user reply acquisition unit 1451 and an information return unit 1453.
A user reply obtaining unit 1451, configured to obtain, through the invoked anonymous session plug-in, user reply information corresponding to the session initiation content or the personalized reply content of the intelligent service system, in a session between the user and the intelligent service system in the session system that is initiated through the session initiation content.
An information returning unit 1453, configured to return the user reply information to the intelligent service system through the session system.
Fig. 23 is a block diagram illustrating a chat corpus collection apparatus according to an example embodiment. As shown in fig. 23, the chat corpus collecting device includes but is not limited to: a broadcast acquisition module 1510, a broadcast module 1530, and a session processing module 1550.
The broadcast obtaining module 1510 is configured to obtain a broadcast message generated by the intelligent service system according to the target corpus information.
And a broadcasting module 1530 for broadcasting the broadcasting message generated by the intelligent service system to the user through the session system accessed by the intelligent service system.
And the session processing module 1550 is configured to perform a session between the intelligent service system and the user through the broadcast message, where respective reply messages of the intelligent service system and the user in the session are stored to form a chat corpus.
FIG. 24 is a diagram illustrating details of a session handling module in accordance with an illustrative embodiment. As shown in fig. 24, the session processing module 1550 includes but is not limited to: a user reply receiving unit 1551, a personalized reply acquiring unit 1553 and a chat corpus storage unit 1555.
The user reply receiving unit 1551 is configured to receive user reply information returned by the user in a session between the intelligent service system and the user initiated through the broadcast message, where the user reply information corresponds to the broadcast message or the personalized reply information of the intelligent service system.
And the personalized reply acquisition unit 1553 is used for acquiring the personalized reply information that the intelligent service system generates for the user reply information according to the user portrait and returning the personalized reply information to the user.
And the chat corpus storage unit 1555 is used for storing the broadcast message, the user reply message and the personalized reply message in sequence to form a chat corpus of a conversation between the intelligent service system and the user and storing the chat corpus.
Optionally, the present disclosure further provides a chat corpus collecting apparatus, which may be used in the intelligent service system 110 in the implementation environment shown in fig. 1 to execute all or part of the steps of the chat corpus collecting methods shown in fig. 3, fig. 4, fig. 5, fig. 7, fig. 8, fig. 9, fig. 10, fig. 14, and fig. 15. The device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform:
the intelligent service system acquires a user identifier corresponding to a target user;
dynamically generating session initiation content according to user portrait information and target corpus information corresponding to the user identification;
pushing session initiation content to a session system accessed by the intelligent service system, and initiating a session between the intelligent service system and a target user through the session initiation content;
the intelligent service system carries out conversation between the intelligent service system and the target user through the conversation system, and respective reply information of the intelligent service system and the target user in the conversation is stored to form a chat corpus.
The specific manner in which the processor of the apparatus in this embodiment performs operations has been described in detail in the embodiments of the chat corpus collection method, and will not be elaborated upon here.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (14)

1. A chat corpus collection method is applied to a session between an intelligent service system and a target user, and comprises the following steps:
the intelligent service system acquires a user identifier corresponding to a target user;
dynamically generating session initiation content according to user portrait information and target corpus information corresponding to the user identification, wherein the target corpus information is a target corpus type preset by the intelligent service system according to the required chat corpus content;
pushing the session initiation content to a session system accessed by the intelligent service system, wherein the session system is an anonymous session system;
the intelligent service system carries out anonymous conversation related to the conversation initiating content between the intelligent service system and a target user through the conversation system, and respective reply information of the intelligent service system and the target user in the conversation is stored to form a chat corpus;
in the session that is initiated by the session initiation content between the intelligent service system and the target user and is related to the session initiation content, the intelligent service system receives, through the session system, user reply information returned by the target user in response to the session initiation content or to personalized reply information generated by the intelligent service system;
the intelligent service system generates personalized reply information corresponding to the user reply information and returns the personalized reply information to the target user through the session system;
and the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the target user, and the chat corpus is stored.
2. The method according to claim 1, wherein the step of the intelligent service system obtaining the user identifier corresponding to the target user comprises:
the intelligent service system receives a user identifier returned to the intelligent service system by the session system through target user selection, wherein the user identifier corresponds to the target user.
3. The method according to claim 1, wherein the step of dynamically generating the session initiation content according to the user portrait information and the target corpus information corresponding to the user identifier comprises:
extracting user portrait information according to the user identification;
and generating session initiation content which is related to the target corpus information and is matched with the user portrait information.
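For illustration, the two steps of extracting the user portrait information and generating matching session initiation content may be sketched in Python as follows. The portrait store `PORTRAITS`, the template table `TEMPLATES`, the portrait fields `city` and `interest`, and the corpus types `sports` and `weather` are all hypothetical examples. Template filling is merely one simple way to obtain content that relates to a target corpus type and matches a portrait, and is not asserted to be the technique used by the intelligent service system.

```python
from typing import Dict

# Hypothetical portrait store keyed by user identifier.
PORTRAITS: Dict[str, Dict[str, str]] = {
    "u42": {"city": "Shenzhen", "interest": "basketball"},
}

# Hypothetical opening templates, one per preset target corpus type.
TEMPLATES: Dict[str, str] = {
    "sports": "Did you catch any {interest} games in {city} lately?",
    "weather": "How is the weather in {city} today?",
}

# Fallback values used when a portrait field is unknown.
DEFAULTS: Dict[str, str] = {"city": "your city", "interest": "sports"}


def extract_portrait(user_id: str) -> Dict[str, str]:
    """Look up user portrait information by user identifier."""
    return PORTRAITS.get(user_id, {})


def generate_session_initiation(user_id: str, target_corpus_type: str) -> str:
    """Fill the template of the target corpus type with portrait fields."""
    portrait = extract_portrait(user_id)
    template = TEMPLATES[target_corpus_type]
    return template.format(**{**DEFAULTS, **portrait})
```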
4. A method for collecting chat corpuses is applied to a session system of an intelligent service system participating in a session, and is characterized in that the method comprises the following steps:
the conversation system selects a target user for a user initiating a conversation request to obtain a user identifier corresponding to the target user, wherein the conversation system is an anonymous conversation system;
acquiring session initiating content generated for the target user by the intelligent service system, wherein the session initiating content is dynamically generated by the intelligent service system according to user portrait information and target corpus information corresponding to the user identifier, and the target corpus information is a target corpus type preset by the intelligent service system according to the required chat corpus content;
pushing the session initiation content through a friend relationship established between the intelligent service system and the target user;
initiating an anonymous session between an intelligent service system and a target user in the session system by pushing the session initiation content;
in the session that is initiated by the session initiation content between the intelligent service system and the target user and is related to the session initiation content, the session system receives user reply information returned by the target user in response to the session initiation content or to personalized reply information generated by the intelligent service system;
the session system receives the personalized reply information corresponding to the user reply information generated by the intelligent service system and returns the personalized reply information to the target user;
and the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the target user, and the chat corpus is stored.
5. The method according to claim 4, wherein the step of the session system selecting the target user for the user initiating the session request to obtain the user identifier corresponding to the target user comprises:
when chat corpus collection starts, judging whether the user initiating a session request is to have a session with the intelligent service system, and if so, taking the user as a target user to obtain a user identifier corresponding to the target user.
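The target user selection may be sketched in Python as follows. The function name `select_target_user`, the `collecting` flag and the `should_route_to_bot` predicate are hypothetical. The criterion by which the session system decides that a requesting user is to talk with the intelligent service system is left to the caller here, since the text above does not fix it.

```python
from typing import Callable, Optional


def select_target_user(
    requesting_user_id: str,
    collecting: bool,
    should_route_to_bot: Callable[[str], bool],
) -> Optional[str]:
    """Return the user identifier of the target user, or None if not selected."""
    # Selection only happens once chat corpus collection has started.
    if not collecting:
        return None
    # Judge whether this requesting user is to have a session with the
    # intelligent service system (policy supplied by the caller).
    if should_route_to_bot(requesting_user_id):
        # The user becomes the target user; its identifier is returned.
        return requesting_user_id
    return None
```

For example, a caller might pass a predicate that selects users who are not currently matched with another user; that policy is an assumption of the caller and not part of the sketch.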
6. The method of claim 4, wherein before the step of pushing the session initiation content through the established friend relationship between the intelligent service system and the target user, the method further comprises:
and the session system respectively updates the friend relationship corresponding to the intelligent service system and the friend relationship corresponding to the target user, and establishes the friend relationship between the intelligent service system and the target user through the update of the friend relationship.
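A minimal Python sketch of this friend relationship update is given below. The class `FriendRelations`, its methods and the `outbox` list are hypothetical. The sketch assumes that each party has its own friend set, that both sets are updated, and that pushing is refused until the relationship exists on both sides.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class FriendRelations:
    """Hypothetical friend relationship store of the session system."""

    def __init__(self) -> None:
        self._friends: DefaultDict[str, Set[str]] = defaultdict(set)

    def establish(self, bot_id: str, user_id: str) -> None:
        # Update the friend relationship of the intelligent service system ...
        self._friends[bot_id].add(user_id)
        # ... and the friend relationship of the target user.
        self._friends[user_id].add(bot_id)

    def are_friends(self, a: str, b: str) -> bool:
        return b in self._friends.get(a, set()) and a in self._friends.get(b, set())

    def push(self, bot_id: str, user_id: str, content: str,
             outbox: List[Tuple[str, str]]) -> None:
        # Session initiation content is pushed only over an established relationship.
        if not self.are_friends(bot_id, user_id):
            raise PermissionError("no friend relationship between the two parties")
        outbox.append((user_id, content))
```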
7. A method for collecting chat corpuses, the method comprising:
acquiring a broadcast message generated by an intelligent service system according to target corpus information, wherein the target corpus information is a target corpus type preset by the intelligent service system according to the required chat corpus content;
broadcasting a broadcast message generated by the intelligent service system to a user through a session system accessed by the intelligent service system, wherein the session system is an anonymous session system;
anonymous conversation between the intelligent service system and the user is carried out through the broadcast message, and respective reply information of the intelligent service system and the user in the conversation is stored to form a chat corpus;
receiving user reply information returned by a user in a session between the intelligent service system and the user initiated by the broadcast message, wherein the user reply information corresponds to the broadcast message or personalized reply information of the intelligent service system;
acquiring personalized reply information corresponding to the user reply information generated by the intelligent service system according to the user portrait, and returning the personalized reply information to the user;
and the broadcast message, the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the user, and the chat corpus is stored.
8. A chat corpus collection device applied to an intelligent service system, characterized in that the device comprises:
the target user identification acquisition module is used for acquiring a user identification corresponding to a target user;
the content generation module is used for dynamically generating session initiation content according to the user portrait information and the target corpus information corresponding to the user identification, wherein the target corpus information is preset by the intelligent service system according to the required chat corpus content;
the session initiation content pushing module is used for pushing the session initiation content to a session system which is accessed by the intelligent service system, and the session system is an anonymous session system;
the corpus processing module is used for carrying out anonymous conversation related to the conversation initiating content between the intelligent service system and the target user through the conversation system and storing respective reply information of the intelligent service system and the target user in the conversation to form a chat corpus;
in the session that is initiated by the session initiation content between the intelligent service system and the target user and is related to the session initiation content, the intelligent service system receives, through the session system, user reply information returned by the target user in response to the session initiation content or to personalized reply information generated by the intelligent service system;
the intelligent service system generates personalized reply information corresponding to the user reply information and returns the personalized reply information to the target user through the session system;
and the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the target user, and the chat corpus is stored.
9. The apparatus of claim 8, wherein the target user identifier obtaining module is further configured to receive a user identifier returned by the session system to the intelligent service system by making a target user selection, and wherein the user identifier corresponds to the target user.
10. The apparatus of claim 8, wherein the content generation module comprises:
a portrait extraction unit to extract user portrait information based on the user identification;
and the content generation execution unit is used for generating the session initiation content which is related to the target corpus information and is matched with the user portrait.
11. A chat corpus collection device applied to a session system of an intelligent service system participating in a session is characterized in that the device comprises:
the target user selection module is used for selecting a target user for a user initiating a session request to obtain a user identifier corresponding to the target user, and the session system is an anonymous session system;
a content obtaining module, configured to obtain session initiation content generated by the intelligent service system for the target user, where the session initiation content is dynamically generated by the intelligent service system according to user portrait information and target corpus information corresponding to the user identifier, and the target corpus information is a target corpus type preset by the intelligent service system according to the required chat corpus content;
the content pushing module is used for pushing the session initiation content through a friend relationship established between the intelligent service system and a target user;
the session initiation module is used for initiating an anonymous session between an intelligent service system and a target user in the session system through the pushing of the session initiation content;
in the session that is initiated by the session initiation content between the intelligent service system and the target user and is related to the session initiation content, the session system receives user reply information returned by the target user in response to the session initiation content or to personalized reply information generated by the intelligent service system;
the session system receives the personalized reply information corresponding to the user reply information generated by the intelligent service system and returns the personalized reply information to the target user;
and the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the target user, and the chat corpus is stored.
12. The apparatus of claim 11, wherein the target user selection module comprises:
the session judging unit is used for judging, when chat corpus collection starts, whether the user initiating a session request is to have a session with the intelligent service system, and if so, notifying the identification obtaining unit;
the identification obtaining unit is used for taking the user as a target user and obtaining a user identification corresponding to the target user.
13. The apparatus of claim 11, further comprising:
and the relationship updating module is used for respectively updating the friend relationship corresponding to the intelligent service system and the friend relationship corresponding to the target user, and establishing the friend relationship between the intelligent service system and the target user through the updating of the friend relationship.
14. A chat corpus collection apparatus, comprising:
the broadcast acquisition module is used for acquiring a broadcast message generated by the intelligent service system according to target corpus information, wherein the target corpus information is a target corpus type preset by the intelligent service system according to the content of the required chat corpus;
the broadcasting module is used for broadcasting the broadcast message generated by the intelligent service system to a user through a session system accessed by the intelligent service system, wherein the session system is an anonymous session system;
the session processing module is used for carrying out anonymous session between the intelligent service system and the user through the broadcast message, and respective reply information of the intelligent service system and the user in the session is stored to form a chat corpus;
receiving user reply information returned by a user in a session between the intelligent service system and the user initiated by the broadcast message, wherein the user reply information corresponds to the broadcast message or personalized reply information of the intelligent service system;
acquiring personalized reply information corresponding to the user reply information generated by the intelligent service system according to the user portrait, and returning the personalized reply information to the user;
and the broadcast message, the user reply information and the personalized reply information are sequentially stored to form a chat corpus of the conversation between the intelligent service system and the user, and the chat corpus is stored.
CN201610556147.8A 2016-07-14 2016-07-14 Chat corpus collection method and device Active CN107623621B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201610556147.8A CN107623621B (en) 2016-07-14 2016-07-14 Chat corpus collection method and device
EP17826971.8A EP3487128B1 (en) 2016-07-14 2017-07-11 Method of generating random interactive data, network server, and smart conversation system
PCT/CN2017/092485 WO2018010635A1 (en) 2016-07-14 2017-07-11 Method of generating random interactive data, network server, and smart conversation system
US16/201,415 US11294962B2 (en) 2016-07-14 2018-11-27 Method for processing random interaction data, network server and intelligent dialog system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610556147.8A CN107623621B (en) 2016-07-14 2016-07-14 Chat corpus collection method and device

Publications (2)

Publication Number Publication Date
CN107623621A CN107623621A (en) 2018-01-23
CN107623621B true CN107623621B (en) 2020-08-07

Family

ID=61087705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610556147.8A Active CN107623621B (en) 2016-07-14 2016-07-14 Chat corpus collection method and device

Country Status (1)

Country Link
CN (1) CN107623621B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321728A (en) * 2018-03-31 2019-10-11 汇银宝网络技术股份有限公司 A kind of user self-help consultation method
CN109408800B (en) * 2018-08-23 2024-03-01 阿里巴巴(中国)有限公司 Dialogue robot system and related skill configuration method
CN109492079A (en) * 2018-10-09 2019-03-19 北京奔影网络科技有限公司 Intension recognizing method and device
CN109829039B (en) * 2018-12-13 2023-06-09 平安科技(深圳)有限公司 Intelligent chat method, intelligent chat device, computer equipment and storage medium
CN112445906A (en) * 2019-08-28 2021-03-05 北京搜狗科技发展有限公司 Method and device for generating reply message
CN111027976B (en) * 2019-11-13 2022-06-14 支付宝(杭州)信息技术有限公司 Method for obtaining transaction identity information of fraudulent party
CN112782982A (en) * 2020-12-31 2021-05-11 海南大学 Intent-driven essential computation-oriented programmable intelligent control method and system
CN112685552A (en) * 2021-02-26 2021-04-20 深圳追一科技有限公司 Information pushing method, pushing robot, computer device and storage medium
CN113318438B (en) * 2021-06-30 2023-08-15 北京字跳网络技术有限公司 Virtual prop control method, device, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1735027A (en) * 2004-08-13 2006-02-15 上海赢思软件技术有限公司 Chat robot system
CN101588323A (en) * 2009-06-11 2009-11-25 腾讯科技(深圳)有限公司 Method and system for publishing message actively in IM group by using chat robots
CN103390047A (en) * 2013-07-18 2013-11-13 天格科技(杭州)有限公司 Chatting robot knowledge base and construction method thereof
CN105068661A (en) * 2015-09-07 2015-11-18 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8630961B2 (en) * 2009-01-08 2014-01-14 Mycybertwin Group Pty Ltd Chatbots

Also Published As

Publication number Publication date
CN107623621A (en) 2018-01-23

Similar Documents

Publication Publication Date Title
CN107623621B (en) Chat corpus collection method and device
CN107278302B (en) Robot interaction method and interaction robot
CN110263197B (en) Image searching method, device, computer equipment and storage medium
TWI519979B (en) Information recommendation method and device thereof and information resource recommendation system
US11488599B2 (en) Session message processing with generating responses based on node relationships within knowledge graphs
CN110166811B (en) Bullet screen information processing method, device and equipment
CN108452526B (en) Game fault reason query method and device, storage medium and electronic device
CN111737441B (en) Human-computer interaction method, device and medium based on neural network
CN112035638B (en) Information processing method, device, storage medium and equipment
US9720982B2 (en) Method and apparatus for natural language search for variables
CN113569037A (en) Message processing method and device and readable storage medium
US11294962B2 (en) Method for processing random interaction data, network server and intelligent dialog system
CN110795589A (en) Image searching method and device, computer equipment and storage medium
CN108306813B (en) Session message processing method, server and client
CN105684406A (en) Method and system for providing access to auxiliary information
CN106202222B (en) Method and device for determining hot event
CN113158094B (en) Information sharing method and device and electronic equipment
CN110580342A (en) public number question-answer response method and device
WO2020124444A1 (en) Information processing method and related apparatus
CN111813915A (en) Message interaction method, device, equipment and computer readable storage medium
CN116489444A (en) Video playing method and device, computer equipment and storage medium
CN114138958A (en) Information interaction method, device, equipment and storage medium
CN115878874A (en) Multimodal retrieval method, device and storage medium
CN115225603A (en) User matching method and system in social application
CN117909557A (en) Man-machine interaction method, system, equipment and storage medium based on large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant