CN114579820A - Method and device for extracting context features in man-machine conversation

Method and device for extracting context features in man-machine conversation

Info

Publication number
CN114579820A
CN114579820A (application CN202210279351.5A)
Authority
CN
China
Prior art keywords
user
sequence
behaviors
interactive
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210279351.5A
Other languages
Chinese (zh)
Inventor
徐涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202210279351.5A
Publication of CN114579820A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/903 - Querying
    • G06F 16/9032 - Query formulation
    • G06F 16/90332 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G06F 40/35 - Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide a method and an apparatus for extracting contextual features in a human-machine conversation. In the method, a first interaction behavior sequence of a first user in the human-machine conversation is recorded. The first interaction behavior sequence includes the current interaction behavior of the first user and historical interaction behaviors of the first user. The interaction behaviors in the first interaction behavior sequence include: utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user. Then, a first interaction behavior sub-sequence within a specified time period is selected from the first interaction behavior sequence. The interaction behaviors in the first interaction behavior sub-sequence are concatenated into a concatenated sequence. Contextual features are then extracted from the concatenated sequence.

Description

Method and device for extracting context features in man-machine conversation
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and an apparatus for extracting contextual features in a human-machine conversation.
Background
Contextual features, also referred to as context features, indicate information about an object and the environment in which the object is situated. In human-computer interaction, contextual features are the metadata and intrinsic attributes used for big data analysis and artificial intelligence, and they contain a large number of potentially valuable signals. Contextual features can provide feature input for downstream tasks such as personalized search, dialogue systems, affective computing, and recommendation systems. The stability and accuracy of contextual features often play a critical role in these downstream tasks.
Disclosure of Invention
Embodiments described herein provide a method, an apparatus, and a computer-readable storage medium storing a computer program for extracting contextual features in a human-machine conversation.
According to a first aspect of the present disclosure, a method of extracting contextual features in a human-machine conversation is provided. In the method, a first interaction behavior sequence of a first user in the human-machine conversation is recorded. The first interaction behavior sequence includes the current interaction behavior of the first user and historical interaction behaviors of the first user. The interaction behaviors in the first interaction behavior sequence include: utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user. Then, a first interaction behavior sub-sequence within a specified time period is selected from the first interaction behavior sequence. The interaction behaviors in the first interaction behavior sub-sequence are concatenated into a concatenated sequence. Contextual features are then extracted from the concatenated sequence.
In some embodiments of the present disclosure, extracting contextual features from the concatenated sequence includes: comparing the concatenated sequence with a plurality of preset keyword tags; and in response to the concatenated sequence matching one or more of the plurality of preset keyword tags, determining the matching preset keyword tags as contextual features.
In some embodiments of the present disclosure, extracting contextual features from the concatenated sequence includes: extracting topics in the concatenated sequence using a topic model; and determining the extracted topics as contextual features.
In some embodiments of the present disclosure, extracting topics in the concatenated sequence using the topic model includes: counting the number of words in the concatenated sequence; and in response to the number of words in the concatenated sequence exceeding a threshold number of words, splitting the concatenated sequence into a plurality of sub-sequences and using the topic model to extract a topic for each sub-sequence separately, where the number of words in each sub-sequence does not exceed the threshold number of words.
In some embodiments of the present disclosure, splitting the concatenated sequence into a plurality of sub-sequences includes: taking the first word of the concatenated sequence as a truncation start point and reading forward by the threshold number of words; taking the last sentence-ending punctuation mark among the words read as a truncation stop point; determining the words between the truncation start point and the truncation stop point as a sub-sequence; taking the first word after the truncation stop point as a new truncation start point and repeating the above operations until the number of words after the truncation stop point is less than the threshold number of words; and determining the words after the truncation stop point as a sub-sequence.
In some embodiments of the present disclosure, the method further includes: using the extracted contextual features as tags of the first user.
In some embodiments of the present disclosure, the method further includes: acquiring a plurality of second interaction behavior sequences of a plurality of second users in the human-machine conversation, where each second interaction behavior sequence includes the current interaction behavior of a second user and historical interaction behaviors of that second user, and the interaction behaviors in the second interaction behavior sequences include: utterances of the second user, web pages browsed by the second user, and virtual products purchased, favorited, or followed by the second user; generating a co-occurrence matrix of the first user, the plurality of second users, and their corresponding interaction behaviors; generating, by means of matrix decomposition, latent vectors corresponding to the first user, the plurality of second users, and their corresponding interaction behaviors; determining a second user whose distance from the first user in the space of the latent vectors is less than a threshold distance; and assigning the tags of the determined second user to the first user as contextual features.
In some embodiments of the present disclosure, the virtual products include one or more of: situational dialogues, mood assessment scales, and digital therapies.
According to a second aspect of the present disclosure, an apparatus for extracting contextual features in a human-machine conversation is provided. The apparatus includes at least one processor and at least one memory storing a computer program. The computer program, when executed by the at least one processor, causes the apparatus to: record a first interaction behavior sequence of a first user in the human-machine conversation, the first interaction behavior sequence including the current interaction behavior of the first user and historical interaction behaviors of the first user, the interaction behaviors in the first interaction behavior sequence including utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user; select a first interaction behavior sub-sequence within a specified time period from the first interaction behavior sequence; concatenate the interaction behaviors in the first interaction behavior sub-sequence into a concatenated sequence; and extract contextual features from the concatenated sequence.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, causes the apparatus to extract contextual features from the concatenated sequence by: comparing the concatenated sequence with a plurality of preset keyword tags; and in response to the concatenated sequence matching one or more of the plurality of preset keyword tags, determining the matching preset keyword tags as contextual features.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, causes the apparatus to extract contextual features from the concatenated sequence by: extracting topics in the concatenated sequence using a topic model; and determining the extracted topics as contextual features.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, causes the apparatus to extract topics in the concatenated sequence using the topic model by: counting the number of words in the concatenated sequence; and in response to the number of words in the concatenated sequence exceeding a threshold number of words, splitting the concatenated sequence into a plurality of sub-sequences and using the topic model to extract a topic for each sub-sequence separately, where the number of words in each sub-sequence does not exceed the threshold number of words.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, causes the apparatus to split the concatenated sequence into a plurality of sub-sequences by: taking the first word of the concatenated sequence as a truncation start point and reading forward by the threshold number of words; taking the last sentence-ending punctuation mark among the words read as a truncation stop point; determining the words between the truncation start point and the truncation stop point as a sub-sequence; taking the first word after the truncation stop point as a new truncation start point and repeating the above operations until the number of words after the truncation stop point is less than the threshold number of words; and determining the words after the truncation stop point as a sub-sequence.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, further causes the apparatus to use the extracted contextual features as tags of the first user.
In some embodiments of the present disclosure, the computer program, when executed by the at least one processor, further causes the apparatus to: acquire a plurality of second interaction behavior sequences of a plurality of second users in the human-machine conversation, where each second interaction behavior sequence includes the current interaction behavior of a second user and historical interaction behaviors of that second user, and the interaction behaviors in the second interaction behavior sequences include: utterances of the second user, web pages browsed by the second user, and virtual products purchased, favorited, or followed by the second user; generate a co-occurrence matrix of the first user, the plurality of second users, and their corresponding interaction behaviors; generate, by means of matrix decomposition, latent vectors corresponding to the first user, the plurality of second users, and their corresponding interaction behaviors; determine a second user whose distance from the first user in the space of the latent vectors is less than a threshold distance; and assign the tags of the determined second user to the first user as contextual features.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to the first aspect of the present disclosure.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly described below. It should be understood that the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure, wherein:
FIG. 1 is an exemplary flow diagram of a method of extracting contextual features in a human-machine conversation, according to an embodiment of the present disclosure;
FIG. 2 is an exemplary flow diagram of a process of splitting a concatenated sequence according to an embodiment of the present disclosure;
FIG. 3 is an exemplary flow diagram of further processes included in a method of extracting contextual features in a human-machine conversation, according to an embodiment of the present disclosure;
FIG. 4 is an exemplary schematic diagram of a co-occurrence matrix according to an embodiment of the present disclosure; and
FIG. 5 is a schematic block diagram of an apparatus for extracting contextual features in a human-machine conversation according to an embodiment of the present disclosure.
The elements in the drawings are schematic and not drawn to scale.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below completely and in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art can derive from the described embodiments without any inventive step also fall within the scope of protection of the present disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the presently disclosed subject matter belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the specification and relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. Terms such as "first" and "second" are only used to distinguish one element (or a portion of an element) from another element (or another portion of an element).
During a human-machine conversation, existing contextual-feature extraction methods rely only on the information currently input by the user; for example, features are extracted only from the information fed into a machine learning algorithm or neural network model, while the current user's historical interaction behaviors and user-profile information are ignored entirely. Contextual features extracted in this way cannot describe the complete context and are a biased description of it, which may leave the extracted features lacking stability and affect the results of downstream tasks.
Embodiments of the present disclosure provide a method for extracting contextual features in a human-machine conversation. FIG. 1 is an exemplary flowchart of a method of extracting contextual features in a human-machine conversation according to an embodiment of the present disclosure. In a human-machine conversation according to an embodiment of the present disclosure, a user may talk with a robot, browse web pages (e.g., encyclopedia articles, news, etc.), and purchase, favorite, or follow virtual products (e.g., situational dialogues, mood assessment scales, and digital therapies for emotional problems), and so on. A method 100 of extracting contextual features in a human-machine conversation is described below with reference to FIG. 1.
In the method 100, at block S102, a first interaction behavior sequence of a first user in a human-machine conversation is recorded. The first interaction behavior sequence includes the current interaction behavior of the first user and historical interaction behaviors of the first user. The interaction behaviors in the first interaction behavior sequence include: utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user.
In some embodiments of the present disclosure, the human-machine dialogue system may be used to help users deal with emotional problems. In such embodiments, the utterances of the first user may include, for example, what the user confides to the robot ("My girlfriend broke up with me. I didn't want to break up with her. I'm in despair!") or instructions the user wants the robot to execute (e.g., "I want to see the workplace relationships section," which may trigger the robot to present the content of the workplace relationships section). The web pages browsed by the first user may include, for example, encyclopedia articles and news previously provided by the human-machine dialogue system. These encyclopedia articles and news items may carry tags or keywords that make them easy to sort into categories, and the user can click directly on the encyclopedia articles or news of a category of interest to browse the corresponding web page. The virtual products purchased, favorited, or followed by the first user may include, for example, situational dialogues with specific content (e.g., a romantic breakup multi-turn dialogue scenario), mood assessment scales (e.g., a depression self-rating scale), and digital therapies (e.g., a depression healing plan), among others.
In the human-machine dialogue system, the first user may be assigned a unique identifier. The unique identifier is associated with each interaction behavior of the first user to indicate that the interaction behavior belongs to the first user. In some embodiments of the present disclosure, each interaction behavior of the first user is time-stamped and stored as log information in a log collection system. For example, if the first user actively triggers a multi-turn interaction scenario about a depression event at time point A in the human-machine dialogue system, a piece of log information is recorded in the log collection system: "At time point A, the first user triggered the multi-turn interaction scenario about a depression event."
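The disclosure does not prescribe a log format; as a minimal Python sketch only, an in-memory log of time-stamped interaction behaviors keyed by the unique identifier could look like this (the InteractionEvent type, the record helper, and all names are illustrative, not part of the patent):

    from dataclasses import dataclass
    from datetime import datetime
    from typing import List

    @dataclass
    class InteractionEvent:
        user_id: str         # unique identifier of the user
        timestamp: datetime  # when the interaction behavior occurred
        kind: str            # "utterance", "web_page", or "virtual_product"
        content: str         # utterance text, page tag, or product name

    log: List[InteractionEvent] = []

    def record(user_id: str, kind: str, content: str) -> None:
        """Append one interaction behavior to the log with the current time."""
        log.append(InteractionEvent(user_id, datetime.now(), kind, content))

    # e.g., "At time point A, the first user triggered the multi-turn
    # interaction scenario about a depression event":
    record("user-1", "virtual_product", "depression event multi-turn scenario")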
At block S104, a first interaction behavior sub-sequence within a specified time period is selected from the first interaction behavior sequence. The first interaction behavior sequence according to some embodiments of the present disclosure may include all interaction behaviors of the first user since the first user started using the human-machine dialogue system. That period may be very long, and the emotional state of the first user may change over time. Selecting a first interaction behavior sub-sequence within a specified time period from the first interaction behavior sequence therefore helps to focus on the state of the first user within that period and reduces the amount of computation. In one example, the specified time period may be, for example, the last week, the last 3 days, and the like.
At block S106, the interaction behaviors in the first interaction behavior sub-sequence are concatenated into a concatenated sequence. In some embodiments of the present disclosure, the interaction behaviors may be concatenated in the chronological order in which the first user performed them in the human-machine dialogue system. In one example, the first user may have entered "My girlfriend broke up with me. I didn't want to break up with her. I'm in despair!". After the robot replies, the first user may continue to input "I don't want to eat, I don't want to go out, and I don't know what to do." At this point, the robot may prompt the first user to purchase a virtual product, a romantic breakup multi-turn dialogue scenario. If the first user purchases the virtual product, the romantic breakup multi-turn dialogue scenario is triggered. After entering the dialogue scenario, the first user may enter "I'm having such a hard time. Should life just go on?". The above interaction behaviors of the first user may be concatenated into the concatenated sequence: "My girlfriend broke up with me. I didn't want to break up with her. I'm in despair! I don't want to eat, I don't want to go out, and I don't know what to do. Triggered the romantic breakup multi-turn dialogue scenario. I'm having such a hard time. Should life just go on?".
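Continuing the illustrative log from the earlier sketch (the function name and the 3-day window are assumptions for illustration, not specified by the disclosure), selecting the sub-sequence for a specified time period and concatenating it chronologically might look as follows:

    from datetime import datetime, timedelta

    def build_concatenated_sequence(events, user_id: str, days: int = 3) -> str:
        """Select the user's InteractionEvent records from the last `days`
        days and join their textual contents in chronological order."""
        cutoff = datetime.now() - timedelta(days=days)
        recent = [e for e in events
                  if e.user_id == user_id and e.timestamp >= cutoff]
        recent.sort(key=lambda e: e.timestamp)  # chronological order
        return " ".join(e.content for e in recent)

    # e.g., concatenated = build_concatenated_sequence(log, "user-1")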
At block S108, contextual features are extracted from the concatenated sequence. In some embodiments of the present disclosure, a plurality of preset keyword tags may be stored in a keyword dictionary (or keyword lexicon). The keyword dictionary maintains not only keywords organized according to business needs but also dictionaries of professional terms, such as "smiling depression," "depression self-rating scale," "relationship discord," "breaking up with a partner," "romantic confusion," "academic pressure," and the like. The concatenated sequence can be compared against the plurality of preset keyword tags. During the comparison, a predetermined keyword matching rule may be used to match keywords contained in the concatenated sequence against the plurality of preset keyword tags. If the concatenated sequence matches one or more of the plurality of preset keyword tags, the matching preset keyword tags are determined as contextual features. The determined contextual features may be bound to the first user as tags of the first user by associating them with the unique identifier of the first user.
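A minimal sketch of the keyword comparison follows, assuming a simple substring rule stands in for the predetermined matching rule, which the disclosure does not spell out; the tag set is illustrative:

    PRESET_KEYWORD_TAGS = {
        "smiling depression", "depression self-rating scale",
        "relationship discord", "breaking up with a partner",
        "romantic confusion", "academic pressure",
    }

    def match_keyword_tags(concatenated_sequence: str,
                           tags=PRESET_KEYWORD_TAGS) -> set:
        """Return every preset keyword tag found in the concatenated sequence;
        each match is determined to be a contextual feature."""
        return {tag for tag in tags if tag in concatenated_sequence}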
In other embodiments of the present disclosure, a topic model may be used to extract topics in the concatenated sequence, and the extracted topics are then determined as contextual features. In these embodiments, the concatenated sequence can be treated as a text, and a Latent Dirichlet Allocation (LDA) topic model is used to extract the topics discussed in the text. In some cases, the concatenated sequence may include many interaction behaviors and therefore be rather long; in that case it may contain more than one topic. To obtain each topic in the concatenated sequence more accurately, the number of words in the concatenated sequence may be counted; if it exceeds a threshold number of words, the concatenated sequence is split into a plurality of sub-sequences, and the topic model is used to extract a topic for each sub-sequence separately, where the number of words in each sub-sequence does not exceed the threshold number of words.
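The disclosure does not mandate a particular LDA implementation. As one hypothetical sketch, scikit-learn's LatentDirichletAllocation could be fitted over the sub-sequences obtained from the splitting procedure below, with the top words of each topic serving as candidate contextual features; the function and parameter names are illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def extract_topics(subsequences, n_topics=2, top_k=5):
        """Fit a small LDA topic model over the sub-sequences and return
        the top words of each topic as candidate contextual features."""
        vectorizer = CountVectorizer()
        counts = vectorizer.fit_transform(subsequences)  # bag-of-words counts
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        lda.fit(counts)
        vocab = vectorizer.get_feature_names_out()
        return [[vocab[i] for i in topic.argsort()[-top_k:][::-1]]
                for topic in lda.components_]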
FIG. 2 illustrates an exemplary flow diagram of a process of splitting a concatenated sequence according to an embodiment of the present disclosure. At block S202, the first word of the concatenated sequence is taken as the truncation start point, and the threshold number of words is read forward. The following description takes a threshold of 64 words (characters, in the Chinese original) as an example. In the concatenated sequence "My girlfriend broke up with me. I didn't want to break up with her. I'm in despair! I don't want to eat, I don't want to go out, and I don't know what to do. Triggered the romantic breakup multi-turn dialogue scenario. I'm having such a hard time. Should life just go on?", the first word is the first word of "girlfriend". Reading 64 words forward from the first word stops partway through the final sentences, so the words read run from "My girlfriend broke up with me" through "Triggered the romantic breakup multi-turn dialogue scenario." and into "I'm having such a hard time."
At block S204, the last sentence-ending punctuation mark among the words read is taken as the truncation stop point. In some embodiments of the present disclosure, sentence-ending punctuation marks include ".", "?", and "!". In the example above, the last sentence-ending punctuation mark among the words read is the period after "Triggered the romantic breakup multi-turn dialogue scenario". That period is determined as the truncation stop point.
At block S206, the words between the truncation start point and the truncation stop point are determined as a sub-sequence. In the example above, the first sub-sequence is determined to be "My girlfriend broke up with me. I didn't want to break up with her. I'm in despair! I don't want to eat, I don't want to go out, and I don't know what to do. Triggered the romantic breakup multi-turn dialogue scenario."
At block S208, the first word after the truncation stop point is taken as the new truncation start point. In the example above, the first word after that period, "I", becomes the new truncation start point.
At block S210, it is determined whether the number of words after the truncation stop point is greater than the threshold number of words. If it is greater ("yes" at block S210), the process returns to block S202 to read another threshold number of words. If it is not greater ("no" at block S210), the process proceeds to block S212, where the words after the truncation stop point are determined as a sub-sequence.
In the example above, the words after the truncation stop point are: "I'm having such a hard time. Should life just go on?". There are 14 words (in the Chinese original) left after the truncation stop point, which is not greater than the threshold of 64 words. The process therefore proceeds to block S212, and "I'm having such a hard time. Should life just go on?" is determined as a sub-sequence.
In the above manner, the concatenated sequence can be split into a plurality of sub-sequences so that the topic model can extract the topic of each sub-sequence separately; a code sketch of the splitting procedure follows below. In one example, the extracted topics are, for example, "moderate depression tendency", "high positive depression tendency", "white collar", "social connection", "social disorder", and the like. The extracted topics are then determined as contextual features. The determined contextual features may be bound to the first user as tags of the first user by associating them with the unique identifier of the first user.
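As an illustration only, the splitting procedure of blocks S202 to S212 can be sketched in a few lines of Python. The names are illustrative, the threshold is the 64-word example above, and each character is treated as one word unit, as in the Chinese original; a word-level variant would tokenize first:

    SENTENCE_MARKS = ".?!"  # sentence-ending punctuation; "。？！" for Chinese text

    def split_sequence(seq: str, threshold: int = 64) -> list:
        """Split a concatenated sequence into sub-sequences of at most
        `threshold` units, cutting at the last sentence-ending mark in
        each window (blocks S202 to S212)."""
        subsequences = []
        start = 0  # truncation start point
        while len(seq) - start > threshold:
            window = seq[start:start + threshold]  # S202: read forward
            # S204: last sentence-ending mark in the window is the stop point
            stop = max(window.rfind(mark) for mark in SENTENCE_MARKS)
            if stop == -1:            # no sentence mark found: hard cut
                stop = threshold - 1
            subsequences.append(seq[start:start + stop + 1])  # S206
            start = start + stop + 1  # S208: first unit after the stop point
        if start < len(seq):          # S210/S212: remainder is a sub-sequence
            subsequences.append(seq[start:])
        return subsequences

Under these assumptions, applying split_sequence to the example sequence would yield sub-sequences analogous to those described for blocks S206 and S212.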
FIG. 3 shows an exemplary flowchart of further processes included in a method of extracting contextual features in a human-machine conversation according to an embodiment of the present disclosure. At block S302, a plurality of second interaction behavior sequences of a plurality of second users in the human-machine conversation are acquired. Each second interaction behavior sequence includes the current interaction behavior of a second user and historical interaction behaviors of that second user. The interaction behaviors in the second interaction behavior sequences include: utterances of the second user, web pages browsed by the second user, and virtual products purchased, favorited, or followed by the second user.
At block S304, a co-occurrence matrix of the first user, the plurality of second users, and their corresponding interaction behaviors is generated. FIG. 4 shows an exemplary schematic of the co-occurrence matrix. In the example of FIG. 4, user 1, user 2, and user 3 are shown, where user 1 is the first user and users 2 and 3 are second users. The interaction behaviors include, for example: "read encyclopedia article a", "read encyclopedia article b", "purchased assessment scale c", "purchased assessment scale d", "purchased virtual product e", and "purchased virtual product f". If a user performed an interaction behavior, the value at the intersection of that user and that interaction behavior in the co-occurrence matrix is 1; otherwise it is 0.
At block S306, latent vectors corresponding to the first user, the plurality of second users, and their corresponding interaction behaviors are generated by means of matrix decomposition. For example, the m × n co-occurrence matrix R may be decomposed into the product of an m × k user matrix U and a k × n interaction behavior matrix V, i.e., R = U × V, where k is the dimension of the latent vectors.
At block S308, a second user whose distance from the first user in the space of the latent vectors is less than a threshold distance is determined. In some embodiments of the present disclosure, the threshold distance may be an empirical value. A second user whose distance from the first user is less than the threshold distance is a user whose interaction behaviors are similar to those of the first user; the two are likely to share similar characteristics. Accordingly, at block S310, the tags of the determined second user are assigned to the first user as contextual features.
In the above manner, users whose interaction behavior characteristics are similar to those of the first user can be found, and their tags (contextual features) can be used as tags (contextual features) of the first user, as sketched below.
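As an illustration of blocks S304 to S310, the sketch below builds a small co-occurrence matrix like that of FIG. 4 and uses a truncated SVD as one concrete choice of matrix decomposition; the disclosure only requires some decomposition R = U × V, and the matrix values and threshold distance here are illustrative assumptions:

    import numpy as np

    # Co-occurrence matrix R (rows: users 1 to 3; columns: behaviors a to f);
    # R[i, j] = 1 if user i performed behavior j, 0 otherwise (cf. FIG. 4).
    R = np.array([[1, 0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0, 1]], dtype=float)

    k = 2  # dimension of the latent vectors
    # One way to obtain R ≈ U × V: truncated SVD, folding singular values into U.
    u, s, vt = np.linalg.svd(R, full_matrices=False)
    U = u[:, :k] * s[:k]  # m × k matrix of user latent vectors
    V = vt[:k, :]         # k × n matrix of interaction behavior latent vectors

    first_user = U[0]          # user 1 is the first user
    threshold_distance = 1.0   # an empirical value
    # S308: second users whose latent vectors lie within the threshold distance
    neighbors = [i + 1 for i in range(1, U.shape[0])
                 if np.linalg.norm(U[i] - first_user) < threshold_distance]
    print(neighbors)  # S310: these users' tags would be assigned to user 1

With the illustrative matrix above, user 2 has the same behavior row as user 1, so its latent vector lies at distance 0 and user 2 is returned as a neighbor.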
The method of extracting contextual features in a human-machine conversation according to embodiments of the present disclosure considers not only the current interaction behavior of the user but also the user's historical interaction behaviors, the user's attribute information, and the attribute information (contextual features) of other users that can be consulted. When considering interaction behaviors, it takes into account not only the user's dialogue content but also the web pages browsed by the user and the virtual products purchased, favorited, or followed by the user. The contextual features can therefore be extracted in a more diverse and more stable way, which benefits the execution of downstream tasks.
FIG. 5 shows a schematic block diagram of an apparatus 500 for extracting contextual features in a human-machine conversation according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 500 may include a processor 510 and a memory 520 storing a computer program. The computer program, when executed by the processor 510, causes the apparatus 500 to perform the steps of the method 100 shown in FIG. 1. In one example, the apparatus 500 may be a computer device or a cloud computing node. The apparatus 500 may record a first interaction behavior sequence of a first user in a human-machine conversation. The first interaction behavior sequence includes the current interaction behavior of the first user and historical interaction behaviors of the first user. The interaction behaviors in the first interaction behavior sequence include: utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user. The apparatus 500 may select a first interaction behavior sub-sequence within a specified time period from the first interaction behavior sequence, concatenate the interaction behaviors in the first interaction behavior sub-sequence into a concatenated sequence, and extract contextual features from the concatenated sequence.
In some embodiments of the present disclosure, the apparatus 500 may compare the concatenated sequence with a plurality of preset keyword tags. In response to the concatenated sequence matching one or more of the plurality of preset keyword tags, the apparatus 500 may determine the matching preset keyword tags as contextual features.
In some embodiments of the present disclosure, the apparatus 500 may use a topic model to extract topics in the concatenated sequence and determine the extracted topics as contextual features.
In some embodiments of the present disclosure, the apparatus 500 may count the number of words in the concatenated sequence. In response to the number of words in the concatenated sequence exceeding a threshold number of words, the apparatus 500 may split the concatenated sequence into a plurality of sub-sequences and extract a topic for each sub-sequence separately using the topic model, where the number of words in each sub-sequence does not exceed the threshold number of words.
In some embodiments of the present disclosure, the apparatus 500 may take the first word of the concatenated sequence as a truncation start point and read forward by the threshold number of words. The apparatus 500 may take the last sentence-ending punctuation mark among the words read as a truncation stop point and determine the words between the truncation start point and the truncation stop point as a sub-sequence. The apparatus 500 may take the first word after the truncation stop point as a new truncation start point and repeat the above operations until the number of words after the truncation stop point is less than the threshold number of words. The apparatus 500 may then determine the words after the truncation stop point as a sub-sequence.
In some embodiments of the present disclosure, the apparatus 500 may use the extracted contextual features as tags of the first user.
In some embodiments of the present disclosure, the apparatus 500 may acquire a plurality of second interaction behavior sequences of a plurality of second users in the human-machine conversation. Each second interaction behavior sequence includes the current interaction behavior of a second user and historical interaction behaviors of that second user. The interaction behaviors in the second interaction behavior sequences include: utterances of the second user, web pages browsed by the second user, and virtual products purchased, favorited, or followed by the second user. The apparatus 500 may generate a co-occurrence matrix of the first user, the plurality of second users, and their corresponding interaction behaviors, and may generate, by means of matrix decomposition, latent vectors corresponding to the first user, the plurality of second users, and their corresponding interaction behaviors. The apparatus 500 may determine a second user whose distance from the first user in the space of the latent vectors is less than a threshold distance, and may assign the tags of the determined second user to the first user as contextual features.
In an embodiment of the present disclosure, the processor 510 may be, for example, a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), a processor based on a multi-core processor architecture, or the like. The memory 520 may be any type of memory implemented using data storage technology including, but not limited to, random access memory, read only memory, semiconductor-based memory, flash memory, disk memory, and the like.
Further, in embodiments of the present disclosure, the apparatus 500 may also include an input device 530, such as a microphone, a keyboard, or a mouse, for inputting the user's utterances. In addition, the apparatus 500 may further include an output device 540, such as a speaker or a display, for outputting the extracted contextual features (user tags).
In other embodiments of the present disclosure, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program, when executed by a processor, can implement the steps of the methods shown in FIGS. 1 to 3.
In summary, the method of extracting contextual features in a human-machine conversation according to embodiments of the present disclosure considers not only the current interaction behavior of the user but also the user's historical interaction behaviors, the user's attribute information, and the attribute information (contextual features) of other users that can be consulted. When considering interaction behaviors, it takes into account not only the user's dialogue content but also the web pages browsed by the user and the virtual products purchased, favorited, or followed by the user. The contextual features can therefore be extracted in a more diverse and more stable way, which benefits the execution of downstream tasks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As used herein and in the appended claims, the singular forms of words include the plural and vice versa, unless the context clearly dictates otherwise; thus, a reference in the singular generally includes the plural of the corresponding term. Similarly, the words "comprising" and "including" are to be construed as inclusive rather than exclusive, and the term "or" should be construed as inclusive unless such an interpretation is explicitly prohibited herein. Where the term "example" is used herein, particularly after a set of terms, it is merely exemplary and illustrative and should not be considered exclusive or exhaustive.
Further aspects and areas of applicability will become apparent from the description provided herein. It should be understood that various aspects of the present disclosure may be implemented alone or in combination with one or more other aspects. It should also be understood that the description and specific examples herein are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
Several embodiments of the present disclosure have been described in detail above, but it is apparent that various modifications and variations can be made to the embodiments of the present disclosure by those skilled in the art without departing from the spirit and scope of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A method of extracting contextual features in a human-machine conversation, comprising:
recording a first interaction behavior sequence of a first user in the human-machine conversation, wherein the first interaction behavior sequence comprises a current interaction behavior of the first user and historical interaction behaviors of the first user, and the interaction behaviors in the first interaction behavior sequence comprise: utterances of the first user, web pages browsed by the first user, and virtual products purchased, favorited, or followed by the first user;
selecting a first interaction behavior sub-sequence within a specified time period from the first interaction behavior sequence;
concatenating the interaction behaviors in the first interaction behavior sub-sequence into a concatenated sequence; and
extracting contextual features from the concatenated sequence.
2. The method of claim 1, wherein extracting contextual features from the concatenated sequence comprises:
comparing the concatenated sequence with a plurality of preset keyword tags; and
in response to the concatenated sequence matching one or more of the plurality of preset keyword tags, determining the matching preset keyword tags as the contextual features.
3. The method of claim 1, wherein extracting contextual features from the concatenated sequence comprises:
extracting topics in the concatenated sequence using a topic model; and
determining the extracted topics as the contextual features.
4. The method of claim 3, wherein extracting topics in the concatenated sequence using a topic model comprises:
counting the number of words in the concatenated sequence; and
in response to the number of words in the concatenated sequence exceeding a threshold number of words, splitting the concatenated sequence into a plurality of sub-sequences and using the topic model to extract a topic for each sub-sequence separately, wherein the number of words in each sub-sequence does not exceed the threshold number of words.
5. The method of claim 4, wherein splitting the concatenated sequence into a plurality of sub-sequences comprises:
taking the first word of the concatenated sequence as a truncation start point and reading forward by the threshold number of words;
taking the last sentence-ending punctuation mark among the words read as a truncation stop point;
determining the words between the truncation start point and the truncation stop point as a sub-sequence;
taking the first word after the truncation stop point as a new truncation start point, and repeating the above operations until the number of words after the truncation stop point is less than the threshold number of words; and
determining the words after the truncation stop point as a sub-sequence.
6. The method of any of claims 1 to 5, further comprising:
the extracted contextual features are used as labels of the first user.
7. The method of any of claims 1 to 5, further comprising:
acquiring a plurality of second interaction behavior sequences of a plurality of second users in the human-machine conversation, wherein each second interaction behavior sequence comprises a current interaction behavior of a second user and historical interaction behaviors of the second user, and the interaction behaviors in the second interaction behavior sequences comprise: utterances of the second user, web pages browsed by the second user, and virtual products purchased, favorited, or followed by the second user;
generating a co-occurrence matrix of the first user, the plurality of second users, and their corresponding interaction behaviors;
generating, by means of matrix decomposition, latent vectors corresponding to the first user, the plurality of second users, and their corresponding interaction behaviors;
determining a second user whose distance from the first user in the space of the latent vectors is less than a threshold distance; and
assigning tags of the determined second user to the first user as contextual features.
8. The method of any of claims 1 to 5, wherein the virtual products comprise one or more of: situational dialogues, mood assessment scales, and digital therapies.
9. An apparatus for extracting contextual features in a human-machine conversation, comprising:
at least one processor; and
at least one memory storing a computer program;
wherein the computer program, when executed by the at least one processor, causes the apparatus to perform the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202210279351.5A 2022-03-21 2022-03-21 Method and device for extracting context features in man-machine conversation Pending CN114579820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210279351.5A CN114579820A (en) 2022-03-21 2022-03-21 Method and device for extracting context features in man-machine conversation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210279351.5A CN114579820A (en) 2022-03-21 2022-03-21 Method and device for extracting context features in man-machine conversation

Publications (1)

Publication Number Publication Date
CN114579820A (en) 2022-06-03

Family

ID=81776782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210279351.5A Pending CN114579820A (en) 2022-03-21 2022-03-21 Method and device for extracting context features in man-machine conversation

Country Status (1)

Country Link
CN (1) CN114579820A (en)

Citations (7)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951428A (en) * 2014-03-26 2015-09-30 阿里巴巴集团控股有限公司 User intention recognition method and device
CN108292317A (en) * 2015-11-27 2018-07-17 三星电子株式会社 Problem and answer processing method and the electronic equipment for supporting this method
CN106448670A (en) * 2016-10-21 2017-02-22 竹间智能科技(上海)有限公司 Dialogue automatic reply system based on deep learning and reinforcement learning
CN107329986A (en) * 2017-06-01 2017-11-07 竹间智能科技(上海)有限公司 The interactive method and device recognized based on language performance
CN111788621A (en) * 2018-02-27 2020-10-16 微软技术许可有限责任公司 Personal virtual digital assistant
CN110503949A (en) * 2018-05-17 2019-11-26 现代自动车株式会社 Conversational system, the vehicle with conversational system and dialog process method
CN113761156A (en) * 2021-05-31 2021-12-07 腾讯科技(深圳)有限公司 Data processing method, device and medium for man-machine interaction conversation and electronic equipment

Similar Documents

Publication Publication Date Title
US9740677B2 (en) Methods and systems for analyzing communication situation based on dialogue act information
Maynard et al. Challenges in developing opinion mining tools for social media
CN109034203B (en) Method, device, equipment and medium for training expression recommendation model and recommending expression
US10387568B1 (en) Extracting keywords from a document
US20130060769A1 (en) System and method for identifying social media interactions
CN107885852B (en) APP recommendation method and system based on APP usage record
Osmani et al. Enriched latent dirichlet allocation for sentiment analysis
CN110019669B (en) Text retrieval method and device
Nguyen et al. Statistical approach for figurative sentiment analysis on social networking services: a case study on twitter
Watanabe et al. Question answering from unstructured text by retrieval and comprehension
CN116151233A (en) Data labeling and generating method, model training method, device and medium
CN110362662A (en) Data processing method, device and computer readable storage medium
CN115878761B (en) Event context generation method, device and medium
CN111460177A (en) Method and device for searching film and television expression, storage medium and computer equipment
CN114579820A (en) Method and device for extracting context features in man-machine conversation
Shetgaonkar et al. Fake review detection using sentiment analysis and deep learning
CN115098619A (en) Information duplication eliminating method and device, electronic equipment and computer readable storage medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN113988057A (en) Title generation method, device, equipment and medium based on concept extraction
CN113515630A (en) Triple generating and checking method and device, electronic equipment and storage medium
Tariq et al. Nelasso: group-sparse modeling for characterizing relations among named entities in news articles
US20170169032A1 (en) Method and system of selecting and orderingcontent based on distance scores
CN111126033A (en) Response prediction device and method for article
Hawladar et al. Amazon product reviews sentiment analysis using supervised learning algorithms
CN115714002B (en) Training method for depression risk detection model, depression symptom early warning method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination