CN111460805A - Statement processing method, device and equipment - Google Patents

Statement processing method, device and equipment

Publication number
CN111460805A
Authority
CN
China
Prior art keywords
vocabulary
preset
sentence
splitting
vocabularies
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910057044.0A
Other languages
Chinese (zh)
Inventor
陈勇
刘晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huijun Technology Co.,Ltd.
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd

Abstract

An embodiment of the invention provides a statement processing method, device and equipment. The method comprises: performing word segmentation processing on a first sentence to obtain at least one first vocabulary; splitting the at least one first vocabulary, wherein the splitting processing is used for splitting one vocabulary into at least two sub-vocabularies; and determining the emotion type of the first sentence according to the at least one first vocabulary after the splitting processing. This improves the accuracy of determining the emotion type of a sentence.

Description

Statement processing method, device and equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a statement processing method, device and equipment.
Background
In many application scenarios (e.g., online customer service), sentences need to be analyzed to determine their emotion type, such as angry, happy, disappointed, and the like.
In the prior art, the emotion type of a sentence is determined by performing word segmentation on the sentence to obtain a plurality of vocabularies and then predicting the emotion type from those vocabularies with a model. In practice, however, many of the vocabularies produced by word segmentation are uncommon vocabularies (rarely used vocabularies or inaccurately segmented vocabularies). Because uncommon vocabularies appear only a few times in the training vocabulary, the model cannot learn them well during training and, correspondingly, cannot accurately predict the emotion type from them. When a sentence includes an uncommon vocabulary that is strongly related to the emotion type of the sentence, the emotion type cannot be accurately determined, so the accuracy of determining the emotion type of the sentence is low.
Disclosure of Invention
The embodiment of the invention provides a statement processing method, a statement processing device and statement processing equipment, which improve the accuracy of determining the emotion type of a statement.
In a first aspect, an embodiment of the present invention provides a statement processing method, including:
performing word segmentation processing on the first sentence to obtain at least one first vocabulary;
splitting the at least one first vocabulary, wherein the splitting process is used for splitting one vocabulary into at least two sub vocabularies;
and determining the emotion type of the first sentence according to the at least one first vocabulary after the splitting processing.
In a possible implementation, for any one first vocabulary of the at least one first vocabulary, the splitting of the first vocabulary includes:
acquiring a preset vocabulary set, wherein the preset vocabulary set comprises a plurality of second vocabularies and/or the at least two sub-vocabularies corresponding to each of a plurality of third vocabularies, and the frequency with which the second vocabularies and the third vocabularies appear in a preset sentence set is greater than a first frequency;
and splitting the first vocabulary according to the first vocabulary and the preset vocabulary set.
In a possible implementation manner, the splitting the first vocabulary according to the first vocabulary and the preset vocabulary set includes:
judging whether the plurality of second vocabularies or the plurality of third vocabularies comprises the first vocabulary;
if so, splitting the first vocabulary according to the preset vocabulary set;
if not, splitting the first vocabulary according to characters, wherein each sub-vocabulary obtained by splitting the first vocabulary according to characters comprises one character.
In a possible implementation manner, the splitting the first vocabulary according to the preset vocabulary set includes:
when the first vocabulary is the same as a second vocabulary in the preset vocabulary set, the first vocabulary is not split;
when the first vocabulary is the same as a third vocabulary corresponding to at least two sub vocabularies in the preset vocabulary set, splitting the first vocabulary into the at least two sub vocabularies.
In a possible implementation manner, the obtaining the preset vocabulary set includes:
when the preset vocabulary set is included in a preset storage space, acquiring the preset vocabulary set in the preset storage space;
and when the preset vocabulary set is not included in the preset storage space, generating the preset vocabulary set.
In a possible implementation, the generating the preset vocabulary set includes:
determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency;
splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises sub vocabulary of each vocabulary in the first vocabulary set;
and generating the preset vocabulary set according to the second vocabulary set.
In a possible implementation, the generating the preset vocabulary set according to the second vocabulary set includes:
performing a merging processing operation on the sub-vocabularies in the second vocabulary set, wherein the merging processing operation comprises: determining a sub-vocabulary pair in the second vocabulary set, and merging the sub-vocabulary pair into one vocabulary when the occurrence frequency of the sub-vocabulary pair in the preset sentence set is greater than a second frequency, wherein the sub-vocabulary pair is two adjacent sub-vocabularies among the plurality of sub-vocabularies corresponding to one vocabulary;
and repeatedly executing the merging processing operation for N times, or repeatedly executing the merging processing operation until the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value, wherein N is an integer greater than 1.
In a possible implementation, the determining an emotion type of the first sentence according to the at least one first vocabulary after the splitting process includes:
processing the split at least one first vocabulary according to a preset model to obtain the emotion type of the first sentence, wherein the preset model is obtained according to a plurality of groups of sample data, and each group of sample data comprises a sample sentence and the emotion type corresponding to the sample sentence.
In one possible embodiment, the method further comprises:
acquiring the multiple groups of sample data;
performing word segmentation processing on each sample sentence in the multiple groups of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence;
splitting at least one fourth vocabulary corresponding to each sample sentence;
and generating the preset model according to the at least one split fourth vocabulary corresponding to each sample statement and the emotion type corresponding to each sample statement.
In a second aspect, an embodiment of the present invention provides a statement processing apparatus, including a word segmentation module, a splitting module, and a determination module, where,
the word segmentation module is used for performing word segmentation processing on the first sentence to obtain at least one first vocabulary;
the splitting module is used for splitting the at least one first vocabulary, and the splitting process is used for splitting one vocabulary into at least two sub-vocabularies;
the determining module is configured to determine an emotion type of the first sentence according to the split at least one first vocabulary.
In one possible embodiment, the apparatus further comprises a first obtaining module, wherein,
the first obtaining module is used for acquiring a preset vocabulary set, wherein the preset vocabulary set comprises a plurality of second vocabularies and/or the at least two sub-vocabularies corresponding to each of a plurality of third vocabularies, and the frequency with which the second vocabularies and the third vocabularies appear in a preset sentence set is greater than a first frequency;
the splitting module is specifically configured to split any one of the at least one first vocabulary according to the first vocabulary and the preset vocabulary set.
In a possible implementation, the splitting module is specifically configured to:
judging whether the plurality of second vocabularies or the plurality of third vocabularies comprises the first vocabulary;
if so, splitting the first vocabulary according to the preset vocabulary set;
if not, splitting the first vocabulary according to characters, wherein each sub-vocabulary obtained by splitting the first vocabulary according to characters comprises one character.
In a possible implementation, the splitting module is specifically configured to:
when the first vocabulary is the same as a second vocabulary in the preset vocabulary set, the first vocabulary is not split;
when the first vocabulary is the same as a third vocabulary corresponding to at least two sub vocabularies in the preset vocabulary set, splitting the first vocabulary into the at least two sub vocabularies.
In a possible implementation manner, the first obtaining module is specifically configured to:
when the preset vocabulary set is included in a preset storage space, acquiring the preset vocabulary set in the preset storage space;
and when the preset vocabulary set is not included in the preset storage space, generating the preset vocabulary set.
In a possible implementation manner, the first obtaining module is specifically configured to:
determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency;
splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises sub vocabulary of each vocabulary in the first vocabulary set;
and generating the preset vocabulary set according to the second vocabulary set.
In a possible implementation manner, the first obtaining module is specifically configured to:
performing a merging processing operation on the sub-vocabularies in the second vocabulary set, wherein the merging processing operation comprises: determining a sub-vocabulary pair in the second vocabulary set, and merging the sub-vocabulary pair into one vocabulary when the occurrence frequency of the sub-vocabulary pair in the preset sentence set is greater than a second frequency, wherein the sub-vocabulary pair is two adjacent sub-vocabularies among the plurality of sub-vocabularies corresponding to one vocabulary;
and repeatedly executing the merging processing operation for N times, or repeatedly executing the merging processing operation until the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value, wherein N is an integer greater than 1.
In a possible implementation, the determining module is specifically configured to:
processing the split at least one first vocabulary according to a preset model to obtain the emotion type of the first sentence, wherein the preset model is obtained according to a plurality of groups of sample data, and each group of sample data comprises a sample sentence and the emotion type corresponding to the sample sentence.
In a possible implementation, the apparatus further includes a second obtaining module and a generating module, wherein,
the second obtaining module is used for obtaining the multiple groups of sample data;
the word segmentation module is further used for performing word segmentation processing on each sample sentence in the multiple groups of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence;
the splitting module is further used for splitting at least one fourth vocabulary corresponding to each sample sentence;
the generation module is used for generating the preset model according to the split at least one fourth vocabulary corresponding to each sample statement and the emotion type corresponding to each sample statement.
In a third aspect, an embodiment of the present invention provides a statement processing apparatus, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the statement processing method of any of the first aspects.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer-executable instruction is stored in the computer-readable storage medium, and when a processor executes the computer-executable instruction, the statement processing method according to any one of the first aspects is implemented.
According to the statement processing method, device and equipment provided by the embodiments of the invention, when the emotion type of a sentence is determined, the sentence is not only subjected to word segmentation, but the vocabularies obtained by word segmentation are also split. Splitting the segmented vocabularies reduces the probability that uncommon vocabularies and unreasonably segmented vocabularies appear. Therefore, determining the emotion type of the sentence according to the vocabularies that were not split and the sub-vocabularies obtained by splitting can improve the accuracy of determining the emotion type of the sentence.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is an architecture diagram of a statement processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a statement processing method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method for generating a preset vocabulary set according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a statement processing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another statement processing apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a hardware structure of a statement processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is an architecture diagram of a statement processing method according to an embodiment of the present invention. Referring to Fig. 1, when the emotion type of a sentence needs to be determined, word segmentation processing is performed on the sentence to divide it into a plurality of vocabularies; the uncommon or unreasonably segmented vocabularies among them are then split to obtain the corresponding sub-vocabularies, and the emotion type of the sentence is determined according to the vocabularies that were not split and the sub-vocabularies obtained by splitting.
In the present application, when the emotion type of a sentence is determined, the sentence is not only subjected to word segmentation, but the vocabularies obtained by word segmentation are also split. Splitting the segmented vocabularies reduces the probability that uncommon vocabularies and unreasonably segmented vocabularies appear. Therefore, determining the emotion type of the sentence according to the vocabularies that were not split and the sub-vocabularies obtained by splitting can improve the accuracy of determining the emotion type of the sentence.
The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may be combined with each other, and the description of the same or similar contents in different embodiments is not repeated.
Fig. 2 is a schematic flowchart of a statement processing method according to an embodiment of the present invention. Referring to fig. 2, the method may include:
S201, performing word segmentation processing on the first sentence to obtain at least one first vocabulary.
The execution subject of the embodiment of the present invention may be an electronic device, or may be a statement processing apparatus provided in the electronic device. Optionally, the statement processing apparatus may be implemented by software, or by a combination of software and hardware.
Optionally, the electronic device may be a terminal device (e.g., a computer), or may be a server.
Optionally, the first sentence may be any sentence.
Optionally, the first sentence may be segmented by an existing arbitrary segmentation algorithm to obtain at least one first vocabulary.
For example, assuming that the first sentence is "today's really good/troublesome", the word segmentation "today/really/good/troublesome" may be performed on the first sentence, and the first vocabularies obtained are: "today", "really", "good" and "troublesome".
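The source does not prescribe a segmentation algorithm for S201. Purely as an illustration, the sketch below uses forward maximum matching against a small dictionary. The function name `forward_max_match`, the `max_len` parameter and the toy dictionary are assumptions made for this example, and short ASCII strings stand in for Chinese text.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Segment `sentence` greedily, always taking the longest dictionary match.

    Any character that starts no dictionary entry is emitted on its own.
    This is a stand-in for "any existing segmentation algorithm".
    """
    words = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"ab", "cde"}
first_vocabularies = forward_max_match("abcdex", TOY_DICTIONARY, max_len=3)
```

In this sketch, the returned list plays the role of the "at least one first vocabulary" that S202 goes on to split.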
S202, splitting at least one first vocabulary, wherein the splitting is used for splitting one vocabulary into at least two sub vocabularies.
Optionally, for any first vocabulary in the at least one first vocabulary, it may be determined whether the first vocabulary needs to be split. When the first vocabulary needs to be split, the first vocabulary is split. When the first vocabulary is not required to be split, the first vocabulary is not split.
Optionally, a preset vocabulary set may be generated in advance and stored in a preset storage space. The preset vocabulary set comprises a plurality of second vocabularies and/or the at least two sub-vocabularies corresponding to each of a plurality of third vocabularies, and the frequency with which the second vocabularies and the third vocabularies appear in the preset sentence set is greater than the first frequency.
In the practical application process, when the preset vocabulary set is required to be used and the preset storage space comprises the preset vocabulary set, acquiring the preset vocabulary set in the preset storage space; and when the preset storage space does not contain the preset vocabulary set, generating the preset vocabulary set.
It should be noted that, in the embodiment shown in fig. 3, a process of generating the preset vocabulary set is described in detail, and details are not repeated here.
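The "acquire if stored, otherwise generate" behaviour described above can be sketched as follows. This is a minimal illustration that assumes the preset storage space is a JSON file and that the preset vocabulary set is represented as a mapping from each vocabulary to its list of sub-vocabularies; the function name `get_preset_vocabulary_set` and the `generate` callback are hypothetical and do not come from the source.

```python
import json
from pathlib import Path


def get_preset_vocabulary_set(storage_path, generate):
    """Return the preset vocabulary set, generating and storing it if absent.

    storage_path: location of the (assumed) JSON file used as storage space.
    generate: zero-argument callable that builds the set when none is stored.
    """
    path = Path(storage_path)
    if path.exists():
        # The preset storage space already contains the set: just read it.
        return json.loads(path.read_text(encoding="utf-8"))
    # Otherwise generate the set once and store it for later requests.
    vocabulary_set = generate()
    path.write_text(json.dumps(vocabulary_set, ensure_ascii=False),
                    encoding="utf-8")
    return vocabulary_set
```

With this arrangement the comparatively expensive generation step runs only on the first request; later requests read the stored set.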
For any one of the at least one first vocabulary, it may be determined whether the plurality of second vocabularies or the plurality of third vocabularies includes the first vocabulary.
When neither the plurality of second vocabularies nor the plurality of third vocabularies includes the first vocabulary, this indicates that the first vocabulary is an uncommon vocabulary; the first vocabulary is then split according to characters, and each sub-vocabulary obtained in this way comprises one character.
When the plurality of second vocabularies includes the first vocabulary, the first vocabulary is not an uncommon vocabulary and its word segmentation is reasonable, so the first vocabulary is not split.
When the plurality of third vocabularies includes the first vocabulary, the first vocabulary is not an uncommon vocabulary but its word segmentation is unreasonable, so the first vocabulary is split into the at least two sub-vocabularies corresponding to that third vocabulary.
For example, assume that the preset vocabulary set is as shown in Table 1:
TABLE 1
What/do
Coupon
Goods/commerce
……
Assuming that the first sentence includes the vocabulary "good vexation" and the preset vocabulary set does not include "good vexation", the vocabulary is split according to characters to obtain the two sub-vocabularies "good" and "vexation".
Assuming that the first sentence includes the vocabulary "coupon", since "coupon" is included in the preset vocabulary set as a whole vocabulary, the splitting processing is not performed on "coupon".
Assuming that the first sentence includes the vocabulary "shipper", since the preset vocabulary set includes the sub-vocabularies "goods" and "commerce" corresponding to "shipper" (the entry "Goods/commerce" in Table 1), the "shipper" in the first sentence is split into "goods" and "commerce".
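The three cases of S202 can be sketched as a single lookup. This is a minimal illustration under stated assumptions: the preset vocabulary set is modelled as a dictionary from a vocabulary to its list of sub-vocabularies (a one-element list for a second vocabulary, two or more elements for a third vocabulary), the names `split_vocabulary`, `split_all` and `PRESET` are hypothetical, and ASCII strings stand in for Chinese text.

```python
def split_vocabulary(vocabulary, preset):
    """Split one first vocabulary according to the preset vocabulary set."""
    sub_vocabularies = preset.get(vocabulary)
    if sub_vocabularies is None:
        # Not in the preset set: treated as uncommon, split into characters.
        return list(vocabulary)
    # Second vocabulary: one-element list, i.e. kept whole.
    # Third vocabulary: replaced by its stored sub-vocabularies.
    return list(sub_vocabularies)


def split_all(vocabularies, preset):
    """Apply the splitting processing to every first vocabulary in order."""
    result = []
    for vocabulary in vocabularies:
        result.extend(split_vocabulary(vocabulary, preset))
    return result


# Hypothetical preset vocabulary set, for illustration only.
PRESET = {
    "coupon": ["coupon"],        # second vocabulary: not split
    "shipper": ["ship", "per"],  # third vocabulary: split into two parts
}
tokens = split_all(["coupon", "shipper", "vex"], PRESET)
```

The flattened list keeps the original order of the sentence, so whatever model consumes it still sees the sub-vocabularies in context.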
S203, determining the emotion type of the first sentence according to the split at least one first vocabulary.
Optionally, in S202, all the vocabularies in the at least one first vocabulary may be split, a part of the vocabularies in the at least one first vocabulary may be split, or all the vocabularies in the at least one first vocabulary may not be split. Therefore, the at least one first vocabulary after the splitting process includes vocabularies and/or sub-vocabularies.
Optionally, at least one first vocabulary after the splitting process may be processed according to a preset model to obtain an emotion type of the first sentence, where the preset model is obtained by learning according to multiple groups of sample data, and each group of sample data includes one sample sentence and an emotion type corresponding to the sample sentence.
Optionally, the preset model may be generated in the following feasible manner: obtaining a plurality of groups of sample data; performing word segmentation processing on each sample sentence in the plurality of groups of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence; splitting the at least one fourth vocabulary corresponding to each sample sentence; and generating the preset model according to the split at least one fourth vocabulary corresponding to each sample sentence and the emotion type corresponding to each sample sentence.
It should be noted that, the process of splitting at least one fourth vocabulary corresponding to each sample sentence may be referred to as S202, and details are not described here.
For example, the sample sentences and the corresponding emotion types may be as shown in Table 2:
TABLE 2
Sentence | Emotion type
Good, excessive decline | Happy, thankful
Why the refund is not the return of the freight | Angry
I am also drunk | Lost
All in one month and how to deliver goods | Anxious
Why my westward's clothing did not receive a refund | Disappointed
Will not be a cheat bar that receives money and does not ship | Worried
Good, I will go and see | Neutral
For example, assuming that the first sentence is "the sender thus calls it to go dead", word segmentation is first performed on the first sentence to obtain "the/sender/called/it/go dead", the splitting processing is then performed to obtain "the/sender/called/it/go/dead", and the emotion type of the first sentence is determined to be angry according to "the/sender/called/it/go/dead".
In this process, when the preset model is generated, the sample sentences are subjected to word segmentation and the resulting vocabularies are then split, so the preset model is trained on fewer uncommon vocabularies and fewer unreasonably segmented vocabularies. Correspondingly, when the emotion type of the first sentence is predicted with the preset model, the first sentence is subjected to word segmentation and the resulting vocabularies are split, so the preset model can accurately determine the emotion type of the first sentence from the vocabularies and/or sub-vocabularies obtained after word segmentation and splitting.
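The source does not state which kind of model the preset model is. As a stand-in only, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing on token lists that are assumed to have already passed through word segmentation and splitting. The names `train` and `predict` and the toy samples are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: iterable of (tokens, emotion_type) pairs, tokens already split."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    known_tokens = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        known_tokens.update(tokens)
    return label_counts, token_counts, known_tokens


def predict(model, tokens):
    """Return the emotion type with the highest smoothed log-probability."""
    label_counts, token_counts, known_tokens = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        denominator = sum(token_counts[label].values()) + len(known_tokens)
        score = math.log(count / total)
        for token in tokens:
            if token in known_tokens:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical sample data, for illustration only.
SAMPLES = [
    (["thank", "you"], "happy"),
    (["no", "refund"], "angry"),
    (["refund", "late"], "angry"),
]
model = train(SAMPLES)
```

Because the same splitting is applied to the training samples and to the first sentence, a vocabulary that is uncommon as a whole can still contribute through sub-vocabularies that the model has seen.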
Optionally, when the application scenario is an intelligent online customer service scenario (in which replies are generated automatically according to sentences input by a user), after the emotion type of the first sentence is determined, a response message corresponding to the first sentence may be determined according to the emotion type of the first sentence and sent to a client (for example, a user-side device such as a mobile phone or a computer). Because the method of the present application can accurately determine the emotion type of the first sentence, the response message corresponding to the first sentence can be determined accurately according to that emotion type, which improves the accuracy of determining the response message.
Optionally, when the application scenario is a voice service scenario (for example, one in which the first sentence is played), after the emotion type of the first sentence is determined, the first sentence may be played according to its emotion type. Because the method of the present application can accurately determine the emotion type of the first sentence, the first sentence can be played more vividly according to that emotion type, so that the played first sentence is closer to a real voice and the service quality of the voice service is improved.
According to the statement processing method provided by the embodiment of the invention, when the emotion type of a sentence is determined, the sentence is not only subjected to word segmentation, but the vocabularies obtained by word segmentation are also split. Splitting the segmented vocabularies reduces the probability that uncommon vocabularies and unreasonably segmented vocabularies appear. Therefore, determining the emotion type of the sentence according to the vocabularies that were not split and the sub-vocabularies obtained by splitting can improve the accuracy of determining the emotion type of the sentence.
Based on any of the above embodiments, the following describes in detail the process of generating the preset vocabulary set with reference to the embodiment shown in fig. 3.
Fig. 3 is a flowchart illustrating a method for generating a preset vocabulary set according to an embodiment of the present invention. Referring to fig. 3, the method may include:
S301, determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of the vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency.
For example, when the first sentence is a conversation sentence in online customer service, the preset sentence set may be the conversation sentences in online customer service within a preset time period.
For example, the preset time period may be the one month, half a year, one year, etc. before the current time.
Optionally, the vocabulary in the first vocabulary set may be M vocabularies with the highest frequency of occurrence in the preset sentence set.
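A minimal sketch of S301 is given below, assuming the preset sentence set has already been segmented into lists of vocabularies. The function name `first_vocabulary_set` and its parameters `m` and `first_frequency` are hypothetical names for the quantities described above.

```python
from collections import Counter


def first_vocabulary_set(segmented_sentences, m=None, first_frequency=0):
    """Select frequent vocabularies from the preset sentence set.

    segmented_sentences: list of sentences, each a list of vocabularies.
    m: if given, keep at most the M most frequent vocabularies.
    first_frequency: keep only vocabularies occurring more often than this.
    """
    counts = Counter(
        vocabulary
        for sentence in segmented_sentences
        for vocabulary in sentence
    )
    # most_common(None) returns every vocabulary, most frequent first.
    ranked = counts.most_common(m)
    return [vocabulary for vocabulary, count in ranked if count > first_frequency]


# Hypothetical segmented sentences, for illustration only.
SENTENCES = [["a", "b"], ["a", "c"], ["a", "b"]]
frequent = first_vocabulary_set(SENTENCES, first_frequency=1)
```

Either criterion can be used alone: pass only `first_frequency` for a pure threshold, or only `m` for the top-M variant.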
S302, splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises the sub-vocabulary of each vocabulary in the first vocabulary set.
Optionally, the sub-vocabularies corresponding to each vocabulary in the second vocabulary set have an association relationship, for example, if the second vocabulary set is a table (the table includes a plurality of rows and a column), the sub-vocabularies corresponding to each vocabulary occupy a row in the table corresponding to the second vocabulary set.
S303, acquiring the occurrence frequency of the sub-vocabulary pairs of the second vocabulary set in the preset sentence set.
The sub-vocabulary pair is two adjacent sub-vocabularies among the plurality of sub-vocabularies corresponding to one vocabulary.
Optionally, the sub-vocabulary pair is the first sub-vocabulary and the second sub-vocabulary among the plurality of sub-vocabularies corresponding to one vocabulary.
For example, suppose that the vocabulary "how to do" is split according to characters to obtain "what/how/do"; the sub-vocabulary pair in "what/how/do" is then "what" and "how". Suppose that the vocabulary "coupon" is split according to characters to obtain "benefit/offer/coupon"; the sub-vocabulary pair in "benefit/offer/coupon" is then "benefit" and "offer".
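The character split of S302 and the two notions of a sub-vocabulary pair can be sketched as follows. The helper names `split_by_characters`, `adjacent_pairs` and `leading_pair` are hypothetical, and ASCII strings stand in for Chinese text.

```python
def split_by_characters(vocabularies):
    """Build the second vocabulary set: one row of characters per vocabulary."""
    return {vocabulary: list(vocabulary) for vocabulary in vocabularies}


def adjacent_pairs(sub_vocabularies):
    """All pairs of neighbouring sub-vocabularies in one row."""
    return list(zip(sub_vocabularies, sub_vocabularies[1:]))


def leading_pair(sub_vocabularies):
    """The optional variant: only the first and second sub-vocabulary."""
    if len(sub_vocabularies) < 2:
        return None
    return (sub_vocabularies[0], sub_vocabularies[1])


second_set = split_by_characters(["abc", "de"])
```

Keeping one row per vocabulary preserves the association between a vocabulary and its sub-vocabularies that the table representation of the second vocabulary set provides.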
Optionally, the occurrence frequency of the sub-vocabulary pairs in the preset sentence set is obtained by adding the number of times of merging operations to the occurrence frequency of the merged vocabulary pairs in the preset sentence set.
For example, assuming that the vocabulary "how to do" is split according to characters to obtain "what/how/do", the sub-vocabulary pair is "what" and "how", and the occurrence frequency of this pair in the preset sentence set is the occurrence frequency, in the preset sentence set, of the vocabulary formed by merging "what" and "how", plus the number of merging operations, which is 1. After merging, "what/do" is obtained, the sub-vocabulary pair is "what" and "do", and the occurrence frequency of this pair in the preset sentence set is the occurrence frequency of the vocabulary "how to do" in the preset sentence set plus the number of merging operations, which is 2.
S304, when the occurrence frequency of the sub-vocabulary pairs in the preset sentence set is larger than a second frequency, combining the sub-vocabulary pairs into a vocabulary.
Optionally, the second frequency may be greater than the first frequency.
For example, assuming that the occurrence frequency of the sub-vocabulary pair "what" and "how" in the preset sentence set is greater than the second frequency, the sub-vocabulary pair is merged to obtain the vocabulary "what-how".
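The merge in S304 can be illustrated with a short Python sketch. It assumes a row of sub-vocabularies is a list of strings and that merging means concatenating the two adjacent items; `merge_pair` and `merge_if_frequent` are hypothetical helper names.

```python
def merge_pair(row, left, right):
    """Merge every adjacent (left, right) occurrence in one row of sub-vocabularies."""
    merged_row = []
    i = 0
    while i < len(row):
        if i + 1 < len(row) and row[i] == left and row[i + 1] == right:
            merged_row.append(left + right)
            i += 2  # both items of the pair were consumed
        else:
            merged_row.append(row[i])
            i += 1
    return merged_row


def merge_if_frequent(row, left, right, frequency, second_frequency):
    """Apply S304: merge only when the pair's frequency exceeds the second frequency."""
    return merge_pair(row, left, right) if frequency > second_frequency else row


print(merge_if_frequent(["a", "b", "c"], "a", "b", frequency=5, second_frequency=3))
# ['ab', 'c']
```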
S305, judging whether the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value.
If yes, go to S306.
If not, go to S303.
Optionally, S306 may be executed after S303-S304 are repeatedly executed N times. N is an integer greater than 1.
Optionally, the number of words included in the second set of words is the sum of the number of words and sub-words.
For example, referring to Table 5, the first 3 rows of the second vocabulary set include 6 vocabularies.
S306, determining the second vocabulary set as a preset vocabulary set.
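Steps S302 to S306 can be put together in a compact Python sketch. It is written under several assumptions that the text leaves open: the sub-vocabulary pair is taken to be the first and second sub-vocabulary of each row (the optional variant above), the pair frequency is a substring count plus the merging-round index, and a hypothetical `max_rounds` bound plays the role of N. The placeholder words and sentences are invented for illustration.

```python
def build_preset_vocabulary(first_vocabulary_set, preset_sentences,
                            second_frequency, preset_threshold, max_rounds=10):
    # S302: split every vocabulary into single characters.
    rows = [list(word) for word in first_vocabulary_set]
    for merge_count in range(1, max_rounds + 1):
        new_rows = []
        for row in rows:
            if len(row) >= 2:
                merged = row[0] + row[1]  # first and second sub-vocabulary
                # S303: occurrences of the merged string plus the merge-operation count.
                frequency = sum(s.count(merged) for s in preset_sentences) + merge_count
                if frequency > second_frequency:  # S304
                    row = [merged] + row[2:]
            new_rows.append(row)
        unchanged = new_rows == rows
        rows = new_rows
        # S305: stop once the number of vocabularies and sub-vocabularies is small enough.
        if sum(len(row) for row in rows) <= preset_threshold or unchanged:
            break
    return rows  # S306: the second vocabulary set becomes the preset vocabulary set


preset = build_preset_vocabulary(
    ["abc", "xyz"], ["abc abc xyz", "ab xy"], second_frequency=3, preset_threshold=3
)
print(preset)  # [['abc'], ['xy', 'z']]
```

In this run the first round merges only "a" and "b"; the second round merges "ab" with "c" and "x" with "y", after which the set holds 3 items and the loop stops.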
The method shown in the embodiment of fig. 3 will be described in detail below by way of specific examples.
For example, assume a first vocabulary set is shown in Table 3:
TABLE 3
How to do
Coupon
Goods distributor
……
After the vocabularies in the first vocabulary set are split by character, the resulting second vocabulary set is shown in Table 4:
TABLE 4
What/how/do
Benefit/offer/coupon
Deliver/goods/commerce
……
Assuming that the occurrence frequencies of the sub-vocabulary pairs "what" and "how", "benefit" and "offer", and "deliver" and "goods" in the preset sentence set are each greater than the second frequency, these sub-vocabulary pairs are merged to obtain the second vocabulary set shown in Table 5 (a hyphen joins the characters of a merged sub-vocabulary):
TABLE 5
What-how/do
Benefit-offer/coupon
Deliver-goods/commerce
……
Assuming that the number of vocabularies in the second vocabulary set shown in Table 5 is greater than the preset threshold, the merging processing operation continues. Assuming that the occurrence frequency of the sub-vocabulary pair "benefit-offer" and "coupon" in the preset sentence set is greater than the second frequency, the sub-vocabulary pair is merged to obtain the second vocabulary set shown in Table 6:
TABLE 6
What-how/do
Coupon
Deliver-goods/commerce
……
Assuming that the number of vocabularies in the second vocabulary set shown in Table 6 is less than or equal to the preset threshold, the second vocabulary set shown in Table 6 is determined as the preset vocabulary set.
With the method shown in the embodiment of fig. 3, low-frequency vocabularies (those with a low occurrence frequency) are filtered out and unreasonable word segmentations are split, so that the resulting preset vocabulary set includes high-frequency vocabularies (those with a high occurrence frequency) with reasonable segmentation.
With the method disclosed in the present application, the accuracy of determining the emotion type is noticeably improved. For example, for the emotion type "angry", the accuracy improves from 74.8% to 77.6%; for the emotion type "anxiety", from 82.3% to 85.8%; and for the emotion type "lost", from 84.1% to 85.3%.
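As a quick check on the reported figures, the following Python lines compute the absolute gain for each emotion type in percentage points. Only the six accuracy values come from the text; the variable names are illustrative.

```python
accuracy_before = {"angry": 74.8, "anxiety": 82.3, "lost": 84.1}
accuracy_after = {"angry": 77.6, "anxiety": 85.8, "lost": 85.3}

# Absolute improvement in percentage points, rounded to one decimal place.
gains = {
    emotion: round(accuracy_after[emotion] - accuracy_before[emotion], 1)
    for emotion in accuracy_before
}
print(gains)  # {'angry': 2.8, 'anxiety': 3.5, 'lost': 1.2}
```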
Fig. 4 is a schematic structural diagram of a statement processing apparatus according to an embodiment of the present invention. Referring to fig. 4, the sentence processing apparatus 10 may include a word segmentation module 11, a splitting module 12, and a determining module 13, wherein,
the word segmentation module 11 is configured to perform word segmentation processing on the first sentence to obtain at least one first vocabulary;
the splitting module 12 is configured to split the at least one first vocabulary, where the splitting is configured to split one vocabulary into at least two sub vocabularies;
the determining module 13 is configured to determine an emotion type of the first sentence according to the split at least one first vocabulary.
The statement processing apparatus provided in the embodiment of the present invention may execute the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
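The three-module structure of fig. 4 can be pictured as a small pipeline. The Python sketch below is only an illustration: the class name, the callable-based design, and the toy module implementations (whitespace segmentation, per-character splitting, and a rule that returns "angry" or "other") are assumptions and are not the modules of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SentenceProcessingApparatus:
    segment: Callable[[str], List[str]]    # word segmentation module 11
    split: Callable[[str], List[str]]      # splitting module 12
    determine: Callable[[List[str]], str]  # determining module 13

    def process(self, first_sentence: str) -> str:
        first_vocabularies = self.segment(first_sentence)
        # Flatten the sub-vocabularies of all first vocabularies into one sequence.
        sub_vocabularies = [sub for word in first_vocabularies for sub in self.split(word)]
        return self.determine(sub_vocabularies)


# Toy stand-ins for the three modules, for illustration only.
apparatus = SentenceProcessingApparatus(
    segment=str.split,
    split=list,
    determine=lambda subs: "angry" if "!" in subs else "other",
)
print(apparatus.process("late again !"))  # angry
```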
Fig. 5 is a schematic structural diagram of another statement processing apparatus according to an embodiment of the present invention. On the basis of the embodiment shown in fig. 4, referring to fig. 5, the sentence processing apparatus 10 may further include a first obtaining module 14, wherein,
the first obtaining module 14 is configured to obtain a preset vocabulary set, where the preset vocabulary set includes at least two sub vocabularies corresponding to a plurality of second vocabularies and/or a plurality of third vocabularies, and frequencies of the second vocabularies and the third vocabularies appearing in the preset sentence set are greater than a first frequency;
the splitting module 12 is specifically configured to, for any first vocabulary of the at least one first vocabulary, split the first vocabulary according to the first vocabulary and the preset vocabulary set.
In a possible implementation, the splitting module 12 is specifically configured to:
judging whether the plurality of second vocabularies or the plurality of third vocabularies include the first vocabulary;
if so, splitting the first vocabulary according to the preset vocabulary set;
if not, splitting the first vocabulary according to characters, wherein the sub-vocabulary of the first vocabulary after being split according to characters comprises one character.
In a possible implementation, the splitting module 12 is specifically configured to:
when the first vocabulary is the same as a second vocabulary in the preset vocabulary set, the first vocabulary is not split;
when the first vocabulary is the same as a third vocabulary corresponding to at least two sub vocabularies in the preset vocabulary set, splitting the first vocabulary into the at least two sub vocabularies.
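The splitting rules of the splitting module 12 can be sketched as a single lookup. The sketch assumes the preset vocabulary set is stored as a dictionary in which a second vocabulary maps to itself (it is not split) and a third vocabulary maps to its at least two sub-vocabularies; this representation and the function name are hypothetical.

```python
def split_first_vocabulary(first_vocabulary, preset_vocabulary_set):
    """Split according to the preset vocabulary set, else fall back to single characters."""
    if first_vocabulary in preset_vocabulary_set:
        # Second vocabulary: returned whole. Third vocabulary: its stored sub-vocabularies.
        return list(preset_vocabulary_set[first_vocabulary])
    # Not in the preset vocabulary set: each sub-vocabulary is one character.
    return list(first_vocabulary)


preset_vocabulary_set = {"abc": ["abc"], "xyz": ["xy", "z"]}
print(split_first_vocabulary("abc", preset_vocabulary_set))  # ['abc']
print(split_first_vocabulary("xyz", preset_vocabulary_set))  # ['xy', 'z']
print(split_first_vocabulary("pq", preset_vocabulary_set))   # ['p', 'q']
```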
In a possible implementation, the first obtaining module 14 is specifically configured to:
when the preset vocabulary set is included in a preset storage space, acquiring the preset vocabulary set in the preset storage space;
and when the preset vocabulary set is not included in the preset storage space, generating the preset vocabulary set.
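The obtain-or-generate behaviour of the first obtaining module 14 can be sketched as follows. The sketch assumes that the preset storage space is a JSON file and that a freshly generated set is written back to it; the write-back step, the file name, and the function names are assumptions that the text does not state.

```python
import json
import os
import tempfile


def obtain_preset_vocabulary_set(path, generate):
    """Read the preset vocabulary set from storage if present; otherwise generate it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    preset = generate()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(preset, f, ensure_ascii=False)  # assumed: store for later calls
    return preset


generation_calls = []


def generate_example():
    generation_calls.append(1)
    return {"xyz": ["xy", "z"]}


with tempfile.TemporaryDirectory() as storage:
    path = os.path.join(storage, "preset_vocabulary.json")
    first = obtain_preset_vocabulary_set(path, generate_example)   # generated
    second = obtain_preset_vocabulary_set(path, generate_example)  # read from storage

print(first == second, len(generation_calls))  # True 1
```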
In a possible implementation, the first obtaining module 14 is specifically configured to:
determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency;
splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises sub vocabulary of each vocabulary in the first vocabulary set;
and generating the preset vocabulary set according to the second vocabulary set.
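Determining the first vocabulary set amounts to a frequency filter. The Python sketch below assumes that the preset sentence set has already been segmented into vocabularies and that the occurrence frequency is a raw count; the function name and sample data are illustrative.

```python
from collections import Counter


def determine_first_vocabulary_set(segmented_sentences, first_frequency):
    """Keep the vocabularies whose count in the sentence set exceeds the first frequency."""
    counts = Counter(word for sentence in segmented_sentences for word in sentence)
    return [word for word, count in counts.items() if count > first_frequency]


sentences = [["abc", "xyz"], ["abc", "pq"], ["abc", "xyz"]]
print(determine_first_vocabulary_set(sentences, first_frequency=1))  # ['abc', 'xyz']
```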
In a possible implementation, the first obtaining module 14 is specifically configured to:
performing a merging processing operation on the sub-vocabularies in the second vocabulary set, wherein the merging processing operation comprises: determining a sub-vocabulary pair in the second vocabulary set, and merging the sub-vocabulary pair into one vocabulary when the occurrence frequency of the sub-vocabulary pair in the preset sentence set is greater than a second frequency, wherein the sub-vocabulary pair is two adjacent sub-vocabularies among a plurality of sub-vocabularies corresponding to one vocabulary;
and repeatedly executing the merging processing operation for N times, or repeatedly executing the merging processing operation until the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value, wherein N is an integer greater than 1.
In a possible implementation, the determining module 13 is specifically configured to:
processing the split at least one first vocabulary according to a preset model to obtain the emotion type of the first sentence, wherein the preset model is obtained according to a plurality of groups of sample data, and each group of sample data comprises a sample sentence and the emotion type corresponding to the sample sentence.
In a possible embodiment, the apparatus further comprises a second obtaining module 15 and a generating module 16, wherein,
the second obtaining module 15 is configured to obtain the multiple sets of sample data;
the word segmentation module 11 is further configured to perform word segmentation processing on each sample sentence in the multiple sets of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence;
the splitting module 12 is further configured to split at least one fourth vocabulary corresponding to each sample sentence;
the generating module 16 is configured to generate the preset model according to the split at least one fourth vocabulary corresponding to each sample sentence and the emotion type corresponding to each sample sentence.
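The text does not specify the type of the preset model. Purely as an illustration, the Python sketch below swaps in a small naive Bayes classifier with add-one smoothing, trained on the split sub-vocabularies of each sample sentence and its emotion type. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def generate_preset_model(samples):
    """samples: list of (sub_vocabularies, emotion_type) pairs from the sample data."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, emotion in samples:
        label_counts[emotion] += 1
        token_counts[emotion].update(tokens)
        vocabulary.update(tokens)
    return label_counts, token_counts, vocabulary


def determine_emotion_type(model, tokens):
    """Return the emotion type with the highest smoothed log-probability."""
    label_counts, token_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for emotion, count in label_counts.items():
        denominator = sum(token_counts[emotion].values()) + len(vocabulary)
        score = math.log(count / total)
        for token in tokens:
            score += math.log((token_counts[emotion][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = emotion, score
    return best_label


samples = [
    (["slow", "refund"], "angry"),
    (["when", "arrive"], "anxiety"),
    (["slow", "again"], "angry"),
]
model = generate_preset_model(samples)
print(determine_emotion_type(model, ["slow"]))    # angry
print(determine_emotion_type(model, ["arrive"]))  # anxiety
```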
The statement processing apparatus provided in the embodiment of the present invention may execute the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 6 is a schematic diagram of a hardware structure of a sentence processing apparatus according to an embodiment of the present invention. As shown in fig. 6, the sentence processing apparatus 20 includes at least one processor 21 and a memory 22. The processor 21 and the memory 22 are connected by a bus 23.
Optionally, the sentence processing apparatus 20 further comprises a communication component. The communication component may comprise a transmitter and/or a receiver.
In a specific implementation process, the at least one processor 21 executes the computer-executable instructions stored in the memory 22, so that the at least one processor 21 executes the statement processing method as described in the embodiments of fig. 2 to 3.
For a specific implementation process of the processor 21, reference may be made to the method embodiments shown in fig. 2 to fig. 3, which implement the principle and the technical effect similarly, and this embodiment is not described herein again.
In the embodiment shown in fig. 6, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The present application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the statement processing method as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
The division of the units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A sentence processing method, comprising:
performing word segmentation processing on a first sentence to obtain at least one first vocabulary;
splitting the at least one first vocabulary, wherein the splitting process is used for splitting one vocabulary into at least two sub vocabularies;
and determining the emotion type of the first sentence according to the at least one first vocabulary after the splitting processing.
2. The method of claim 1, wherein, for any first vocabulary of the at least one first vocabulary, splitting the first vocabulary comprises:
acquiring a preset vocabulary set, wherein the preset vocabulary set comprises at least two sub-vocabularies corresponding to a plurality of second vocabularies and/or a plurality of third vocabularies, and the frequency of the second vocabularies and the third vocabularies appearing in the preset sentence set is greater than a first frequency;
and splitting the first vocabulary according to the first vocabulary and the preset vocabulary set.
3. The method of claim 2, wherein the splitting the first vocabulary according to the first vocabulary and the preset vocabulary set comprises:
judging whether the plurality of second vocabularies or the plurality of third vocabularies include the first vocabulary;
if so, splitting the first vocabulary according to the preset vocabulary set;
if not, splitting the first vocabulary according to characters, wherein the sub-vocabulary of the first vocabulary after being split according to characters comprises one character.
4. The method of claim 3, wherein the splitting the first vocabulary according to the preset vocabulary set comprises:
when the first vocabulary is the same as a second vocabulary in the preset vocabulary set, the first vocabulary is not split;
when the first vocabulary is the same as a third vocabulary corresponding to at least two sub vocabularies in the preset vocabulary set, splitting the first vocabulary into the at least two sub vocabularies.
5. The method according to any one of claims 2-4, wherein the obtaining a preset vocabulary set comprises:
when the preset vocabulary set is included in a preset storage space, acquiring the preset vocabulary set in the preset storage space;
and when the preset vocabulary set is not included in the preset storage space, generating the preset vocabulary set.
6. The method of claim 5, wherein the generating the preset vocabulary set comprises:
determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency;
splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises sub vocabulary of each vocabulary in the first vocabulary set;
and generating the preset vocabulary set according to the second vocabulary set.
7. The method of claim 6, wherein the generating the preset vocabulary set according to the second vocabulary set comprises:
performing a merging processing operation on the sub-vocabularies in the second vocabulary set, wherein the merging processing operation comprises: determining a sub-vocabulary pair in the second vocabulary set, and merging the sub-vocabulary pair into one vocabulary when the occurrence frequency of the sub-vocabulary pair in the preset sentence set is greater than a second frequency, wherein the sub-vocabulary pair is two adjacent sub-vocabularies among a plurality of sub-vocabularies corresponding to one vocabulary;
and repeatedly executing the merging processing operation for N times, or repeatedly executing the merging processing operation until the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value, wherein N is an integer greater than 1.
8. The method according to any one of claims 1-7, wherein the determining the emotion type of the first sentence according to the at least one first vocabulary after the splitting process comprises:
processing the split at least one first vocabulary according to a preset model to obtain the emotion type of the first sentence, wherein the preset model is obtained according to a plurality of groups of sample data, and each group of sample data comprises a sample sentence and the emotion type corresponding to the sample sentence.
9. The method of claim 8, further comprising:
acquiring the multiple groups of sample data;
performing word segmentation processing on each sample sentence in the multiple groups of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence;
splitting at least one fourth vocabulary corresponding to each sample sentence;
and generating the preset model according to the split at least one fourth vocabulary corresponding to each sample sentence and the emotion type corresponding to each sample sentence.
10. A sentence processing apparatus, comprising a word segmentation module, a splitting module and a determining module, wherein,
the word segmentation module is used for performing word segmentation processing on a first sentence to obtain at least one first vocabulary;
the splitting module is used for splitting the at least one first vocabulary, and the splitting process is used for splitting one vocabulary into at least two sub-vocabularies;
the determining module is configured to determine an emotion type of the first sentence according to the split at least one first vocabulary.
11. The apparatus of claim 10, further comprising a first obtaining module, wherein,
the first obtaining module is used for acquiring a preset vocabulary set, wherein the preset vocabulary set comprises at least two sub-vocabularies corresponding to a plurality of second vocabularies and/or a plurality of third vocabularies, and the frequency of the second vocabularies and the third vocabularies appearing in the preset sentence set is greater than a first frequency;
the splitting module is specifically configured to split any one of the at least one first vocabulary according to the first vocabulary and the preset vocabulary set.
12. The apparatus of claim 11, wherein the splitting module is specifically configured to:
judging whether the plurality of second vocabularies or the plurality of third vocabularies include the first vocabulary;
if so, splitting the first vocabulary according to the preset vocabulary set;
if not, splitting the first vocabulary according to characters, wherein the sub-vocabulary of the first vocabulary after being split according to characters comprises one character.
13. The apparatus of claim 12, wherein the splitting module is specifically configured to:
when the first vocabulary is the same as a second vocabulary in the preset vocabulary set, the first vocabulary is not split;
when the first vocabulary is the same as a third vocabulary corresponding to at least two sub vocabularies in the preset vocabulary set, splitting the first vocabulary into the at least two sub vocabularies.
14. The apparatus according to any one of claims 11 to 13, wherein the first obtaining module is specifically configured to:
when the preset vocabulary set is included in a preset storage space, acquiring the preset vocabulary set in the preset storage space;
and when the preset vocabulary set is not included in the preset storage space, generating the preset vocabulary set.
15. The apparatus of claim 14, wherein the first obtaining module is specifically configured to:
determining a first vocabulary set in the preset sentence set, wherein the occurrence frequency of vocabularies in the first vocabulary set in the preset sentence set is greater than the first frequency;
splitting the vocabulary in the first vocabulary set according to characters to obtain a second vocabulary set, wherein the second vocabulary set comprises sub vocabulary of each vocabulary in the first vocabulary set;
and generating the preset vocabulary set according to the second vocabulary set.
16. The apparatus of claim 15, wherein the first obtaining module is specifically configured to:
performing a merging processing operation on the sub-vocabularies in the second vocabulary set, wherein the merging processing operation comprises: determining a sub-vocabulary pair in the second vocabulary set, and merging the sub-vocabulary pair into one vocabulary when the occurrence frequency of the sub-vocabulary pair in the preset sentence set is greater than a second frequency, wherein the sub-vocabulary pair is two adjacent sub-vocabularies among a plurality of sub-vocabularies corresponding to one vocabulary;
and repeatedly executing the merging processing operation for N times, or repeatedly executing the merging processing operation until the number of the vocabularies included in the second vocabulary set is less than or equal to a preset threshold value, wherein N is an integer greater than 1.
17. The apparatus according to any one of claims 10 to 16, wherein the determining module is specifically configured to:
processing the split at least one first vocabulary according to a preset model to obtain the emotion type of the first sentence, wherein the preset model is obtained according to a plurality of groups of sample data, and each group of sample data comprises a sample sentence and the emotion type corresponding to the sample sentence.
18. The apparatus of claim 17, further comprising a second obtaining module and a generating module, wherein,
the second obtaining module is used for obtaining the multiple groups of sample data;
the word segmentation module is further used for performing word segmentation processing on each sample sentence in the multiple groups of sample data to obtain at least one fourth vocabulary corresponding to each sample sentence;
the splitting module is further used for splitting at least one fourth vocabulary corresponding to each sample sentence;
the generating module is used for generating the preset model according to the split at least one fourth vocabulary corresponding to each sample sentence and the emotion type corresponding to each sample sentence.
19. A sentence processing apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the sentence processing method of any of claims 1 to 9.
20. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the sentence processing method of any of claims 1-9.
CN201910057044.0A 2019-01-22 2019-01-22 Statement processing method, device and equipment Pending CN111460805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910057044.0A CN111460805A (en) 2019-01-22 2019-01-22 Statement processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN111460805A (en) 2020-07-28

Family

ID=71683319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910057044.0A Pending CN111460805A (en) 2019-01-22 2019-01-22 Statement processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111460805A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132300A1 (en) * 2014-11-12 2016-05-12 International Business Machines Corporation Contraction aware parsing system for domain-specific languages
CN106776566A (en) * 2016-12-22 2017-05-31 东软集团股份有限公司 The recognition methods of emotion vocabulary and device
CN106873801A (en) * 2017-02-28 2017-06-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating the combination of the entry in input method dictionary
CN107945033A (en) * 2017-11-14 2018-04-20 李勇 A kind of analysis method of network public-opinion, system and relevant apparatus
CN108182173A (en) * 2017-12-27 2018-06-19 福建中金在线信息科技有限公司 A kind of method, apparatus and electronic equipment for extracting keyword

Similar Documents

Publication Publication Date Title
CN105389722B (en) Malicious order identification method and device
CN108256098B (en) Method and device for determining emotional tendency of user comment
CN110569502A (en) Method and device for identifying forbidden slogans, computer equipment and storage medium
CN111737961B (en) Method and device for generating story, computer equipment and medium
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN114116973A (en) Multi-document text duplicate checking method, electronic equipment and storage medium
CN110489674B (en) Page processing method, device and equipment
CN111079379A (en) Shape and proximity character acquisition method and device, electronic equipment and storage medium
CN113850386A (en) Model pre-training method, device, equipment, storage medium and program product
CN113934834A (en) Question matching method, device, equipment and storage medium
CN112184143B (en) Model training method, device and equipment in compliance audit rule
CN111507250B (en) Image recognition method, device and storage medium
CN112926471A (en) Method and device for identifying image content of business document
CN111353836B (en) Commodity recommendation method, device and equipment
CN112527967A (en) Text matching method, device, terminal and storage medium
CN111460805A (en) Statement processing method, device and equipment
CN110245224B (en) Dialog generation method and device
CN111523322A (en) Requirement document quality evaluation model training method and requirement document quality evaluation method
CN111507114A (en) Reverse translation-based spoken language text enhancement method and system
CN111179129A (en) Courseware quality evaluation method and device, server and storage medium
CN110879832A (en) Target text detection method, model training method, device and equipment
CN110795537B (en) Method, device, equipment and medium for determining improvement strategy of target commodity
CN111581347A (en) Sentence similarity matching method and device
CN110971759A (en) Processing method and device for unsubscribed short message and server
JP7195236B2 (en) Response evaluation device, response evaluation method, and computer program

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20210528

Address after: 100176 room 1004, 10th floor, building 1, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Beijing Huijun Technology Co.,Ltd.

Address before: 100086 8th floor, 76 Zhichun Road, Haidian District, Beijing

Applicant before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd.

SE01 Entry into force of request for substantive examination