CN110347818B - Word segmentation statistical method and device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN110347818B
Authority
CN
China
Prior art keywords
consultation
word segmentation
phrases
user
word
Legal status
Active
Application number
CN201910652117.0A
Other languages
Chinese (zh)
Other versions
CN110347818A (en)
Inventor
吴哲慧
张迪峰
陈璇斐
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN201910652117.0A
Publication of CN110347818A
Application granted
Publication of CN110347818B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services

Abstract

The application provides a word segmentation statistical method and apparatus, an electronic device and a computer-readable storage medium. When a user's consultation request is analyzed, the consultation question carried in the request and the device type of the user terminal that initiated the request are acquired. After the consultation question is split into a plurality of phrases, the phrases are stored in association with the device type. Consequently, when consultation answers are later constructed from the phrase statistics, they can be constructed separately for different device types. This yields targeted answers suited to different terminal devices and improves the service quality of intelligent customer service.

Description

Word segmentation statistical method and device, electronic equipment and computer readable storage medium
Technical Field
The application relates to the technical field of intelligent customer service, and in particular to a word segmentation statistical method, a word segmentation statistical apparatus, an electronic device and a computer-readable storage medium.
Background
With the development of human-computer interaction technology, intelligent customer service has been widely applied in many fields. Based on the consultation information entered by a user, an intelligent customer service system automatically outputs corresponding reply content, which helps reduce the workload of human customer service staff. Such a system generally works by analyzing users' consultation content in advance and constructing corresponding consultation answers, so that when a user asks a question, a matching answer can be found and fed back to the user.
As described above, users' consultation content must be analyzed in advance in order to construct consultation answers. In the prior art, however, when a background server analyzes users' consultation questions, it only counts and records the content of the questions and does not consider the specific application scenario, so the quality of service leaves room for improvement.
Disclosure of Invention
In order to overcome at least the above-mentioned deficiencies in the prior art, an object of the present application is to provide a word segmentation statistical method, apparatus, electronic device and computer readable storage medium.
In a first aspect, an embodiment of the present invention provides a word segmentation statistical method, where the method includes:
receiving a consultation request, acquiring the consultation question carried in the consultation request, and acquiring the device type of the user terminal that initiated the consultation request;
performing a word segmentation operation on the consultation question based on a pre-established custom user dictionary, thereby splitting the consultation question into a plurality of phrases;
and storing the obtained phrases in association with the device type.
In an alternative embodiment, the method further comprises:
at every first preset time interval, counting the number of occurrences of each phrase associated with each device type obtained within that interval;
and obtaining hot phrases from the counted numbers of phrases associated with each device type, so that customer service personnel can construct corresponding consultation answers according to the hot phrases of each device type.
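The per-interval counting and hot-phrase selection described above can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the application's implementation. The class `HotPhraseCounter`, its methods, and the rule "top N by count, ties broken alphabetically" are all hypothetical. A real deployment would also persist the counts instead of keeping them in a `HashMap`.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical per-interval counter of phrases keyed by device type. */
class HotPhraseCounter {
    // deviceType -> (phrase -> count) for the current first preset time interval
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Record the phrases of one consultation question for a device type. */
    void record(String deviceType, Collection<String> phrases) {
        Map<String, Integer> perDevice = counts.computeIfAbsent(deviceType, k -> new HashMap<>());
        for (String phrase : phrases) {
            perDevice.merge(phrase, 1, Integer::sum);
        }
    }

    /** Return the topN most frequent phrases for a device type (ties broken alphabetically). */
    List<String> hotPhrases(String deviceType, int topN) {
        Map<String, Integer> perDevice = counts.getOrDefault(deviceType, Collections.emptyMap());
        return perDevice.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Called when a first preset time interval ends, so the next interval starts from zero. */
    void resetInterval() {
        counts.clear();
    }
}
```

Suppose the phone-side phrases are "recharge" and "game coin" from one question, plus "recharge" and "ask" from another. Then `hotPhrases("phone", 1)` returns `[recharge]`.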
In an optional embodiment, after the step of performing a word segmentation operation on the consultation question based on a pre-established custom user dictionary and splitting it into a plurality of phrases, the method further includes:
obtaining a plurality of pre-established invalid words corresponding to the device type;
detecting whether each of the obtained phrases matches any one of the invalid words;
and deleting each phrase that matches any one of the invalid words.
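A minimal sketch of the invalid-word filtering step follows. It assumes that "matches" means an exact string match and that the invalid words are kept in memory per device type. Both assumptions are illustrative, and the class name `InvalidWordFilter` is hypothetical.

```java
import java.util.*;

/** Hypothetical filter that drops phrases matching a device type's invalid-word list. */
class InvalidWordFilter {
    // deviceType -> pre-established invalid words for that device type
    private final Map<String, Set<String>> invalidWordsByDevice;

    InvalidWordFilter(Map<String, Set<String>> invalidWordsByDevice) {
        this.invalidWordsByDevice = invalidWordsByDevice;
    }

    /** Keep only the phrases that do not exactly match an invalid word of the given device type. */
    List<String> filter(String deviceType, List<String> phrases) {
        Set<String> invalid = invalidWordsByDevice.getOrDefault(deviceType, Collections.emptySet());
        List<String> kept = new ArrayList<>();
        for (String phrase : phrases) {
            if (!invalid.contains(phrase)) {
                kept.add(phrase);
            }
        }
        return kept;
    }
}
```

Because the lists are keyed by device type, a word can be treated as noise for one device type and still be kept for another.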
In an optional embodiment, after the step of performing a word segmentation operation on the consultation question based on a pre-established custom user dictionary and splitting it into a plurality of phrases, the method further includes:
performing a de-duplication operation on the obtained phrases.
In an alternative embodiment, the method further comprises:
for each consultation request received within a second preset duration, performing a word segmentation operation on the consultation question carried in the request by using a pre-stored initial user dictionary, thereby splitting the consultation question into a plurality of phrases;
counting the number of occurrences of each phrase obtained within the second preset duration, and constructing keywords based on the number and the content of each phrase;
and adding the constructed keywords to the initial user dictionary to obtain the custom user dictionary.
In an optional embodiment, after the step of performing a word segmentation operation on the consultation question carried in each consultation request by using the pre-stored initial user dictionary and splitting it into a plurality of phrases, the method further includes:
obtaining the part of speech of each phrase;
and filtering out, from the obtained phrases, those whose part of speech is a preset specific part of speech.
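The part-of-speech filter can be illustrated with the self-contained sketch below. The `TaggedPhrase` type and the tag labels (`adj`, `adv`, `interj`, `aux`, `symbol`) are hypothetical placeholders. They are not the tag set of any particular segmentation tool, and the sketch assumes the segmenter has already attached a tag to each phrase.

```java
import java.util.*;

/** A phrase paired with its part-of-speech tag (hypothetical type for this sketch). */
class TaggedPhrase {
    final String word;
    final String pos;

    TaggedPhrase(String word, String pos) {
        this.word = word;
        this.pos = pos;
    }
}

class PosFilter {
    // Preset "specific" parts of speech to drop; the labels are illustrative placeholders
    static final Set<String> FILTERED_POS =
            new HashSet<>(Arrays.asList("adj", "adv", "interj", "aux", "symbol"));

    /** Return the words whose part of speech is not one of the preset filtered ones, in order. */
    static List<String> keepMeaningful(List<TaggedPhrase> phrases) {
        List<String> kept = new ArrayList<>();
        for (TaggedPhrase p : phrases) {
            if (!FILTERED_POS.contains(p.pos)) {
                kept.add(p.word);
            }
        }
        return kept;
    }
}
```

Take the phrases "may I ask", "game", "coin", "how" (adverb), "recharge" and "?" (symbol). The sketch keeps "may I ask", "game", "coin" and "recharge".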
In an optional embodiment, before the step of counting the number of occurrences of each phrase obtained within the second preset duration, the method further includes:
within the second preset duration, caching the phrases obtained in each preset time interval, and storing the cached phrases into a database when that interval ends.
In an alternative embodiment, the method further comprises:
searching whether there is a consultation answer that corresponds to the consultation question and matches the device type;
if so, not processing the consultation request further;
and if not, performing a word segmentation operation on the consultation question based on the pre-established custom user dictionary, thereby splitting the consultation question into a plurality of phrases.
In a second aspect, an embodiment of the present invention provides a word segmentation statistical apparatus, where the apparatus includes:
an acquisition module, configured to receive a consultation request, acquire the consultation question carried in the consultation request, and acquire the device type of the user terminal that initiated the consultation request;
a word segmentation module, configured to perform a word segmentation operation on the consultation question based on a pre-established custom user dictionary, thereby splitting the consultation question into a plurality of phrases;
and a storage module, configured to store the obtained phrases in association with the device type.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes one or more storage media and one or more processors in communication with the storage media, where the one or more storage media store machine-executable instructions executable by the processors, and when the electronic device runs, the processors execute the machine-executable instructions to perform the word segmentation statistical method described in any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores machine-executable instructions, and the machine-executable instructions, when executed, implement the word segmentation statistical method described in any one of the foregoing embodiments.
Compared with the prior art, the present application has the following beneficial effects:
with the word segmentation statistical method and apparatus, electronic device and computer-readable storage medium provided by the present application, the consultation question carried in a user's consultation request and the device type of the user terminal that initiated the request are acquired when the request is analyzed. After the consultation question is split into a plurality of phrases, the phrases are stored in association with the device type. Consequently, when consultation answers are later constructed from the phrase statistics, they can be constructed separately for different device types. This yields targeted answers suited to different terminal devices and improves the service quality of intelligent customer service.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. Those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic view of an application scenario of the word segmentation statistical method provided in the embodiment of the present application.
Fig. 2 is a flowchart of a word segmentation statistical method according to an embodiment of the present application.
Fig. 3 is a flowchart of a method for establishing a custom user dictionary according to an embodiment of the present application.
Fig. 4 is a flowchart of a method for filtering out phrases based on phrase parts of speech according to the embodiment of the present application.
Fig. 5 is another flowchart of the word segmentation statistical method according to the embodiment of the present application.
Fig. 6 is a flowchart of a method for filtering out a phrase based on an invalid word according to an embodiment of the present application.
Fig. 7 is another flowchart of a word segmentation statistical method according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Fig. 9 is a functional block diagram of a word segmentation statistical apparatus according to an embodiment of the present application.
Reference numerals: 100-server; 110-storage medium; 120-processor; 130-word segmentation statistical apparatus; 131-acquisition module; 132-word segmentation module; 133-storage module; 140-communication interface; 200-user terminal.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Fig. 1 is a schematic view of an application scenario of the word segmentation statistical method according to an embodiment of the present application. The scenario includes a server 100 and a plurality of user terminals 200 communicating with the server 100. Each user terminal 200 can run different application programs, and each application program can also run on different user terminals 200. Users perform the corresponding application operations through these application programs.
The server 100 is a background server that provides intelligent customer service functions for each application program. For example, a user may use the user terminal 200 to initiate a consultation request to the server 100. The server 100 may then analyze the question in the request to obtain a corresponding consultation answer and feed it back to the user terminal 200 for presentation to the user.
In this embodiment, the user terminals 200 may be terminal devices of different device types, such as, but not limited to, smart phones, personal digital assistants, tablet computers, personal computers, notebook computers, virtual reality devices and augmented reality devices. These terminal devices can be grouped by device type, for example into user terminals 200 of a first device type, user terminals 200 of a second device type, and so on. Each user terminal 200 may have Internet products installed that provide various application functions. For example, an Internet product may be an app, a web page or a mini program related to those functions and used on a computer or smart phone.
Fig. 2 is a flowchart of a word segmentation statistical method provided in an embodiment of the present application, which can be executed by the server 100 shown in Fig. 1. It should be understood that, in other embodiments, the order of some steps of the method may be changed according to actual needs, and some steps may be omitted. The detailed steps of the method are described below.
Step S210, receiving a consultation request, obtaining the consultation question carried in the request, and obtaining the device type of the user terminal 200 that initiated the request.
Step S220, performing a word segmentation operation on the consultation question based on a pre-established custom user dictionary, splitting it into a plurality of phrases.
Step S230, storing the obtained phrases in association with the device type.
Users ask a wide variety of questions, and even for the same question, different users may phrase it differently because of different grammar and expression habits. If every differently phrased question had to be analyzed separately and given its own consultation answer, the processing burden would be very large.
In this embodiment, the goal is to analyze users' consultation questions accurately while reducing the amount of information to be processed. To that end, the established custom user dictionary is used to segment each consultation question into a plurality of phrases, so that key information can be extracted from a large number of differently phrased questions. Corresponding consultation answers can then be constructed from statistics on the resulting phrases, which reduces the amount of information processing.
In addition, user terminals 200 of different device types have different functional components. The same application program may therefore behave differently at run time on each of them, and users of different device types may need different information. Take a smart phone and a personal computer as an example. When the same application program runs on both, the personal computer, with its more complete functional components, can display more information, whereas some of that information may not be displayed on the smart phone. Users of the smartphone version may therefore often ask questions that users of the personal computer version never need to ask.
It is therefore necessary to analyze separately the consultation questions initiated from user terminals 200 of different device types, so that targeted consultation answers suited to each device type can subsequently be constructed.
Likewise, users of application programs of different application types running on user terminals 200 of the same device type also ask different questions. For example, for some application types users often ask how to recharge game coins, while for other application types they often ask how to turn on the XXX function.
It is therefore necessary to analyze separately the consultation questions associated with user terminals 200 of different device types and with application programs of different application types, so as to adapt to different device types and application types.
In this embodiment, the device type of the user terminal 200 that initiates the consultation request is obtained. The device type can be determined from the receiving port on which the consultation request arrives, or it can be distinguished by a flag bit carried in the consultation request.
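The two ways of determining the device type mentioned above can be combined as in the sketch below. The sketch checks the flag bit first and falls back to the receiving port. The bit positions, the `DeviceType` values and the port mapping are all hypothetical, because the application does not specify a request format.

```java
import java.util.*;

/** Hypothetical device types used only for this sketch. */
enum DeviceType { MOBILE, PC, UNKNOWN }

class DeviceTypeResolver {
    static final int MOBILE_BIT = 1;      // hypothetical: bit 0 marks a mobile client
    static final int PC_BIT = 1 << 1;     // hypothetical: bit 1 marks a PC client

    // Hypothetical mapping from the receiving port to the device type it serves
    private final Map<Integer, DeviceType> portToType;

    DeviceTypeResolver(Map<Integer, DeviceType> portToType) {
        this.portToType = portToType;
    }

    /** Prefer an explicit flag bit in the request; otherwise use the receiving port. */
    DeviceType resolve(Integer flags, int receivingPort) {
        if (flags != null) {
            if ((flags & MOBILE_BIT) != 0) return DeviceType.MOBILE;
            if ((flags & PC_BIT) != 0) return DeviceType.PC;
        }
        return portToType.getOrDefault(receivingPort, DeviceType.UNKNOWN);
    }
}
```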
After the word segmentation operation splits the user's consultation question into a plurality of phrases, the phrases are stored in association with the device type of the user terminal 200 that initiated the question. Alternatively, the phrases, the device type of the user terminal 200 and the application type of the application program currently running on the user terminal 200 are stored in association with one another. Statistical analysis can then be carried out separately for consultation questions of different device types and different application types, yielding targeted consultation answers.
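The association storage described above can be pictured with the in-memory sketch below, which uses a composite key of device type and (optional) application type. `PhraseAssociationStore` and its methods are hypothetical stand-ins for a database table. Passing a `null` application type models the variant that records only the device type.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for the phrase / device type / application type association table. */
class PhraseAssociationStore {
    /** Composite key; appType may be null when only the device type is recorded. */
    static final class Key {
        final String deviceType;
        final String appType;

        Key(String deviceType, String appType) {
            this.deviceType = deviceType;
            this.appType = appType;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return deviceType.equals(k.deviceType) && Objects.equals(appType, k.appType);
        }

        @Override public int hashCode() {
            return Objects.hash(deviceType, appType);
        }
    }

    // One entry per segmented consultation question, grouped by key
    private final Map<Key, List<List<String>>> rows = new HashMap<>();

    void store(String deviceType, String appType, List<String> phrases) {
        rows.computeIfAbsent(new Key(deviceType, appType), k -> new ArrayList<>())
            .add(new ArrayList<>(phrases));
    }

    List<List<String>> lookup(String deviceType, String appType) {
        return rows.getOrDefault(new Key(deviceType, appType), Collections.emptyList());
    }
}
```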
The word segmentation operation on the consultation question uses the pre-established custom user dictionary. The process of establishing this dictionary is described below with reference to Fig. 3.
Step S310, for each consultation request received within a second preset duration, performing a word segmentation operation on the consultation question carried in the request by using a pre-stored initial user dictionary, splitting the question into a plurality of phrases.
Step S320, counting the number of occurrences of each phrase obtained within the second preset duration, and constructing keywords based on the number and the content of each phrase.
Step S330, adding the constructed keywords to the initial user dictionary to obtain the custom user dictionary.
In this embodiment, the server 100 stores an initial user dictionary in advance, which may be the user dictionary of a word segmentation tool on the server 100. The tool can be a commonly used one such as Jieba, HanLP or SnowNLP; in this embodiment, Jieba may be used. In implementation, the Jieba segmenter may be invoked for the subsequent segmentation operations as follows:
import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import java.util.List;
JiebaSegmenter segmenter = new JiebaSegmenter();
// Segment the consultation question (content) in SEARCH mode
List<SegToken> list = segmenter.process(content, JiebaSegmenter.SegMode.SEARCH);
The initial user dictionary of the word segmentation tool is a corpus containing a plurality of annotated characters and words. The consultation question entered by the user can be segmented based on these annotated characters and words.
However, the initial user dictionary pre-stored in the server 100 is a general-purpose dictionary. Its annotated characters and words are not tailored to the application programs involved in the present application, so it may lack words or phrases commonly used in those programs. For example, suppose a user asks "May I ask how to recharge game coins?". Because "game coin" is not annotated in the initial user dictionary, "game" and "coin" are treated as separate words. Segmenting the question with the initial user dictionary may therefore split it into "may I ask", "game", "coin", "how", "recharge" and "?". The phrases produced with the initial user dictionary thus may not fully reflect what the user is asking about.
Therefore, in this embodiment, the segmentation results of consultation questions over a period of time are recorded and counted, and the statistics are provided to customer service staff. The staff set keywords according to these statistics and update the initial user dictionary accordingly. The dictionary used for segmentation is thereby continuously improved, so that the resulting phrases better reflect users' consultation intentions.
Besides splitting the consultation question into a plurality of phrases, the initial user dictionary can also be used to tag the part of speech of each phrase, such as adjective, quantifier, auxiliary word, interjection, noun or verb. Words of some parts of speech do not help improve intelligent customer service, and storing all of them would occupy considerable storage resources.
In view of the above, in one possible implementation, referring to fig. 4, the word segmentation statistical method further includes the following steps:
Step S410, obtaining the part of speech of each phrase.
Step S420, filtering out, from the obtained phrases, those whose part of speech is a preset specific part of speech.
In this embodiment, some specific parts of speech may be preset, namely those of words without practical meaning, such as adjectives, interjections, auxiliary words and symbols. The consultation question is first split into a plurality of phrases using the characters and words annotated in the initial user dictionary. The part of speech of each resulting phrase can then be obtained from the parts of speech annotated in that dictionary, and phrases whose part of speech is one of the preset specific parts of speech are filtered out. This reduces resource usage without affecting the analysis of the consultation question.
For example, suppose the phrases split using the initial user dictionary are "may I ask", "game", "coin", "how", "recharge" and "?". Among them, "how" is an adverb and "?" is a symbol. If the preset specific parts of speech include adverbs and symbols, "how" and "?" are filtered out, and the remaining phrases are "may I ask", "game", "coin" and "recharge". It should be understood that this description is only exemplary and can be adjusted according to specific situations in actual implementation.
In a possible implementation, phrases whose part of speech is one of the preset specific parts of speech can be filtered out by the following pseudo-code, which keeps only the phrases with a valid part of speech:
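import java.util.HashSet;
import java.util.Set;
Set<String> wordSet = new HashSet<>();
for (SegToken token : list) {
    // Part-of-speech tag of the token; assumes a segmenter whose tokens carry a POS tag
    String properties = token.properties;
    // CutConstUtil.isValueProperties: the application's own check that the tag is not a preset filtered part of speech
    if (CutConstUtil.isValueProperties(properties)) {
        wordSet.add(token.word);
    }
}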
In this embodiment, information about the phrases obtained by the word segmentation operation ultimately needs to be stored in a database. However, the consultation requests received by the server 100 are often concurrent, and accessing the database after every single word segmentation operation would make storage slow.
For this reason, in this embodiment, within the second preset duration, the phrases obtained during each preset time interval (for example, half an hour or one hour) may be cached. The cached phrases are stored into the database when that interval ends. Because reading from and writing to an in-memory cache is relatively fast, the database does not need to be accessed continuously, which speeds up information processing.
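The caching scheme described above is essentially a write-behind buffer. The sketch below assumes the cache holds per-phrase counts in a `ConcurrentHashMap` and that a plain map stands in for the database. The class `PhraseBuffer` and its methods are hypothetical. Each flush atomically removes a phrase's pending count, so increments that arrive during a flush are kept for the next interval rather than lost.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical write-behind buffer: phrase counts stay in memory and are flushed once per interval. */
class PhraseBuffer {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();
    // Stand-in for the database table (phrase -> accumulated count)
    private final Map<String, Long> database = new HashMap<>();

    /** Called concurrently by request-handling threads after each word segmentation operation. */
    void add(Collection<String> phrases) {
        for (String phrase : phrases) {
            pending.merge(phrase, 1L, Long::sum);
        }
    }

    /** Called when a preset time interval ends: move the buffered counts into the database. */
    synchronized void flush() {
        for (String phrase : new ArrayList<>(pending.keySet())) {
            Long count = pending.remove(phrase);
            if (count != null) {
                database.merge(phrase, count, Long::sum);
            }
        }
    }

    synchronized Map<String, Long> databaseSnapshot() {
        return new HashMap<>(database);
    }

    /** Schedule flush() every intervalMinutes (for example 30 for half an hour). */
    ScheduledExecutorService scheduleFlush(long intervalMinutes) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::flush, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
        return ses;
    }
}
```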
The consultation questions within the second preset duration are continuously analyzed and counted. As a result, the database records information such as the number of occurrences of each phrase produced by the initial user dictionary and the time each phrase was last recorded. This information can be provided to customer service staff, who can then set keywords according to users' consultation behavior and the segmentation results of the initial user dictionary over that period.
For example, if phrases such as "game" and "coin" occur frequently within the second preset duration, customer service staff may set "game coin" as a keyword. The keywords set by customer service staff are added to the initial user dictionary to obtain the custom user dictionary. In this way, phrases related to the application programs are continuously added to the user dictionary. Subsequent word segmentation based on this dictionary then produces results that are more accurate and better suited to the application programs.
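In the application, keywords are chosen by customer service staff. One possible aid, sketched here purely as an assumption, is to count how often two phrases appear next to each other across the questions segmented within the second preset duration. Pairs above a threshold are then listed for staff review. `KeywordSuggester` and the adjacency heuristic are hypothetical and are not described in the application.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical helper that proposes candidate keywords from frequently adjacent phrases. */
class KeywordSuggester {
    /**
     * @param segmentedQuestions phrases of each consultation question, in original order
     * @param minCount           minimum number of adjacent co-occurrences for a suggestion
     * @return concatenated candidate keywords, most frequent first (ties alphabetical)
     */
    static List<String> suggest(List<List<String>> segmentedQuestions, int minCount) {
        Map<String, Integer> pairCounts = new HashMap<>();
        for (List<String> phrases : segmentedQuestions) {
            for (int i = 0; i + 1 < phrases.size(); i++) {
                // Join adjacent phrases without a separator, as Chinese words are written without spaces
                pairCounts.merge(phrases.get(i) + phrases.get(i + 1), 1, Integer::sum);
            }
        }
        return pairCounts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The output is only a list of candidates. Staff would still decide whether a pair such as "game" + "coin" becomes the keyword "game coin".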
For example, suppose the keyword "game coin" has been added to the custom user dictionary. Segmenting the consultation question "May I ask how to recharge game coins?" with the custom user dictionary then yields "may I ask", "game coin", "how", "recharge" and "?". Compared with the result obtained using the initial user dictionary, the result based on the custom user dictionary better matches the user's consultation intention.
It should be understood that the custom user dictionary is first established from the segmentation results of the initial user dictionary and the keywords set by customer service staff. In actual implementation it then continues to be updated, so that it is continuously improved.
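The effect of adding a keyword can be reproduced with the toy segmenter below. It deliberately uses forward maximum matching, a simpler algorithm than the prefix-dictionary/DAG approach that Jieba uses. The sketch is meant only to show that adding one dictionary entry changes the split; it does not show how Jieba itself segments text. `FmmSegmenter` is a hypothetical name.

```java
import java.util.*;

/** Toy forward-maximum-matching segmenter (not Jieba's algorithm), for illustration only. */
class FmmSegmenter {
    private final Set<String> dictionary = new HashSet<>();
    private int maxWordLength = 1;

    FmmSegmenter(Collection<String> words) {
        words.forEach(this::addWord);
    }

    /** Add a keyword, as when a customer-service keyword is added to the user dictionary. */
    void addWord(String word) {
        dictionary.add(word);
        maxWordLength = Math.max(maxWordLength, word.length());
    }

    /** Greedily take the longest dictionary word at each position; unknown characters stand alone. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxWordLength, text.length() - i);
            // Shrink the window until a dictionary word matches, falling back to one character
            while (len > 1 && !dictionary.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }
}
```

Take a dictionary containing 请问 (may I ask), 游戏 (game), 币 (coin), 怎么 (how) and 充值 (recharge). With it, `segment("请问游戏币怎么充值")` returns 请问 / 游戏 / 币 / 怎么 / 充值. After `addWord("游戏币")` (game coin), the same call returns 请问 / 游戏币 / 怎么 / 充值.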
In one possible implementation, the user dictionary may be updated by the following pseudo code:
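// Read the stored keyword set from Redis (a set kept under the key unmatchedDictKey)
Set<String> dictList = redisService.smembers(unmatchedDictKey);
WordDictionary dictionary = WordDictionary.getInstance();
// Reset the dictionary before reloading it
dictionary.resetDict();
// Assumed project-specific loader that adds the stored keywords as user-dictionary entries
dictionary.loadUserDictFromDatabase(dictList);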
The construction of the custom user dictionary has been described above. The resulting custom user dictionary is used to segment users' consultation questions, providing guidance for customer service staff in constructing corresponding consultation answers.
It should be understood that this embodiment mainly aims to analyze users' consultation questions in order to construct missing consultation answers. When a consultation request is received, the following are obtained: the consultation question carried in it, the device type of the user terminal 200 that initiated it, and the application type of the application program currently running on the user terminal 200. After that, the word segmentation statistical method provided in this embodiment further includes the following steps (see Fig. 5):
step S510, searching whether there is a consultation answer that is consistent with the equipment type and corresponds to the consultation question, and if so, executing the following step S520. If not, the above step S220 is executed.
Step S520, the consultation request is not processed.
If the server 100 has no consultation answer that corresponds to the consultation question and is consistent with the device type of the user terminal 200 that initiated the request, word segmentation is performed on the consultation question based on the established custom user dictionary.
The phrases obtained by segmenting a single consultation question may contain duplicates. Keeping the repeated phrases does not help in understanding the question and only consumes extra resources. Therefore, in this embodiment, the obtained phrases can be deduplicated.
For example, suppose the consultation question is "About game coins, may I ask how to recharge game coins?" Segmenting it with the custom user dictionary yields the phrases "about", "game coins", "may I ask", "game coins", "how", "recharge", "?". The result contains "game coins" twice, so deduplication can be performed and only one "game coins" phrase is kept.
Besides deduplicating the segmentation result, meaningless phrases can also be filtered out according to their part of speech. As a result, only phrases containing key information are retained and the statistics are more concise.
Further, a consultation question may contain phrases whose part of speech is not one of those set as meaningless, but which still do not help in understanding the question. For example, a consultation question about an application may contain the name of that application. Although the name is a noun, it has no practical significance for understanding the question.
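Steps S510 and S520 amount to a lookup keyed by both the question and the device type, performed before segmentation. The sketch below assumes an in-memory map and exact matching on the trimmed question text. Both are simplifications for illustration; the patent does not describe how answers are stored or matched. The class `AnswerLookup` and its key format are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AnswerLookup {
    // Hypothetical store: "deviceType|question" -> consultation answer
    private final Map<String, String> answers = new HashMap<>();

    private static String key(String deviceType, String question) {
        return deviceType + "|" + question.trim();
    }

    public void put(String deviceType, String question, String answer) {
        answers.put(key(deviceType, question), answer);
    }

    /** Step S510: look for an answer matching both the question and the device type. */
    public Optional<String> find(String deviceType, String question) {
        return Optional.ofNullable(answers.get(key(deviceType, question)));
    }

    /** True when the question still has to be segmented (step S220); false when step S520 applies. */
    public boolean needsSegmentation(String deviceType, String question) {
        return !find(deviceType, question).isPresent();
    }
}
```

Because the device type is part of the key, an answer written for one device type does not stop the same question from being collected for another device type.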
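The deduplication described above can be done with an insertion-ordered set, which drops repeats while keeping each phrase at its first position. This is a minimal sketch; the class name `PhraseDedup` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class PhraseDedup {
    /** Removes repeated phrases while keeping the order of first occurrence. */
    public static List<String> dedupe(List<String> phrases) {
        return new ArrayList<>(new LinkedHashSet<>(phrases));
    }

    public static void main(String[] args) {
        List<String> phrases = List.of("about", "game coins", "may I ask", "game coins", "how", "recharge", "?");
        System.out.println(dedupe(phrases));
        // prints [about, game coins, may I ask, how, recharge, ?]
    }
}
```

Keeping the original order is not required for counting. However, it makes the stored phrases easier for staff to read alongside the original question.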
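Filtering by part of speech can be sketched as follows, assuming the segmenter returns each phrase together with a tag. The single-letter tags and the set of tags treated as meaningless (`DROP_POS`) are illustrative assumptions, not the patent's configuration. The class `PosFilter` is hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PosFilter {
    /** A phrase paired with its part-of-speech tag (tag letters are illustrative). */
    public static final class TaggedPhrase {
        final String word;
        final String pos;

        public TaggedPhrase(String word, String pos) {
            this.word = word;
            this.pos = pos;
        }
    }

    // Assumed "specific parts of speech" to drop: u = auxiliary, p = preposition,
    // y = modal particle, x = punctuation / non-word
    static final Set<String> DROP_POS = Set.of("u", "p", "y", "x");

    /** Keeps only phrases whose tag is not in DROP_POS. */
    public static List<String> keepMeaningful(List<TaggedPhrase> phrases) {
        return phrases.stream()
                .filter(p -> !DROP_POS.contains(p.pos))
                .map(p -> p.word)
                .collect(Collectors.toList());
    }
}
```

Tagging "about" as `p` and "?" as `x` drops both, so only content-bearing phrases such as "game coins" and "recharge" remain for counting.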
To address this, in this embodiment a set of invalid words may be created in advance and stored in the server 100. Separate invalid words can be created for different device types and different application types, and they can be stored as sets.
As a possible implementation, referring to Fig. 6, the word segmentation statistical method provided in this embodiment further includes the following steps:
Step S610, obtaining a plurality of pre-established invalid words corresponding to the device type.
Step S620, detecting whether each of the obtained phrases matches any of the obtained invalid words.
Step S630, for each phrase, if the phrase matches any one of the plurality of invalid words, the phrase is deleted.
A phrase that matches an invalid word is identical to that word, so it can be deleted. This further reduces the set of phrases that must be recorded for statistics, so that only phrases containing key information are retained.
In one possible implementation, invalid words may be filtered out with the following pseudo code:
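// Build the Redis key of the invalid-word set for this device type (busiType) and application (appid)
String notUnmatchedKey = String.format(Const.SET_NOT_UNMATCHED_KEY, busiType, appid);
// Load the invalid words stored under that key
Set<String> notUseWordList = redisService.smembers(notUnmatchedKey);
for (String word : wordSet) {
    // Skip (i.e., do not record) any phrase that matches an invalid word
    if (notUseWordList.contains(word)) {
        continue;
    }
    ......
}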
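A self-contained version of steps S610 to S630 is sketched below. An in-memory map stands in for the Redis sets read with `smembers`, and `KEY_FORMAT` is a hypothetical stand-in for `Const.SET_NOT_UNMATCHED_KEY`, whose actual value is not given. The class `InvalidWordFilter` is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InvalidWordFilter {
    // Hypothetical key format standing in for Const.SET_NOT_UNMATCHED_KEY
    private static final String KEY_FORMAT = "invalid_words:%s:%s";

    // In-memory stand-in for the Redis sets, one per (device type, application) pair
    private final Map<String, Set<String>> store = new HashMap<>();

    public void register(String deviceType, String appId, Set<String> invalidWords) {
        store.put(String.format(KEY_FORMAT, deviceType, appId), invalidWords);
    }

    /** Steps S610 to S630: drop every phrase that exactly matches an invalid word. */
    public List<String> filter(String deviceType, String appId, Collection<String> phrases) {
        Set<String> invalid = store.getOrDefault(
                String.format(KEY_FORMAT, deviceType, appId), Collections.emptySet());
        List<String> kept = new ArrayList<>();
        for (String phrase : phrases) {
            if (!invalid.contains(phrase)) {
                kept.add(phrase);
            }
        }
        return kept;
    }
}
```

A pair with no registered set filters nothing. This matches the pseudo code's behavior when `smembers` returns an empty set.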
After users' consultation questions have been analyzed and counted over a period of time, customer service staff can construct consultation answers for the questions users asked frequently during that period, to meet users' needs.
Referring to fig. 7, the word segmentation statistical method provided in this embodiment further includes the following steps:
Step S240, counting, once every first preset duration, the number of each phrase associated with each device type obtained within that duration.
Step S250, obtaining hot phrases according to the counted number of each phrase associated with each device type, so that customer service staff can construct corresponding consultation answers based on the hot phrases of each device type.
As described above, when information related to a consultation question is stored, the phrases obtained by word segmentation are stored in association with the device type, or with both the device type and the application type. An application of one application type can run on user terminals 200 of several different device types, and a user terminal 200 of one device type can run applications of several different application types. For example, if user terminals 200 are denoted S, terminals of different device types can be denoted S1, S2, S3, and so on; if applications are denoted Y, applications of different application types can be denoted Y1, Y2, Y3, and so on. When the segmentation results are counted, the results of consultation questions related to the same device type and application type can be superimposed.
For example, consider multiple consultation questions related to device type S1 and application type Y1. After segmentation, part-of-speech filtering and deduplication as described above, the phrases "game coins" and "recharge" appear most often among the resulting phrases. This indicates that when an application of type Y1 runs on a user terminal 200 of device type S1, users are mostly concerned with how to recharge game coins. Similarly, for consultation questions related to device type S2 and application type Y1, the phrases "message box" and "minimize" appear most often in the statistics. This indicates that when an application of type Y1 runs on a user terminal 200 of device type S2, users mostly ask how to minimize the message box.
It should be understood that the above description only illustrates possible statistical results and is not limiting; the specific situation depends on the actual implementation.
When the statistical results are displayed, the phrases can be sorted by count from high to low, and the top-ranked phrase or phrases can be provided to customer service staff as hot phrases. Customer service staff can then construct consultation answers based on the hot phrases. The constructed answers are applicable to the device type of the user terminal 200 and to the application type of the corresponding application. Consequently, when a user follows a consultation answer returned by the intelligent customer service, the operation steps fully apply to that device type and application type.
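The superimposed counting in steps S240 and S226's grouping can be sketched with one count map per (device type, application type) pair, drained at the end of each first preset duration. The group-key format and the explicit `drainWindow` call, which would be triggered by a scheduler, are assumptions; the patent does not specify how the windows are implemented. The class `PhraseCounter` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseCounter {
    // "deviceType/appType" -> phrase -> count within the current window
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    private static String group(String deviceType, String appType) {
        return deviceType + "/" + appType;
    }

    /** Superimposes the (already deduplicated and filtered) phrases of one question onto its group. */
    public void record(String deviceType, String appType, List<String> phrases) {
        Map<String, Integer> groupCounts =
                counts.computeIfAbsent(group(deviceType, appType), k -> new HashMap<>());
        for (String phrase : phrases) {
            groupCounts.merge(phrase, 1, Integer::sum);
        }
    }

    public int count(String deviceType, String appType, String phrase) {
        Map<String, Integer> groupCounts = counts.get(group(deviceType, appType));
        return groupCounts == null ? 0 : groupCounts.getOrDefault(phrase, 0);
    }

    /** Called at the end of each first preset duration: returns the window's counts and starts a new one. */
    public Map<String, Map<String, Integer>> drainWindow() {
        Map<String, Map<String, Integer>> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }
}
```

Questions from S1/Y1 and S2/Y1 are kept apart, so "game coins" can be a hot phrase for one device type without affecting the ranking of the other.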
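Selecting hot phrases from one group's counts reduces to sorting by count in descending order and keeping the first few. In the sketch below, the tie-breaking rule (alphabetical) and the `topN` cutoff are assumptions, since the patent only says the top-ranked phrases are used. The class `HotPhrases` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HotPhrases {
    /** Sorts phrases by count (descending, ties broken alphabetically) and keeps the first topN. */
    public static List<String> top(Map<String, Integer> counts, int topN) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

A deterministic tie-break keeps the list shown to customer service staff stable between refreshes when two phrases have the same count.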
Fig. 8 is a schematic diagram of exemplary components of an electronic device according to an embodiment of the present application; the electronic device may be the server 100 shown in Fig. 1. The electronic device may include a storage medium 110, a processor 120, a word segmentation statistics apparatus 130 and a communication interface 140. In this embodiment, the storage medium 110 and the processor 120 are both located in the electronic device and are arranged separately. However, the storage medium 110 may instead be separate from the electronic device and accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may be integrated into the processor 120, for example as a cache and/or general-purpose registers.
The word segmentation statistics apparatus 130 may be understood as the electronic device itself or as its processor 120. It may also be understood as a software functional module that is independent of the electronic device or the processor 120 and that implements the word segmentation statistical method under the control of the electronic device.
As shown in Fig. 9, the word segmentation statistics apparatus 130 may include an obtaining module 131, a word segmentation module 132 and a storage module 133. The functions of these modules are described in detail below.
The obtaining module 131 is configured to receive a consultation request, obtain the consultation question carried in it, and obtain the device type of the user terminal 200 that initiated the request. The obtaining module 131 can be used to perform step S210; for details of its implementation, refer to the description of step S210.
The word segmentation module 132 is configured to perform word segmentation on the consultation question based on a pre-established custom user dictionary, splitting the question into a plurality of phrases. The word segmentation module 132 can be used to perform step S220; for details of its implementation, refer to the description of step S220.
The storage module 133 is configured to store the obtained phrases in association with the device type. The storage module 133 can be used to perform step S230; for details of its implementation, refer to the description of step S230.
Further, an embodiment of the present application also provides a computer-readable storage medium, where machine-executable instructions are stored in the computer-readable storage medium, and when the machine-executable instructions are executed, the word segmentation statistical method provided in the foregoing embodiment is implemented.
In summary, the embodiments of the present application provide a word segmentation statistical method and apparatus, an electronic device and a computer-readable storage medium. When a user's consultation request is analyzed and processed, they obtain the consultation question carried in the request and the device type of the user terminal 200 that initiated it. After the consultation question is split into a plurality of phrases, the phrases are stored in association with the device type. Consequently, when consultation answers are later constructed from the counted phrases, they can be constructed separately for different device types. This produces targeted answers suited to different terminal devices and improves the service quality of the intelligent customer service.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A word segmentation statistical method, the method comprising:
receiving a consultation request, acquiring the consultation question carried in the consultation request, and acquiring the device type of the user terminal that initiated the consultation request;
performing a word segmentation operation on the consultation question based on a pre-established custom user dictionary, splitting the consultation question into a plurality of phrases;
storing the obtained phrases in association with the device type; and
constructing consultation answers corresponding to different device types based on the phrases associated with the different device types, wherein the consultation answers are constructed from the hot phrases of each device type, and the hot phrases are determined by the number of each phrase associated with that device type.
2. The word segmentation statistical method according to claim 1, wherein the method further comprises:
counting, every first preset time interval, the number of each phrase associated with each device type obtained within that interval; and
obtaining hot phrases according to the counted number of each phrase associated with each device type, so that customer service staff can construct corresponding consultation answers according to the hot phrases of each device type.
3. The word segmentation statistical method according to claim 1, wherein after the step of performing the word segmentation operation on the consultation question based on the pre-established custom user dictionary and splitting the consultation question into a plurality of phrases, the method further comprises:
obtaining a plurality of pre-established invalid words corresponding to the device type;
detecting whether each of the obtained phrases matches any one of the invalid words; and
deleting each phrase that matches any one of the plurality of invalid words.
4. The word segmentation statistical method according to claim 1, wherein after the step of performing the word segmentation operation on the consultation question based on the pre-established custom user dictionary and splitting the consultation question into a plurality of phrases, the method further comprises:
performing a deduplication operation on the obtained phrases.
5. The word segmentation statistical method according to claim 1, wherein the method further comprises:
for each consultation request received within a second preset time period, performing a word segmentation operation on the consultation question carried in the consultation request using a pre-stored initial user dictionary, splitting the consultation question into a plurality of phrases;
counting the number of each phrase obtained within the second preset time period, and constructing keywords based on the number and the content of each phrase; and
adding the constructed keywords to the initial user dictionary to obtain the custom user dictionary.
6. The word segmentation statistical method according to claim 5, wherein after the step of performing the word segmentation operation on the consultation question carried in each consultation request using the pre-stored initial user dictionary and splitting the consultation question into a plurality of phrases, the method further comprises:
obtaining the part of speech of each phrase; and
filtering out, from the obtained phrases, the phrases whose part of speech is a preset specific part of speech.
7. The word segmentation statistical method according to claim 5, wherein before the step of counting the number of each phrase obtained within the second preset time period, the method further comprises:
caching, at each preset time interval within the second preset time period, the phrases obtained during that interval, and storing the cached phrases in a database when the interval ends.
8. The word segmentation statistical method according to any one of claims 1 to 7, wherein the method further comprises:
searching whether there is a consultation answer that is consistent with the device type and corresponds to the consultation question;
if so, not processing the consultation request; and
if not, performing the word segmentation operation on the consultation question based on the pre-established custom user dictionary, splitting the consultation question into a plurality of phrases.
9. A word segmentation statistics apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to receive a consultation request, acquire the consultation question carried in the consultation request, and acquire the device type of the user terminal that initiated the consultation request;
a word segmentation module, configured to perform a word segmentation operation on the consultation question based on a pre-established custom user dictionary, splitting the consultation question into a plurality of phrases; and
a storage module, configured to store the obtained phrases in association with the device type and to construct consultation answers corresponding to different device types based on the phrases associated with the different device types, wherein the consultation answers are constructed from the hot phrases of each device type, and the hot phrases are determined by the number of each phrase associated with that device type.
10. An electronic device, comprising one or more storage media and one or more processors in communication with the storage media, wherein the one or more storage media store machine-executable instructions which, when the electronic device runs, are executed by the processors to perform the word segmentation statistical method according to any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon machine-executable instructions which, when executed, implement the word segmentation statistical method of any one of claims 1-8.
CN201910652117.0A 2019-07-18 2019-07-18 Word segmentation statistical method and device, electronic equipment and computer readable storage medium Active CN110347818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910652117.0A CN110347818B (en) 2019-07-18 2019-07-18 Word segmentation statistical method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910652117.0A CN110347818B (en) 2019-07-18 2019-07-18 Word segmentation statistical method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110347818A 2019-10-18
CN110347818B 2022-03-25

Family

ID=68179349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910652117.0A Active CN110347818B (en) 2019-07-18 2019-07-18 Word segmentation statistical method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110347818B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1544747A3 (en) * 2003-12-19 2006-05-31 Xerox Corporation Systems and methods for normalization of linguistic structures
CN107992513A (en) * 2017-10-25 2018-05-04 中兴通讯股份有限公司 A kind of information processing system and its method for realizing information processing
CN108153798A (en) * 2016-12-02 2018-06-12 阿里巴巴集团控股有限公司 Page info processing method, apparatus and system
CN109086352A (en) * 2018-07-17 2018-12-25 深圳市艾贝比品牌管理咨询有限公司 Consultation information feedback method, terminal and storage medium based on artificial intelligence

Also Published As

Publication number Publication date
CN110347818A (en) 2019-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant