CN112527955A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN112527955A
Authority
CN
China
Prior art keywords
word
target
generalization
generalized
entity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011403234.2A
Other languages
Chinese (zh)
Inventor
孙仿逊
胡梓垣
翁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd and Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Priority to CN202011403234.2A
Publication of CN112527955A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

An embodiment of the invention provides a data processing method and device. The method comprises: presetting a plurality of entities and their generalization word sets for a vehicle-mounted scene; acquiring user actual data and determining a target generalization word from the user actual data; and determining a target entity corresponding to the target generalization word and adding the target generalization word to the generalization word set corresponding to the target entity, so that the standard word corresponding to the target entity can be determined when the target generalization word is recognized by voice. In this embodiment, entity word generalization is performed based on user actual data: the target entity is determined for a target generalization word found in the user actual data, and the target generalization word is added to that entity's generalization word set. More habitual user expressions can thus be mined, the generalization performance of natural language understanding is enhanced, and the test set of the vehicle-mounted dialogue system is expanded.

Description

Data processing method and device
Technical Field
The present invention relates to the field of vehicle technologies, and in particular, to a method and an apparatus for processing data.
Background
With the development of smart cars, the networking of vehicle-mounted systems, and the growing richness of system application functions, NLU (Natural Language Understanding) is generally adopted in voice interaction scenes of vehicle-mounted systems to recognize speech and understand user requirements.
However, a vehicle-mounted scene contains many special entity words, and users express themselves in speech in many different ways, so the vehicle-mounted dialogue system cannot always accurately recognize the user's intention through speech understanding. For example, the standard phrasing is "open seat heating", whereas a user may express the same request with a different, generalized wording, and the vehicle-mounted dialogue system then has difficulty understanding what the user means.
Disclosure of Invention
In view of the above problems, a data processing method and device are proposed that overcome or at least partially solve these problems, including:
a data processing method, the method comprising:
presetting a plurality of entities and their generalization word sets for a vehicle-mounted scene;
acquiring user actual data, and determining a target generalization word from the user actual data;
and determining a target entity corresponding to the target generalization word, and adding the target generalization word to the generalization word set corresponding to the target entity, so that the standard word corresponding to the target entity can be determined when the target generalization word is recognized by voice.
Optionally, the determining a target entity corresponding to the target generalization word includes:
determining a similar generalization word for the target generalization word according to the plurality of entities and their generalization word sets;
and determining the entity corresponding to the similar generalization word as the target entity corresponding to the target generalization word.
Optionally, the determining, according to the plurality of entities and their generalization word sets, a similar generalization word for the target generalization word includes:
determining one or more candidate generalization words for the target generalization word according to the plurality of entities and their generalization word sets;
determining, from the one or more candidate generalized words, a similar generalized word for the target generalized word.
Optionally, before the determining a target entity corresponding to the target generalization word and adding the target generalization word to a set of generalization words corresponding to the target entity, the method further includes:
judging whether the target generalized word exists in the plurality of entities and the generalized word set thereof;
and when the target generalized word is judged not to exist in the plurality of entities and the generalized word set thereof, executing the step of determining the target entity corresponding to the target generalized word, and adding the target generalized word into the generalized word set corresponding to the target entity.
Optionally, the obtaining of the user actual data and determining the target generalization word from the user actual data include:
acquiring user actual data, and filtering the user actual data;
and performing generalized word extraction on the filtered actual data of the user, and determining a target generalized word.
Optionally, the method further comprises:
receiving and analyzing voice interaction information;
when the target generalized word is identified, determining a standard word corresponding to the target entity;
and generating vehicle control information aiming at the voice interaction information according to the standard words corresponding to the target entity.
Optionally, the vehicle control information includes any one of:
interactive instruction information, standard word prompt information and voice reply information.
An apparatus for data processing, the apparatus comprising:
an entity and generalization word set presetting module, configured to preset a plurality of entities and their generalization word sets for a vehicle-mounted scene;
the target generalization word determining module is used for acquiring user actual data and determining a target generalization word from the user actual data;
and the target generalized word adding module is used for determining a target entity corresponding to the target generalized word and adding the target generalized word into a generalized word set corresponding to the target entity so as to determine a standard word corresponding to the target entity when the target generalized word is recognized by voice.
A server comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing a method of data processing as described above.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of data processing as described above.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, a plurality of entities and their generalization word sets are preset for a vehicle-mounted scene; user actual data are then acquired and a target generalization word is determined from them; a target entity corresponding to the target generalization word is determined, and the target generalization word is added to the generalization word set corresponding to that entity, so that the standard word corresponding to the target entity can be determined when the target generalization word is recognized by voice. Entity word generalization is thus driven by user actual data, so that more habitual user expressions can be mined, the generalization performance of natural language understanding is enhanced, and the test set of the vehicle-mounted dialogue system is expanded.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart illustrating steps of a method for data processing according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a plurality of entities and their generalized word sets according to an embodiment of the present invention;
FIG. 3 is a flow chart of steps in another method of data processing according to an embodiment of the invention;
FIG. 4 is a flow chart of steps in another method of data processing according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart illustrating steps of a data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
Step 101, presetting a plurality of entities and their generalization word sets for a vehicle-mounted scene;
In the process of entity word generalization, a plurality of entities and their generalization word sets for the vehicle-mounted scene can be preset, and entity word generalization can then be performed on the basis of these entities and generalization word sets.
Specifically, a plurality of entities and their generalization word sets for the vehicle-mounted scene, such as a basic entity and generalization word list, can be constructed offline by manual generalization or with a generalization tool; entity words can then be further generalized on top of this basic list by mining user actual data online.
For example, a plurality of entities and their generalization word sets may be constructed offline through a third-party synonym table or a generalization tool. With a synonym or near-synonym tool, an entity word is input together with the number of near-synonyms required for it (such as the top 50); the tool performs a lookup based on semantic similarity and outputs N near-synonyms for the entity word. Generalization words suitable for the entity word can then be selected from the N near-synonyms to construct the basic entity and generalization word list (i.e., the plurality of entities and their generalization word sets) for the vehicle-mounted scene.
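The near-synonym lookup described above can be sketched as a top-N search by cosine similarity. This is only an illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are invented toy values, and the names `cosine` and `top_n_near_synonyms` are hypothetical; the source does not specify which tool or similarity measure is used.

```python
import math

# Hypothetical toy word vectors; a real tool would use trained embeddings.
EMBEDDINGS = {
    "trunk": [0.9, 0.1, 0.0],
    "tail box": [0.85, 0.15, 0.05],
    "back tail box": [0.8, 0.2, 0.1],
    "main seat": [0.1, 0.9, 0.1],
    "navigation volume": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_near_synonyms(entity_word, n):
    """Return the n words most similar to entity_word, most similar first."""
    query = EMBEDDINGS[entity_word]
    scored = [
        (word, cosine(query, vector))
        for word, vector in EMBEDDINGS.items()
        if word != entity_word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:n]]


# Candidates for "trunk"; a person would still pick the suitable ones.
candidates = top_n_near_synonyms("trunk", 2)
```

The returned list is only a candidate pool; as the text notes, suitable generalization words are still selected from it afterwards.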
For another example, manual generalization may be used to generalize entity words based on the service-related content of the vehicle-mounted scene. The entity words may first be classified, for example into "single verb", "single noun", "compound noun" and "verb-object structure", and a specific generalization method may then be applied to each class. For a compound noun, for example, the following generalization methods may be used:
1. the compound noun can be split into single nouns, each single noun can be generalized separately, the generalized single nouns can then be recombined, and each recombined compound noun can be checked for fluency;
2. generalization can follow business logic; for example, "three-dimensional map" can correspond to "three-dimensional head upward";
3. generalization can follow how each scene is understood, for example "navigation volume" and "map volume".
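The first method in the list above (split, generalize each part, recombine) can be sketched as a Cartesian product over per-noun variants. The variant table `SINGLE_NOUN_VARIANTS` and the function name are hypothetical, and the fluency check mentioned in the text is left to a human reviewer here rather than implemented.

```python
from itertools import product

# Hypothetical generalizations of single nouns (illustrative values only).
SINGLE_NOUN_VARIANTS = {
    "navigation": ["navigation", "map"],
    "volume": ["volume", "sound"],
}


def generalize_compound(parts):
    """Recombine per-noun variants into candidate compound nouns.

    The original compound itself is excluded from the result.
    """
    options = [SINGLE_NOUN_VARIANTS.get(part, [part]) for part in parts]
    original = " ".join(parts)
    combined = (" ".join(choice) for choice in product(*options))
    return [phrase for phrase in combined if phrase != original]


# Candidates derived from "navigation volume"; each still needs a fluency check.
candidates = generalize_compound(["navigation", "volume"])
```

Recombination tends to over-generate, which is presumably why the text pairs it with a check of whether each combined compound reads smoothly.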
Step 102, acquiring user actual data, and determining a target generalization word from the user actual data;
In a specific implementation, user actual data can be acquired and a target generalization word determined from it, so that entity word generalization can subsequently be performed for the target generalization word.
In an example, user actual data may be mined online, for example by collecting real queries (voice interaction information) online, and an entity extraction tool may then be used to extract entity words from the user actual data for subsequent entity word generalization.
Step 103, determining a target entity corresponding to the target generalization word, and adding the target generalization word into a generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
After the target generalized word is obtained, a target entity corresponding to the target generalized word can be determined, the target generalized word can be added into a generalized word set corresponding to the target entity, and then, for a vehicle-mounted voice interaction scene, a standard word corresponding to the target entity of the target generalized word can be determined when the target generalized word is recognized.
In an example, as shown in FIG. 2, a plurality of entities and their generalization word sets (e.g., a basic generalization word list) for the vehicle-mounted scene may be preset offline. There may be a plurality of entities, whose entity words may be "trunk", "main seat", "quick charge port", and so on; each entity may correspond to one or more generalization words, for example "trunk" may correspond to the basic generalization words "tail box" and "back tail box".
After entity words are generalized based on user actual data, the target generalization words can be added to the generalization word set corresponding to the target entity, yielding an expanded generalization word list mined online. For example, the target entity corresponding to a target generalization word obtained by online mining, such as "tail gate", can be determined to be "trunk"; the word can then be added to the generalization word set as an expanded generalization word of "trunk", and a remark can be attached to each expanded generalization word, for example noting that it is a habitual expression in a particular region.
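One possible in-memory shape for such a table is sketched below. The class `EntityEntry` and its field names are assumptions made for illustration; the source only describes the table's contents (entity word, basic generalization words, expanded generalization words and remarks), not how it is stored.

```python
from dataclasses import dataclass, field


@dataclass
class EntityEntry:
    """One entity with its basic and expanded generalization words (hypothetical layout)."""

    standard_word: str
    base_words: set = field(default_factory=set)
    # Expanded words mined online, mapped to an optional remark.
    expanded_words: dict = field(default_factory=dict)

    def add_expanded(self, word, remark=""):
        """Add a mined word unless it is already known for this entity."""
        if word != self.standard_word and word not in self.base_words:
            self.expanded_words[word] = remark

    def all_words(self):
        """Every surface form that should resolve to this entity."""
        return {self.standard_word} | self.base_words | set(self.expanded_words)


trunk = EntityEntry("trunk", {"tail box", "back tail box"})
trunk.add_expanded("tail gate", remark="regional habitual expression")
```

Keeping expanded words separate from the offline base list makes it easy to see which entries came from user data and to carry the remarks the text mentions.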
In an embodiment of the present invention, the method may further include the following steps:
receiving and analyzing voice interaction information; when the target generalized word is identified, determining a standard word corresponding to the target entity; and generating vehicle control information aiming at the voice interaction information according to the standard words corresponding to the target entity.
In practical application, the entity word generalization is carried out based on the actual data of the user, the voice interaction information can be received and analyzed in a vehicle-mounted voice interaction scene, the standard word corresponding to the target entity can be determined when the target generalized word is identified, and then the vehicle control information aiming at the voice interaction information can be generated according to the standard word corresponding to the target entity.
In one example, the vehicle control information may include any one of:
interactive instruction information, standard word prompt information and voice reply information.
A vehicle-mounted scene contains many entity words specific to that scene, and the voice interaction information received from users can be expressed in many ways, such as colloquial explanatory expressions, colloquial expressions of similar concepts, and regional differences in how entities are named.
Through entity word generalization, more habitual user expressions can be mined, so that instructions from online users are better supported. For example, when "open the tail gate" is recognized by voice, the standard word corresponding to "tail gate" is determined to be "trunk", and the vehicle-mounted dialogue system can then generate an instruction to open the trunk so as to control the vehicle; when "open the right charging port" is recognized by voice, the user can be prompted, through a page or by voice, that the standard word corresponding to "right charging port" is "fast charging port"; or, when "open the tail gate" is recognized by voice, the voice reply can use "open the tail gate", in line with the user's habit.
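The three kinds of vehicle control information in this example can be sketched as follows. The table contents follow the examples in the text, but the function names, the dictionary keys and the reply wording are hypothetical; the source states that the control information may include any one of the three, and the sketch returns all three only for illustration.

```python
# Entity -> generalization words, following the examples in the text.
GENERALIZATION_TABLE = {
    "trunk": {"tail box", "back tail box", "tail gate"},
    "fast charging port": {"right charging port"},
}


def to_standard_word(word):
    """Map a recognized word to its standard word, or None if unknown."""
    for standard, variants in GENERALIZATION_TABLE.items():
        if word == standard or word in variants:
            return standard
    return None


def build_control_info(action, spoken_word):
    """Build instruction, prompt and reply for a recognized word (illustrative)."""
    standard = to_standard_word(spoken_word)
    if standard is None:
        return None
    prompt = None
    if spoken_word != standard:
        prompt = f'"{spoken_word}" corresponds to the standard word "{standard}"'
    return {
        "instruction": f"{action} {standard}",  # executed against the vehicle
        "prompt": prompt,  # standard word prompt information
        "reply": f"OK, {action} the {spoken_word}",  # keeps the user's own wording
    }


info = build_control_info("open", "tail gate")
```

Note that the instruction uses the standard word while the reply echoes the user's own term, matching the distinction the paragraph draws.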
In the embodiment of the invention, a plurality of entities and their generalization word sets are preset for a vehicle-mounted scene; user actual data are then acquired and a target generalization word is determined from them; a target entity corresponding to the target generalization word is determined, and the target generalization word is added to the generalization word set corresponding to that entity, so that the standard word corresponding to the target entity can be determined when the target generalization word is recognized by voice. Entity word generalization is thus driven by user actual data, so that more habitual user expressions can be mined, the generalization performance of natural language understanding is enhanced, and the test set of the vehicle-mounted dialogue system is expanded.
Referring to fig. 3, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
Step 301, presetting a plurality of entities and their generalization word sets for the vehicle-mounted scene;
In the process of entity word generalization, a plurality of entities and their generalization word sets for the vehicle-mounted scene can be preset, and entity word generalization can then be performed on the basis of these entities and generalization word sets.
Step 302, acquiring user actual data, and filtering the user actual data;
In a specific implementation, user actual data may be acquired and filtered; for example, for real queries (voice interaction information) collected online, irrelevant queries (e.g., queries about chatting, navigating to a destination, or playing a song) may be filtered out.
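A minimal sketch of such a filter is shown below, assuming English query text and simple keyword rules. The patterns in `IRRELEVANT_PATTERNS` are invented for illustration; the source does not say how irrelevant queries are detected, and a production system might well use an intent classifier instead.

```python
import re

# Hypothetical patterns for queries unrelated to vehicle-control entities.
IRRELEVANT_PATTERNS = [
    re.compile(r"^navigate to\b"),
    re.compile(r"^play\b"),
    re.compile(r"^(hello|how are you)\b"),
]


def filter_queries(queries):
    """Keep only queries that match none of the irrelevant patterns."""
    kept = []
    for query in queries:
        text = query.strip().lower()
        if not any(pattern.search(text) for pattern in IRRELEVANT_PATTERNS):
            kept.append(query)
    return kept


raw = ["open the tail gate", "navigate to the airport", "Play a song", "hello there"]
relevant = filter_queries(raw)
```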
Step 303, extracting generalized words according to the filtered user actual data, and determining target generalized words;
In practical application, generalization words can be extracted from the filtered user actual data and a target generalization word determined, so that entity word generalization can subsequently be performed for the target generalization word.
Specifically, a model-based or rule-based method may be used to extract the generalization words of entities from the sentences in the user actual data. With a model-based method, a named entity recognition (NER) model may be used to extract the generalization words of entities. With a rule-based method, regular expressions may be used to match specific sentence patterns; for example, sentences such as "open the door" and "turn down the volume" can be matched against rules of the form "open <entity>", and the matched words, such as "door" and "volume", can be extracted as generalization words of entities.
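The rule-based variant can be sketched with regular expressions that capture the entity slot of a sentence pattern. The two rules below and the function name are assumptions written for English text; the source only states that regular expressions match specific sentence patterns such as an "open" rule with an entity slot.

```python
import re

# Hypothetical sentence-pattern rules; the named group marks the entity slot.
RULES = [
    re.compile(r"^open (?:the )?(?P<entity>.+)$"),
    re.compile(r"^turn down (?:the )?(?P<entity>.+)$"),
]


def extract_generalization_word(sentence):
    """Return the text filling the entity slot, or None if no rule matches."""
    text = sentence.strip().lower()
    for rule in RULES:
        match = rule.match(text)
        if match:
            return match.group("entity")
    return None


word = extract_generalization_word("open the tail gate")
```

Rules like these are cheap and predictable but only cover the sentence patterns someone has written down, which is one reason the text also mentions a model-based (NER) alternative.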
Step 304, determining similar generalization words aiming at the target generalization word according to the entities and the generalization word set thereof;
after the target generalization word is obtained, the similar generalization word aiming at the target generalization word can be determined according to the plurality of entities and the generalization word set thereof, so as to further determine the target entity corresponding to the target generalization word according to the similar generalization word.
Step 305, determining that the entity corresponding to the similar generalization word is a target entity corresponding to the target generalization word;
after the similar generalization words of the target generalization word are obtained, the entity corresponding to the similar generalization word can be determined to be the target entity corresponding to the target generalization word.
In an example, according to a plurality of entities and a generalization word set thereof, a "tail gate" (i.e., a target generalization word) extracted for actual data of a user may be determined to have the highest matching degree with a "tail box" (i.e., a similar generalization word), and then a "trunk" (i.e., an entity corresponding to the similar generalization word) corresponding to the "tail box" is determined as an entity (i.e., a target entity) for the "tail gate".
Step 306, adding the target generalization word into the generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
After the target entity corresponding to the target generalization word is obtained, the target generalization word can be added into the generalization word set corresponding to the target entity, and then, for a vehicle-mounted voice interaction scene, the standard word corresponding to the target entity of the target generalization word can be determined when the target generalization word is identified.
In an example, for the standard word "trunk" corresponding to a target entity, generalization words such as "tail box" and "back tail box" may be obtained by offline manual generalization or with a near-synonym tool, and an entity and generalization word set (e.g., a basic generalization word table) can then be constructed. By mining user actual data online, the user utterance "open the tail gate" can be obtained, and "tail gate" (i.e., the target generalization word) can be extracted with an extraction tool. "Tail gate" is then semantically matched against the basic generalization word table; its matching degree with "tail box" is calculated to be the highest, so "trunk" (i.e., the target entity) is determined for "tail gate" via "tail box", and "tail gate" is added as an expanded generalization word of "trunk". As a result, the recognized utterance "open the tail gate" can be mapped to "open the trunk"; that is, online voice recognition of "open the tail gate" is supported, and the instruction to open the trunk is executed.
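The example above can be traced end to end in a short sketch. One substitution should be stated plainly: the source calls for semantic matching, while the code uses a Jaccard overlap of character bigrams as a crude stand-in so that it runs without a trained model. All function names and the table layout are hypothetical.

```python
def bigrams(text):
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Jaccard overlap of character bigrams (stand-in for semantic matching)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0


# Basic generalization word table built offline: entity -> generalization words.
table = {
    "trunk": {"tail box", "back tail box"},
    "main seat": set(),
}


def find_target_entity(target_word):
    """Return (entity, most similar known word) for a newly mined word."""
    pairs = (
        (entity, word)
        for entity, words in table.items()
        for word in words | {entity}
    )
    return max(pairs, key=lambda pair: similarity(target_word, pair[1]))


def normalize(utterance):
    """Replace a known generalization word with its standard word."""
    for standard, words in table.items():
        for word in sorted(words, key=len, reverse=True):
            if word in utterance:
                return utterance.replace(word, standard)
    return utterance


entity, similar_word = find_target_entity("tail gate")
table[entity].add("tail gate")  # expand the entity's generalization word set
command = normalize("open the tail gate")
```

With a real semantic model, only `similarity` would need to change; the lookup from similar word to entity and the update of the table stay the same.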
Generalizing entity words based on user actual data enhances the generalization performance of Natural Language Understanding (NLU), so that the vehicle-mounted dialogue system can understand what the user means from recognized speech; it also expands the test set of the vehicle-mounted dialogue system, which can be used to test the system's performance.
Referring to fig. 4, a flowchart illustrating steps of another data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 401, presetting a plurality of entities and generalization word sets thereof aiming at a vehicle-mounted scene;
in the process of generalization of the entity words, a plurality of entities and generalization word sets thereof aiming at the vehicle-mounted scene can be preset, and further, the generalization of the entity words can be performed according to the plurality of entities and the generalization word sets thereof.
Step 402, acquiring user actual data, and determining a target generalization word from the user actual data;
in a specific implementation, the target generalization word can be determined from the actual data of the user by obtaining the actual data of the user, so as to perform subsequent entity word generalization on the target generalization word.
Step 403, judging whether the target generalization word exists in the plurality of entities and the generalization word set thereof;
in practical application, whether the target generalization word exists in a plurality of entities and the generalization word set thereof can be judged according to the obtained target generalization word.
In an example, a target generalization word extracted from the actual data of the user may be compared with a plurality of entities and a generalization word set thereof (e.g., a basic entity and a generalization word list thereof) according to a plurality of entities and a generalization word set thereof constructed offline, for example, whether character strings are equal may be compared, and whether the target generalization word exists in the plurality of entities and the generalization word set thereof may be determined.
Step 404, when it is determined that the target generalized word does not exist in the plurality of entities and the generalized word set thereof, determining one or more candidate generalized words for the target generalized word according to the plurality of entities and the generalized word set thereof;
In a specific implementation, when it is determined that the target generalization word does not exist in the plurality of entities and their generalization word sets, one or more candidate generalization words for the target generalization word may be determined according to the plurality of entities and their generalization word sets.
For example, when the character string of the target generalization word is not equal to any character string in the plurality of entities and their generalization word sets, the semantic similarity between the target generalization word and each of those words may be calculated; the one or more candidate generalization words most similar to the target generalization word may then be selected according to the similarity ranking, giving a near-synonym list for the target generalization word.
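Steps 403 and 404 can be sketched as an exact string check followed by a similarity ranking. As an explicit substitution, character-bigram Jaccard overlap stands in for the semantic similarity named in the text; the table contents, the `top_k` parameter and the function names are hypothetical.

```python
def bigrams(text):
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Jaccard overlap of character bigrams (stand-in for semantic similarity)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0


# Entities and generalization word sets constructed offline (illustrative).
TABLE = {
    "trunk": {"tail box", "back tail box"},
    "main seat": set(),
    "quick charge port": set(),
}


def all_known_words(table):
    """Every entity word and generalization word in the table."""
    return {word for entity, words in table.items() for word in words | {entity}}


def exists(target, table):
    """Step 403: exact string comparison against all known words."""
    return target in all_known_words(table)


def candidate_words(target, table, top_k=2):
    """Step 404: rank known words by similarity and keep the top_k."""
    if exists(target, table):
        return []  # already covered, nothing to generalize
    ranked = sorted(
        all_known_words(table),
        key=lambda word: similarity(target, word),
        reverse=True,
    )
    return ranked[:top_k]


near_synonyms = candidate_words("tail gate", TABLE)
```

The cheap equality check runs first so that the more expensive similarity ranking is only performed for words the table does not already contain.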
Step 405, determining a similar generalization word aiming at the target generalization word from the one or more candidate generalization words;
after obtaining the one or more candidate generalization words, a similar generalization word for the target generalization word may be determined from the one or more candidate generalization words, so as to further determine a target entity corresponding to the target generalization word according to the similar generalization word.
Step 406, determining that the entity corresponding to the similar generalization word is a target entity corresponding to the target generalization word;
after the similar generalization words of the target generalization word are obtained, the entity corresponding to the similar generalization word can be determined to be the target entity corresponding to the target generalization word.
Step 407, adding the target generalization word into the generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
After the target entity corresponding to the target generalization word is obtained, the target generalization word can be added into the generalization word set corresponding to the target entity, and then, for a vehicle-mounted voice interaction scene, the standard word corresponding to the target entity of the target generalization word can be determined when the target generalization word is identified.
In an example, there may be a plurality of target generalization words. The list of target generalization words may be sorted by frequency from high to low, and the target entities corresponding to the high-frequency target generalization words may then be determined, so that the standard entity word (i.e., the standard word) corresponding to each target entity is obtained; the target generalization words are then added to the plurality of entities and their generalization word sets as generalization words of those standard entity words.
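The frequency ordering described here can be sketched with a counter. The sample words and the `min_count` threshold are assumptions; the source says only that the list is sorted from high to low frequency and that high-frequency words are processed.

```python
from collections import Counter


def rank_by_frequency(extracted_words, min_count=1):
    """Return distinct words ordered from most to least frequent.

    Words seen fewer than min_count times are dropped (hypothetical threshold).
    """
    counts = Counter(extracted_words)
    return [word for word, count in counts.most_common() if count >= min_count]


# Words extracted from a batch of user queries (invented sample data).
mined = [
    "tail gate",
    "right charging port",
    "tail gate",
    "tail gate",
    "right charging port",
    "rear hatch",
]
high_frequency = rank_by_frequency(mined, min_count=2)
```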
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily all required by the embodiments of the present invention.
Referring to fig. 5, a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
an entity and generalization word set presetting module 501, configured to preset a plurality of entities and generalization word sets thereof for a vehicle-mounted scene;
a target generalization word determination module 502, configured to obtain user actual data and determine a target generalization word from the user actual data;
a target generalization word adding module 503, configured to determine a target entity corresponding to the target generalization word, and add the target generalization word to a generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
In an embodiment of the present invention, the target generalization word adding module 503 includes:
a similar generalization word determining submodule, configured to determine a similar generalization word for the target generalization word according to the plurality of entities and generalization word sets thereof;
and a target entity determining submodule, configured to determine that the entity corresponding to the similar generalization word is the target entity corresponding to the target generalization word.
In an embodiment of the present invention, the similar generalization word determination sub-module includes:
a candidate generalization word determining unit, configured to determine one or more candidate generalization words for the target generalization word according to the plurality of entities and generalization word sets thereof;
and a similar generalization word determining unit, configured to determine a similar generalization word for the target generalization word from the one or more candidate generalization words.
In an embodiment of the present invention, the apparatus further includes:
a target generalization word judging module, configured to judge whether the target generalization word exists in the plurality of entities and generalization word sets thereof;
and a determining module, configured to invoke the target generalization word adding module 503 when it is judged that the target generalization word does not exist in the plurality of entities and generalization word sets thereof.
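The judging and determining modules amount to a membership check that guards the adding module. A minimal sketch follows, in which the names `is_known`, `dispatch`, and the callback `record_add` (a stand-in for module 503) are hypothetical.

```python
def is_known(target, entity_sets):
    """Judging module: True if the target word is already in any entity's set."""
    return any(target in words for words in entity_sets.values())


def dispatch(target, entity_sets, add_module):
    """Determining module: invoke the adding module only for unknown words.
    Returns True when the adding module was invoked."""
    if is_known(target, entity_sets):
        return False
    add_module(target, entity_sets)
    return True


# Hypothetical data and a stand-in for the target generalization word adding module.
entity_sets = {"sunroof": {"sunroof", "roof window"}}
calls = []


def record_add(target, sets):
    calls.append(target)


handled_known = dispatch("roof window", entity_sets, record_add)
handled_new = dispatch("sky window", entity_sets, record_add)
print(handled_known, handled_new, calls)
```

Skipping words that are already present avoids re-running the similarity search for every occurrence of a known generalization word.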
In an embodiment of the present invention, the target generalization word determining module 502 includes:
a filtering processing module, configured to acquire user actual data and filter the user actual data;
and a generalization word extraction module, configured to perform generalization word extraction on the filtered user actual data and determine the target generalization word.
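The filtering and extraction performed by module 502 might be sketched as below. The concrete filtering rules (punctuation stripping, lower-casing, dropping empty entries) and the command template in `PATTERN` are assumptions chosen for illustration; this passage does not specify them.

```python
import re

# Assumed command template; the actual extraction rule is not given in this passage.
PATTERN = re.compile(r"^(?:open|close|turn on|turn off) the (?P<word>.+)$")


def filter_utterances(utterances):
    """Strip punctuation, normalise case and whitespace, drop empty entries."""
    cleaned = []
    for text in utterances:
        text = re.sub(r"[^\w\s]", "", text).strip().lower()
        text = re.sub(r"\s+", " ", text)
        if text:
            cleaned.append(text)
    return cleaned


def extract_target_words(utterances):
    """Collect the slot text from utterances matching the assumed template."""
    words = []
    for text in utterances:
        match = PATTERN.match(text)
        if match:
            words.append(match.group("word"))
    return words


raw = ["Open the roof window!", "   ", "turn on the aircon", "hello there"]
targets = extract_target_words(filter_utterances(raw))
print(targets)
```

Utterances that do not match the template are ignored here; a production system would more likely rely on a trained slot-filling model than on a fixed pattern.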
In an embodiment of the present invention, the apparatus further includes:
a voice interaction information analysis module, configured to receive and analyze voice interaction information;
a standard word determining module, configured to determine the standard word corresponding to the target entity when the target generalization word is recognized;
and a vehicle control information generation module, configured to generate vehicle control information for the voice interaction information according to the standard word corresponding to the target entity.
In an embodiment of the present invention, the vehicle control information includes any one of the following:
interactive instruction information, standard word prompt information, or voice reply information.
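The path from a recognized generalization word to vehicle control information can be illustrated as follows. The dictionary layout of the returned control information, the substring matching strategy, and the function names are assumptions for this sketch only.

```python
def resolve_standard_word(utterance, entity_sets):
    """Return the standard word whose longest generalization word occurs
    in the utterance, or None if no generalization word is found."""
    text = utterance.lower()
    best = None  # (standard word, matched generalization word)
    for standard, words in entity_sets.items():
        for word in words:
            if word.lower() in text and (best is None or len(word) > len(best[1])):
                best = (standard, word)
    return None if best is None else best[0]


def build_control_info(utterance, entity_sets):
    """Generate hypothetical vehicle control information for an utterance."""
    standard = resolve_standard_word(utterance, entity_sets)
    if standard is None:
        # Fall back to a voice reply when nothing is recognized.
        return {"type": "voice_reply", "text": "Sorry, I did not understand."}
    return {"type": "interactive_instruction", "standard_word": standard}


entity_sets = {
    "air conditioner": {"air conditioner", "aircon"},
    "sunroof": {"sunroof", "roof window"},
}
info = build_control_info("Open the roof window", entity_sets)
unknown = build_control_info("play some music", entity_sets)
print(info, unknown)
```

Plain substring matching is used only to keep the example short; very short generalization words could match inside unrelated words, so a real system would match on recognized tokens or slots instead.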
In the embodiments of the present invention, a plurality of entities and their generalization word sets are preset for a vehicle-mounted scene; user actual data is then obtained and a target generalization word is determined from it; the target entity corresponding to the target generalization word is determined; and the target generalization word is added to the generalization word set corresponding to the target entity, so that the standard word corresponding to the target entity is determined when the target generalization word is recognized by voice. Entity generalization is thereby driven by user actual data: because target generalization words taken from real usage are assigned to their target entities and added to the corresponding generalization word sets, more of users' habitual expressions can be mined, the generalization performance of natural language understanding is enhanced, and the test set of the vehicle-mounted dialogue system is expanded.
An embodiment of the present invention also provides a server, which may include a processor, a memory, and a computer program stored on the memory and capable of running on the processor; when executed by the processor, the computer program implements the data processing method described above.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above data processing method.
Since the apparatus embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.
The data processing method and apparatus provided above have been described in detail. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of data processing, the method comprising:
presetting a plurality of entities and generalization word sets thereof for a vehicle-mounted scene;
acquiring user actual data, and determining a target generalization word from the user actual data;
and determining a target entity corresponding to the target generalization word, and adding the target generalization word to a generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
2. The method of claim 1, wherein the determining a target entity to which the target generalization word corresponds comprises:
determining a similar generalization word for the target generalization word according to the plurality of entities and generalization word sets thereof;
and determining the entity corresponding to the similar generalization word as the target entity corresponding to the target generalization word.
3. The method of claim 2, wherein the determining a similar generalization word for the target generalization word according to the plurality of entities and generalization word sets thereof comprises:
determining one or more candidate generalization words for the target generalization word according to the plurality of entities and generalization word sets thereof;
and determining, from the one or more candidate generalization words, a similar generalization word for the target generalization word.
4. The method according to claim 1, 2 or 3, wherein, before the determining a target entity corresponding to the target generalization word and adding the target generalization word to the generalization word set corresponding to the target entity, the method further comprises:
judging whether the target generalization word exists in the plurality of entities and generalization word sets thereof;
and when it is judged that the target generalization word does not exist in the plurality of entities and generalization word sets thereof, executing the step of determining the target entity corresponding to the target generalization word and adding the target generalization word to the generalization word set corresponding to the target entity.
5. The method of claim 1, wherein the obtaining user actual data and determining a target generalization word from the user actual data comprises:
acquiring user actual data, and filtering the user actual data;
and performing generalization word extraction on the filtered user actual data, and determining the target generalization word.
6. The method of claim 1, further comprising:
receiving and analyzing voice interaction information;
when the target generalization word is recognized, determining the standard word corresponding to the target entity;
and generating vehicle control information for the voice interaction information according to the standard word corresponding to the target entity.
7. The method of claim 6, wherein the vehicle control information comprises any one of:
interactive instruction information, standard word prompt information and voice reply information.
8. An apparatus for data processing, the apparatus comprising:
an entity and generalization word set presetting module, configured to preset a plurality of entities and generalization word sets thereof for a vehicle-mounted scene;
a target generalization word determining module, configured to acquire user actual data and determine a target generalization word from the user actual data;
and a target generalization word adding module, configured to determine a target entity corresponding to the target generalization word and add the target generalization word to a generalization word set corresponding to the target entity, so as to determine a standard word corresponding to the target entity when the target generalization word is recognized by voice.
9. A server comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing a method of data processing according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of data processing according to any one of claims 1 to 7.
CN202011403234.2A 2020-12-04 2020-12-04 Data processing method and device Pending CN112527955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011403234.2A CN112527955A (en) 2020-12-04 2020-12-04 Data processing method and device

Publications (1)

Publication Number Publication Date
CN112527955A true CN112527955A (en) 2021-03-19

Family

ID=74998347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011403234.2A Pending CN112527955A (en) 2020-12-04 2020-12-04 Data processing method and device

Country Status (1)

Country Link
CN (1) CN112527955A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140006373A1 (en) * 2012-06-29 2014-01-02 International Business Machines Corporation Automated subject annotator creation using subject expansion, ontological mining, and natural language processing techniques
US20170011119A1 (en) * 2015-07-06 2017-01-12 Rima Ghannam System for Natural Language Understanding
CN110675870A (en) * 2019-08-30 2020-01-10 深圳绿米联创科技有限公司 Voice recognition method and device, electronic equipment and storage medium
CN110674259A (en) * 2019-09-27 2020-01-10 北京百度网讯科技有限公司 Intention understanding method and device
CN110704391A (en) * 2019-09-23 2020-01-17 车智互联(北京)科技有限公司 Word stock construction method and computing device
CN111400458A (en) * 2018-12-27 2020-07-10 上海智臻智能网络科技股份有限公司 Automatic generalization method and device
CN111798847A (en) * 2020-06-22 2020-10-20 广州小鹏车联网科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112017663A (en) * 2020-08-14 2020-12-01 博泰车联网(南京)有限公司 Voice generalization method and device and computer storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053394A (en) * 2021-04-27 2021-06-29 广州小鹏汽车科技有限公司 Voice processing method, server, voice processing system and storage medium
CN113053394B (en) * 2021-04-27 2024-01-09 广州小鹏汽车科技有限公司 Speech processing method, server, speech processing system, and storage medium
CN113539259A (en) * 2021-06-29 2021-10-22 广州小鹏汽车科技有限公司 Voice communication method and device based on vehicle
CN114049894A (en) * 2022-01-11 2022-02-15 广州小鹏汽车科技有限公司 Voice interaction method and device, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination