CN111291154B - Dialect sample data extraction method, device, equipment and storage medium

Info

Publication number
CN111291154B
Authority
CN
China
Prior art keywords
dialect
group
data
city
medical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010054280.XA
Other languages
Chinese (zh)
Other versions
CN111291154A (en)
Inventor
陈鑫
肖龙源
蔡振华
李稀敏
刘晓葳
谭玉坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010054280.XA
Publication of CN111291154A
Application granted
Publication of CN111291154B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialect sample data extraction method comprising the following steps: acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city; classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups; sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group; acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group; and taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data. Because the machine learning data selected in this way is intended, in principle, to cover all Mandarin (guanhua) dialect areas, the generalization ability of the model can be enhanced.

Description

Dialect sample data extraction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for extracting dialect sample data.
Background
In natural language processing, and in task-oriented dialogue robots in particular, dialects are a persistent difficulty: China has roughly seven Mandarin (guanhua) regions and hundreds or even thousands of dialects. Mainstream natural language processing targets standard Chinese, so for users outside the Mandarin regions a task such as intent recognition may produce errors. For example, people in most places ask a price by saying "how much money", whereas speakers in some dialect areas use a different expression, rendered literally as "more money". Because such dialect data is missing from the training sample data, the dialogue robot cannot recognize the dialect and therefore cannot answer the user accurately.
Disclosure of Invention
The invention provides a dialect sample data extraction method, a dialect sample data extraction device, dialect sample data extraction equipment and a computer readable storage medium, and mainly aims to extract sample data for all dialect areas and improve the universality of machine learning algorithm data.
In order to achieve the above object, the present invention provides a dialect sample data extraction method applied to an electronic device, the method including:
acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group;
acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
Preferably, the city data includes GDP data, and sorting each dialect group according to the city data corresponding to each dialect area and determining the target dialect area of each dialect group from each sorted dialect group includes:
sorting the dialect areas in each dialect group in descending order of the GDP data of the city corresponding to each dialect area, and taking the top N dialect areas, N being a preset number, as the target dialect areas of that dialect group.
Preferably, the city data includes medical aesthetics consumption data, and sorting each dialect group according to the city data corresponding to each dialect area and determining the target dialect area of each dialect group from each sorted dialect group includes:
sorting the dialect areas in each dialect group in descending order of the medical aesthetics consumption data of the city corresponding to each dialect area, and taking the top N dialect areas, N being a preset number, as the target dialect areas of that dialect group.
Preferably, acquiring the medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group includes:
acquiring medical aesthetics dialogue data stored on a server of a representative medical aesthetics organization in the city corresponding to the target dialect area of each dialect group.
Preferably, the representative medical aesthetics organization comprises one or more of the following: a representative medical aesthetics institution and a medical aesthetics hospital.
Preferably, the method further comprises:
acquiring a second dialect of each dialect area;
acquiring medical aesthetics dialogue data, spoken in the second dialect, of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data that corresponds to each dialect group and uses the second dialect as dialect sample data.
Preferably, the method further comprises:
counting the dialect sample data volume of each dialect group;
and if the dialect sample data volume of a target dialect group is lower than a data volume threshold, increasing the dialect sample data volume of that target dialect group.
To achieve the above object, the present invention further provides an electronic device, including a memory and a processor, where the memory stores a dialect sample data extraction program executable on the processor, and the dialect sample data extraction program, when executed by the processor, implements the following steps:
acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group;
acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
In order to achieve the above object, the present invention further provides an electronic device, comprising:
an acquisition module, configured to acquire a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
a classification module, configured to classify the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
a determining module, configured to sort the dialect areas in each dialect group according to the city data corresponding to each dialect area, and to determine a target dialect area of each dialect group from each sorted dialect group;
the acquisition module being further configured to acquire medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group;
and the determining module being further configured to take the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
Furthermore, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a dialect sample data extraction program, which is executable by one or more processors to implement the steps of the dialect sample data extraction method as described above.
The method acquires a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city; classifies the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups; sorts the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determines a target dialect area of each dialect group from each sorted dialect group; acquires medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group; and takes the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data. The method mainly addresses the poor model generalization that results when machine learning training data lacks sufficient features. In selecting machine learning data, the data should in principle cover all Mandarin (guanhua) dialect areas, so that the generalization ability of the model can be enhanced.
Drawings
Fig. 1 is a schematic flow chart illustrating a dialect sample data extraction method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of city data corresponding to a plurality of dialect areas according to the present invention;
Fig. 3 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the program modules of a dialect sample data extraction program according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
An embodiment of the present invention provides a dialect sample data extraction method applied to an electronic device, including but not limited to a medical aesthetics robot, a terminal, or other electronic equipment. The electronic device acquires a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city; classifies the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups; sorts the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determines a target dialect area of each dialect group from each sorted dialect group; acquires medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group; and takes the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data. The method mainly addresses the poor model generalization that results when machine learning training data lacks sufficient features. In selecting machine learning data, the data should in principle cover all Mandarin (guanhua) dialect areas, so that the generalization ability of the model can be enhanced.
The present invention will be described in detail with reference to examples.
The invention provides a dialect sample data extraction method. Fig. 1 is a schematic flow chart of a dialect sample data extraction method according to an embodiment of the present invention, the method being applied to an electronic device. The method may be performed by an electronic device, which may be implemented in software and/or hardware. The dialect sample data extraction method of this embodiment is not limited to the steps shown in the flow chart; some of the steps shown may be omitted, and the order of the steps may be changed.
In this embodiment, the dialect sample data extraction method is applied to an electronic device, and includes:
S10, acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas.
In this embodiment, one dialect area corresponds to one city. For example, Fig. 2 shows five dialect areas, each corresponding to one city; the city corresponding to the first Gan dialect area is Huangshi.
In this embodiment, the dialects of the plurality of dialect areas may include all dialects spoken in China, including but not limited to Gan, Zang, Mandarin (guanhua) and Bai. The more completely the plurality of dialect areas covers the existing dialects, the more dialect types the subsequent samples contain and the more accurately the model can be trained.
S11, classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups.
For example, as shown in Fig. 2, the first dialect of the first, second and third Gan areas is Gan in each case, so these three areas are classified into the Gan dialect group. Likewise, the first dialect of the first and second Mandarin areas is Northern Mandarin in both cases, so these two areas are classified into the Northern Mandarin dialect group.
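The grouping in step S11 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `DialectArea` record, the `group_by_first_dialect` helper and the placeholder city names are all assumptions introduced here, and the sample input does not reproduce the contents of Fig. 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DialectArea:
    """One dialect area; the method maps each area to exactly one city."""
    name: str
    city: str
    first_dialect: str


def group_by_first_dialect(areas):
    """Step S11: areas that share a first dialect land in the same dialect group."""
    groups = defaultdict(list)
    for area in areas:
        groups[area.first_dialect].append(area)
    return dict(groups)


# Placeholder input for illustration only (hypothetical city names).
areas = [
    DialectArea("first Gan area", "City A", "Gan"),
    DialectArea("second Gan area", "City B", "Gan"),
    DialectArea("third Gan area", "City C", "Gan"),
    DialectArea("first Mandarin area", "City D", "Northern Mandarin"),
    DialectArea("second Mandarin area", "City E", "Northern Mandarin"),
]
groups = group_by_first_dialect(areas)
```

Keying the groups on the first dialect means a new dialect automatically creates a new group, with no fixed list of dialects required in advance.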
S12, sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining the target dialect area of each dialect group from each sorted dialect group.
In an embodiment of the present invention, the city data includes GDP data, and sorting each dialect group according to the city data corresponding to each dialect area and determining the target dialect area of each dialect group from each sorted dialect group includes:
sorting the dialect areas in each dialect group in descending order of the GDP data of the city corresponding to each dialect area, and taking the top N dialect areas, N being a preset number, as the target dialect areas of that dialect group. Sorting each dialect group by city GDP identifies the cities with higher spending power within each dialect; medical aesthetics dialogue data is richer in such cities, so more sample data can be obtained.
In an embodiment of the present invention, the city data includes medical aesthetics consumption data, and sorting each dialect group according to the city data corresponding to each dialect area and determining the target dialect area of each dialect group from each sorted dialect group includes:
sorting the dialect areas in each dialect group in descending order of the medical aesthetics consumption data of the city corresponding to each dialect area, and taking the top N dialect areas, N being a preset number, as the target dialect areas of that dialect group. Within each dialect group, ranking directly by the medical aesthetics consumption data of each dialect area better reflects the medical aesthetics consumption level of the corresponding city. The higher that consumption level, the more medical aesthetics dialogue data can be obtained from the dialect area, which enriches the sample data of the dialect group.
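Both ranking embodiments of step S12 reduce to the same operation: sort the areas of a group in descending order of one city metric and keep the first N. The sketch below assumes a hypothetical `select_target_areas` helper, assumed field names `gdp` and `consumption`, and made-up figures; none of these come from the patent.

```python
def select_target_areas(groups, city_data, key, n):
    """Step S12: rank each group's areas by a city metric (descending), keep the top n.

    groups:    {dialect_group: [(area_name, city), ...]}
    city_data: {city: {"gdp": ..., "consumption": ...}}  (field names are assumptions)
    """
    targets = {}
    for group, members in groups.items():
        ranked = sorted(members, key=lambda m: city_data[m[1]][key], reverse=True)
        targets[group] = ranked[:n]
    return targets


# Made-up figures for illustration; units are arbitrary.
groups = {
    "Gan": [("Gan 1", "City A"), ("Gan 2", "City B"), ("Gan 3", "City C")],
    "Northern Mandarin": [("Mandarin 1", "City D"), ("Mandarin 2", "City E")],
}
city_data = {
    "City A": {"gdp": 120, "consumption": 9},
    "City B": {"gdp": 300, "consumption": 4},
    "City C": {"gdp": 210, "consumption": 7},
    "City D": {"gdp": 500, "consumption": 20},
    "City E": {"gdp": 450, "consumption": 25},
}

by_gdp = select_target_areas(groups, city_data, key="gdp", n=2)
by_consumption = select_target_areas(groups, city_data, key="consumption", n=1)
```

With these figures the two metrics select different target areas in the Gan group, which illustrates why the choice of city data matters.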
S13, acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group.
In an embodiment of the present invention, acquiring the medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group includes:
acquiring medical aesthetics dialogue data stored on a server of a representative medical aesthetics organization in the city corresponding to the target dialect area of each dialect group, wherein the representative medical aesthetics organization comprises one or more of the following: a representative medical aesthetics institution and a medical aesthetics hospital.
For example, the electronic device communicates directly with the servers of the representative medical aesthetics organizations, and once the sample data on a server has accumulated to a certain volume, that data can be transmitted directly to the electronic device. In this way, rich dialect medical aesthetics dialogue data can be acquired in a more timely manner.
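The threshold-triggered transfer described in this example might look like the following on the server side. The `DialogueBuffer` class, its `send` callback and the threshold value are hypothetical; the patent does not specify the transport or the batch size, so a plain list stands in for the electronic device.

```python
class DialogueBuffer:
    """Hypothetical server-side buffer: forwards dialogues once enough have accumulated."""

    def __init__(self, threshold, send):
        self.threshold = threshold
        self.send = send      # callable that delivers one batch to the electronic device
        self.pending = []

    def add(self, dialogue):
        self.pending.append(dialogue)
        if len(self.pending) >= self.threshold:
            # Hand over the full batch and start a fresh buffer.
            batch, self.pending = self.pending, []
            self.send(batch)


received = []  # stands in for storage on the electronic device
buf = DialogueBuffer(threshold=3, send=received.append)
for text in ["d1", "d2", "d3", "d4"]:
    buf.add(text)
```

After the loop, one batch of three dialogues has been delivered and the fourth is still waiting on the server.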
S14, taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
In an embodiment of the invention, the method further comprises:
acquiring a second dialect of each dialect area;
acquiring medical aesthetics dialogue data, spoken in the second dialect, of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data that corresponds to each dialect group and uses the second dialect as dialect sample data.
In each dialect area the first dialect is the main language, but on many occasions users also converse in the second dialect. Adding the medical aesthetics dialogue data that uses the second dialect, from the city corresponding to the target dialect area of each dialect group, therefore increases the number of dialect types, broadens the dialect sample data, and improves the accuracy of model training.
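One way to picture the second-dialect extension is as a filter that keeps dialogues from target cities when they use either the first or the second dialect of that city. The sketch assumes each dialogue record already carries a dialect label, which the patent does not state; `collect_samples`, the record fields and the sample values are all hypothetical.

```python
def collect_samples(records, target_cities, dialects_by_city):
    """Keep dialogues from target cities spoken in the city's first or second dialect.

    records:          iterable of dicts with "city", "dialect" and "text" keys (assumed shape)
    dialects_by_city: {city: (first_dialect, second_dialect)}
    """
    samples = []
    for record in records:
        city = record["city"]
        if city in target_cities and record["dialect"] in dialects_by_city[city]:
            samples.append(record)
    return samples


# Made-up records for illustration only.
dialects_by_city = {"City B": ("Gan", "Northern Mandarin")}
records = [
    {"city": "City B", "dialect": "Gan", "text": "..."},
    {"city": "City B", "dialect": "Northern Mandarin", "text": "..."},
    {"city": "City B", "dialect": "Other", "text": "..."},
    {"city": "City Z", "dialect": "Gan", "text": "..."},
]
samples = collect_samples(records, {"City B"}, dialects_by_city)
```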
In an embodiment of the present invention, the method further includes:
counting the dialect sample data volume of each dialect group;
and if the dialect sample data volume of a target dialect group is lower than a data volume threshold, increasing the dialect sample data volume of that target dialect group.
Counting the dialect sample data volume of each dialect group makes it possible to balance the data volume across groups. This avoids the situation where some dialect groups have so little data, and therefore contribute so little to training, that the model cannot accurately recognize those minority dialect groups; balancing the data volume keeps the contribution of each dialect group to the model balanced.
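The counting step can be sketched as below. The patent does not say how the data volume of an under-represented group is increased, so this example only reports how many samples each group is missing; the `find_shortfalls` helper, the group names and the threshold are assumptions.

```python
from collections import Counter


def find_shortfalls(samples, groups, threshold):
    """Count samples per dialect group and report how many each group lacks.

    samples: iterable of (dialect_group, dialogue_text) pairs
    groups:  all dialect groups, including any that have no samples yet
    """
    counts = Counter(group for group, _ in samples)
    # Counter returns 0 for groups that never appear in the samples.
    return {g: threshold - counts[g] for g in groups if counts[g] < threshold}


samples = [("Group A", "a1"), ("Group A", "a2"), ("Group B", "b1")]
shortfalls = find_shortfalls(samples, ["Group A", "Group B", "Group C"], threshold=2)
```

Passing the full list of groups separately ensures that a group with no samples at all is still flagged.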
The method acquires a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city; classifies the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups; sorts the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determines a target dialect area of each dialect group from each sorted dialect group; acquires medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group; and takes the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data. The method mainly addresses the poor model generalization that results when machine learning training data lacks sufficient features. In selecting machine learning data, the data should in principle cover all Mandarin (guanhua) dialect areas, so that the generalization ability of the model can be enhanced.
Fig. 3 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present invention; in the present embodiment, the electronic device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
In the present embodiment, the electronic device 1 may be a Personal Computer (PC), or may be a terminal device such as a smartphone, a tablet Computer, a portable Computer, or a robot.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1. The memory 11 may be an external storage device in other embodiments, such as a plug-in hard disk provided on the electronic device 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the dialect sample data extraction program 01, but also to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip, is used for executing program codes or Processing data stored in the memory 11, such as dialect sample data extraction program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication link between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, the user interface may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may further comprise a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying processed information and for displaying a visualized user interface.
Fig. 3 shows only the electronic device 1 with the components 11-14 and the dialect sample data extraction program 01, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may include fewer or more components than shown, or combine certain components, or a different arrangement of components.
In the embodiment of the electronic device 1 shown in fig. 3, a dialect sample data extraction program 01 is stored in the memory 11; the processor 12 executes the dialect sample data extraction program 01 stored in the memory 11 to implement the following steps:
acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group;
acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
The functions or operation steps implemented when the above steps are executed are substantially the same as those of the above embodiments, and are not described herein again.
Alternatively, in other embodiments, the dialect sample data extracting program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, referring to fig. 4, a schematic diagram of program modules of a dialect sample data extraction program in an embodiment of the electronic device of the present invention is shown, in which the dialect sample data extraction program may be divided into an obtaining module 10, a classifying module 20, and a determining module 30, and exemplarily:
the obtaining module 10 is configured to acquire a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
the classifying module 20 is configured to classify the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
the determining module 30 is configured to sort the dialect areas in each dialect group according to the city data corresponding to each dialect area, and to determine a target dialect area of each dialect group from each sorted dialect group;
the obtaining module 10 is further configured to acquire medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group;
the determining module 30 is further configured to take the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
The functions or operation steps of the above-mentioned obtaining module 10, classifying module 20 and determining module 30 when executed are substantially the same as those of the above-mentioned embodiments, and are not described herein again.
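The division into three program modules could be mirrored in code roughly as follows. This is a sketch under stated assumptions, not the structure of program 01: the class names, method names, the in-memory data source and the use of GDP as the ranking metric are all choices made here for illustration.

```python
from collections import defaultdict


class AcquisitionModule:
    """Module 10: supplies dialect areas, city data and dialogue data (in memory here)."""

    def __init__(self, areas, dialogues_by_city):
        self.areas = areas                          # [{"city", "first_dialect", "gdp"}, ...]
        self.dialogues_by_city = dialogues_by_city  # {city: [dialogue, ...]}

    def get_areas(self):
        return self.areas

    def get_dialogues(self, city):
        return self.dialogues_by_city.get(city, [])


class ClassificationModule:
    """Module 20: groups dialect areas by first dialect."""

    def classify(self, areas):
        groups = defaultdict(list)
        for area in areas:
            groups[area["first_dialect"]].append(area)
        return dict(groups)


class DeterminingModule:
    """Module 30: picks target areas and assembles the dialect sample data."""

    def pick_targets(self, groups, n):
        return {g: sorted(m, key=lambda a: a["gdp"], reverse=True)[:n]
                for g, m in groups.items()}

    def build_samples(self, targets, acquisition):
        return {g: [d for a in m for d in acquisition.get_dialogues(a["city"])]
                for g, m in targets.items()}


# Made-up data for illustration only.
acq = AcquisitionModule(
    areas=[
        {"city": "City A", "first_dialect": "Gan", "gdp": 120},
        {"city": "City B", "first_dialect": "Gan", "gdp": 300},
        {"city": "City D", "first_dialect": "Northern Mandarin", "gdp": 500},
    ],
    dialogues_by_city={"City A": ["a1"], "City B": ["b1", "b2"], "City D": ["d1"]},
)
determine = DeterminingModule()
groups = ClassificationModule().classify(acq.get_areas())
samples = determine.build_samples(determine.pick_targets(groups, n=1), acq)
```

The acquisition module is called twice, once for area data and once for dialogue data, which matches the "further configured" wording used for modules 10 and 30.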
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a dialect sample data extraction program is stored on the computer-readable storage medium, where the dialect sample data extraction program can be executed by one or more processors, and implemented functions or operation steps are substantially the same as those in the foregoing embodiments, and are not described herein again.
It should be noted that the numbering of the embodiments of the present invention above is for description only and does not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, an electronic device, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A dialect sample data extraction method, the method comprising:
acquiring a first dialect of each of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
sorting the dialect areas in each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group, wherein the city data comprises medical aesthetics consumption data and GDP data;
acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group, wherein acquiring the medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group comprises:
acquiring medical aesthetics dialogue data stored on a server of a representative medical aesthetics organization in the city corresponding to the target dialect area of each dialect group;
taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data;
acquiring a second dialect of each dialect area;
acquiring medical aesthetics dialogue data, spoken in the second dialect, of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data that corresponds to each dialect group and uses the second dialect as dialect sample data.
2. The method of claim 1, wherein the city data comprises GDP data, and the sorting of each dialect group according to the city data corresponding to each dialect area and the determining of the target dialect area of each dialect group from each sorted dialect group comprise:
sorting the dialect areas in each dialect group in descending order according to the GDP data of the city corresponding to each dialect area, and taking the dialect areas ranked in the first preset N positions as the target dialect areas of each dialect group.
3. The method of claim 1, wherein the city data comprises medical aesthetics consumption data, and the sorting of each dialect group according to the city data corresponding to each dialect area and the determining of the target dialect area of each dialect group from each sorted dialect group comprise:
sorting the dialect areas in each dialect group in descending order according to the medical aesthetics consumption data of the city corresponding to each dialect area, and taking the dialect areas ranked in the first preset N positions as the target dialect areas of each dialect group.
4. The dialect sample data extraction method of claim 1, wherein the representative medical aesthetics institution comprises one or a combination of the following: a medical aesthetics representative agency and a medical aesthetics hospital.
5. The method of dialect sample data extraction of claim 1, the method further comprising:
counting the amount of dialect sample data of each dialect group;
and if the amount of dialect sample data of a target dialect group is below a data amount threshold, increasing the amount of dialect sample data of that target dialect group.
6. An electronic device for operating the dialect sample data extraction method according to any one of claims 1-5, wherein the electronic device comprises a memory and a processor, the memory having stored thereon a dialect sample data extraction program operable on the processor, the dialect sample data extraction program when executed by the processor implementing the steps of:
acquiring a first dialect of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
classifying the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
sorting each dialect group according to the city data corresponding to each dialect area, and determining a target dialect area of each dialect group from each sorted dialect group, wherein the city data comprise medical aesthetics consumption data and GDP data;
acquiring medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group, wherein the acquiring of the medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group comprises:
acquiring medical aesthetics dialogue data stored in a server of a representative medical aesthetics institution of the city corresponding to the target dialect area of each dialect group;
and taking the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
7. An electronic device for operating the dialect sample data extraction method according to any one of claims 1-5, wherein the electronic device comprises:
an acquisition module, a processing module and a display module, wherein the acquisition module is configured to acquire a first dialect of a plurality of dialect areas and city data corresponding to each dialect area in the plurality of dialect areas, wherein one dialect area corresponds to one city;
the classification module is configured to classify the dialect areas with the same first dialect into the same dialect group to obtain a plurality of dialect groups;
the determining module is configured to sort each dialect group according to the city data corresponding to each dialect area, and to determine a target dialect area of each dialect group from each sorted dialect group, wherein the city data comprise medical aesthetics consumption data and GDP data;
the acquisition module is further configured to acquire medical aesthetics dialogue data of the city corresponding to the target dialect area of each dialect group, and is specifically configured to:
acquire medical aesthetics dialogue data stored in a server of a representative medical aesthetics institution of the city corresponding to the target dialect area of each dialect group;
the determining module is further configured to take the acquired medical aesthetics dialogue data corresponding to each dialect group as dialect sample data.
8. A computer readable storage medium having stored thereon a dialect sample data extraction program executable by one or more processors for carrying out the steps of the dialect sample data extraction method of any one of claims 1 to 5.
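
The grouping, sorting and selection steps recited in claims 1 to 3, together with the threshold check of claim 5, can be illustrated with a minimal Python sketch. This is only one possible reading of the claims, not the patented implementation: the names DialectArea, group_by_first_dialect, select_target_areas and groups_below_threshold are hypothetical, as are the city names, dialect labels and all numeric values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DialectArea:
    """One dialect area; the claims map each area to exactly one city."""
    city: str
    first_dialect: str   # label used for grouping (claim 1)
    gdp: float           # GDP data of the city (claim 2)
    consumption: float   # medical aesthetics consumption data (claim 3)


def group_by_first_dialect(areas):
    """Put areas that share the same first dialect into one dialect group."""
    groups = defaultdict(list)
    for area in areas:
        groups[area.first_dialect].append(area)
    return dict(groups)


def select_target_areas(groups, key, n):
    """Sort each group in descending order by `key` and keep the top n areas."""
    return {
        dialect: sorted(members, key=key, reverse=True)[:n]
        for dialect, members in groups.items()
    }


def groups_below_threshold(sample_counts, threshold):
    """Return the dialect groups whose sample count is below the threshold."""
    return sorted(d for d, count in sample_counts.items() if count < threshold)


# Hypothetical figures, for illustration only.
areas = [
    DialectArea("City A", "Dialect X", gdp=300.0, consumption=12.0),
    DialectArea("City B", "Dialect X", gdp=500.0, consumption=9.0),
    DialectArea("City C", "Dialect Y", gdp=200.0, consumption=7.0),
]
groups = group_by_first_dialect(areas)

# Claim 2: rank by GDP; claim 3: rank by medical aesthetics consumption.
targets_by_gdp = select_target_areas(groups, key=lambda a: a.gdp, n=1)
targets_by_consumption = select_target_areas(
    groups, key=lambda a: a.consumption, n=1
)

# Claim 5: flag groups whose collected sample count is under the threshold.
short_groups = groups_below_threshold(
    {"Dialect X": 5000, "Dialect Y": 800}, threshold=1000
)
```

In this sketch the ranking criterion is passed in as a key function, so the GDP variant and the consumption variant share one selection routine. How a group flagged by the threshold check would have its sample amount increased is left open, since the claims do not specify it.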
CN202010054280.XA 2020-01-17 2020-01-17 Dialect sample data extraction method, device and equipment and storage medium Active CN111291154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010054280.XA CN111291154B (en) 2020-01-17 2020-01-17 Dialect sample data extraction method, device and equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010054280.XA CN111291154B (en) 2020-01-17 2020-01-17 Dialect sample data extraction method, device and equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111291154A CN111291154A (en) 2020-06-16
CN111291154B CN111291154B (en) 2022-08-23

Family

ID=71022308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010054280.XA Active CN111291154B (en) 2020-01-17 2020-01-17 Dialect sample data extraction method, device and equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111291154B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274885A (en) * 2017-05-31 2017-10-20 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN108804424A (en) * 2018-06-08 2018-11-13 广州荔支网络技术有限公司 A kind of training method of language material, device, electronic equipment and storage medium
CN109410914A (en) * 2018-08-28 2019-03-01 江西师范大学 A kind of Jiangxi dialect phonetic and dialect point recognition methods
CN110534104A (en) * 2019-07-03 2019-12-03 平安科技(深圳)有限公司 Voice match method, electronic device, the computer equipment of Intelligent dialogue system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9477652B2 (en) * 2015-02-13 2016-10-25 Facebook, Inc. Machine learning dialect identification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
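Lü He. "Research and Implementation of a DNN-Based Language Identification System". China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology. 2018, *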

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant