CN116386597A - Dialect recognition model construction method and device, storage medium and electronic device

Info

Publication number: CN116386597A
Application number: CN202310189393.4A
Authority: CN
Inventors: 田聪
Original assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd; Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd; Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2023-03-01
Filing date: 2023-03-01
Publication date: 2023-07-04

Abstract

The application discloses a method and a device for constructing a dialect recognition model, a storage medium and an electronic device, and relates to the technical field of smart homes. The method for constructing the dialect recognition model comprises: detecting the regional center of a target dialect region; dividing the target dialect region into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position; collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships; and training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model. This technical solution addresses problems in the related art such as the low dialect recognition accuracy of dialect recognition models.

Description

Dialect recognition model construction method and device, storage medium and electronic device

Technical Field

The application relates to the technical field of smart homes, and in particular to a method and a device for constructing a dialect recognition model, a storage medium and an electronic device.

Background

Speech recognition technology, also known as automatic speech recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or verify the speaker who produced the speech rather than the lexical content it contains.

At present, speech recognition technology plays an important role in a wide range of fields such as the control of smart devices, for example the voice control of smart household appliances, and greatly facilitates people's daily lives. Most speech recognition models can accurately recognize standard Mandarin. Some dialect recognition models are constructed using a standard dialect corpus; for example, a Minnan dialect recognition model is trained on standard Minnan dialect. In reality, however, the Minnan region is wide, and owing to factors such as geographical barriers during the spread of the dialect, the region has produced a variety of sub-dialects in addition to standard Minnan dialect, with differences among the sub-dialects that may be large or small. Recognition errors therefore often occur when the Minnan dialect recognition model is used to recognize these sub-dialects, resulting in a poor user experience.

No effective solution has yet been proposed for problems in the related art such as the low dialect recognition accuracy of dialect recognition models.

Disclosure of Invention

The embodiment of the application provides a method and a device for constructing a dialect recognition model, a storage medium and an electronic device, so as to at least solve the problems of low dialect recognition accuracy and the like of the dialect recognition model in the related technology.

According to one aspect of the embodiments of the present application, there is provided a method for constructing a dialect recognition model, including:

detecting a regional center of a target dialect region, wherein the dialect used in the target dialect region is the target dialect to be recognized;

dividing the target dialect region into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position, wherein the plurality of dialect collection intervals are arranged radially from inside to outside around the regional center, and the closer a dialect collection interval is to the regional center, the higher the similarity between the language used there and the target dialect;

collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships;

and training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model, wherein the target dialect recognition model is used for recognizing the target dialect.

Optionally, the dividing the target dialect region into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position includes:

acquiring the postal codes of all regions in the target dialect region to obtain a postal code set, wherein the central postal code corresponding to the regional center of the target dialect region is the smallest postal code in the postal code set, and the postal codes in the postal code set are arranged in ascending order;

dividing the postal codes included in the postal code set, in ascending order and with the central postal code as the starting value, into a plurality of equidistant numerical intervals;

and grouping the regions corresponding to the one or more postal codes in the same numerical interval into one dialect collection interval, to obtain the plurality of dialect collection intervals.

Optionally, the collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, includes:

sending text data expressing the target semantics to the terminal devices located in each dialect collection interval;

and receiving voice data returned by the terminal devices in response to the text data as the dialect voice, to obtain the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships.

Optionally, the collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, includes:

identifying, from the voice library corresponding to each dialect collection interval, voice data expressing the target semantics as the dialect voice;

and constructing the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships.

Optionally, the training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model, includes:

taking the initial dialect recognition model as the first current dialect recognition model, and repeatedly executing the following steps until the target dialect recognition model is obtained:

acquiring the dialect collection intervals one at a time as the current interval, in the order of their positions in the target dialect region from inside to outside;

training the current dialect recognition model using the target semantics and dialect voice that have a correspondence relationship with the current interval, to obtain a first dialect recognition model;

if the current interval is not the last interval in the order, determining the first dialect recognition model as the next current dialect recognition model;

and if the current interval is the last interval in the order, determining the first dialect recognition model as the target dialect recognition model.

Optionally, the training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model, includes:

acquiring a target number of initial dialect recognition models, wherein the target number is the number of dialect collection intervals;

training one initial dialect recognition model with each set of target semantics and dialect voice having a correspondence relationship, to obtain the target number of second dialect recognition models;

calculating the average value of the model parameters of the target number of second dialect recognition models, to obtain target model parameters;

and determining the initial dialect recognition model having the target model parameters as the target dialect recognition model.
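
By way of non-limiting illustration, the parameter-averaging step above may be sketched as follows. The sketch assumes that each second dialect recognition model exposes its parameters as named lists of floating-point weights; the `Params` type, the `average_parameters` function and the sample values are hypothetical and are not taken from the application.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_parameters(models: List[Params]) -> Params:
    """Return the element-wise mean of the parameters of all given models."""
    if not models:
        raise ValueError("at least one model is required")
    averaged: Params = {}
    for name in models[0]:
        # Group the i-th weight of every model together, then average each group.
        columns = zip(*(model[name] for model in models))
        averaged[name] = [sum(column) / len(models) for column in columns]
    return averaged


# One (made-up) second dialect recognition model per dialect collection interval.
second_models = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
target_params = average_parameters(second_models)
```

All models are assumed to share the same architecture, so that parameters with the same name have the same length.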

Optionally, the detecting the regional center of the target dialect region includes:

acquiring population density distribution of the target dialect region;

and determining the region with the greatest concentration of people in the target dialect region as the region center.

According to another aspect of the embodiments of the present application, there is also provided a device for constructing a dialect recognition model, including:

a detection module, used for detecting the regional center of a target dialect region, wherein the dialect used in the target dialect region is the target dialect to be recognized;

a dividing module, used for dividing the target dialect region into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position, wherein the plurality of dialect collection intervals are arranged radially from inside to outside around the regional center, and the closer a dialect collection interval is to the regional center, the higher the similarity between the language used there and the target dialect;

a collection module, used for collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships;

and a training module, used for training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model, wherein the target dialect recognition model is used for recognizing the target dialect.

According to yet another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-described method of constructing a dialect recognition model when run.

According to still another aspect of the embodiments of the present application, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the method for constructing the dialect recognition model described above through the computer program.

In the embodiments of the present application, the regional center of a target dialect region is detected, wherein the dialect used in the target dialect region is the target dialect to be recognized; the target dialect region is divided into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position, wherein the plurality of dialect collection intervals are arranged radially from inside to outside around the regional center, and the closer a dialect collection interval is to the regional center, the higher the similarity between the language used there and the target dialect; dialect voice expressing target semantics is collected in each of the plurality of dialect collection intervals, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships; and an initial dialect recognition model is trained using these multiple sets, to obtain a target dialect recognition model used for recognizing the target dialect. That is, before the dialect recognition model corresponding to the target dialect is constructed, the regional center of the target dialect region in which the target dialect is used is detected, the region is divided into dialect collection intervals arranged radially around that center, dialect voice expressing the target semantics is collected in each interval, and the resulting sets are finally used to train the initial dialect recognition model.
Unlike approaches that train a dialect recognition model using only mainstream standard dialect voice, this construction method takes full account of the association between dialects and regions, namely that the dialects of dialect collection intervals that are close in position have more similar voice features, so that the constructed target dialect recognition model can overcome the dialect differences between regions and accurately recognize the target dialect throughout the target dialect region. This technical solution addresses problems in the related art such as the low dialect recognition accuracy of dialect recognition models, and achieves the technical effect of improving the dialect recognition accuracy of the dialect recognition model.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a schematic diagram of a hardware environment of a method of building a dialect recognition model according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of building a dialect recognition model in accordance with an embodiment of the present application;

FIG. 3 is a schematic diagram of a target dialect region according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a partition of a dialect collection interval according to an embodiment of the present application;

FIG. 5 is a block diagram of a device for constructing a dialect recognition model according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to one aspect of the embodiments of the present application, a method for constructing a dialect recognition model is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart home (Smart Home), the smart home device ecosystem and the intelligent house (Intelligence House) ecosystem. Optionally, in this embodiment, FIG. 1 is a schematic diagram of a hardware environment of a method for constructing a dialect recognition model according to an embodiment of the present application, and the method may be applied to a hardware environment formed by a terminal device 102 and a server 104 as shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may be used to provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of the server to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of the server to provide data computing services for the server 104.

The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network and a local area network. The wireless network may include, but is not limited to, at least one of: Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, a smart washing device, a smart dishwasher, a smart projection device, a smart television, a smart clothes hanger, a smart curtain, a smart video device, a smart socket, a smart speaker, a smart fresh air device, a smart kitchen and toilet device, a smart bathroom device, a smart sweeping robot, a smart window cleaning robot, a smart mopping robot, a smart air purifying device, a smart steam oven, a smart microwave oven, a smart kitchen appliance, a smart purifier, a smart water dispenser, a smart door lock, and the like.

In this embodiment, a method for constructing a dialect recognition model is provided and applied to the above-mentioned terminal device. FIG. 2 is a flowchart of a method for constructing a dialect recognition model according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:

Step S202, detecting a regional center of a target dialect region, wherein the dialect used in the target dialect region is the target dialect to be recognized;

Step S204, dividing the target dialect region into a plurality of dialect collection intervals from inside to outside by taking the regional center as a center position, wherein the plurality of dialect collection intervals are arranged radially from inside to outside around the regional center, and the closer a dialect collection interval is to the regional center, the higher the similarity between the language used there and the target dialect;

Step S206, collecting, in each of the plurality of dialect collection intervals, dialect voice expressing target semantics, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships;

Step S208, training an initial dialect recognition model using the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, to obtain a target dialect recognition model, wherein the target dialect recognition model is used for recognizing the target dialect.

Through the above steps, before the dialect recognition model corresponding to the target dialect is constructed, the regional center of the target dialect region in which the target dialect is used is detected; the target dialect region is divided, with the regional center as the center position, into a plurality of dialect collection intervals arranged radially from inside to outside; dialect voice expressing the target semantics is collected in each of the plurality of dialect collection intervals, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships; and finally the initial dialect recognition model is trained using these multiple sets, to obtain the target dialect recognition model for recognizing the target dialect. Unlike approaches that train a dialect recognition model using only mainstream standard dialect voice, this construction method takes full account of the association between dialects and regions, namely that the dialects of dialect collection intervals that are close in position have more similar voice features, so that the constructed target dialect recognition model can overcome the dialect differences between regions and accurately recognize the target dialect throughout the target dialect region. This technical solution addresses problems in the related art such as the low dialect recognition accuracy of dialect recognition models, and achieves the technical effect of improving the dialect recognition accuracy of the dialect recognition model.

In the technical solution provided in step S202, the target dialect region is a region in which the target dialect is used. The size of the target dialect region is not limited; it may be, but is not limited to, a provincial region, a municipal region, a county region, and so on. FIG. 3 is a schematic diagram of the target dialect region according to an embodiment of the present application. As shown in FIG. 3, the target dialect region may be, but is not limited to being, composed of a plurality of sub-regions (region 1 to region N), where region 1 is the regional center of the target dialect region and the target dialect is the dialect used in the target dialect region.

Optionally, in this embodiment, owing to factors such as geographical barriers during the spread of a dialect, the dialect voice features of different regions are related to their positions: the closer two regions are, the smaller the difference between their dialect voice features, and the farther apart they are, the larger the difference. The target dialect used in the target dialect region may therefore give rise to a plurality of sub-dialects. For example, region 1 corresponds to sub-dialect 1, region 2 corresponds to sub-dialect 2, and region N corresponds to sub-dialect N, where sub-dialect 1, sub-dialect 2 and sub-dialect N all belong to the target dialect; however, because the distance between region 1 and region 2 is smaller than the distance between region 1 and region N, the voice features of sub-dialect 1 and sub-dialect 2 are more similar.

In one exemplary embodiment, the regional center of the target dialect region may be detected by, but not limited to, the following manner: acquiring the population density distribution of the target dialect region; and determining the region with the greatest population concentration in the target dialect region as the regional center.

Optionally, in this embodiment, as shown in FIG. 3, the target dialect region may be, but is not limited to being, composed of a plurality of sub-regions (region 1 to region N). To detect the regional center of the target dialect region, the population density distribution of the target dialect region is first obtained, for example by obtaining the population concentration of each of region 1 to region N, where the population concentration may be, but is not limited to being, characterized by population density. The region with the greatest population concentration is then taken from the population density distribution as the regional center of the target dialect region; for example, if region 1 has the greatest population density, region 1 is determined as the regional center of the target dialect region.
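
By way of non-limiting illustration, the detection described above may be sketched as follows. The sketch assumes that a population count and a land area are available for each sub-region and that population concentration is characterized by population density; the function name `detect_region_center` and all figures are hypothetical.

```python
from typing import Dict


def detect_region_center(population: Dict[str, int], area_km2: Dict[str, float]) -> str:
    """Return the name of the sub-region with the highest population density."""
    # Population density distribution of the target dialect region.
    density = {name: population[name] / area_km2[name] for name in population}
    # The regional center is the sub-region where the density is greatest.
    return max(density, key=density.get)


# Made-up figures for region 1, region 2 and region N.
population = {"region 1": 900_000, "region 2": 300_000, "region N": 120_000}
area_km2 = {"region 1": 300.0, "region 2": 600.0, "region N": 800.0}
center = detect_region_center(population, area_km2)
```

With these figures the densities are 3000, 500 and 150 people per square kilometer, so region 1 is selected.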

Optionally, in this embodiment, the region with the greatest population concentration is taken as the regional center of the target dialect region, so that the proportion of corpus collected for the regional center can be increased. The corpus of the regional center can be used first to train the dialect recognition model and obtain the model's initial-stage parameters, and the corpus of the regions other than the regional center can then be used to continue training until the parameters converge. This training strategy ensures that the constructed dialect recognition model can recognize not only the main sub-dialects of the regions around the regional center but also the sub-dialects of regions far from the regional center.

Optionally, in this embodiment, in addition to determining the regional center of the target dialect region according to its population distribution as described above, the region in which the geometric center of the target dialect region is located may be determined as the regional center of the target dialect region.

In the technical solution provided in step S204, FIG. 4 is a schematic diagram of a division manner of the dialect collection intervals according to an embodiment of the present application. As shown in FIG. 4, a plurality of regions are distributed in the target dialect region, where region 1 is the regional center of the target dialect region. A plurality of concentric ring regions may be constructed with region 1 as the center, each ring region corresponding to one dialect collection interval, so that dialect collection interval 1 to dialect collection interval N are arranged from inside to outside around the regional center. Dialect collection interval 1 covers region A1 and region A2, dialect collection interval 2 covers region B1 and region B2, and dialect collection interval 3 covers region C1, region C2 and region C3. The ring regions are virtual range regions and may be generated according to actual requirements. The plurality of dialect collection intervals may completely cover the target dialect region or may cover it only partially, and the ring diameters of adjacent ring regions may increase by equal or unequal amounts.
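
By way of non-limiting illustration, the concentric-ring division may be sketched as follows. The sketch assumes that each region is represented by a point in planar coordinates measured in kilometers and that all rings have the same width; the function name `divide_into_rings`, the coordinates and the ring width are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def divide_into_rings(
    coords: Dict[str, Tuple[float, float]],
    center: str,
    ring_width_km: float,
) -> Dict[int, List[str]]:
    """Assign each region to a dialect collection interval (1 = innermost ring)."""
    cx, cy = coords[center]
    intervals: Dict[int, List[str]] = {}
    for name, (x, y) in coords.items():
        distance = math.hypot(x - cx, y - cy)
        # Ring k covers distances in [(k - 1) * width, k * width).
        index = int(distance // ring_width_km) + 1
        intervals.setdefault(index, []).append(name)
    # Order the intervals from inside to outside.
    return dict(sorted(intervals.items()))


# Made-up planar coordinates (km); region 1 is the regional center.
coords = {
    "region 1": (0.0, 0.0),
    "A1": (3.0, 4.0),
    "A2": (-6.0, 0.0),
    "B1": (9.0, 12.0),
    "B2": (0.0, -18.0),
    "C1": (20.0, 21.0),
}
rings = divide_into_rings(coords, center="region 1", ring_width_km=10.0)
```

Unequal ring widths could be handled by replacing the integer division with a lookup against a sorted list of ring boundaries.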

In one exemplary embodiment, the target dialect region may be divided into a plurality of dialect collection intervals from inside to outside, with the regional center as the center position, by, but not limited to, the following manner: acquiring the postal codes of all regions in the target dialect region to obtain a postal code set, wherein the central postal code corresponding to the regional center of the target dialect region is the smallest postal code in the postal code set, and the postal codes in the postal code set are arranged in ascending order; dividing the postal codes included in the postal code set, in ascending order and with the central postal code as the starting value, into a plurality of equidistant numerical intervals; and grouping the regions corresponding to the one or more postal codes in the same numerical interval into one dialect collection interval, to obtain the plurality of dialect collection intervals.

Optionally, in this embodiment, in addition to the ring-based division described above, the target dialect region may be divided into a plurality of dialect collection intervals from inside to outside by postal code. Each region may be indicated by a unique postal code, and the postal code set corresponding to the target dialect region includes a plurality of postal codes (postal codes Y1 to Yn). The region corresponding to the smallest postal code in the postal code set may be determined as the regional center of the target dialect region, and the postal codes in the set may be divided, in ascending order and with the smallest postal code as the starting value, into a plurality of equidistant numerical intervals. For example, at intervals of 10, postal codes Y1 to Yn may be divided into numerical interval 1 (Y1 to Y10), numerical interval 2 (Y11 to Y20), ……, numerical interval n (Yn-10 to Yn), and so on, and the regions corresponding to the postal codes within the same numerical interval are grouped into one dialect collection interval.

Optionally, in this embodiment, the widths of the numerical intervals may be equal or unequal, and may be adjusted and set according to actual requirements.
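
By way of non-limiting illustration, the postal-code division with equidistant numerical intervals may be sketched as follows. The sketch assumes that postal codes can be compared as integers; the function name `divide_by_postal_code`, the step of 10 and the postal code values are hypothetical.

```python
from typing import Dict, List


def divide_by_postal_code(postal_codes: Dict[str, int], step: int) -> List[List[str]]:
    """Group regions into dialect collection intervals by postal code range."""
    # The smallest postal code is treated as the central postal code.
    start = min(postal_codes.values())
    buckets: Dict[int, List[str]] = {}
    # Walk the regions in ascending postal code order.
    for region, code in sorted(postal_codes.items(), key=lambda item: item[1]):
        # Codes within the same equidistant numerical interval share a bucket.
        buckets.setdefault((code - start) // step, []).append(region)
    # Numerical intervals that contain no postal code are skipped.
    return [buckets[key] for key in sorted(buckets)]


# Made-up postal codes; region 1 holds the smallest code and is the center.
postal_codes = {
    "region 1": 1000,
    "region 2": 1004,
    "region 3": 1012,
    "region 4": 1019,
    "region 5": 1033,
}
collection_intervals = divide_by_postal_code(postal_codes, step=10)
```

Here the numerical interval starting at 1020 contains no postal code, so three dialect collection intervals are produced rather than four.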

In the technical solution provided in step S206, the target dialect region has been divided into a plurality of dialect collection intervals arranged radially from inside to outside around the regional center, and dialect voice expressing the target semantics is collected in each interval, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships.

In one exemplary embodiment, the dialect voice expressing the target semantics may be collected in each of the plurality of dialect collection intervals, to obtain multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships, by, but not limited to, the following manner: sending text data expressing the target semantics to the terminal devices located in each dialect collection interval; and receiving voice data returned by the terminal devices in response to the text data as the dialect voice, to obtain the multiple sets of dialect collection intervals, target semantics and dialect voice having correspondence relationships.

Optionally, in this embodiment, the target semantics may be, but are not limited to, semantics whose corresponding speech in the target dialect has features that differ significantly from other dialects, or semantics whose corresponding speech is used frequently in daily life.
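The terminal-based collection described above might be sketched as follows. The application does not specify how terminal devices are addressed, so each terminal is modeled here as a hypothetical Python callable that takes the prompt text and returns recorded audio bytes; `Terminal` and `collect_from_terminals` are illustrative names only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a terminal device: prompt text in, audio bytes out.
Terminal = Callable[[str], bytes]


def collect_from_terminals(
    terminals: Dict[int, List[Terminal]],
    prompt_text: str,
) -> List[Tuple[int, str, bytes]]:
    """Send the prompt to every terminal and keep (interval, semantics, voice)."""
    records: List[Tuple[int, str, bytes]] = []
    for interval in sorted(terminals):  # interval 1 is assumed to be innermost
        for terminal in terminals[interval]:
            audio = terminal(prompt_text)
            if audio:  # ignore terminals that returned no recording
                records.append((interval, prompt_text, audio))
    return records
```

Each returned tuple is one set of dialect collection interval, target semantics and dialect voice having a correspondence relationship.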

In one exemplary embodiment, dialect voices expressing the target semantics may be collected in each of the plurality of dialect collection intervals, so as to obtain a plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, in, but not limited to, the following manner: identifying, as the dialect voice, voice data expressing the target semantics from the voice library corresponding to each dialect collection interval; and constructing the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship.

Optionally, in this embodiment, voice data used in each of the plurality of dialect collection intervals is stored in the voice library, where the voice data is data authorized by the collector, and dialect voice can be directly identified from the voice library.
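As a rough illustration of the voice-library approach, the sketch below assumes that each interval's library is already a list of (semantics, audio path) pairs, so that "identifying" reduces to filtering by the target semantics. The `Sample` class, the `build_samples` function and the library layout are assumptions made for the example; the application does not describe the library format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Sample:
    interval: int    # index of the dialect collection interval, 1 = innermost
    semantics: str   # target semantics expressed by the recording
    audio_path: str  # where the dialect voice recording is stored


def build_samples(
    voice_library: Dict[int, List[Tuple[str, str]]],
    target_semantics: Set[str],
) -> List[Sample]:
    """Select recordings that express the target semantics, per interval."""
    samples: List[Sample] = []
    for interval in sorted(voice_library):
        for semantics, audio_path in voice_library[interval]:
            if semantics in target_semantics:
                samples.append(Sample(interval, semantics, audio_path))
    return samples
```

In practice the library would more likely hold unlabeled audio, and a separate recognition step would be needed to decide which recordings express the target semantics.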

In the technical solution provided in step S208, the initial dialect recognition model is trained using the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, so that the resulting target dialect recognition model can recognize not only the main sub-dialects of the target dialect spoken in the areas around the region center, but also all the sub-dialects spoken in areas far from the region center.

In one exemplary embodiment, the initial dialect recognition model may be trained using the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, to obtain the target dialect recognition model, in, but not limited to, the following manner: taking the initial dialect recognition model as the initial current dialect recognition model, and repeating the following steps until the target dialect recognition model is obtained: acquiring the dialect collection intervals one at a time as the current interval, in the inside-to-outside order of their positions in the target dialect region; training the current dialect recognition model with the target semantics and dialect voices that correspond to the current interval, to obtain a first dialect recognition model; if the current interval is not the last interval in the order, determining the first dialect recognition model as the next current dialect recognition model; and if the current interval is the last interval in the order, determining the first dialect recognition model as the target dialect recognition model.

Optionally, in this embodiment, the initial dialect recognition model may be trained in an echelon (staged) manner to obtain the target dialect recognition model. For example, when N sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship are used, the current dialect recognition model is trained in turn with the target semantics and dialect voices of each interval, in the inside-to-outside order of the intervals' positions in the target dialect region, until all N sets have been used or the model parameters of the current dialect recognition model converge.
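The inside-to-outside training loop can be sketched generically as below. The actual model and training procedure are not specified in the application, so they are passed in as the hypothetical callables `train_on` and `has_converged`; interval indices are assumed to increase with distance from the region center.

```python
from typing import Callable, Dict, List, Optional, TypeVar

M = TypeVar("M")  # model type (left abstract)
S = TypeVar("S")  # training sample type (left abstract)


def train_inside_out(
    initial_model: M,
    samples_by_interval: Dict[int, List[S]],
    train_on: Callable[[M, List[S]], M],
    has_converged: Optional[Callable[[M, M], bool]] = None,
) -> M:
    """Train on each interval in turn, innermost first."""
    current = initial_model
    for interval in sorted(samples_by_interval):
        # The model trained on the current interval becomes the next
        # "current" model, or the final model after the last interval.
        updated = train_on(current, samples_by_interval[interval])
        if has_converged is not None and has_converged(current, updated):
            return updated  # stop early once the parameters have converged
        current = updated
    return current
```

Passing the training step as a callable keeps the ordering logic independent of any particular speech recognition framework.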

In one exemplary embodiment, the initial dialect recognition model may be trained using the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, to obtain the target dialect recognition model, in, but not limited to, the following manner: acquiring a target number of initial dialect recognition models, wherein the target number is the number of dialect collection intervals; training one initial dialect recognition model with each set of target semantics and dialect voices having a correspondence relationship, to obtain the target number of second dialect recognition models; calculating the average of the model parameters of the target number of second dialect recognition models to obtain target model parameters; and determining the initial dialect recognition model having the target model parameters as the target dialect recognition model.

Optionally, in this embodiment, unlike the sequential training manner described above, each set of target semantics and dialect voices having a correspondence relationship may be used to train one of the target number of initial dialect recognition models separately, yielding the target number of second dialect recognition models. Finally, a statistic of the model parameters of the target number of second dialect recognition models, such as the mean or the median, is calculated to obtain the target model parameters, and the initial dialect recognition model having the target model parameters is determined as the target dialect recognition model.
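The parameter-combination step can be illustrated with a small sketch. It assumes that every second dialect recognition model exposes its parameters as a dictionary mapping a parameter name to a flat list of floats with identical shapes across models; `combine_parameters` and `reducer` are illustrative names, and a real system would operate on the tensors of whatever framework is used.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence

Params = Dict[str, List[float]]


def combine_parameters(
    models: Sequence[Params],
    reducer: Callable[[Sequence[float]], float] = mean,
) -> Params:
    """Combine per-interval model parameters element by element."""
    if not models:
        raise ValueError("at least one model is required")
    combined: Params = {}
    for name in models[0]:
        # zip(*...) yields, for each position, that parameter's value
        # in every model; the reducer collapses them to a single value.
        columns = zip(*(model[name] for model in models))
        combined[name] = [reducer(column) for column in columns]
    return combined
```

Passing `reducer=median` gives the median variant mentioned above; the mean is the default.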

Optionally, in this embodiment, during training of the initial dialect recognition model, the recognized content may be checked and corrected: it is judged whether the recognition result deviates substantially from the original result, errors are corrected, cases in which the speech and the text are not aligned are handled, and semantic supplementation is performed at the same time, so that the result falls within a range that a human or a machine can understand and the recognized result can be used normally.

From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the embodiments of the present application.

FIG. 5 is a block diagram of a device for constructing a dialect recognition model according to an embodiment of the present application. As shown in FIG. 5, the device includes:

The detection module 502 is configured to detect a region center of a target dialect region, where a dialect used by the target dialect region is a target dialect to be identified;

the dividing module 504 is configured to divide the target dialect region into a plurality of dialect collection regions from inside to outside by taking the region center as a center position, wherein the plurality of dialect collection regions are arranged in a radial manner from inside to outside by taking the region center as a center, and the closer the dialect collection region is to the region center, the higher the similarity between the language used and the target dialect is;

the collection module 506 is configured to collect dialect voices expressing target semantics in each of the plurality of dialect collection intervals, so as to obtain a plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship;

the training module 508 is configured to train the initial dialect recognition model using the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, to obtain a target dialect recognition model, where the target dialect recognition model is used for recognizing the target dialect.

According to this embodiment, before the dialect recognition model corresponding to the target dialect is constructed, the region center of the target dialect region in which the target dialect is used is detected. With the region center as the center position, the target dialect region is divided into a plurality of dialect collection intervals arranged radially from inside to outside around the region center. Dialect voices expressing the target semantics are collected in each of the plurality of dialect collection intervals to obtain a plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship. Finally, the initial dialect recognition model is trained using these sets to obtain the target dialect recognition model for recognizing the target dialect. Unlike training a dialect recognition model only on dialect speech of the mainstream standard, this construction method fully takes into account the association between dialects and regions, namely that the dialects of dialect collection intervals located close to one another have higher similarity in speech features, so that the constructed target dialect recognition model can overcome dialect differences between areas and accurately recognize the target dialect within the target dialect region. This technical solution solves problems in the related art such as the low dialect recognition accuracy of dialect recognition models, and achieves the technical effect of improving the dialect recognition accuracy of the dialect recognition model.

In one exemplary embodiment, the partitioning module includes:

the first acquisition unit is used for acquiring the postal codes of all areas in the target dialect area to obtain a postal code set, wherein the central postal code of the regional center corresponding to the target dialect area is the smallest postal code in the postal code set, and a plurality of postal codes in the postal code set are sequentially arranged from small to large;

a first dividing unit, configured to divide, from small to large, a plurality of zip codes included in the zip code set into a plurality of equidistant numerical intervals, respectively, with the central zip code as a start value;

the second dividing unit is used for dividing the areas corresponding to one or more postal codes in the same numerical value interval into a dialect collecting interval to obtain a plurality of dialect collecting intervals.

In one exemplary embodiment, the acquisition module includes:

a sending unit, configured to send text data expressing the target semantics to terminal devices located in each dialect collection interval;

the receiving unit is used for receiving the voice data returned by the terminal equipment in response to the text data as the dialect voice to obtain a plurality of sets of dialect collection intervals with corresponding relations, target semantics and the dialect voice.

In one exemplary embodiment, the acquisition module includes:

the recognition unit is used for recognizing voice data expressing the target semantics from a voice library corresponding to each dialect acquisition interval as the dialect voice;

the first construction unit is used for constructing a plurality of sets of dialect collection intervals with corresponding relations, target semantics and dialect voices.

In one exemplary embodiment, the training module includes:

the second construction unit is used for constructing the initial dialect recognition model as an initial current dialect recognition model, and repeatedly executing the following steps until the target dialect recognition model is obtained:

the second acquisition unit is used for sequentially acquiring the dialect acquisition interval as a current interval according to the arrangement sequence of the arrangement positions of the dialect acquisition interval in the target dialect territory from inside to outside;

the first training unit is used for training the current dialect recognition model by using the target semantics and dialect voice which correspond to the current interval and have a corresponding relation to obtain a first dialect recognition model;

a first determining unit configured to determine the first dialect recognition model as a next current dialect recognition model in a case where the current section is not the last section in the arrangement order;

and the second determining unit is used for determining the first dialect recognition model as the target dialect recognition model when the current interval is the last interval in the arrangement sequence.

In one exemplary embodiment, the training module includes:

a third obtaining unit, configured to obtain the initial dialect identification model of a target number, where the target number is the number of dialect collection intervals;

the second training unit is used for training one initial dialect recognition model with each set of target semantics and dialect voices having a correspondence relationship, to obtain the target number of second dialect recognition models;

the average unit is used for solving an average value of the model parameters of the second dialect recognition model of the target number to obtain target model parameters;

and a third determining unit configured to determine the initial dialect recognition model having the target model parameters as the target dialect recognition model.

In one exemplary embodiment, the detection module includes:

a fourth obtaining unit, configured to obtain population density distribution of the target dialect region;

and the fourth determining unit is used for determining the area with the greatest concentration of people in the target dialect area as the area center.
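Determining the region center from a population density distribution can be reduced to an arg-max, as in the sketch below. It assumes the distribution is available as a mapping from an area identifier to a density value; the function name and the data layout are illustrative, not taken from the application.

```python
from typing import Dict


def detect_region_center(density_by_area: Dict[str, float]) -> str:
    """Return the area with the highest population density."""
    if not density_by_area:
        raise ValueError("population density distribution is empty")
    # max() over the keys, ranked by their density values.
    return max(density_by_area, key=density_by_area.get)
```

If several areas share the highest density, this sketch returns the first one encountered; a real implementation would need an explicit tie-breaking rule.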

Embodiments of the present application also provide a storage medium including a stored program, wherein the program performs the method of any one of the above when run.

Alternatively, in the present embodiment, the above-described storage medium may be configured to store program code for performing the steps of:

S1, detecting a region center of a target dialect region, wherein the dialect used in the target dialect region is the target dialect to be recognized;

S2, dividing the target dialect region from inside to outside into a plurality of dialect collection intervals with the region center as the center position, wherein the plurality of dialect collection intervals are arranged radially from inside to outside around the region center, and the closer a dialect collection interval is to the region center, the higher the similarity between the language used there and the target dialect;

S3, collecting dialect voices expressing target semantics in each of the plurality of dialect collection intervals to obtain a plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship;

S4, training an initial dialect recognition model using the plurality of sets of dialect collection intervals, target semantics and dialect voices having a correspondence relationship, to obtain a target dialect recognition model, wherein the target dialect recognition model is used for recognizing the target dialect.
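The four stored steps can be viewed as a simple pipeline. The sketch below only shows how the steps might be chained; each step is supplied as a hypothetical callable, since the application leaves the concrete detection, division, collection and training procedures open. `DialectModelBuilder` and its field names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DialectModelBuilder:
    detect_center: Callable[[Any], Any]              # step 1: find the region center
    divide_region: Callable[[Any, Any], List[Any]]   # step 2: intervals, inside to outside
    collect_voices: Callable[[List[Any]], List[Any]] # step 3: (interval, semantics, voice) sets
    train_model: Callable[[Any, List[Any]], Any]     # step 4: train the initial model

    def build(self, region: Any, initial_model: Any) -> Any:
        """Run the four steps in order and return the trained model."""
        center = self.detect_center(region)
        intervals = self.divide_region(region, center)
        samples = self.collect_voices(intervals)
        return self.train_model(initial_model, samples)
```

Keeping each step behind its own callable mirrors the module split of the device in FIG. 5 (detection, dividing, collection and training modules).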

Embodiments of the present application also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.

Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:

Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.

Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.

It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented on a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented in program code executable by computing devices, so that they may be stored in a memory device and executed by the computing devices; in some cases, the steps shown or described may be performed in an order different from that shown or described. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of protection of the present application.

Claims

1. The method for constructing the dialect recognition model is characterized by comprising the following steps of:

2. The method of claim 1, wherein the dividing the target dialect region from inside to outside into a plurality of dialect collection intervals with the region center as a center position comprises:

3. The method according to claim 1, wherein the collecting dialect voices expressing target semantics in each of the plurality of dialect collection intervals to obtain a plurality of sets of dialect collection intervals with corresponding relations, the target semantics and the dialect voices, includes:

4. The method according to claim 1, wherein the collecting dialect voices expressing target semantics in each of the plurality of dialect collection intervals to obtain a plurality of sets of dialect collection intervals with corresponding relations, the target semantics and the dialect voices, includes:

5. The method according to claim 1, wherein training the initial dialect recognition model using a plurality of sets of dialect collection intervals having correspondence, the target semantics and the dialect speech, and obtaining the target dialect recognition model includes:

6. The method according to claim 1, wherein training the initial dialect recognition model using a plurality of sets of dialect collection intervals having correspondence, the target semantics and the dialect speech, and obtaining the target dialect recognition model includes:

7. The method according to claim 1, wherein the detecting the geographical center of the target dialect area comprises:

acquiring population density distribution of the target dialect region;

8. A dialect recognition model constructing apparatus, comprising:

9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 7.

10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of claims 1 to 7 by means of the computer program.