CN115273834A - Translation machine and translation method - Google Patents

Translation machine and translation method

Info

Publication number
CN115273834A
CN115273834A
Authority
CN
China
Prior art keywords
audio data
book
key
audio
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882818.5A
Other languages
Chinese (zh)
Inventor
漆雨 (Qi Yu)
郭胜荣 (Guo Shengrong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dongxiang Design Co ltd
Original Assignee
Shenzhen Dongxiang Design Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dongxiang Design Co ltd
Priority to CN202210882818.5A
Publication of CN115273834A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a translation machine and a translation method. The translation machine comprises: a first acquisition unit for acquiring first audio data of an infant; a first identification unit for identifying a first key audio feature in the first audio data; a second identification unit for converting the first key audio feature into first key text content, searching a pre-acquired book library for a target infant book that includes the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content; a first translation unit for taking the first target sentence as first translated text content of the first audio data; and a playing unit for playing second audio data corresponding to the first translated text content. The invention can improve the translation performance of the translation machine.

Description

Translation machine and translation method
Technical Field
The invention belongs to the technical field of translation, and particularly relates to a translation machine and a translation method.
Background
Translation is frequently needed in daily life. Day to day, people mainly rely on in-person interpretation, and in some special scenarios electronic devices are also used for translation. However, current electronic translation devices all translate between languages, for example: Chinese into English, or English into Chinese. Because they can only translate between languages, the translation performance of current translation machines is poor.
Disclosure of Invention
The invention provides a translation machine and a translation method, which can address the problem that the translation performance of existing translation machines is poor.
The present invention provides a translation machine comprising:
the first acquisition unit is used for acquiring first audio data of the infant;
the first identification unit is used for identifying a first key audio feature in the first audio data, wherein the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
the second identification unit is used for converting the first key audio features into first key text contents, searching a target infant book comprising the first key text contents in a pre-acquired book library based on the first key text contents, and identifying a first target sentence matched with the first key text contents in the target infant book;
a first translation unit configured to use the first target sentence as a first translation text content of the first audio data;
and the playing unit is used for playing second audio data corresponding to the first translation text content.
The invention also provides a translation method, which comprises the following steps:
collecting first audio data of a child;
identifying first key audio features in the first audio data, wherein the first key audio features are audio features from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
converting the first key audio features into first key text contents, searching a target baby book comprising the first key text contents in a pre-acquired book library based on the first key text contents, and identifying a first target sentence matched with the first key text contents in the target baby book;
taking the first target sentence as first translation text content of the first audio data;
and playing second audio data corresponding to the first translation text content.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the translation method provided by the present invention.
In an embodiment of the present invention, a translation machine includes: a first acquisition unit for acquiring first audio data of an infant; a first identification unit for identifying a first key audio feature in the first audio data, wherein the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified; a second identification unit for converting the first key audio feature into first key text content, searching a pre-acquired book library for a target infant book that includes the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content; a first translation unit configured to take the first target sentence as first translated text content of the first audio data; and a playing unit for playing second audio data corresponding to the first translated text content. In the embodiment of the invention, the translation machine can translate the audio data of infants, so the translation performance of the translation machine can be improved.
Drawings
FIG. 1 is a schematic structural diagram of a translation machine according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a translation machine according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a translation machine according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a translation machine according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a translation machine according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a translation method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of protection of the present application.
Fig. 1 is a structural diagram of a translation machine according to an embodiment of the present invention. As shown in fig. 1, the translation machine includes:
the first acquisition unit 101 is used for acquiring first audio data of a child;
the first identification unit 102 is configured to identify a first key audio feature in the first audio data, where the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
a second identifying unit 103, configured to convert the first key audio feature into first key text content, search a pre-acquired book library for a target infant book that includes the first key text content based on the first key text content, and identify a first target sentence that is matched with the first key text content in the target infant book;
a first translation unit 104, configured to use the first target sentence as a first translation text content of the first audio data;
and the playing unit 105 is configured to play second audio data corresponding to the first translated text content.
The first audio data may be audio spoken by an infant while retelling the story of a book. For example: an infant reads a book at home and then goes to another place, such as a grandparent's home, and wants to retell the book for the grandparents to hear; but the infant's pronunciation is not very accurate or the speech is too fast, so the grandparents cannot understand. The translation machine provided by the invention can then translate the infant's audio.
The first key audio feature may be an audio feature in the first audio data from which specific text content can be accurately identified; it is also an audio feature that other people can hear clearly. The text content corresponding to this feature identifies which specific book content the infant is retelling, so the audio features from which text content cannot be identified can then be translated, and other people can clearly understand what the infant is saying.
The book library may be a database established specifically for the infant. For example: when the infant reads at home, the books the infant reads are recorded in time and the database is built, so that the book the infant is retelling can be queried accurately and quickly.
According to the embodiment of the invention, the audio data of the infant can be translated, and the translation is within the same language, so the translation performance of the translation machine can be improved. Because the content of the infant's retelling is translated, the infant is less likely to cry because others cannot understand the retelling; this protects the infant's enthusiasm for reading and avoids the problem that the infant loses that enthusiasm when others cannot understand the retold content.
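The retrieval-based, same-language "translation" described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation; all names (`extract_key_text`, `find_target_book`, the 0.9 confidence threshold, and the toy library) are hypothetical.

```python
def extract_key_text(audio_segments):
    """Keep only segments recognized with high confidence (the 'key audio
    features'); drop the unintelligible ones."""
    return [seg["text"] for seg in audio_segments if seg["confidence"] >= 0.9]

def find_target_book(book_library, key_texts):
    """Search the pre-acquired book library for a book containing every key text."""
    for title, book_sents in book_library.items():
        joined = " ".join(book_sents)
        if all(k in joined for k in key_texts):
            return title, book_sents
    return None, []

def match_target_sentence(book_sents, key_text):
    """The first book sentence containing the key text becomes the
    'translated' text content of the infant's audio."""
    for sentence in book_sents:
        if key_text in sentence:
            return sentence
    return None

# Toy example: one intelligible fragment, one unintelligible one.
library = {"The Little Red Hen": ["The little red hen found a seed.",
                                  "She planted the seed in the garden."]}
segments = [{"text": "found a seed", "confidence": 0.95},
            {"text": "mmbl", "confidence": 0.30}]
keys = extract_key_text(segments)
title, sentences = find_target_book(library, keys)
translated = match_target_sentence(sentences, keys[0])
```

The full book sentence, rather than a word-for-word transcript, is what gets replayed to the listener.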
In the embodiment of the invention, the translating machine can be a handheld translating machine or a wearable translating machine.
In one embodiment, as shown in fig. 2, the translator further comprises:
the second acquisition unit 106 is configured to acquire third audio data of the infant, where the third audio data is continuous with the first audio data and has a pause interval;
a third identifying unit 107, configured to identify a second key audio feature in the third audio data, where the second key audio feature is an audio feature from which text content can be accurately identified, and the third audio data also contains audio features from which text content cannot be identified;
the second identification unit 103 is specifically configured to convert the first key audio feature into first key text content, convert the second key audio feature into second key text content, search a target baby book including the first key text content and the second key text content in a pre-obtained book library based on the first key text content and the second key text content, and identify a first target sentence in the target baby book that matches the first key text content, and a second target sentence in the target baby book that matches the second key text content, where the first target sentence and the second target sentence are consecutive sentences in the target baby book;
the first translation unit 104 is specifically configured to use the first target sentence as a first translation text content of the first audio data, and use the second target sentence as a second translation text content of the third audio data;
the playing unit 105 is further configured to play fourth audio data corresponding to the second translated text content.
The pause interval represents the pauses between different sentences spoken by the infant.
The first target sentence in the target infant book that matches the first key text content is the first sentence in the target infant book that includes the first key text content.
Likewise, the second target sentence that matches the second key text content is the first sentence in the target infant book that includes the second key text content.
Therefore, by requiring the two target sentences to be consecutive, the first translated text content and the second translated text content can be determined accurately. This avoids the possibility that two sentences each match the audio spoken by the infant but are not consecutive in the book, and thus further improves translation accuracy.
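The consecutive-sentence constraint above can be sketched as below; `match_consecutive` and the toy book are hypothetical names used only for illustration. Requiring the second match to immediately follow the first disambiguates a key text that occurs in several places.

```python
def match_consecutive(book_sents, first_key, second_key):
    """Return a pair of sentences only when the sentence matching first_key
    is immediately followed by the sentence matching second_key."""
    for i in range(len(book_sents) - 1):
        if first_key in book_sents[i] and second_key in book_sents[i + 1]:
            return book_sents[i], book_sents[i + 1]
    return None

book = ["The hen asked the cat.", "The cat said no.",
        "The hen asked the dog.", "The dog said no."]
# "asked the" alone matches two sentences; the continuity requirement
# ("cat said" must come right after) pins down the correct pair.
pair = match_consecutive(book, "asked the", "cat said")
```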
In one embodiment, as shown in FIG. 3, the translator further comprises:
a detecting unit 108, configured to detect whether a current environment meets a baby audio translation condition, where the baby audio translation condition includes:
the current position does not belong to a pre-recorded resident position of the infant;
the current environment includes pre-recorded persons other than the child's associated family;
the first acquiring unit 101 is specifically configured to acquire first audio data of a child when the current environment meets the child audio translation condition.
The prerecorded resident position may be the infant's home. Translation is not needed when the infant is at home, because the parents accompanied the infant's reading; even if the infant does not speak clearly, the parents can work out, based on the books the infant has read, what the infant is saying, so no translation is needed in the resident position. Outside the resident position, the infant may be retelling a book for others to hear, for example: grandparents or other elderly relatives. These people do not know which books the infant has read and are also unfamiliar with the infant's way of speaking, so the book content the infant retells may be unclear to them, and translation is needed.
The persons other than the infant's immediate family are people who have relatively little contact with the infant. They do not know which books the infant has read and are unfamiliar with the infant's way of speaking, so the book content the infant retells may be unclear to them, and translation is needed.
In this embodiment, translation is performed only in specific scenarios, which saves the power consumed by translation.
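The infant audio translation condition can be sketched as a simple predicate; `should_translate` and all location and person labels are hypothetical illustrations.

```python
def should_translate(current_location, resident_locations, people_present, family):
    """Translation is enabled only when the infant is away from the
    prerecorded resident position AND someone outside the prerecorded
    immediate family is present."""
    away = current_location not in resident_locations
    outsider_present = any(p not in family for p in people_present)
    return away and outsider_present

# At a grandparent's home with grandpa present: translate.
at_grandmas = should_translate("grandma_house", {"home"}, {"grandpa"}, {"mom", "dad"})
# At home with a parent: no translation needed.
at_home = should_translate("home", {"home"}, {"mom"}, {"mom", "dad"})
```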
In one embodiment, as shown in fig. 4, the translator further comprises:
the collecting unit 109 is configured to collect image information of a child, identify whether the image information indicates that the child is reading a book, extract book feature information or a book name from the image information when the image information indicates that the child is reading the book, search a reading book currently being read by the child in a network based on the book feature information or the book name, acquire an electronic book corresponding to the reading book, and store the electronic book in the book repository.
During the infant's reading, the collecting unit can automatically capture the book currently being read, search the network for that book using the extracted feature information or the book name, and record the book's content in the book library, so that the infant's audio data can be translated accurately during translation.
In an embodiment, the collecting unit is further configured to collect an audio data set of the infant during reading of the book, and a book image corresponding to each audio data in the audio data set in the reading of the book, establish a mapping relationship between the audio data and the book image, and store the mapping relationship in the book library, where a book image corresponding to each audio data in the reading of the book is a current image of the reading of the book displayed by the image information when the infant outputs the audio data;
the second identifying unit 103 is specifically configured to:
converting the first key audio features into first key text contents, searching a target baby book comprising the first key text contents in a pre-obtained book library based on the first key text contents, identifying candidate sentences matched with the first key text contents in the target baby book, and extracting book page images comprising the candidate sentences in the target baby book; matching the book page image with a target book image, and taking the candidate sentence as the first target sentence when the book page image is matched with the target book image; the target book image is book image which is based on historical audio data matched with the first audio data searched in the audio data set and has mapping relation with the historical audio data extracted based on the mapping relation.
Historical audio data matched with the first audio data means that the audio the infant speaks during translation is the same as or similar to audio the infant spoke while reading the book, that is, both utterances correspond to the same text content.
This embodiment establishes a mapping relation between audio data and book images while the infant reads. During translation, the page image containing a candidate sentence found in the target infant book is matched against the prerecorded book image. When they match, the page image of the candidate sentence is the same as the book image recorded when the infant originally spoke that audio during reading, so the candidate sentence can be determined to be the first target sentence, which improves translation accuracy.
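The image-verified matching step might look like the sketch below, under the assumption that page images can be compared by identifier; every name (`verify_with_page_image`, the mapping dictionaries, the file names) is hypothetical.

```python
def verify_with_page_image(candidates, page_image_of, matched_history, mapping,
                           audio_id):
    """Keep the candidate sentence whose page image equals the image that was
    recorded (via the audio-to-image mapping) when the infant originally read
    the matching audio aloud."""
    historical_audio = matched_history.get(audio_id)  # matched historical audio
    target_image = mapping.get(historical_audio)      # page image from read time
    for sentence in candidates:
        if page_image_of[sentence] == target_image:
            return sentence
    return None

# Two textual matches live on different pages; the recorded image picks one.
page_image_of = {"A fox ran by.": "p3.png", "A fox ran away.": "p7.png"}
matched_history = {"utterance_42": "read_session_7"}  # translation-time -> reading-time audio
mapping = {"read_session_7": "p7.png"}                # reading-time audio -> page image
best = verify_with_page_image(["A fox ran by.", "A fox ran away."],
                              page_image_of, matched_history, mapping,
                              "utterance_42")
```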
In one embodiment, as shown in fig. 5, the translator further comprises:
a third acquiring unit 110, configured to acquire fifth audio data of the infant, where the fifth audio data includes M pieces of audio data before the first audio data and also includes N pieces of audio data after the first audio data, and M and N are positive integers;
a fourth identifying unit 111, configured to identify a speech rate of the fifth audio data and the first audio data, and compare the speech rate with a preset speech rate threshold;
a second translation unit 112, configured to extract M key audio features of the M pieces of audio data, convert the M key audio features into M key text contents, identify M target sentences in the target infant book that are matched with the M key text contents based on the M key text contents, and use the M target sentences as translation text contents of the M pieces of audio data; when the speech rate reaches a preset speech rate threshold value, identifying N target sentences continuous with the first target sentence in the target infant book, and taking the N target sentences as translation text contents of the N pieces of audio data;
the playing unit 105 is further configured to play audio data corresponding to the translated text content of the M pieces of audio data, and play audio data corresponding to the translated text content of the N pieces of audio data.
The speech rate reaching the preset speech rate threshold can be understood as the infant retelling the book content relatively fast. A fast speech rate usually indicates that the infant is particularly familiar with the content, that is, the content currently retold is accurate; some people may fail to understand it only because the pronunciation is not standard.
In this embodiment, when the speech rate reaches the preset speech rate threshold, the following sentences do not need to be matched one by one: because the infant retells quickly and the retold content is accurate, the sentences for the subsequent audio data can be found in the book directly from the already translated preceding sentences. This saves translation computation and improves translation efficiency.
In this embodiment, when the speech rate does not reach the preset speech rate threshold, the N pieces of audio data may be translated sentence by sentence.
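The speech-rate shortcut can be sketched as follows; `translate_following`, the toy book, and the rate values are hypothetical. When the rate reaches the threshold, the N utterances following the first target sentence are mapped to the next N consecutive book sentences instead of being matched individually.

```python
def translate_following(book_sents, first_target_idx, n, speech_rate,
                        threshold, match_one):
    """If the infant speaks fast enough, assume the N utterances after the
    first target sentence simply continue the book and take the next N
    consecutive sentences; otherwise match each utterance individually."""
    if speech_rate >= threshold:
        return book_sents[first_target_idx + 1 : first_target_idx + 1 + n]
    return [match_one(i) for i in range(n)]

book_sents = ["s0", "s1", "s2", "s3", "s4"]
# First target sentence is at index 1 and the infant speaks quickly,
# so the two following utterances map to the next two sentences.
fast = translate_following(book_sents, 1, 2, speech_rate=5.0,
                           threshold=4.0, match_one=lambda i: None)
```

The fallback branch is where the sentence-by-sentence matching of the slower case would plug in.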
In an embodiment of the present invention, a translation machine includes: a first acquisition unit for acquiring first audio data of an infant; a first identification unit for identifying a first key audio feature in the first audio data, wherein the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified; a second identification unit for converting the first key audio feature into first key text content, searching a pre-acquired book library for a target infant book that includes the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content; a first translation unit configured to take the first target sentence as first translated text content of the first audio data; and a playing unit for playing second audio data corresponding to the first translated text content. In the embodiment of the invention, the translation machine can translate the audio data of infants, so the translation performance of the translation machine can be improved.
Fig. 6 is a flowchart of a translation method provided in an embodiment of the present invention, and as shown in fig. 6, the method includes:
601. collecting first audio data of a child;
602. identifying first key audio features in the first audio data, wherein the first key audio features are audio features from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
603. converting the first key audio features into first key text contents, searching a target baby book comprising the first key text contents in a pre-acquired book library based on the first key text contents, and identifying a first target sentence matched with the first key text contents in the target baby book;
604. taking the first target sentence as first translation text content of the first audio data;
605. and playing second audio data corresponding to the first translation text content.
Optionally, the method further includes:
acquiring third audio data of the infant, wherein the third audio data is continuous with the first audio data and has pause intervals;
identifying second key audio features in the third audio data, wherein the second key audio features are audio features from which text content can be accurately identified, and the third audio data also contains audio features from which text content cannot be identified;
the converting the first key audio feature into first key text content, searching a target baby book including the first key text content in a pre-acquired book library based on the first key text content, and identifying a first target sentence matched with the first key text content in the target baby book includes:
converting the first key audio features into first key text contents, converting the second key audio features into second key text contents, searching a target baby book comprising the first key text contents and the second key text contents in a pre-obtained book library based on the first key text contents and the second key text contents, and identifying a first target sentence matched with the first key text contents in the target baby book and a second target sentence matched with the second key text contents in the target baby book, wherein the first target sentence and the second target sentence are continuous sentences in the target baby book;
the taking the first target sentence as the first translated text content of the first audio data includes:
and taking the first target sentence as the first translation text content of the first audio data, and taking the second target sentence as the second translation text content of the third audio data.
Optionally, the method further includes:
detecting whether a current environment meets infant audio translation conditions, wherein the infant audio translation conditions include:
the current position does not belong to a pre-recorded resident position of the infant;
the current environment includes prerecorded persons other than the child's associated family;
the capturing first audio data of a baby comprises:
when the current environment meets the infant audio translation condition, acquiring first audio data of an infant.
Optionally, the method further includes:
the method comprises the steps of collecting image information of a child, identifying whether the image information shows that the child reads books, extracting book characteristic information or book names from the image information when the image information shows that the child reads the books, searching the read books currently read by the child in a network based on the book characteristic information or the book names, acquiring electronic books corresponding to the read books, and storing the electronic books into a book library.
Optionally, the method further includes:
collecting an audio data set of the infant in the process of reading the book and a book image corresponding to each audio data in the audio data set in the book to be read, establishing a mapping relation between the audio data and the book image, and storing the mapping relation in the book library, wherein the book image corresponding to each audio data in the book to be read is a current image of the book to be read, which is displayed by the image information when the infant outputs the audio data;
the converting the first key audio feature into first key text content, searching a target baby book including the first key text content in a pre-acquired book library based on the first key text content, and identifying a first target sentence matched with the first key text content in the target baby book includes:
converting the first key audio features into first key text contents, searching a pre-acquired book library for a target infant book that includes the first key text contents, identifying candidate sentences in the target infant book that match the first key text contents, and extracting the book page image that contains each candidate sentence in the target infant book; matching the book page image against a target book image, and taking the candidate sentence as the first target sentence when the book page image matches the target book image; wherein the target book image is the book image that has a mapping relation with historical audio data matched with the first audio data: the historical audio data is searched for in the audio data set, and the book image is extracted based on the mapping relation.
Optionally, the method further includes:
acquiring fifth audio data of the infant, wherein the fifth audio data comprises M pieces of audio data before the first audio data and also comprises N pieces of audio data after the first audio data, and M and N are positive integers;
recognizing the speech rate of the fifth audio data and the first audio data, and comparing the speech rate with a preset speech rate threshold value;
extracting M key audio features of the M pieces of audio data, converting the M key audio features into M key text contents, identifying M target sentences matched with the M key text contents in the target infant book based on the M key text contents, and taking the M target sentences as translation text contents of the M pieces of audio data; when the speech rate reaches a preset speech rate threshold value, identifying N target sentences continuous with the first target sentence in the target infant book, and taking the N target sentences as translation text contents of the N pieces of audio data;
the playing of the second audio data corresponding to the first translation text content includes:
and playing audio data corresponding to the translated text contents of the M pieces of audio data, and playing audio data corresponding to the translated text contents of the N pieces of audio data.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the translation method provided by the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described, which are intended to be illustrative rather than restrictive, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the appended claims.

Claims (10)

1. A translator, comprising:
the first acquisition unit is used for acquiring first audio data of the infant;
the first identification unit is used for identifying a first key audio feature in the first audio data, wherein the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
the second identification unit is used for converting the first key audio feature into first key text content, searching a pre-acquired book library, based on the first key text content, for a target infant book including the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content;
a first translation unit configured to use the first target sentence as a first translation text content of the first audio data;
and the playing unit is used for playing second audio data corresponding to the first translation text content.
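The unit pipeline of claim 1 can be sketched as follows. Speech recognition of the key audio feature is stubbed out, the book-library search is reduced to plain substring matching, and every class, function, and data name below is illustrative rather than part of the claimed apparatus.

```python
# Illustrative sketch of the claim-1 flow: key feature -> key text ->
# book-library search -> matched target sentence (the "translation").

def extract_key_text(audio_data, asr):
    """Convert only the recognizable (key) portion of the audio to text."""
    return asr(audio_data)

def find_target_sentence(key_text, book_library):
    """Search the library for a book containing key_text and return the
    full matching sentence from that book."""
    for book in book_library:
        for sentence in book["sentences"]:
            if key_text in sentence:
                return book["title"], sentence
    return None, None

# Toy book library and a stub ASR that "recognizes" only the clear fragment.
library = [
    {"title": "Goodnight Moon", "sentences": [
        "Goodnight moon.", "Goodnight cow jumping over the moon."]},
]
stub_asr = lambda audio: "cow jumping"

key_text = extract_key_text(b"...partly unintelligible toddler audio...", stub_asr)
title, target = find_target_sentence(key_text, library)
print(title, "->", target)
```

The matched sentence would then be synthesized and played by the playing unit; that text-to-speech step is outside the sketch.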
2. The translator of claim 1, further comprising:
the second acquisition unit is used for acquiring third audio data of the infant, wherein the third audio data is consecutive with the first audio data, separated from it by a pause interval;
the third identification unit is used for identifying a second key audio feature in the third audio data, wherein the second key audio feature is an audio feature from which text content can be accurately identified, and the third audio data also contains audio features from which text content cannot be identified;
the second identification unit is specifically configured to convert the first key audio feature into first key text content, convert the second key audio feature into second key text content, search a pre-acquired book library, based on the first key text content and the second key text content, for a target infant book including both, and identify a first target sentence in the target infant book that matches the first key text content and a second target sentence in the target infant book that matches the second key text content, wherein the first target sentence and the second target sentence are consecutive sentences in the target infant book;
the first translation unit is specifically configured to use the first target sentence as a first translation text content of the first audio data, and use the second target sentence as a second translation text content of the third audio data;
the playing unit is further configured to play fourth audio data corresponding to the second translated text content.
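The consecutive-sentence constraint of claim 2 can be sketched as a joint search: two key text fragments from successive utterances must match adjacent sentences of the same book. Matching is reduced to substring search here, and all names are illustrative.

```python
# Sketch of claim 2's consistency check: key1 and key2 must match
# *consecutive* sentences of the book, which disambiguates each fragment.

def find_consecutive_targets(key1, key2, book_sentences):
    """Return the adjacent pair (sentence_i, sentence_i+1) where key1
    matches the first and key2 the second, or None if no pair matches."""
    for i in range(len(book_sentences) - 1):
        if key1 in book_sentences[i] and key2 in book_sentences[i + 1]:
            return book_sentences[i], book_sentences[i + 1]
    return None

book = ["Brown bear, brown bear, what do you see?",
        "I see a red bird looking at me.",
        "Red bird, red bird, what do you see?"]
pair = find_consecutive_targets("what do you see", "red bird looking", book)
print(pair)
```

Requiring adjacency is what lets two individually ambiguous fragments resolve to a unique pair of sentences.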
3. The translator of claim 1, further comprising:
the detection unit is used for detecting whether the current environment meets infant audio translation conditions, wherein the infant audio translation conditions comprise:
the current position does not belong to a pre-recorded resident position of the infant;
the current environment includes pre-recorded persons other than the infant's associated family;
the first acquisition unit is specifically configured to acquire first audio data of the infant when the current environment meets the infant audio translation condition.
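The environment check of claim 3 can be sketched as a small predicate. The claim does not state how its two conditions combine, so the disjunction below (away from home, or a non-family person present) is an assumption, and all names are hypothetical.

```python
# Hypothetical check of the "infant audio translation condition":
# translation is enabled when the infant is away from a usual location,
# or when a pre-recorded non-family person is present.

def meets_translation_condition(current_location, resident_locations,
                                persons_present, family_members):
    away_from_home = current_location not in resident_locations
    stranger_present = any(p not in family_members for p in persons_present)
    return away_from_home or stranger_present

# The infant is at home, but a non-family visitor is present.
ok = meets_translation_condition(
    current_location="home",
    resident_locations={"home", "grandparents"},
    persons_present={"mother", "neighbour"},
    family_members={"mother", "father", "grandma"},
)
print(ok)
```

With only family present at a usual location, the predicate is false and no audio would be collected.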
4. The translator of claim 1, further comprising:
the collecting unit is used for collecting image information of the infant, identifying whether the image information indicates that the infant is reading a book, and, when it does, extracting book feature information or a book title from the image information, searching the network for the book currently being read by the infant based on the book feature information or the book title, acquiring an electronic book corresponding to that book, and storing the electronic book in the book library.
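The library-building flow of claim 4 can be sketched as follows. Both the image-based title recognition and the network lookup are stubbed; in practice they would be a vision/OCR model and an e-book search service, and every name here is hypothetical.

```python
# Sketch of the claim-4 flow: recognize the book being read from an
# image frame, fetch its electronic edition, and cache it in the library.

def update_book_library(image_info, library, recognize_title, fetch_ebook):
    """If the image shows the child reading a book, look it up and cache it."""
    title = recognize_title(image_info)      # None when no book is visible
    if title is None or title in library:
        return library                       # nothing new to store
    library[title] = fetch_ebook(title)      # store the electronic edition
    return library

library = {}
stub_recognize = lambda img: "The Very Hungry Caterpillar"
stub_fetch = lambda title: {"title": title, "sentences": ["..."]}

update_book_library("frame_0042", library, stub_recognize, stub_fetch)
print(sorted(library))
```

Caching by title means repeated frames of the same book do not trigger repeated network lookups.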
5. The translator according to claim 4, wherein the collecting unit is further configured to collect an audio data set of the infant during reading of the book, together with the book image corresponding to each piece of audio data in the audio data set, establish a mapping relationship between each piece of audio data and its book image, and store the mapping relationship in the book library, wherein the book image corresponding to a piece of audio data is the image of the page of the book being read, as displayed by the image information, at the moment the infant outputs that audio data;
the second identification unit is specifically configured to:
converting the first key audio feature into first key text content, searching a pre-acquired book library, based on the first key text content, for a target infant book including the first key text content, identifying candidate sentences in the target infant book that match the first key text content, and extracting book page images of the target infant book that include the candidate sentences; matching each book page image against a target book image, and taking the candidate sentence as the first target sentence when its book page image matches the target book image; wherein the target book image is the book image that has the mapping relationship with historical audio data, the historical audio data being found in the audio data set by matching against the first audio data, and the book image being extracted based on the mapping relationship.
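The image-based disambiguation of claim 5 can be sketched as follows. When the key text matches candidate sentences on several pages, the page whose image corresponds to the matched historical audio wins. Image matching is reduced to equality of page identifiers here, and all names are illustrative; a real system would compare images with, e.g., perceptual hashing or template matching.

```python
# Sketch of claim 5: pick, among candidate sentences, the one whose page
# image has a recorded mapping to historical audio matching the input.

def pick_target_sentence(candidates, audio_to_image, matched_history_audio):
    """candidates: list of (sentence, page_image) pairs. Return the
    sentence whose page image is mapped to the matched historical audio."""
    target_image = audio_to_image.get(matched_history_audio)
    for sentence, page_image in candidates:
        if page_image == target_image:
            return sentence
    return None

candidates = [("Goodnight moon.", "page_03"),
              ("Goodnight stars.", "page_07")]
mapping = {"audio_clip_17": "page_07"}   # built while the child was reading
print(pick_target_sentence(candidates, mapping, "audio_clip_17"))
```

The mapping acts as a memory of where in the book the child actually was, breaking ties that text matching alone cannot.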
6. The translator according to any one of claims 1 to 4, further comprising:
a third acquisition unit, configured to acquire fifth audio data of the infant, where the fifth audio data includes M pieces of audio data before the first audio data and also includes N pieces of audio data after the first audio data, and M and N are positive integers;
a fourth identification unit, configured to identify a speech rate of the fifth audio data and the first audio data, and compare the speech rate with a preset speech rate threshold;
the second translation unit is used for extracting M key audio features from the M pieces of audio data, converting the M key audio features into M key text contents, identifying, based on the M key text contents, M target sentences in the target infant book that match the M key text contents, and taking the M target sentences as the translation text contents of the M pieces of audio data; and, when the speech rate reaches the preset speech rate threshold, identifying N target sentences in the target infant book that are consecutive with the first target sentence, and taking the N target sentences as the translation text contents of the N pieces of audio data;
the playing unit is further configured to play audio data corresponding to the translated text content of the M pieces of audio data, and play audio data corresponding to the translated text content of the N pieces of audio data.
7. A method of translation, comprising:
collecting first audio data of a child;
identifying a first key audio feature in the first audio data, wherein the first key audio feature is an audio feature from which text content can be accurately identified, and the first audio data also contains audio features from which text content cannot be identified;
converting the first key audio feature into first key text content, searching a pre-acquired book library, based on the first key text content, for a target infant book including the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content;
taking the first target sentence as first translation text content of the first audio data;
and playing second audio data corresponding to the first translation text content.
8. The method of claim 7, further comprising:
acquiring third audio data of the infant, wherein the third audio data is consecutive with the first audio data, separated from it by a pause interval;
identifying a second key audio feature in the third audio data, wherein the second key audio feature is an audio feature from which text content can be accurately identified, and the third audio data also contains audio features from which text content cannot be identified;
the converting the first key audio feature into first key text content, searching a pre-acquired book library, based on the first key text content, for a target infant book including the first key text content, and identifying a first target sentence in the target infant book that matches the first key text content includes:
converting the first key audio feature into first key text content, converting the second key audio feature into second key text content, searching a pre-acquired book library, based on the first key text content and the second key text content, for a target infant book including both, and identifying a first target sentence in the target infant book that matches the first key text content and a second target sentence in the target infant book that matches the second key text content, wherein the first target sentence and the second target sentence are consecutive sentences in the target infant book;
the taking the first target sentence as the first translated text content of the first audio data includes:
taking the first target sentence as the first translation text content of the first audio data, and taking the second target sentence as the second translation text content of the third audio data.
9. The method of claim 7, further comprising:
detecting whether a current environment meets infant audio translation conditions, wherein the infant audio translation conditions comprise:
the current position does not belong to a pre-recorded resident position of the infant;
the current environment includes pre-recorded persons other than the infant's associated family;
the collecting of first audio data of a child comprises:
when the current environment meets the infant audio translation condition, collecting first audio data of the child.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the translation method according to any one of claims 7 to 9.
CN202210882818.5A 2022-07-26 2022-07-26 Translation machine and translation method Pending CN115273834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210882818.5A CN115273834A (en) 2022-07-26 2022-07-26 Translation machine and translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210882818.5A CN115273834A (en) 2022-07-26 2022-07-26 Translation machine and translation method

Publications (1)

Publication Number Publication Date
CN115273834A true CN115273834A (en) 2022-11-01

Family

ID=83768871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882818.5A Pending CN115273834A (en) 2022-07-26 2022-07-26 Translation machine and translation method

Country Status (1)

Country Link
CN (1) CN115273834A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052671A (en) * 2022-11-21 2023-05-02 深圳市东象设计有限公司 Intelligent translator and translation method
CN117313754A (en) * 2023-11-24 2023-12-29 深圳市贝铂智能科技有限公司 Intelligent translation method, device and translator
CN117313754B (en) * 2023-11-24 2024-01-30 深圳市贝铂智能科技有限公司 Intelligent translation method, device and translator

Similar Documents

Publication Publication Date Title
CN109346059B (en) Dialect voice recognition method and electronic equipment
CN115273834A (en) Translation machine and translation method
CN107305541B (en) Method and device for segmenting speech recognition text
CN109410664B (en) Pronunciation correction method and electronic equipment
CN111339283B (en) Method and device for providing customer service answers aiming at user questions
US9484034B2 (en) Voice conversation support apparatus, voice conversation support method, and computer readable medium
CN105704538A (en) Method and system for generating audio and video subtitles
CN104598644B (en) Favorite label mining method and device
CN110853615B (en) Data processing method, device and storage medium
CN109801628B (en) Corpus collection method, apparatus and system
CN112784696B (en) Lip language identification method, device, equipment and storage medium based on image identification
CN111858876B (en) Knowledge base generation method, text searching method and device
CN109492221B (en) Information reply method based on semantic analysis and wearable equipment
US20160071511A1 (en) Method and apparatus of smart text reader for converting web page through text-to-speech
CN111881297A (en) Method and device for correcting voice recognition text
CN109448460A (en) One kind reciting detection method and user equipment
JP2012194245A (en) Speech recognition device, speech recognition method and speech recognition program
CN112382295B (en) Speech recognition method, device, equipment and readable storage medium
CN112818680A (en) Corpus processing method and device, electronic equipment and computer-readable storage medium
CN116246610A (en) Conference record generation method and system based on multi-mode identification
JP6425493B2 (en) Program, apparatus and method for estimating evaluation level for learning item based on human speech
CN114996506A (en) Corpus generation method and device, electronic equipment and computer-readable storage medium
US20220237379A1 (en) Text reconstruction system and method thereof
JP6723907B2 (en) Language recognition system, language recognition method, and language recognition program
CN113192534A (en) Address search method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination