WO2021179703A1 - Sign language translation method and apparatus, computer device, and storage medium - Google Patents

Sign language translation method and apparatus, computer device, and storage medium

Info

Publication number
WO2021179703A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
data
sign language
model group
translation model
Prior art date
Application number
PCT/CN2020/134561
Other languages
English (en)
French (fr)
Inventor
洪振厚
王健宗
瞿晓阳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021179703A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/636 Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/02 Preprocessing
    • G06F 2218/04 Denoising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 Feature extraction

Definitions

  • This application relates to the field of sign language translation, and in particular to a sign language translation method and apparatus, a computer device, and a storage medium.
  • Conversely, converting a hearing person's speech into text likewise allows the two parties to communicate.
  • Sign language translation systems in the prior art ignore individual differences between users and regional differences, which leads to misrecognition when smart sign language translation devices recognize signs and interferes with communication between deaf-mute people and ordinary people. Regional differences can also make the sign language translation results inaccurate.
  • In summary, existing sign language translation devices mainly suffer from one problem: differences in sign actions across regions lead to low gesture recognition precision and insufficient translation accuracy.
  • To this end, this application provides a sign language translation method, including: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
  • The present application also provides a sign language translation apparatus, including: an acquisition unit for acquiring sign language data carrying region information sent by a user; a model selection unit for selecting, according to the region information, a translation model group associated with a preset region range; a translation unit for translating the sign language data with the translation models in the translation model group to obtain translation data; and a conversion unit for converting the translation data into audio data.
  • The present application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the following method is implemented: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the same method: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
  • This application applies a corresponding translation model for deaf-mute people in different regions, improving the accuracy of the translation results, and uses the translation model group to translate the sign language data and obtain the translation data, achieving barrier-free communication between deaf-mute people and ordinary people.
  • Fig. 1 is a flowchart of an embodiment of the sign language translation method of this application.
  • Fig. 2 is a flowchart of an embodiment of the steps performed before selecting a translation model group associated with a preset region range according to the region information.
  • Fig. 3 is a flowchart of an embodiment of translating the sign language data with the translation models in the translation model group to obtain translation data.
  • Fig. 4 is a flowchart of an embodiment of translating the sign language data with each translation model in the translation model group to obtain semantic probabilities.
  • Fig. 5 is a block diagram of an embodiment of the sign language translation apparatus of this application.
  • Fig. 6 is a schematic diagram of the hardware architecture of an embodiment of the computer device of this application.
  • The technical solution of this application can be applied to the fields of artificial intelligence, blockchain, and/or big data technology to realize intelligent sign language translation.
  • The data involved in this application, such as translation data, audio data, and/or translation model groups, may be stored in a database or on a blockchain.
  • The sign language translation method and apparatus, computer device, and storage medium provided in this application are suitable for the field of smart medical services.
  • This application can be used to translate the sign language of deaf-mute people in different regions: the translation model group associated with the region information is selected according to that information, so that a corresponding translation model is applied for deaf-mute people in each region, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is obtained, enabling barrier-free communication between deaf-mute people and ordinary people.
  • The sign language translation method of this embodiment includes the following steps.
  • The region information is the location information of the user (a hearing-impaired person), and the location information may include positioning information and home-location information.
  • The positioning information may be obtained through a positioning module in the mobile terminal used by the user; the region information may be the user's home-location information, and the sign language region used by the user (e.g., different countries or regions) can be distinguished from this location information.
  • The positioning information may be the user's current location, for example obtained by the positioning module in a smart terminal.
  • The home-location information may be the user's household registration information or information filled in by the user.
  • The sign language data can be captured through sensors such as wristbands and armbands that pick up the bioelectric signals formed by the weak currents muscles generate at rest or during contraction. These sensors, made of conductive yarn, capture hand movements and the positions of the corresponding fingers, and these movements and positions represent the letters, numbers, words, and phrases of sign language.
  • The wristband device converts finger movements into electrical signals and sends them to a circuit board on the wristband.
  • The circuit board can transmit the signals wirelessly to a mobile terminal device such as a smartphone, where the sign language data is generated.
  • Each of the translation model groups is associated with one preset region range, and the translation model group includes at least two translation models.
  • Step S2 includes the following steps.
  • Each translation model group is associated with a preset region range; different preset region ranges do not overlap, and different preset region ranges correspond to different translation model groups.
  • A database can be used to store the translation model groups.
  • The sign language data in this embodiment carries region information. When translating sign language data from different regions, the database can be queried with the region information carried by the sign language data to select the preset region range that matches it.
  • The corresponding translation model group is determined from the preset region range, and the matching translation models in that group are used to translate the sign language data.
  • For example, when the region information carried by the sign language data is Sichuan province, a translation model matching Sichuan is selected from the database to translate the sign language data; when the region information carried by the sign language data is Jiangsu province, a translation model matching Jiangsu is selected from the database, and so on.
  • If no matching translation model exists in the database, translation models are trained and the trained model group is stored in the database, updating the database's translation model groups, so that sign language data from different regions can be matched to better translation models and more accurate, more targeted sign language translation results can be obtained.
  • The translation model group may be obtained by training an initial classification model group, or it may be a translation model group trained in advance.
  • Before step S2, the following steps are included (see Fig. 2).
  • A1. Obtain a training sample set associated with the region information and a test sample set associated with the region information.
  • The training sample set is a set of data used to discover and predict potential relationships; it includes sign language data whose sign actions have not been semantically labeled.
  • The test sample set is a set of data used to evaluate the strength and utility of the predicted relationships; it includes sign language data whose sign actions have been semantically labeled.
  • The labeling can be done manually, marking the semantics of the sign actions.
  • Users of different genders and ages from different regions can, following prompts on a mobile phone, demonstrate the actions for the same sign posture in different emotional states. Sensors such as wristbands and armbands capture the bioelectric signals formed by the weak currents muscles generate at rest or during contraction, yielding the sign language data; the acquired sign language data is translated by the initial classification model, the translation results are fed back and updated through a feedback mechanism, and a corresponding sign language translation library is generated.
  • For example, suppose the actual semantics of the sign action captured by the sensor is "Where is the convenience store".
  • The initial classification model translates the sign semantics. If the translation result is also "Where is the convenience store", no update or feedback is performed, and a sign language translation library holding this semantic entry is generated. If the translation result is not "Where is the convenience store", the wrong translation result is fed back and the initial classification model is updated.
  • In the training phase, the training samples can be provided by users from the same region (community).
  • During model training with the training set, only the training set is available.
  • The test set is available only when testing the accuracy of the resulting model.
  • The test set is a set of data independent of the training set but following the same probability distribution as the data in the training set.
  • A3. Use the test sample set to test each trained initial classification model; if the test result meets the preset requirement, use the trained initial classification model as a translation model.
  • The test result refers to the result of translating the sign language data with the initial classification model.
  • The semantically labeled sign language data in the test sample set can be used to test the translation accuracy of the initial translation model. For example, if the test sample set contains 100 groups of sign language data and the initial sign language translation model is tested on all 100 groups, and the accuracy of the test results is 90% or higher, the initial translation model is judged to meet the preset requirement and the trained initial classification model is used as a translation model.
  • Each translation model group associated with the region information includes a plurality of the translation models.
  • At least two of the following models are selected as translation models: a long short-term memory model, a gated recurrent unit model, and a sequence-to-sequence model.
  • Long short-term memory (LSTM) is a special recurrent neural network (RNN) that can be applied to speech recognition, language modeling, and translation.
  • A traditional neural network cannot relate information across time. For example, when the semantics of the input sign language data is "Hello", a traditionally trained neural network for sign language translation may output "Hello", but it has no memory effect, cannot produce a specific sign language translation library, and cannot correctly translate the same sign language data in the future.
  • That is, a traditional neural network cannot infer the next event from previously judged events. The network structure of the LSTM model therefore contains loops that retain earlier training information.
  • Although a traditional RNN can also address this, the LSTM model performs better, so it is selected as a translation model.
  • The gated recurrent unit (GRU) is a commonly used gated recurrent neural network and a variant of the LSTM model. The GRU maintains the LSTM's effect while having a simpler structure and faster processing, which makes it very popular, so this scheme also selects it as a translation model to train.
  • The sequence-to-sequence model (Seq2Seq) likewise performs well on tasks such as translation and speech recognition and can handle sequential data such as voice, text, and video. It combines two recurrent neural networks: one receives the source sentence, and the other outputs the sentence in the target language. These two processes are called encoding and decoding; using this encode-decode structure when actually training the translation model avoids the accumulation of errors.
  • The input sign language data from different regions is translated with the trained models to obtain accurate translation results, and corresponding region-specific sign language translation libraries are generated, improving the accuracy of translation results for sign language data from different regions and avoiding the problem of the same sign action corresponding to different speech in different regions.
  • Step S3 may include: S31. Translate the sign language data with each translation model in the translation model group to obtain semantic probabilities. In this step, translating the sign language data means translating the acquired sign language data of the same sign semantics with each model in the model group and obtaining the translation results separately.
  • S32. Take the semantic data corresponding to the highest of all the semantic probabilities as the translation data.
  • The separately obtained translation results for the same sign semantics are compared. For example, if the semantic probabilities obtained by the different translation models are 90%, 92%, and 95%, the semantic data with the 95% probability is selected as the translation data.
  • Step S31 may include (see Fig. 4): S311. Extract the EMG signal from the sign language data, denoise the EMG signal by averaging, and cut the denoised signal to obtain feature data.
  • Specifically, in this embodiment, the start and end points of the EMG signal are determined, the EMG signal is averaged, and db12 wavelet-transform denoising is applied to the averaged signal; whether the signal is within a preset threshold range is then identified, and if so the signal is an active segment (a signal above the onset threshold and below the offset threshold is treated as active), and the feature data corresponding to the active segment is extracted.
  • S312. Input the feature data into the translation model, which recognizes the feature data to obtain the semantic probability; the final output is the probability representing the current sign semantics.
  • The translation model can be a long short-term memory model, a gated recurrent unit model, or a sequence-to-sequence model.
  • For example, the translation model can be an LSTM model, which is trained on the sign language data in the training sample set to judge the features of the current sign language data; the final output is the probability representing the current sign semantics.
  • TTS (text-to-speech) technology can be used to convert the translation data into audio data.
  • Step S4 may include: mapping the translation data to a preset sign language speech library to obtain audio data matching the translation data, where the preset sign language speech library includes translation data and the audio data associated with the translation data.
  • The audio data is stored through the preset sign language speech library.
  • To further guarantee the privacy and security of the audio data, the audio data can also be stored in a node of a blockchain.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
  • The acquired sign language speech library may be fed back to users in different regions, or stored in the cloud and downloaded by users who need it.
  • The audio data may be data in a sign language speech library, or audio data obtained from the translation data.
  • The audio can be played through the pronunciation module of the sign language translation device.
  • For example, if the text meaning represented by the audio data is "Where is the convenience store?", the pronunciation module reads the audio data aloud for normal communication between deaf-mute people and ordinary people.
  • Step S4 may further include: recognizing the semantic information of the translation data, and converting the semantic information into the audio data with a voice converter.
  • Natural language processing (NLP) can be used to process the semantic information of the translation data.
  • The semantic information of the translation data can be text, and the text can be a sentence or a word.
  • Through syntactic and semantic analysis, the semantic information of the translation data is parsed syntactically, polysemous words are disambiguated, and complete semantic information with high accuracy is obtained.
  • For example, if the semantic information of the translation data is "Hello I would like to ask where is the convenience store thank you", NLP processing yields the clearer "Hello, I would like to ask, where is the convenience store, thank you".
  • The voice converter uses TTS (text-to-speech) technology to convert the translation data into audio data, so that deaf-mute people and ordinary people can communicate without barriers.
  • The sign language translation method can be used to translate the sign language of deaf-mute people in different regions.
  • The translation model group associated with the region information is selected according to that information, so a corresponding translation model is used for deaf-mute people in each region, improving the accuracy of the translation results; the translation model group translates the sign language data and obtains the translation data, realizing barrier-free communication between deaf-mute people and ordinary people.
  • The translation apparatus 1 of this embodiment includes an acquisition unit 11, a model selection unit 12, a translation unit 13, and a conversion unit 14.
  • The acquisition unit 11 is configured to acquire sign language data carrying region information sent by a user.
  • The region information is the location information of the user (a hearing-impaired person), and the location information may include positioning information and home-location information.
  • The positioning information may be obtained through a positioning module in the mobile terminal used by the user; the region information may be the user's home-location information, and the sign language region used by the user (e.g., different countries or regions) can be distinguished from this location information.
  • The sign language data can be captured with sensors such as wristbands and armbands that pick up the bioelectric signals formed by the weak currents muscles generate at rest or during contraction; the signals are sent to a mobile terminal such as a mobile phone to generate the sign language data.
  • The model selection unit 12 is configured to select, according to the region information, a translation model group associated with a preset region range.
  • The translation model group may be obtained by training an initial classification model group, or it may be a translation model group trained in advance.
  • Before selecting the translation model group associated with the region information, the method further includes: obtaining a training sample set associated with the region information and a test sample set associated with the region information; training each initial classification model in the initial classification model group with the training sample set; and testing each trained initial classification model with the test sample set, using the trained initial classification model as a translation model if the test result meets the preset requirement. Each translation model group associated with the region information includes a plurality of the translation models.
  • The translation unit 13 is configured to translate the sign language data with the translation models in the translation model group to obtain translation data.
  • At least two of the following models are selected as translation models: a long short-term memory model, a gated recurrent unit model, and a sequence-to-sequence model.
  • For sign language data from different regions, the trained translation models translate the input and obtain accurate translation results, and corresponding region-specific sign language translation libraries are generated, improving the accuracy of translation results for sign language data from different regions.
  • The conversion unit 14 is used to convert the translation data into audio data.
  • TTS (Text To Speech) technology can be used to convert the translation data into audio data.
  • The audio can be played through the pronunciation module of the sign language translation device, which reads the audio data aloud for normal communication between deaf-mute people and ordinary people.
  • The translation model group associated with the region information is selected according to that information, so that deaf-mute people in different regions use a corresponding translation model, improving the accuracy of the translation result; the translation model group translates the sign language data and obtains the translation data, realizing barrier-free communication between deaf-mute people and ordinary people.
  • The present application also provides a computer device 2, which may comprise multiple computer devices 2.
  • The components of the sign language translation apparatus 1 of the second embodiment can be distributed across different computer devices 2.
  • The computer device 2 can be a smartphone, tablet, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (an independent server, or a server cluster composed of multiple servers) that executes the program.
  • The computer device 2 of this embodiment at least includes, but is not limited to, a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, some or all of the above steps are implemented.
  • The computer device may also include a network interface and/or a sign language translation apparatus.
  • FIG. 6 only shows the computer device 2 with its components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • The memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • The memory 21 may be an internal storage unit of the computer device 2, for example a hard disk or memory of the computer device 2.
  • The memory 21 may also be an external storage device of the computer device 2, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2.
  • The memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • The memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the sign language translation method of the first embodiment.
  • The memory 21 can also be used to temporarily store various kinds of data that have been output or are to be output.
  • the processor 23 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 23 is generally used to control the overall operation of the computer device 2, for example, to perform data interaction or communication-related control and processing with the computer device 2.
  • the processor 23 is used to run the program code or process data stored in the memory 21, for example, to run the sign language translation device 1 and the like.
  • the network interface 22 may include a wireless network interface or a wired network interface, and the network interface 22 is generally used to establish a communication connection between the computer device 2 and other computer devices 2.
  • the network interface 22 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • The network may be an intranet, the Internet, a Global System for Mobile communication (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 6 only shows the computer device 2 with components 21-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • The sign language translation apparatus 1 stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete this application.
  • This application also provides a computer-readable storage medium, which includes multiple storage media, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, and so on, on which a computer program is stored; when the program is executed by the processor 23, the corresponding functions are implemented.
  • The computer-readable storage medium of this embodiment is used to store the sign language translation apparatus 1; when executed by the processor 23, it implements the sign language translation method of the first embodiment.
  • The storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physiology (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

A sign language translation method and apparatus, a computer device, and a storage medium, relating to the field of sign language translation and able to translate the sign language of deaf-mute people in different regions. Sign language data carrying region information sent by a user is acquired, and a translation model group associated with the region information is selected according to the region information, so that a corresponding translation model is applied for deaf-mute people in different regions, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is acquired, realizing barrier-free communication between deaf-mute people and ordinary people.

Description

Sign language translation method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on October 20, 2020, with application number 202011122840.7 and invention title "Sign language translation method and apparatus, computer device, and storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of sign language translation, and in particular to a sign language translation method and apparatus, a computer device, and a storage medium.
Background
With the development of services for disabled people in China, the need of deaf-mute people to participate in society keeps growing. In recent years, as research in linguistics, computer science, graphics and imaging, precision machinery, and other related disciplines has matured, research on sign language translation systems at home and abroad has deepened, and many portable sign-language/speech interpretation devices have appeared on the market. They let people who do not know sign language communicate smoothly with disabled people who use it, facilitating daily communication between hearing-impaired people and ordinary people. This research has mainly focused on vision-based sign language translators.
The inventors found that a vision-based sign language translator mainly works as follows: an image acquisition device captures the key-point movements of the hands to obtain gesture data, the sign language is then rendered as visible text or read aloud through speech software, and conversely an ordinary person's speech is converted into text, allowing the two parties to communicate. The inventors realized that although existing visual sign language translators combine sign language recognition with sign language synthesis to translate sign language data, different countries or regions use different sign language standards and sign postures are not uniform. Sign language translation systems in the prior art ignore individual differences between users and regional differences, which causes misrecognition when smart sign language translation devices recognize signs and interferes with communication between deaf-mute people and hearing people. Regional differences can likewise make sign language translation results inaccurate.
In summary, existing sign language translation devices mainly suffer from the problem that differences in sign actions across regions lead to low gesture recognition precision and insufficient translation accuracy.
Technical Problem
To address the problem that differences in sign actions across regions cause existing sign language translation devices to have low gesture recognition precision and insufficient translation accuracy, this application provides a sign language translation method and apparatus, a computer device, and a storage medium aimed at improving the precision of sign language translation results for different regions.
Technical Solution
To achieve the above purpose, this application provides a sign language translation method, including: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
To achieve the above purpose, this application also provides a sign language translation apparatus, including: an acquisition unit for acquiring sign language data carrying region information sent by a user; a model selection unit for selecting, according to the region information, a translation model group associated with a preset region range; a translation unit for translating the sign language data with the translation models in the translation model group to obtain translation data; and a conversion unit for converting the translation data into audio data.
To achieve the above purpose, this application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the following method is implemented: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
To achieve the above purpose, this application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the following method: acquiring sign language data carrying region information sent by a user; selecting, according to the region information, a translation model group associated with a preset region range, where each translation model group is associated with one preset region range and includes at least two translation models; translating the sign language data with the translation models in the translation model group to obtain translation data; and converting the translation data into audio data.
Beneficial Effects
For deaf-mute people in different regions, this application applies a corresponding translation model, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is acquired, realizing barrier-free communication between deaf-mute people and ordinary people.
Brief Description of the Drawings
Fig. 1 is a flowchart of an embodiment of the sign language translation method of this application.
Fig. 2 is a flowchart of an embodiment of the steps performed before selecting a translation model group associated with a preset region range according to the region information.
Fig. 3 is a flowchart of an embodiment of translating the sign language data with the translation models in the translation model group to obtain translation data.
Fig. 4 is a flowchart of an embodiment of translating the sign language data with each translation model in the translation model group to obtain semantic probabilities.
Fig. 5 is a block diagram of an embodiment of the sign language translation apparatus of this application.
Fig. 6 is a schematic diagram of the hardware architecture of an embodiment of the computer device of this application.
Embodiments of the Invention
To make the purpose, technical solution, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of this application.
The technical solution of this application can be applied to the fields of artificial intelligence, blockchain, and/or big data technology to realize intelligent sign language translation. Optionally, the data involved in this application, such as translation data, audio data, and/or translation model groups, may be stored in a database, or may be stored in a blockchain, for example through distributed blockchain storage; this application is not limited in this respect.
It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another.
The sign language translation method and apparatus, computer device, and storage medium provided by this application are suitable for the field of smart medical services. This application can be used to translate the sign language of deaf-mute people in different regions: sign language data carrying region information sent by a user is acquired, and the translation model group associated with the region information is selected according to it, so that a corresponding translation model is applied for deaf-mute people in different regions, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is acquired, realizing barrier-free communication between deaf-mute people and ordinary people.
Embodiment 1.
Referring to Fig. 1, the sign language translation method of this embodiment includes the following steps.
S1. Acquire sign language data carrying region information sent by a user.
In this step, the region information is the location information of the user (a hearing-impaired person), and the location information may include positioning information and home-location information. The positioning information may be obtained through a positioning module in the mobile terminal used by the user; the region information may be the user's home-location information, and the sign language region used by the user (e.g., different countries or regions) is distinguished from this location information. The positioning information may be the user's current location, for example obtained by the positioning module in a smart terminal. The home-location information may be the user's household registration information or information filled in by the user.
The sign language data can be captured through sensors such as wristbands and armbands that pick up the bioelectric signals formed by the weak currents muscles generate at rest or during contraction. These sensors, made of conductive yarn, capture hand movements and the positions of the corresponding fingers, and these movements and positions represent the letters, numbers, words, and phrases of sign language. The wristband device converts finger movements into electrical signals and sends them to a circuit board on the wristband, which can transmit the signals wirelessly to a mobile terminal device such as a smartphone, generating the sign language data.
S2. Select, according to the region information, a translation model group associated with a preset region range.
Each of the translation model groups is associated with one preset region range, and the translation model group includes at least two translation models.
Further, step S2 includes the following steps.
S21. Match the region information against multiple preset region ranges to obtain the preset region range that matches the region information.
In this embodiment, each translation model group is associated with a preset region range; different preset region ranges do not overlap, and different preset region ranges correspond to different translation model groups. A database can be used to store the translation model groups. The sign language data in this embodiment carries region information; when translating sign language data from different regions, the database can be queried with the region information carried by the sign language data to select the preset region range that matches it.
S22. Obtain the translation model group associated with the preset region range.
In this embodiment, the corresponding translation model group is determined from the preset region range, and the matching translation models in the group are used to translate the sign language data. The region information is the location information of the user (a hearing-impaired person), and the location information may include positioning information and home-location information.
As an example and not a limitation, when the region information carried by the sign language data is Sichuan province, a translation model matching Sichuan is selected from the database according to the region information to translate the sign language data; when the region information carried by the sign language data is Jiangsu province, a translation model matching Jiangsu is selected from the database to translate the sign language data, and so on.
It should be noted that if the database storing the translation model groups contains no translation model matching the region information, translation models are trained and the trained model group is stored in the database, updating the database's translation model groups, so that sign language data from different regions can be matched to better translation models and more accurate, more targeted sign language translation results can be obtained.
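As a minimal illustration of this selection step, the sketch below (in Python) maps region information to a stored model group and falls back to training when no group matches; the region names, model identifiers, and the train_model_group() helper are hypothetical, not taken from the patent:

```python
# Hypothetical in-memory stand-in for the database of translation model groups.
MODEL_GROUP_DB = {
    "Sichuan": ["lstm_sichuan", "gru_sichuan", "seq2seq_sichuan"],
    "Jiangsu": ["lstm_jiangsu", "gru_jiangsu", "seq2seq_jiangsu"],
}

def train_model_group(region):
    """Placeholder for steps A1-A4: train a new model group for this region."""
    group = [f"{arch}_{region.lower()}" for arch in ("lstm", "gru", "seq2seq")]
    MODEL_GROUP_DB[region] = group  # update the database with the trained group
    return group

def select_model_group(region):
    """Steps S21/S22: match the region info to a preset range, fetch its group."""
    group = MODEL_GROUP_DB.get(region)
    if group is None:  # no matching group: train and update, as noted above
        group = train_model_group(region)
    return group

print(select_model_group("Sichuan"))  # ['lstm_sichuan', 'gru_sichuan', 'seq2seq_sichuan']
```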
The translation model group may be obtained by training an initial classification model group, or it may be a translation model group trained in advance.
Further, the following steps are included before step S2 (see Fig. 2).
A1. Obtain a training sample set associated with the region information and a test sample set associated with the region information.
In this step, the training sample set is a set of data used to discover and predict potential relationships and includes sign language data whose sign actions have not been semantically labeled; the test sample set is a set of data used to evaluate the strength and utility of the predicted relationships and includes sign language data whose sign actions have been semantically labeled.
The labeling can be done manually, marking the semantics of the sign actions.
A2. Train each initial classification model in the initial classification model group with the training sample set.
In this step, multiple users of different genders and age groups from different regions can, following prompts on a mobile phone, demonstrate the actions for the same sign posture in different emotional states. Sensors such as wristbands and armbands capture the bioelectric signals formed by the weak currents muscles generate at rest or during contraction, yielding the sign language data; the initial classification model translates the acquired sign language data, the translation results are fed back and updated through a feedback mechanism, and a corresponding sign language translation library is generated. For example, if the actual semantics of the sign action captured by the sensor is "Where is the convenience store" and the initial classification model also translates it as "Where is the convenience store", no update or feedback is performed and a sign language translation library holding this semantic entry is generated; if the translation result is not "Where is the convenience store", the wrong translation result is fed back and the initial classification model is updated.
In the training phase of the initial classification model group, the training samples can be provided by users from the same region (community). During model training with the training set, only the training set is available; the test set is available only when testing the accuracy of the resulting model. The test set is independent of the training set but follows the same probability distribution as the data in the training set.
A3. Test each trained initial classification model with the test sample set; if the test result meets the preset requirement, use the trained initial classification model as a translation model.
In this step, the test result refers to the result of translating the sign language data with the initial classification model. The semantically labeled sign language data in the test sample set can be used to test the translation accuracy of the initial translation model. For example, if the test sample set contains 100 groups of sign language data and the initial sign language translation model is tested on all 100 groups, and the accuracy of the test results is 90% or higher, the initial translation model is judged to meet the preset requirement and the trained initial classification model is used as a translation model.
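A small sketch of this acceptance test, assuming a hypothetical model object with a translate() method and a labeled test set of (signal, label) pairs; the 90% threshold is the one from the example above:

```python
def passes_acceptance_test(model, test_set, threshold=0.90):
    """Step A3: keep a trained model only if its accuracy on the labeled
    test sample set meets the preset requirement (90% in the example)."""
    correct = sum(1 for signal, label in test_set
                  if model.translate(signal) == label)
    return correct / len(test_set) >= threshold

# Models that pass the test become members of the region's translation model group.
```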
A4. Each translation model group associated with the region information includes a plurality of the translation models.
In this embodiment, at least two of the following models are selected as translation models: a long short-term memory model, a gated recurrent unit model, and a sequence-to-sequence model.
The long short-term memory (LSTM) model is a special recurrent neural network (RNN) that can be applied to speech recognition, language modeling, and translation. In a traditional RNN, the effect of long-term memory cannot be exhibited during training, so a storage cell is needed to hold the memory, which is why the LSTM model was proposed. A traditional neural network cannot relate information across time: for example, when the semantics of the input sign language data is "Hello", a traditionally trained neural network for sign language translation may produce "Hello" as the result, but it has no memory effect, cannot produce a specific sign language translation library, and cannot correctly translate the same sign language data in the future; that is, a traditional neural network cannot infer the next event from previously judged events. The LSTM network structure therefore contains loops that preserve earlier training information. Although a traditional recurrent neural network (RNN) can also address this, the LSTM model performs better, so it is chosen as a translation model.
The gated recurrent unit (GRU) is a commonly used gated recurrent neural network and a variant of the LSTM model. The GRU maintains the LSTM's effect while having a simpler structure and faster processing, which makes it very popular, so this scheme also selects it as a translation model to be trained.
The sequence-to-sequence model (Seq2Seq) likewise performs well on tasks such as translation and speech recognition and handles sequential data such as voice, text, and video well. It combines two recurrent neural networks: one receives the source sentence, and the other outputs the sentence in the target language. These two processes are called encoding and decoding; by using this encode-decode structure when actually training the translation model, the accumulation of errors can be avoided.
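For concreteness, here is a minimal PyTorch sketch of one member of such a group: an LSTM classifier mapping a window of EMG features to a probability distribution over sign semantics. The feature dimension, hidden size, and class count are illustrative placeholders, not values given by the patent:

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    """LSTM over an EMG feature sequence, ending in semantic probabilities."""
    def __init__(self, feat_dim=8, hidden=64, n_classes=50):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(x)            # h_n: (1, batch, hidden)
        logits = self.head(h_n[-1])           # classify from the final hidden state
        return torch.softmax(logits, dim=-1)  # probability per sign semantic

model = SignLSTM()
probs = model(torch.randn(1, 100, 8))         # one 100-step feature window
print(probs.argmax(dim=-1))                   # index of the most probable semantic
```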
S3. Translate the sign language data with the translation models in the translation model group to obtain translation data.
In this step, for sign language data from different regions, the trained translation models translate the input sign language data from the various regions to obtain accurate translation results and to generate corresponding region-specific sign language translation libraries, improving the accuracy of translation results for sign language data from different regions and avoiding the problem of the same sign action corresponding to different speech in different regions.
Further, referring to Fig. 3, step S3 may include: S31. Translate the sign language data with each translation model in the translation model group to obtain semantic probabilities; in this step, translating the sign language data means translating the acquired sign language data of the same sign semantics with each model in the model group and obtaining the translation results separately. S32. Take the semantic data corresponding to the highest of all the semantic probabilities as the translation data; in this step, the separately obtained translation results for the same sign semantics are compared. For example, if the semantic probabilities obtained by the different translation models are 90%, 92%, and 95%, the semantic data with the 95% probability is selected as the translation data.
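A sketch of this highest-confidence selection, assuming each model in the group returns a (semantics, probability) pair for the same input; the model objects are stand-ins:

```python
def translate_with_group(models, sign_data):
    """Steps S31/S32: run every model in the group on the same sign data,
    then keep the semantics whose probability is highest."""
    results = [m.translate(sign_data) for m in models]  # [(semantics, prob), ...]
    best_semantics, _best_prob = max(results, key=lambda r: r[1])
    return best_semantics

# With probabilities 0.90, 0.92, and 0.95 as in the example above,
# the semantics attached to 0.95 is returned as the translation data.
```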
Further, step S31 may include (see Fig. 4): S311. Extract the EMG signal from the sign language data, denoise the EMG signal by averaging, and cut the denoised signal to obtain feature data. Specifically, in this embodiment, the start and end points of the EMG signal are determined, the EMG signal is averaged, and db12 wavelet-transform denoising is applied to the averaged signal; whether the signal is within a preset threshold range is identified, and if so the signal is an active segment (a signal above the onset threshold and below the offset threshold is treated as an active segment), and the feature data corresponding to the active segment is extracted. S312. Input the feature data into the translation model, which recognizes the feature data to obtain the semantic probability. In this step, after the translation model completes the task of feature extraction from the sign language data, the final output is the probability representing the current sign semantics, where the translation model can be a long short-term memory model, a gated recurrent unit model, or a sequence-to-sequence model. For example, the translation model may specifically be an LSTM model; since the LSTM model itself is trained on the sign language data in the training sample set to judge the features of the current sign language data and finally outputs the probability representing the current sign semantics, such models replace the step of obtaining the probability of a given feature of the current data through a traditional feature extraction function, which also alleviates the conditional random field model's dependence on manually provided feature extraction functions and their correlation with the translation results.
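The preprocessing of S311 could look roughly like the following sketch, using NumPy and PyWavelets for the db12 wavelet denoising; the averaging window, noise estimate, and the onset/offset thresholds are illustrative assumptions, not parameters specified by the patent:

```python
import numpy as np
import pywt  # PyWavelets

def preprocess_emg(emg, onset=0.1, offset=0.8, win=16):
    """Step S311 sketch: average, db12 wavelet denoise, cut the active segment."""
    # 1. Smooth the raw EMG by windowed averaging.
    smoothed = np.convolve(emg, np.ones(win) / win, mode="same")

    # 2. db12 wavelet denoising: soft-threshold the detail coefficients.
    coeffs = pywt.wavedec(smoothed, "db12", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # noise scale (assumption)
    thr = sigma * np.sqrt(2 * np.log(len(smoothed)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    denoised = pywt.waverec(coeffs, "db12")[: len(smoothed)]

    # 3. Keep samples above the onset threshold and below the offset threshold
    #    as the active segment, per the rule quoted above.
    amp = np.abs(denoised)
    return denoised[(amp > onset) & (amp < offset)]      # feature data

segment = preprocess_emg(np.random.randn(1024) * 0.3)
```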
S4. Convert the translation data into audio data.
In this step, TTS (text-to-speech) technology can be used to convert the translation data into audio data.
Further, step S4 may include: mapping the translation data to a preset sign language speech library to obtain audio data matching the translation data, where the preset sign language speech library includes translation data and audio data associated with the translation data.
In this embodiment, the audio data is stored through the preset sign language speech library.
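A minimal sketch of this mapping variant, with a hypothetical in-memory speech library keyed by translation text; a real library would map to stored audio buffers or files:

```python
from typing import Optional

# Hypothetical preset sign language speech library: translation text -> audio file.
SPEECH_LIBRARY = {
    "Where is the convenience store": "audio/convenience_store.wav",
    "Hello": "audio/hello.wav",
}

def to_audio(translation: str) -> Optional[str]:
    """Step S4 (mapping variant): look up audio matching the translation data."""
    return SPEECH_LIBRARY.get(translation)  # None when the library has no entry

print(to_audio("Hello"))  # audio/hello.wav
```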
It should be emphasized that, to further guarantee the privacy and security of the audio data, the audio data can also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
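As a toy illustration of the hash linkage described here (not the patent's storage scheme, and far from a production blockchain), each block can carry the hash of the audio payload plus the previous block's hash:

```python
import hashlib
import json
import time

def make_block(prev_hash, audio_bytes):
    """Toy block: links the audio payload's hash to the previous block."""
    block = {
        "timestamp": time.time(),
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "prev_hash": prev_hash,
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

genesis = make_block("0" * 64, b"")
blk = make_block(genesis["hash"], b"<audio data bytes>")
# Tampering with the stored audio changes audio_sha256 and breaks the chain.
```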
In this embodiment, the acquired sign language speech library may be fed back to users in different regions, or stored in the cloud and downloaded by users who need it. The audio data may be data in the sign language speech library, or audio data obtained from the translation data.
The audio can be presented through the pronunciation module built into the sign language translation device, played as speech. For example, if the text meaning represented by the audio data is "Where is the convenience store", the pronunciation module reads the audio data aloud for normal communication between deaf-mute people and ordinary people.
Further, step S4 may also include: recognizing the semantic information of the translation data, and converting the semantic information into the audio data with a voice converter.
In this embodiment, natural language processing (NLP) can be used to process the semantic information of the translation data. The semantic information of the translation data can be text, and the text can be a sentence or a word. Through syntactic and semantic analysis, the semantic information of the translation data is parsed syntactically and polysemous words are disambiguated, obtaining complete semantic information with high accuracy. For example, if the semantic information of the translation data is "Hello I would like to ask where is the convenience store thank you", processing it with NLP yields the semantic information "Hello, I would like to ask, where is the convenience store, thank you", which is expressed more clearly. The voice converter uses TTS (text-to-speech) technology to convert the translation data into audio data, so that deaf-mute people and ordinary people can communicate without barriers.
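One way to realize the voice converter in software is an offline TTS engine; the sketch below uses the pyttsx3 library purely as a stand-in, since the patent specifies only "TTS technology", not a particular engine:

```python
import pyttsx3  # offline text-to-speech engine (one possible choice, an assumption)

def speak_translation(text):
    """Read the translated text aloud through the device's pronunciation module."""
    engine = pyttsx3.init()
    engine.say(text)     # queue the utterance
    engine.runAndWait()  # block until playback finishes

speak_translation("Hello, I would like to ask, where is the convenience store, thank you.")
```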
In this embodiment, the sign language translation method can be used to translate the sign language of deaf-mute people in different regions: sign language data carrying region information sent by a user is acquired, and the translation model group associated with the region information is selected according to it, so that a corresponding translation model is applied for deaf-mute people in different regions, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is acquired, realizing barrier-free communication between deaf-mute people and ordinary people.
Embodiment 2.
Referring to Fig. 5, the translation apparatus 1 of this embodiment includes an acquisition unit 11, a model selection unit 12, a translation unit 13, and a conversion unit 14.
The acquisition unit 11 is used to acquire sign language data carrying region information sent by a user.
The region information is the location information of the user (a hearing-impaired person), and the location information may include positioning information and home-location information. The positioning information may be obtained through a positioning module in the mobile terminal used by the user; the region information may be the user's home-location information, and the sign language region used by the user (e.g., different countries or regions) is distinguished from this location information. The sign language data can be captured through sensors such as wristbands and armbands that pick up the bioelectric signals formed by the weak currents muscles generate at rest or during contraction, and the signals are sent to a mobile terminal such as a mobile phone to generate the sign language data.
The model selection unit 12 is used to select, according to the region information, the translation model group associated with a preset region range.
In this embodiment, the translation model group may be obtained by training an initial classification model group, or it may be a translation model group trained in advance. Before selecting the translation model group associated with the region information, the method further includes: obtaining a training sample set associated with the region information and a test sample set associated with the region information; training each initial classification model in the initial classification model group with the training sample set; and testing each trained initial classification model with the test sample set, using the trained initial classification model as a translation model if the test result meets the preset requirement. Each translation model group associated with the region information includes a plurality of the translation models.
The translation unit 13 is used to translate the sign language data with the translation models in the translation model group to obtain translation data.
At least two of the following models are selected as translation models: a long short-term memory model, a gated recurrent unit model, and a sequence-to-sequence model. For sign language data from different regions, the trained translation models translate the input sign language data from the various regions to obtain accurate translation results and generate corresponding region-specific sign language translation libraries, improving the accuracy of translation results for sign language data from different regions.
The conversion unit 14 is used to convert the translation data into audio data.
TTS (Text To Speech) technology can be used to convert the translation data into audio data; the audio can be presented through the pronunciation module built into the sign language translation device, which reads the audio data aloud for normal communication between deaf-mute people and ordinary people.
In this embodiment, the apparatus can be used to translate the sign language of deaf-mute people in different regions: sign language data carrying region information sent by a user is acquired, and the translation model group associated with the region information is selected according to it, so that a corresponding translation model is applied for deaf-mute people in different regions, improving the accuracy of the translation results; the translation model group translates the sign language data and the translation data is acquired, realizing barrier-free communication between deaf-mute people and ordinary people.
Embodiment 3.
To achieve the above purpose, this application also provides a computer device 2. The computer device 2 may comprise multiple computer devices 2, and the components of the sign language translation apparatus 1 of Embodiment 2 may be distributed across different computer devices 2. The computer device 2 can be a smartphone, tablet, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (an independent server, or a server cluster composed of multiple servers) that executes the program. The computer device 2 of this embodiment at least includes, but is not limited to, a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, some or all of the steps of the above method are implemented. Optionally, the computer device may also include a network interface and/or a sign language translation apparatus, for example a memory 21, a processor 23, a network interface 22, and a sign language translation apparatus 1 communicatively connected to one another through a system bus (see Fig. 6). It should be noted that Fig. 6 only shows the computer device 2 with its components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as the hard disk or memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the sign language translation method of Embodiment 1. In addition, the memory 21 can also be used to temporarily store various kinds of data that have been output or are to be output.
In some embodiments, the processor 23 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is generally used to control the overall operation of the computer device 2, for example performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is used to run the program code or process the data stored in the memory 21, for example to run the sign language translation apparatus 1.
The network interface 22 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network and to establish data transmission channels and communication connections between the computer device 2 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile communication (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be noted that Fig. 6 only shows the computer device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
In this embodiment, the sign language translation apparatus 1 stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete this application.
Embodiment 4.
To achieve the above purpose, this application also provides a computer-readable storage medium, which includes multiple storage media such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, and so on, on which a computer program is stored; when the program is executed by the processor 23, the corresponding functions are implemented. The computer-readable storage medium of this embodiment is used to store the sign language translation apparatus 1, and when executed by the processor 23 it implements the sign language translation method of Embodiment 1.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The above serial numbers of the embodiments of this application are for description only and do not represent the relative merits of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A sign language translation method, comprising:
    acquiring sign language data carrying region information sent by a user;
    selecting, according to the region information, a translation model group associated with a preset region range;
    wherein each of the translation model groups is associated with one preset region range, and the translation model group comprises at least two translation models;
    translating the sign language data with the translation models in the translation model group to obtain translation data; and
    converting the translation data into audio data.
  2. The sign language translation method according to claim 1, wherein selecting, according to the region information, a translation model group associated with a preset region range comprises:
    matching the region information against multiple preset region ranges to obtain the preset region range matching the region information;
    obtaining the translation model group associated with the preset region range.
  3. The sign language translation method according to claim 1, wherein, before selecting according to the region information the translation model group associated with the region information, the method further comprises:
    obtaining a training sample set associated with the region information and a test sample set associated with the region information;
    training each initial classification model in an initial classification model group with the training sample set;
    testing each trained initial classification model with the test sample set, and, if the test result meets a preset requirement, using the trained initial classification model as a translation model;
    wherein each translation model group associated with the region information comprises a plurality of the translation models.
  4. The sign language translation method according to claim 1, wherein translating the sign language data with the translation models in the translation model group to obtain translation data comprises:
    translating the sign language data with each of the translation models in the translation model group to obtain semantic probabilities;
    taking the semantic data corresponding to the highest of all the semantic probabilities as the translation data.
  5. The sign language translation method according to claim 4, wherein translating the sign language data with each translation model in the translation model group to obtain semantic probabilities comprises:
    the sign language data comprising an EMG signal;
    extracting the EMG signal from the sign language data, denoising the EMG signal by averaging, and cutting the denoised signal to obtain feature data;
    inputting the feature data into the translation model, and recognizing the feature data through the translation model to obtain the semantic probability.
  6. The sign language translation method according to claim 1, wherein converting the translation data into audio data comprises:
    mapping the translation data to a preset sign language speech library to obtain audio data matching the translation data;
    wherein the preset sign language speech library comprises translation data and audio data associated with the translation data.
  7. The sign language translation method according to claim 1, wherein converting the translation data into audio data comprises:
    recognizing the semantic information of the translation data, and converting the semantic information into the audio data with a voice converter.
  8. A sign language translation apparatus, comprising:
    an acquisition unit for acquiring sign language data carrying region information sent by a user;
    a model selection unit for selecting, according to the region information, a translation model group associated with a preset region range;
    a translation unit for translating the sign language data with the translation models in the translation model group to obtain translation data;
    a conversion unit for converting the translation data into audio data.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the following method when executing the computer program:
    acquiring sign language data carrying region information sent by a user;
    selecting, according to the region information, a translation model group associated with a preset region range;
    wherein each of the translation model groups is associated with one preset region range, and the translation model group comprises at least two translation models;
    translating the sign language data with the translation models in the translation model group to obtain translation data;
    converting the translation data into audio data.
  10. The computer device according to claim 9, wherein, when selecting according to the region information a translation model group associated with a preset region range, the following is specifically implemented:
    matching the region information against multiple preset region ranges to obtain the preset region range matching the region information;
    obtaining the translation model group associated with the preset region range.
  11. The computer device according to claim 9, wherein, before selecting according to the region information the translation model group associated with the region information, the processor further implements the following when executing the computer program:
    obtaining a training sample set associated with the region information and a test sample set associated with the region information;
    training each initial classification model in an initial classification model group with the training sample set;
    testing each trained initial classification model with the test sample set, and, if the test result meets a preset requirement, using the trained initial classification model as a translation model;
    wherein each translation model group associated with the region information comprises a plurality of the translation models.
  12. The computer device according to claim 9, wherein, when translating the sign language data with the translation models in the translation model group to obtain translation data, the following is specifically implemented:
    translating the sign language data with each of the translation models in the translation model group to obtain semantic probabilities;
    taking the semantic data corresponding to the highest of all the semantic probabilities as the translation data.
  13. The computer device according to claim 12, wherein, when translating the sign language data with each translation model in the translation model group to obtain semantic probabilities, the following is specifically implemented:
    the sign language data comprising an EMG signal;
    extracting the EMG signal from the sign language data, denoising the EMG signal by averaging, and cutting the denoised signal to obtain feature data;
    inputting the feature data into the translation model, and recognizing the feature data through the translation model to obtain the semantic probability.
  14. The computer device according to claim 9, wherein, when converting the translation data into audio data, the following is specifically implemented:
    mapping the translation data to a preset sign language speech library to obtain audio data matching the translation data, wherein the preset sign language speech library comprises translation data and audio data associated with the translation data; or,
    recognizing the semantic information of the translation data, and converting the semantic information into the audio data with a voice converter.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following method:
    acquiring sign language data carrying region information sent by a user;
    selecting, according to the region information, a translation model group associated with a preset region range;
    wherein each of the translation model groups is associated with one preset region range, and the translation model group comprises at least two translation models;
    translating the sign language data with the translation models in the translation model group to obtain translation data;
    converting the translation data into audio data.
  16. The computer-readable storage medium according to claim 15, wherein, when selecting according to the region information a translation model group associated with a preset region range, the following is specifically implemented:
    matching the region information against multiple preset region ranges to obtain the preset region range matching the region information;
    obtaining the translation model group associated with the preset region range.
  17. The computer-readable storage medium according to claim 15, wherein, before selecting according to the region information the translation model group associated with the region information, the computer program, when executed by the processor, further implements:
    obtaining a training sample set associated with the region information and a test sample set associated with the region information;
    training each initial classification model in an initial classification model group with the training sample set;
    testing each trained initial classification model with the test sample set, and, if the test result meets a preset requirement, using the trained initial classification model as a translation model;
    wherein each translation model group associated with the region information comprises a plurality of the translation models.
  18. The computer-readable storage medium according to claim 15, wherein, when translating the sign language data with the translation models in the translation model group to obtain translation data, the following is specifically implemented:
    translating the sign language data with each of the translation models in the translation model group to obtain semantic probabilities;
    taking the semantic data corresponding to the highest of all the semantic probabilities as the translation data.
  19. The computer-readable storage medium according to claim 18, wherein, when translating the sign language data with each translation model in the translation model group to obtain semantic probabilities, the following is specifically implemented:
    the sign language data comprising an EMG signal;
    extracting the EMG signal from the sign language data, denoising the EMG signal by averaging, and cutting the denoised signal to obtain feature data;
    inputting the feature data into the translation model, and recognizing the feature data through the translation model to obtain the semantic probability.
  20. The computer-readable storage medium according to claim 15, wherein, when converting the translation data into audio data, the following is specifically implemented:
    mapping the translation data to a preset sign language speech library to obtain audio data matching the translation data, wherein the preset sign language speech library comprises translation data and audio data associated with the translation data; or,
    recognizing the semantic information of the translation data, and converting the semantic information into the audio data with a voice converter.
PCT/CN2020/134561 2020-10-20 2020-12-08 Sign language translation method and apparatus, computer device, and storage medium WO2021179703A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011122840.7 2020-10-20
CN202011122840.7A CN112256827A (zh) 2020-10-20 2020-10-20 Sign language translation method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021179703A1 (zh) 2021-09-16

Family

ID=74244342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134561 WO2021179703A1 (zh) 2020-10-20 2020-12-08 Sign language translation method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112256827A (zh)
WO (1) WO2021179703A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780013A (zh) * 2021-07-30 2021-12-10 阿里巴巴(中国)有限公司 Translation method, device, and readable medium


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868282A (zh) * 2016-03-23 2016-08-17 乐视致新电子科技(天津)有限公司 Method, apparatus, and smart terminal for information exchange by deaf-mute people
CN106383579A (zh) * 2016-09-14 2017-02-08 西安电子科技大学 Fine gesture recognition system and method based on EMG and FSR
CN109271901A (zh) * 2018-08-31 2019-01-25 武汉大学 Sign language recognition method based on multi-source information fusion
CN109214347A (zh) * 2018-09-19 2019-01-15 北京因时机器人科技有限公司 Cross-lingual sign language translation method, apparatus, and mobile device
CN109960814B (zh) * 2019-03-25 2023-09-29 北京金山数字娱乐科技有限公司 Model parameter search method and apparatus
CN110413106B (zh) * 2019-06-18 2024-02-09 中国人民解放军军事科学院国防科技创新研究院 Augmented reality input method and system based on voice and gestures
CN110992783A (zh) * 2019-10-29 2020-04-10 东莞市易联交互信息科技有限责任公司 Sign language translation method and translation device based on machine learning
CN111354246A (zh) * 2020-01-16 2020-06-30 浙江工业大学 System and method for helping deaf-mute people communicate

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140046661A1 (en) * 2007-05-31 2014-02-13 iCommunicator LLC Apparatuses, methods and systems to provide translations of information into sign language or other formats
CN106295603A (zh) * 2016-08-18 2017-01-04 广东技术师范学院 Chinese sign language bidirectional translation system, method, and apparatus
CN110008839A (zh) * 2019-03-08 2019-07-12 西安研硕信息技术有限公司 Intelligent sign language interaction system and method with adaptive gesture recognition
CN110210721A (zh) * 2019-05-14 2019-09-06 长沙手之声信息科技有限公司 Remote sign language online translation customer-service allocation method and apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114157920A (zh) * 2021-12-10 2022-03-08 深圳Tcl新技术有限公司 Playing method and apparatus for displaying sign language, smart television, and storage medium
CN114157920B (zh) * 2021-12-10 2023-07-25 深圳Tcl新技术有限公司 Playing method and apparatus for displaying sign language, smart television, and storage medium
WO2024083138A1 (zh) * 2022-10-19 2024-04-25 维沃移动通信有限公司 Sign language recognition method and apparatus, electronic device, and readable storage medium

Also Published As

Publication number Publication date
CN112256827A (zh) 2021-01-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923931

Country of ref document: EP

Kind code of ref document: A1