CN116052709A - Sign language generation method and device, electronic equipment and storage medium


Info

Publication number
CN116052709A
Authority
CN
China
Prior art keywords
sign language
information
converted
semantic content
action
Prior art date
Legal status
Pending
Application number
CN202310036732.5A
Other languages
Chinese (zh)
Inventor
王玮 (Wang Wei)
袁明亮 (Yuan Mingliang)
苏文畅 (Su Wenchang)
刘学学 (Liu Xuexue)
李全 (Li Quan)
Current Assignee
Anhui Tingjian Technology Co., Ltd.
Original Assignee
Anhui Tingjian Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Anhui Tingjian Technology Co., Ltd.
Priority to CN202310036732.5A
Publication of CN116052709A


Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B 21/009 Teaching or communicating with deaf persons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 Transforming into visible information

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a sign language generation method and device, an electronic device, and a storage medium. Information to be converted is obtained, where the information to be converted includes text information and/or voice information. The semantic content of the information to be converted is determined and converted into sign language semantic content whose manner of expression conforms to that of sign language. A sign language action image is then generated based on the sign language semantic content. With this arrangement, sign language action images can be generated automatically from the information to be converted, so that hearing-impaired people can obtain effective external information.

Description

Sign language generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a sign language generation method and device, an electronic device, and a storage medium.
Background
Hearing-impaired people cannot acquire sound information because of impairments of the auditory system, and some also cannot read text effectively and therefore cannot acquire textual information. Communication through sign language is thus an important way for hearing-impaired people to obtain external information. However, in daily life few people have mastered sign language, which makes it difficult for hearing-impaired people to obtain effective external information.
Disclosure of Invention
Based on the above needs, the present application provides a sign language generation method and device, an electronic device, and a storage medium, to solve the problem in the prior art that hearing-impaired people have difficulty obtaining effective external information.
The technical solutions provided by the present application are as follows:
In one aspect, the present application provides a sign language generation method, including:
obtaining information to be converted, where the information to be converted includes text information and/or voice information;
determining the semantic content of the information to be converted, and converting the semantic content into sign language semantic content whose manner of expression conforms to that of sign language;
and generating a sign language action image based on the sign language semantic content.
Further, in the above method, converting the semantic content into sign language semantic content whose manner of expression conforms to that of sign language includes:
determining the target domain to which the information to be converted belongs;
if it is detected that specialized vocabulary of the target domain exists in the semantic content, determining interpretation information of the specialized vocabulary in the target domain;
and determining the sign language semantic content in combination with the interpretation information of the specialized vocabulary in the target domain.
Further, in the above method, before the semantic content of the information to be converted is determined, the method further includes:
detecting the language of the information to be converted;
and if the language of the information to be converted does not belong to a set language type, translating the information to be converted into the set language type.
Further, in the above method, if the information to be converted includes voice information, before determining the semantic content of the information to be converted, the method further includes:
and converting the voice information in the information to be converted into text information.
Further, in the above method, generating a sign language action image based on the sign language semantic content includes:
converting the sign language semantic content into sign language word-order text conforming to sign language grammar rules;
and generating a sign language action image according to the sign language word-order text.
Further, in the above method, converting the sign language semantic content into sign language word-order text conforming to the sign language grammar rules includes:
performing word replacement processing and/or word order adjustment processing on the sign language semantic content according to the sign language grammar rules to obtain the sign language word-order text conforming to the sign language grammar rules.
Further, in the above method, generating a sign language action image according to the sign language word-order text includes:
determining action codes corresponding to the sign language word-order text from a preset sign language action code library;
and generating the sign language action image based on the action codes.
Further, in the above method, generating the sign language action image based on the action codes includes:
driving, based on the action codes, an avatar of a target object to perform the corresponding sign language actions to obtain a sign language action image of the avatar of the target object, where the target object is the object that outputs the information to be converted.
Further, in the above method, the information to be converted is extracted from a set video, and after the sign language action image is generated based on the sign language semantic content, the method further includes:
synthesizing the sign language action image into the set video.
In another aspect, the present application provides a sign language generating device, including:
an acquisition module, configured to acquire information to be converted, where the information to be converted includes text information and/or voice information;
a determining module, configured to determine the semantic content of the information to be converted and convert the semantic content into sign language semantic content whose manner of expression conforms to that of sign language;
and a generating module, configured to generate a sign language action image based on the sign language semantic content.
In another aspect, the present application provides an electronic device, including:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to implement the sign language generating method described in any one of the above by running the program in the memory.
In another aspect, the present application provides a storage medium storing a computer program which, when executed by a processor, implements the sign language generation method described in any one of the above.
The sign language generation method provided by the application can obtain information to be converted, where the information to be converted includes text information and/or voice information; determine the semantic content of the information to be converted and convert it into sign language semantic content whose manner of expression conforms to that of sign language; and generate a sign language action image based on the sign language semantic content. With this arrangement, sign language action images can be generated automatically from the information to be converted, so that hearing-impaired people can obtain effective external information.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is evident that the drawings described below show only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a sign language generation method according to an embodiment of the present application;
Fig. 2 is a flowchart of converting semantic content into sign language semantic content according to an embodiment of the present application;
Fig. 3 is a flowchart of generating a sign language action image according to an embodiment of the present application;
Fig. 4 is a flowchart of another way of generating a sign language action image according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a face-to-face communication scenario according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a scenario with a set video containing a sign language action image according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a sign language generating device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Summary of the application
The embodiments of the present application are applicable to various scenarios requiring sign language translation, for example, scenarios in which hearing-impaired people shop, seek medical care, or obtain legal services, and scenarios in which they acquire information by reading text, watching videos, or watching live streams. With the technical solution of the embodiments of the present application, sign language action images can be generated automatically from voice information or text information, so that hearing-impaired people can obtain effective external information in each of these scenarios.
Sign language expresses meaning through specific gestures: changing gestures simulate images or syllables to form particular meanings or words. It enables hearing people to communicate with hearing-impaired people, and hearing-impaired people to communicate with each other, more conveniently and efficiently.
At present, very few people have mastered sign language well enough to communicate with it, so it is difficult for hearing-impaired people to obtain effective information and to communicate. Moreover, with the rapid development of Internet technology, people encounter all kinds of audio and video information, such as television programs, online videos, and live streams, but hearing-impaired people cannot acquire the audio information it carries. In addition, because sign language word order differs from normal word order, some hearing-impaired people cannot read ordinary books. These problems all make it difficult for hearing-impaired people to obtain effective external information.
Based on the above, the application provides a sign language generating method, device, electronic equipment and storage medium.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Exemplary method
The sign language generation method provided by the embodiments of the present application may be executed by an electronic device, which may be any device with data and instruction processing capabilities, such as a computer, a smart terminal, or a server. Referring to fig. 1, the method includes:
s101, obtaining information to be converted.
The information to be converted is the information that is to be converted into a sign language action image; it may include text information and/or voice information. That is, in the embodiments of the present application, external text information and/or voice information may be obtained in real time as the information to be converted, and the information to be converted is then converted to generate a sign language action image. If the information to be converted includes voice information, speech recognition may be performed on the voice information to convert it into text information before the information to be converted is converted into a sign language action image.
In different application scenarios, the acquisition mode of the information to be converted may be different.
For example, in a scenario where a hearing-impaired person communicates face to face with others, the voice of the person communicating with them may be acquired as the information to be converted through a device with a recording function, such as a voice recorder, a recording pen, or a smartphone.
In a scenario where a hearing-impaired person is reading, the text being read may be acquired as the information to be converted through a device with photographing and/or scanning functions, such as a scanning pen, a camera, or a smartphone.
When audio/video information is being watched, the sound of the audio/video information may be acquired as the information to be converted through a device with a recording function; the subtitles of the audio/video information may be acquired as the information to be converted through a device with photographing and/or scanning functions; or the device playing the audio/video information may extract its sound or subtitle information as the information to be converted.
In addition, when audio/video information is produced, its sound or subtitle information can be extracted and used as the information to be converted to generate a sign language action image, and the sign language action image can then be synthesized into the audio/video information, so that hearing-impaired people can watch the sign language action image while watching the audio/video.
S102, determining semantic content of the information to be converted, and converting the semantic content into sign language semantic content with a semantic expression mode conforming to a sign language expression mode.
The semantic content of the information to be converted represents the true meaning of each word in the information to be converted. In this embodiment, first, the semantic content of the information to be converted is determined.
Specifically, a semantic analysis model can be used to process the information to be converted and determine its semantic content. The semantic analysis model may be a natural language processing (NLP) model, which performs semantic understanding and analysis on the input information to be converted to determine its semantic content. The semantic analysis model may also be a pre-trained deep recognition network model, which analyzes the associated text content to obtain the true meaning of the information to be converted. It should be understood that these examples of semantic analysis models are given only for ease of understanding and do not constitute any limitation.
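As a concrete illustration of this kind of front-end analysis, the sketch below uses the open-source jieba segmenter to pull a rough semantic skeleton out of a Chinese sentence. jieba is only a stand-in chosen for this example; the patent does not name any particular model or library.

```python
# Illustrative only: jieba (pip install jieba) stands in for the semantic
# analysis model; the patent does not specify a concrete model.
import jieba.posseg as pseg

def extract_semantic_units(text: str) -> list[tuple[str, str]]:
    """Segment Chinese text and keep content words as a rough semantic skeleton."""
    keep = ("n", "v", "a", "m")  # jieba POS prefixes: nouns, verbs, adjectives, numerals
    return [(p.word, p.flag) for p in pseg.cut(text) if p.flag.startswith(keep)]

print(extract_semantic_units("我想回家"))  # e.g. [('想', 'v'), ('回家', 'v')]
```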
This embodiment further converts the semantic content into sign language semantic content whose manner of expression conforms to that of sign language.
In particular, some words in the semantic content may have no corresponding sign language expression, or may have different definitions in different domains and therefore correspond to multiple sign language expressions. To solve this problem, this embodiment converts such words so that the converted words have a semantically unambiguous sign language expression.
For example, in the embodiment of the present application, the target domain to which the information to be converted belongs may be determined first. If the semantic content contains a word with no corresponding sign language expression, or a word corresponding to multiple sign language expressions, the interpretation information of that word in the target domain may be determined and used to replace the word, so that every word in the semantic content has a semantically unambiguous sign language expression. This achieves the purpose of converting the semantic content into sign language semantic content whose manner of expression conforms to that of sign language.
As another example, sign language action code libraries may be preset for each domain, where the library of any domain contains the sign language actions of that domain's nouns and the action codes corresponding to those sign language actions. Specifically, if the semantic content contains a word with no corresponding sign language expression or a word corresponding to multiple sign language expressions, the target sign language action code library corresponding to the word's target domain may be determined, the action code corresponding to the word determined from that library, and the word replaced with its action code, so that all words and/or codes in the semantic content have a semantically unambiguous sign language expression.
S103, generating a sign language action image based on the sign language semantic content.
After the sign language semantic content is obtained, an avatar can be driven to perform the corresponding body actions and/or facial actions according to the sign language semantic content, obtaining a sign language action image. The sign language action image may be a picture or a video, which is not limited in this embodiment.
In this embodiment, actors may be asked in advance to perform different sign language actions, which are scanned with motion capture technology, recording the key action information of the hands and face to obtain the action content of the different sign language actions. Alternatively, professionals may draw three-dimensional animations of different sign language actions, from which the sign language actions are extracted and the key action information of the hands and face recorded. Sign language actions cover basic words, numbers, letters, and the like.
The action content is then stored in a sign language action code library, and different codes are assigned to different action contents; these codes can drive the avatar to perform the corresponding actions. In addition, user feedback can be collected later to continuously improve the sign language library.
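A minimal sketch of how such a code library might be laid out as a data structure follows; the field names and code values are invented for illustration and are not taken from the patent.

```python
# Hypothetical layout for the sign language action code library; all field
# names and code values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class SignAction:
    code: str                                      # action code that drives the avatar
    keyframes: list = field(default_factory=list)  # captured hand/face key poses

action_library: dict[str, SignAction] = {
    "书": SignAction(code="A0101"),    # "book"
    "回家": SignAction(code="A0102"),  # "go home"
}

def lookup(word: str) -> SignAction | None:
    """Return the action entry for a word, or None if the library lacks it."""
    return action_library.get(word)
```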
As described in the above embodiment, if the semantic content contains a word with no corresponding sign language expression or a word with multiple sign language expressions, that word is replaced with its action code from the target sign language action code library of its target domain. The remaining un-encoded words in the sign language semantic content then still need to be converted into action codes as described in this embodiment, so that all of the sign language semantic content is converted into action codes.
In this embodiment, the action codes corresponding to the words in the sign language semantic content can be determined from the sign language semantic content, and, driven by these action codes, the avatar performs the corresponding body actions and/or facial actions to obtain the sign language action image.
In the above embodiment, information to be converted can be obtained, where the information to be converted includes text information and/or voice information; its semantic content is determined and converted into sign language semantic content whose manner of expression conforms to that of sign language; and a sign language action image is generated based on the sign language semantic content. With this arrangement, sign language action images can be generated automatically from the information to be converted, so that hearing-impaired people can obtain effective external information. A runnable end-to-end sketch of these three steps is given below.
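The following sketch wires the three steps together; every helper body and code value here is a trivial stand-in for the components described in this document, not the patent's implementation.

```python
# End-to-end sketch of S101-S103; every helper is a trivial stand-in.
def analyze_semantics(text: str) -> list[str]:
    # S102 stand-in: a real system would use an NLP model here
    return ["我", "想", "回家"]          # "I", "want", "go home"

def to_sign_word_order(words: list[str]) -> list[str]:
    # S103 (part 1) stand-in: reorder per sign language grammar
    return ["回家", "我", "想"]

ACTION_CODES = {"回家": "A0102", "我": "A0001", "想": "A0002"}  # invented codes

def generate_sign_action_sequence(text: str) -> list[str]:
    # S103 (part 2): map sign-ordered words to avatar action codes
    return [ACTION_CODES[w] for w in to_sign_word_order(analyze_semantics(text))]

print(generate_sign_action_sequence("我想回家"))  # ['A0102', 'A0001', 'A0002']
```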
As an optional implementation, as shown in fig. 2, in another embodiment of the present application, the step in the above embodiment of converting semantic content into sign language semantic content whose manner of expression conforms to that of sign language may specifically include the following steps:
s201, determining the target field to which the information to be converted belongs.
The target area may be entered by an impaired person or other object. For example, if the sign language action image is a sign language action video, and the current scene is an audio/video producer, and the setting video including the sign language action video is produced, then other objects may be the above-mentioned audio/video producer. In this embodiment, the target area input by the hearing impaired person or other objects is determined as the target area to which the information to be converted belongs.
In addition, the target field to which the information to be converted belongs may be determined according to the semantic content of the information to be converted by analyzing the voice content, which is not limited in this embodiment. For example, if the information to be converted includes a large number of words in the computer domain, it may be determined that the domain to which the information to be converted belongs is the computer domain. If the information to be converted contains a large number of words in the mechanical domain, it can be determined that the domain to which the information to be converted belongs is the mechanical domain.
S202, if it is detected that specialized vocabulary of the target domain exists in the semantic content, determining interpretation information of the specialized vocabulary in the target domain.
Specialized vocabulary of the target domain means words specific to that domain, or words that have different definitions in other domains; such words generally have no corresponding sign language expression, or correspond to multiple sign language expressions across domains. To determine a semantically unambiguous sign language expression for the specialized vocabulary, its interpretation information in the target domain can be determined.
Specifically, the specialized vocabulary can be interpreted according to its meaning in the target domain to obtain its interpretation information in the target domain; or specialized-vocabulary interpretation databases can be set up for different domains, and the interpretation information corresponding to the specialized vocabulary determined from the preset database of the target domain. This embodiment is not limited in this respect.
S203, determining the sign language semantic content in combination with the interpretation information of the specialized vocabulary in the target domain.
The interpretation information of the specialized vocabulary in the target domain can be used to replace the specialized vocabulary, thereby obtaining sign language semantic content that conforms to the sign language manner of expression, as sketched below.
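A minimal sketch of S201-S203, assuming small per-domain glossaries; every domain, entry, and gloss below is an invented example rather than data from the patent.

```python
# Toy version of S201-S203; domains, entries, and glosses are invented examples.
DOMAIN_GLOSSARIES: dict[str, dict[str, str]] = {
    "medical": {"CT": "用X光分层扫描身体的检查"},   # "CT" -> plain-language gloss
    "computer": {"缓存": "临时存放数据的存储器"},   # "cache" -> plain-language gloss
}

def to_sign_semantics(words: list[str], domain: str) -> list[str]:
    """Replace specialized vocabulary with its in-domain interpretation."""
    glossary = DOMAIN_GLOSSARIES.get(domain, {})
    return [glossary.get(w, w) for w in words]

print(to_sign_semantics(["做", "CT"], "medical"))  # the CT term is expanded
```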
In the above embodiment, by determining the interpretation information of the specialized vocabulary in the target domain, a sign language expression with a definite meaning can be determined for the specialized vocabulary.
As an optional implementation, another embodiment of the present application discloses that, before the semantic content of the information to be converted is determined in the steps of the above embodiment, the method may specifically include the following steps:
detecting the language of the information to be converted; and, if the language of the information to be converted does not belong to the set language type, translating the information to be converted into the set language type.
Specifically, because different languages or language systems have different sign language expression rules, the same sign may express different meanings. For example, in one sign system, raising the little finger means "small" or "weak", while in another the same action may mean "wife". If the sign language expression rules of the language of the information to be converted differ from those of the language the hearing-impaired person has mastered, converting according to the former may give the hearing-impaired person wrong information.
Based on this, in the embodiment of the present application, before the semantic content of the information to be converted is determined, the language of the information to be converted may first be detected; if it does not belong to the set language type, the information to be converted is translated into the set language type.
The set language type corresponds to sign language expression rules that the hearing-impaired person can understand. For example, in a scenario where audio/video information is produced, the language of the region where the audio/video information is distributed may be taken as the set language type; in scenarios where the hearing-impaired person communicates face to face, reads, or watches audio/video information, the hearing-impaired person may choose the set language type according to their actual situation, which is not limited in this embodiment.
In the above embodiment, the language of the information to be converted can be normalized to the set language type, and semantic understanding, conversion into sign language semantic content, sign language action image generation, and other operations are then performed on the information in the set language type, so that the hearing-impaired person can understand the meaning of the sign language actions in the generated sign language action image.
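As one possible realization of this check, the sketch below uses the open-source langdetect package for detection; the translate() call is a placeholder for whatever machine translation service is available, not an API named by the patent.

```python
# Sketch of the language check; langdetect (pip install langdetect) is one
# possible detector, and translate() is a placeholder for any MT engine.
from langdetect import detect

TARGET_LANG = "zh-cn"  # the set language type understood by the viewer

def translate(text: str, target: str) -> str:
    # placeholder: call a machine translation service here
    return text

def normalize_language(text: str) -> str:
    """Translate the input into the set language type when needed."""
    return translate(text, target=TARGET_LANG) if detect(text) != TARGET_LANG else text
```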
As an optional implementation, another embodiment of the present application discloses that, if the information to be converted includes voice information, before the semantic content of the information to be converted is determined in the steps of the above embodiment, the method may specifically include the following steps:
and converting the voice information in the information to be converted into text information.
Specifically, if it is detected that the information to be converted includes voice information, for example if it consists entirely of voice information or includes both voice and text information, then in this embodiment the voice information needs to be converted into text information first, so that semantic understanding, conversion into sign language semantic content, sign language action image generation, and other operations can be performed on the basis of the text information.
The voice information may be converted into text information using mature prior-art speech recognition technology, which is not limited in this embodiment.
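The patent leaves the speech-to-text step to any mature technology; one off-the-shelf route, shown here purely as an assumed example, is the Python SpeechRecognition package.

```python
# Assumed example only: SpeechRecognition (pip install SpeechRecognition) as one
# mature off-the-shelf speech-to-text option; the patent names no specific tool.
import speech_recognition as sr

def speech_to_text(wav_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio, language="zh-CN")

# text = speech_to_text("utterance.wav")  # requires network access
```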
In the above embodiment, converting the voice information in the information to be converted into text information before determining its semantic content allows semantic understanding to be performed on the text, so that the semantic content of the information to be converted is determined more quickly.
For example, if the language of the information to be converted does not belong to the set language type, the voice information can first be converted into text information, the text then translated into the set language type, and semantic understanding performed on the translated text to obtain the semantic content, after which the subsequent steps are executed.
As an optional implementation, as shown in fig. 3, in another embodiment of the present application, the step in the above embodiment of generating a sign language action image based on the sign language semantic content may specifically include the following steps:
s301, converting the sign language semantic content into a sign language sequence text conforming to a sign language grammar rule.
Sign language grammar rules differ somewhat from normal word-order rules. Taking Chinese as an example, the normal order "I want to go home" becomes, in sign order, "go home, I want"; "I have new clothes" is similarly reordered (for example to "clothes new, I have"). Moreover, sign language often omits certain words, such as classifiers and function words: "a book" becomes simply "book", "a piece of paper" becomes "paper", and in "Wang and I watch a movie" the conjunction "and" is dropped.
Thus, in the embodiments of the present application, the sign language semantic content is converted into sign language word-order text conforming to the sign language grammar rules. The sign language grammar rules correspond to the language type set in the above embodiment.
For example, sign language conversion rules corresponding to sign language semantic content of different word types and sentence structures can be stored in advance; word types include verbs, nouns, classifiers, function words, and the like, sentence types include declarative sentences, questions, and the like, and sentence structures include subject, predicate, object, and the like. In this embodiment, after the sign language semantic content is obtained, it is parsed: sentence-structure elements such as the subject, predicate, and object are extracted as keywords, the sentence type is determined based on the keywords, and the word types of the keywords are determined. A sign language conversion rule corresponding to the sign language semantic content is then selected from the pre-stored rules according to the word types of the keywords and the sentence type, and used to convert the sign language semantic content into sign language word-order text conforming to the sign language grammar rules.
S302, generating a sign language action image according to the sign language word-order text.
According to the sign language word-order text, the action codes corresponding to its words can be determined, and, driven by these action codes, the avatar performs the corresponding body actions and/or facial actions to obtain a sign language action image conforming to the sign language grammar rules. A sign language action picture or a sign language action video may be generated from the sign language word-order text; this embodiment is not limited in this respect.
In the above embodiment, the sign language semantic content is adjusted using the sign language grammar rules to obtain sign language word-order text conforming to those rules, so that the generated sign language action image also conforms to the sign language grammar rules, which aids the understanding of hearing-impaired people.
As an optional implementation, as shown in fig. 4, in another embodiment of the present application, the step in the above embodiment of converting the sign language semantic content into sign language word-order text conforming to the sign language grammar rules may specifically include the following steps:
S401, performing word replacement processing and/or word order adjustment processing on the sign language semantic content according to the sign language grammar rules to obtain sign language word-order text conforming to the sign language grammar rules.
Specifically, in this embodiment, converting the sign language semantic content into sign language word-order text conforming to the sign language grammar rules includes: if the word order of the sign language semantic content does not conform to the sign language grammar rules, adjusting it according to those rules, for example adjusting "I want to go home" to "go home, I want"; and if a word in the sign language semantic content does not conform to the sign language grammar rules, replacing it with one that does, for example replacing "a piece of paper" with "paper". A toy implementation is sketched below.
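A toy version of S401 follows; the replacement table and the single hard-coded reorder rule are invented illustrations standing in for the pre-stored conversion rules described above.

```python
# Toy version of S401: word replacement plus word-order adjustment; the table
# and the reorder rule are invented stand-ins for pre-stored conversion rules.
REPLACEMENTS = {"一张纸": "纸", "一本书": "书"}  # drop classifiers: "a piece of paper" -> "paper"

def to_sign_word_order(words: list[str]) -> list[str]:
    words = [REPLACEMENTS.get(w, w) for w in words]  # word replacement
    # single illustrative reorder rule: front the final verb phrase,
    # e.g. ["我", "想", "回家"] -> ["回家", "我", "想"]
    return [words[-1], *words[:-1]] if len(words) > 1 else words

print(to_sign_word_order(["我", "想", "回家"]))  # ['回家', '我', '想']
```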
In the above embodiment, the sign language semantic content is adjusted using the sign language grammar rules to obtain sign language word-order text conforming to those rules, so that the generated sign language action image also conforms to the sign language grammar rules, which aids the understanding of hearing-impaired people.
As an optional implementation, as shown in fig. 4, in another embodiment of the present application, the step in the above embodiment of generating a sign language action image according to the sign language word-order text may specifically include the following steps:
S402, determining the action codes corresponding to the sign language word-order text from a preset sign language action code library.
In the embodiment of the present application, a sign language action code library is preset, containing different sign language actions and the action codes corresponding to them. Driving the avatar with an action code makes the avatar perform the sign language action corresponding to that code. In this embodiment, the action codes of the sign language actions corresponding to the words in the sign language word-order text are determined from the preset library and combined into an action code sequence.
S403, generating the sign language action image based on the action codes.
A set of sign language actions can be generated from the action code sequence. In this embodiment, the action code sequence is used to drive the avatar to perform the corresponding sign language actions, obtaining a sign language action image.
Transition action codes can be added between the action codes of different sign language actions, or the avatar can be arranged to insert a transition action between two different sign language actions, so that the sign language actions performed by the avatar are smoother, as in the sketch below.
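A minimal sketch of S402-S403 with transition splicing; the library entries and the transition code are invented for illustration.

```python
# Sketch of S402-S403; the library entries and the transition code are invented.
ACTION_CODE_LIB = {"回家": "A0102", "我": "A0001", "想": "A0002"}
TRANSITION = "T0000"  # assumed neutral pose inserted between two signs

def build_action_sequence(sign_words: list[str]) -> list[str]:
    seq: list[str] = []
    for word in sign_words:
        if seq:
            seq.append(TRANSITION)  # smooth the join between two signs
        seq.append(ACTION_CODE_LIB[word])
    return seq

print(build_action_sequence(["回家", "我", "想"]))
# ['A0102', 'T0000', 'A0001', 'T0000', 'A0002']
```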
In the above embodiments, the avatar can be driven by the action codes to perform the corresponding sign language actions, thereby generating the sign language action image.
As an optional implementation, in another embodiment of the present application, the step in the above embodiment of generating the sign language action image based on the action codes may specifically include the following steps:
driving, based on the action codes, the avatar of the target object to perform the corresponding sign language actions, obtaining a sign language action image of the avatar of the target object, where the target object is the object that outputs the information to be converted.
If multiple objects are currently outputting speech to be converted, different avatars may be set for them. For example, a 3D model of a virtual person may be built with the Unity3D engine. After the information to be converted is obtained, the target object outputting the current information to be converted can be determined; the avatar corresponding to the target object is then driven to perform the corresponding body actions and/or facial actions according to the sign language semantic content, and the resulting sign language action image is output through the engine. Different avatars may be distinguished by different appearances or different display positions in the sign language action image, which is not limited in this embodiment. A small sketch of this speaker-to-avatar binding follows.
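The sketch below binds each speaker to an avatar and routes the current utterance's action codes to the right one; the class, names, and positions are invented for illustration.

```python
# Sketch of the multi-speaker case; class, names, and positions are invented.
class Avatar:
    def __init__(self, name: str, position: str):
        self.name, self.position = name, position

    def perform(self, action_codes: list[str]) -> None:
        # a real system would drive a 3D model (e.g. in Unity3D) here
        print(f"{self.name}@{self.position} signs: {action_codes}")

AVATARS = {
    "object_A": Avatar("avatar_a", "position_a"),
    "object_B": Avatar("avatar_b", "position_b"),
}

def render_for_speaker(speaker_id: str, action_codes: list[str]) -> None:
    """Route the current utterance's action codes to that speaker's avatar."""
    AVATARS[speaker_id].perform(action_codes)

render_for_speaker("object_A", ["A0102", "A0001"])
```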
For example, as shown in fig. 5, in a scenario where a hearing-impaired person communicates face to face with others, if object A and object B are both communicating with the hearing-impaired person, different avatars can be set for object A and object B. If the avatars are distinguished by display position in the sign language action image, position a corresponds to the avatar of object A and position b to the avatar of object B. When object A speaks or writes text, object A's voice and/or text information can be taken as the information to be converted, the corresponding sign language semantic content obtained, and the avatar at position a driven by that content to perform the corresponding body and/or facial actions, obtaining a sign language action image. The image can be output through a device with a display screen used by the hearing-impaired person, such as a mobile phone, a tablet computer, or a smart watch.
As another example, as shown in fig. 6, if the sign language action image is a video and a set video containing it is being produced, and object C and object D appear in the set video, different avatars can be set for object C and object D. If the avatars are distinguished by appearance, avatar c may correspond to object C and avatar d to object D. When object C speaks or writes text, object C's voice and/or text information can be taken as the information to be converted, the corresponding sign language semantic content obtained, and avatar c driven by that content to perform the corresponding body and/or facial actions, obtaining a sign language action video that can be output in synchrony with the set video.
In the above embodiment, when multiple objects are currently outputting speech to be converted, different avatars are set for them, and the avatar corresponding to the target object outputting the current information to be converted is controlled to perform the corresponding body and/or facial actions to obtain the sign language action image, so that the hearing-impaired person can quickly tell which object is currently speaking.
As an optional implementation, another embodiment of the present application discloses that the information to be converted is extracted from a set video, and that after the sign language action image is generated based on the sign language semantic content in the steps of the above embodiment, the method may specifically include the following steps:
synthesizing the sign language action image into the set video.
Specifically, when a set video containing sign language action images is produced, the information to be converted is extracted from the set video; after the sign language action image is generated from it, the sign language action image is synthesized into the set video, so that it plays in synchrony with the set video and the hearing-impaired person can acquire the related information.
In this scenario the sign language action image is generally in video form, so that it can be played synchronously when the set video is played and the hearing-impaired person can acquire the related information synchronously.
The set video may be a live video, a short video, a movie, or a television drama, which is not limited in this embodiment.
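One way this compositing could be done, assumed here rather than specified by the patent, is to overlay the sign language video in a corner of the set video with ffmpeg:

```python
# Assumed approach: overlay the sign language video onto the set video with
# ffmpeg; the patent does not prescribe a compositing tool.
import subprocess

def composite(set_video: str, sign_video: str, out_path: str) -> None:
    subprocess.run([
        "ffmpeg", "-i", set_video, "-i", sign_video,
        # pin the sign video 10 px from the bottom-right corner
        "-filter_complex", "[0:v][1:v]overlay=W-w-10:H-h-10",
        "-c:a", "copy", out_path,
    ], check=True)

composite("program.mp4", "signer.mp4", "program_with_sign.mp4")
```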
Exemplary apparatus
Corresponding to the sign language generation method, the embodiment of the present application also discloses a sign language generating device; as shown in fig. 7, the device includes:
an acquisition module 100, configured to acquire information to be converted; the information to be converted comprises text information and/or voice information;
the determining module 110 is configured to determine the semantic content of the information to be converted, and convert the semantic content into sign language semantic content whose manner of expression conforms to that of sign language;
the generating module 120 is configured to generate a sign language action image based on the sign language semantic content.
As an optional implementation, another embodiment of the present application discloses that the determining module 110 of the above embodiment includes:
a first determining unit, configured to determine the target domain to which the information to be converted belongs;
a second determining unit, configured to determine interpretation information of the specialized vocabulary in the target domain if specialized vocabulary of the target domain exists in the semantic content;
and a third determining unit, configured to determine the sign language semantic content in combination with the interpretation information of the specialized vocabulary in the target domain.
As an optional implementation, another embodiment of the present application discloses that the device of the above embodiment further includes:
a detection module, configured to detect the language of the information to be converted before its semantic content is determined;
and a translation module, configured to translate the information to be converted into the set language type if its language does not belong to the set language type.
As an optional implementation, another embodiment of the present application discloses that the device of the above embodiment includes:
a conversion module, configured to, if the information to be converted includes voice information, convert the voice information into text information before the semantic content of the information to be converted is determined.
As an optional implementation, another embodiment of the present application discloses that the generating module 120 includes:
a conversion unit, configured to convert the sign language semantic content into sign language word-order text conforming to the sign language grammar rules;
and a generating unit, configured to generate the sign language action image according to the sign language word-order text.
As an optional implementation, another embodiment of the present application discloses that, when converting the sign language semantic content into sign language word-order text conforming to the sign language grammar rules, the conversion unit of the above embodiment is specifically configured to:
perform word replacement processing and/or word order adjustment processing on the sign language semantic content according to the sign language grammar rules to obtain the sign language word-order text conforming to the sign language grammar rules.
As an optional implementation, another embodiment of the present application discloses that, when generating the sign language action image according to the sign language word-order text, the generating unit of the above embodiment is specifically configured to:
determine the action codes corresponding to the sign language word-order text from a preset sign language action code library;
and generate the sign language action image based on the action codes.
As an optional implementation, another embodiment of the present application discloses that, when generating the sign language action image based on the action codes, the generating unit of the above embodiment is specifically configured to:
drive, based on the action codes, the avatar of the target object to perform the corresponding sign language actions, obtaining a sign language action image of the avatar of the target object, where the target object is the object that outputs the information to be converted.
As an optional implementation, another embodiment of the present application discloses that the information to be converted is extracted from a set video;
the device of the above embodiment further includes:
a synthesis module, configured to synthesize the sign language action image into the set video.
For the specific working content of each unit of the sign language generating device, refer to the embodiments of the sign language generation method above; it is not repeated here.
Exemplary electronic device, computer program product, and storage medium
Another embodiment of the present application further provides an electronic device, referring to fig. 8, including:
a memory 200 and a processor 210;
wherein the memory 200 is connected to the processor 210 for storing a program;
the processor 210 is configured to implement the sign language generating method disclosed in any one of the above embodiments by running the program stored in the memory 200.
Specifically, the electronic device may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are interconnected by a bus. Wherein:
a bus may comprise a path that communicates information between components of a computer system.
The processor 210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or a microprocessor, or may be an application-specific integrated circuit (ASIC) or one or more integrated circuits for controlling the execution of programs of the solutions of the present application. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Processor 210 may include a main processor, and may also include a baseband chip, modem, and the like.
The memory 200 stores programs for executing the technical solutions of the present application, and may also store an operating system and other critical services. In particular, the programs may include program code, which includes computer operation instructions. More specifically, the memory 200 may include read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (RAM), other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and the like.
The input device 230 may include means for receiving data and information entered by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 240 may include means, such as a display screen, printer, speakers, etc., that allow information to be output to a user.
The communication interface 220 may include a device using any transceiver or the like for communicating with other devices or communication networks, such as an Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The processor 210 executes the program stored in the memory 200 and invokes other devices, which can be used to implement the steps of the sign language generating method provided in the above embodiments of the present application.
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by the processor 210, cause the processor 210 to perform the steps of the sign language generation method provided by the embodiments described above.
The computer program product may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the instructions cause the processor to perform the steps of the sign language generation method provided by the above embodiments.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In particular, for the specific working content of each part of the electronic device, the computer program product, and the storage medium, and for the specific processing performed when the computer program product or the computer program on the storage medium is executed by the processor, reference may be made to the embodiments of the sign language generation method described above, which are not repeated here.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present application is not limited by the order of acts described, as some acts may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts among the embodiments, reference may be made to one another. Since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the description of the method embodiments for the relevant points.
The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs, and the technical features described in the embodiments may be replaced or combined.
In the embodiments of the present application, the modules and sub-modules in the terminal may be combined, divided, and pruned according to actual needs.
In the embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus, and method may be implemented in other manners. For example, the terminal embodiments described above are merely illustrative: the division into modules or sub-modules is merely a logical function division, and other divisions are possible in actual implementation; for example, multiple sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules or sub-modules described as separate components may or may not be physically separate, and the components displayed as modules or sub-modules may or may not be physical modules or sub-modules; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules or sub-modules may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional module or sub-module in each embodiment of the present application may be integrated in one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated in one module. The integrated modules or sub-modules may be implemented in hardware or in software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method for generating sign language, comprising:
obtaining information to be converted; the information to be converted comprises text information and/or voice information;
determining semantic content of the information to be converted, and converting the semantic content into sign language semantic content whose semantic expression mode conforms to a sign language expression mode;
and generating sign language action images based on the sign language semantic content.
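By way of illustration only (this sketch is not part of the claims), the pipeline of claim 1 could be arranged as below in Python. Every helper here is a toy stand-in: the "semantic analysis" is a plain tokenizer, the sign-semantics step merely drops a few invented function words, and rendering is replaced by placeholder frame labels.

```python
def determine_semantic_content(text: str) -> list[str]:
    # Toy "semantic analysis": just tokenize the input text.
    return text.split()

def to_sign_semantics(tokens: list[str]) -> list[str]:
    # Toy rewriting step: drop function words that sign language expression
    # typically omits (this word list is invented for illustration).
    function_words = {"is", "are", "the", "a", "an", "of"}
    return [t for t in tokens if t.lower() not in function_words]

def generate_action_images(glosses: list[str]) -> list[str]:
    # Stand-in for rendering: one placeholder "frame" per gloss.
    return [f"<frame:{g}>" for g in glosses]

def generate_sign_language(text: str) -> list[str]:
    semantics = determine_semantic_content(text)
    sign_semantics = to_sign_semantics(semantics)
    return generate_action_images(sign_semantics)

print(generate_sign_language("The meeting is at noon"))
# ['<frame:meeting>', '<frame:at>', '<frame:noon>']
```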
2. The method of claim 1, wherein converting the semantic content into sign language semantic content whose semantic expression conforms to a sign language expression comprises:
determining the target field to which the information to be converted belongs;
if it is detected that professional vocabulary information of the target field exists in the semantic content, determining interpretation information of the professional vocabulary information in the target field;
and determining the sign language semantic content by combining the interpretation information of the professional vocabulary information in the target field.
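A minimal sketch of the domain-vocabulary handling in claim 2, assuming a per-field glossary that maps professional terms to plain-language interpretations; the field names and glossary contents below are fabricated examples, not data from the patent.

```python
# Hypothetical per-field glossaries of professional vocabulary.
DOMAIN_GLOSSARIES = {
    "medical": {"hypertension": "high blood pressure"},
    "finance": {"liquidity": "how easily assets turn into cash"},
}

def expand_domain_terms(tokens: list[str], target_field: str) -> list[str]:
    glossary = DOMAIN_GLOSSARIES.get(target_field, {})
    expanded = []
    for token in tokens:
        interpretation = glossary.get(token.lower())
        # Substitute the interpretation so the sign semantics stay intelligible.
        expanded.extend(interpretation.split() if interpretation else [token])
    return expanded

print(expand_domain_terms(["patient", "has", "hypertension"], "medical"))
# ['patient', 'has', 'high', 'blood', 'pressure']
```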
3. The method of claim 1, further comprising, prior to determining the semantic content of the information to be converted:
detecting the language corresponding to the information to be converted;
and if the language corresponding to the information to be converted does not belong to the set language type, translating the information to be converted into the set language type.
4. The method of claim 1, further comprising, if the information to be converted includes voice information, prior to determining semantic content of the information to be converted:
and converting the voice information in the information to be converted into text information.
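Claims 3 and 4 describe preprocessing before semantic analysis: voice information is first converted to text, and text whose language does not match the set language type is translated. A toy sketch follows; the language detector, translator, and ASR call are crude placeholders (a real system would use dedicated language-ID, machine translation, and speech recognition components), and the set language "zh" is an assumption.

```python
from typing import Optional

def detect_language(text: str) -> str:
    # Placeholder language ID: any CJK character counts as "zh".
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "other"

def translate_to_set_language(text: str) -> str:
    # Stand-in for a machine translation call (hypothetical).
    return f"[zh translation of: {text}]"

def speech_to_text(audio: bytes) -> str:
    # Stand-in for an automatic speech recognition call (hypothetical).
    return "<transcript of the audio>"

def normalize_input(text: Optional[str], audio: Optional[bytes],
                    set_lang: str = "zh") -> str:
    if text is None:                          # claim 4: voice -> text
        text = speech_to_text(audio)
    if detect_language(text) != set_lang:     # claim 3: translate if needed
        text = translate_to_set_language(text)
    return text

print(normalize_input("Hello", None))  # -> "[zh translation of: Hello]"
```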
5. The method of claim 1, wherein generating sign language action images based on the sign language semantic content comprises:
converting the sign language semantic content into a sign language word-order text conforming to a sign language grammar rule;
and generating a sign language action image according to the sign language word-order text.
6. The method of claim 5, wherein converting the sign language semantic content into a sign language word-order text conforming to a sign language grammar rule comprises:
performing word replacement processing and/or word-order adjustment processing on the sign language semantic content according to the sign language grammar rule to obtain the sign language word-order text conforming to the sign language grammar rule.
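One hedged reading of claim 6 is a rule-driven rewrite: a replacement table normalizes words, then reordering rules produce the sign language word-order text. Both rule tables below are invented for illustration; real sign language grammar rules are far richer than this.

```python
# Fabricated word-replacement rules.
REPLACEMENTS = {"did not": "not", "cannot": "not able"}

def replace_words(text: str) -> str:
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text

def reorder(tokens: list[str]) -> list[str]:
    # Toy word-order rule: move negation to the end of the clause,
    # e.g. "I not go" -> "I go not".
    if "not" in tokens:
        tokens.remove("not")
        tokens.append("not")
    return tokens

def to_word_order_text(text: str) -> str:
    return " ".join(reorder(replace_words(text).split()))

print(to_word_order_text("I did not go"))  # -> "I go not"
```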
7. The method of claim 5, wherein generating a sign language action image from the sign language word-order text comprises:
determining an action code corresponding to the sign language word-order text from a preset sign language action code library;
and generating sign language action images based on the action codes.
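The lookup in claim 7 can be pictured as a dictionary keyed by the glosses of the word-order text. The action codes and the fingerspelling fallback below are fabricated, since the patent does not specify the format of the action code library.

```python
# Fabricated sign language action code library: gloss -> action code.
ACTION_CODE_LIBRARY = {
    "hello": "A0101",
    "thank": "A0230",
    "you": "A0011",
}
UNKNOWN_CODE = "FSPELL"  # assumed fallback, e.g. fingerspell the word

def to_action_codes(word_order_text: str) -> list[str]:
    # Map every gloss in the word-order text to its code, falling back
    # to the fingerspelling placeholder for out-of-library words.
    return [ACTION_CODE_LIBRARY.get(word.lower(), UNKNOWN_CODE)
            for word in word_order_text.split()]

print(to_action_codes("thank you"))  # ['A0230', 'A0011']
```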
8. The method of claim 7, wherein generating sign language action images based on the action codes comprises:
based on the action codes, driving the virtual image of the target object to execute the corresponding sign language actions, to obtain a sign language action image of the virtual image of the target object, wherein the target object is the object that outputs the information to be converted.
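Claim 8's avatar driving could, under one set of assumptions, map each action code to keyframe poses and play them in sequence. The joint names, angles, and keyframe format below are invented for illustration; a production system would drive a rigged 3D model of the speaker instead.

```python
# Fabricated keyframe table: action code -> list of (joint, angle) poses.
KEYFRAMES = {
    "A0230": [("right_elbow", 45.0), ("right_wrist", 10.0)],
    "A0011": [("right_elbow", 90.0), ("right_index", 0.0)],
}

def drive_avatar(action_codes: list[str]) -> list[list[tuple[str, float]]]:
    # Return one pose list per known action code, in playback order.
    frames = []
    for code in action_codes:
        pose = KEYFRAMES.get(code)
        if pose is not None:   # unknown codes are skipped in this toy version
            frames.append(pose)
    return frames

print(drive_avatar(["A0230", "A0011"]))
```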
9. The method of claim 1, wherein the information to be converted is extracted from a set video, and after generating the sign language action image based on the sign language semantic content, the method further comprises:
and synthesizing the sign language action image into the set video.
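For claim 9, one plausible synthesis is a picture-in-picture overlay of the generated sign video onto the set video. The OpenCV sketch below makes several arbitrary choices (file paths, bottom-right corner, quarter-size window) that the patent does not specify.

```python
import cv2

def composite_sign_video(main_path: str, sign_path: str, out_path: str) -> None:
    main, sign = cv2.VideoCapture(main_path), cv2.VideoCapture(sign_path)
    fps = main.get(cv2.CAP_PROP_FPS)
    w = int(main.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(main.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok_main, frame = main.read()
        ok_sign, overlay = sign.read()
        if not ok_main:
            break
        if ok_sign:
            # Paste the signer into the bottom-right quarter of the frame.
            small = cv2.resize(overlay, (w // 4, h // 4))
            frame[h - h // 4:h, w - w // 4:w] = small
        out.write(frame)
    main.release(); sign.release(); out.release()
```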
10. A sign language generating apparatus comprising:
the acquisition module is used for acquiring information to be converted; the information to be converted comprises text information and/or voice information;
the determining module is used for determining semantic content of the information to be converted and converting the semantic content into sign language semantic content whose semantic expression mode conforms to a sign language expression mode;
and the generation module is used for generating sign language action images based on the sign language semantic content.
11. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to implement the sign language generating method according to any one of claims 1 to 9 by running a program in the memory.
12. A storage medium, wherein the storage medium has stored thereon a computer program which, when executed by a processor, implements the sign language generation method according to any one of claims 1 to 9.
CN202310036732.5A 2023-01-10 2023-01-10 Sign language generation method and device, electronic equipment and storage medium Pending CN116052709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310036732.5A CN116052709A (en) 2023-01-10 2023-01-10 Sign language generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310036732.5A CN116052709A (en) 2023-01-10 2023-01-10 Sign language generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116052709A true CN116052709A (en) 2023-05-02

Family

ID=86127004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310036732.5A Pending CN116052709A (en) 2023-01-10 2023-01-10 Sign language generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116052709A (en)

Similar Documents

Publication Publication Date Title
US9282377B2 (en) Apparatuses, methods and systems to provide translations of information into sign language or other formats
CN110517689B (en) Voice data processing method, device and storage medium
CN112449253B (en) Interactive video generation
CN102244788A (en) Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs
CN113035199B (en) Audio processing method, device, equipment and readable storage medium
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
US9525841B2 (en) Imaging device for associating image data with shooting condition information
CN113392273A (en) Video playing method and device, computer equipment and storage medium
KR101104777B1 (en) System and Method for generating sign language animation
CN111797265A (en) Photographing naming method and system based on multi-mode technology
Otoom et al. Ambient intelligence framework for real-time speech-to-sign translation
Friedland et al. Multimedia computing
KR102148021B1 (en) Information search method and apparatus in incidental images incorporating deep learning scene text detection and recognition
CN110992960A (en) Control method, control device, electronic equipment and storage medium
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
Campos et al. Machine generation of audio description for blind and visually impaired people
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN117171369A (en) Content generation method, device, computer equipment and storage medium
RU2668721C1 (en) Method for displaying subtitles in the process of playing media content (options)
CN116052709A (en) Sign language generation method and device, electronic equipment and storage medium
KR102350359B1 (en) A method of video editing using speech recognition algorithm
KR102281298B1 (en) System and method for video synthesis based on artificial intelligence
Kumar et al. Development of a speech to Indian sign language translator
CN114741472A (en) Method and device for assisting in reading picture book, computer equipment and storage medium
CN113269855A (en) Method, equipment and storage medium for converting text semantics into scene animation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination