CN117577129A - Limb action generating system, method, equipment and medium - Google Patents

Limb action generating system, method, equipment and medium

Info

Publication number
CN117577129A
Authority
CN
China
Prior art keywords
user
limb
language
content
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311416012.8A
Other languages
Chinese (zh)
Inventor
祝丰年
罗婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Shanghai Robotics Co Ltd
Original Assignee
Cloudminds Shanghai Robotics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudminds Shanghai Robotics Co Ltd
Priority to CN202311416012.8A
Publication of CN117577129A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/487 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a limb action generating system, method, equipment and medium. The system comprises: a user analysis module for determining the region information of a first user and/or a second user, wherein the first user is the user who views the limb actions; a language analysis module for determining the key words contained in the language content to be played to the first user; a limb characteristic matching module for determining the corresponding limb action type based on the region information; and an action generating module for using the key words to match the limb characteristics contained in the limb action type and generating limb actions containing those limb characteristics for display to the first user. In this scheme, after the region information of the first user and/or the second user is determined, the limb action type is determined according to that region information, and when the language content is played to the first user, limb actions conforming to the region information required by the first user are displayed synchronously. Actions are therefore generated according to the required regional culture attributes, which improves the voice interaction experience.

Description

Limb action generating system, method, equipment and medium
Technical Field
The invention relates to the field of computer technology, and in particular to a limb action generating system, method, equipment and medium.
Background
Conventional schemes for generating actions from speech produce the corresponding limb actions for one specific country, or fall back on a small set of universally understood limb languages. In reality, however, different languages correspond to different regions, different regions have different cultural attributes, and each cultural attribute has its own characteristic limb language. For example, when greeting with "hello", a Japanese speaker bows while a Thai speaker presses the palms together; and the same language may be spoken in several countries and regions whose limb languages still differ, for example France and the Democratic Republic of the Congo, which share French. Therefore, there is a need for a scheme that can accurately generate limb actions matching the interaction context according to the actual needs of the user.
Disclosure of Invention
The invention aims to provide a limb action generating system, method, equipment and medium that realize a scheme for generating limb actions conforming to the cultural attributes required by a user.
In order to solve the above technical problems, in a first aspect, an embodiment of the present invention provides a limb action generating system. The system comprises: a user analysis module, a language analysis module, a limb characteristic matching module and an action generating module.
The user analysis module is used for determining the region information of the first user and/or the second user; wherein the first user is a user who views the limb actions;
the language analysis module is used for determining key words contained in language contents to be played to the first user;
the limb characteristic matching module is used for determining a corresponding limb action type based on the region information;
and the action generating module is used for utilizing the keyword to match the limb characteristics contained in the limb action type and generating limb actions which are used for being displayed to the first user and contain the limb characteristics.
Optionally, the user analysis module is further configured to: acquiring voice data provided by the first user or the second user;
the system further comprises: and the voice characteristic analysis module is used for determining the region information of the first user or the second user based on the analysis result of the voice data.
Optionally, the voice feature analysis module is further configured to score the text content according to language types, and to determine a language score for each of the contained language types;
and determining the target language type corresponding to the voice data according to the comparison result of the plurality of language scores.
Optionally, the user analysis module is further configured to: acquiring the region information provided by the first user or the second user; or,
and acquiring text information and an IP address provided by the first user or the second user, and determining the region information of the first user or the second user based on the text information and the IP address.
Optionally, the language analysis module is used for responding to the interaction request provided by the first user and generating language content for replying to the interaction request;
and determining key words which are contained in the language content and matched with the limb actions.
Optionally, the language analysis module is configured to obtain language content provided by the second user; the language content comprises: second user text content or second user speech data;
when the language content is second user text content, determining key words which are contained in the second user text content and are matched with the limb actions;
when the language content is second user voice data, converting the second user voice data into second user text content; and determining key words which are contained in the text content of the second user and are matched with the limb actions.
Optionally, the limb feature matching module is further configured to determine a corresponding limb action type based on the region information of the first user or the region information of the second user.
Optionally, the action generating module is further configured to:
and adding key actions for representing the key words in preset limb actions so as to obtain the limb actions corresponding to the language content.
Optionally, the system further comprises a playing module for:
playing the language content in a language type that is the same as the limb action type of the limb actions; or,
playing the language content in a language type that is different from the limb action type of the limb actions.
In a second aspect, an embodiment of the present application proposes a method for generating a limb motion, where the method includes:
determining regional information of a first user and/or a second user; wherein the first user is a user who views the limb actions;
determining key words contained in language contents to be played to the first user;
determining a corresponding limb action type based on the region information;
and matching the limb characteristics contained in the limb action type by using the keyword, and generating the limb actions which are used for being displayed to the first user and contain the limb characteristics.
In a third aspect, an embodiment of the present application proposes an electronic device, including: a memory and a processor; wherein,
the memory is used for storing programs;
the processor is coupled to the memory for executing the program stored in the memory for implementing the method of the second aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed, implements the steps of the method according to the second aspect.
In the embodiment of the application, the limb action generating system comprises: a user analysis module for determining the region information of a first user and/or a second user, wherein the first user is the user who views the limb actions; a language analysis module for determining the key words contained in the language content to be played to the first user; a limb characteristic matching module for determining the corresponding limb action type based on the region information; and an action generating module for using the key words to match the limb characteristics contained in the limb action type and generating limb actions containing those limb characteristics for display to the first user. With this generation scheme, after the region information of the first user and/or the second user is determined, the corresponding limb action type is determined according to that region information, and when the language content is played to the first user, limb actions conforming to the region information required by the first user can be displayed synchronously, so that actions are generated according to the required regional culture attributes and the voice interaction experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a limb motion generating system according to an embodiment of the present application;
fig. 2 is a flow chart of a limb motion generating method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a limb motion generating device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device corresponding to the limb motion generating device provided in the embodiment shown in fig. 3.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments of the present invention in order to provide a better understanding of the present application, and that the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising" and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" generally means at least two.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
Fig. 1 is a schematic structural diagram of a limb motion generating system according to an embodiment of the present application. As can be seen from fig. 1, the first user is a user who views the limb movements presented by the system, and the second user is a real user who interacts with the first user, or a virtual user generated by a computer (local computer or cloud computer). The system comprises:
a user analysis module 11, configured to determine regional information of the first user and/or the second user; wherein the first user is a user viewing limb movements.
The language analysis module 12 is configured to determine a keyword word contained in the language content to be played to the first user.
And the limb characteristic matching module 13 is used for determining the corresponding limb action type based on the region information.
The action generating module 14 is configured to match the limb characteristics included in the limb action type with the keyword, and generate a limb action including the limb characteristics for displaying to the first user.
The limb actions generated by the system are limb actions for presentation to the first user. Specifically, while the language content of the second user is being played, the limb actions related to that language content are presented to the first user through a display device (the display device may be part of the system or a separate device capable of communicating with the system). The language content is related to the limb actions in the sense that the characters, words, symbols and other content contained in the language content are also conveyed through the limb actions, so that the first user sees the displayed limb actions while listening to or watching the played language content. When the limb actions are displayed, they must remain synchronized with the playback of the language content; for example, the limb action of a "bow" or "nod" is presented while the language content "hello" is played.
The second user may be a real user or a computer-generated virtual user. The original language received by the system may therefore be speech data or text content provided by a real second user, or text content generated by a computer. The system then performs the corresponding conversion processing on this original input to generate language content that the first user can understand.
The language content can take the form of speech or of text subtitles; when played as speech it can be output through a loudspeaker, and when presented as text subtitles it can be shown on a display device.
The region information refers to a region with specific regional culture attributes; different region information corresponds to different language types, clothing styles, customs, etiquette, forms of address and limb action types. The region information can be determined at a fine granularity, for example down to specific cities, counties, towns and villages of the Guangxi Zhuang Autonomous Region, because different villages may belong to different ethnic minorities, with different regional culture attributes and therefore different limb action types. If there is no obvious difference in regional culture attributes, a finer division is unnecessary, and a city or county, following administrative divisions, can serve as the region information. The region information may also be a region in which a particular ethnic group is mainly concentrated.
When generating the limb action, the limb action can be obtained by combining a series of limb characteristics with conventional limb actions, since in practice conventional limb actions such as smiling, nodding and turning the body accompany most interactions and the matched limb characteristics are added on top of them.
Through the above scheme, when interacting with the first user, the limb actions are displayed while the language content is played to the first user. To make the limb actions better meet the needs of the first user, the specific limb action type is determined according to the region information of the first user or the second user, and the limb characteristics are then determined from that limb action type according to the key words contained in the language content to be played to the first user. Finally, limb actions are generated and associated with the language content. The resulting limb actions are more accurate and meet the needs of the first user, so that the displayed limb actions can be personalized for each user.
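For concreteness, the following minimal sketch (in Python) shows one way the four modules could be chained together; all function, class and field names here (user_analysis, limb_feature_matching, LimbAction and so on) are illustrative assumptions made for the sketch, not names taken from this application:

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class LimbAction:
        features: List[str]   # matched limb characteristics, e.g. ["bow", "guide_indoors"]
        region: str           # region information the action conforms to

    def user_analysis(profile: Dict[str, str]) -> str:
        """User analysis module: determine region information (here simply read from a profile)."""
        return profile.get("region", "unknown")

    def language_analysis(content: str, gesture_vocab: set) -> List[str]:
        """Language analysis module: pick out the key words that can be matched to limb actions."""
        return [w for w in content.split() if w in gesture_vocab]

    def limb_feature_matching(region: str, library: Dict[str, Dict[str, str]]) -> Dict[str, str]:
        """Limb characteristic matching module: map region information to a limb action type."""
        return library.get(region, library["default"])

    def action_generation(key_words: List[str], action_type: Dict[str, str], region: str) -> LimbAction:
        """Action generating module: match a limb characteristic for each key word."""
        return LimbAction(features=[action_type[w] for w in key_words if w in action_type], region=region)

In the embodiments that follow, these placeholders are refined: the region information may also come from voice feature analysis or from text information and an IP address, and the action type library corresponds to the pre-collected limb language feature library described later.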
In one or more embodiments of the present application, the user analysis module 11 is further configured to: acquiring voice data provided by the first user or the second user; the system further comprises: a voice characteristic analysis module 15, configured to determine the region information of the first user or the second user based on an analysis result of the voice data.
In practical applications, in order to accurately determine the region information of the first user or accurately determine the region information for the second user, the determination may be performed by voice data. Specifically, after the system receives the sentence (i.e. the voice data) uttered by the first user or the second user, the voice feature analysis module 15 is used to analyze the sentence, so as to determine the regional information to which the first user or the second user belongs. For example, the voice data may be analyzed using a machine learning model to determine regional information.
For example, identifying a language with a machine learning model mainly involves the following steps. Feature extraction: features are first extracted from the speech signal and a classification model is established. During feature extraction, speech features based on Mel-frequency cepstral coefficients (MFCC) and linear prediction (LP) algorithms can be used to preprocess the speech, which reduces the dimensionality of the data and improves the classification accuracy of the model.
Feature selection: another key factor affecting the accuracy of language identification is language feature selection. The learner needs to perform feature screening by combining factors such as physical characteristics of language, cultural background and the like on the basis of the traditional feature extraction method. For languages, the tone, phoneme, phonological and the like have obvious influence on the voice characteristics, so that it is very important to analyze and process the voice signals in specific details.
Through the above scheme, voice data is collected from the first user or the second user interacting with the system (for example, through a microphone), and the collected voice data is analyzed to determine the region information. The region information of the first user or the second user can thus be identified accurately without requiring the user to provide personal or other private information, and the limb action type familiar to the first user or the second user can then be determined based on that region information.
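As an illustration of this step, the sketch below extracts MFCC features from a speech clip and feeds them to a simple classifier. The use of the librosa and scikit-learn libraries, the 13-coefficient setting and the logistic-regression classifier are implementation assumptions made for the sketch rather than choices specified by this application, and the classifier would have to be trained beforehand on speech labelled with region information:

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def mfcc_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
        """Load a speech clip and return a fixed-length MFCC feature vector (mean and std over frames)."""
        y, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Training and prediction, assuming clip_paths and region_labels were collected in advance:
    # X = np.stack([mfcc_features(p) for p in clip_paths])
    # clf = LogisticRegression(max_iter=1000).fit(X, region_labels)
    # region = clf.predict(mfcc_features("utterance.wav").reshape(1, -1))[0]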
In one or more embodiments of the present application, the voice feature analysis module 15 is further configured to score the text content according to language types, determining a language score for each of the contained language types, and to determine the target language type corresponding to the voice data according to a comparison of the language scores.
In practical applications, after the voice data is converted into text content, the text content is split according to the language types it contains; the words and characters obtained after splitting are grouped by language type, and a score is calculated for each group. The higher the score of a group, the larger the share of the voice data occupied by that language type. This matters particularly for voice data containing several language types: to understand the language type of the first user, languages of various countries and dialects of various places must be covered; Chinese alone, for example, includes many regional dialects.
Since mixed-language scenarios exist, each word, phrase or character in the text content is counted as an independent element, and the scores of the elements in each group are added together to give an overall score for each language; the language scores can then be ranked to determine the dominant language. For example, suppose the voice data uttered by the first user or the second user is "please mark the bug existing in the main program as NG", spoken mostly in Chinese, and that the voice data is converted into text content. The text content is identified, the language types are recognized and the content is grouped by language type. The English group contains: main, bug, NG. The Chinese group contains the remaining words of the sentence, spoken in Chinese. With each single character or standalone English term scored 1 point and each multi-character Chinese word scored 2 points, the English group scores 3 points and the Chinese group scores 11 points. Since the Chinese score of 11 is greater than the English score of 3, the target language type of the voice data is determined to be Chinese.
With this scheme, voice data containing multiple language types can be analyzed accurately and the target language type determined. The scheme can also be used in a noisy environment containing several languages, where the actual language type of the first user can still be identified accurately. The language type, and the limb action type corresponding to it, can therefore be switched flexibly according to the actual needs of the interaction.
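A simplified sketch of this scoring is shown below. It treats every CJK character and every run of Latin letters as one scored element, whereas the worked example above also gives multi-character Chinese words a higher score and a production system would use a proper word segmenter; the regular expressions and the two-language score table are therefore assumptions made only for illustration:

    import re

    def language_scores(text: str) -> dict:
        """Group the elements of mixed-language text by language type and score each group."""
        return {
            # every CJK character counts as one scored element
            "chinese": len(re.findall(r"[\u4e00-\u9fff]", text)),
            # every run of Latin letters (roughly, an English word) counts as one scored element
            "english": len(re.findall(r"[A-Za-z]+", text)),
        }

    def target_language(text: str) -> str:
        """The language type with the highest score is taken as the target language type."""
        scores = language_scores(text)
        return max(scores, key=scores.get)

    # e.g. target_language("请将主程序中存在的bug标记为NG") returns "chinese"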
In one or more embodiments of the present application, the user analysis module 11 is further configured to: acquiring the region information provided by the first user or the second user; or, acquiring text information and an IP address provided by the first user or the second user, and determining the region information of the first user or the second user based on the text information and the IP address.
In practical applications, the region information can also be provided by the user. For example, when the first user or the second user registers personal information, basic non-confidential details such as age, gender, native place and ethnicity can be filled in, and the region information of the user can then be derived from the native place, ethnicity and similar fields.
In addition, if the first user and the second user interact with the system in text form rather than by voice, relevant information in the text that reveals the language type can be obtained, and, where permitted, the user's IP address can also be obtained; the region information of the user is then determined from a comprehensive analysis of both. The text information here includes text content specific to a dialect or to a product specific to a certain region. For example, the Sichuan dialect has expressions that differ markedly from their standard Chinese equivalents, and when the text content contains such an obvious dialect expression, the region information can easily be determined to be Sichuan.
With the above scheme, even when no voice data of the first user or the second user has been received and neither user has explicitly stated their region information, the region information can still be determined accurately by analyzing the text information.
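The sketch below shows one simple way to combine dialect cues in the text information with an IP address. The dialect lexicon and the IP-prefix table are toy placeholder data; a real deployment would rely on a full dialect lexicon and a geolocation database rather than the tiny dictionaries assumed here:

    # Placeholder dialect lexicon: distinctive dialect expressions mapped to a region.
    DIALECT_HINTS = {
        "瓜兮兮": "Sichuan",    # Sichuan dialect expression
        "侬好": "Shanghai",     # Shanghainese greeting
    }

    def region_from_text_and_ip(text: str, ip: str, ip_prefix_to_region: dict) -> str:
        """Prefer an explicit dialect cue in the text; otherwise fall back to the IP-derived region."""
        for phrase, region in DIALECT_HINTS.items():
            if phrase in text:
                return region
        prefix = ".".join(ip.split(".")[:2])   # crude stand-in for a real geolocation lookup
        return ip_prefix_to_region.get(prefix, "unknown")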
In one or more embodiments of the present application, the language analysis module 12 is configured to generate, in response to an interaction request provided by the first user, language content for replying to the interaction request;
and determining key words which are contained in the language content and matched with the limb actions.
It should be noted that in a sentence or passage of speech, not every word can be matched to a limb action. Some words are difficult to match; "blizzard" or "sprinting", for example, have no obvious limb action. Other words do have obvious limb actions: for the greeting "hello", the corresponding limb action is a bow in Japanese, pressed-together palms in Thai, and a smiling nod in most other languages. Therefore, key words must be identified in the language content so that the corresponding limb characteristics can be found for them and combined to generate the limb actions.
When the second user is a virtual user, the interaction request made by the first user is received directly by the system, which generates language content for replying to the interaction request. When the second user is a real user, the interaction request of the first user is forwarded to the second user through the system, and the second user can then send language content to the system by voice or text.
By analyzing the language content and determining the matched key words through the scheme, the corresponding limb characteristics can be accurately determined, so that limb actions conforming to the language content are generated.
In one or more embodiments of the present application, the language analysis module 12 is configured to obtain language content provided by the second user; the language content comprises: second user text content or second user speech data;
when the language content is second user text content, determining key words which are contained in the second user text content and are matched with the limb actions;
when the language content is second user voice data, converting the second user voice data into second user text content; and determining key words which are contained in the text content of the second user and are matched with the limb actions.
In practical application, when the second user is a real user, the system can receive language contents in different forms provided by the second user, and the ways of searching the keyword by the language contents in different forms are different. Specifically, when the content provided by the second user is text content, the keyword contained therein may be directly extracted, and the text content of the second user may be converted (e.g., language translated) into language content (e.g., voice form or text subtitle form) that can be understood by the first user.
When the content provided by the second user is voice data, the voice data may be converted into text so that the key words contained in it can be identified, and the second user text content may then be converted (e.g., translated) into language content that the first user can understand (e.g., in speech form or as text subtitles). In this way, the content the second user wishes to express can be conveyed accurately through the limb characteristics.
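A sketch of this branch is given below. speech_to_text and translate are placeholders standing in for whatever speech-recognition and translation components a deployment would use, and the gesture vocabulary is assumed to come from the matched limb action type:

    def speech_to_text(voice_data: bytes) -> str:
        """Placeholder for a speech-recognition component converting second user voice data to text."""
        raise NotImplementedError

    def translate(text: str, target_language_type: str) -> str:
        """Placeholder for a translation component producing language content the first user understands."""
        raise NotImplementedError

    def prepare_language_content(content, first_user_language_type: str, gesture_vocab: set):
        """Convert second user content (text or voice) into playable language content plus matched key words."""
        text = content if isinstance(content, str) else speech_to_text(content)
        key_words = [w for w in text.split() if w in gesture_vocab]
        return translate(text, first_user_language_type), key_words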
In one or more embodiments of the present application, the limb feature matching module 13 is further configured to determine a corresponding limb action type based on the region information of the first user or the region information of the second user.
As can be seen from the foregoing, different geographical information has different geographical cultural properties, which means that different limb features are possible. Therefore, the corresponding limb action type can be further determined according to the region information.
In practical applications, once the limb action type is determined, the limb actions are displayed to the first user, but depending on the scene they may be displayed according to the region information of the first user or of the second user. For example, if the system is deployed in a greeting robot, the robot needs to interact with the first user in a way that meets the first user's expectations: it determines the corresponding limb action type from the region information of the first user and then displays limb actions of that type while interacting with the first user. As a further example, the system may be deployed on a virtual robot in a cultural exchange scene or a local museum exhibition scene, where the first user is a visitor interacting with the virtual robot. Besides interacting smoothly with the first user, the virtual robot also needs to present the local regional culture to the first user in the role of a guest. In other words, the virtual robot plays the language content in a language type the first user can understand, while presenting itself through an avatar and limb action type that carry the local regional culture attributes.
Therefore, when the type of the limb action is determined based on the region information, the determination can be performed according to the actual application scene. The method is only used as an example, and does not limit the technical scheme of the application, and in practical application, a user can select according to the actual requirements of the user.
For example, when matching limb characteristics, detailed region information is obtained through voice feature analysis, and the characteristics of the limb language corresponding to that region are matched using the region information. The matching of limb language characteristics can be obtained in, but is not limited to, the following ways: matching through a pre-collected limb language feature library, i.e. collecting the characteristic limb actions of each region and establishing a mapping between language features and limb language features; or training a matching model with artificial intelligence, i.e. building a data model based on speech, speech features, regional culture attributes and the like, so that the mapping between language features and limb language features is performed by the model.
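The pre-collected feature library described above could be organized as a nested mapping from region to key word to limb characteristic, as in the sketch below; the regions, key words and characteristic names are example data assumed purely for illustration:

    # Region -> {key word -> limb characteristic}; the entries are illustrative examples.
    LIMB_FEATURE_LIBRARY = {
        "Japan":    {"hello": "bow",            "please come in": "guide_indoors"},
        "Thailand": {"hello": "palms_together", "please come in": "guide_indoors"},
        "default":  {"hello": "smile_and_nod",  "please come in": "guide_indoors"},
    }

    def match_limb_features(region: str, key_words: list) -> list:
        """Look up the limb action type for the region and match a limb characteristic for each key word."""
        action_type = LIMB_FEATURE_LIBRARY.get(region, LIMB_FEATURE_LIBRARY["default"])
        return [action_type[w] for w in key_words if w in action_type]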
In one or more embodiments of the present application, the action generating module 14 is further configured to: and adding key actions for representing the key words in preset limb actions so as to obtain the limb actions corresponding to the language content.
When the corresponding limb actions are to be generated for the language content to be played to the first user, the key words contained in the language content are determined, and the limb characteristics corresponding to the key words are matched according to the limb action type; the limb characteristics and the conventional limb actions can then be combined to obtain a complete set of limb actions.
For example, the language content is "you good, welcome to XX visit, please enter-! "the identified keyword is: you get good, please get in. When the first user is Japanese, the corresponding limb features are bow and guide indoors, and the played language type is Japanese; when the first person is a Thailand person, the corresponding limb action features are that both hands are in ten and the user directs indoors, and the played language type is Thailand.
During action generation, the key words are matched to limb characteristics, and the limb characteristics are integrated when the limb actions are generated, so that the generated actions better fit the habits of the first user. Most mainstream motion generation models are built from real motion data sources (video, pictures and similar resources) and generate motion data from text content or other input such as the current environment. The essential difference here is that the limb characteristics are added in advance when the model is built, and the action generation model is trained on those limb characteristics, so that it can generate the corresponding actions from the limb characteristics together with the text content. The model can therefore output limb actions directly after the text is input.
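The composition step, adding key actions for the key words into a preset sequence of conventional limb actions, could look like the sketch below; the timing scheme (attaching each key action to the position of its key word in the content) and the data layout are assumptions of this sketch rather than details fixed by the application:

    def generate_limb_actions(content: str, key_word_features: dict, base_actions: list) -> list:
        """Insert a key action for every key word found in the content into the preset base actions.

        base_actions: conventional limb actions such as ["smile", "nod"], used throughout playback.
        key_word_features: key word -> limb characteristic, taken from the matched limb action type.
        Returns a position-ordered list of (position, action) pairs so display can stay in sync with playback.
        """
        timeline = [(0, action) for action in base_actions]
        for word, feature in key_word_features.items():
            position = content.find(word)
            if position >= 0:
                timeline.append((position, feature))
        return sorted(timeline, key=lambda item: item[0])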
In one or more embodiments of the present application, the system further includes a play module 16 for: playing the language content in a language type that is the same as the limb action type of the limb actions; or playing the language content in a language type that is different from the limb action type of the limb actions.
In practical applications, the language content is played for the first user to watch or listen to, so it must be presented in a way the first user can understand, i.e. in accordance with the region information of the first user. The limb actions, however, can be displayed according to the region information of the first user, the region information of the second user, or preset region information. The type in which the language content is played may therefore be the same as or different from the limb action type, but in either case the limb action display must remain synchronized with the playback of the language content.
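A minimal sketch of keeping the display in step with playback is shown below; show_action and play_speech_segment are placeholders for the display and speaker interfaces of whatever robot or virtual avatar hosts the system:

    def show_action(action):
        """Placeholder: drive the avatar or robot to display a limb action."""
        print("showing limb action:", action)

    def play_speech_segment(segment):
        """Placeholder: drive the loudspeaker or subtitle display."""
        print("playing:", segment)

    def play_synchronized(segments, timeline):
        """Play each language-content segment and show the limb action scheduled at its position.

        segments: list of (position, speech_segment) pairs for the language content.
        timeline: list of (position, action) pairs produced when the limb actions were generated.
        """
        actions_at = dict(timeline)
        for position, segment in segments:
            if position in actions_at:
                show_action(actions_at[position])
            play_speech_segment(segment)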
Based on the same thought, the embodiment of the application also provides a limb action generating method. Fig. 2 is a schematic flow chart of a limb motion generating method according to an embodiment of the present application. As can be seen from fig. 2, the method comprises the following steps:
201: determining regional information of a first user and/or a second user; wherein the first user is a user viewing limb movements.
202: and determining key words contained in language contents to be played to the first user.
203: and determining the corresponding limb action type based on the region information.
204: and matching the limb characteristics contained in the limb action type by using the keyword, and generating the limb actions which are used for being displayed to the first user and contain the limb characteristics.
Optionally, the method further comprises: the geographical information of the first user or the second user is determined based on the analysis result of the voice data.
Optionally, the method further comprises: scoring the text content according to the language types, and determining the language scores corresponding to the contained language types;
and determining the target language type corresponding to the voice data according to the comparison result of the plurality of language scores.
Optionally, the method further comprises: acquiring the region information provided by the first user or the second user; or,
and acquiring text information and an IP address provided by the first user or the second user, and determining the region information of the first user or the second user based on the text information and the IP address.
Optionally, the method further comprises: generating language content for replying to the interaction request in response to the interaction request provided by the first user;
and determining key words which are contained in the language content and matched with the limb actions.
Optionally, the method further comprises: acquiring language content provided by a second user; the language content comprises: second user text content or second user speech data;
when the language content is second user text content, determining key words which are contained in the second user text content and are matched with the limb actions;
when the language content is second user voice data, converting the second user voice data into second user text content; and determining key words which are contained in the text content of the second user and are matched with the limb actions.
Optionally, the method further comprises: and determining the corresponding limb action type based on the region information of the first user or the region information of the second user.
Optionally, the method further comprises: and adding key actions for representing the key words in preset limb actions so as to obtain the limb actions corresponding to the language content.
Optionally, the method further comprises: playing the language content in a language type that is the same as the limb action type of the limb actions; or,
playing the language content in a language type that is different from the limb action type of the limb actions.
Based on the same thought, the embodiment of the application also provides a limb action generating device. Fig. 3 is a schematic structural diagram of a limb action generating device according to an embodiment of the present application. As can be seen from fig. 3, the device specifically includes:
a first determining module 31, configured to determine regional information of the first user and/or the second user; wherein the first user is a user viewing limb movements.
A second determining module 32, configured to determine a keyword included in the language content to be played to the first user.
And a third determining module 33, configured to determine a corresponding limb action type based on the region information.
A generating module 34, configured to match the limb characteristics included in the limb action type with the keyword, and generate a limb action including the limb characteristics for displaying to the first user.
In one possible design, the limb action generating device shown in fig. 3 may be implemented as an electronic device. As shown in fig. 4, the electronic device may include: a processor 41 and a memory 42, wherein the memory 42 stores executable code which, when executed by the processor 41, at least enables the processor 41 to implement the limb action generating method provided in the previous embodiments. The electronic device may also include a communication interface 43 for communicating with other devices or communication networks.
In addition, embodiments of the present invention provide a non-transitory machine-readable storage medium having executable code stored thereon, which when executed by a processor in a server or computer, causes the processor to perform the limb action generating method corresponding to fig. 2 provided in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative, wherein the various modules illustrated as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by adding necessary general purpose hardware platforms, or may be implemented by a combination of hardware and software. Based on such understanding, the foregoing aspects and their substantial or contributing portions may be embodied in the form of a computer product, which may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A limb motion generating system, the system comprising:
the user analysis module is used for determining the region information of the first user and/or the second user; wherein the first user is a user who views limb movements;
the language analysis module is used for determining key words contained in language contents to be played to the first user;
the limb characteristic matching module is used for determining a corresponding limb action type based on the region information;
and the action generating module is used for utilizing the keyword to match the limb characteristics contained in the limb action type and generating limb actions which are used for being displayed to the first user and contain the limb characteristics.
2. The system of claim 1, wherein the user analysis module is further configured to: acquiring voice data provided by the first user or the second user;
the system further comprises: and the voice characteristic analysis module is used for determining the region information of the first user or the second user based on the analysis result of the voice data.
3. The system of claim 2, wherein the speech feature analysis module is further configured to score the text content according to language types, and determine a language score corresponding to each of the language types;
and determining the target language type corresponding to the voice data according to the comparison result of the plurality of language scores.
4. The system of claim 1, wherein the user analysis module is further configured to: acquiring the region information provided by the first user or the second user; or,
and acquiring text information and an IP address provided by the first user or the second user, and determining the region information of the first user or the second user based on the text information and the IP address.
5. The system of claim 1, wherein the linguistic analysis module is configured to generate, in response to the first user-provided interaction request, linguistic content for replying to the interaction request;
and determining key words which are contained in the language content and matched with the limb actions.
6. The system of claim 1, wherein the linguistic analysis module is configured to obtain linguistic content provided by the second user; the language content comprises: second user text content or second user speech data;
when the language content is second user text content, determining key words which are contained in the second user text content and are matched with the limb actions;
when the language content is second user voice data, converting the second user voice data into second user text content; and determining key words which are contained in the text content of the second user and are matched with the limb actions.
7. The system of claim 1, wherein the limb characteristic matching module is further configured to determine a corresponding limb action type based on the region information of the first user or the region information of the second user.
8. The system of claim 1, wherein the action generation module is further configured to:
and adding key actions for representing the key words in preset limb actions so as to obtain the limb actions corresponding to the language content.
9. The system of claim 1, further comprising a play module for:
playing the language content in a language type that is the same as the limb action type of the limb actions; or,
playing the language content in a language type that is different from the limb action type of the limb actions.
10. A method of limb motion generation, the method comprising:
determining regional information of a first user and/or a second user; wherein the first user is a user who views limb movements;
determining key words contained in language contents to be played to the first user;
determining a corresponding limb action type based on the region information;
and matching the limb characteristics contained in the limb action type by using the keyword, and generating the limb actions which are used for being displayed to the first user and contain the limb characteristics.
11. An electronic device, the electronic device comprising: a memory and a processor; wherein,
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program stored in the memory for implementing the method of claim 10.
12. A computer-readable storage medium storing a computer program which, when executed, implements the steps of the method of claim 10.
CN202311416012.8A 2023-10-27 2023-10-27 Limb action generating system, method, equipment and medium Pending CN117577129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311416012.8A CN117577129A (en) 2023-10-27 2023-10-27 Limb action generating system, method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311416012.8A CN117577129A (en) 2023-10-27 2023-10-27 Limb action generating system, method, equipment and medium

Publications (1)

Publication Number Publication Date
CN117577129A 2024-02-20

Family

ID=89861453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311416012.8A Pending CN117577129A (en) 2023-10-27 2023-10-27 Limb action generating system, method, equipment and medium

Country Status (1)

Country Link
CN (1) CN117577129A (en)

Similar Documents

Publication Publication Date Title
CN111667811B (en) Speech synthesis method, apparatus, device and medium
CN110647636A (en) Interaction method, interaction device, terminal equipment and storage medium
US8972265B1 (en) Multiple voices in audio content
CN110808034A (en) Voice conversion method, device, storage medium and electronic equipment
CN106683662A (en) Speech recognition method and device
CN109256133A (en) A kind of voice interactive method, device, equipment and storage medium
CN114401417B (en) Live stream object tracking method, device, equipment and medium thereof
CN114465737B (en) Data processing method and device, computer equipment and storage medium
CN110600033A (en) Learning condition evaluation method and device, storage medium and electronic equipment
CN110853422A (en) Immersive language learning system and learning method thereof
US20180288109A1 (en) Conference support system, conference support method, program for conference support apparatus, and program for terminal
US20220309949A1 (en) Device and method for providing interactive audience simulation
CN115082602A (en) Method for generating digital human, training method, device, equipment and medium of model
CN116958342A (en) Method for generating actions of virtual image, method and device for constructing action library
CN116828246B (en) Digital live broadcast interaction method, system, equipment and storage medium
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN117577129A (en) Limb action generating system, method, equipment and medium
US20220301250A1 (en) Avatar-based interaction service method and apparatus
US20220309936A1 (en) Video education content providing method and apparatus based on artificial intelligence natural language processing using characters
Gayathri et al. Sign language recognition for deaf and dumb people using android environment
CN110781327B (en) Image searching method and device, terminal equipment and storage medium
KR102098377B1 (en) Method for providing foreign language education service learning grammar using puzzle game
Pari et al. SLatAR-A Sign Language Translating Augmented Reality Application
CN111160051A (en) Data processing method and device, electronic equipment and storage medium
CN117271751B (en) Interaction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination