CN117273147A - Method and device for generating pattern on electronic panel, electronic equipment and storage medium - Google Patents

Method and device for generating pattern on electronic panel, electronic equipment and storage medium

Info

Publication number
CN117273147A
Authority
CN
China
Prior art keywords
pattern
model
target
electronic panel
display effect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311239713.9A
Other languages
Chinese (zh)
Inventor
张裕松 (Zhang Yusong)
毛跃辉 (Mao Yuehui)
梁博 (Liang Bo)
陶梦春 (Tao Mengchun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202311239713.9A priority Critical patent/CN117273147A/en
Publication of CN117273147A publication Critical patent/CN117273147A/en
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application relates to a method, an apparatus, an electronic device and a storage medium for generating patterns on an electronic panel, wherein the method comprises the following steps: acquiring voice data input by a user and a preset display effect for the pattern; recognizing the text information corresponding to the voice data through a speech recognition model; performing feature extraction and semantic understanding on the text information with a text-to-image model, and generating a pattern with the preset display effect corresponding to the text information; and displaying the pattern on an electronic panel of the smart device. The method and the device realize personalized customization of the patterns on the electronic panel of a smart device.

Description

Method and device for generating pattern on electronic panel, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of intelligent devices, and in particular, to a method and apparatus for generating a pattern on an electronic panel, an electronic device, and a storage medium.
Background
With the increasing intelligence of household devices, most smart home appliances can now be controlled by voice, and many are equipped with an electronic panel for displaying changes in control parameters. However, the current technology and its implementations struggle to meet modern users' expectations for smart homes.
Users want to customize the image style and UI of the electronic panel according to their own intentions and aesthetics, but at present there is no practical solution that generates a desired pattern from the user's voice.
Disclosure of Invention
The application provides a method, an apparatus, an electronic device and a storage medium for generating patterns on an electronic panel, so as to solve the problem of generating patterns from a user's voice.
In a first aspect, the present application provides a method of generating a pattern on an electronic panel, the method comprising: acquiring voice data input by a user and a preset display effect for the pattern; recognizing the text information corresponding to the voice data through a speech recognition model; performing feature extraction and semantic understanding on the text information with a text-to-image model, and generating a pattern with the preset display effect corresponding to the text information; and displaying the pattern on an electronic panel of the smart device.
Optionally, after the pattern is displayed on the electronic panel, the method further comprises: acquiring a target control instruction issued by the user to the smart device; determining the display effect of the target element in the pattern corresponding to the target control parameter in the target control instruction, wherein the pattern is composed of a plurality of elements; and adjusting the display effect of the target element on the electronic panel of the smart device while adjusting the smart device according to the target control parameter.
Optionally, determining the display effect of the target element in the pattern corresponding to the target control parameter in the target control instruction includes: acquiring the type of the pattern; determining the pattern elements in the pattern according to the type of the pattern; determining a target element of a target type in the pattern corresponding to the target control parameter in the target control instruction according to a first correspondence between types of control parameters and types of pattern elements; and determining the display effect of the target element corresponding to the adjustment direction of the target control parameter according to a second correspondence between adjustment directions of control parameters and display effects of pattern elements.
Optionally, after the pattern is displayed on the electronic panel, the method further comprises: displaying the pattern elements of the pattern on an interactive interface on the electronic panel, each type of pattern element comprising at least one element; determining the target elements selected by the user, wherein the target elements are the elements selected from each type of pattern element to compose the pattern; and composing a new pattern from the target elements and replacing the original pattern on the electronic panel with the new pattern.
Optionally, the speech recognition model includes an acoustic model and a language model, and recognizing the text information corresponding to the voice data through the speech recognition model includes: identifying each spoken Chinese character in the voice data as a tonal Chinese syllable by the acoustic model, which is based on a deep convolutional neural network, wherein the Chinese syllable comprises the pinyin corresponding to the character's pronunciation and a digit corresponding to its tone; and converting the tonal Chinese syllables into the corresponding Chinese character information by the language model, which is based on a Transformer model.
Optionally, before converting the tonal Chinese syllables into the corresponding Chinese character information by the Transformer-based language model, the method further comprises: connecting a fully connected layer and a softmax layer to the output of the encoder of the Transformer model; and employing the encoder with the fully connected layer and the softmax layer as the language model.
Optionally, the preset display effect includes at least one of a pattern style, a pattern parameter, and a dynamic and static effect.
In a second aspect, the present application provides an apparatus for generating a pattern on an electronic panel, the apparatus comprising: an acquisition module for acquiring the voice data input by a user and the preset display effect of the pattern; a recognition module for recognizing the text information corresponding to the voice data through a speech recognition model; a generation module for performing feature extraction and semantic understanding on the text information with a text-to-image model and generating a pattern with the preset display effect corresponding to the text information; and a display module for displaying the pattern on the electronic panel of the smart device.
In a third aspect, the present application provides an electronic device, including: at least one communication interface; at least one bus connected to the at least one communication interface; at least one processor coupled to the at least one bus; at least one memory coupled to the at least one bus.
In a fourth aspect, the present application also provides a computer storage medium storing computer-executable instructions for performing the method of generating a pattern on an electronic panel described in any of the above aspects.
Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantages: after the user inputs voice data, text information is obtained through speech recognition, and a pattern with the preset display effect is then generated from the text information. Because the preset display effect is personalized by the user according to their own preferences and needs, the user sees a personalized pattern on the electronic panel that matches those preferences and needs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
Fig. 1 is a flowchart of a method for generating a pattern on an electronic panel according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a Transformer model architecture according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a CLIP model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a similarity matrix of a text-to-image model according to an embodiment of the present disclosure;
FIG. 5 is an overall flow chart of a device panel system design provided in an embodiment of the present application;
FIG. 6 is an overall architecture diagram of a speech recognition model provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for generating a pattern on an electronic panel according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the present application.
The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. They are, of course, merely examples and are not intended to limit the invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
The application provides a method for generating patterns on an electronic panel, applied to a smart device. Smart devices include, but are not limited to, smart air conditioners, smart refrigerators and other appliances with display panels. The method generates a customized pattern on the electronic panel of the smart device according to the user's voice; as shown in fig. 1, the method comprises the following steps:
step 101: and acquiring the voice data input by the user and the preset display effect of the pattern.
The user sets the preset display effect for the pattern on the smart device's electronic panel in advance. The preset display effect includes at least one of a pattern style, pattern parameters, and a dynamic/static effect. Pattern styles include abstract art, watercolor, sketch, etc.; pattern parameters include color intensity, line thickness, pattern size, etc.; and the dynamic/static effect distinguishes static patterns from dynamic patterns, where a dynamic pattern uses AI techniques to add dynamic effects to the pattern, such as flowing water ripples or gradual color changes.
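For illustration, a preset display effect might be represented as a structure like the following minimal sketch; every field name and value here is an assumption for the example, not the patent's actual schema.

```python
# A minimal sketch of one way to store a preset display effect.
# All field names below are illustrative assumptions, not the patent's schema.
preset_display_effect = {
    "style": "watercolor",               # e.g. "abstract_art", "watercolor", "sketch"
    "parameters": {
        "color_intensity": 0.8,          # relative intensity, 0.0 - 1.0
        "line_weight": 2,                # stroke thickness in pixels
        "pattern_size": "medium",        # relative size on the panel
    },
    "motion": "dynamic",                 # "static" or "dynamic"
    "dynamic_effect": "flowing_ripple",  # e.g. gradual color change
}
```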
After the user wakes up the smart device, the user issues voice data, which may be a parameter-adjustment instruction for the device, or simply an instruction to generate a certain pattern. A voice acquisition module built into the smart device captures the voice data input by the user.
Step 102: Recognize the text information corresponding to the voice data through the speech recognition model.
The smart device also integrates a speech recognition model, through which the text information corresponding to the voice data is recognized. The speech recognition model can be an end-to-end model constructed on a Transformer language model, or on other language models.
Step 103: Perform feature extraction and semantic understanding on the text information using the text-to-image model, and generate a pattern with the preset display effect corresponding to the text information.
The smart device also integrates a text-to-image model, which can be built on CLIP (Contrastive Language-Image Pre-training); it performs feature extraction and semantic understanding on the text information and generates a pattern with the preset display effect corresponding to the text information.
Step 104: Display the pattern on the electronic panel of the smart device.
The smart device displays the pattern on the electronic panel.
In the application, after the user inputs voice data, text information is obtained through speech recognition, and a pattern with the preset display effect is then generated from that text. Because the preset display effect is personalized by the user according to their own preferences and needs, the user sees a personalized pattern on the electronic panel that matches those preferences and needs.
As an alternative embodiment, after the pattern is displayed on the electronic panel, the method further comprises: acquiring a target control instruction issued by the user to the smart device; determining the display effect of the target element in the pattern corresponding to the target control parameter in the target control instruction, where the pattern is composed of a plurality of elements; and adjusting the display effect of the target element on the electronic panel while adjusting the smart device according to the target control parameter.
The pattern is composed of a plurality of elements. After the pattern is displayed on the electronic panel, the user issues a target control instruction to the smart device; the smart device determines the target control parameter in the instruction and then the display effect of the target element in the pattern corresponding to that parameter. The display effect of the target element on the electronic panel can thus be adjusted at the same time the device itself is adjusted according to the target control parameter. In this way, the pattern's appearance changes whenever the user issues a control instruction, creating an immersive, scene-like effect for air-conditioner parameter adjustment, giving the user the feeling of moving through a fantastical scene, and improving the experience of controlling the air conditioner.
For example, if the pattern is water, a pattern element is the water ripple; when the user increases the fan speed, the water ripples in the pattern grow larger.
For example, if the pattern is a digital character, such as an air-conditioning sprite, increasing the wind speed may cause the sprite's hair and clothing to flutter with the wind, and increasing the temperature may cause its cheeks to turn red.
As an optional implementation, determining the display effect of the target element in the pattern corresponding to the target control parameter in the target control instruction includes: acquiring the type of the pattern; determining the pattern elements in the pattern according to the type of the pattern; determining a target element of a target type in the pattern corresponding to the target control parameter according to a first correspondence between types of control parameters and types of pattern elements; and determining the display effect of the target element corresponding to the adjustment direction of the target control parameter according to a second correspondence between adjustment directions of control parameters and display effects of pattern elements.
The pattern types include a plurality of types, such as water, a digital character, or snow, and each type of pattern has different pattern elements. The display effect of a pattern element is determined as follows: the database stores a first correspondence between types of control parameters and types of pattern elements, and the smart device determines the type of the target element in the pattern from the target control parameter in the target control instruction and this first correspondence; the database also stores a second correspondence between adjustment directions of control parameters and display effects of pattern elements, and the smart device determines the display effect of the target element from the adjustment direction of the target control parameter and this second correspondence.
If the pattern is a digital character, such as an air-conditioning sprite, the first correspondence includes: wind speed - the sprite's hair and clothing; temperature - the sprite's cheeks. The second correspondence includes: increasing the wind speed - the hair and clothing flutter with the wind; increasing the temperature - the cheeks turn red.
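A minimal sketch of how the two correspondences could be stored and looked up, using the air-conditioning sprite example above; the dictionary structure and all parameter and element names are illustrative assumptions, not the patent's database design.

```python
# First correspondence: control-parameter type -> pattern-element types.
FIRST_CORRESPONDENCE = {
    "wind_speed": ["hair", "clothing"],
    "temperature": ["cheeks"],
}

# Second correspondence: (element type, adjustment direction) -> display effect.
SECOND_CORRESPONDENCE = {
    ("hair", "increase"): "flutter_with_wind",
    ("clothing", "increase"): "flutter_with_wind",
    ("cheeks", "increase"): "turn_red",
}

def target_effects(param: str, direction: str) -> dict:
    """Resolve which elements change, and how, for one control instruction."""
    return {
        element: SECOND_CORRESPONDENCE.get((element, direction))
        for element in FIRST_CORRESPONDENCE.get(param, [])
    }

print(target_effects("wind_speed", "increase"))
# {'hair': 'flutter_with_wind', 'clothing': 'flutter_with_wind'}
```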
As an alternative embodiment, after the pattern is displayed: pattern elements of the pattern are presented on an interactive interface on the electronic panel, each type of pattern element comprising at least one element; the target elements selected by the user are determined, the target elements being the elements selected from each type of pattern element to compose the pattern; and a new pattern is composed from the target elements, replacing the original pattern on the electronic panel.
The interactive interface displays the pattern elements of the pattern, and each type of pattern element comprises at least one element; that is, each pattern element can offer multiple choices to suit the user's preferences. The user selects target elements from the pattern elements, a new pattern is composed from those target elements, and the original pattern on the electronic panel is replaced with the new pattern, updating the pattern's appearance.
Where the pattern is a digital character, the character displayed on the electronic panel may be drawn by the applicant or may be an image sketch generated by an AI image-generation model, and the user can also personalize the digital character to meet their own needs. Specifically, an interactive interface is provided on the electronic panel, where the user can adjust the pattern.
The interactive interface displays the pattern elements of the digital character, including the character's body, head, expression, color and action, and each type of pattern element comprises at least one element; that is, each pattern element can offer multiple choices to suit the user's preferences. The user selects target elements from the pattern elements, a new digital character is composed from those target elements, and the original digital character on the electronic panel is updated with the new one, refreshing the character's appearance.
The developed device's electronic panel interface comprises the pattern display area and an interactive design interface; the interactive design interface is the area where the user makes DIY modifications. In the DIY area the user can select different elements, such as the head, body and expression, and adjust attributes such as color and size, thereby defining the image of the air-conditioning sprite or digital character. This realizes customized pattern design for the device panel: artistic style generation, dynamic drawing effects and personalized interaction design increase the panel's visual appeal and improve the user experience.
In addition, after the electronic panel generates the pattern, the user can edit and modify it, adjusting the pattern content by voice or through the interactive interface on the electronic panel. The adjustable content includes, but is not limited to: pattern style selection, parameter settings, and static/dynamic effect settings. For example, the user may switch between panel patterns by touch, or adjust the parameter settings. With voice commands, the user can speak instructions such as "switch pattern" or "enlarge the pattern", and the air-conditioner panel will respond and act accordingly.
In addition, user-interface testing and feedback collection can be performed on the electronic panel, and the user interaction interface can be continuously improved and optimized according to user requirements and feedback.
The process of module design and module integration into the device is described below.
1. Voice acquisition module design.
The voice acquisition module can be implemented according to the following steps (a minimal capture sketch follows the list):
a. Acquire a voice input device, such as a microphone.
b. Design a recording function or thread to capture the audio input, using an appropriate programming language and libraries.
c. Set a suitable audio sample rate and format; PCM audio at a sample rate of 16 kHz or higher is generally recommended.
d. Buffer the audio data of each speech input segment, typically storing several seconds of audio per buffer.
e. Audio processing techniques may be used for real-time noise suppression and echo cancellation.
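As an illustration of steps a-d, the sketch below captures 16 kHz, 16-bit mono PCM with a few seconds of buffering. It assumes the third-party sounddevice library (pip install sounddevice); the function names and buffer length are illustrative, not the patent's implementation.

```python
import queue

import numpy as np
import sounddevice as sd  # third-party audio capture library

SAMPLE_RATE = 16_000      # 16 kHz PCM, as recommended above
BUFFER_SECONDS = 3        # buffer a few seconds of audio per segment

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()

def _on_audio(indata, frames, time_info, status):
    # Called from the audio thread for every captured block.
    audio_queue.put(indata.copy())

def capture_segment() -> np.ndarray:
    """Record one buffered speech segment as 16-bit mono PCM samples."""
    blocks, collected = [], 0
    needed = SAMPLE_RATE * BUFFER_SECONDS
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        dtype="int16", callback=_on_audio):
        while collected < needed:
            block = audio_queue.get()
            blocks.append(block)
            collected += len(block)
    return np.concatenate(blocks)[:needed].flatten()
```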
2. End-to-end speech recognition model construction based on a Transformer language model.
An end-to-end speech recognition model based on a Transformer language model is constructed to transcribe speech to text, generally long passages of speech, combined with specific denoising and gain methods. It can be implemented according to the following steps:
a. Collect and preprocess a large dataset of speech with corresponding text.
b. Construct a Transformer model comprising multiple encoder and decoder layers with the associated self-attention mechanisms.
c. Perform feature extraction on the speech data; MFCC or other audio feature extraction methods may be used.
d. Input the audio features into the Transformer model for training, outputting the corresponding text sequences.
e. Train the model with a sequence-to-sequence approach, optimizing with an appropriate loss function.
f. At inference time, input the speech features into the model and decode the most probable text sequence using an algorithm such as beam search.
During speech-to-text recognition, specific denoising and gain methods, such as speech enhancement, noise reduction and signal gain, can be combined to improve the accuracy and robustness of speech recognition, as in the sketch below.
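The following sketch illustrates steps b-d under common assumptions: librosa for MFCC extraction and PyTorch's nn.Transformer as the encoder-decoder skeleton. The feature dimension, model width and vocabulary size are placeholders, not values specified by the patent.

```python
import librosa
import torch
import torch.nn as nn

def extract_mfcc(wav_path: str, sr: int = 16_000) -> torch.Tensor:
    """Load audio and compute MFCC features, shape (frames, n_mfcc)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    return torch.from_numpy(mfcc.T).float()

class SpeechTransformer(nn.Module):
    """Minimal encoder-decoder ASR skeleton; vocab covers the syllable set."""
    def __init__(self, n_feats: int = 40, d_model: int = 256,
                 vocab_size: int = 1200):
        super().__init__()
        self.feat_proj = nn.Linear(n_feats, d_model)   # audio features -> d_model
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        src = self.feat_proj(feats)        # (batch, frames, d_model)
        tgt = self.token_emb(tokens)       # (batch, seq, d_model)
        hidden = self.transformer(src, tgt)
        return self.out(hidden)            # logits over output tokens
```

Training would feed these logits to a cross-entropy loss against the reference text sequence, in the sequence-to-sequence fashion of step e.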
FIG. 2 is a schematic diagram of the Transformer model architecture.
3. Constructing a CLIP-based text-to-image model.
The CLIP-based text-to-image model is used for semantic understanding and generation of a suitable pattern; it can be realized according to the following steps (an inference sketch follows the figure descriptions below):
a. Prepare a training dataset comprising input text information and pattern data.
b. Build a CLIP-based end-to-end text-to-image model. The model takes text information as input and, through multi-level feature extraction and semantic understanding, generates a suitable pattern as output.
c. Train using open-source algorithms to achieve end-to-end conversion from text to pattern.
d. In the inference stage, the input text information serves as the model input and the corresponding pattern data is generated.
Fig. 3 is a schematic diagram of the CLIP model. FIG. 4 is a schematic diagram of the text-to-image model's similarity matrix.
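As a sketch of the inference stage, the code below scores candidate texts against images with a pretrained CLIP model via the Hugging Face transformers API, producing the kind of similarity matrix shown in fig. 4. The checkpoint name is a real public model, but the image file names and prompts are made up for the example.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a watercolor pattern of flowing water",
         "an abstract snowflake pattern",
         "a cartoon sprite character"]
# Illustrative file names; replace with the actual pattern images.
images = [Image.open(p) for p in ("water.png", "snow.png", "sprite.png")]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: cosine similarities scaled by CLIP's learned temperature;
# softmax turns each image's row into a distribution over the texts.
similarity = outputs.logits_per_image.softmax(dim=-1)
print(similarity)
```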
4. Model integration into the device panel system.
The voice acquisition module, the speech recognition model and the text-to-image model are integrated into the device panel system according to the following steps:
a. Design suitable interfaces and communication protocols to connect the voice acquisition module with the speech recognition model for data interaction.
b. Embed the speech recognition model into the device panel system, integrating it to cooperate with the other functional modules.
c. Build a suitable hardware platform for the system, such as an embedded processor or a cloud server, to support deployment and operation of the models.
d. Configure suitable user interfaces, such as a voice interaction interface and a graphical interface, for user interaction with the system. Optimize system performance and user experience so the device panel system responds accurately to the user's voice control and generates the corresponding pattern.
FIG. 5 is an overall flow chart of a device panel system design.
The Transformer is a model for sequence problems and is currently one of the best-performing models in natural language processing. Its self-attention structure captures the dependencies between words at both the encoder and the decoder, and effectively captures interdependencies between words that are far apart in a sentence.
Like traditional attention-based models, the Transformer is divided into an Encoder and a Decoder and is mainly used in machine translation to solve fixed-length sequence-mapping problems. As shown in fig. 2, its structure comprises an Encoder module and a Decoder module, each built from a stack of smaller encoding or decoding sub-modules. The robustness of an end-to-end model requires a sufficiently large training corpus for adequate training, so modifications to the acoustic modeling unit are considered. Chinese is a tonal syllabic language with many homophones, so a Chinese speech recognition system produces many substitution errors, and an end-to-end model lacks the linguistic knowledge to correct them, which degrades model performance.
The end-to-end speech recognition model built on the Transformer language model makes two improvements over the original speech recognition model. First, given the particularity of the Chinese language, tonal syllables are proposed as the acoustic modeling unit, and the acoustic model is built on a DCNN (Deep Convolutional Neural Network). Second, a language model is added to the Transformer decoding process: the acoustic model and the language model are shallow-fused during decoding, and introducing linguistic information improves the model's overall recognition performance. The overall architecture of the speech recognition model is shown in fig. 6.
Specifically, each spoken Chinese character in the voice data is recognized as a tonal Chinese syllable by the acoustic model based on the deep convolutional neural network, where the syllable comprises the pinyin corresponding to the character's pronunciation and a digit corresponding to its tone; the tonal syllables are then converted into the corresponding Chinese characters by the Transformer-based language model, improving recognition accuracy.
For example, the input of the acoustic model is a section of audio or an audio file and the output is a tonal Chinese syllable sequence such as "da4jia1hao3"; the input of the language model is a tonal syllable sequence such as "ni3hao3" and the output is the corresponding Chinese characters, "你好" ("hello").
The language model is based on the Transformer: a fully connected layer and a Softmax layer are attached to the output of the Transformer's Encoder, and this Encoder with the fully connected and Softmax layers serves as the language model.
During model training, tonal syllables serve as the acoustic modeling unit: the model input is an audio file in wav format, and the output is one of more than 1000 tonal pinyin syllables. The language model's input-layer word-vector dimension is 256, initialized with Xavier initialization, and the output-layer size is the number of Chinese characters (at least 3000). In the Transformer, multi-head attention uses num_heads=8 and the encoder stacks 6 self-attention layers; batch_size is 100 during training; the cross-entropy function cross_entropy serves as the network's final loss, with label smoothing applied to the labels so the network learns better; the initial learning rate is 0.0003 with momentum 0.9; weights are randomly initialized; and drop_rate is 0.2 during the training stage.
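A minimal PyTorch sketch of the language model just described: a Transformer encoder with a fully connected output layer, using the stated hyperparameters (dimension 256, 8 heads, 6 layers, dropout 0.2, label smoothing, learning rate 0.0003, momentum 0.9). The vocabulary sizes and the choice of SGD are assumptions; the softmax is folded into the cross-entropy loss, as is idiomatic in PyTorch.

```python
import torch
import torch.nn as nn

class SyllableToCharLM(nn.Module):
    """Transformer-encoder language model: tonal syllables in, characters out."""
    def __init__(self, syllable_vocab: int = 1200, char_vocab: int = 3000):
        super().__init__()
        self.embed = nn.Embedding(syllable_vocab, 256)
        nn.init.xavier_uniform_(self.embed.weight)      # Xavier initialization
        layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=8, dropout=0.2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.fc = nn.Linear(256, char_vocab)            # fully connected layer

    def forward(self, syllable_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.embed(syllable_ids))
        return self.fc(hidden)   # logits; softmax is applied inside the loss

model = SyllableToCharLM()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # label smoothing as above
optimizer = torch.optim.SGD(model.parameters(), lr=0.0003, momentum=0.9)
```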
For speech recognition training, the preprocessed data is used to generate two dictionaries: a pinyin dictionary and a Chinese character dictionary. The pinyin dictionary processes the input data, converting it into numeric form; in the test stage, the dictionary generated during training maps the test pinyin into the representation space. The Chinese character dictionary digitizes the label data during training and, in the test stage, maps predictions from numbers back to Chinese characters. The input pinyin data and the Chinese character labels are then preprocessed to complete training.
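The dictionary construction might look like the following sketch; the special tokens and the toy corpus are illustrative assumptions, not the patent's actual vocabularies.

```python
def build_vocab(tokens) -> dict:
    """Map each distinct token (pinyin syllable or character) to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

# Toy corpus: tonal-syllable inputs paired with character labels.
syllables = "ni3 hao3 da4 jia1 hao3".split()
characters = list("你好大家好")

pinyin_dict = build_vocab(syllables)    # digitizes the input pinyin
char_dict = build_vocab(characters)     # digitizes the character labels
id_to_char = {i: c for c, i in char_dict.items()}  # maps predictions back

encoded = [pinyin_dict.get(s, pinyin_dict["<unk>"]) for s in "ni3 hao3".split()]
print(encoded)                          # e.g. [2, 3]
print(id_to_char[char_dict["好"]])      # 好
```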
The CLIP model is a pretrained neural network for matching images and text, published by OpenAI in early 2021. It is pretrained directly on a large amount of Internet data and achieves the best current performance on many tasks.
In the current image classification field, trained models typically suffer from the following problems: they require large amounts of formatted, annotated data, which is often costly to obtain; and although a model may perform well on its own dataset, its generalization ability may be poor, making migration to new training tasks difficult. Meanwhile, the Internet holds a huge number of image-text pairs (web developers generally add a text caption to each picture), and these materials can serve as an annotated dataset. Training on such a dataset solves the high cost of obtaining annotated data, and because Internet data is larger in volume and more diverse, a model with strong generalization ability is easier to obtain.
Based on the above ideas, hundreds of millions of open-source image-text pairs from the web are encoded separately as images and text, followed by metric (contrastive) training whose goal is to increase the similarity between matching images and text.
In the prediction stage, the cosine similarity between a series of generated text prompts and the target image is likewise computed to obtain the prediction, as shown in fig. 3.
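The metric-training objective can be written as the symmetric contrastive (InfoNCE) loss sketched below over an in-batch image/text similarity matrix. The fixed temperature value is an assumption; CLIP actually learns the temperature as a parameter.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: the i-th image and i-th text in the batch
    form the only positive pair in row i of the similarity matrix."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # cosine similarities
    labels = torch.arange(len(logits))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```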
The training process of CLIP is as follows:
(1) Data preprocessing: downloading and cleaning.
The Conceptual Captions dataset used for model training consists of image-text pairs extracted by Google from billions of Internet web pages, with several rounds of filtering applied, giving the dataset high quality and accuracy: about 3 million training samples and 8,000 validation samples in total.
The dataset can be downloaded and stored uniformly via a Python script. After downloading, the dataset needs cleaning: empty or incomplete downloaded pictures must be filtered out before use. Both the downloading and the cleaning can be completed with a simple Python script.
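A minimal downloading-and-cleaning sketch, assuming the dataset is distributed as TSV rows of (caption, image URL), as Conceptual Captions is; the output paths and filtering policy are illustrative.

```python
import csv
import io

import requests
from PIL import Image

def download_and_filter(tsv_path: str, out_dir: str) -> int:
    """Download (caption, url) rows; drop unreachable, empty or truncated images."""
    kept = 0
    with open(tsv_path, encoding="utf-8") as f:
        for i, (caption, url) in enumerate(csv.reader(f, delimiter="\t")):
            try:
                resp = requests.get(url, timeout=10)
                resp.raise_for_status()
                img = Image.open(io.BytesIO(resp.content))
                img.verify()                 # rejects empty/incomplete files
            except Exception:
                continue                     # filter broken downloads
            with open(f"{out_dir}/{i}.jpg", "wb") as out:
                out.write(resp.content)
            with open(f"{out_dir}/{i}.txt", "w", encoding="utf-8") as cap:
                cap.write(caption)           # keep the paired caption
            kept += 1
    return kept
```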
(2) And (5) adjusting an open source code.
Before distributed training, the open-source code needs adjustment (the open-source repository address is given in the annex at the end of this patent). The main adjustments include:
parameter adjustment, including hyperparameters, file paths, etc.; fine-tuning the entry file and importing hf_env and hfai; replacing the original Dataset with FFDataset; and adapting the cluster-interruption logic.
Step1: the parameter adjusting part is simpler, and can adjust the file paths of the data set and the verification set, and parameters such as batch-size, epochs and the like according to own needs. Two parameters are added here for representing FFRecord file addresses of the training set and the validation set, respectively.
Step2: codes of hf_env, hfai are introduced.
Step3: to improve training performance, FFDataset is used instead of Dataset of torch, here given an implementation code, using the newly added parameters above.
Step4: because the task scheduling rule is time-sharing priority scheduling, that is to say, after the task is submitted, the task may be interrupted and suspended by the platform, the task interruption may actually force the current process to be ended, and the initialization logic may be executed again when the task is restarted, so that the task needs to be saved in time in order to resume training after the interruption, and the history record before loading continues training in the case of task recovery, and meanwhile, the model should be saved after the whole training is ended.
With the above processing, the task can be verified and run.
(3) The test runs and submits the training.
A test run is performed (generally it is recommended to do a test run first and submit the training job only after no obvious problems appear). The test run prints its run information directly to the current terminal.
(4) And (5) verifying the model.
For verification testing of the CLIP model, OpenAI's open-source documentation provides various approaches, such as zero-shot validation using the ImageNet dataset, which can be carried out as described there.
To facilitate understanding of the model, the kind of comparative visualization referred to by CLIP is presented as a simple verification: some pictures and text from skimage are used, the similarity is computed with the trained model, and the result is printed as a two-dimensional matrix, as shown in fig. 4.
The application provides a device for generating a pattern on an electronic panel, as shown in fig. 7, the device comprises:
the acquiring module 701 is configured to acquire the voice data input by a user and the preset display effect of the pattern;
the recognition module 702 is configured to recognize text information corresponding to the voice data through a voice recognition model;
the generating module 703 is configured to perform feature extraction and semantic understanding on the text information using a text-to-image model, and generate a pattern with the preset display effect corresponding to the text information;
and the display module 704 is used for displaying the pattern on the electronic panel of the intelligent device.
Optionally, the apparatus is further configured to:
acquiring a target control instruction issued by a user to intelligent equipment;
determining the display effect of target elements in a pattern corresponding to target control parameters in a target control instruction, wherein the pattern is composed of a plurality of elements;
and adjusting the display effect of the target element in the electronic panel of the intelligent device while adjusting the intelligent device according to the target control parameter.
Optionally, the apparatus is further configured to:
acquiring the type of the pattern;
determining pattern elements in the pattern according to the type of the pattern;
determining a target element of a target type in a pattern corresponding to the target control parameter in the target control instruction according to a first corresponding relation between the type of the control parameter and the type of the pattern element;
and determining the display effect of the target element corresponding to the adjustment direction of the target control parameter according to the second corresponding relation between the adjustment direction of the control parameter and the display effect of the pattern element.
Optionally, the apparatus is further configured to:
displaying pattern elements of a pattern at an interactive interface on the electronic panel, each type of pattern element comprising at least one element;
determining the target elements selected by a user, wherein the target elements are the elements selected from each type of pattern element to compose the pattern;
and combining a new pattern according to the target elements, and replacing the original pattern in the electronic panel with the new pattern.
Optionally, the speech recognition model includes an acoustic model and a language model, and the recognition module 702 is configured to:
identifying each spoken Chinese character in the voice data as a tonal Chinese syllable through an acoustic model based on a deep convolutional neural network, wherein the Chinese syllable comprises the pinyin corresponding to the character's pronunciation and a digit corresponding to its tone;
the Chinese syllables with tone are converted into corresponding Chinese character information through a language model based on a transducer model.
Optionally, the apparatus is further configured to:
connecting a fully connected layer and a softmax layer to the output of an encoder of the Transformer model;
an encoder with the fully connected layer and the softmax layer is used as the language model.
Optionally, the preset display effect includes at least one of a pattern style, a pattern parameter, and a dynamic and static effect.
As shown in fig. 8, an embodiment of the present application provides an electronic device, which includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete communication with each other through the communication bus 804.
A memory 803 for storing a computer program.
In one embodiment of the present application, the processor 801 is configured to implement the method for generating a pattern on an electronic panel provided in any one of the foregoing method embodiments when executing a program stored in the memory 803.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of generating a pattern on an electronic panel as provided in any of the method embodiments described above.
The apparatus embodiments described above are merely illustrative: the components described as separate may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless an order of performance is explicitly stated. It should also be appreciated that additional or alternative steps may be used.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of generating a pattern on an electronic panel, the method comprising:
acquiring voice data input by a user and a preset display effect for the pattern;
recognizing text information corresponding to the voice data through a speech recognition model;
performing feature extraction and semantic understanding on the text information by adopting a text-to-image model, and generating a pattern with the preset display effect corresponding to the text information; and
displaying the pattern on an electronic panel of a smart device.
2. The method of claim 1, wherein after displaying the pattern on the electronic panel, the method further comprises:
acquiring a target control instruction issued by a user to intelligent equipment;
determining the display effect of target elements in the pattern corresponding to the target control parameters in the target control instruction, wherein the pattern is composed of a plurality of elements;
and adjusting the display effect of the target element in the electronic panel of the intelligent device while adjusting the intelligent device according to the target control parameter.
3. The method of claim 2, wherein determining a presentation effect of a target element in the pattern corresponding to a target control parameter in the target control instruction comprises:
acquiring the type of the pattern;
determining pattern elements in the pattern according to the type of the pattern;
determining a target element of a target type in the pattern corresponding to the target control parameter in the target control instruction according to a first corresponding relation between the type of the control parameter and the type of the pattern element;
and determining the display effect of the target element corresponding to the adjustment direction of the target control parameter according to a second corresponding relation between the adjustment direction of the control parameter and the display effect of the pattern element.
4. The method of claim 1, wherein after displaying the pattern on the electronic panel, the method further comprises:
displaying pattern elements of the pattern on an interactive interface on the electronic panel, each type of pattern element comprising at least one element;
determining target elements selected by a user, wherein the target elements are the elements selected from each type of pattern element to compose the pattern;
and combining a new pattern according to the target element, and replacing the original pattern in the electronic panel with the new pattern.
5. The method of claim 1, wherein the speech recognition model comprises an acoustic model and a language model, and wherein recognizing text information corresponding to the speech data by the speech recognition model comprises:
identifying each spoken Chinese character in the speech data as a tonal Chinese syllable by the acoustic model, which is based on a deep convolutional neural network, wherein the Chinese syllable comprises the pinyin corresponding to the character's pronunciation and a digit corresponding to its tone;
converting the tonal Chinese syllables into corresponding Chinese character information by the language model, which is based on a Transformer model.
6. The method of claim 5, wherein prior to converting the tonal Chinese syllables into corresponding Chinese character information by the Transformer-based language model, the method further comprises:
connecting a fully connected layer and a softmax layer to the output of the encoder of the Transformer model;
employing the encoder with the fully connected layer and the softmax layer as the language model.
7. The method of claim 1, wherein the preset presentation effect comprises at least one of a pattern style, a pattern parameter, and a dynamic and static effect.
8. An apparatus for generating a pattern on an electronic panel, the apparatus comprising:
the acquisition module is used for acquiring voice data input by a user and a preset display effect of the pattern;
the recognition module is used for recognizing the text information corresponding to the voice data through the voice recognition model;
a generation module, configured to perform feature extraction and semantic understanding on the text information using a text-to-image model, and generate a pattern with the preset display effect corresponding to the text information;
and the display module is used for displaying the pattern on the electronic panel of the intelligent equipment.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of any of claims 1-7 when executing a program stored on a memory.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-7.
CN202311239713.9A 2023-09-22 2023-09-22 Method and device for generating pattern on electronic panel, electronic equipment and storage medium Pending CN117273147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311239713.9A CN117273147A (en) 2023-09-22 2023-09-22 Method and device for generating pattern on electronic panel, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311239713.9A CN117273147A (en) 2023-09-22 2023-09-22 Method and device for generating pattern on electronic panel, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117273147A true CN117273147A (en) 2023-12-22

Family

ID=89215563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311239713.9A Pending CN117273147A (en) 2023-09-22 2023-09-22 Method and device for generating pattern on electronic panel, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117273147A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination