CN117479127A - 5G-based intelligent terminal and method for Internet of vehicles - Google Patents


Info

Publication number
CN117479127A
Authority
CN
China
Prior art keywords
network
network request
training
logarithmic mel
voice signal
Legal status
Pending
Application number
CN202311804178.7A
Other languages
Chinese (zh)
Inventor
赖海平
章洪亮
龚利恒
Current Assignee
Shenzhen Zhangrui Electronic Co ltd
Original Assignee
Shenzhen Zhangrui Electronic Co ltd
Application filed by Shenzhen Zhangrui Electronic Co ltd
Priority to CN202311804178.7A
Publication of CN117479127A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/2869 Terminals specially adapted for communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/02 Terminal devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A 5G-based intelligent terminal for the Internet of vehicles and a method thereof are disclosed. The Internet of vehicles intelligent terminal includes: a Modem module for connecting to a 5G network to provide network connection and communication functions; a local control program writing module for writing a local control program that controls the network connection and communication functions; a network manager for managing network connections and network configurations; a network connection server for processing network connection services; and a 5G-Tbox network control terminal for providing network requests and Modem telephone voice services. In this way, more intelligent and convenient vehicle network services can be provided, promoting the further development of Internet of vehicles technology.

Description

5G-based intelligent terminal and method for Internet of vehicles
Technical Field
The present application relates to the field of the Internet of vehicles, and more particularly to a 5G-based Internet of vehicles intelligent terminal and method.
Background
An Internet of vehicles intelligent terminal is a device that connects a vehicle to the Internet to enable information exchange and data transmission between the vehicle and its users. With the rapid development of 5G technology and the spread of the Internet of vehicles, 5G-based Internet of vehicles intelligent terminals have become a popular field in the automotive industry. By connecting to a 5G network, such a terminal enables high-speed communication and data transmission between the vehicle and the Internet and provides the vehicle with more intelligent functions and services.
However, the network control unit of existing vehicle-mounted intelligent cockpit systems can provide services only in response to requests from Android. That is, in existing systems the 5G network control flow is excessively coupled to the Android system, so network control depends on the stability and robustness of Android. If the Android system fails, the network control flow of the entire intelligent cockpit system may become abnormal and the network may become inaccessible.
Accordingly, an optimized 5G-based internet of vehicles intelligent terminal is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the present application provide a 5G-based Internet of vehicles intelligent terminal and a method that can provide more intelligent and convenient vehicle network services and promote the further development of Internet of vehicles technology.
According to one aspect of the present application, there is provided a 5G-based internet of vehicles intelligent terminal, which includes:
a Modem module for connecting to a 5G network to provide network connection and communication functions;
a local control program writing module for writing a local control program that controls the network connection and communication functions;
a network manager for managing network connections and network configurations;
a network connection server for processing network connection services; and
a 5G-Tbox network control terminal for providing network requests and Modem telephone voice services.
According to another aspect of the present application, there is provided a management method of a 5G-based internet of vehicles intelligent terminal, including:
acquiring a network request voice signal input by a user;
performing voice recognition on the network request voice signal to obtain a network request text description;
passing the network request text description through a semantic replenishment generator based on an AIGC model to obtain an extended network request text description; and
generating a network request instruction based on the extended network request text description.
Compared with the prior art, the present application provides a 5G-based Internet of vehicles intelligent terminal and a method thereof. The terminal includes: a Modem module for connecting to a 5G network to provide network connection and communication functions; a local control program writing module for writing a local control program that controls the network connection and communication functions; a network manager for managing network connections and network configurations; a network connection server for processing network connection services; and a 5G-Tbox network control terminal for providing network requests and Modem telephone voice services. In this way, more intelligent and convenient vehicle network services can be provided, promoting the further development of Internet of vehicles technology.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings are not drawn to scale; their emphasis is on illustrating the gist of the present application.
Fig. 1 is a schematic block diagram of a 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of the network manager in the 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 3 is a schematic block diagram of the voice recognition module in the 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 4 is a schematic block diagram of the voice signal feature extraction unit in the 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a training module further included in the 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 6 is a flowchart of a management method of a 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a system architecture of a management method of a 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Fig. 8 is an application scenario diagram of a 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort also fall within the scope of the present application.
As used in this application and in the claims, the terms "a," "an," and "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" indicate only that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in this application to describe the operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Rather, the steps may be processed in reverse order or simultaneously, as needed. Other operations may also be added to or removed from these processes.
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
To address the excessive coupling of 5G network control to Android, the present application decouples 5G network control from Android by having a network manager and connectivityService jointly manage the network in a loosely coupled manner, using technologies such as a built-in Modem module and a native control program. Therefore, even if the Android system fails, the 5G-Tbox network control terminal can still provide functions such as network requests and Modem telephone voice services, ensuring the stability and reliability of the system. Decoupling 5G network control from the Android system reduces the impact of Android faults on the entire intelligent cockpit system and provides independent network control capability, so that the system can still access the network normally even when Android is abnormal.
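The loose-coupling idea can be illustrated with a minimal Python sketch. The class names (`NativeNetworkManager`, `AndroidConnectivityAdapter`, `TboxNetworkControl`) are hypothetical and not defined in this application: the native side owns the modem data connection, while the Android-side bridge (standing in for connectivityService) is only notified, so a failure on the Android side does not block a network request.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NativeNetworkManager:
    """Hypothetical native-side manager that owns the modem data connection.

    It runs outside the Android framework, so its state is unaffected by
    Android faults."""
    connected: bool = False
    apn: str = "default"

    def connect(self, apn: Optional[str] = None) -> bool:
        if apn is not None:
            self.apn = apn
        # A real terminal would drive the Modem module here.
        self.connected = True
        return self.connected


@dataclass
class AndroidConnectivityAdapter:
    """Hypothetical Android-side bridge standing in for connectivityService."""
    alive: bool = True
    notified: list = field(default_factory=list)

    def notify(self, event: str) -> None:
        if not self.alive:
            raise RuntimeError("Android framework unavailable")
        self.notified.append(event)


class TboxNetworkControl:
    """Serves network requests through the native manager; Android is optional."""

    def __init__(self, native: NativeNetworkManager,
                 android: Optional[AndroidConnectivityAdapter] = None):
        self.native = native
        self.android = android

    def request_network(self, apn: Optional[str] = None) -> bool:
        ok = self.native.connect(apn)
        # Loose coupling: an Android-side failure is swallowed here instead
        # of blocking the connection that the native side already made.
        if self.android is not None:
            try:
                self.android.notify(f"connected:{self.native.apn}")
            except RuntimeError:
                pass
        return ok
```

In this sketch a dead Android adapter raises an error that the Tbox controller ignores, so `request_network` still succeeds; a production design would likely also log the fault and resynchronize state once Android recovers.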
Fig. 1 is a schematic block diagram of a 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application. As shown in fig. 1, a 5G-based internet of vehicles intelligent terminal 100 according to an embodiment of the present application includes: a Modem module 110 for connecting to a 5G network to provide network connection and communication functions; a local control program writing module 120 for writing a local control program to realize control of network connection and communication functions; a network manager 130 for managing network connections and network configurations; a network connection server 140 for processing services of the network connection; and a 5G-Tbox network control terminal 150 for providing network requests and Modem telephony voice services.
Specifically, in the technical solution of the present application, the 5G-Tbox network control terminal provides network requests and Modem telephone voice services. It can process users' network requests, such as sending and receiving data, and can also provide the Modem telephone voice service to enable voice communication in the vehicle. In existing Internet of vehicles intelligent terminals, users often issue network requests, such as navigation instructions or information queries, by voice. Accurately recognizing and understanding a user's network request voice signal is therefore a technical challenge.
To further optimize the Internet of vehicles intelligent terminal, voice recognition and analysis are performed on the original network request voice signal input by the user in order to extend the original network request instruction. This allows the user's network request requirements to be fully understood and the corresponding control to be performed in time, improving user satisfaction.
Fig. 2 is a schematic block diagram of the network manager 130 of the 5G-based internet of vehicles intelligent terminal according to an embodiment of the present application. As shown in Fig. 2, according to an embodiment of the present application, the network manager 130 includes: an input signal acquisition module 131 for acquiring a network request voice signal input by a user; a voice recognition module 132 for performing voice recognition on the network request voice signal to obtain a network request text description; a semantic replenishment module 133 for passing the network request text description through an AIGC model-based semantic replenishment generator to obtain an extended network request text description; and a network request instruction generation module 134 for generating a network request instruction based on the extended network request text description.
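The data flow through modules 131 to 134 can be summarized as a chain of three functions. The following is a minimal sketch; the function name `handle_voice_request` and the idea of passing each stage in as a callable are illustrative assumptions, not the application's interfaces.

```python
from typing import Callable, Dict, Sequence


def handle_voice_request(
    samples: Sequence[float],
    recognize: Callable[[Sequence[float]], str],
    supplement: Callable[[str], str],
    to_instruction: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Chain the stages of network manager 130 (modules 131 to 134).

    `samples` is the voice signal already acquired by module 131; the three
    callables are hypothetical stand-ins for modules 132, 133 and 134.
    """
    text = recognize(samples)          # module 132: voice signal -> text description
    extended = supplement(text)        # module 133: text -> extended text description
    return to_instruction(extended)    # module 134: extended text -> instruction
```

Keeping each stage injectable lets the recognizer, the AIGC-based generator, and the instruction mapper be replaced or tested in isolation, for example with stub functions that return fixed strings.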
Specifically, in the technical solution of the present application, a network request voice signal input by the user is first acquired and then subjected to voice recognition to obtain a network request text description. In a practical application scenario, the vehicle interior is subject to various noise sources, such as engine noise, driving noise, and air-conditioning noise. These noises may be mixed into the user's network request voice signal, degrading signal quality and impairing voice recognition. Therefore, the network request voice signal is first subjected to noise reduction to obtain a noise-reduced network request voice signal. Noise reduction effectively removes noise components while retaining the main information of the voice signal, so that the subsequent feature extraction and voice recognition algorithms can analyze and recognize the signal more accurately, improving the recognition performance of the system.
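The application does not specify the noise reduction algorithm. As one common choice, the following sketch applies basic magnitude spectral subtraction with NumPy, assuming that the first few frames contain only cabin noise. The frame length, number of noise frames, and spectral floor are illustrative values, not parameters from the application.

```python
import numpy as np


def spectral_subtraction(signal, frame_len=256, noise_frames=5, floor=0.05):
    """Basic magnitude spectral subtraction over non-overlapping frames.

    Assumes the first `noise_frames` frames contain only background noise
    (e.g. engine or air-conditioning sound before the user speaks).
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Average noise magnitude spectrum from the leading noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise estimate but keep a small spectral floor to limit
    # "musical noise" artifacts; the noisy phase is reused unchanged.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

Because every magnitude is only ever reduced, the output never carries more energy than the input; a practical system would more likely use overlapping windows and a continuously updated noise estimate.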
Next, to better perform voice recognition on the noise-reduced network request voice signal, spectral feature analysis is performed so that the signal can be recognized and converted into a corresponding text description. The logarithmic (log) mel spectrogram is a common acoustic feature representation: it combines the mel frequency scale with logarithmic compression and therefore better models the way the human ear perceives pitch. Therefore, in the technical solution of the present application, the log mel spectrogram of the noise-reduced network request voice signal is extracted. Specifically, the log mel spectrogram divides the noise-reduced signal into a series of short time windows and computes the spectral energy distribution on the mel scale within each window, which facilitates spectral analysis of the voice signal.
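A minimal NumPy sketch of log mel spectrogram extraction follows: Hann-windowed short-time Fourier transform, triangular mel filterbank, then logarithmic compression. The sampling rate (16 kHz), FFT size, hop length, number of mel bands, and the HTK-style mel formula are assumptions for illustration; the application does not fix these parameters.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice of formula).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2+1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    """Return a (n_mels, frames) log mel spectrogram of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (frames, n_mels)
    return np.log(mel + eps).T                           # log compression
```

For one second of 16 kHz audio these defaults give a 40 x 97 matrix; the small `eps` keeps the logarithm finite in silent bands.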
In speech recognition tasks, the log mel spectrogram is a common spectral feature representation of speech signals. When analyzing the network request voice signal, attention should be focused on the spatial positions (time-frequency regions) of the spectrum where the key instruction information in the user's request is located. Therefore, to further extract the useful information in the log mel spectrogram, the technical solution of the present application passes the log mel spectrogram through a convolutional neural network model using a spatial attention mechanism to obtain a log mel spectrogram feature matrix. The convolutional layers extract local features along the time and frequency dimensions of the log mel spectrogram, while the spatial attention mechanism focuses on the important spectral regions related to the voice signal features, which facilitates recognition of the user's speech.
To recognize the user's network request voice more accurately and thus fully supplement the network request instruction, the technical solution of the present application fuses the log mel spectrogram feature matrix with the log mel spectrogram using the residual idea. This combines the original spectral information of the network request voice signal with the spectral feature information focused on its important regions, yielding a multi-scale log mel spectrogram that benefits the subsequent voice recognition task.
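The application does not give the layer layout of the attention-based convolutional network. The sketch below assumes one convolution plus ReLU followed by a CBAM-style spatial attention block, written in plain NumPy: channel-wise mean and max maps are convolved into one logit per time-frequency position, and a sigmoid turns them into weights that rescale the features. All weights here are hypothetical inputs rather than trained values.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]                    # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def spatial_attention(feat, w_att):
    """CBAM-style spatial attention: one weight per time-frequency position."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    logits = conv2d_same(pooled, w_att)[0]                     # (H, W)
    weights = 1.0 / (1.0 + np.exp(-logits))                    # sigmoid in (0, 1)
    return feat * weights, weights


def attention_cnn_features(log_mel, w_conv, w_att):
    """Conv + ReLU, then spatial attention; returns (C, n_mels, frames)."""
    feat = np.maximum(conv2d_same(log_mel[None], w_conv), 0.0)
    out, _ = spatial_attention(feat, w_att)
    return out
```

With `w_conv` of shape `(8, 1, 3, 3)` and `w_att` of shape `(1, 2, 7, 7)`, a 40 x 97 log mel spectrogram becomes an 8 x 40 x 97 feature tensor in which low-weight regions are attenuated.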
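The conventional residual fusion, a position-wise weighted sum of the original spectrogram and the learned features, can be sketched as follows. The 1x1 projection weights `proj` that collapse the feature channels to one map, and the branch weight `alpha`, are hypothetical parameters introduced for illustration.

```python
import numpy as np


def residual_fuse(log_mel, feature_maps, proj, alpha=1.0):
    """Residual fusion of the raw log mel spectrogram and CNN features.

    log_mel:      (n_mels, frames) original spectrogram (the shortcut path)
    feature_maps: (C, n_mels, frames) output of the attention CNN
    proj:         (C,) hypothetical 1x1 projection weights mapping the C
                  feature channels back to a single map of the same shape
    alpha:        hypothetical weight of the learned branch
    """
    if feature_maps.shape[1:] != log_mel.shape:
        raise ValueError("feature maps must share the spectrogram's time-frequency shape")
    projected = np.tensordot(proj, feature_maps, axes=(0, 0))   # (n_mels, frames)
    return log_mel + alpha * projected                          # position-wise sum
```

Setting `alpha` to 0 returns the original spectrogram unchanged, which is the property that lets a residual branch start from the raw spectral information and add learned detail on top. The application notes that this plain position-wise sum is the form whose output it then further optimizes value by value.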
The optimized multi-scale log mel spectrogram (the per-value optimization is described below) is then passed through a voice recognizer to obtain the network request text description. Specifically, the voice recognizer converts the spectral features carried by the multi-scale log mel spectrogram into the corresponding network request text description, realizing speech-to-text conversion so that the user's network request can be better understood.
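The application does not specify the architecture of the voice recognizer. As one assumed possibility, the sketch below uses a hypothetical linear read-out that produces per-frame class probabilities, followed by greedy (best-path) CTC decoding: take the most likely symbol in each frame, merge consecutive repeats, and drop the blank symbol.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ctc_greedy_decode(frame_probs, vocab, blank=0):
    """Best-path CTC decoding.

    frame_probs: (frames, len(vocab)) per-frame class probabilities;
    vocab[blank] is the CTC blank symbol.
    """
    best = np.argmax(frame_probs, axis=1).tolist()
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:         # merge repeats, skip blanks
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def recognize_text(fused_spec, w_out, vocab):
    """Hypothetical linear read-out: (n_mels, frames) spectrogram -> text."""
    logits = fused_spec.T @ w_out                # (frames, len(vocab))
    return ctc_greedy_decode(softmax(logits), vocab)
```

For example, with `vocab = ["_", "h", "i"]`, frame-wise best symbols `h h _ i i _ _` decode to `"hi"`, while `h _ h` decodes to `"hh"` because the blank separates the two repeats.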
Further, after the network request text description is obtained, it is processed by a semantic replenishment generator based on an AIGC model to obtain an extended network request text description. That is, the network request text description is input into the AIGC model, which extends it using its encoding capability, and a network request instruction is then generated from the extended description. In this way, the user's network request can be fully recognized and understood, the corresponding actions can be performed in time, and user satisfaction can be improved. In this application, the AIGC model is a semantic replenishment generator based on artificial intelligence technology, referred to here as an Adaptive Information Generation and Comprehension model: a natural language processing model that extends the semantic information of a text by encoding the input text and generating from it. The AIGC model learns the semantics of text from a large amount of language data and generates supplementary information related to the input text. Given the input network request text description, the model uses its internal encoding capability to generate the extended network request text description, enriching the semantic information of the original text. The extended description generated by the AIGC model can then be used for further processing, such as generating network request instructions.
In this way, the system can understand the user's network request more comprehensively, act on it more accurately, and improve user satisfaction. Applied to speech recognition and understanding, the AIGC model supplies additional semantic information by extending the network request text description, which improves the system's understanding of user requests and further improves the user experience.
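The application does not describe the AIGC model's interface or how instructions are derived from the extended text. The following sketch assumes the generator is any function from a prompt string to a string; the prompt wording, the `INTENTS` keyword table, and the instruction dictionary format are all hypothetical.

```python
from typing import Callable, Dict


def build_supplement_prompt(request_text: str) -> str:
    """Hypothetical prompt asking a generative model to make a request explicit."""
    return (
        "Rewrite the in-vehicle network request below so that the intent, "
        "target and any missing parameters are stated explicitly.\n"
        f"Request: {request_text}"
    )


# Hypothetical mapping from intent keywords to instruction types.
INTENTS = {"navigate": "NAVIGATION", "weather": "QUERY", "call": "VOICE_CALL"}


def generate_instruction(request_text: str,
                         generator: Callable[[str], str]) -> Dict[str, str]:
    """Extend the request with the generator, then map it to an instruction."""
    extended = generator(build_supplement_prompt(request_text))
    lowered = extended.lower()
    for keyword, kind in INTENTS.items():        # first matching keyword wins
        if keyword in lowered:
            return {"type": kind, "text": extended}
    return {"type": "UNKNOWN", "text": extended}
```

With a stub generator that returns "Navigate to the saved home address using the fastest route", the request "navigate home" yields a `NAVIGATION` instruction carrying the extended text. Simple substring matching is only a placeholder; a real system would parse the generator's output more robustly.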
Accordingly, as shown in Fig. 3, the voice recognition module 132 includes: a voice signal noise reduction unit 1321 for performing noise reduction on the network request voice signal to obtain a noise-reduced network request voice signal; a voice signal feature extraction unit 1322 for performing signal processing on the noise-reduced network request voice signal to obtain a voice signal feature; and a network request text generation unit 1323 for determining the network request text description based on the voice signal feature.
More specifically, as shown in Fig. 4, the voice signal feature extraction unit 1322 includes: a voice signal spectrum analysis subunit 13221 for extracting a log mel spectrogram of the noise-reduced network request voice signal; a voice signal spectrum feature extraction subunit 13222 for performing feature extraction on the log mel spectrogram through a feature extractor based on a deep neural network model to obtain a log mel spectrogram feature matrix; and a residual feature fusion unit 13223 for fusing the log mel spectrogram feature matrix with the log mel spectrogram using the residual idea to obtain a multi-scale log mel spectrogram as the voice signal feature. Here, the residual idea refers to introducing residual connections into a deep neural network: a skip connection passes a layer's input directly to a later layer, where it is added to that layer's output, so that the network only has to learn the residual between them. In voice signal feature extraction, the residual feature fusion unit applies this idea by fusing the log mel spectrogram feature matrix with the log mel spectrogram to obtain the multi-scale log mel spectrogram. Through the residual connection, the network can better learn the details and variations of the voice signal, which improves the expressive power of the features and thus the accuracy and robustness of voice recognition. Because the fused result combines the original spectrogram with the learned features, it captures voice features on different time scales and provides more comprehensive information for the subsequent voice recognition task.
The introduction of residual ideas and the operation of residual feature fusion can enhance the representation capability of voice signal features and improve the performance of voice recognition.
More specifically, the deep neural network model is a convolutional neural network model using a spatial attention mechanism. The spatial attention mechanism is a variant of the attention mechanism. A conventional attention mechanism computes attention weights from the correlation between each input position and a single target position, so it may focus only on local information related to that target while ignoring the relationships among other positions. In the spatial attention mechanism, an attention weight is computed for every spatial position of the feature map (for a log mel spectrogram, every time-frequency position) from information gathered across positions, and the features are rescaled by these weights. This lets the model capture dependencies across the whole map and emphasize the regions most relevant to the spoken request. In voice signal processing, a convolutional neural network model with spatial attention can therefore extract the features of the voice signal better, helping the voice recognizer generate an accurate network request text description and improving the accuracy and performance of voice recognition.
More specifically, the network request text generation unit 1323 is configured to: optimize each feature value of the multi-scale log mel spectrogram to obtain an optimized multi-scale log mel spectrogram; and pass the optimized multi-scale log mel spectrogram through a voice recognizer to obtain the network request text description.
In the technical solution of the present application, a mismatch arises when the log mel spectrogram feature matrix and the log mel spectrogram are fused using the residual idea. The log mel spectrogram is obtained by frequency-domain analysis of the network request voice signal based on the short-time Fourier transform, whereas the feature matrix is obtained by explicit convolution-kernel-based spatial encoding of the log mel spectrogram. There is therefore a significant difference in feature expression between the high-dimensional feature manifold of the feature matrix in the high-dimensional feature space and the log mel spectrogram itself; that is, the two are dimensionally misaligned. Consequently, if the conventional residual computation (a position-wise weighted sum of the feature matrix and the log mel spectrogram) is applied directly, the resulting multi-scale log mel spectrogram has a multi-dimensional, multi-depth associated semantic feature distribution.
Because these multi-dimensional, multi-depth associated semantic feature distributions differ from one another, the associated semantic feature distribution of the multi-scale log mel spectrogram becomes sparse with respect to its dimension subsets. As a result, when the multi-scale log mel spectrogram undergoes quasi-probabilistic regression mapping in the voice recognizer, the probability density distribution of the regression probability of each feature value converges poorly, which reduces the accuracy of the network request text description produced by the voice recognizer.
Therefore, preferably, each feature value of the multi-scale log mel spectrogram is optimized using the following optimization formula to obtain the optimized multi-scale log mel spectrogram:
[Optimization formula: not recoverable from the extracted text.]
where M is the multi-scale log mel spectrogram, m_i and m_j are its i-th and j-th feature values, μ is the global feature mean of M, and m'_i is the i-th feature value of the optimized multi-scale log mel spectrogram.
Specifically, the sparse distribution of the multi-scale log mel spectrogram in the high-dimensional feature space causes a local mismatch in its probability density distribution in probability space. The optimization applies regularized, globally self-consistent class coding that imitates the global self-consistent relation of the coding behavior of the spectrogram's high-dimensional features in probability space. This adjusts the error landscape of the feature manifold in the high-dimensional open space domain and achieves a self-consistent, matching-type embedding of the spectrogram's encoding in an explicit probability space. It thereby improves the convergence of the probability density distribution of the regression probabilities of the multi-scale log mel spectrogram and, in turn, the accuracy of the network request text description obtained through the voice recognizer.
Further, in the technical solution of the present application, the 5G-based Internet of Vehicles intelligent terminal further includes a training module for training the convolutional neural network model using the spatial attention mechanism, the speech recognizer, and the AIGC-model-based semantic replenishment generator. The training module is the component that trains these models within the terminal. In speech recognition, training plays a vital role: the models are trained on a large amount of labeled data with suitable algorithms so that they can better understand and process speech signals. Through training, the models learn and gradually optimize their parameters, improving their ability to recognize and understand speech. The labeled data typically include speech signals paired with the corresponding text or commands. The data are fed into the model, the model output is compared with the labels, and the model parameters are adjusted by backpropagation so that they gradually approach an optimal state. Through this iterative process, the model learns the characteristics and patterns of speech signals, improving recognition accuracy and robustness.
Effective training improves the performance of the whole system and gives the 5G-based Internet of Vehicles intelligent terminal more reliable and efficient speech recognition.
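The parameter-update loop described above (feed labeled data in, compare the model output with the labels, adjust the parameters along the negative gradient, repeat) can be illustrated with a minimal sketch. This is only a stand-in under stated assumptions: a linear model fitted by full-batch gradient descent on mean squared error, not the spatial-attention CNN, speech recognizer, or AIGC generator of the present application. The function name `train_linear_model`, the learning rate, and the epoch count are hypothetical.

```python
import numpy as np

def train_linear_model(x, y, lr=0.1, epochs=200):
    """Hypothetical stand-in: fit y ~ x @ w + b by full-batch gradient descent on MSE."""
    n_samples, n_features = x.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        pred = x @ w + b                        # forward pass: model output
        err = pred - y                          # compare the output with the labels
        grad_w = 2.0 * x.T @ err / n_samples    # dL/dw for L = mean(err ** 2)
        grad_b = 2.0 * err.mean()               # dL/db
        w -= lr * grad_w                        # step against the gradient
        b -= lr * grad_b
    return w, b

# Noise-free synthetic "labeled data" so convergence is easy to check.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w + 0.3
w, b = train_linear_model(x, y)
```

For a deep network the gradients would come from automatic differentiation rather than the closed-form expressions used here, but the update rule has the same shape.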
In one specific example, as shown in Fig. 5, the training module 200 includes: a training data acquisition unit 210 for acquiring training data, which include a training network request voice signal input by a user and the ground-truth description of the extended network request text description; a training signal noise reduction processing unit 220 for performing noise reduction on the training network request voice signal to obtain a training noise-reduced network request voice signal; a training signal spectrum analysis unit 230 for extracting a training logarithmic mel spectrogram from the training noise-reduced network request voice signal; a training signal spectrum feature extraction unit 240 for passing the training logarithmic mel spectrogram through the convolutional neural network model using the spatial attention mechanism to obtain a training logarithmic mel spectrogram feature matrix; a matrix expansion unit 250 for expanding (flattening) the training logarithmic mel spectrogram feature matrix into a logarithmic mel spectrogram expansion feature vector and expanding the training logarithmic mel spectrogram into a logarithmic mel spectrogram expansion vector; a common manifold implicit similarity loss unit 260 for calculating the common manifold implicit similarity factor of the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector to obtain a common manifold implicit similarity loss function value; and a model training unit 270 for training the convolutional neural network model using the spatial attention mechanism, the speech recognizer, and the AIGC-model-based semantic replenishment generator based on the common manifold implicit similarity loss function value, by backpropagation along the direction of gradient descent.
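The spectrum analysis of unit 230 and the flattening of unit 250 can be sketched as follows, assuming a conventional log-mel pipeline: a Hann-windowed short-time Fourier transform, triangular mel filters on the HTK-style mel scale, a natural logarithm, then row-major flattening into a column vector. The frame length, hop, number of mel bands, and helper names (`mel_filterbank`, `log_mel_spectrogram`) are illustrative assumptions; the application does not specify them, and the noise reduction of unit 220 is omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)      # HTK-style mel scale (assumed)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an (n_fft // 2 + 1)-bin power spectrum to n_mels bands."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):               # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):              # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    """Return an (n_mels, n_frames) log-mel spectrogram of a 1-D signal."""
    if len(signal) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft   # (n_frames, n_bins)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T                   # (n_frames, n_mels)
    return np.log(mel + eps).T                                          # eps avoids log(0)

# One second of a 440 Hz tone at 16 kHz as a toy input.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
expansion_vector = spec.reshape(-1, 1)   # row-major flattening into a column vector
```

With a 1-second, 16 kHz input, a 512-point frame and a 160-sample hop give 97 frames, so the spectrogram is 40 × 97 and the expansion vector has 3,880 entries.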
In particular, in the technical solution of the present application, there is a dimensional alignment deviation between the training logarithmic mel spectrogram feature matrix and the training logarithmic mel spectrogram. Consequently, if the two are fused directly by a conventional residual computation (a position-wise weighted sum of the feature matrix and the spectrogram), the high-dimensional feature manifold of the resulting training multi-scale logarithmic mel spectrogram has poor geometric monotonicity, which impairs the accuracy of subsequent speech recognition.
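For reference, the conventional residual fusion that the passage above contrasts against (a position-wise weighted sum of the feature matrix and the spectrogram) might look like the sketch below. The function name and the mixing weight `alpha` are hypothetical; the explicit shape check shows that such a sum is only defined once the two inputs have matching dimensions.

```python
import numpy as np

def residual_fuse(feature_matrix, log_mel, alpha=0.5):
    """Hypothetical conventional fusion: position-wise weighted sum of two same-shape arrays."""
    if feature_matrix.shape != log_mel.shape:
        raise ValueError(f"shape mismatch: {feature_matrix.shape} vs {log_mel.shape}")
    return alpha * feature_matrix + (1.0 - alpha) * log_mel

feat = np.full((2, 3), 2.0)
spec = np.zeros((2, 3))
fused = residual_fuse(feat, spec, alpha=0.25)   # every entry is 0.25 * 2.0 = 0.5
```

The application's concern is that this purely position-wise combination, applied to representations with a dimensional alignment deviation, yields a feature manifold with poor geometric monotonicity.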
To address this dimensional alignment problem, the applicant introduces, as a loss function, a common manifold implicit similarity factor between the logarithmic mel spectrogram expansion feature vector obtained by expanding the training logarithmic mel spectrogram feature matrix and the logarithmic mel spectrogram expansion vector obtained by expanding the training logarithmic mel spectrogram.
Accordingly, in a specific example, the common manifold implicit similarity loss unit 260 is configured to calculate the common manifold implicit similarity factor of the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector using the following loss formula, to obtain the common manifold implicit similarity loss function value; the loss formula is:
where the symbols in the formula denote, respectively: the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector (both in column-vector form); the transpose operation; the two-norm of a vector; the square root of the Frobenius norm of a matrix; a weight hyperparameter; vector multiplication; position-wise multiplication; position-wise difference; and the common manifold implicit similarity loss function value.
Here, the common manifold implicit similarity factor uses the structured association between the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector to represent the common manifold of their respective feature manifolds across dimensions, and imposes shared constraints on manifold structural factors such as variability, correspondence, and relevance. It thereby measures the distribution similarity of geometric derivative structure representations that depend on the common manifold. This improves the geometric monotonicity of the high-dimensional feature manifold of the training logarithmic mel spectrogram feature matrix relative to the training logarithmic mel spectrogram, and improves the representation of the training multi-scale logarithmic mel spectrogram. In addition, the network state can be monitored in real time and its grade label displayed, so that the state of the Internet of Vehicles network can be judged more intuitively and the user avoids operations that require a stable connection when network conditions are poor. This improves user satisfaction with the Internet of Vehicles intelligent terminal, allows timely measures to resolve network problems, and ensures normal operation of the Internet of Vehicles. The user's spoken network requests can thus be fully recognized and understood, and corresponding actions can be taken promptly to improve user satisfaction.
In summary, the 5G-based Internet of Vehicles intelligent terminal 100 according to the embodiments of the present application has been described. It can provide more intelligent and convenient vehicle network services and promote the further development of Internet of Vehicles technology.
As described above, the 5G-based Internet of Vehicles intelligent terminal 100 according to the embodiments of the present application may be implemented in various terminal devices, for example, a server on which the management algorithm of the 5G-based Internet of Vehicles intelligent terminal according to the embodiments of the present application is deployed. In one example, the terminal 100 may be integrated into the terminal device as a software module and/or a hardware module. For example, it may be a software module in the operating system of the terminal device or an application program developed for the terminal device; it may also be one of the many hardware modules of the terminal device.
Alternatively, in another example, the 5G-based Internet of Vehicles intelligent terminal 100 according to the embodiments of the present application and the terminal device may be separate devices, in which case the terminal 100 may be connected to the terminal device through a wired and/or wireless network and exchange interaction information in an agreed data format.
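The agreed data format is not specified in the application. As an illustration only, a versioned JSON envelope could serve as such a format; the field names (`version`, `type`, `payload`), the message type string, and the helper functions below are hypothetical and do not describe the application's actual protocol.

```python
import json

# Hypothetical envelope schema for messages between terminal 100 and the terminal device.
SCHEMA_VERSION = 1
REQUIRED_FIELDS = ("version", "type", "payload")

def encode_message(msg_type: str, payload: dict) -> bytes:
    """Serialize a message deterministically (sorted keys, no extra whitespace)."""
    envelope = {"version": SCHEMA_VERSION, "type": msg_type, "payload": payload}
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True).encode("utf-8")

def decode_message(raw: bytes) -> dict:
    """Parse a message and reject envelopes that do not match the agreed schema."""
    msg = json.loads(raw.decode("utf-8"))
    missing = [f for f in REQUIRED_FIELDS if f not in msg]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if msg["version"] != SCHEMA_VERSION:
        raise ValueError(f"unsupported version: {msg['version']}")
    return msg

raw = encode_message("network_request", {"text": "connect to 5G"})
msg = decode_message(raw)
```

Validating the version field on receipt lets either side reject messages from a peer that follows a different revision of the agreed format.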
Fig. 6 is a flowchart of a management method of a 5G-based Internet of Vehicles intelligent terminal according to an embodiment of the present application, and Fig. 7 is a schematic diagram of the system architecture of that method. As shown in Figs. 6 and 7, the management method of the 5G-based Internet of Vehicles intelligent terminal according to an embodiment of the present application includes: S110, acquiring a network request voice signal input by a user; S120, performing voice recognition on the network request voice signal to obtain a network request text description; S130, passing the network request text description through an AIGC-model-based semantic replenishment generator to obtain an extended network request text description; and S140, generating a network request instruction based on the extended network request text description.
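The four steps S110 to S140 form a linear chain, which can be sketched as a pipeline of pluggable stages. The class name, field names, and stub stages below are hypothetical placeholders; real stages would wrap the audio capture, the speech recognizer, the AIGC-model-based semantic replenishment generator, and the instruction builder.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NetworkRequestPipeline:
    """Hypothetical chain of steps S110-S140; each stage is an injected callable."""
    acquire: Callable[[], bytes]            # S110: capture the user's voice signal
    recognize: Callable[[bytes], str]       # S120: voice signal -> request text
    supplement: Callable[[str], str]        # S130: semantic expansion of the text
    to_instruction: Callable[[str], dict]   # S140: expanded text -> request instruction

    def run(self) -> dict:
        audio = self.acquire()
        text = self.recognize(audio)
        expanded = self.supplement(text)
        return self.to_instruction(expanded)

# Stub stages for illustration only; none of them performs real recognition or generation.
pipeline = NetworkRequestPipeline(
    acquire=lambda: b"\x00\x01",
    recognize=lambda audio: "connect to 5G",
    supplement=lambda text: text + " with the strongest available signal",
    to_instruction=lambda text: {"action": "connect", "network": "5G", "detail": text},
)
result = pipeline.run()
```

Keeping each stage as an injected callable lets, for example, the recognizer be replaced or tested in isolation without touching the rest of the chain.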
Those skilled in the art will understand that the specific operations of each step of the above management method of the 5G-based Internet of Vehicles intelligent terminal have been described in detail in the description of the 5G-based Internet of Vehicles intelligent terminal 100 with reference to Figs. 1 to 5, so repeated descriptions are omitted here.
Fig. 8 is an application scenario diagram of a 5G-based Internet of Vehicles intelligent terminal according to an embodiment of the present application. As shown in Fig. 8, in this application scenario, a network request voice signal input by a user (e.g., D in Fig. 8) is first acquired and then input to a server (e.g., S in Fig. 8) on which the management algorithm of the 5G-based Internet of Vehicles intelligent terminal is deployed; the server processes the network request voice signal with that algorithm to obtain the network request text description.
This application uses specific terms to describe its embodiments. References to "a first/second embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic is associated with at least one embodiment of the present application. Therefore, it should be emphasized and appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Furthermore, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable classes or contexts, including any new and useful process, machine, product, or material, or any new and useful improvement thereof. Accordingly, aspects of the present application may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software. Such hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may take the form of a computer product comprising computer-readable program code embodied in one or more computer-readable media.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the following claims. It is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

Claims (10)

1. A 5G-based intelligent terminal for the Internet of Vehicles, characterized by comprising:
a Modem module for connecting to a 5G network to provide network connection and communication functions;
a local control program writing module for writing a local control program to control the network connection and communication functions;
a network manager for managing network connections and network configurations;
a network connection server for processing network connection services; and
a 5G-Tbox network control terminal for providing network requests and Modem telephone voice services.
2. The 5G-based internet of vehicles intelligent terminal of claim 1, wherein the network manager comprises:
an input signal acquisition module for acquiring a network request voice signal input by a user;
a voice recognition module for performing voice recognition on the network request voice signal to obtain a network request text description;
a semantic supplementing module for passing the network request text description through an AIGC-model-based semantic replenishment generator to obtain an extended network request text description; and
a network request instruction generation module for generating a network request instruction based on the extended network request text description.
3. The 5G-based internet of vehicles intelligent terminal of claim 2, wherein the voice recognition module comprises:
a voice signal noise reduction unit for performing noise reduction on the network request voice signal to obtain a noise-reduced network request voice signal;
a voice signal feature extraction unit for performing signal processing on the noise-reduced network request voice signal to obtain voice signal features; and
a network request text generation unit for determining the network request text description based on the voice signal features.
4. The 5G-based internet of vehicles intelligent terminal of claim 3, wherein the voice signal feature extraction unit comprises:
a voice signal spectrum analysis subunit for extracting a logarithmic mel spectrogram of the noise-reduced network request voice signal;
a voice signal spectrum feature extraction subunit for performing feature extraction on the logarithmic mel spectrogram with a feature extractor based on a deep neural network model to obtain a logarithmic mel spectrogram feature matrix; and
a residual feature fusion unit for fusing the logarithmic mel spectrogram feature matrix and the logarithmic mel spectrogram based on the residual concept to obtain a multi-scale logarithmic mel spectrogram as the voice signal features.
5. The 5G-based internet of vehicles intelligent terminal of claim 4, wherein the deep neural network model is a convolutional neural network model using a spatial attention mechanism.
6. The 5G-based internet of vehicles intelligent terminal of claim 5, wherein the network request text generation unit is configured to:
optimize each feature value of the multi-scale logarithmic mel spectrogram to obtain an optimized multi-scale logarithmic mel spectrogram; and
pass the optimized multi-scale logarithmic mel spectrogram through a speech recognizer to obtain the network request text description.
7. The 5G-based internet of vehicles intelligent terminal of claim 6, wherein optimizing each feature value of the multi-scale logarithmic mel spectrogram to obtain the optimized multi-scale logarithmic mel spectrogram comprises: optimizing each feature value of the multi-scale logarithmic mel spectrogram using the following optimization formula to obtain the optimized multi-scale logarithmic mel spectrogram;
wherein the optimization formula is:
wherein the symbols in the formula denote, respectively: the multi-scale logarithmic mel spectrogram; its i-th and j-th feature values; its global feature mean; and the i-th feature value of the optimized multi-scale logarithmic mel spectrogram.
8. The 5G-based internet of vehicles intelligent terminal of claim 7, further comprising a training module for training the convolutional neural network model using the spatial attention mechanism, the speech recognizer, and the AIGC-model-based semantic replenishment generator;
wherein the training module comprises:
a training data acquisition unit for acquiring training data, the training data comprising a training network request voice signal input by a user and a ground-truth description of the extended network request text description;
a training signal noise reduction processing unit for performing noise reduction on the training network request voice signal to obtain a training noise-reduced network request voice signal;
a training signal spectrum analysis unit for extracting a training logarithmic mel spectrogram of the training noise-reduced network request voice signal;
a training signal spectrum feature extraction unit for passing the training logarithmic mel spectrogram through the convolutional neural network model using the spatial attention mechanism to obtain a training logarithmic mel spectrogram feature matrix;
a matrix expansion unit for expanding the training logarithmic mel spectrogram feature matrix into a logarithmic mel spectrogram expansion feature vector and expanding the training logarithmic mel spectrogram into a logarithmic mel spectrogram expansion vector;
a common manifold implicit similarity loss unit for calculating a common manifold implicit similarity factor of the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector to obtain a common manifold implicit similarity loss function value; and
a model training unit for training the convolutional neural network model using the spatial attention mechanism, the speech recognizer, and the AIGC-model-based semantic replenishment generator based on the common manifold implicit similarity loss function value, by backpropagation along the direction of gradient descent.
9. The 5G-based internet of vehicles intelligent terminal of claim 8, wherein the common manifold implicit similarity loss unit is configured to:
calculate the common manifold implicit similarity factor of the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector using the following loss formula to obtain the common manifold implicit similarity loss function value;
wherein the loss formula is:
wherein the symbols in the formula denote, respectively: the logarithmic mel spectrogram expansion feature vector and the logarithmic mel spectrogram expansion vector (both in column-vector form); the transpose operation; the two-norm of a vector; the square root of the Frobenius norm of a matrix; a weight hyperparameter; vector multiplication; position-wise multiplication; position-wise difference; and the common manifold implicit similarity loss function value.
10. A 5G-based management method for an Internet of Vehicles intelligent terminal, characterized by comprising the following steps:
acquiring a network request voice signal input by a user;
performing voice recognition on the network request voice signal to obtain a network request text description;
passing the network request text description through an AIGC-model-based semantic replenishment generator to obtain an extended network request text description; and
generating a network request instruction based on the extended network request text description.
CN202311804178.7A 2023-12-26 2023-12-26 5G-based intelligent terminal and method for Internet of vehicles Pending CN117479127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311804178.7A CN117479127A (en) 2023-12-26 2023-12-26 5G-based intelligent terminal and method for Internet of vehicles

Publications (1)

Publication Number Publication Date
CN117479127A 2024-01-30

Family

ID=89623773

Country Status (1)

Country Link
CN (1) CN117479127A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784849A (en) * 2018-07-27 2020-02-11 福特全球技术公司 Cellular pair V2X authentication and authorization
US20230113950A1 (en) * 2021-10-07 2023-04-13 Nvidia Corporation Unsupervised alignment for text to speech synthesis using neural networks
CN116740654A (en) * 2023-08-14 2023-09-12 安徽博诺思信息科技有限公司 Substation operation prevention and control method based on image recognition technology
CN117234341A (en) * 2023-11-15 2023-12-15 中影年年(北京)文化传媒有限公司 Virtual reality man-machine interaction method and system based on artificial intelligence
CN117236665A (en) * 2023-11-14 2023-12-15 中国信息通信研究院 Material production scheduling optimization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination