CN108109633A - System and method for unattended cloud speech corpus collection and intelligent product testing - Google Patents

System and method for unattended cloud speech corpus collection and intelligent product testing

Info

Publication number
CN108109633A
CN108109633A (application CN201711384472.1A)
Authority
CN
China
Prior art keywords
audio
data
test
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711384472.1A
Other languages
Chinese (zh)
Inventor
靳源
冯大航
陈孝良
苏少炜
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING WISDOM TECHNOLOGY Co Ltd
Beijing SoundAI Technology Co Ltd
Original Assignee
BEIJING WISDOM TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING WISDOM TECHNOLOGY Co Ltd filed Critical BEIJING WISDOM TECHNOLOGY Co Ltd
Priority to CN201711384472.1A priority Critical patent/CN108109633A/en
Publication of CN108109633A publication Critical patent/CN108109633A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a system for unattended cloud speech corpus collection and intelligent product testing, including: a corpus data collection and storage unit, which completes the collection of original audio data and stores it in the cloud, and which comprises a recording device, a self-service collection module and a cloud server; and a test data generation and application unit, which batch-generates test audio signals of a specified format from the original audio data in the cloud to test the intelligent product under test, and automatically aligns and labels the device audio data returned from the test, and which comprises a processing module and a playback device. The disclosure raises the proportion of valid data collected and the collection efficiency; the speaker can complete the collection process independently, so that collection is unattended; data is uploaded to the cloud in real time, which avoids accidental data loss; and test data of a specified format can be generated automatically in batches, so that the device audio data can be aligned with the test data.

Description

System and method for unattended cloud speech corpus collection and intelligent product testing
Technical field
This disclosure relates to the field of voice data, and more particularly to a system and method for unattended cloud speech corpus collection and intelligent product testing.
Background technology
After a long period of incubation and accumulation, speech recognition technology has reached a level suitable for large-scale commercial use in recent years, setting off a wave of research and development on smart homes, intelligent in-vehicle systems and a range of speech recognition software. Achieving intelligent, human-centered and effective human-machine interaction and building an efficient, natural environment for human-machine communication have become pressing demands of current information technology applications and development. Deep neural networks are an important research direction in today's speech recognition; they require massive training-set speech data to learn more accurate acoustic models and to improve recognition accuracy. Building a large-scale speech corpus with high fidelity, high naturalness and high accuracy is equally important to the stability of speech synthesis systems. As speech recognition products multiply, testers also need large amounts of audio data to guarantee product quality, and processing that audio data consumes a great deal of the testers' effort. In summary, collecting audio data efficiently, in high quality and in large volume to build a speech corpus, and processing that audio data, have become particularly important.
First, the traditional way of collecting a speech corpus requires recording staff to guide the speaker through the recording of the corpus material in a specific playback environment. This approach relies on a large amount of manual work: recording staff must operate the software to configure the sound card and the recording settings, and lengthy editing and labeling follow, such as manually fixing mis-recorded passages and adjusting the volume balance of each audio segment, so both collection efficiency and quality suffer heavily. Second, this approach usually stores the data on the collection device and uploads it to the cloud server as a whole afterwards, which carries many risks: if an emergency occurs during recording, such as a sudden power failure or sudden equipment damage, the collected data is not saved, or data may be lost through accidental deletion during manual sorting. Finally, the traditional test procedure requires testers to splice every audio file of the test corpus into a playback source and to record the intelligent product under test for a long time. Because of hardware problems in the device or problems in its internal audio-processing algorithms, the recorded data and the original speech data are frequently misaligned, which affects the measured speech recognition rate and wake-up rate of the intelligent product and the accuracy of machine-learning training models. In addition, different products have different amounts of memory and therefore different recording-length limits, so the storage footprint of the playback audio files must be adjusted for each product, which further increases the workload of testers and developers.
Summary of the disclosure
(1) Technical problems to be solved
The present disclosure provides a system and method for unattended cloud speech corpus collection and intelligent product testing, in order to at least partly solve the technical problems set forth above.
(2) Technical solution
According to one aspect of the disclosure, a system for unattended cloud speech corpus collection and intelligent product testing is provided, including: a corpus data collection and storage unit, which completes the collection of original audio data and stores it in the cloud, and which comprises: a recording device, which captures the speaker's audio; a self-service collection module, which obtains the audio captured by the recording device through the sound card and matches it with the corpus text to generate original audio data; and a cloud server, connected to the self-service collection module, which stores the original audio data in the cloud; and a test data generation and application unit, which batch-generates test audio signals of a specified format from the original audio data in the cloud to test the intelligent product under test, and which comprises: a processing module, connected to the cloud server, which obtains the original audio data from the cloud and generates the test audio signals; and a playback device, connected to the processing module, which plays the test audio signals under the control of the processing module for testing the intelligent product under test.
In some embodiments of the disclosure, the processing module is further configured to automatically align and label the device audio data returned from the test, including: obtaining the device audio data generated by the intelligent product under test by capturing the test audio signal, and multiplying all time coordinates in the time-label file of the original audio data by a ratio α to obtain new time coordinates, thereby generating the time-label file of the device audio data, where the ratio α is the ratio of the duration of the device audio data to the duration of the original audio data.
In some embodiments of the disclosure, the self-service collection module is further configured to display the number of texts already read and the number remaining, to judge whether the speaker has misread, and, in response to the speaker's operations, to pause and resume recording during the recording process.
According to another aspect of the disclosure, a method for unattended cloud speech corpus collection and intelligent product testing is provided, including:
Step S1: the speaker completes the corpus collection by himself or herself through the recording device and the self-service collection module, and the audio data is uploaded to the cloud server in real time;
Step S2: the processing module extracts the original audio data from the cloud, generates a test audio signal, and plays it through the playback device;
Step S3: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module performs the calculations to generate the time-label file of the device audio data and outputs the test result.
In some embodiments of the disclosure, step S2 further comprises:
Step S21: configure the default test data duration and the silence duration to be inserted between audio segments, and initialize the buffer;
Step S22: randomly select an audio file from the corpus, splice it onto the previous audio, append the silence after each segment in the loop, and compute the total audio length;
Step S23: in the loop, record the running duration at each audio segment as the time label Tk and generate the label text file;
Step S24: judge whether the total audio length exceeds the set length; if it exceeds the set length, go to step S25; if it does not exceed the set length, judge whether there is a new audio file, and if so return to step S22, otherwise finish generating the test audio signal;
Step S25: insert chirp signals at both ends of the overall signal, the chirp signal being expressed as:
x(t) = A·cos[φ0 + 2π(f0·t + (k/2)·t²)], with f0 = fl and sweep rate k = (fh - fl)/T,
where fl is the start frequency of the swept signal, fh is the end frequency of the swept signal, φ0 is the phase of the swept signal, T is the duration and A is the amplitude; save the test audio, initialize the buffer, and go to step S22.
In some embodiments of the disclosure, step S3 further comprises:
Step S31: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module reads the generated original audio data and the device audio data;
Step S32: the processing module detects the head and tail chirp endpoints in the audio;
Step S33: from the time coordinates, compute the ratio of the duration of the device-captured audio data to the duration of the original test audio data:
α = (Tyend - Tybeg) / (Txend - Txbeg)
where α corresponds to the ratio between the sample rates of the device-captured audio and the test audio; Tybeg is the start time of the device audio and Tyend is the end time of the device audio; Txbeg is the start time of the original test audio and Txend is the end time of the original test audio.
Step S34: multiply all time coordinates in the original time-label file by α to obtain new time coordinates, and generate the time-label file of the device audio data.
In some embodiments of the disclosure, step S32 further comprises:
Sub-step S321: generate a chirp signal identical to the one in the test audio signal, and reverse the chirp signal in the time domain to obtain the matched filter h(t) = x(T - t);
Sub-step S322: convolve the first tens of seconds of the device-captured audio data y(t) and of the original audio data x(t) with the matched filter, obtaining the matched-filter output signals r1(t) = h(t) * y(t) and r2(t) = h(t) * x(t);
Sub-step S323: in the matched-filter output signals r1(t) and r2(t), find the time coordinate of the maximum point of each signal as that signal's start-point time coordinate; the tail-point time coordinate is detected in the same way.
In some embodiments of the disclosure, step S1 further comprises:
Step S11: read the corpus text file information;
Step S12: judge whether the recording has finished; if so, the recording is complete; if not, go to step S13;
Step S13: alternately display the wake word and the corpus text for the speaker to record, and automatically compute the recording duration of each text segment from its length;
Step S14: for each captured audio segment, compute the time-domain RMS energy X_rms = sqrt((1/N)·Σ x_n²), compare it with the set normalized energy value to obtain the amplification factor a = Y_rms / X_rms, and upload the final normalized audio y_n = a·x_n to the cloud server for storage, where N is the total number of samples of the captured audio, x_n is the captured audio waveform sequence, Y_rms is the set average energy after normalization, and y_n is the normalized audio waveform sequence;
Step S15: during recording, display in real time the number of texts already read and the number remaining;
Step S16: judge whether the speaker has misread; on a recording error, control a re-recording that overwrites the previous data, and return to step S12.
In some embodiments of the disclosure, before reading the corpus text, the method further comprises:
Step S10: collect the speaker's name, which is used to name the saved recording files; set the wake word; and configure the default recording parameters, including the recording sample rate and the quantization precision.
In some embodiments of the disclosure, during recording the speaker can pause and resume the recording through the self-service collection module.
(3) Advantageous effects
It can be seen from the above technical solution that the system and method for unattended cloud speech corpus collection and intelligent product testing of the disclosure have at least one of the following advantages:
1) During collection, every newly captured segment is saved automatically: the program splits the collection by text segment, normalizes the audio volume, stores the result, and uploads the saved signal over WIFI to the designated cloud server on the same LAN. This structure prevents the situation where an accidental interruption during collection leaves the gathered data unsaved, and achieves upload-while-recording;
2) After collection, developers can directly generate, and download from the cloud server, audio data of a user-defined duration with head and tail marker signals added, together with the corresponding time-label text; after this audio data is used as the playback source to record the intelligent product under test, a new label file can be generated automatically;
3) Because a recording device is used, there is no need for staff to communicate with the speaker in real time: the speaker completes the collection work alone, which realizes an unattended collection mode; when the speaker misreads, a re-recording can be triggered in real time, improving collection efficiency and collection quality.
Description of the drawings
Fig. 1 is a schematic structural diagram of the system for unattended cloud speech corpus collection and intelligent product testing according to an embodiment of the present disclosure.
Fig. 2 is a flow chart of the method for unattended cloud speech corpus collection and intelligent product testing according to an embodiment of the present disclosure.
Fig. 3 is a flow chart of the automatic collection program according to an embodiment of the present disclosure.
Fig. 4 is a flow chart of test audio signal generation according to an embodiment of the present disclosure.
Fig. 5 is a flow chart of generating the time-label file of the device audio data according to an embodiment of the present disclosure.
Specific embodiments
To make the purpose, technical solution and advantages of the disclosure clearer, the disclosure is further described below with reference to specific embodiments and the accompanying drawings.
Some embodiments of the disclosure are described more fully hereinafter with reference to the accompanying drawings, in which some but not all of the embodiments are shown. Indeed, the various embodiments of the disclosure can be implemented in many different forms and should not be construed as limited to the embodiments illustrated here; rather, these embodiments are provided so that the disclosure satisfies applicable legal requirements.
In the first exemplary embodiment of the disclosure, a system for unattended cloud speech corpus collection and intelligent product testing is provided. Fig. 1 is a schematic structural diagram of the system for unattended cloud speech corpus collection and intelligent product testing of the first embodiment of the present disclosure. As shown in Fig. 1, the system of the disclosure includes a corpus data collection and storage unit and a test data generation and application unit.
Each component of the system for unattended cloud speech corpus collection and intelligent product testing of the present embodiment is described in detail below.
The corpus data collection and storage unit completes the collection of the original audio data and stores it in the cloud, and includes:
a recording device, for capturing the audio of the speaker reading the corpus text; preferably, the recording device uses a recording microphone and a computer sound card;
a self-service collection module, which obtains the audio captured by the recording device through the sound card and generates the original audio data; preferably, the self-service collection module is a self-service collection PC;
a cloud server, connected to the self-service collection module, for storing the original audio data in the cloud; the cloud server and the self-service collection module are connected by WIFI or by wire.
The test data generation and application unit is mainly used to batch-generate test audio signals of a specified format to test the intelligent product under test, and to automatically align and label the data returned from the test; it includes:
a playback device, connected to the processing module, for playing the test data;
a processing module, connected to the cloud server, for obtaining the original audio data from the cloud, generating the test audio signals, obtaining the returned device audio data, and generating the time-label file of the device audio data; preferably, the processing module is a developer's PC, and the processing module is connected to the cloud server by WIFI, Bluetooth, infrared or a wired connection.
The intelligent product under test is connected to the processing module, captures the audio output by the playback device, and returns the result to the processing module. The processing module multiplies all time coordinates in the time-label file of the original audio data by the ratio α to obtain new time coordinates and generates the time-label file of the device audio data, the ratio α being the ratio of the duration of the device audio data to the duration of the original audio data.
The self-service collection module is further used to display the number of texts already read and the number remaining, to judge whether the speaker has misread, and, in response to the speaker's operations, to pause and resume recording during the recording process.
This concludes the introduction of the system for unattended cloud speech corpus collection and intelligent product testing of the first embodiment of the present disclosure.
In the second exemplary embodiment of the disclosure, a method for unattended cloud speech corpus collection and intelligent product testing is provided. Fig. 2 is a flow chart of the method for unattended cloud speech corpus collection and intelligent product testing according to an embodiment of the present disclosure. As shown in Fig. 2, the method includes:
Step S1: the speaker completes the corpus collection by himself or herself through the recording device and the self-service collection module, and the audio data is uploaded to the cloud server in real time over WIFI.
Step S2: the processing module extracts the original audio data from the cloud, generates a test audio signal, and plays it through the playback device.
Step S3: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module computes the ratio of the durations of the original audio data and the device audio data, generates the time-label file of the device audio data, and outputs the test result.
Fig. 3 is a flow chart of corpus collection according to an embodiment of the present disclosure. As shown in Fig. 3, step S1 further comprises:
Step S11: read the corpus text file information;
Step S12: judge whether the recording has finished; if so, the recording is complete; if not, go to step S13;
Step S13: the speaker reads the wake word and the corpus text, which are displayed alternately, and the program automatically computes the recording duration of each text segment from its length;
Step S14: for each captured audio segment, compute the time-domain RMS energy X_rms = sqrt((1/N)·Σ x_n²), compare it with the set normalized energy value to obtain the amplification factor a = Y_rms / X_rms, and upload the final normalized audio y_n = a·x_n to the cloud server for storage, where N is the total number of samples of the captured audio, x_n is the captured audio waveform sequence, Y_rms is the set average energy after normalization, and y_n is the normalized audio waveform sequence (a code sketch of this normalization and upload follows the step list below);
Step S15: during recording, display in real time the number of texts already read and the number remaining;
Step S16: judge whether the speaker has misread; on a recording error, a re-recording that overwrites the previous data can be triggered, and the flow returns to step S12; preferably, overwriting the previous data includes deleting the last piece of data on the cloud and re-reading the corpus text.
During the above recording process, the speaker can pause and resume the recording.
Before reading the corpus text, the method further comprises:
Step S10: collect the speaker's name, which is used to name the saved recording files; set the wake word; and configure the default recording parameters, such as the recording sample rate and the quantization precision.
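The normalization in step S14 can be illustrated with a short sketch. The snippet below is a minimal, hypothetical Python example (it assumes the numpy, soundfile and requests libraries, and it invents the segment file name and the cloud upload URL, which are not specified in the disclosure); it normalizes one captured segment to a target RMS level and uploads it to a cloud endpoint on the same LAN.

```python
import io

import numpy as np
import requests
import soundfile as sf  # assumed WAV writer; any PCM writer would do

TARGET_RMS = 0.1  # Y_rms: set average energy after normalization (assumed value)
CLOUD_URL = "http://192.168.1.10:8000/upload"  # hypothetical cloud-server endpoint on the LAN


def normalize_segment(x: np.ndarray) -> np.ndarray:
    """Step S14: scale one captured segment so its RMS energy equals TARGET_RMS."""
    x_rms = np.sqrt(np.mean(x ** 2))    # X_rms = sqrt((1/N) * sum(x_n^2))
    a = TARGET_RMS / max(x_rms, 1e-12)  # amplification factor a = Y_rms / X_rms
    return a * x                        # y_n = a * x_n


def upload_segment(y: np.ndarray, sample_rate: int, name: str) -> None:
    """Write the normalized segment to an in-memory WAV and POST it to the cloud server."""
    buf = io.BytesIO()
    sf.write(buf, y, sample_rate, format="WAV", subtype="PCM_16")
    buf.seek(0)
    try:
        requests.post(CLOUD_URL, files={"file": (name, buf, "audio/wav")}, timeout=10)
    except requests.RequestException as exc:
        print(f"upload failed, keeping local copy: {exc}")


if __name__ == "__main__":
    sr = 16000                                      # assumed recording sample rate (step S10)
    segment = np.random.uniform(-0.5, 0.5, sr * 3)  # stand-in for one captured segment
    upload_segment(normalize_segment(segment).astype(np.float32), sr, "speaker01_text0001.wav")
```

Uploading each segment as soon as it is normalized is what gives the upload-while-recording behavior described in the advantageous effects.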
Fig. 4 is a flow chart of test audio signal generation according to an embodiment of the present disclosure. As shown in Fig. 4, step S2 further comprises:
Step S21: configure the default test data duration and the silence duration to be inserted between audio segments, and initialize the buffer;
Step S22: randomly select an audio file from the corpus, splice it onto the previous audio, append the silence after each segment in the loop, and compute the total audio length;
Step S23: in the loop, record the running duration at each audio segment as the time label Tk and generate the label text file;
Step S24: judge whether the total audio length exceeds the set length; if it exceeds the set length, go to step S25; if it does not exceed the set length, judge whether there is a new audio file, and if so return to step S22, otherwise finish generating the test audio signal;
Step S25: insert a linear frequency-modulated signal, i.e. a chirp signal, at both ends of the overall signal, the chirp signal being expressed as x(t) = A·cos[φ0 + 2π(f0·t + (k/2)·t²)], where f0 = fl is the start frequency of the swept signal, fh is its end frequency, k = (fh - fl)/T is the sweep rate, φ0 is the phase of the swept signal, T is the duration and A is the amplitude. In the present embodiment fl is chosen as 2000 Hz, fh as 8000 Hz, the duration T as 500 ms, the amplitude A as 1 and φ0 as 0. Save the test audio, initialize the buffer, and go to step S22 (a code sketch of this generation loop follows).
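The following is a minimal sketch of the generation loop in steps S21 to S25, written in Python with numpy. It is an illustration under assumptions, not the patented implementation: the corpus is modeled as an in-memory list of waveforms, the sample rate, silence duration and target length are invented values, and the label file simply records the cumulative start time Tk of each spliced segment; the chirp uses the parameters given above (2000 Hz to 8000 Hz, 500 ms, A = 1, φ0 = 0).

```python
import numpy as np

SR = 16000            # assumed sample rate, Hz
SILENCE_S = 1.0       # silence inserted after each segment (step S21, assumed value)
TARGET_LEN_S = 60.0   # set length of one test file (step S24, assumed value)


def make_chirp(fl=2000.0, fh=8000.0, T=0.5, A=1.0, phi0=0.0, sr=SR):
    """Linear chirp x(t) = A*cos[phi0 + 2*pi*(fl*t + (k/2)*t^2)], with k = (fh - fl)/T."""
    t = np.arange(int(T * sr)) / sr
    k = (fh - fl) / T
    return A * np.cos(phi0 + 2 * np.pi * (fl * t + 0.5 * k * t ** 2))


def build_test_signal(corpus, rng=np.random.default_rng()):
    """Steps S22-S25: splice random corpus segments with silence, log Tk, add marker chirps."""
    silence = np.zeros(int(SILENCE_S * SR))
    pieces, labels, total = [], [], 0.0              # S21: initialize the buffer
    remaining = list(range(len(corpus)))
    while total < TARGET_LEN_S and remaining:        # S24: stop at the set length or when the corpus is exhausted
        idx = remaining.pop(rng.integers(len(remaining)))  # S22: random pick without repetition
        seg = corpus[idx]
        labels.append((idx, total))                  # S23: Tk = running duration at this segment
        pieces += [seg, silence]
        total += len(seg) / SR + SILENCE_S
    chirp = make_chirp()                             # S25: marker chirps at both ends
    signal = np.concatenate([chirp] + pieces + [chirp])
    return signal, labels


if __name__ == "__main__":
    demo_corpus = [np.random.uniform(-0.3, 0.3, SR * 2) for _ in range(5)]
    sig, labels = build_test_signal(demo_corpus)
    for idx, tk in labels:
        print(f"segment {idx} starts at {tk:.2f} s (offset counted without the leading chirp)")
```

In the patented flow, once the set length is exceeded the file is saved and the buffer re-initialized, so the loop actually emits a sequence of fixed-length test files; the sketch produces a single file for brevity.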
Fig. 5 is a flow chart of generating the time-label file of the device audio data of the intelligent product under test in the present embodiment. As shown in Fig. 5, step S3 further comprises:
Step S31: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module reads the generated original audio data and the device audio data; in the present embodiment, the playback device is a loudspeaker;
Step S32: the processing module detects the head and tail chirp endpoints in the audio;
Step S33: from the time coordinates, compute the ratio of the duration of the device-captured audio to the duration of the original test audio data,
α = (Tyend - Tybeg) / (Txend - Txbeg),
which corresponds to the ratio between the sample rates of the device-captured audio and the test audio; Tybeg is the start time of the device audio and Tyend is its end time; Txbeg is the start time of the original test audio and Txend is the end time of the original test audio;
Step S34: read the duration information of the text labels corresponding to the original audio, i.e. the time coordinates in the original time-label file, multiply all the time coordinates by α to obtain new time coordinates, and generate the time-label file of the device audio data (a code sketch of this remapping follows).
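A minimal sketch of steps S33 and S34 is given below. It assumes the four chirp endpoint times have already been found (for example with the matched filter of the following sub-steps) and that the label file is a simple text format with one "start<TAB>end<TAB>text" line per segment; the endpoint values, file names and file format are illustrative assumptions, not part of the disclosure.

```python
from pathlib import Path


def compute_alpha(t_ybeg: float, t_yend: float, t_xbeg: float, t_xend: float) -> float:
    """Step S33: alpha = (Tyend - Tybeg) / (Txend - Txbeg)."""
    return (t_yend - t_ybeg) / (t_xend - t_xbeg)


def remap_labels(src: Path, dst: Path, alpha: float) -> None:
    """Step S34: multiply every time coordinate in the original label file by alpha."""
    out_lines = []
    for line in src.read_text(encoding="utf-8").splitlines():
        start, end, text = line.split("\t", 2)
        out_lines.append(f"{float(start) * alpha:.3f}\t{float(end) * alpha:.3f}\t{text}")
    dst.write_text("\n".join(out_lines), encoding="utf-8")


if __name__ == "__main__":
    src, dst = Path("original_labels.txt"), Path("device_labels.txt")
    # Tiny example label file so the sketch is self-contained (invented content).
    src.write_text("0.500\t3.200\tturn on the light\n4.200\t6.900\twhat is the weather\n",
                   encoding="utf-8")
    # Example endpoint times in seconds (illustrative values, not measured data).
    alpha = compute_alpha(t_ybeg=0.82, t_yend=61.15, t_xbeg=0.50, t_xend=60.50)
    remap_labels(src, dst, alpha)
    print(dst.read_text(encoding="utf-8"))
```

Scaling by α compensates for the clock offset between the playback chain and the device's recorder, so a label at time t in the original test audio points at time α·t in the device recording.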
Step S32 further comprises the following sub-steps (a code sketch follows them):
Sub-step S321: generate a chirp signal identical to the one in the test audio signal, and reverse the chirp signal in the time domain to obtain the matched filter h(t) = x(T - t);
Sub-step S322: convolve the first tens of seconds of the device-captured audio data y(t) and of the original audio data x(t) with the matched filter, obtaining the matched-filter output signals r1(t) = h(t) * y(t) and r2(t) = h(t) * x(t);
Sub-step S323: in the matched-filter output signals r1(t) and r2(t), find the time coordinate of the maximum point of each signal as that signal's start-point time coordinate; the tail-point time coordinate can be detected in the same way.
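The matched-filter detection of sub-steps S321 to S323 can be sketched as follows in Python with numpy and scipy.signal.fftconvolve. The chirp helper is repeated inline from the earlier sketch; the sample rate, search window length and the synthetic test recording are assumptions made only for this illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

SR = 16000  # assumed sample rate, Hz


def make_chirp(fl=2000.0, fh=8000.0, T=0.5, A=1.0, phi0=0.0, sr=SR):
    t = np.arange(int(T * sr)) / sr
    k = (fh - fl) / T
    return A * np.cos(phi0 + 2 * np.pi * (fl * t + 0.5 * k * t ** 2))


def detect_chirp_start(audio: np.ndarray, sr: int = SR, search_s: float = 30.0) -> float:
    """Sub-steps S321-S323: matched-filter the head of the recording and take the peak time."""
    chirp = make_chirp(sr=sr)
    h = chirp[::-1]                          # S321: time-reversed chirp, h(t) = x(T - t)
    head = audio[: int(search_s * sr)]       # S322: only the first tens of seconds
    r = fftconvolve(head, h, mode="full")    # matched-filter output r(t) = h(t) * head(t)
    peak = int(np.argmax(np.abs(r)))         # S323: maximum of the matched-filter output
    start_sample = peak - (len(h) - 1)       # undo the convolution delay: the peak falls at the chirp's last sample
    return max(start_sample, 0) / sr         # chirp start time in seconds


if __name__ == "__main__":
    # Synthetic device recording: 2 s of low-level noise, the chirp, then more noise (illustrative only).
    rec = np.concatenate([np.random.randn(2 * SR) * 0.01,
                          make_chirp(),
                          np.random.randn(5 * SR) * 0.01])
    print(f"estimated chirp start: {detect_chirp_start(rec):.3f} s (true value 2.000 s)")
```

Running the same detector on both the original test audio and the device recording, at the head and at the tail, yields the four endpoint times Txbeg, Txend, Tybeg and Tyend needed for α.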
With the corpus collection method of the present disclosure, valid data accounts for more than 80% of the total data and collection efficiency is greatly improved; the speaker can complete the collection process independently, making it unattended and saving recording labor costs; data is uploaded to the cloud in real time, avoiding accidental data loss; test data of a specified format can be generated automatically in batches; and the data returned by the intelligent voice product can be aligned with the test data, generating accurate label information.
For the sake of brevity, descriptions of any technical features that are the same as in embodiment 1 above apply here as well and are not repeated.
This concludes the introduction of the method for unattended cloud speech corpus collection and intelligent product testing of the second embodiment of the present disclosure.
The embodiments of the present disclosure have now been described in detail with reference to the accompanying drawings. It should be noted that implementations not shown or described in the drawings or in the text of the specification are in forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements and methods are not limited to the specific structures, shapes or modes mentioned in the embodiments, and those of ordinary skill in the art may simply modify or replace them.
The shapes and sizes of the components in the drawings do not reflect actual sizes and proportions, but merely illustrate the content of the embodiments of the present disclosure. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.
Moreover, the word "comprising" does not exclude the presence of elements or steps not listed in a claim, and the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
In addition, unless steps are specifically described or must occur in a particular sequence, the order of the above steps is not limited to that listed above and may be changed or rearranged according to the desired design. For design or reliability considerations, the above embodiments may be mixed and matched with one another or with other embodiments, i.e. technical features in different embodiments may be freely combined to form further embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teaching herein, and the structure required to construct such a system is apparent from the description above. Moreover, the present disclosure is not directed to any particular programming language; it should be understood that the content of the disclosure described here can be implemented in a variety of programming languages, and the description of a specific language above is made to disclose the best mode of carrying out the disclosure.
The disclosure can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The component embodiments of the disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination of them. Those skilled in the art will appreciate that in practice a microprocessor or a digital signal processor (DSP) may be used to realize some or all of the functions of some or all of the components of the device according to the embodiments of the disclosure. The disclosure may also be implemented as an apparatus or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the disclosure may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose. Moreover, where a device claim enumerates several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various disclosed aspects, in the above description of exemplary embodiments of the disclosure the features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof. However, the method of the disclosure is not to be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The specific embodiments described above further explain in detail the purpose, technical solution and advantageous effects of the present disclosure. It should be understood that the above are merely specific embodiments of the disclosure and do not limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the disclosure shall be included within the scope of protection of the disclosure.

Claims (10)

1. A system for unattended cloud speech corpus collection and intelligent product testing, including:
a corpus data collection and storage unit for completing the collection of original audio data and storing it in the cloud, including:
a recording device for capturing the speaker's audio;
a self-service collection module, which obtains the audio captured by the recording device through the sound card and matches it with the corpus text to generate the original audio data;
a cloud server, connected to the self-service collection module, for storing the original audio data in the cloud; and
a test data generation and application unit for batch-generating, from the original audio data in the cloud, test audio signals of a specified format to test the intelligent product under test, including:
a processing module, connected to the cloud server, for obtaining the original audio data from the cloud and generating the test audio signals; and
a playback device, connected to the processing module, for playing the test audio signals under the control of the processing module for testing the intelligent product under test.
2. The system according to claim 1, wherein
the processing module is further configured to automatically align and label the device audio data returned from the test, including: obtaining the device audio data generated by the intelligent product under test by capturing the test audio signal, and multiplying all time coordinates in the time-label file of the original audio data by a ratio α to obtain new time coordinates, thereby generating the time-label file of the device audio data, wherein the ratio α is the ratio of the duration of the device audio data to the duration of the original audio data.
3. The system according to claim 2, wherein
the self-service collection module is further configured to display the number of texts already read and the number remaining, to judge whether the speaker has misread, and, in response to the speaker's operations, to pause and resume recording during the recording process.
4. A method for unattended cloud speech corpus collection and intelligent product testing, including:
step S1: the speaker completes the corpus collection by himself or herself through a recording device and a self-service collection module, and the audio data is uploaded to a cloud server in real time;
step S2: a processing module extracts the original audio data from the cloud, generates a test audio signal, and plays it through a playback device;
step S3: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module performs the calculations to generate the time-label file of the device audio data and outputs the test result.
5. The method according to claim 4, wherein step S2 further comprises:
step S21: configure the default test data duration and the silence duration to be inserted between audio segments, and initialize the buffer;
step S22: randomly select an audio file from the corpus, splice it onto the previous audio, append the silence after each segment in the loop, and compute the total audio length;
step S23: in the loop, record the running duration at each audio segment as the time label Tk and generate the label text file;
step S24: judge whether the total audio length exceeds the set length; if it exceeds the set length, go to step S25; if it does not exceed the set length, judge whether there is a new audio file, and if so return to step S22, otherwise finish generating the test audio signal;
step S25: insert chirp signals at both ends of the overall signal, the chirp signal being expressed as:
x(t) = A·cos[φ0 + 2π(f0·t + (k/2)·t²)]
where f0 = fl is the start frequency of the swept signal, fh is the end frequency of the swept signal, k = (fh - fl)/T is the sweep rate, φ0 is the phase of the swept signal, T is the duration and A is the amplitude; save the test audio, initialize the buffer, and go to step S22.
6. The method according to claim 4, wherein step S3 further comprises:
step S31: the intelligent product under test captures the audio played by the playback device, generates device audio data and returns it to the processing module; the processing module reads the generated original audio data and the device audio data;
step S32: the processing module detects the head and tail chirp endpoints in the audio;
step S33: from the time coordinates, compute the ratio of the duration of the device-captured audio data to the duration of the original test audio data:
α = (Tyend - Tybeg) / (Txend - Txbeg)
where α corresponds to the ratio between the sample rates of the device-captured audio and the test audio; Tybeg is the start time of the device audio and Tyend is the end time of the device audio; Txbeg is the start time of the original test audio and Txend is the end time of the original test audio;
step S34: multiply all time coordinates in the original time-label file by α to obtain new time coordinates, and generate the time-label file of the device audio data.
7. The method according to claim 6, wherein step S32 further comprises:
sub-step S321: generate a chirp signal identical to the one in the test audio signal, and reverse the chirp signal in the time domain to obtain the matched filter h(t) = x(T - t);
sub-step S322: convolve the first tens of seconds of the device-captured audio data y(t) and of the original audio data x(t) with the matched filter, obtaining the matched-filter output signals r1(t) = h(t) * y(t) and r2(t) = h(t) * x(t);
sub-step S323: in the matched-filter output signals r1(t) and r2(t), find the time coordinate of the maximum point of each signal as that signal's start-point time coordinate; the tail-point time coordinate is detected in the same way.
8. The method according to claim 4, wherein step S1 further comprises:
step S11: read the corpus text file information;
step S12: judge whether the recording has finished; if so, the recording is complete; if not, go to step S13;
step S13: alternately display the wake word and the corpus text for the speaker to record, and automatically compute the recording duration of each text segment from its length;
step S14: for each captured audio segment, compute the time-domain RMS energy X_rms = sqrt((1/N)·Σ x_n²), compare it with the set normalized energy value to obtain the amplification factor a = Y_rms / X_rms, and upload the final normalized audio y_n = a·x_n to the cloud server for storage, where N is the total number of samples of the captured audio, x_n is the captured audio waveform sequence, Y_rms is the set average energy after normalization, and y_n is the normalized audio waveform sequence;
step S15: during recording, display in real time the number of texts already read and the number remaining;
step S16: judge whether the speaker has misread; on a recording error, control a re-recording that overwrites the previous data, and return to step S12.
9. The method according to claim 8, wherein before reading the corpus text the method further comprises:
step S10: collect the speaker's name, which is used to name the saved recording files; set the wake word; and configure the default recording parameters, including the recording sample rate and the quantization precision.
10. The method according to claim 8, wherein during recording the speaker pauses and resumes the recording through the self-service collection module.
CN201711384472.1A 2017-12-20 2017-12-20 System and method for unattended cloud speech corpus collection and intelligent product testing Pending CN108109633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711384472.1A CN108109633A (en) 2017-12-20 2017-12-20 System and method for unattended cloud speech corpus collection and intelligent product testing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711384472.1A CN108109633A (en) 2017-12-20 2017-12-20 System and method for unattended cloud speech corpus collection and intelligent product testing

Publications (1)

Publication Number Publication Date
CN108109633A true CN108109633A (en) 2018-06-01

Family

ID=62210498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711384472.1A Pending CN108109633A (en) 2017-12-20 2017-12-20 The System and method for of unattended high in the clouds sound bank acquisition and intellectual product test

Country Status (1)

Country Link
CN (1) CN108109633A (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619554A (en) * 1994-06-08 1997-04-08 Linkusa Corporation Distributed voice system and method
US20160165050A1 (en) * 2002-03-15 2016-06-09 Intellisist, Inc. System and Method for Message-Based Call Communication
CN1841496A (en) * 2005-03-31 2006-10-04 株式会社东芝 Method and apparatus for measuring speech speed and recording apparatus therefor
CN101310315A (en) * 2005-11-18 2008-11-19 雅马哈株式会社 Language learning device, method and program and recording medium
CN101383103A (en) * 2006-02-28 2009-03-11 安徽中科大讯飞信息科技有限公司 Spoken language pronunciation level automatic test method
CN1831829A (en) * 2006-04-20 2006-09-13 北京理工大学 Method for quickly forming voice data base for key word checkout task
US8019605B2 (en) * 2007-05-14 2011-09-13 Nuance Communications, Inc. Reducing recording time when constructing a concatenative TTS voice using a reduced script and pre-recorded speech assets
CN101089641B (en) * 2007-07-12 2010-09-08 北京中星微电子有限公司 Method, system for implementing audio-frequency equipment test
CN102075988A (en) * 2009-11-24 2011-05-25 中国移动通信集团浙江有限公司 System and method for locating end-to-end voice quality fault in mobile communication network
CN101938391A (en) * 2010-08-31 2011-01-05 中山大学 Voice processing method, system, remote controller, set-top box and cloud server
EP2541544A1 (en) * 2011-06-30 2013-01-02 France Telecom Voice sample tagging
CN102779508A (en) * 2012-03-31 2012-11-14 安徽科大讯飞信息科技股份有限公司 Speech corpus generating device and method, speech synthesizing system and method
CN105788588A (en) * 2014-12-23 2016-07-20 深圳市腾讯计算机系统有限公司 Navigation voice broadcasting method and apparatus
CN105096934A (en) * 2015-06-30 2015-11-25 百度在线网络技术(北京)有限公司 Method for constructing speech feature library as well as speech synthesis method, device and equipment
CN105895077A (en) * 2015-11-15 2016-08-24 乐视移动智能信息技术(北京)有限公司 Recording editing method and recording device
CN105529037A (en) * 2015-12-04 2016-04-27 中国电子科技集团公司第五十研究所 Communication equipment voice quality evaluation testing system and testing method
CN106356052A (en) * 2016-10-17 2017-01-25 腾讯科技(深圳)有限公司 Voice synthesis method and device
CN106710597A (en) * 2017-01-04 2017-05-24 广东小天才科技有限公司 Voice data recording method and device
CN106897379A (en) * 2017-01-20 2017-06-27 广东小天才科技有限公司 Method for automatically generating LRC time axis file of voice file and related equipment
CN107195316A (en) * 2017-04-28 2017-09-22 北京声智科技有限公司 Training data preparation system and method for far field speech recognition
CN106971009A (en) * 2017-05-11 2017-07-21 网易(杭州)网络有限公司 Speech data library generating method and device, storage medium, electronic equipment
CN107221319A (en) * 2017-05-16 2017-09-29 厦门盈趣科技股份有限公司 A kind of speech recognition test system and method
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754822A (en) * 2019-01-22 2019-05-14 平安科技(深圳)有限公司 The method and apparatus for establishing Alzheimer's disease detection model
CN112306857A (en) * 2020-02-24 2021-02-02 北京字节跳动网络技术有限公司 Method and apparatus for testing applications
CN111489741A (en) * 2020-04-07 2020-08-04 四川虹美智能科技有限公司 Method and device for managing voice library
CN111489741B (en) * 2020-04-07 2022-09-06 四川虹美智能科技有限公司 Method and device for managing voice library
CN112652296A (en) * 2020-12-23 2021-04-13 北京华宇信息技术有限公司 Streaming voice endpoint detection method, device and equipment
CN112652296B (en) * 2020-12-23 2023-07-04 北京华宇信息技术有限公司 Method, device and equipment for detecting streaming voice endpoint
CN115171657A (en) * 2022-05-26 2022-10-11 青岛海尔科技有限公司 Voice equipment testing method and device and storage medium
CN115171657B (en) * 2022-05-26 2024-10-22 青岛海尔科技有限公司 Voice equipment testing method and device and storage medium
CN116758939A (en) * 2023-08-21 2023-09-15 北京希尔贝壳科技有限公司 Multi-device audio data alignment method, device and storage medium
CN116758939B (en) * 2023-08-21 2023-11-03 北京希尔贝壳科技有限公司 Multi-device audio data alignment method, device and storage medium

Similar Documents

Publication Publication Date Title
CN108109633A (en) System and method for unattended cloud speech corpus collection and intelligent product testing
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN108711436A Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic
CN110533974A (en) A kind of intelligent Auto-generating Test Paper method, system and computer readable storage medium
CN105912560A (en) Detect sports video highlights based on voice recognition
US20180122260A1 (en) Musical performance evaluation system and method
CN108206027A (en) A kind of audio quality evaluation method and system
Greshler et al. Catch-a-waveform: Learning to generate audio from a single short example
Allen et al. Using self-organizing maps to classify humpback whale song units and quantify their similarity
CN110211592A (en) Intelligent sound data processing equipment and method
Sulun et al. On filter generalization for music bandwidth extension using deep neural networks
Sabathé et al. Deep recurrent music writer: Memory-enhanced variational autoencoder-based musical score composition and an objective measure
CN106297841A (en) Audio follow-up reading guiding method and device
CN109410918A (en) For obtaining the method and device of information
CN110223365A (en) A kind of notes generation method, system, device and computer readable storage medium
TW202036534A (en) Speech synthesis method, device, and equipment
CN107342079A (en) A kind of acquisition system of the true voice based on internet
CN110246489A (en) Audio recognition method and system for children
CN111402922B (en) Audio signal classification method, device, equipment and storage medium based on small samples
CN110364184A (en) Accuracy in pitch appraisal procedure based on depth convolutional neural networks DCNN and CTC algorithm
CN110232277A (en) Detection method, device and the computer equipment at webpage back door
CN106098081A (en) The acoustic fidelity identification method of audio files and device
CN113707111B (en) Method and computer program for processing music score data displayed in multiple lines into playing data
CN103744971B (en) Method and equipment for actively pushing information
CN110415722A (en) Audio signal processing method, storage medium, computer program and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180601)