CN104350545B - Self-recording unit - Google Patents
Self-recording unit
- Publication number
- CN104350545B CN104350545B CN201280073736.0A CN201280073736A CN104350545B CN 104350545 B CN104350545 B CN 104350545B CN 201280073736 A CN201280073736 A CN 201280073736A CN 104350545 B CN104350545 B CN 104350545B
- Authority
- CN
- China
- Prior art keywords
- content
- storage part
- data
- information storage
- identification data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/82—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
- H04N9/8205—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
Abstract
A self-recording unit according to the present invention performs speech recognition on broadcast data and, based on the recognition result obtained, extracts identification data such as the song title and artist name corresponding to content such as a song. It can thereby obtain the identification data of the content without sending information about the content to, or receiving it from, external equipment, associate the identification data with the corresponding content, and record them automatically.
Description
Technical field
The present invention relates to a self-recording unit that automatically extracts information based on the recognition result obtained by performing speech recognition on broadcast data, and records the extracted information.
Background technology
For example, Patent Document 1 discloses a data processing device that analyzes broadcast data played by a playback device, classifies it into content data such as songs and conversation and extracts the content data, quantizes the extracted content data, sends the quantized content data to external equipment for comparison, receives identification data such as the artist name corresponding to the content data, and saves the received identification data in association with the extracted content data.
Prior art literature
Patent documentation
Patent documentation 1
Japanese Patent Laid-Open No. 2008-27573
Summary of the invention
Technical problem to be solved by the invention
However, in a conventional data processing device such as that of Patent Document 1, identifying content data requires sending a feature quantity of the recorded content data to external equipment and receiving identification data in return. This causes the problem that no data processing can be performed unless communication with the external equipment has been established. A further problem is that, to support new content such as newly released songs, the database held by the external equipment must be updated, and to identify more content, the number of content entries held by the external equipment must be increased.
The present invention has been made to solve the above problems, and its object is to provide a self-recording unit that can obtain the identification data of content extracted from broadcast data without sending information to or receiving it from external equipment, associate the identification data with the corresponding content, and record them automatically.
Means for solving the technical problem
To achieve the above object, the self-recording unit of the present invention is characterized by including: a voice acquisition unit that detects and acquires, from broadcast data, speech containing content and identification data of the content; a fixed phrase storage unit that stores phrases used when the content is introduced; a speech recognition unit that recognizes the speech data acquired by the voice acquisition unit and, based on the recognition result and the phrases stored in the fixed phrase storage unit, extracts and outputs the identification data of the content; a control unit that, upon receiving the identification data of the content from the speech recognition unit, issues an instruction to detect the start time and end time of the content; a content interval detection unit that, in accordance with the instruction from the control unit, detects the start time and end time of the content based on the speech data acquired by the voice acquisition unit; a video/audio recording unit that records the content in the content interval between the start time and end time detected by the content interval detection unit; and an information storage unit that stores at least the content recorded by the video/audio recording unit and the identification data of that content. The control unit associates the identification data of the content with the content recorded by the video/audio recording unit and saves them in the information storage unit.
Effects of the invention
The self-recording unit according to the present invention performs speech recognition on broadcast data and, based on the recognition result obtained, extracts identification data such as the song title and artist name corresponding to content such as a song. It can therefore obtain the identification data of the content without sending information about the content to, or receiving it from, external equipment, associate the identification data with the corresponding content, and record them automatically.
Brief description of the drawings
Fig. 1 is a block diagram showing an example of the self-recording unit of Embodiment 1.
Fig. 2 is a diagram showing an example of song-introduction phrases stored in the fixed phrase storage unit.
Fig. 3 is a diagram showing an example of data in which song titles, artist names, and songs stored in the information storage unit are associated with one another.
Fig. 4 is a flowchart showing the operation of the self-recording unit of Embodiment 1.
Fig. 5 is a block diagram showing an example of the self-recording unit of Embodiment 2.
Fig. 6 is a diagram showing an example of information in which the song titles, artist names, songs, and acquisition counts stored in the information storage unit are associated with one another.
Fig. 7 is a flowchart showing the operation of the self-recording unit of Embodiment 2.
Fig. 8 is a flowchart showing the operation of the self-recording unit of Embodiment 3.
Fig. 9 is a block diagram showing an example of the self-recording unit of Embodiment 4.
Fig. 10 is a flowchart showing the operation of the self-recording unit of Embodiment 4.
Fig. 11 is a block diagram showing an example of the self-recording unit of Embodiment 5.
Fig. 12 is a flowchart showing the operation of the self-recording unit of Embodiment 5.
Fig. 13 is a block diagram showing an example of the self-recording unit of Embodiment 6.
Fig. 14 is a block diagram showing another example of the self-recording unit of Embodiment 6.
Fig. 15 is a flowchart showing the operation of the self-recording unit of Embodiment 6.
Detailed description of embodiments
Embodiments of the present invention will now be described in detail with reference to the drawings.
Embodiment 1.
Fig. 1 is a block diagram showing an example of the self-recording unit of Embodiment 1 of the present invention. In this embodiment, the self-recording unit acquires, from broadcast data played by a radio, television, or the like, speech for content and for the identification data of that content, performs speech recognition, and records the result. The following description takes as an example the case in which the identification data of music content (a song), namely its song title and artist name, are saved in association with that content (song). The same applies to the other embodiments.
This self-recording unit includes: a voice acquisition unit 1, a speech recognition unit 2, a fixed phrase storage unit 3, a control unit 4, an information storage unit 5, a content interval detection unit 6, and a video/audio recording unit 7. Although not illustrated in Embodiment 1, the self-recording unit also includes an input unit 8 that acquires input signals from buttons, a touch panel, or the like, and an output unit 9 that outputs data as video or speech (see Fig. 9 of Embodiment 4 described later).
This self-recording unit acquires recognizable speech from broadcast data output by audio equipment such as a radio or television, extracts, based on the recognition result, identification data such as the title of the song (content) being played (song title) and the name of the artist (artist name), associates them with the song (content), and automatically stores the identification data such as the song title and artist name in the information storage unit.
The voice acquisition unit 1 detects and acquires, from the broadcast data, speech containing content and identification data of the content. The speech output from the audio equipment is acquired through a line input or the like. When it is acquired in analog form, it is A/D-converted and obtained in a digital format such as PCM (Pulse Code Modulation).
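The digital side of this acquisition step can be illustrated with a minimal sketch: once an A/D converter has delivered 16-bit little-endian PCM bytes, they can be unpacked into integer samples for the downstream recognizer and interval detector. The format choice (16-bit, little-endian) is an assumption for illustration; the patent only specifies "a digital format such as PCM".

```python
import struct

def pcm16_to_samples(raw: bytes):
    """Unpack 16-bit little-endian PCM bytes into a list of integer samples."""
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw[: count * 2]))

# A tiny synthetic frame standing in for line-input audio.
frame = struct.pack("<4h", 0, 1000, -1000, 32767)
print(pcm16_to_samples(frame))  # [0, 1000, -1000, 32767]
```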
The speech recognition unit 2 has a recognition dictionary (not shown) and recognizes the speech data acquired by the voice acquisition unit 1. Specifically, it detects speech intervals corresponding to utterances such as an announcer's talk, extracts feature quantities of the speech data in those intervals, performs recognition processing on the feature quantities using the recognition dictionary, and outputs the recognition result as a character string. The recognition processing uses a conventional method such as the HMM (Hidden Markov Model) method, so its description is omitted here. The speech recognition unit 2 may also be located on a server on a network, as described later.
The speech recognition used here may combine two kinds of recognition: grammar-type speech recognition, which recognizes vocabulary registered in advance in the recognition dictionary, and large-vocabulary continuous speech recognition, which can recognize arbitrary character strings by continuously recognizing syllables of characters such as "あ (a)", "い (i)", "う (u)", "え (e)", and "お (o)". Alternatively, all recognition may be performed by large-vocabulary continuous recognition, followed by morphological analysis of the recognition result. The morphological analysis uses a conventional method such as the HMM method, so its description is omitted here.
The fixed phrase storage unit 3 stores, as phrases used when a song (content) is introduced, phrases commonly used by a radio DJ or program host when introducing songs, such as "the next song is <song title> by <artist name>" and "you have been listening to <song title> by <artist name>", as shown in Fig. 2. These are hereinafter referred to as song-introduction phrases.
The speech recognition unit 2 then recognizes the speech data acquired by the voice acquisition unit 1 and, referring to the fixed phrase storage unit 3, that is, based on the recognition result of the speech data and the phrases stored in the fixed phrase storage unit 3, extracts the song title, artist name, and other identification data of the song (content), and outputs them. As a concrete extraction method, for a song-introduction phrase stored in the fixed phrase storage unit 3, the <artist name> and <song title> parts can be recognized by large-vocabulary continuous recognition, while the remaining parts are recognized by grammar-type speech recognition.
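The split between matching the fixed wording and filling the free slots can be approximated, for illustration only, with regular expressions over an already-recognized transcript. The English templates below are stand-ins for the Fig. 2 phrases, and the whole approach is a sketch of the idea rather than the grammar-based recognizer the patent describes:

```python
import re

# Hypothetical song-introduction templates mirroring the Fig. 2 examples.
# The fixed wording plays the role of the grammar-type recognition;
# the named groups play the role of the large-vocabulary slots.
INTRO_PATTERNS = [
    re.compile(r"next song is (?P<artist>.+?)'s (?P<title>.+)"),
    re.compile(r"you listened to (?P<title>.+) by (?P<artist>.+)"),
]

def extract_identification(transcript: str):
    """Return (title, artist) if the transcript matches a stored intro phrase."""
    for pattern in INTRO_PATTERNS:
        m = pattern.search(transcript)
        if m:
            return m.group("title").strip(), m.group("artist").strip()
    return None  # no fixed phrase matched; this is not a song introduction

print(extract_identification("next song is The Beatles's Yesterday"))
```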
The control unit 4 takes as input the character strings of the recognition result output by the speech recognition unit 2, that is, the song title, artist name, and other identification data. Upon receiving the song title, artist name, and other identification data of a song (content), it outputs an operation-start command to the content interval detection unit 6 described later, that is, it issues an instruction to detect the start time and end time of the song (content).
As shown in Fig. 3, the information storage unit 5 stores at least songs (content) and the artist name and song title (identification data) of each song (content). As shown in Fig. 3, in addition to saving the artist names and song titles (identification data) in association with the songs (content), the acquisition date on which each song (content) was acquired (recorded) may also be saved in association with them. The data may be stored per song title as in Fig. 3(a), or grouped by artist as in Fig. 3(b). The information storage unit 5 may be a hard disk, an SD card, or the like.
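The two layouts of Fig. 3 can be illustrated with a toy association table, where the flat per-song list of Fig. 3(a) is regrouped by artist as in Fig. 3(b). All titles, artists, and dates here are placeholders, not data from the patent:

```python
from collections import defaultdict

# Fig. 3(a)-style flat records: one entry per recorded song.
records = [
    {"title": "Song A", "artist": "Artist X", "acquired": "2012-05-01"},
    {"title": "Song B", "artist": "Artist X", "acquired": "2012-05-03"},
    {"title": "Song C", "artist": "Artist Y", "acquired": "2012-05-04"},
]

# Fig. 3(b)-style view: the same data collected per artist.
by_artist = defaultdict(list)
for r in records:
    by_artist[r["artist"]].append(r["title"])

print(dict(by_artist))
```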
The content interval detection unit 6, in accordance with the instruction from the control unit 4, detects the start time and end time of the song (content) from the speech data acquired by the voice acquisition unit 1. Specifically, it takes as input the digital speech data output from the voice acquisition unit 1 and, using frequency feature quantities and the like of the input digital speech data, detects the boundaries between song (content) intervals and conversation (non-content) intervals in the speech data. When it detects the start of a song interval, it sends a record-start command to the video/audio recording unit 7 described later; when it detects the end of a song interval, it sends a record-end command to the video/audio recording unit 7. The detection of the start and end intervals uses a conventional method such as time-frequency analysis, so its description is omitted here.
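The interval bookkeeping around that boundary detection can be sketched as follows. This assumes some per-frame classifier (in the device, derived from frequency-domain features via time-frequency analysis) has already labeled each frame as music or not; the sketch only finds the first sustained music run, and the `min_run` debouncing parameter is an illustrative addition, not from the patent:

```python
def detect_content_interval(frame_is_music, min_run=3):
    """Return (start, end) frame indices of the first sustained music run,
    or None if no run of at least `min_run` music frames is found.
    `end` is exclusive, marking the frame where conversation resumes."""
    start = None
    run = 0
    for i, is_music in enumerate(frame_is_music):
        if is_music:
            if start is None:
                start = i  # candidate record-start position
            run += 1
        else:
            if run >= min_run:
                return start, i  # record-end position
            start, run = None, 0  # too short to be a song; reset
    if run >= min_run:
        return start, len(frame_is_music)  # song ran to the end of the data
    return None

print(detect_content_interval([False, True, True, True, True, False]))
```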
The video/audio recording unit 7, in accordance with the commands from the content interval detection unit 6, reliably records only the song (content) portion in the content interval between the start time and end time detected by the content interval detection unit 6, and stores it in the information storage unit 5. The control unit 4 then associates the song title and artist name (identification data) received from the speech recognition unit 2 with the song (content) recorded by the video/audio recording unit 7, and saves them in the information storage unit 5.
Next, the operation of the self-recording unit of Embodiment 1 will be described using the flowchart shown in Fig. 4.

First, the voice acquisition unit 1 acquires the speech input from the audio equipment through the line input (step ST11). If the input is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, and obtained as digital data.

Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. Here, by performing large-vocabulary continuous speech recognition while comparing against the fixed phrase storage unit 3, the song title and artist name are extracted (step ST12).

The control unit 4 receives the song title and artist name from the speech recognition unit 2 and instructs the content interval detection unit 6 to start operating. Using signal processing techniques, the content interval detection unit 6 processes the audio speech acquired by the voice acquisition unit 1, extracts feature quantities such as frequency, detects the start interval of the song portion (step ST13), and sends a record-start command to the video/audio recording unit 7.

The video/audio recording unit 7 receives the command from the content interval detection unit 6 and starts recording the song from the song start position detected in step ST13 (step ST14).

The content interval detection unit 6 then processes the acquired audio speech using signal processing techniques, extracts feature quantities, detects the end interval of the song portion (step ST15), and sends a record-end command to the video/audio recording unit 7.

The video/audio recording unit 7 receives the command from the content interval detection unit 6, stops recording the song (step ST16), and stores the recorded song in the information storage unit 5 (step ST17).

Finally, the control unit 4 associates the song title and artist name extracted in step ST12 and received from the speech recognition unit 2 with the song saved in step ST17, and saves them in the information storage unit 5 (step ST18).

The result is that an association table such as that shown in Fig. 3 is saved.
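The ST11–ST18 flow of Fig. 4 can be sketched as control logic, with the recognizer and interval detector replaced by stand-in callables. Only the order of operations and the association step mirror the description; the function names and the toy data are illustrative, not from the patent:

```python
def auto_record(broadcast, recognize, detect_interval):
    """broadcast: audio samples; recognize: audio -> (title, artist) or None;
    detect_interval: audio -> (start, end) sample indices."""
    library = []  # plays the role of the information storage unit 5
    ident = recognize(broadcast)             # ST12: extract identification data
    if ident is None:
        return library                       # no song introduction detected
    title, artist = ident
    start, end = detect_interval(broadcast)  # ST13/ST15: content interval
    clip = broadcast[start:end]              # ST14/ST16: record only the song
    # ST18: save the identification data associated with the recorded content.
    library.append({"title": title, "artist": artist, "content": clip})
    return library

# Stand-ins for demonstration only.
lib = auto_record(
    broadcast=[0, 0, 5, 7, 6, 0],
    recognize=lambda audio: ("Yesterday", "The Beatles"),
    detect_interval=lambda audio: (2, 5),
)
print(lib)
```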
In this way, because speech recognition using large-vocabulary continuous recognition is performed based solely on the broadcast data from a radio, television, or the like, no external database is needed for looking up the identification data of content. This saves the effort of creating and updating such an external database, and no communication link with an external database is required.

In addition, because content is recorded on the condition that its identification data and its start can be extracted, only the song portions are reliably saved, without straining the capacity of the storage medium.
As described above, according to Embodiment 1, speech recognition is performed on broadcast data, and identification data such as the song title and artist name corresponding to content such as a song are extracted based on the recognition result. The identification data of the content can therefore be obtained without sending information about the content to, or receiving it from, external equipment, and the identification data can be associated with the content and recorded automatically.
Embodiment 2.
Fig. 5 is a block diagram showing an example of the self-recording unit of Embodiment 2 of the present invention. Structures identical to those described in Embodiment 1 are given the same reference numerals, and duplicate description is omitted. In Embodiment 2 described below, unlike Embodiment 1, the control unit 4 refers to the information stored in the information storage unit 5 and records only content that matches the user's preferences.
As shown in Fig. 6, the information storage unit 5 not only saves the artist names and song titles (identification data) output from the speech recognition unit 2 in association with the songs (content), but also saves data including the acquisition count of each song (content) and of each song (content) by that artist. The data stored in the information storage unit 5 can be referenced by the control unit 4.
The control unit 4 then takes as input the character strings of the song title, artist name, and other identification data output by the speech recognition unit 2, records the song title and artist name (identification data) in the information storage unit 5, refers to the data stored in the information storage unit 5 (related information including the acquisition count of the content), and outputs an operation-start command to the content interval detection unit 6 only when the content has been acquired a prescribed number of times or more.
Next, the operation of the self-recording unit of Embodiment 2 will be described using the flowchart shown in Fig. 7.

First, the voice acquisition unit 1 acquires the speech input from the audio equipment through the line input (step ST21). If the input is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, and obtained as digital data.

Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. Here, by performing large-vocabulary continuous speech recognition while comparing against the fixed phrase storage unit 3, the song title and artist name are extracted (step ST22).

When the control unit 4 obtains the song title and artist name from the speech recognition unit 2, it refers to the data stored in the information storage unit 5 for that song title and artist name. If the content with that song title and artist name has been acquired a prescribed number of times or more (YES in step ST23), the control unit 4 makes the content interval detection unit 6 operate, and the processing of steps ST24 to ST29 is performed.

The processing of steps ST24 to ST29 is the same as the processing of steps ST13 to ST18 shown in Fig. 4 of Embodiment 1, so its description is omitted.

On the other hand, if in step ST23 the song with the song title and artist name extracted in step ST22 has been acquired fewer than the prescribed number of times (NO in step ST23), the control unit 4 increments by one the acquisition count of the song title and artist name output from the speech recognition unit 2, and saves it in the information storage unit 5 (step ST30).

This makes it possible to record only songs whose song title and artist name have been acquired the prescribed number of times or more, that is, to record only content matching the user's preferences, so that only song portions are reliably recorded without straining the capacity of the storage medium.
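The ST23/ST30 gating above can be sketched as a small counter check. The threshold of 3 is an illustrative value; the patent only speaks of "a prescribed number of times":

```python
def should_record(identification, counts, threshold=3):
    """Record only after the same (song title, artist name) pair has been
    heard `threshold` or more times (ST23); otherwise just bump the
    acquisition count and save it (ST30)."""
    n = counts.get(identification, 0)
    if n >= threshold:
        return True  # proceed to interval detection and recording (ST24-ST29)
    counts[identification] = n + 1  # ST30: increment the acquisition count
    return False

counts = {}
key = ("Yesterday", "The Beatles")
results = [should_record(key, counts) for _ in range(4)]
print(results)  # the fourth hearing finally triggers recording
```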
As described above, according to Embodiment 2, in addition to the effects of Embodiment 1, only content matching the user's preferences can be recorded, so that only song portions are reliably recorded without straining the capacity of the storage medium.
Embodiment 3.
The block diagram showing an example of the self-recording unit of Embodiment 3 of the present invention is the same as the block diagram of Embodiment 2 shown in Fig. 5, so its illustration and description are omitted. In Embodiment 3 described below, unlike Embodiment 2, whether to issue the command to start detecting the interval of a song (content) is decided not according to whether the song (content) matches the user's preferences, but according to the matching degree of the speech recognition.
In Embodiment 3, when the speech recognition unit 2 outputs a recognition result to the control unit 4, it also outputs the matching degree of the recognition together with the recognition result.
Next, the operation of the self-recording unit of Embodiment 3 will be described using the flowchart shown in Fig. 8.

First, the voice acquisition unit 1 acquires the speech input from the audio equipment through the line input (step ST31). If the input is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, and obtained as digital data.

Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. Here, by performing large-vocabulary continuous speech recognition while comparing against the fixed phrase storage unit 3, the song title and artist name are extracted (step ST32).

When the speech recognition unit 2 outputs a recognition result, a matching degree representing the accuracy (degree of resemblance) of the recognized speech is output at the same time. The control unit 4 also obtains this matching degree, and only when the matching degree of the recognition is at or above a prescribed value (YES in step ST33) does it make the content interval detection unit 6 operate, performing the processing of steps ST34 to ST39.

The processing of steps ST34 to ST39 is the same as the processing of steps ST13 to ST18 shown in Fig. 4 of Embodiment 1, so its description is omitted.

On the other hand, if in step ST33 the matching degree of the speech recognition is below the prescribed value (NO in step ST33), the processing ends immediately.

Here, a concrete example of the matching degree is described. For example, in large-vocabulary continuous speech recognition, the accuracy (degree of resemblance) of each recognized sound becomes higher as the speech of the host or the like heard in the broadcast data is more fluent and less noisy; in general, when a matching degree of about 60 to 70% or more is reached, the sound (character) is judged to be output. Accordingly, by setting the prescribed value in step ST33 in advance to, for example, 80%, the processing proceeds to step ST34 and beyond only when the speech has been recognized correctly.

Also, in the grammar-type speech recognition that compares against the song-introduction phrases stored in the fixed phrase storage unit 3 (Fig. 2), a matching degree indicating whether the recognized speech is a song introduction can be calculated from the percentage of the phrase that matches. In this case as well, by setting the prescribed value in step ST33 in advance to, for example, 80%, the processing proceeds to step ST34 and beyond only when the song-introduction grammar has been recognized correctly.
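The ST33 confidence gate can be sketched in a few lines. The 0.80 threshold mirrors the 80% example value given in the text; the transcripts below are invented for illustration:

```python
def gate_by_confidence(recognition, threshold=0.80):
    """ST33-style gating: act on a recognition result only when its matching
    degree (recognizer confidence) is at or above the prescribed value."""
    text, confidence = recognition
    return text if confidence >= threshold else None

# A confident recognition passes; a noisy, low-confidence one is discarded.
print(gate_by_confidence(("next song is X's Y", 0.92)))
print(gate_by_confidence(("nets on gizzards why", 0.41)))
```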
Thereby, it is possible to prevent the voice identification result relatively low based on matching degree and make content interval test section 6 mistakenly
The situation of action, and be prevented from preserving and be associated with wrong song title, artist name (identification data)
Melody (content).
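One plausible way to compute such a phrase-match percentage, assuming word-by-word, in-order matching against a stored fixed phrase, is sketched below; the scoring rule itself is an assumption for illustration, since the text only states that a percentage of the matched phrase is used:

```python
# Hypothetical sketch of the grammar-type matching degree: the share of
# words in a stored song-introduction phrase that appear, in order, in
# the recognized text. Illustrative only; not the patent's algorithm.

def phrase_match_degree(recognized_words, fixed_phrase_words):
    matched, pos = 0, 0
    for word in fixed_phrase_words:
        try:
            pos = recognized_words.index(word, pos) + 1  # in-order search
            matched += 1
        except ValueError:
            continue  # phrase word missing from the recognition result
    return matched / len(fixed_phrase_words)

phrase = ["next", "song", "is", "by"]
heard = ["the", "next", "song", "is", "love", "by", "artist"]
print(phrase_match_degree(heard, phrase))  # 1.0 -> treated as a song intro
```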
As described above, according to Embodiment 3, in addition to the effects of Embodiment 1, the identification data and the content are recorded only when the matching degree of the speech recognition is equal to or greater than the predetermined value. This prevents content associated with erroneous identification data from being saved, and avoids wasting the capacity of the storage medium.
Embodiment 4.
Fig. 9 is a block diagram showing an example of the automatic recording device according to Embodiment 4 of the present invention. Structures identical to those described in Embodiments 1 to 3 carry the same reference labels, and duplicate description is omitted.
The block diagram of Embodiment 4 additionally shows an input unit 8 and an output unit 9, which were omitted from the drawings in Embodiments 1 to 3. The input unit 8 obtains input signals from buttons, a touch panel, or the like, and thereby accepts operation input from the user; the output unit 9 presents data to the user by displaying it or outputting it as audio. In Embodiment 4 described below, the user can choose, via the input unit 8 and the output unit 9, whether or not to save a melody (content).
When the control portion 4 obtains the character strings of the song title, artist name, and so on (identification data) output by the speech recognition section 2, it presents them to the user via the output unit 9 to ask whether the melody should be saved, accepts the user's input via the input unit 8, and thereby determines whether to save the melody (content). Specifically, when an input meaning that the melody should be saved is received via the input unit 8, the song title, artist name, and so on (identification data) of the melody (content) are saved in the information storage part 5 in association with the melody (content); when an input meaning that it should not be saved is received, only the song title, artist name, and so on (identification data) of the melody (content) are saved.
The input unit 8, which captures the user's intention, may be, for example, buttons or a touch screen; it may also use voice input with speech recognition via a microphone or the like, or gesture input, or a combination of these. The output unit 9 may, for example, output the song title and artist name (identification data) supplied by the control portion 4 as synthesized speech, may display them as characters on a display screen, or may use both methods at once.
Next, the operation of the automatic recording device of Embodiment 4 will be described with reference to the flowchart shown in Fig. 10.
The processing of steps ST41 to ST46 is the same as that of steps ST11 to ST16 shown in Fig. 4 of Embodiment 1, so its description is omitted.
After the video/speech recording portion 7 receives the command from the content interval detection section 6 in step ST46 and stops recording the melody, the control portion 4 instructs the output unit 9 to output the song title and artist name, and asks the user to confirm whether this melody should be saved (step ST47).
When the user selects, via the input unit 8, saving of the melody indicated by the song title and artist name, that is, when the input unit 8 receives a user input meaning that the melody should be saved (YES in step ST48), the melody recorded by the video/speech recording portion 7 is saved in the information storage part 5 (step ST49), and the song title and artist name are saved in the information storage part 5 in association with that melody (step ST50).
On the other hand, when the user does not select saving in step ST48, that is, when the input unit 8 receives a user input meaning that the melody should not be saved (NO in step ST48), only the song title and artist name are saved in the information storage part 5, and the associated information, such as the number of times the song title and artist name have been obtained, is updated (step ST51).
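The branching of steps ST47 to ST51 can be sketched as follows, with a dictionary standing in for the information storage part 5 and a callback standing in for the input unit 8 and output unit 9; all names here are hypothetical illustrations, not the device's actual interfaces:

```python
# Illustrative sketch of steps ST47-ST51 (Embodiment 4). The storage
# dict and ask_user callback are assumptions for the example.

def finish_recording(storage, title, artist, melody, ask_user):
    key = (title, artist)
    meta = storage.setdefault(key, {"count": 0, "melody": None})
    meta["count"] += 1                            # acquisition count (step ST51)
    if ask_user(f"Save '{title}' by {artist}?"):  # confirmation of step ST48
        meta["melody"] = melody                   # steps ST49-ST50
        return "saved"
    return "identification data only"             # NO branch of step ST48

store = {}
print(finish_recording(store, "Song A", "Artist X", b"...pcm...",
                       lambda prompt: True))   # saved
print(finish_recording(store, "Song B", "Artist Y", b"...pcm...",
                       lambda prompt: False))  # identification data only
```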
As described above, according to Embodiment 4, in addition to the effects of Embodiment 1, the user is asked again after the content has been recorded to confirm whether it should be saved, and it is saved only when the user wants to keep it. This prevents content the user does not want from being saved.
Embodiment 5.
Fig. 11 is a block diagram showing an example of the automatic recording device according to Embodiment 5 of the present invention. Structures identical to those described in Embodiments 1 to 4 carry the same reference labels, and duplicate description is omitted.
In Embodiment 5 described below, compared with Embodiment 4, the control portion 4 compares the melody recorded by the video/speech recording portion 7 when the content interval detection section 6 detects the end of the melody's interval with the melodies already saved in the information storage part 5; if a melody with the same song title and artist name has already been saved, the one with the better sound quality is kept.
The control portion 4 obtains the melody recorded by the video/speech recording portion 7 when the content interval detection section 6 detects the end of the melody's interval, and quantifies how good the sound quality of this melody is. A common measure such as the signal-to-noise ratio (S/N ratio) is used to quantify the sound quality, so its description is omitted here. The recording length may also be used as the criterion of sound quality, as may a combination of the signal-to-noise ratio and the recording length.
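As one assumed instance of such a measure, a signal-to-noise ratio in decibels can be estimated from sample powers; treating a separately supplied noise segment as the noise power is an illustrative assumption, since the text only names the S/N ratio (optionally combined with recording length):

```python
# Hedged sketch: quantifying sound quality as an S/N ratio in dB from
# average sample powers. Illustrative only.
import math

def snr_db(signal_samples, noise_samples):
    p_signal = sum(s * s for s in signal_samples) / len(signal_samples)
    p_noise = sum(n * n for n in noise_samples) / len(noise_samples)
    return 10.0 * math.log10(p_signal / p_noise)

clean = [100, -100] * 50   # strong signal
hiss = [1, -1] * 50        # weak noise floor
print(round(snr_db(clean, hiss)))  # 40
```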
Further, by referring to the data stored in the information storage part 5, the control portion 4 determines whether data identical to the identification data of the content extracted by the speech recognition section 2 exist in the information storage part 5 (that is, whether a song with the same song title and artist name exists). If such data exist, the sound quality of the melody (content) recorded by the video/speech recording portion 7 is compared with that of the melody (content) saved in the information storage part 5, and only when the sound quality of the newly recorded melody (content) is higher than that of the existing one does the control portion 4 automatically overwrite the melody (content) saved in the information storage part 5 with the new recording and save it.
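Keyed by (song title, artist name), this keep-the-better-recording rule can be sketched as follows; the storage layout is an assumption for the example:

```python
# Illustrative sketch of Embodiment 5's overwrite rule: a new recording
# replaces the stored one only when its quantified sound quality is
# strictly higher.

def maybe_overwrite(storage, title, artist, melody, quality):
    key = (title, artist)
    existing = storage.get(key)
    if existing is None or quality > existing["quality"]:  # step ST68
        storage[key] = {"melody": melody, "quality": quality}
        return "saved"            # steps ST69-ST70
    return "kept existing"        # step ST71 branch

lib = {}
print(maybe_overwrite(lib, "Song A", "X", b"take1", 22.0))  # saved
print(maybe_overwrite(lib, "Song A", "X", b"take2", 18.5))  # kept existing
print(maybe_overwrite(lib, "Song A", "X", b"take3", 30.0))  # saved
```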
Next, the operation of the automatic recording device of Embodiment 5 will be described with reference to the flowchart shown in Fig. 12.
The processing of steps ST61 to ST66 is the same as that of steps ST11 to ST16 shown in Fig. 4 of Embodiment 1, so its description is omitted.
After the video/speech recording portion 7 receives the command from the content interval detection section 6 in step ST66 and stops recording the melody, the control portion 4 determines whether the information storage part 5 already holds a melody with the same song title and artist name as those detected by the speech recognition section 2 in step ST62 (step ST67). If the same melody has already been saved (YES in step ST67), the control portion 4 further obtains the melody recorded by the video/speech recording portion 7 in steps ST64 to ST66, and compares the quantified sound quality of this melody with the sound quality of the melody saved in the information storage part 5 (step ST68).
When the sound quality of the melody recorded by the video/speech recording portion 7 in steps ST64 to ST66 is higher than that of the saved melody (YES in step ST68), the newly recorded melody is saved in the information storage part 5 (step ST69), and the song title and artist name are saved in the information storage part 5 in association with that melody (step ST70).
When the judgement of step ST67 finds that no identical melody is saved in the information storage part 5 (NO in step ST67), the processing of steps ST69 and ST70 is likewise carried out.
On the other hand, when the sound quality of the melody recorded by the video/speech recording portion 7 is equal to or lower than that of the saved melody (NO in step ST68), only the song title and artist name are saved in the information storage part 5, and the associated information, such as the number of times the song title and artist name have been obtained, is updated (step ST71).
As described above, according to Embodiment 5, in addition to the effects of Embodiment 1, for a given song title and artist name the newly obtained melody (content) is recorded when its sound quality is higher, and the saved melody (content) is not overwritten when the sound quality of the newly obtained melody is equal or lower. In this way, the stored content is always updated automatically to the version with the better sound quality.
In Embodiment 5 the case has been described in which, when the sound quality of the newly recorded song is higher than that of the saved song, the saved song is overwritten automatically; however, the user may instead be asked again to confirm the overwrite before it is carried out. In that case, not only is the saved melody (content) left intact when the new recording's sound quality is equal or lower, but the user's confirmation is also required before overwriting when the new recording's sound quality is higher. Depending on the user's circumstances, this makes it possible either to keep the version with the better sound quality or, even if the sound quality is slightly worse, to retain the recording the user prefers.
Embodiment 6.
Fig. 13 is a block diagram showing an example of the automatic recording device according to Embodiment 6 of the present invention. Structures identical to those described in Embodiments 1 to 5 carry the same reference labels, and duplicate description is omitted.
In Embodiment 6 described below, compared with Embodiment 2, the speech recognition section 2 is composed of a plurality of speech recognizers 21, 22, 23, ..., each provided with a recognition dictionary (not shown) for a different language; a speech recognition engine is used for each of these languages, so that speech recognition is performed in multiple languages.
In general, a speech recognition engine for, say, Japanese has weak recognition capability for foreign languages, and when English is spoken, an English speech recognition engine gives higher recognition accuracy. Therefore, speech recognizers 21, 22, 23, ..., each having a recognition dictionary for its own language, are provided: for example, a Japanese speech recognizer 21, an English speech recognizer 22, a German speech recognizer 23, and so on.
Here, the case where these plural speech recognizers 21, 22, 23, ... are connected in parallel to form the speech recognition section 2 is described.
When the speech recognition section 2 recognizes the voice output from the voice acquisition unit 1, the speech recognizers 21, 22, 23, ... corresponding to the respective languages operate in parallel, each with its own recognition dictionary (not shown); each speech recognizer 21, 22, 23, ... performs speech recognition in its own language and outputs its result to the control portion 4, together with the matching degree of that recognition.
From the results recognized by the plural speech recognizers 21, 22, 23, ..., the control portion 4 takes the result with the highest matching degree, determines the language of the recognized voice, and saves in the information storage part 5 the song title, artist name, and so on (identification data) of the melody (content) extracted in the language with the highest matching degree.
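This select-the-best-language step can be sketched by running several recognizers and keeping the highest-confidence result; the recognizer stubs below are assumptions standing in for the speech recognizers 21, 22, 23, ...:

```python
# Sketch of Embodiment 6's parallel multilingual recognition: each
# recognizer returns (text, matching_degree) and the result with the
# highest matching degree is kept. Stubs are illustrative.

def best_recognition(audio, recognizers):
    results = {lang: rec(audio) for lang, rec in recognizers.items()}
    lang = max(results, key=lambda l: results[l][1])  # highest matching degree
    text, degree = results[lang]
    return lang, text, degree

recognizers = {
    "ja": lambda audio: ("...", 0.35),
    "en": lambda audio: ("Song A by Artist X", 0.91),
    "de": lambda audio: ("...", 0.28),
}
print(best_recognition(b"pcm", recognizers))
# ('en', 'Song A by Artist X', 0.91)
```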
Alternatively, as shown in Fig. 14, the speech recognition section 2 shown in Fig. 13 may be replaced by one in which a single speech recognizer 20 performs recognition by switching among a plurality of recognition dictionaries 20-1, 20-2, 20-3, ....
Next, the operation of the automatic recording device of Embodiment 6 will be described with reference to the flowchart shown in Fig. 15.
First, the voice acquisition unit 1 obtains, via a line input, the voice input from the audio equipment (step ST81). If the voice input from the audio equipment is in analog format, it is A/D-converted, for example into PCM format, so that it is obtained as digital data.
Next, the speech recognition section 2 recognizes the speech data obtained by the voice acquisition unit 1 and outputs the recognition result as a character string. At this time, in addition to the comparison with the fixed phrase storage part 3, large-vocabulary continuous speech recognition is performed to extract the song title and artist name (step ST82).
At the same time, the control portion 4 obtains the matching degrees (likelihoods) indicating the accuracy with which the voice was recognized in each language by the speech recognition section 2, and determines the language of the song title and artist name based on these matching degrees (step ST83), for example by selecting the language with the highest matching degree. The multilingual speech recognition dictionaries thus prevent low-accuracy recognition, so that even a foreign song title or artist name can be recognized correctly.
Further, when the matching degree of the speech recognition in the language determined in step ST83 is equal to or greater than the predetermined value (YES in step ST84), the control portion 4 activates the content interval detection section 6 and carries out the processing of steps ST85 to ST90.
The processing of steps ST85 to ST90 is the same as that of steps ST13 to ST18 shown in Fig. 4 of Embodiment 1, so its description is omitted.
As the method of determining the language of the song title and artist name from the matching degrees in step ST83, various methods are conceivable: for example, performing speech recognition in all of the languages for which recognition dictionaries are provided, comparing their matching degrees, and selecting the language with the highest matching degree; or presetting a threshold for the matching degree and, as soon as the matching degree for one language reaches the threshold, judging the voice to be in that language and skipping recognition in the remaining languages. Any of these methods may be used.
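The second of these methods, which stops at the first language whose matching degree clears a preset threshold, could be sketched as follows; the ordering of languages and the threshold are illustrative assumptions:

```python
# Hedged sketch of the early-exit language determination: recognizers
# are tried in order, and once one language's matching degree reaches
# the threshold the remaining languages are skipped.

def determine_language(audio, recognizers, threshold=0.8):
    for lang, rec in recognizers:
        text, degree = rec(audio)
        if degree >= threshold:
            return lang, text      # remaining languages are not tried
    return None, None              # no language was confident enough

recognizers = [
    ("ja", lambda a: ("...", 0.3)),
    ("en", lambda a: ("Song A", 0.9)),
    ("de", lambda a: ("...", 0.0)),  # never reached in this example
]
print(determine_language(b"pcm", recognizers))  # ('en', 'Song A')
```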
As described above, according to Embodiment 6, in addition to the effects of Embodiment 1, speech recognition is performed using engines for various languages and the language is determined from the matching degrees of those recognitions, so that even a foreign-language song title or artist name can be recognized correctly and saved.
In the above embodiments, the case where the content is a melody, that is, music content, has been described; however, the content is not limited to music. For example, intervals of content concerning a sports broadcast may be extracted and recorded, intervals of content concerning a talk show may be extracted and recorded, or intervals of content concerning a documentary may be extracted and recorded.
The present invention allows, within the scope of the invention, free combination of the embodiments, modification of any constituent element of any embodiment, or omission of any constituent element in any embodiment.
Industrial Applicability
The automatic recording device of the present invention is applicable to any device capable of receiving broadcast data, such as a radio or a television set, and can be used even by devices that have no communication means for communicating with the outside, or in environments where the network connection is poor.
Description of Reference Labels
1 voice acquisition unit, 2 speech recognition section, 3 fixed phrase storage part, 4 control portion, 5 information storage part, 6 content interval detection section, 7 video/speech recording portion, 8 input unit, 9 output unit, 20, 21, 22, 23, ... speech recognizer, 20-1, 20-2, 20-3, ... recognition dictionary.
Claims (6)
1. An automatic recording device, comprising:
a voice acquisition unit that detects and obtains, from broadcast data, voice containing content and identification data of the content;
a fixed phrase storage part that stores fixed phrases used when the content is introduced;
a speech recognition section that recognizes the speech data obtained by the voice acquisition unit and, based on the recognition result and the fixed phrases stored in the fixed phrase storage part, extracts and outputs the identification data of the content;
a control portion that, upon receiving the identification data of the content from the speech recognition section, issues an instruction to detect the start time and end time of the content;
a content interval detection section that, according to the instruction from the control portion, detects the start time and end time of the content based on the speech data obtained by the voice acquisition unit;
a video/speech recording portion that records the content in the content interval between the start time and end time detected by the content interval detection section; and
an information storage part that stores at least the content recorded by the video/speech recording portion and the identification data of the content,
wherein the control portion associates the identification data of the content with the content recorded by the video/speech recording portion and saves them in the information storage part.
2. The automatic recording device as claimed in claim 1, wherein
the data stored in the information storage part include the number of times the content has been obtained, and
the control portion, by referring to the data stored in the information storage part, associates the identification data of the content with the content and saves them in the information storage part only when the number of times the content has been obtained is equal to or greater than a predetermined number.
3. The automatic recording device as claimed in claim 1, wherein
the speech recognition section also outputs the matching degree of the recognition while outputting the recognition result, and
the control portion associates the identification data of the content with the content and saves them in the information storage part only when the matching degree of the recognition is equal to or greater than a predetermined value.
4. The automatic recording device as claimed in claim 1, further comprising:
an input unit that accepts operation input from a user; and
an output unit that presents data to the user,
wherein, when associating the identification data of the content with the content and saving them in the information storage part, the control portion confirms with the user via the output unit whether to perform the saving; upon receiving via the input unit an input meaning that saving should be performed, the control portion associates the identification data of the content with the content and saves them in the information storage part, and upon receiving via the input unit an input meaning that saving should not be performed, the control portion saves only the identification data of the content in the information storage part.
5. The automatic recording device as claimed in claim 1, wherein
the control portion, by referring to the data stored in the information storage part, determines whether data identical to the extracted identification data of the content exist in the information storage part; if such data exist, the control portion compares the sound quality of the content recorded by the video/speech recording portion with that of the content saved in the information storage part, and only when the sound quality of the content recorded by the video/speech recording portion is higher does the control portion overwrite the content saved in the information storage part with the content recorded by the video/speech recording portion and save it.
6. The automatic recording device as claimed in claim 1, wherein
the speech recognition section has recognition dictionaries for a plurality of languages, performs speech recognition in the plurality of languages, and also outputs the matching degree of each recognition while outputting the recognition result, and
the control portion determines the language of the identification data of the content based on the matching degrees of the recognitions, associates the identification data of the content extracted in the determined language with the content, and saves them in the information storage part.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/003652 WO2013183078A1 (en) | 2012-06-04 | 2012-06-04 | Automatic recording device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104350545A CN104350545A (en) | 2015-02-11 |
CN104350545B true CN104350545B (en) | 2016-10-05 |
Family
ID=49711508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201280073736.0A Expired - Fee Related CN104350545B (en) | 2012-06-04 | 2012-06-04 | Self-recording unit |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP5591428B2 (en) |
CN (1) | CN104350545B (en) |
WO (1) | WO2013183078A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015161632A (en) * | 2014-02-28 | 2015-09-07 | 富士通テン株式会社 | Image display system, head-up display device, image display method, and program |
US11328727B2 (en) * | 2017-03-31 | 2022-05-10 | Optim Corporation | Speech detail recording system and method |
JP2019200393A (en) * | 2018-05-18 | 2019-11-21 | シャープ株式会社 | Determination device, electronic apparatus, response system, method for controlling determination device, and control program |
JP7009338B2 (en) * | 2018-09-20 | 2022-01-25 | Tvs Regza株式会社 | Information processing equipment, information processing systems, and video equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1104392A (en) * | 1993-12-21 | 1995-06-28 | Roy J. Mankovitz | Apparatus and method for identifying broadcast programs and accessing information relating thereto |
CN1726489A (en) * | 2002-10-28 | 2006-01-25 | Gracenote, Inc. | Personal audio recording system |
CN101611578A (en) * | 2006-12-18 | 2009-12-23 | UBC Media Group | Method of constructing and handling requests for data files |
CN101996627A (en) * | 2009-08-21 | 2011-03-30 | 索尼公司 | Speech processing apparatus, speech processing method and program |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003085884A (en) * | 2001-09-14 | 2003-03-20 | Pioneer Electronic Corp | Information recording device |
JP2007219178A (en) * | 2006-02-16 | 2007-08-30 | Sony Corp | Musical piece extraction program, musical piece extraction device, and musical piece extraction method |
JP4442585B2 (en) * | 2006-05-11 | 2010-03-31 | 三菱電機株式会社 | Music section detection method and apparatus, and data recording method and apparatus |
JP2011223205A (en) * | 2010-04-07 | 2011-11-04 | Onkyo Corp | Broadcast recording apparatus and program for the same |
-
2012
- 2012-06-04 JP JP2014519697A patent/JP5591428B2/en not_active Expired - Fee Related
- 2012-06-04 CN CN201280073736.0A patent/CN104350545B/en not_active Expired - Fee Related
- 2012-06-04 WO PCT/JP2012/003652 patent/WO2013183078A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1104392A (en) * | 1993-12-21 | 1995-06-28 | Roy J. Mankovitz | Apparatus and method for identifying broadcast programs and accessing information relating thereto |
CN1726489A (en) * | 2002-10-28 | 2006-01-25 | Gracenote, Inc. | Personal audio recording system |
CN101611578A (en) * | 2006-12-18 | 2009-12-23 | UBC Media Group | Method of constructing and handling requests for data files |
CN101996627A (en) * | 2009-08-21 | 2011-03-30 | 索尼公司 | Speech processing apparatus, speech processing method and program |
Also Published As
Publication number | Publication date |
---|---|
WO2013183078A1 (en) | 2013-12-12 |
CN104350545A (en) | 2015-02-11 |
JP5591428B2 (en) | 2014-09-17 |
JPWO2013183078A1 (en) | 2016-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108305632A (en) | A kind of the voice abstract forming method and system of meeting | |
CN1333363C (en) | Audio signal processing apparatus and audio signal processing method | |
US8209169B2 (en) | Synchronization of an input text of a speech with a recording of the speech | |
CN106373598B (en) | The control method and device of audio replay | |
US10529340B2 (en) | Voiceprint registration method, server and storage medium | |
CN107305541A (en) | Speech recognition text segmentation method and device | |
CN104350545B (en) | Self-recording unit | |
CN102937959A (en) | Automatically creating a mapping between text data and audio data | |
CN107274916A (en) | The method and device operated based on voiceprint to audio/video file | |
CN109448460A (en) | One kind reciting detection method and user equipment | |
CN107103915A (en) | A kind of audio data processing method and device | |
CN111341305A (en) | Audio data labeling method, device and system | |
CN107644637A (en) | Phoneme synthesizing method and device | |
CN108305611B (en) | Text-to-speech method, device, storage medium and computer equipment | |
US9691389B2 (en) | Spoken word generation method and system for speech recognition and computer readable medium thereof | |
US20130030794A1 (en) | Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof | |
CN103053173B (en) | Interest interval determines that device, interest interval determine that method and interest interval determine integrated circuit | |
CN101326571B (en) | Audio recognizing device | |
CN105895102A (en) | Recording editing method and recording device | |
CN113450774A (en) | Training data acquisition method and device | |
CN113782026A (en) | Information processing method, device, medium and equipment | |
US7680654B2 (en) | Apparatus and method for segmentation of audio data into meta patterns | |
CN109635151A (en) | Establish the method, apparatus and computer equipment of audio retrieval index | |
Veiga et al. | Towards automatic classification of speech styles | |
CN114078470A (en) | Model processing method and device, and voice recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20161005 Termination date: 20200604 |
CF01 | Termination of patent right due to non-payment of annual fee |