CN104350545B - Self-recording unit - Google Patents

Self-recording unit

Info

Publication number
CN104350545B
CN104350545B (application CN201280073736A)
Authority
CN
China
Prior art keywords
content
storage part
data
information storage
identification data
Prior art date
Legal status
Expired - Fee Related
Application number
CN201280073736.0A
Other languages
Chinese (zh)
Other versions
CN104350545A (en)
Inventor
山下裕生
岩崎知弘
Current Assignee
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date
Filing date
Publication date
Application filed by Mitsubishi Electric Corp filed Critical Mitsubishi Electric Corp
Publication of CN104350545A publication Critical patent/CN104350545A/en
Application granted granted Critical
Publication of CN104350545B publication Critical patent/CN104350545B/en


Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 - Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B20/10527 - Audio or video recording; Data buffering arrangements
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 - Details of colour television systems
    • H04N9/79 - Processing of colour television signals in connection with recording
    • H04N9/80 - Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82 - Transformation of the television signal for recording, the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205 - Transformation of the television signal for recording, involving the multiplexing of an additional signal and the colour video signal
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B20/10527 - Audio or video recording; Data buffering arrangements
    • G11B2020/10537 - Audio or video recording
    • G11B2020/10546 - Audio or video recording specifically adapted for audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)

Abstract

The self-recording unit according to the present invention performs speech recognition on broadcast data and, based on the recognition result obtained, extracts identification data such as the song title and artist name corresponding to content such as a music track. It can thereby obtain the identification data of the content without sending or receiving content information to or from external equipment, associate the identification data with the corresponding content, and record them automatically.

Description

Self-recording unit
Technical field
The present invention relates to a self-recording unit that automatically extracts information based on a recognition result obtained by performing speech recognition on broadcast data, and records the extracted information.
Background art
For example, Patent Document 1 discloses a data processing device that analyzes broadcast data played by a playback device, classifies and extracts content data such as music and conversation, quantizes the extracted content data, sends the quantized content data to external equipment for comparison, receives identification data such as the artist name corresponding to the content data, and stores the received identification data in association with the extracted content data.
Prior art documents
Patent documents
Patent Document 1: Japanese Patent Laid-Open No. 2008-27573
Summary of the invention
Technical problem to be solved by the invention
However, in a conventional data processing device such as that of Patent Document 1, identifying content data requires sending a feature quantity of the recorded content data to external equipment and receiving identification data in return. This poses the problem that data processing cannot be performed unless communication with the external equipment is established. Furthermore, to handle new content such as new songs, the database held by the external equipment must be updated, and to identify more content, the number of content items held by the external equipment must be increased.
The present invention was made to solve the above problems, and its object is to provide a self-recording unit that can obtain the identification data of content extracted from broadcast data without sending information to, or receiving it from, external equipment, associate the identification data with the content, and record them automatically.
Means for solving the technical problem
To achieve the above object, the self-recording unit of the present invention includes: a voice acquisition unit that detects and acquires, from broadcast data, speech containing content and identification data of that content; a fixed-phrase storage unit that stores phrases used when introducing the content; a speech recognition unit that recognizes the speech data acquired by the voice acquisition unit and, based on the recognition result and the phrases stored in the fixed-phrase storage unit, extracts and outputs the identification data of the content; a control unit that, on receiving the identification data of the content from the speech recognition unit, issues an instruction to detect the start time and end time of the content; a content-interval detection unit that, in accordance with the instruction from the control unit, detects the start time and end time of the content based on the speech data acquired by the voice acquisition unit; a video/speech recording unit that records the content in the content interval between the start time and end time detected by the content-interval detection unit; and an information storage unit that stores at least the content recorded by the video/speech recording unit and the identification data of the content. The control unit associates the identification data of the content with the content recorded by the video/speech recording unit and saves them in the information storage unit.
Effects of the invention
According to the self-recording unit of the present invention, speech recognition is performed on broadcast data, and identification data such as the song title and artist name corresponding to content such as a track are extracted based on the recognition result obtained. The identification data of the content can therefore be obtained without sending or receiving content information to or from external equipment, associated with the corresponding content, and recorded automatically.
Brief description of the drawings
Fig. 1 is a block diagram showing an example of the self-recording unit of Embodiment 1.
Fig. 2 is a diagram showing an example of the song-introduction phrases stored in the fixed-phrase storage unit.
Fig. 3 is a diagram showing an example of the data stored in the information storage unit, in which song titles, artist names and tracks are associated with one another.
Fig. 4 is a flowchart showing the operation of the self-recording unit of Embodiment 1.
Fig. 5 is a block diagram showing an example of the self-recording unit of Embodiment 2.
Fig. 6 is a diagram showing an example of the information stored in the information storage unit, in which song titles, artist names, tracks and acquisition counts are associated with one another.
Fig. 7 is a flowchart showing the operation of the self-recording unit of Embodiment 2.
Fig. 8 is a flowchart showing the operation of the self-recording unit of Embodiment 3.
Fig. 9 is a block diagram showing an example of the self-recording unit of Embodiment 4.
Fig. 10 is a flowchart showing the operation of the self-recording unit of Embodiment 4.
Fig. 11 is a block diagram showing an example of the self-recording unit of Embodiment 5.
Fig. 12 is a flowchart showing the operation of the self-recording unit of Embodiment 5.
Fig. 13 is a block diagram showing an example of the self-recording unit of Embodiment 6.
Fig. 14 is a block diagram showing another example of the self-recording unit of Embodiment 6.
Fig. 15 is a flowchart showing the operation of the self-recording unit of Embodiment 6.
Specific embodiments
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.
Fig. 1 is a block diagram showing an example of the self-recording unit of Embodiment 1 of the present invention. This embodiment describes a self-recording unit that acquires, from broadcast data played by a radio, a television set or the like, speech containing content and the identification data of that content, performs speech recognition, and records the result. The following explanation takes as an example the case in which the identification data of music content (a track), namely its song title and artist name, are stored in association with that content (the track). The same applies to the subsequent embodiments.
This self-recording unit includes a voice acquisition unit 1, a speech recognition unit 2, a fixed-phrase storage unit 3, a control unit 4, an information storage unit 5, a content-interval detection unit 6 and a video/speech recording unit 7. Although not illustrated in this Embodiment 1, the self-recording unit also includes an input unit 8 that obtains input signals from buttons, a touch panel or the like, and an output unit 9 that outputs data as video or speech (see Fig. 9 of Embodiment 4, described later).
This self-recording unit acquires the speech to be recognized from the broadcast data output by audio equipment such as a radio or television set, extracts from the recognition result identification data such as the title of the track (content) being played (the song title) and the name of the artist (the artist name), associates them with the track (content), and automatically stores the identification data such as the song title and artist name in the information storage unit.
The voice acquisition unit 1 detects and acquires, from the broadcast data, the speech containing the content and the identification data of the content. The speech output by the audio equipment is obtained via a line input or the like. When it is acquired in analog form, it is A/D-converted and obtained in a digital format such as PCM (Pulse Code Modulation).
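The digitization step described above can be sketched as follows. This is a minimal illustration, assuming the analog input is modeled as float samples in [-1.0, 1.0] quantized to 16-bit PCM; the function names and sample format are illustrative, not taken from the patent.

```python
# Sketch of the voice acquisition unit's A/D step: float samples in
# [-1.0, 1.0] stand in for the analog signal and are quantized to
# little-endian 16-bit PCM bytes.
import struct

def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

def from_pcm16(data):
    """Decode little-endian 16-bit PCM bytes back to float samples."""
    ints = struct.unpack("<%dh" % (len(data) // 2), data)
    return [i / 32767.0 for i in ints]
```

In a real device this conversion is done by an A/D converter at line-input time; the round-trip pair here just makes the quantization explicit and testable.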
The speech recognition unit 2 has a recognition dictionary (not shown) and recognizes the speech data acquired by the voice acquisition unit 1. Specifically, it detects speech intervals corresponding to utterances such as an announcer's talk, extracts feature quantities from the speech data of those intervals, performs recognition processing on the feature quantities using the recognition dictionary, and outputs the speech recognition result as a character string. The recognition processing uses a conventional method such as the HMM (Hidden Markov Model) method, so its description is omitted here. The speech recognition unit 2 may also be located on a server on a network, as described later.
The speech recognition used here can combine two kinds of recognition: grammar-type speech recognition, which recognizes vocabulary registered in the recognition dictionary in advance, and large-vocabulary continuous speech recognition, which can recognize arbitrary character strings by continuously recognizing syllables such as "あ(a)", "い(i)", "う(u)", "え(e)" and "お(o)". Alternatively, all recognition may be performed by large-vocabulary continuous recognition, with morphological analysis then applied to the recognition result. The morphological analysis uses a conventional method such as the HMM method, so its description is omitted here.
The fixed-phrase storage unit 3 stores, as phrases used when introducing a track (content), set phrases commonly used by a radio DJ, a program host or the like when introducing a song, for example "The next song is <song title> by <artist name>" and "You were listening to <song title> by <artist name>", as shown in Fig. 2. These are hereinafter referred to as song-introduction phrases.
The speech recognition unit 2 described above then recognizes the speech data acquired by the voice acquisition unit 1 and, referring to the fixed-phrase storage unit 3, that is, based on the recognition result of the speech data and the phrases stored in the fixed-phrase storage unit 3, extracts and outputs the identification data of the track (content), such as the song title and artist name. As a concrete extraction method, for each song-introduction phrase stored in the fixed-phrase storage unit 3, the <artist name> and <song title> portions can be recognized by large-vocabulary continuous recognition, and the remaining portions by grammar-type speech recognition.
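The slot-extraction idea above can be sketched on plain text. Here regular expressions stand in for the combination of grammar-type recognition (the fixed part of the phrase) and large-vocabulary recognition (the <song title> and <artist name> slots); the English templates are illustrative renderings of the Fig. 2 phrases, not the patent's actual wording.

```python
# Sketch of extracting <song title> and <artist name> from a recognized
# transcript using song-introduction templates in the spirit of Fig. 2.
import re

TEMPLATES = [
    re.compile(r"the next song is (?P<title>.+) by (?P<artist>.+)"),
    re.compile(r"you were listening to (?P<title>.+) by (?P<artist>.+)"),
]

def extract_id_data(transcript):
    """Return (title, artist) if the transcript matches a
    song-introduction template, otherwise None."""
    text = transcript.lower().strip()
    for pattern in TEMPLATES:
        m = pattern.search(text)
        if m:
            return m.group("title").strip(), m.group("artist").strip()
    return None
```

Greedy matching with backtracking splits the slots at the last " by ", which suffices for a sketch; a real recognizer constrains the fixed part by grammar rather than by string matching.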
The control unit 4 takes as input the character strings of the recognition result output by the speech recognition unit 2, that is, the identification data such as the song title and artist name. On receiving the identification data of the track (content), such as its song title and artist name, it outputs an operation-start command to the content-interval detection unit 6 described later, that is, it issues an instruction to detect the start time and end time of the track (content).
As shown in Fig. 3, the information storage unit 5 stores at least the track (content) and its identification data, namely the artist name and song title. As also shown in Fig. 3, in addition to storing the artist name and song title (identification data) in association with the track (content), the acquisition date on which the track (content) was obtained (recorded) may be stored in association with them as well. The data may be stored per song title as in Fig. 3(a), or grouped by artist as in Fig. 3(b). The information storage unit 5 may be a hard disk, an SD card or the like.
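The association table of Fig. 3 can be sketched as a small data structure. The in-memory list and field names are illustrative assumptions (the patent mentions a hard disk or SD card as the actual medium); the per-song records correspond to Fig. 3(a) and the grouped view to Fig. 3(b).

```python
# Sketch of the information storage unit's association table (Fig. 3):
# each saved track keeps its song title, artist name, audio and
# acquisition date, viewable per song or grouped by artist.
from collections import defaultdict
from datetime import date

class InfoStore:
    def __init__(self):
        self.records = []  # one entry per saved track, as in Fig. 3(a)

    def save(self, title, artist, audio, acquired=None):
        self.records.append({
            "title": title,
            "artist": artist,
            "audio": audio,
            "acquired": acquired or date.today().isoformat(),
        })

    def by_artist(self):
        """Group saved song titles by artist, as in Fig. 3(b)."""
        grouped = defaultdict(list)
        for r in self.records:
            grouped[r["artist"]].append(r["title"])
        return dict(grouped)
```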
The content-interval detection unit 6 detects, in accordance with the instruction from the control unit 4, the start time and end time of the track (content) from the speech data acquired by the voice acquisition unit 1. Specifically, it takes as input the digital speech data output by the voice acquisition unit 1 and, using the frequency feature quantities of the input data and the like, detects the boundaries between the track (content) and conversation (the parts other than the content) in the speech data. When the start of a track interval is detected, it sends a recording-start command to the video/speech recording unit 7 described later; when the end of the track interval is detected, it sends a recording-end command to the video/speech recording unit 7. The detection of the start and end of an interval uses a conventional method such as time-frequency analysis, so its description is omitted here.
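Interval detection can be sketched with a crude stand-in feature. The patent relies on frequency features and time-frequency analysis; the short-term energy threshold below is an assumption chosen only to make the start/end boundary logic concrete, and the frame size and threshold are illustrative.

```python
# Sketch of the content-interval detection unit: a short-term energy
# threshold over fixed-size frames stands in for the patent's
# frequency-feature boundary detection.
def detect_interval(samples, frame=4, threshold=0.2):
    """Return (start, end) sample indices of the first high-energy run,
    or None if no frame exceeds the threshold."""
    start = end = None
    for i in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[i:i + frame]) / frame
        if energy >= threshold:
            if start is None:
                start = i   # start of the track interval detected
            end = i + frame
        elif start is not None:
            break  # first quiet frame after the run ends the interval
    return (start, end) if start is not None else None
```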
The video/speech recording unit 7 records, in accordance with the commands from the content-interval detection unit 6, only the track (content) portion in the content interval between the start time and end time detected by the content-interval detection unit 6, and stores it in the information storage unit 5.
The control unit 4 described above then associates the song title and artist name (identification data) received from the speech recognition unit 2 with the track (content) recorded by the video/speech recording unit 7, and stores them in the information storage unit 5.
Next, the operation of the self-recording unit of Embodiment 1 will be described using the flowchart shown in Fig. 4.
First, the voice acquisition unit 1 acquires, via a line input, the speech input from the audio equipment (step ST11). If it is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, so as to be obtained as digital data.
Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. At this time, the song title and artist name are extracted by performing large-vocabulary continuous speech recognition while comparing against the fixed-phrase storage unit 3 (step ST12).
The control unit 4 receives the song title and artist name from the speech recognition unit 2 and instructs the content-interval detection unit 6 to operate. Using signal processing techniques, the content-interval detection unit 6 processes the audio speech acquired by the voice acquisition unit 1, extracts feature quantities such as frequency, detects the start of the track portion (step ST13), and sends a recording-start command to the video/speech recording unit 7.
The video/speech recording unit 7 then receives the command from the content-interval detection unit 6 and starts recording the track from the track start position detected in step ST13 (step ST14).
The content-interval detection unit 6 further processes the acquired audio speech using signal processing techniques, extracts feature quantities, detects the end of the track portion (step ST15), and sends a recording-end command to the video/speech recording unit 7.
The video/speech recording unit 7 then receives the command from the content-interval detection unit 6, stops recording the track (step ST16), and stores the recorded track in the information storage unit 5 (step ST17).
Finally, the control unit 4 associates the song title and artist name extracted in step ST12 and obtained from the speech recognition unit 2 with the track saved in step ST17, and stores them in the information storage unit 5 (step ST18).
As a result, an association table such as that shown in Fig. 3 is saved.
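The ordering of steps ST11 to ST18 can be sketched as one control routine. The callables passed in (recognize, detect_interval, store) are hypothetical stand-ins for the units described above, wired together only to show the order of operations, not the patent's actual interfaces.

```python
# Sketch of the Embodiment 1 flow (steps ST11-ST18) as control logic.
def record_pipeline(samples, recognize, detect_interval, store):
    """Recognize the announcement, then detect and save the track."""
    id_data = recognize(samples)          # ST12: (title, artist) or None
    if id_data is None:
        return False
    interval = detect_interval(samples)   # ST13, ST15: (start, end) or None
    if interval is None:
        return False
    start, end = interval
    title, artist = id_data
    store.append({"title": title, "artist": artist,
                  "audio": samples[start:end]})  # ST14, ST16-ST18
    return True
```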
Thus, because speech recognition using large-vocabulary continuous recognition is performed based only on the broadcast data of a radio, a television set or the like, no external database is needed for looking up the identification data of the content. This saves the effort of creating and updating such an external database, and no communication with an external database needs to be established.
In addition, since content is recorded on the condition that its identification data and its start can be extracted, only the song portions can be saved automatically, without putting pressure on the capacity of the storage medium.
As described above, according to Embodiment 1, speech recognition is performed on broadcast data, and identification data such as the song title and artist name corresponding to content such as a track are extracted based on the recognition result obtained. The identification data of the content can therefore be obtained without sending or receiving content information to or from external equipment, associated with the content, and recorded automatically.
Embodiment 2.
Fig. 5 is a block diagram showing an example of the self-recording unit of Embodiment 2 of the present invention. Structures identical to those described in Embodiment 1 are given the same reference numerals, and duplicate explanations are omitted. In Embodiment 2 described below, compared with Embodiment 1, the control unit 4 refers to the information stored in the information storage unit 5 and records only content that matches the user's preferences.
In the information storage unit 5, for example as shown in Fig. 6, not only are the artist name and song title (identification data) output by the speech recognition unit 2 stored in association with the track (content), but data including an acquisition count for each track (content) of the artist are also stored, and the data stored in the information storage unit 5 can be referred to by the control unit 4.
The control unit 4 then takes as input the character strings of the song title, artist name and other identification data output by the speech recognition unit 2, records the song title and artist name (identification data) in the information storage unit 5, and refers to the stored data (the related information including the acquisition count of the content). Only when the acquisition count of the content is equal to or greater than a prescribed number does it output an operation-start command to the content-interval detection unit 6.
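The preference gate just described can be sketched as a counter check. The dict-based counter and the threshold value are illustrative assumptions; the branch order mirrors the flowchart of Fig. 7 (record when the stored count has reached the threshold, otherwise just increment the count, as in steps ST23 and ST30).

```python
# Sketch of Embodiment 2's preference gate: trigger interval detection
# only once a (title, artist) identification has been heard often enough.
def should_record(counts, title, artist, threshold=3):
    """Mirror steps ST23/ST30: return True (record) when the stored
    acquisition count has reached the threshold; otherwise increment
    the count and return False."""
    key = (title, artist)
    if counts.get(key, 0) >= threshold:
        return True
    counts[key] = counts.get(key, 0) + 1
    return False
```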
Next, the operation of the self-recording unit of Embodiment 2 will be described using the flowchart shown in Fig. 7.
First, the voice acquisition unit 1 acquires, via a line input, the speech input from the audio equipment (step ST21). If it is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, so as to be obtained as digital data.
Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. At this time, the song title and artist name are extracted by performing large-vocabulary continuous speech recognition while comparing against the fixed-phrase storage unit 3 (step ST22).
When the control unit 4 obtains a song title and artist name from the speech recognition unit 2, it refers to the data stored in the information storage unit 5 for the obtained song title and artist name. When the acquisition count of the content with that song title and artist name is equal to or greater than a prescribed number ("yes" in step ST23), it makes the content-interval detection unit 6 operate, and the processing of steps ST24 to ST29 is performed.
The processing of steps ST24 to ST29 is the same as that of steps ST13 to ST18 shown in Fig. 4 of Embodiment 1, so its description is omitted.
On the other hand, when the acquisition count of the track with the song title and artist name extracted in step ST22 is less than the prescribed number in step ST23 ("no" in step ST23), the control unit 4 increments by one the acquisition count of the song title and artist name output by the speech recognition unit 2, and saves it in the information storage unit 5 (step ST30).
This makes it possible to record only tracks whose song title and artist name have been obtained a prescribed number of times or more, that is, to record only content that matches the user's preferences, so that only the song portions are recorded automatically without putting pressure on the capacity of the storage medium.
As described above, according to Embodiment 2, in addition to the effects of Embodiment 1, only content that matches the user's preferences can be recorded, so that only the song portions are recorded automatically without putting pressure on the capacity of the storage medium.
Embodiment 3.
A block diagram showing an example of the self-recording unit of Embodiment 3 of the present invention is identical to the block diagram of Embodiment 2 shown in Fig. 5, so its illustration and description are omitted. In Embodiment 3 described below, unlike Embodiment 2, whether to issue the command to start detecting the interval of a track (content) is decided not according to whether the track matches the user's preferences, but according to the matching degree of the speech recognition.
In Embodiment 3, when the speech recognition unit 2 outputs a recognition result to the control unit 4, it also outputs the matching degree of the recognition together with the result.
Next, the operation of the self-recording unit of Embodiment 3 will be described using the flowchart shown in Fig. 8.
First, the voice acquisition unit 1 acquires, via a line input, the speech input from the audio equipment (step ST31). If it is in analog form, the speech input from the audio equipment is A/D-converted, for example into PCM format, so as to be obtained as digital data.
Next, the speech recognition unit 2 recognizes the speech data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. At this time, the song title and artist name are extracted by performing large-vocabulary continuous speech recognition while comparing against the fixed-phrase storage unit 3 (step ST32).
When the speech recognition unit 2 outputs a recognition result, it simultaneously outputs a matching degree representing the accuracy (degree of similarity) of the recognized speech. The control unit 4 obtains this matching degree at the same time, and only when the matching degree of the recognition is equal to or greater than a prescribed value ("yes" in step ST33) does it make the content-interval detection unit 6 operate and perform the processing of steps ST34 to ST39.
The processing of steps ST34 to ST39 is the same as that of steps ST13 to ST18 shown in Fig. 4 of Embodiment 1, so its description is omitted.
On the other hand, when the matching degree of the speech recognition is less than the prescribed value in step ST33 ("no" in step ST33), the processing ends immediately.
A concrete example of the matching degree will now be given. For example, in large-vocabulary continuous speech recognition, the accuracy (degree of similarity) of each recognized sound increases with the fluency of the speech of the host or other speaker heard from the broadcast data and as noise decreases; generally, when a matching degree of 60 to 70% or more is reached, the corresponding sound (character) is judged to be output. Accordingly, by presetting the prescribed value in step ST33 to, for example, 80%, the processing advances to step ST34 and beyond only when the speech has been recognized correctly.
In addition, in grammar-type speech recognition that compares against the song-introduction phrases (Fig. 2) stored in the fixed-phrase storage unit 3, a matching degree indicating whether the recognized speech is a song introduction can also be calculated according to the percentage of the phrase that matches. In this case too, by presetting the prescribed value in step ST33 to, for example, 80%, the processing advances to step ST34 and beyond only when the grammar of the song introduction has been recognized correctly.
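The confidence gate of step ST33 can be sketched in a few lines. The (title, artist, matching degree) result tuple and the 80% default are illustrative assumptions matching the example values above.

```python
# Sketch of Embodiment 3's confidence gate (step ST33): act on a
# recognition result only when its matching degree meets a preset value.
def accept_recognition(result, min_match=0.80):
    """result is (title, artist, matching_degree in [0, 1]); return the
    identification data only when the matching degree is high enough."""
    title, artist, match = result
    if match < min_match:
        return None  # discard low-confidence results (ST33 "no" branch)
    return (title, artist)
```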
This prevents the content-interval detection unit 6 from being operated erroneously based on a speech recognition result with a low matching degree, and prevents a track (content) from being saved in association with a wrong song title or artist name (identification data).
As described above, according to Embodiment 3, in addition to the effects of Embodiment 1, content and its identification data can be recorded only when the matching degree of the speech recognition is equal to or greater than the prescribed value, which prevents content from being saved in association with wrong identification data and avoids putting pressure on the capacity of the storage medium.
Embodiment 4.
Fig. 9 is a block diagram showing an example of the self-recording unit according to embodiment 4 of the present invention. Structures identical to those described in embodiments 1 to 3 are given the same reference labels, and duplicate description is omitted. The block diagram of the present embodiment 4 also shows an input unit 8 and an output unit 9, which were omitted from the drawings of embodiments 1 to 3. The input unit 8 acquires input signals via buttons, a touch panel, or the like, thereby accepting operation input from the user; the output unit 9 presents data to the user by displaying video data or by audio output. In embodiment 4 described below, the user can select, via these input and output units 8 and 9, whether to save a piece of music (content).
When the control portion 4 obtains the character strings such as the song title and artist name (identification data) output by the speech recognition section 2, it presents the song title, artist name, and so on (identification data) to the user via the output unit 9 to confirm whether to save, and accepts input from the user via the input unit 8, thereby determining whether the piece of music (content) is to be saved. Specifically, when an input meaning that the piece is to be saved is received via the input unit 8, the song title, artist name, and so on (identification data) of the piece of music (content) are saved in the information storage part 5 in association with the piece of music (content); when an input meaning that it is not to be saved is received, only the song title, artist name, and so on (identification data) of the piece of music (content) are saved.
The input unit 8 is for inputting the user's intention, and may be, for example, buttons or a touch screen; it may also be voice input using speech recognition with a microphone or the like, or gesture input. A combination of these may also be used.
The output unit 9 may, for example, output the song title and artist name (identification data) output by the control portion 4 as synthesized speech, or may display them as characters on a display screen. Both methods may also be used simultaneously.
Next, the operation of the self-recording unit of embodiment 4 is described using the flowchart shown in Figure 10.
Since the processing of steps ST41 to ST46 is the same as that of steps ST11 to ST16 shown in Fig. 4 of embodiment 1, its description is omitted.
Then, in step ST46, after the video/speech record portion 7 receives the command from the content interval detection part 6 and stops recording the piece of music, the control portion 4 sends the output unit 9 an instruction to output the song title and artist name, requesting the user to confirm whether this piece of music is to be saved (step ST47).
When the user selects, via the input unit 8, to save the piece of music indicated by the song title and artist name, that is, when the input unit 8 receives a user input meaning that the piece is to be saved (YES in step ST48), the piece of music recorded in the video/speech record portion 7 is saved in the information storage part 5 (step ST49), and the song title and artist name are saved in the information storage part 5 in association with this piece of music (step ST50).
On the other hand, when the user does not select to save in step ST48, that is, when the input unit 8 receives a user input meaning that the piece is not to be saved (NO in step ST48), only the song title and artist name are saved in the information storage part 5, and information on this song title and artist name, such as the number of times they have been obtained, is updated (step ST51).
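The confirm-then-save flow of steps ST47 to ST51 might be sketched as follows, assuming a simple dictionary-based store; all names here are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of embodiment 4's confirmation flow: when the user
# chooses to save, both the music and its identification data are stored
# (steps ST49-ST50); otherwise only the identification data are kept and
# the acquisition count is updated (step ST51).
def confirm_and_save(store, recorded_music, song_title, artist_name, user_wants_save):
    key = (song_title, artist_name)
    if user_wants_save:  # YES in step ST48
        store["contents"][key] = recorded_music                   # step ST49
        store["metadata"].setdefault(key, {"times_obtained": 1})  # step ST50
    else:                # NO in step ST48
        meta = store["metadata"].setdefault(key, {"times_obtained": 0})
        meta["times_obtained"] += 1                               # step ST51
    return store

store = {"contents": {}, "metadata": {}}
confirm_and_save(store, b"pcm-data", "Song A", "Artist B", user_wants_save=False)
assert ("Song A", "Artist B") not in store["contents"]  # music discarded
assert store["metadata"][("Song A", "Artist B")]["times_obtained"] == 1
```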
As described above, according to the present embodiment 4, in addition to the effects of embodiment 1, the user is asked again, after the content has been recorded, to confirm whether it is to be saved, and the content is saved only when saving is required; it is therefore possible to prevent content that the user does not want from being saved.
Embodiment 5.
Figure 11 is a block diagram showing an example of the self-recording unit according to embodiment 5 of the present invention. Structures identical to those described in embodiments 1 to 4 are given the same reference labels, and duplicate description is omitted. In embodiment 5 described below, compared with embodiment 4, the control portion 4 compares the piece of music recorded by the video/speech record portion 7 when the content interval detection part 6 detects the end of the piece with the pieces of music saved in the information storage part 5, and when a piece with the same song title and artist name has already been saved, the one with the better sound quality is kept.
The control portion 4 obtains the piece of music recorded by the video/speech record portion 7 when the content interval detection part 6 detects the end interval of the piece, and quantifies how good its sound quality is. As the method of quantifying sound quality, a usual method such as the signal-to-noise ratio (S/N ratio) is used, so its description is omitted here. As the criterion of sound quality, the recording length may also be used, or a combination of the signal-to-noise ratio and the recording length.
Further, by referring to the data stored in the information storage part 5, the control portion 4 determines whether data identical to the identification data of the content extracted by the speech recognition section 2 (a song with the same song title and artist name) exists in the information storage part 5. If it exists, the control portion 4 compares the sound quality of the piece of music (content) recorded by the video/speech record portion 7 with that of the piece of music (content) saved in the information storage part 5, and only when the sound quality of the newly recorded piece is higher than that of the existing piece, automatically overwrites and saves it over the piece of music (content) stored in the information storage part 5.
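The overwrite rule just described can be sketched as follows, using a decibel signal-to-noise ratio as the quality score. The patent only says a usual method "such as the signal-to-noise ratio" is used, so the scoring function, names, and structures here are assumptions for illustration.

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Quantify how good the sound quality is as an S/N ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)

def piece_to_keep(saved, new):
    """Return the recording that should remain in the information storage
    part: the new one only if no copy exists or it has strictly higher
    quality; otherwise the existing recording is kept."""
    if saved is None or new["quality"] > saved["quality"]:
        return new   # automatic overwrite-save
    return saved

old = {"audio": b"...", "quality": snr_db(100.0, 10.0)}    # 10 dB
better = {"audio": b"...", "quality": snr_db(100.0, 1.0)}  # 20 dB
assert piece_to_keep(old, better) is better   # higher quality overwrites
assert piece_to_keep(better, old) is better   # equal or lower quality is discarded
```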
Next, the operation of the self-recording unit of embodiment 5 is described using the flowchart shown in Figure 12.
Since the processing of steps ST61 to ST66 is the same as that of steps ST11 to ST16 shown in Fig. 4 of embodiment 1, its description is omitted.
Then, in step ST66, after the video/speech record portion 7 accepts the command from the content interval detection part 6 and stops recording the piece of music, the control portion 4 determines whether the information storage part 5 holds a piece of music with the same song title and artist name as those detected by the speech recognition section 2 in step ST62 (step ST67). When an identical piece has already been saved (YES in step ST67), the control portion 4 further obtains the piece of music recorded by the video/speech record portion 7 in steps ST64 to ST66, and compares the sound-quality information obtained by quantifying how good the sound quality of this piece is with the sound quality of the piece saved in the information storage part 5 (step ST68).
When the sound quality of the piece of music recorded by the video/speech record portion 7 in steps ST64 to ST66 is higher than that of the saved piece (YES in step ST68), the piece of music recorded by the video/speech record portion 7 is saved in the information storage part 5 (step ST69), and the song title and artist name are saved in the information storage part 5 in association with this piece (step ST70).
In addition, when it is determined in step ST67 that no identical piece of music is saved in the information storage part 5 (NO in step ST67), the processing of the above steps ST69 and ST70 is also carried out.
On the other hand, when in step ST68 the sound quality of the piece recorded by the video/speech record portion 7 is equal to or lower than that of the saved piece (NO in step ST68), only the song title and artist name are saved in the information storage part 5, and information on this song title and artist name, such as the number of times they have been obtained, is updated (step ST71).
As described above, according to the present embodiment 5, in addition to the effects of embodiment 1, for a given song title and artist name, the piece of music (content) is recorded when the sound quality of the newly obtained piece is higher, and the saved piece (content) is not overwritten when the sound quality of the newly obtained piece is equal to or lower; in this way, it is possible to always update automatically to the content with the better sound quality.
In embodiment 5, the case has been described in which the overwrite-save is performed automatically when the sound quality of the newly recorded song is higher than that of the saved song; however, the save may instead be performed after asking the user again to confirm whether to overwrite-save.
In this case, the piece of music (content) is not only left unoverwritten when the newly recorded piece is equal to or lower in sound quality than the saved piece; even when the newly recorded piece is higher in sound quality, the overwrite-save is performed only after confirmation by the user is obtained again. This makes it possible, depending on the user's situation, either to keep the piece with the better sound quality or, even if the sound quality is slightly worse, to keep the piece in the recording state the user prefers.
Embodiment 6.
Figure 13 is a block diagram showing an example of the self-recording unit according to embodiment 6 of the present invention. Structures identical to those described in embodiments 1 to 5 are given the same reference labels, and duplicate description is omitted. In embodiment 6 described below, compared with embodiment 2, the speech recognition section 2 is composed of a plurality of speech recognition devices 21, 22, 23, ..., each provided with a recognition dictionary (not shown) for one of a plurality of languages; by using a plurality of speech recognition engines for these languages, speech recognition is carried out in a plurality of languages.
In general, a speech recognition engine for, say, Japanese has weaker recognition capability for foreign languages; when English is spoken, an English speech recognition engine gives higher recognition accuracy. Therefore, speech recognition devices 21, 22, 23, ... are provided for the various languages, each having a recognition dictionary of its language, such as a speech recognition device 21 for Japanese, a speech recognition device 22 for English, and a speech recognition device 23 for German. Here, the description takes as an example a speech recognition section 2 in which these plural speech recognition devices 21, 22, 23, ... are connected in parallel.
Then, when the speech recognition section 2 recognizes the voice output from the voice acquisition unit 1, the speech recognition devices 21, 22, 23, ... corresponding to the plural languages operate in parallel with their respective recognition dictionaries (not shown); each of the speech recognition devices 21, 22, 23, ... carries out speech recognition on its language and outputs the result to the control portion 4. At this time, each speech recognition device 21, 22, 23, ... also outputs the matching degree of the recognition together with the recognition result.
The control portion 4 determines the language of the recognized voice from the result with the highest matching degree among the results recognized by the plural speech recognition devices 21, 22, 23, ..., and saves in the information storage part 5 the song title, artist name, and so on (identification data) of the piece of music (content) extracted in the language with the highest matching degree.
Alternatively, as shown in Figure 14, instead of the speech recognition section 2 shown in Figure 13, a speech recognition section 2 may be used in which a single speech recognition device 20 switches among a plurality of speech recognition dictionaries 20-1, 20-2, 20-3, ... to carry out the recognition.
Next, the operation of the self-recording unit of embodiment 6 is described using the flowchart shown in Figure 15.
First, the voice acquisition unit 1 obtains, through a line input, the speech input from the audio equipment (step ST81). At this time, if the input from the audio equipment is in analog format, it is A/D-converted, for example into PCM format, so as to be obtained as digital data.
Then, the speech recognition section 2 recognizes the speech data obtained by the voice acquisition unit 1 and outputs the recognition result as a character string. At this time, the song title and artist name are extracted by performing large-vocabulary continuous speech recognition while comparing with the fixed phrase storage part 3 (step ST82).
At the same time, the control portion 4 obtains the matching degrees representing the accuracy (degree of similarity) of the speech recognized for the various languages in the speech recognition section 2, and determines the language of the song title and artist name based on these recognition matching degrees (step ST83). For example, the language with the highest matching degree is determined to be the language of the song title and artist name. In this way, low-accuracy speech recognition is prevented by the multilingual speech recognition dictionaries, and even foreign song titles and artist names can be correctly recognized.
Further, when the matching degree of the speech recognition in the language determined in step ST83 is equal to or greater than the prescribed value (YES in step ST84), the control portion 4 causes the content interval detection part 6 to operate, and the processing of steps ST85 to ST90 is carried out.
Since the processing of steps ST85 to ST90 is the same as that of steps ST13 to ST18 shown in Fig. 4 of embodiment 1, its description is omitted.
As the method of determining the language of the song title and artist name based on the recognition matching degree in step ST83, various methods are conceivable: for example, a method of performing speech recognition in all of the plural languages for which recognition dictionaries are provided, comparing their recognition matching degrees, and determining the language with the highest matching degree; or a method of presetting a threshold for the recognition matching degree and, if the matching degree of a recognition is equal to or greater than the set threshold, judging the speech to be that language without carrying out language identification for the remaining languages. Any of these methods may be used.
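The two language-determination methods listed above can be sketched as follows. Engine results are modeled as plain dictionaries, and the 0.8 threshold is an illustrative assumption; neither function is part of the patent itself.

```python
def pick_by_highest_degree(results):
    """Run every language engine, then choose the language whose
    recognition result has the highest matching degree."""
    return max(results, key=lambda r: r["matching_degree"])["language"]

def pick_by_threshold(results, threshold=0.8):
    """Stop at the first language whose matching degree reaches the
    preset threshold; remaining languages are not examined."""
    for r in results:
        if r["matching_degree"] >= threshold:
            return r["language"]
    return None  # no language reached the threshold

results = [
    {"language": "Japanese", "matching_degree": 0.55},
    {"language": "English", "matching_degree": 0.91},
    {"language": "German", "matching_degree": 0.40},
]
assert pick_by_highest_degree(results) == "English"
assert pick_by_threshold(results) == "English"
```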
As described above, according to the present embodiment 6, in addition to the effects of embodiment 1, speech recognition is carried out using speech recognition engines for various languages and the language is determined based on the matching degree of that recognition, so that even a song title or artist name in a foreign language can be correctly recognized and saved.
In the above embodiments, the case where the content is a piece of music, that is, music content, has been described; however, the content is not limited to music content. For example, an interval of content relating to a sports broadcast may be extracted and recorded, an interval of content relating to a talk show may be extracted and recorded, or an interval of content relating to a documentary may be extracted and recorded.
The self-recording unit of the present invention is applicable to any device capable of receiving broadcast data, such as a radio or television set, even when it has no communication unit for communicating with the outside, or in an environment with a poor network connection.
Within the scope of the invention, the present application may freely combine the embodiments, modify any constituent element of each embodiment, or omit any constituent element in each embodiment.
Industrial Applicability
The self-recording unit of the present invention is applicable to any device capable of receiving broadcast data, such as a radio or television set, even when it has no communication unit for communicating with the outside, or in an environment with a poor network connection.
Description of Reference Labels
1 voice acquisition unit, 2 speech recognition section, 3 fixed phrase storage part, 4 control portion, 5 information storage part, 6 content interval detection part, 7 video/speech record portion, 8 input unit, 9 output unit, 20, 21, 22, 23 speech recognition devices, 20-1, 20-2, 20-3 recognition dictionaries.

Claims (6)

1. A self-recording unit, characterized by comprising:
a voice acquisition unit that detects and obtains, from broadcast data, speech containing content and identification data of the content;
a fixed phrase storage part that stores fixed phrases used when the content is introduced;
a speech recognition section that recognizes the speech data obtained by the voice acquisition unit and, based on the recognition result and the fixed phrases stored in the fixed phrase storage part, extracts and outputs the identification data of the content;
a control portion that, upon receiving the identification data of the content from the speech recognition section, sends an instruction to detect a start time and an end time of the content;
a content interval detection part that, in accordance with the instruction from the control portion, detects the start time and the end time of the content based on the speech data obtained by the voice acquisition unit;
a video/speech record portion that records the content in the content interval between the start time and the end time detected by the content interval detection part; and
an information storage part that stores at least the content recorded by the video/speech record portion and the identification data of the content,
wherein the control portion associates the identification data of the content with the content recorded by the video/speech record portion and saves them in the information storage part.
2. The self-recording unit as claimed in claim 1, characterized in that
the data stored in the information storage part contains the number of times the content has been obtained, and
the control portion, by referring to the data stored in the information storage part, associates the identification data of the content with the content and saves them in the information storage part only when the number of times the content has been obtained is equal to or greater than a prescribed number of times.
3. The self-recording unit as claimed in claim 1, characterized in that
the speech recognition section outputs the matching degree of the recognition together with the recognition result, and
the control portion associates the identification data of the content with the content and saves them in the information storage part only when the matching degree of the recognition is equal to or greater than a prescribed value.
4. The self-recording unit as claimed in claim 1, characterized by further comprising:
an input unit that accepts operation input from a user; and
an output unit that presents data to the user,
wherein, when associating the identification data of the content with the content and saving them in the information storage part, the control portion asks the user via the output unit to confirm whether to perform the saving; when an input meaning that the content is to be saved is received via the input unit, the control portion associates the identification data of the content with the content and saves them in the information storage part, and when an input meaning that the content is not to be saved is received via the input unit, the control portion saves only the identification data of the content in the information storage part.
5. The self-recording unit as claimed in claim 1, characterized in that
the control portion, by referring to the data stored in the information storage part, determines whether data identical to the identification data of the extracted content exists in the information storage part; if it exists, the control portion compares the sound quality of the content recorded by the video/speech record portion with that of the content saved in the information storage part, and only when the sound quality of the content recorded by the video/speech record portion is higher, overwrites the content saved in the information storage part with the content recorded by the video/speech record portion and saves it.
6. The self-recording unit as claimed in claim 1, characterized in that
the speech recognition section has recognition dictionaries for a plurality of languages, carries out speech recognition in the plurality of languages, and outputs the matching degree of the recognition together with the recognition result, and
the control portion determines the language of the identification data of the content based on the matching degree of the recognition, associates the identification data of the content extracted in the determined language with the content, and saves them in the information storage part.
CN201280073736.0A 2012-06-04 2012-06-04 Self-recording unit Expired - Fee Related CN104350545B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/003652 WO2013183078A1 (en) 2012-06-04 2012-06-04 Automatic recording device

Publications (2)

Publication Number Publication Date
CN104350545A CN104350545A (en) 2015-02-11
CN104350545B true CN104350545B (en) 2016-10-05

Family

ID=49711508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201280073736.0A Expired - Fee Related CN104350545B (en) 2012-06-04 2012-06-04 Self-recording unit

Country Status (3)

Country Link
JP (1) JP5591428B2 (en)
CN (1) CN104350545B (en)
WO (1) WO2013183078A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015161632A (en) * 2014-02-28 2015-09-07 Fujitsu Ten Ltd. Image display system, head-up display device, image display method, and program
US11328727B2 (en) * 2017-03-31 2022-05-10 Optim Corporation Speech detail recording system and method
JP2019200393A (en) * 2018-05-18 2019-11-21 Sharp Corporation Determination device, electronic apparatus, response system, method for controlling determination device, and control program
JP7009338B2 (en) * 2018-09-20 2022-01-25 TVS Regza Corporation Information processing equipment, information processing systems, and video equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1104392A (en) * 1993-12-21 1995-06-28 Roy J. Mankovitz Apparatus and method for identifying broadcast programs and accessing information relating thereto
CN1726489A (en) * 2002-10-28 2006-01-25 Gracenote Inc. Personal audio recording system
CN101611578A (en) * 2006-12-18 2009-12-23 UBC Media Group Method of constructing and processing data file requests
CN101996627A (en) * 2009-08-21 2011-03-30 Sony Corporation Speech processing apparatus, speech processing method and program

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003085884A (en) * 2001-09-14 2003-03-20 Pioneer Electronic Corp Information recording device
JP2007219178A (en) * 2006-02-16 2007-08-30 Sony Corp Musical piece extraction program, musical piece extraction device, and musical piece extraction method
JP4442585B2 (en) * 2006-05-11 2010-03-31 Mitsubishi Electric Corp Music section detection method and apparatus, and data recording method and apparatus
JP2011223205A (en) * 2010-04-07 2011-11-04 Onkyo Corp Broadcast recording apparatus and program for the same

Also Published As

Publication number Publication date
WO2013183078A1 (en) 2013-12-12
CN104350545A (en) 2015-02-11
JP5591428B2 (en) 2014-09-17
JPWO2013183078A1 (en) 2016-01-21


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161005

Termination date: 20200604

CF01 Termination of patent right due to non-payment of annual fee