TW548631B - System, method, and article of manufacture for a voice recognition system for identity authentication in order to gain access to data on the Internet - Google Patents
- Publication number
- TW548631B (application TW89117686A)
- Authority
- TW
- Taiwan
- Prior art keywords
- user
- voice
- data
- speech
- signal
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
4Hickman200021tw; AND1P115.TW

Field of the Invention

The present invention relates to voice recognition systems, and in particular to a system that identifies a user through voice analysis in order to grant access to data on a network.

Background of the Invention

Many applications routinely require users to access systems with high security requirements. Such applications include, but are not limited to, financial services such as stock trade confirmation and execution, bank account inquiries and wire transfers, Internet-based electronic commerce, computer networks, safes, homes, doors, elevators, vehicles, and other high-value equipment. This specification and the claims refer to these collectively as "secure systems."

The physical token authentication devices commonly used today to identify individuals, such as crypto cards or limited-access cards, can be lost, stolen, lent to unauthorized individuals, and/or duplicated, so the protection they offer is inadequate.

A more sophisticated approach, known as biometric authentication, provides stronger protection. Biometric identification verifies unique physical characteristics, for example through fingerprints, retinal scans, facial recognition, and voice pattern authentication.

Note that, in this document and in the speech analysis art, voice pattern authentication is not the same as voice pattern recognition. In voice pattern recognition, the speaker utters a phrase (for example, a single word), and the system determines the spoken word by selecting from a predefined vocabulary. Speech recognition therefore provides the ability to identify what was said, not the ability to identify who said it.

Retinal scanning relies on the fact that each person's retinal blood vessel pattern is unique.
五、發明說明(xJ 的圖案’是一生都不會改變的事實。雖然視網膜掃描提供 很局的安全性,但是因爲價格昂貴,而且需要複雜的硬體 和軟體才能實行,所以限制了實用性。 指紋和臉部辨識也需要昂貴且複雜的硬體和軟體才能實 行。 語音鑑定,也稱爲語音鑑證、語音模式鑑證、說話者識別 鑑定及聲紋,可以提供識別說話者的功能。語音鑑定和語 音鑑證在本文中會交替使用,兩者意思相同。語音鑑定的 技術在下列美國專利中有廣泛的討論:美國專利號碼第 5,502,759 ; 5,499,288 ; 5,414,755 ; 5,365,574 ; 5,297,194 ; 5,216,720 ; 5,142,565 ; 5,127,043 ; 5,054,083 ; 5,023,901 ; 4,468,2〇4及4,100,370號,此處提到只是作爲參考。這些 專利敘述了許多語音鑑定的方法。 語音鑑證是完全根據說話者的發音識別說話者。例如,使 用特徵摘取和模式比對演算法則可以鑑定說話者的假設身 份,其中模式比對是利用數位化接收聲紋和先前儲存的參 考樣本比對。語音處理使用到的特性包括,舉例而言,音 調頻率、功率頻譜値、頻譜係數及線性預測編碼,請參閱 B· S. Atal 的(1976) Automatic recognition of speakers from their voice. Proc. IEEE,Vol· 64, ρρ· 460-475,此處 提到只是作爲參考。 其他語音識別的技術包括,但不限於,神經網路處理、語 音模型與參考集的比較、使用可選擇性調整訊號臨界値的 密碼鑑定、以及同時的語音識別與鑑定。 (請先閱讀背面之注音?事項再填寫本頁) 經濟部智慧財產局員工消費合作社印製V. Description of the invention (The pattern of the xJ is a fact that will not change in a lifetime. Although the retinal scan provides a lot of security, it is limited in practicality because it is expensive and requires complex hardware and software to implement it. Fingerprints and face recognition also require expensive and complex hardware and software to perform. Voice authentication, also known as voice authentication, voice mode authentication, speaker identification and voiceprint, can provide speaker identification functions. Voice identification and Voice authentication will be used interchangeably in this article, and both have the same meaning. The technology of voice authentication is widely discussed in the following U.S. patents: U.S. Patent No. 5,502,759; 5,499,288; 5,414,755; 5,365,574; 5,297,194; 5,216,720; 5,142,565; 5,127,043; 5,054,083; 5,023,901; 4,468,204, and 4,100,370, which are mentioned here for reference only. These patents describe many methods of speech authentication. Speech authentication is based on the speaker's pronunciation to identify the speaker completely. 
For example, using feature extraction and pattern matching algorithms can identify speaker hypotheses Identity, where the pattern comparison is a comparison of digitally received voiceprints with previously stored reference samples. Features used in speech processing include, for example, tone frequency, power spectrum chirp, spectral coefficients, and linear predictive coding, see B. S. Atal's (1976) Automatic recognition of speakers from their voice. Proc. IEEE, Vol · 64, ρ · 460-475, mentioned here for reference only. Other speech recognition technologies include, but are not limited to, Neural network processing, comparison of speech model with reference set, password authentication using selective adjustment of signal criticality, and simultaneous speech recognition and authentication. (Please read the note on the back? Matters before filling out this page) Printed by the Property Agency Staff Consumer Cooperative
State-of-the-art feature classification techniques are described in detail in S. Furui (1991), "Speaker dependent-feature extraction, recognition and processing techniques," Speech Communications, Vol. 10, pp. 505-520, cited here for reference only.

Text-dependent speaker recognition must analyze a predetermined utterance, whereas text-independent recognition does not rely on any particular spoken text. In either case, however, the classifier produces a measure representing the speaker, which is then compared with a preselected threshold. If the speaker's measure falls below the threshold, the speaker's identity is confirmed; otherwise, the speaker is declared an impostor.

The relatively low performance of voice verification technology is one of the main reasons it has been slow to enter the market.
The equal error rate (EER) is a calculation involving two parameters: false acceptance (wrongly granting access) and false rejection (wrongly denying access that should be allowed). Both vary with the required level of security, but, as shown below, there is a trade-off between them. State-of-the-art voice verification algorithms (whether text-dependent or text-independent) have an EER of about 2%.

By changing the threshold for false rejection errors, false acceptance errors change as well, as shown in Figure 1 of J. Guavain, L. Lamel, and B. Prouts (March 1995), LIMSI 1995 Scientific Report, cited here for reference only. That figure contains five plots, each relating the false rejection rate (abscissa) to the resulting false acceptance rate of a voice verification algorithm, with EER values of 9.0%, 8.3%, 5.1%, 4.4%, and 3.5%, respectively. As noted, there is a trade-off between the false rejection rate and the false acceptance rate; the plots are hyperbolic in shape, and curves with lower EER values lie closer to the two axes.

Thus, when a system is set for a false rejection rate that is too low, the false acceptance rate becomes too high, and vice versa.

Various techniques for voice-based security systems are discussed extensively in the following U.S. patents: U.S. Patent Nos. 5,265,191; 5,245,694; 4,864,642; 4,865,072; 4,821,027; 4,797,672; 4,590,604; 4,534,056; 4,020,285; 4,013,837; and 3,991,271, cited here for reference only.
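The trade-off between false rejection and false acceptance discussed in this passage can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure used in the cited report: the scores are made up, they are treated as distances (a trial is accepted when its score is below the threshold), and the function names `error_rates` and `approximate_eer` are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at one threshold.

    Scores are distances: a trial is accepted when its score is below the threshold.
    """
    false_rejections = sum(1 for s in genuine if s >= threshold)
    false_acceptances = sum(1 for s in impostor if s < threshold)
    return false_rejections / len(genuine), false_acceptances / len(impostor)

def approximate_eer(genuine, impostor):
    """Scan the observed scores as thresholds; return (threshold, frr, far) where the rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, t, frr, far)
    return best[1], best[2], best[3]

# Made-up distance scores, for illustration only.
genuine_scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.55, 0.70]
impostor_scores = [0.35, 0.50, 0.60, 0.65, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]

for t in (0.3, 0.5, 0.7):
    frr, far = error_rates(genuine_scores, impostor_scores, t)
    print(f"threshold={t:.1f}  FRR={frr:.0%}  FAR={far:.0%}")

print(approximate_eer(genuine_scores, impostor_scores))
```

With these invented scores, loosening the threshold from 0.3 to 0.7 lowers the false rejection rate while raising the false acceptance rate, which is the behavior the text describes; the operating point where the two rates meet is the EER.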
These patents describe how various voice security systems are implemented in different applications, such as telephone networks, computer networks, automobiles, and elevators.

However, none of these techniques provides the required level of performance, because once the false rejection rate is set low, the false acceptance rate becomes unacceptably high, and vice versa.

It has been suggested that, for the market to accept speaker verification, the false rejection rate must be on the order of 1% and the false acceptance rate on the order of 0.1%.

There is therefore a widely recognized need for improved false acceptance and false rejection rates, and a reliable and secure voice authentication system would be highly advantageous.

Summary of the Invention

A system, method, and article of manufacture are provided for allowing a user to access data on a network through voice verification. When a user requests access to data, such as data on a website, the user is prompted for a voice sample. The user's registration information is then retrieved; it includes a voice scan of the user's voice. The user's voice sample is received over the network and compared with the voice scan to verify the user's identity. If the user's identity is verified, the user is granted access to the data. If verification fails, access is denied.

In one embodiment of the present invention, the user's voice is recorded to create the voice scan, which is then stored. This may form part of the registration process.
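The access flow summarized above (store a voice scan at registration, then compare a fresh sample against it and grant or deny access) might be sketched as follows. This is a toy model under stated assumptions, not the patented implementation: `VoiceAccessController`, `toy_distance`, the feature lists, and the threshold are hypothetical stand-ins, and real voice comparison is far more involved.

```python
class VoiceAccessController:
    """Toy access gate: registration stores a voice scan, access compares a sample to it."""

    def __init__(self, compare, threshold):
        self._compare = compare        # hypothetical function returning a distance
        self._threshold = threshold
        self._registrations = {}       # user id -> stored voice scan

    def enroll(self, user_id, voice_scan):
        """Registration step: store the user's recorded voice scan."""
        self._registrations[user_id] = voice_scan

    def request_access(self, user_id, voice_sample):
        """Return True (grant) or False (deny) for a data-access request."""
        voice_scan = self._registrations.get(user_id)   # retrieve registration information
        if voice_scan is None:
            return False                                # no registration on file: deny
        return self._compare(voice_sample, voice_scan) < self._threshold

def toy_distance(sample, scan):
    """Stand-in comparison: mean absolute difference between two feature lists."""
    return sum(abs(a - b) for a, b in zip(sample, scan)) / len(scan)

gate = VoiceAccessController(compare=toy_distance, threshold=0.2)
gate.enroll("alice", [0.9, 0.1, 0.4])
print(gate.request_access("alice", [0.85, 0.15, 0.45]))   # sample close to the stored scan
print(gate.request_access("alice", [0.1, 0.9, 0.9]))      # sample far from the stored scan
print(gate.request_access("bob", [0.9, 0.1, 0.4]))        # user with no registration
```

Passing the comparison function in as a parameter keeps the gate independent of any particular verification algorithm.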
The voice scan preferably includes multiple phrases spoken by the user to increase security, and identity is verified by comparing alternate phrases; for example, if the first phrase fails to confirm the user's identity, the user is prompted to speak another phrase. Optionally, the time and date at which the voice sample was taken may be recorded together with the user's voice sample.

Brief Description of the Drawings

The invention will be better understood from the following detailed description, which refers to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a hardware implementation of an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of the present invention that detects emotion using voice analysis;
FIG. 3 is a graph showing the average recognition accuracy for an s70 data set;
FIG. 4 is a graph showing the average recognition accuracy for an s80 data set;
FIG. 5 is a graph showing the average recognition accuracy for an s90 data set;
FIG. 6 is a flowchart of an embodiment of the present invention that detects emotion using statistics;
FIG. 7 is a flowchart of a method for detecting anxiety in a voice in a business environment to help prevent fraud;
FIG. 8 is a flowchart of an apparatus for detecting emotion from a voice sample according to an embodiment of the present invention;
FIG. 9 is a flowchart of an apparatus for producing a visual record from sound according to an embodiment of the present invention;
FIG. 10 is a flowchart of an embodiment of the present invention that monitors emotions in voice signals and provides feedback based on the detected emotions;
FIG. 11 is a flowchart of an embodiment of the present invention that compares user and computer detection of emotion in voice signals in order to improve emotion recognition by the invention, the user, or both;
FIG. 12 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;
FIG. 13 is a block diagram of the element assembly and storage block of FIG. 12;
FIG. 14 shows a speech recognition system with a bio-monitor and a preprocessor according to an embodiment of the present invention;
FIG. 15 shows the bio-signal generated by the bio-monitor of FIG. 14;
FIG. 16 shows a circuit within the bio-monitor;
FIG. 17 is a block diagram of the preprocessor;
FIG. 18 illustrates the relationship between pitch modification and the bio-signal;
FIG. 19 is a flowchart of a calibration program;
FIG. 20 shows generally the architecture of the portion of the system of the present invention in which an improved set of pitch period candidates is obtained;
FIG. 21 is a flowchart of an embodiment of the present invention that identifies a user through voice verification to allow the user to access data on a network;
FIG. 22 illustrates the basic concept of a voice authentication system used to control access to a secured system;
FIG. 23 shows a system for establishing a speaker's identity according to the present invention;
FIG. 24 shows the first step in an exemplary system for identifying a speaker according to the present invention;
FIG. 25 shows the second step in the system of FIG. 24;
FIG. 26 shows the third step in the system of FIG. 24;
FIG. 27 shows the fourth step in the speaker identification system of FIG. 24;
FIG. 28 is a flowchart of a method for determining, based on voice signals, whether a person is eligible to pass a border crossing;
FIG. 29 shows a method of speaker recognition according to one aspect of the present invention;
FIG. 30 shows another method of speaker recognition according to one aspect of the present invention;
FIG. 31 shows the basic components of a speaker recognition system;
FIG. 32 shows an example of the information stored in the speaker recognition information storage unit of FIG. 31;
FIG. 33 shows a preferred embodiment of a speaker recognition system according to an embodiment of the present invention;
FIG. 34 shows the embodiment of the speaker recognition system of FIG. 33 in further detail;
FIG. 35 is a flowchart of a method for recognizing voice commands for manipulating data on the Internet;
FIG. 36 is a generalized block diagram of an information system, according to an embodiment of the present invention, for controlling content and applications over a network via voice signals;
FIGS. 37A, 37B, and 37C together form a block diagram of an exemplary entertainment delivery system that incorporates an embodiment of the present invention;
FIG. 38 illustrates a method of applying rules to form acceptable sentences according to an embodiment of the present invention that includes language translation capability; and
FIG. 39 shows a representative hardware implementation of an embodiment of the present invention that includes language translation capability.

Detailed Description

In accordance with at least one embodiment of the present invention, a system is provided for performing various functions and activities through voice analysis and voice recognition. The system may be realized using a hardware implementation such as that illustrated in FIG. 1. In addition, various functional and user interface features of an embodiment of the present invention may be implemented in software using object-oriented programming (OOP).

Hardware Overview

A representative hardware environment of a preferred embodiment of the present invention is depicted in FIG. 1, which illustrates a typical workstation hardware configuration having a central processing unit 110, such as a microprocessor, and a number of other units interconnected via a system bus 112. The workstation shown in FIG. 1 includes random access memory (RAM) 114; read-only memory (ROM) 116; an input/output adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112; a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, and/or other user interface devices such as a touch screen (not shown) to the bus 112; a communication adapter 134 for connecting the workstation to a communication network (e.g., a data processing network); and a display adapter 136 for connecting the bus 112 to a display device 138. The workstation typically has an operating system installed, such as the Microsoft Windows NT or Windows/95 operating system (OS), the IBM OS/2 operating system, the Mac OS, or the UNIX operating system.

Software Overview

Object-oriented programming (OOP) has become increasingly popular for developing complex applications. As OOP moves toward the mainstream of software design and development, various software solutions must be adapted to make use of its benefits. To apply OOP principles to the messaging interface of an electronic messaging system, a set of OOP classes and objects must be provided for the messaging interface.

OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program. An object is a software package that contains both data and a collection of related structures and procedures. Because it contains both data and a collection of structures and procedures, it can be viewed as a self-sufficient component that needs no additional structures, procedures, or data to perform its specific task. OOP therefore views a computer program as a collection of largely autonomous components, called objects, each responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.

In general, OOP components are reusable software modules that present an interface conforming to an object model and that are accessed at run time through a component integration architecture. A component integration architecture is a set of architecture mechanisms that allow software modules in different process spaces to use each other's capabilities or functions.
This is generally done by assuming a common component object model on which to build the architecture. It is worthwhile to differentiate between an object and a class of objects at this point. An object is a single instance of the class of objects, which is often just called a class. A class can be viewed as a blueprint from which many objects can be formed.

OOP allows the programmer to create an object that is part of another object. For example, the object representing a piston engine has a composition relationship with the object representing a piston. In reality, a piston engine comprises a piston, valves, and many other components; the fact that a piston is an element of a piston engine can be logically and semantically represented in OOP by two objects.

OOP also allows creation of a new object that is derived from another object. If there are two objects, one representing a piston engine and the other representing a piston engine whose piston is made of ceramic, the relationship between them is not one of composition. A ceramic piston engine does not make up a piston engine; rather, it is a kind of piston engine with one more limitation than the basic piston engine: its piston is made of ceramic. In this case, the object representing the ceramic piston engine is called a derived object; it inherits all aspects of the object representing the piston engine and adds further limitation or detail. The object representing the ceramic piston engine "depends from" the object representing the piston engine. The relationship between these objects is called inheritance.
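The composition and inheritance relationships in the piston engine example can be written down directly. The following Python sketch is illustrative only; the class names mirror the example in the text, while the attributes and default values are assumptions made for the sake of a runnable snippet.

```python
class Piston:
    def __init__(self, material="metal"):
        self.material = material

class PistonEngine:
    """Composition: an engine is made up of pistons (among other parts)."""

    def __init__(self, piston_count=4, piston_material="metal"):
        self.pistons = [Piston(piston_material) for _ in range(piston_count)]

    def piston_count(self):
        return len(self.pistons)

class CeramicPistonEngine(PistonEngine):
    """Inheritance: a kind of piston engine with one added restriction."""

    def __init__(self, piston_count=4):
        super().__init__(piston_count, piston_material="ceramic")

engine = CeramicPistonEngine(piston_count=6)
print(isinstance(engine, PistonEngine))        # a ceramic piston engine is a piston engine
print(engine.piston_count())                   # behavior inherited from PistonEngine
print({p.material for p in engine.pistons})    # the added restriction
```

Here `PistonEngine` holds `Piston` objects (composition), whereas `CeramicPistonEngine` is itself a `PistonEngine` (inheritance).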
When the object or class representing the ceramic piston engine inherits all aspects of the object representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class. However, because the thermal characteristics of a ceramic piston engine typically differ from those associated with a metal piston, the ceramic piston engine object overrides the standard thermal characteristics with ceramic-specific ones; it skips over the original functions and uses new functions related to ceramic pistons. Different kinds of piston engines have different characteristics but may share the same underlying functions (e.g., number of pistons in the engine, ignition sequence, lubrication). To access these functions in any piston engine object, a programmer calls the same functions by the same names, but each type of piston engine may have a different or overriding implementation behind the same name. This ability to hide different implementations of a function behind the same name is called polymorphism, and it greatly simplifies communication among objects.

With the concepts of composition, encapsulation, inheritance, and polymorphism, an object can represent just about anything in the real world. In fact, our logical perception of reality is the only limit on what kinds of things can become objects in object-oriented software. Some typical categories are as follows:

• Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.
• Objects can represent elements of the computer-user environment, such as windows, menus, or graphics objects.
• An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.
• An object can represent user-defined data types, such as time, angles, complex numbers, or points on a plane.

With this enormous capability of an object to represent just about any logically separable matter, OOP allows the software developer to design and implement a computer program that is a model of some aspect of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since an object can represent anything, the software developer can also create an object that can be used as a component in a larger software project in the future.

If 90% of a new OOP software program consists of proven, existing components made from preexisting reusable objects, then only the remaining 10% of the new software project has to be written and tested from scratch. Since 90% already came from an inventory of extensively tested reusable objects, errors can originate only in the remaining 10% of the program. As a result, OOP enables software developers to build objects out of other, previously built objects.

This process closely resembles building complex machinery out of assemblies and sub-assemblies. OOP technology therefore makes software engineering more like hardware engineering, in that software is built from existing components that are available to the developer as objects. All this adds up to improved software quality and increased speed of development.
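The overriding and polymorphism described in this section, where the same function name hides a different implementation in each kind of piston engine, can be sketched in Python as below. The method names and the temperature figures are invented for illustration and are not engineering data.

```python
class PistonEngine:
    def thermal_limit_celsius(self):
        # Illustrative figure only, not real engineering data.
        return 300

    def describe(self):
        return f"{type(self).__name__}: limit {self.thermal_limit_celsius()} C"

class CeramicPistonEngine(PistonEngine):
    def thermal_limit_celsius(self):
        # Overrides the standard characteristic with a ceramic-specific one.
        return 1000

def report(engines):
    """Call the same method name on every engine; each supplies its own implementation."""
    return [engine.describe() for engine in engines]

for line in report([PistonEngine(), CeramicPistonEngine()]):
    print(line)
```

The caller in `report` never checks which kind of engine it holds, which is the simplification in communication among objects that the text attributes to polymorphism.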
Programming languages are beginning to fully support OOP principles such as encapsulation, inheritance, polymorphism, and composition. With the advent of the C++ language, many commercial software developers have embraced OOP. C++ is an OOP language that offers fast, machine-executable code and is suitable for both commercial-application and systems-programming projects. It is now a favorite among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, Common Lisp Object System (CLOS), and Eiffel. In addition, OOP capabilities are being added to more traditional, popular programming languages such as Pascal.

The benefits of object classes can be summarized as follows:

• Objects and their corresponding classes break down complex programming problems into many smaller, simpler problems.
• Encapsulation enforces data abstraction by organizing data into small, independent objects that can communicate with each other. Encapsulation protects the data in an object from accidental damage, while allowing other objects to interact with that data by calling the object's member functions and structures.
• Subclassing and inheritance make it possible to extend and modify objects by deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.
• Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.
• Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.
Libraries of reusable classes are useful in many situations, but they also have some limitations. For example:

• Complexity. In a complex system, the class hierarchies for related classes can become extremely confusing, with many dozens or even hundreds of classes.
• Flow of control. A program written with the aid of class libraries is still responsible for the flow of control (i.e., it must control the interactions among all the objects created from a particular library). The programmer has to decide which functions to call at what times for which kinds of objects.
• Duplication of effort. Although class libraries allow programmers to use and reuse many small pieces of code, each programmer puts those pieces together in a different way. Two different programmers can use the same set of class libraries to write two programs that do exactly the same thing but whose internal structure (i.e., design) may be quite different, depending on the many small decisions each programmer makes along the way. Inevitably, similar pieces of code end up doing similar things in slightly different ways and do not work as well together as they should.

Class libraries are very flexible. As programs grow more complex, more programmers are forced to reinvent basic solutions to basic problems over and over again. A relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small-scale patterns and the major mechanisms that implement the common requirements and design in a specific application domain.
Frameworks were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes, and other standard user interface elements for personal computers.

Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others. In the early days of procedural programming, the programmer called libraries provided by the operating system to perform certain tasks, but the program basically executed from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.

The development of graphical user interfaces began to turn this procedural programming arrangement inside out. These interfaces allow the user, rather than program logic, to drive the program and decide when certain actions should be performed. Today, most personal computer software accomplishes this by means of an event loop that monitors the mouse, keyboard, and other sources of external events and calls the appropriate parts of the programmer's code according to the actions the user performs. The programmer no longer determines the order in which events occur. Instead, a program is divided into separate pieces that are called at unpredictable times and in an unpredictable order. By relinquishing control to users in this way, the developer creates a program that is much easier to use.
Nevertheless, the individual pieces of the program written by the developer still call libraries provided by the operating system to accomplish certain tasks, and the programmer must still determine the flow of control within each piece after it is called by the event loop. Application code still sits on top of the system.

Even event loop programs require programmers to write a lot of code that should not need to be written separately for every application. The concept of an application framework carries the event loop concept further. Instead of dealing with all the details of constructing basic menus, windows, and dialog boxes and then making them work together, programmers using application frameworks start with working application code and basic user interface elements in place. They then build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application.

Application frameworks reduce the total amount of code that a programmer has to write from scratch. However, because the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit. The framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (for example, to create or manipulate a proprietary data structure).
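The idea that framework code owns the flow of control and calls application code only when needed can be illustrated with a small sketch. This is an illustration only, not code from the specification; the names `Framework`, `PayrollApp`, `on_open`, `on_copy`, and `on_unknown` are hypothetical.

```python
class Framework:
    """Hypothetical generic application: it owns the event loop and the flow of control."""

    def run(self, events):
        log = []
        for event in events:
            # The framework, not the application, decides which hook runs and when.
            handler = getattr(self, "on_" + event, self.on_unknown)
            log.append(handler())
        return log

    # Default behavior that an application inherits unchanged.
    def on_open(self):
        return "framework: show empty window"

    def on_copy(self):
        return "framework: copy selection to clipboard"

    def on_unknown(self):
        return "framework: ignore event"


class PayrollApp(Framework):
    """Application code: overrides one hook and inherits everything else."""

    def on_open(self):
        return "app: load payroll records"


log = PayrollApp().run(["open", "copy", "resize"])
print(log)
```

The application never calls `run` logic itself or dispatches events; it only supplies the one behavior that differs from the generic application.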
A programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in expected ways, as opposed to isolated programs with custom code being created over and over again for similar problems.

As explained above, a framework is basically a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (for example, for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.

There are three main differences between frameworks and class libraries:
• Behavior versus protocol. A class library is essentially a collection of behaviors that a program can call when it needs those individual behaviors. A framework, on the other hand, provides not only behavior but also the protocol, or set of rules, that governs the ways in which behaviors can be combined, including rules for what the programmer is supposed to provide versus what the framework provides.
• Call versus override. With a class library, the programmer's code instantiates objects and calls their member functions. It is possible to instantiate and call objects in the same way with a framework (that is, to treat the framework as a class library), but to take full advantage of a framework's reusable design, the programmer typically writes code that overrides framework behavior and is called by the framework. The framework manages the flow of control among its objects. Writing a program involves dividing responsibilities among the various pieces of software that are called by the framework, rather than specifying how the different pieces should work together.
• Implementation versus design. With class libraries, programmers reuse only implementations, whereas with frameworks they reuse design. A framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain.
For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.

Thus, by developing frameworks for solutions to various problems and programming tasks, the design and development effort for software can be significantly reduced. A preferred embodiment of the invention uses HyperText Markup Language (HTML) to implement documents on the Internet, together with a general-purpose secure communication protocol as the transport medium between the client and the company. HTTP or other protocols could readily be substituted for HTML without undue experimentation. Information on these products is available in T. Berners-Lee, D. Connolly, "RFC 1866: Hypertext Markup Language - 2.0" (Nov. 1995); and R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys, and J.C. Mogul, "Hypertext Transfer Protocol - HTTP/1.1: HTTP Working Group Internet Draft" (May 2, 1996). HTML is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879:1986, Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML).

To date, Web development tools have been limited in their ability to create dynamic Web applications that span from client to server and interoperate with existing computing resources. Until recently, HTML was the dominant technology used in the development of Web-based solutions.
However, HTML has proven to be inadequate in the following areas:
• Poor performance;
• Restricted user interface capabilities;
• Can only produce static Web pages;
• Lack of interoperability with existing applications and data; and
• Inability to scale.
Sun Microsystems' Java language solves many of the client-side problems by:
• Improving performance on the client side;
• Enabling the creation of dynamic, real-time Web applications; and
• Providing the ability to create a wide variety of user interface components.

With Java, developers can create robust user interface (UI) components. Custom "widgets" (for example, real-time stock tickers and animated icons) can be created, and client-side performance is improved. Unlike HTML, Java supports the notion of client-side validation, offloading appropriate processing onto the client for improved performance. Dynamic, real-time Web pages can be created, and the custom UI components mentioned above can also be used to build them.
Sun's Java language has emerged as an industry-recognized language for "programming the Internet." Sun defines Java as "a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language. Java supports programming for the Internet in the form of platform-independent Java applets." Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) and allow developers to add "interactive content" to Web documents (for example, simple animations, page adornments, and basic games). Applets execute within a Java-compatible browser (for example, Netscape Navigator) by copying code from the server to the client. From a language standpoint, Java's core feature set is based on C++.
Sun's Java literature states that Java is basically "C++ with extensions from Objective C for more dynamic method resolution."

Another technology that provides functions similar to Java is offered by Microsoft with its ActiveX Technologies, which give developers and Web designers the means to build dynamic content for the Internet and personal computers. ActiveX includes tools for developing animation, 3-D virtual reality, video, and other multimedia content. The tools use Internet standards, work on multiple platforms, and are supported by more than 100 companies. The group's building blocks are called ActiveX Controls: small, fast components that enable developers to embed parts of software in HyperText Markup Language (HTML) pages. ActiveX Controls work with a variety of programming languages, including Microsoft Visual C++, Borland Delphi, the Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code-named "Jakarta." ActiveX Technologies also include the ActiveX Server Framework, which allows developers to create server applications. One of ordinary skill in the art will readily recognize that ActiveX could be substituted for Java to practice the invention without undue experimentation.

Emotion recognition

The present invention is directed toward using the recognition of emotions in speech for business purposes.
Some embodiments of the present invention may be used to detect a speaker's emotion based on voice analysis and to output the detected emotion. Other embodiments may be used to detect the emotional state in telephone call center conversations and to provide feedback to an operator or supervisor for monitoring purposes. Still other embodiments may be applied to sort voice mail messages according to the emotions expressed by the caller.

If the target subjects are known, it is suggested that a study be conducted on a few of them to determine which portions of a voice are the most reliable indicators of emotion. If target subjects are not available, other subjects may be used. With this orientation, the following discussion assumes:
• Data should be solicited from people who are not professional actors or actresses, because actors and actresses may overemphasize a particular speech component and thereby introduce error.
• Data may be solicited from test subjects chosen from the group expected to be analyzed. This would improve accuracy.
• Telephone-quality speech (<3.4 kHz) can be targeted to improve accuracy for use with a telephone system.
• The testing may rely on the voice signal only. This means that modern speech recognition techniques are excluded, because they require much better signal quality and more computational power.

Data collection and evaluation

In an exemplary test, thirty people are recorded, each speaking four short sentences:
• “This is not what I expected.”
• “I'll be right there.”
• “Tomorrow is my birthday.”
• “I'm getting married next week.”

Each sentence is recorded five times; each time, the subject portrays one of the following emotional states: happiness, anger, sadness, fear/nervousness, and normal (unemotional). Five subjects may also record the sentences twice with different recording parameters. Each subject thus records 20 or 40 utterances, yielding a corpus of 700 utterances with 140 utterances per emotional state.
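The corpus figures above can be checked with a few lines of arithmetic. The sketch assumes that the five subjects who record twice are among the thirty, which is the reading that reproduces the stated total of 700; the variable names are illustrative only.

```python
SUBJECTS = 30         # people recorded
SENTENCES = 4         # short sentences per person
EMOTIONS = 5          # happiness, anger, sadness, fear/nervousness, normal
REPEAT_SUBJECTS = 5   # subjects who record the whole set a second time (assumed to be among the 30)

per_subject_once = SENTENCES * EMOTIONS      # utterances from one pass
per_subject_twice = 2 * per_subject_once     # utterances from a subject who records twice
corpus_size = SUBJECTS * per_subject_once + REPEAT_SUBJECTS * per_subject_once
per_emotion = corpus_size // EMOTIONS        # every emotion is portrayed equally often

print(per_subject_once, per_subject_twice, corpus_size, per_emotion)
```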
Each utterance can be recorded using a close-talk microphone (within 10 cm of the speaker's mouth); the first 100 utterances are recorded at 22 kHz/8 bit and the remaining 600 at 22 kHz/16 bit.

After the corpus is created, an experiment may be performed to answer the following questions:
• How well can people without special training portray and recognize emotions in speech?
• How well can people recognize their own emotions in recordings they made 6 to 8 weeks earlier?
• Which kinds of emotions are easier or harder to recognize?

One important result of the experiment is the selection of a set of most reliable utterances, that is, utterances that are recognized by the most people. This set can be used as training and test data for pattern recognition algorithms run by a computer.

An interactive program of a type known in the art may be used to select and play back the utterances in random order and to let a user classify each utterance according to its emotional content. For example, twenty-three subjects can take part in the evaluation stage, 20 of whom had participated in the earlier recording stage.

Table 1 shows a performance confusion matrix produced from the data collected in the study described above. The rows and columns represent the true and the evaluated categories, respectively. For example, the second row shows that 11.9% of the utterances portrayed as happy were evaluated as normal (unemotional), 61.4% as happy, 10.1% as angry, 4.1% as sad, and 12.5% as afraid.
It can also be seen from the table that the most easily recognized category is anger (72.2%) and the least recognizable category is fear (49.5%). Considerable confusion occurs between sadness and fear, between sadness and the unemotional state, and between happiness and fear. The mean accuracy is 63.5%, which agrees with the results of other experimental studies.

Table 1. Performance confusion matrix (%)

Category   Normal   Happy   Angry   Sad    Afraid   Total
Normal     66.3     2.5     7.0     18.2   6.0      100
Happy      11.9     61.4    10.1    4.1    12.5     100
Angry      10.6     5.2     72.2    5.6    6.3      100
Sad        11.8     1.0     4.7     68.3   14.3     100
Afraid     11.8     9.4     5.1     24.2   49.5     100
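The summary figures quoted for Table 1 follow directly from the matrix. The sketch below copies the Table 1 values and derives the per-category accuracy (the diagonal), the mean accuracy, and the largest confusions; the variable names are illustrative only.

```python
LABELS = ["normal", "happy", "angry", "sad", "afraid"]

# Rows: portrayed (true) emotion; columns: emotion chosen by evaluators, in percent (Table 1).
CONFUSION = [
    [66.3, 2.5, 7.0, 18.2, 6.0],
    [11.9, 61.4, 10.1, 4.1, 12.5],
    [10.6, 5.2, 72.2, 5.6, 6.3],
    [11.8, 1.0, 4.7, 68.3, 14.3],
    [11.8, 9.4, 5.1, 24.2, 49.5],
]

# Per-category accuracy is the diagonal of the matrix.
accuracy = {label: CONFUSION[i][i] for i, label in enumerate(LABELS)}
mean_accuracy = sum(accuracy.values()) / len(accuracy)
best = max(accuracy, key=accuracy.get)
worst = min(accuracy, key=accuracy.get)

# Off-diagonal cells, largest first: (portrayed, judged, percent).
confusions = sorted(
    ((LABELS[i], LABELS[j], CONFUSION[i][j])
     for i in range(len(LABELS)) for j in range(len(LABELS)) if i != j),
    key=lambda cell: cell[2],
    reverse=True,
)

print(round(mean_accuracy, 1), best, worst)
print(confusions[:3])
```

The three largest off-diagonal cells are fear judged as sadness, normal judged as sadness, and sadness judged as fear, which matches the confusions named in the text.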
Table 2 shows statistics for the evaluators for each emotional category and for the summarized performance, calculated as the sum of the performances for each category. It can be seen that the variance for anger and sadness is much lower than for the other emotional categories.

Table 2. Evaluators' statistics (%)

Category   Mean    Std. dev.   Median   Minimum   Maximum
Normal     66.3    13.7        64.3     29.3      95.7
Happy      61.4    11.8        62.9     31.4      78.6
Angry      72.2    5.3         72.1     62.9      84.3
Sad        68.3    7.8         68.6     50.0      80.0
Afraid     49.5    13.3        51.4     22.1      68.6
Total      317.7   28.9        314.3    253.6     355.7
Table 3 shows statistics for the "actors", that is, how well subjects portray emotions. More precisely, the numbers in the table show what portion of the portrayed emotions of a particular category was recognized as that category by other subjects. Comparing Tables 2 and 3 shows that the ability to portray emotions (total mean 62.9%) is approximately equal to the ability to recognize emotions (total mean 63.2%), but the variance for portraying is much larger.

Table 3. Actors' statistics (%)

Category   Mean    Std. dev.   Median   Minimum   Maximum
Normal     65.1    16.4        68.5     26.1      89.1
Happy      59.8    21.1        66.3     2.2       91.3
Angry      71.7    24.5        78.2     13.0      100.0
Sad        68.1    18.4        72.6     32.6      93.5
Afraid     49.7    18.6        48.9     17.4      88.0
Total      314.3   52.5        315.2    213       445.7
Table 4 shows self-reference statistics, that is, how well subjects recognize their own portrayals. People are much better at recognizing their own emotions (mean 80.0%), especially anger (98.1%), sadness (80.0%), and fear (78.8%).
Interestingly, fear was recognized better than happiness. Some subjects failed to recognize their own portrayals of happiness and the normal state.

Table 4. Self-reference statistics (%)

Category   Mean    Std. dev.   Median   Minimum   Maximum
Normal     71.9    25.3        75.0     0.0       100.0
Happy      71.2    33.0        75.0     0.0       100.0
Angry      98.1    6.1         100.0    75.0      100.0
Sad        80.0    22.0        81.2     25.0      100.0
Afraid     78.8    24.7        87.5     25.0      100.0
Total      400.0   65.3        412.5    250.0     500.0

From the corpus of 700 utterances, five nested data sets can be selected, each containing the utterances that were recognized as portraying the given emotion by at least p percent of the subjects (p = 70, 80, 90, 95, and 100%). These data sets are referred to here as s70, s80, s90, s95, and s100. Table 5 shows the number of elements in each data set. Only 7.9% of the utterances in the corpus were recognized by all subjects. This number increases linearly up to 52.7% for the s70 data set, which corresponds to the 70% level of concordance in decoding emotion in speech.

Table 5. p-level concordance data sets

Data set          s70     s80     s90     s95     s100
Size              369     257     149     94      55
Share of corpus   52.7%   36.7%   21.3%   13.4%   7.9%

These results provide valuable insight into human performance and can serve as a baseline for comparison with computer performance.

Feature extraction

It has been found that pitch is the main vocal cue for emotion recognition. Strictly speaking, pitch is represented by the fundamental frequency (F0), that is, the main (lowest) frequency of vocal fold vibration.
The other acoustic variables that contribute to vocal emotion signaling are:
• Vocal energy
• Frequency spectral features
• Formants (usually only the first one or two formants, F1 and F2, are considered)
• Temporal features (speech rate and pausing)

Another approach to feature extraction is to enrich the feature set by considering derived features, such as the LPC (linear predictive coding) parameters of the signal or features of the smoothed pitch contour and its derivatives.

For this invention, the following strategy may be adopted. First, take into account the fundamental frequency F0, energy, speaking rate, the first three formants (F1, F2, and F3) and their bandwidths (BW1, BW2, and BW3), and calculate as many statistics for them as possible. Then rank the statistics using feature selection techniques and pick a set of the most important features.

The speaking rate can be calculated as the inverse of the average length of the voiced parts of the utterance. For all other parameters, the following statistics can be calculated: mean, standard deviation, minimum, maximum, and range. In addition, the slope of F0 can be calculated as a linear regression over the voiced part of the speech, that is, the line that best fits the pitch contour. The relative voiced energy can also be calculated as the proportion of voiced energy to the total energy of the utterance. Altogether, there are about 40 features for each utterance.

The RELIEF-F algorithm may be used for feature selection.
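Before any selection step can run, the per-utterance statistics described above have to be computed. The following is a minimal sketch of that computation, under several assumptions that are not in the specification: the input is a per-frame F0 track in which 0 marks an unvoiced frame, a per-frame energy track of the same length, and a fixed frame duration `frame_s`; the population standard deviation is used; and the utterance contains at least two voiced frames. Pitch tracking and formant estimation are not shown, and all function names are hypothetical.

```python
import statistics


def summarize(values):
    """Mean, standard deviation, minimum, maximum and range of one parameter."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population std. dev. (an assumption)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def f0_slope(times, f0):
    """Least-squares slope (Hz per second) of the line fitted to the pitch contour."""
    t_mean = statistics.fmean(times)
    f_mean = statistics.fmean(f0)
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def voiced_segments(f0):
    """Lengths, in frames, of consecutive runs of voiced frames (F0 > 0)."""
    lengths, run = [], 0
    for value in f0:
        if value > 0:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def utterance_features(f0, energy, frame_s=0.01):
    """Statistics for one utterance from per-frame F0 and energy tracks."""
    voiced = [(i * frame_s, f, e) for i, (f, e) in enumerate(zip(f0, energy)) if f > 0]
    times = [t for t, _, _ in voiced]
    pitch = [f for _, f, _ in voiced]
    segments = voiced_segments(f0)
    mean_segment_s = frame_s * sum(segments) / len(segments)
    return {
        "f0": summarize(pitch),
        "energy": summarize(energy),
        "f0_slope": f0_slope(times, pitch),
        # Inverse of the average length of the voiced parts, as described above.
        "speaking_rate": 1.0 / mean_segment_s,
        "relative_voiced_energy": sum(e for _, _, e in voiced) / sum(energy),
    }


# Made-up tracks: two voiced segments on a contour that rises by 100 Hz per second.
f0_track = [0, 110, 120, 130, 0, 0, 160, 170, 0]
energy_track = [1, 4, 4, 4, 1, 1, 4, 4, 1]
features = utterance_features(f0_track, energy_track, frame_s=0.1)
print(features["f0"]["max"], features["f0"]["range"], features["speaking_rate"])
```

The same `summarize` helper would apply unchanged to formant and bandwidth tracks, which is how the count reaches roughly 40 features per utterance.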
For example, RELIEF-F can be run on the s70 data set with the number of nearest neighbors varied from 1 to 12, and the features then ranked according to the sum of their ranks. The top 14 features obtained are: F0 maximum, F0 standard deviation, F0 range, F0 mean, BW1 mean, BW2 mean, energy standard deviation, speaking rate, F0 slope, F1 maximum, energy maximum, energy range, F2 range, and F1 range. Three nested feature sets can be formed from this ranking to investigate how the features influence the accuracy of emotion recognition. The first set includes the top eight features (from F0 maximum to speaking rate), the second extends the first with the next two features (F0 slope and F1 maximum), and the third includes all 14 features. The RELIEF-F algorithm is described in detail in I. Kononenko, "Estimating attributes: Analysis and extension of RELIEF," Proc. European Conf. on Machine Learning (1994), pages 171-182, which is incorporated herein by reference.

Figure 2 illustrates a specific embodiment of the present invention that detects emotion using voice analysis. In operation 200, a voice signal is received, for example through a microphone or as a digitized sample. In operation 202, a predetermined number of features of the voice signal are extracted as described above.
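The rank-sum ordering used with RELIEF-F earlier in this section can be sketched as follows. The sketch covers only the aggregation step. It assumes each RELIEF-F run (one per number of nearest neighbors) has already produced a relevance score for every feature, and the scores and helper names below are hypothetical.

```python
def rank_features(scores):
    """Map each feature to its rank in one run (1 = highest relevance score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def order_by_rank_sum(runs):
    """Order features by the sum of their ranks over all runs (smaller is better)."""
    totals = {}
    for scores in runs:
        for name, rank in rank_features(scores).items():
            totals[name] = totals.get(name, 0) + rank
    # Ties in the rank sum are broken alphabetically (an arbitrary choice).
    return sorted(totals, key=lambda name: (totals[name], name))


# Hypothetical relevance scores from three runs; not values from the study.
runs = [
    {"F0 maximum": 0.30, "F0 range": 0.22, "speaking rate": 0.10, "F1 range": 0.02},
    {"F0 maximum": 0.28, "F0 range": 0.25, "speaking rate": 0.12, "F1 range": 0.05},
    {"F0 maximum": 0.21, "F0 range": 0.24, "speaking rate": 0.09, "F1 range": 0.03},
]
ordering = order_by_rank_sum(runs)
top_two = ordering[:2]  # a nested feature set is a prefix of the ordering
```

Because nested feature sets are simply prefixes of one ordering, the 8-, 10-, and 14-feature sets in the text would correspond to `ordering[:8]`, `ordering[:10]`, and `ordering[:14]` for a full ranking.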
The features extracted in operation 202 include the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, the speaking rate, the slope of the fundamental frequency, the maximum value of the first formant, the maximum value of energy, the range of energy, the range of the second formant, and the range of the first formant. In operation 204, the emotion associated with the voice signal is determined from the features extracted in operation 202. Finally, the determined emotion is output in operation 206. For more detail on determining emotion from a voice signal in accordance with the present invention, see the discussion below, particularly the discussion referring to Figures 8 and 9.

The features of the voice signal are preferably selected from the group consisting of the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, and the speaking rate. Ideally, the extracted features also include at least one of the slope of the fundamental frequency and the maximum value of the first formant.

Optionally, several features are extracted, including the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, and the speaking rate.
The extracted features preferably also include the slope of the fundamental frequency and the maximum value of the first formant.

As another option, the extracted features include the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, the speaking rate, the slope of the fundamental frequency, the maximum value of the first formant, the maximum value of energy, the range of energy, the range of the second formant, and the range of the first formant.

Computer Performance

Two exemplary approaches can be taken to recognizing emotion in speech: neural networks and ensembles of classifiers. The first approach uses a two-layer backpropagation neural network architecture with an input vector of 8, 10, or 14 elements, a hidden sigmoid layer of 10 or 20 nodes, and a linear output layer of five nodes. The number of outputs equals the number of emotion categories. The s70, s80, and s90 data sets are used to train and test the algorithms. Each data set can be randomly split into a training subset (67% of the utterances) and a test subset (33%). Several neural network classifiers can be built using different initial weight matrices. Applied to the s70 data set and the 8-feature set above, this approach gives an average accuracy of about 55%, with the following accuracy by emotion category: normal state 40-50%, happiness 55-60%, anger 60-80%, sadness 60-70%, and fear 20-40%.
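The network shape and the random split described above can be sketched as follows. This is a minimal sketch of the forward pass and the data split only. Training by backpropagation is omitted, and the weight range, the bias handling, and all function names are assumptions rather than details given in the text.

```python
import math
import random

EMOTIONS = ["normal", "happy", "angry", "sad", "afraid"]


def init_network(n_inputs, n_hidden, n_outputs, rng):
    """Random initial weights; the last entry of each row is the bias."""
    def layer(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols + 1)] for _ in range(rows)]
    return layer(n_hidden, n_inputs), layer(n_outputs, n_hidden)


def forward(network, features):
    """Hidden sigmoid layer followed by a linear output layer."""
    hidden_w, output_w = network
    hidden = [
        1.0 / (1.0 + math.exp(-(row[-1] + sum(w * x for w, x in zip(row, features)))))
        for row in hidden_w
    ]
    return [row[-1] + sum(w * h for w, h in zip(row, hidden)) for row in output_w]


def classify(network, features):
    """Pick the emotion whose output node has the largest value."""
    outputs = forward(network, features)
    return EMOTIONS[outputs.index(max(outputs))]


def split(utterances, rng, train_fraction=0.67):
    """Randomly split a data set into training and test subsets."""
    shuffled = utterances[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)
network = init_network(n_inputs=8, n_hidden=10, n_outputs=len(EMOTIONS), rng=rng)
outputs = forward(network, [0.0] * 8)            # placeholder feature vector
train, test = split(list(range(369)), rng)       # 369 = size of s70 in Table 5
```

Building several classifiers "using different initial weight matrices" would correspond here to calling `init_network` with differently seeded random generators before training each one.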
The second approach uses ensembles of classifiers. An ensemble consists of an odd number of neural network classifiers trained on different subsets of the training set using bootstrap aggregation and cross-validated committee techniques. The ensemble makes its decision by majority voting. Suggested ensemble sizes are from 7 to 15.

Figure 3 shows the average recognition accuracy for the s70 data set, all three feature sets, and both neural network architectures (10 and 20 neurons in the hidden layer). The accuracy for happiness remains about the same (~68%) across the different feature sets and architectures. The accuracy for fear is rather low (15-25%). The accuracy for anger is relatively low (40-45%) for the 8-feature set and improves dramatically (65%) for the 14-feature set. The accuracy for sadness, however, is higher for the 8-feature set than for the other sets. The average accuracy is about 55%. The low accuracy for fear agrees with the theoretical result that if the individual classifiers make uncorrelated errors at rates exceeding 0.5 (0.6-0.8 in this case), the error rate of the voted ensemble increases.
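The majority-voting rule, and the reason voting hurts when the members are individually worse than chance, can be illustrated with a short sketch. The error calculation assumes independent members and a simplified two-outcome (right or wrong) view of each vote. That is an assumption made for illustration, not the five-class analysis behind the reported figures.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    With more than two classes a tie is possible even for an odd ensemble;
    Counter then returns the tied label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def majority_error(n_members, member_error):
    """Probability that more than half of n independent members are wrong."""
    need = n_members // 2 + 1
    return sum(
        comb(n_members, k) * member_error ** k * (1 - member_error) ** (n_members - k)
        for k in range(need, n_members + 1)
    )


vote = majority_vote(["sad", "angry", "sad", "sad", "afraid", "angry", "sad"])
good_members = majority_error(7, 0.3)   # members better than chance
poor_members = majority_error(7, 0.7)   # members worse than chance
```

Under these assumptions, an ensemble of seven members that each err 30% of the time errs less often than any single member, while seven members that each err 70% of the time err more often, which is the effect the text attributes to the fear category.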
Figure 4 shows the results for the s80 data set. The accuracy for the normal state is low (20-30%). The accuracy for fear varies widely, from 11% for the 8-feature set with the 10-neuron architecture to 53% for the 10-feature set with the 10-neuron architecture. The accuracy for happiness, anger, and sadness is relatively high (68-83%). The average accuracy (61%) is higher than for the s70 data set.

Figure 5 shows the results for the s90 data set. The accuracy for fear is higher (25-60%), but it follows the same pattern as for the s80 data set. The accuracy for sadness and anger is very high: 75-100% for anger and 88-93% for sadness. The average accuracy (62%) is approximately equal to that for the s80 data set.

Figure 6 illustrates a specific embodiment of the present invention that detects emotion using statistics. First, a database is provided in operation 600. The database contains statistics of voice parameters associated with human emotions, such as those shown in the tables above and in Figures 3 to 5. The database may also contain a range of voice pitches associated with fear, another range of voice pitches associated with happiness, and a margin of error for particular pitches. Next, a voice signal is received. In operation 604, one or more features are extracted from the voice signal. For details on extracting features from a voice signal, see the feature extraction section above. Then, in operation 606, the extracted voice features are compared with the voice parameters in the database. In operation 608, an emotion is selected from the database based on the comparison of the extracted voice features with the voice parameters.
For example, digitized speech samples in the database can be compared with digitized samples of the features extracted from the voice signal to create a list of probable emotions. An algorithm that takes into account statistics of human accuracy in recognizing emotions is then used to make a final determination of the most probable emotion. The selected emotion is finally output in operation 610. For computerized mechanisms that recognize emotion in speech, see the section "Exemplary Devices for Detecting Emotions in Speech Signals" below.

In one aspect of the present invention, the database contains probabilities that particular voice features are associated with an emotion. Selecting the emotion from the database preferably includes analyzing the probabilities and selecting the most probable emotion based on them. The probabilities in the database may optionally include performance confusion statistics, such as those shown in the confusion matrix above. The statistics in the database may also optionally include self-recognition statistics, such as those shown in the tables above.

In another aspect of the present invention, the extracted features include the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, the speaking rate, the slope of the fundamental frequency, the maximum value of the first formant, the maximum value of energy, the range of energy, the range of the second formant, and/or the range of the first formant.
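One way to picture the comparison and selection steps of Figure 6 is a lookup against per-emotion statistics followed by a ranking by likelihood. The sketch below is an assumption-laden illustration: the text does not specify the form of the database or the scoring rule, so the Gaussian scoring, the feature names, and every number in `DATABASE` are hypothetical.

```python
import math

# Hypothetical database: per-emotion (mean, standard deviation) of each feature.
DATABASE = {
    "normal": {"f0_mean": (120.0, 15.0), "speaking_rate": (4.0, 0.5)},
    "angry": {"f0_mean": (180.0, 25.0), "speaking_rate": (5.0, 0.6)},
    "sad": {"f0_mean": (105.0, 10.0), "speaking_rate": (3.2, 0.4)},
}


def log_likelihood(features, parameters):
    """Sum of Gaussian log-densities, treating the features as independent."""
    total = 0.0
    for name, value in features.items():
        mu, sigma = parameters[name]
        total += -math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((value - mu) / sigma) ** 2
    return total


def rank_emotions(features, database=DATABASE):
    """Return the emotions ordered from most to least probable."""
    scored = [(log_likelihood(features, params), emotion)
              for emotion, params in database.items()]
    return [emotion for _, emotion in sorted(scored, reverse=True)]


# Features extracted from a voice signal (hypothetical values).
ranking = rank_emotions({"f0_mean": 175.0, "speaking_rate": 4.9})
most_probable = ranking[0]
```

The ranked list plays the role of the "list of probable emotions", and its first entry is the emotion that would be output. Statistics of human recognition accuracy, which the text says may also be taken into account, are not modeled here.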
Figure 7 is a flowchart of a method for detecting anxiety in a voice in a business environment to help prevent fraud. First, in operation 700, voice signals are received from a person during a business event. For example, the voice signals may be picked up by a microphone near the person or captured from a telephone tap. In operation 702, the voice signals are analyzed during the business event to determine the person's level of anxiety. The voice signals are analyzed as described above. In operation 704, an indication of the level of anxiety is output, preferably before the business event ends, so that someone attempting to prevent fraud can assess whether to confront the person before the person leaves. The output can take any form, including a paper printout or a display on a computer screen. It should be understood that this embodiment of the invention can detect emotions other than anxiety, including stress-induced tension and other emotions that accompany deception.

This embodiment of the present invention is particularly applicable in business areas such as contract negotiation, insurance dealings, and customer service. Fraud in these areas costs companies millions of dollars each year. The present invention provides a tool to help combat such fraud. It should also be noted that the present invention has applications in law enforcement and in courtroom settings.

Preferably, a degree of certainty as to the person's level of anxiety is output to help someone investigating fraud determine whether the person was speaking deceptively. This can be based on statistics, as discussed above for the embodiment of the present invention described with reference to Figure 6.
Optionally, the indication of the speaker's level of anxiety is output in real time, so that someone trying to prevent fraud gets results quickly and can challenge the speaker soon after a suspicious statement is made.

As another option, the indication of the level of anxiety can include an alarm that is set off when the level of anxiety exceeds a predetermined level. The alarm can include a visual notification on a computer display or an audible sound to alert a supervisor, the listener, and/or someone investigating fraud. The alarm can also be connected to a recording device, which begins recording the conversation when the alarm is set off if the conversation is not already being recorded.

The alarm option is particularly useful where many people take turns speaking, for example in a customer service department or on the telephone line of a customer service representative. As each customer takes a turn speaking to a customer service representative, the present invention detects the level of anxiety in the customer's speech. If the alarm is set off because the customer's level of anxiety exceeds the predetermined level, the customer service representative can be notified by a visual indication on his or her computer screen, a flashing light, or the like. Aware of a possible fraud, the customer service representative can then seek to expose it if it exists. The alarm can also be used to notify a manager. In addition, recording of the conversation can begin when the alarm is activated.
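The alarm behavior described above amounts to a threshold check with two side effects: notifying someone and starting a recording. A minimal sketch follows, in which the class name, the 0-to-1 anxiety scale, and the way alerts are stored are all assumptions made for illustration.

```python
class AnxietyMonitor:
    """Raises an alert and starts recording when anxiety exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold   # the predetermined level
        self.recording = False       # whether the conversation is being recorded
        self.alerts = []             # (caller, level) pairs shown to the representative

    def update(self, caller_id, anxiety_level):
        """Process one anxiety reading; return True if it sets off the alarm."""
        if anxiety_level <= self.threshold:
            return False
        self.alerts.append((caller_id, anxiety_level))
        self.recording = True        # begin recording if not already recording
        return True


monitor = AnxietyMonitor(threshold=0.7)
first = monitor.update("caller-1", 0.4)    # below the predetermined level
second = monitor.update("caller-2", 0.9)   # above it: alert and start recording
```

In a call-center setting, each new caller would feed readings into the same monitor, and the entries in `alerts` would drive the on-screen indication or the notification to a manager.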
In one embodiment of the present invention, at least one feature of the voice signal is extracted and used to determine the speaker's level of anxiety. Features that may be extracted include the maximum value of the fundamental frequency, the standard deviation of the fundamental frequency, the range of the fundamental frequency, the mean of the fundamental frequency, the mean bandwidth of the first formant, the mean bandwidth of the second formant, the standard deviation of energy, the speaking rate, the slope of the fundamental frequency, the maximum value of the first formant, the maximum value of energy, the range of energy, the range of the second formant, and the range of the first formant. Thus, for example, the degree of wavering in the tone of the voice, as determined from readings of the fundamental frequency, can be used to help determine the level of anxiety. The greater the wavering, the higher the level of anxiety. Pauses in the speaker's speech can also be taken into account.

The next section describes devices that can be used to determine emotion, including anxiety, in voice signals.

Exemplary Devices for Detecting Emotions in Speech Signals

This section describes several devices for analyzing speech in accordance with the present invention.

One embodiment of the present invention includes a device that analyzes a person's speech to determine his or her emotional state. The analyzer operates on the real-time frequency or pitch components within the first formant band of human speech.
In analyzing the speech, the device examines occurrence patterns of certain values in terms of differential first formant pitch, rate of change of pitch, duration, and time distribution. These factors relate in a complex but very fundamental way to both transient and long-term emotional states.
Human speech is initiated by two basic sound-generating mechanisms. The vocal cords are thin stretched membranes under muscle control that oscillate when air expelled from the lungs passes through them. They produce a characteristic "buzz" sound at a fundamental frequency between 80 Hz and 240 Hz. This frequency is varied over a moderate range by both conscious and unconscious muscle contraction and relaxation. The waveform of the fundamental "buzz" contains many harmonics, some of which excite resonances in the various fixed and variable cavities associated with the vocal tract.
The second basic sound generated during speech is a pseudorandom noise with a fairly broad and uniform frequency distribution. It is caused by turbulence as expelled air moves through the vocal tract and is called a "hiss" sound. It is modulated mostly by tongue movements and also excites the fixed and variable cavities. Speech is produced by this complex mixture of "buzz" and "hiss" sounds, shaped and articulated by the resonant cavities.

In an energy distribution analysis of speech sounds, the energy is found to fall into distinct frequency bands called formants. There are three significant formants. The system described here uses the first formant band, which extends from the fundamental "buzz" frequency to approximately 1000 Hz. This band not only has the highest energy content but also reflects a high degree of frequency modulation as a function of various vocal tract and facial muscle tension variations.

In effect, by analyzing certain first formant frequency distribution patterns, a measure of speech-related muscle tension variations and interactions can be obtained. Because these muscles are predominantly biased and articulated through secondary unconscious processes, which are in turn influenced by emotional state, a relative measure of emotional activity can be determined regardless of whether the speaker is aware of that state. Research also bears out the general supposition that, because the mechanisms of speech are exceedingly complex and largely autonomous, very few people are able to consciously "project" a fictitious emotional state. In fact, an attempt to do so usually generates its own unique psychological stress "fingerprint" in the voice pattern.

Because of the characteristics of the first formant speech sounds, the present invention analyzes an FM-demodulated first formant speech signal and produces an output indicative of the nulls in it.

The frequency or number of nulls or "flat" spots in the FM-demodulated signal, the length of the nulls, and the ratio of the total time that nulls exist during a word period to the overall time of the word period are all indicative of the individual's emotional state. By looking at the output of the device, the user can see or feel the occurrence of the nulls and can thus determine the emotional state of the speaker by observing the number or frequency of the nulls, the length of the nulls, and the ratio of the total time nulls exist during a word period to the length of the word period.

In the present invention, the first formant frequency band of a speech signal is FM demodulated, and the FM-demodulated signal is applied to a word detector circuit that detects the presence of an FM-demodulated signal. The FM-demodulated signal is also applied to a null detector, which detects the nulls in the FM-demodulated signal and produces an output indicative of them. An output circuit is coupled to the word detector and the null detector. The output circuit is enabled by the word detector when the word detector detects the presence of an FM-demodulated signal, and it produces an output indicative of the presence or absence of a null in the FM-demodulated signal. The output of the output circuit is displayed in a manner that the user can perceive, providing the user with an indication of the existence of nulls in the FM-demodulated signal.
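The null measurements described above (the number of nulls, their lengths, and the share of the word period they occupy) can be sketched in software as follows. The text describes circuits, not code, so this is only an analogy. It assumes the FM-demodulated signal for one word is available as a list of samples and that a null is a run of samples whose value barely changes. The tolerance, the minimum run length, and the sample values are hypothetical.

```python
def find_nulls(track, tolerance, min_length):
    """Return (start, length) for each flat run in the demodulated track.

    A run is flat while consecutive samples differ by at most `tolerance`;
    runs shorter than `min_length` samples are ignored.
    """
    nulls = []
    start = 0
    for i in range(1, len(track) + 1):
        flat = i < len(track) and abs(track[i] - track[i - 1]) <= tolerance
        if not flat:
            length = i - start
            if length >= min_length:
                nulls.append((start, length))
            start = i
    return nulls


def null_summary(track, tolerance=0.5, min_length=3):
    """Number of nulls, their lengths, and the fraction of the word they cover."""
    nulls = find_nulls(track, tolerance, min_length)
    null_samples = sum(length for _, length in nulls)
    return {
        "count": len(nulls),
        "lengths": [length for _, length in nulls],
        "null_ratio": null_samples / len(track),
    }


# Hypothetical demodulated samples (Hz) covering one word period.
word = [300, 310, 322, 322, 322, 322, 335, 350, 350, 350, 341, 330]
summary = null_summary(word)
```

Here the word detector of the text corresponds to the caller passing in only the samples that belong to a word, and `null_ratio` corresponds to the ratio of total null time to word time.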
By monitoring the nulls, the user of the device can determine the emotional state of the individual whose speech is being analyzed.

In another embodiment of the present invention, the vibrato of the voice is analyzed. So-called vibrato arises as a semi-voluntary response and is of value in studying deception along with certain other responses, such as respiration volume; inspiration-expiration ratio; metabolic rate; regularity and rate of respiration; association of words and ideas; facial expressions; motor reactions; and reactions to certain narcotics. However, no usable technique has previously been developed that permits a valid and reliable analysis of voice changes in the clinical determination of a subject's emotional state, opinions, or attempts to deceive.

Early experiments that attempted to correlate voice quality changes with emotional stimuli established that human speech is strongly affected by emotion. Detectable changes in the voice occur much more rapidly following stress stimulation than do the classic physiological indications resulting from the functioning of the autonomic nervous system.

Stress produces two types of voice change. The first is a gross change, which usually occurs only under substantial stress. It manifests itself in perceptible changes in the rate of speech, volume, voice tremor, spacing between syllables, and the fundamental pitch or frequency of the voice. At least some subjects can consciously control this gross change when the stress level is below that of total loss of control.

The second type of voice change is a change in voice quality.
This type of change is not discernible to the human ear. It is an apparently unconscious manifestation of a slight tensing of the vocal cords under even minor stress, which dampens selected frequency variations. When displayed graphically, the difference between unstressed or normal vocalization and vocalization under mild stress, attempted deception, or a hostile attitude is readily discernible. These patterns have held true for both sexes, for various ages, and under various situational conditions. This second type of change is not subject to conscious control.

The human vocal apparatus produces two types of sound. The first type is produced by vibration of the vocal cords, which occurs when the glottis is partially closed and contraction of the lung cavity and lungs forces air through it. The frequency of these vibrations generally varies between 100 and 300 Hz, depending on the sex and age of the speaker and on the intonation the speaker applies. This sound has a rapid decay time.

The second type of sound involves the formant frequencies. It results from resonance of the cavities in the head, including the throat, mouth, nose, and sinus cavities. In the case of voiced sounds, it is created when a lower-frequency sound source, the vocal cords, excites the resonant cavities; in the case of unvoiced fricatives, it is created by partial restriction of the passage of air from the lungs. Whichever the excitation source, the frequency of a formant is determined by the resonant frequency of the cavity involved.
The formant frequencies generally appear at about 800 Hz, in distinct frequency bands that correspond to the resonant frequencies of the individual cavities. The first, or lowest, formant is created by the mouth and throat cavities and is notable for its frequency shift as the mouth changes its dimensions and volume in forming various sounds, particularly vowel sounds. The highest formant frequencies are more constant because the volume of the cavities is more constant. Unlike the rapidly decaying vocal cord signals, the formant waveforms are sustained signals. When voiced sounds are uttered, the voice waveforms are imposed on the formant waveforms as amplitude modulations.

It has been discovered that a third category of signal exists in the human voice and that it is related to the second type of voice change discussed above. This signal is an infrasonic, or subsonic, frequency modulation that human hearing cannot perceive, and it is present to some degree in both the vocal cord sounds and the formant sounds. The signal is typically between 8 and 12 Hz and is therefore inaudible to the human ear. Because this characteristic constitutes frequency modulation, as distinguished from amplitude modulation, it is not directly discernible on time-base/amplitude chart recordings.
Because this infrasonic signal is one of the more significant voice indicators of psychological stress, it is treated in greater detail here.

Several analogies exist that provide schematic representations of the entire voice process. Both mechanical and electronic analogies have been used successfully, for example in the design of computer voices. These analogies, however, treat the voiced sound source (the vocal cords) and the cavity walls as hard, constant features. In reality, both the vocal cords and the walls of the major formant-producing cavities are flexible tissue that responds immediately to every action of the complex array of muscles controlling it. The muscles that control the vocal cords through the linkage of bones allow both purposeful and automatic production of voice sound and variation of voice pitch. Similarly, the muscles that control the tongue, lips, and throat allow both purposeful and automatic control of the first formant frequency. The other formants are affected in the same way, but to a lesser degree.

It is worth noting that during normal speech these muscles perform at only a small fraction of their total work capability. Consequently, even while they are being used to change the position of the vocal cords and of the lips, tongue, and inner throat walls, the muscles remain relatively relaxed. It has been found that in this relatively relaxed state a natural muscular undulation occurs, typically at the 8 to 12 Hz frequency mentioned earlier. This undulation causes a slight variation in the tension of the vocal cords and shifts the basic pitch frequency of the voice.
In addition, the undulation slightly changes the volume of the resonant cavities (particularly the cavity associated with the first formant) and the elasticity of the cavity walls, which shifts the formant frequencies. These shifts about a center frequency constitute a frequency modulation of the center, or carrier, frequency.

Note that neither the shifts in the basic pitch frequency of the voice nor the shifts in the formant frequencies are perceptible to a listener, partly because the shifts are very small and partly because they occur mainly in the inaudible frequency range mentioned above.

To observe this frequency modulation, any one of the many existing techniques for demodulating frequency modulation can be used, keeping in mind that the modulation frequency is between 8 and 12 Hz and that the carrier is one of the bands within the voice spectrum.

To understand the above discussion more fully, the concept of the "center of mass" of this waveform must first be understood. The midpoint between the two extremes of any single excursion of the recording pen can be determined approximately. If the midpoints between the extremes of all excursions are marked and then roughly joined by a continuous curve, the result is a line that approximates the average, or "center of mass", of the entire waveform. Joining all such marks, with some smoothing, yields a smooth curve. This line represents the infrasonic frequency modulation resulting from the undulations described earlier.

As noted above, when slight to moderate psychological stress is applied to an individual under examination, the muscle groups associated with the vocal cords and cavity walls are subject to mild muscular tension.
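Returning to the "center of mass" construction described above: the source performs it by eye on a pen-chart recording, but the same idea can be sketched numerically. The following is a minimal illustration, assuming the waveform is available as a list of samples; the function name `center_of_mass_line` and the `smooth` parameter are hypothetical and do not come from the source.

```python
import math


def center_of_mass_line(samples, smooth=3):
    """Approximate the 'center of mass' line of a sampled waveform.

    Follows the graphical description in the text:
      1. locate the two extremes of every excursion (local maxima and minima),
      2. take the midpoint between each pair of consecutive extremes,
      3. smooth the midpoints with a short moving average.
    `smooth` is an illustrative window length, not a value from the source.
    Returns a list of (sample_position, level) points.
    """
    # 1. Turning points: samples where the slope changes sign.
    extremes = [
        (i, samples[i])
        for i in range(1, len(samples) - 1)
        if (samples[i] - samples[i - 1]) * (samples[i + 1] - samples[i]) < 0
    ]
    # 2. Midpoint between the two ends of every excursion.
    mids = [
        ((i0 + i1) / 2, (v0 + v1) / 2)
        for (i0, v0), (i1, v1) in zip(extremes, extremes[1:])
    ]
    # 3. Light smoothing of the midpoint levels.
    half = smooth // 2
    line = []
    for k, (pos, _) in enumerate(mids):
        window = [v for _, v in mids[max(0, k - half): k + half + 1]]
        line.append((pos, sum(window) / len(window)))
    return line


if __name__ == "__main__":
    # Synthetic test signal (an assumption for the demo): a fast oscillation
    # riding on a slow 10 Hz drift, sampled at 8 kHz for 0.2 s.
    rate = 8000
    signal = [
        math.sin(2 * math.pi * 200 * n / rate)
        + 0.2 * math.sin(2 * math.pi * 10 * n / rate)
        for n in range(1600)
    ]
    for pos, level in center_of_mass_line(signal)[:5]:
        print(f"sample {pos:7.1f}: level {level:+.3f}")
```

On such a signal the returned line follows the slow component while the fast excursions cancel out, which is the behavior the text attributes to the hand-drawn curve.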
The subject cannot perceive this tension, and neither can the examiner without the aid of instrumental observation techniques. The tension is nevertheless sufficient to reduce or virtually eliminate the muscular undulations present in the unstressed subject, thereby removing the basis for the carrier-frequency variations that produce the infrasonic frequency modulation.

Although use of the infrasonic waveform is unique to techniques that employ the voice as the physiological medium for evaluating psychological stress, the voice also provides additional instrumented indications of inaudible physiological changes caused by psychological stress, changes that can likewise be detected by techniques and devices in current use. Of the four commonly used physiological changes mentioned earlier (brain wave patterns, heart activity, skin conductivity, and breathing activity), breathing activity and heart activity directly and indirectly affect the amplitude and detail of the voice waveform and provide a basis for a more thorough evaluation of psychological stress, particularly in tests that involve sequential vocal responses.

Figure 8 shows another apparatus. As shown, a transducer 800 converts the sound waves of the subject's speech into electrical signals, which are connected to the input of an audio amplifier 802. The amplifier simply raises the power of the electrical signals to a more stable and usable level. The output of amplifier 802 is connected to a filter 804, whose main purpose is to eliminate unwanted low-frequency components and noise components.
After filtering, the signal is connected to an FM discriminator 806, which converts deviations from the center frequency into a signal that varies in amplitude. The amplitude-varying signal is then detected in a detector circuit 808, which rectifies it and produces a signal consisting of a series of half-wave pulses. After detection, the signal is connected to an integrator circuit 810, where it is integrated to the desired degree. Circuit 810 either integrates the signal to a very small extent, producing a waveform, or integrates it to a greater degree, producing a signal. After integration, the signal is amplified by an amplifier 812 and connected to a processor 814, which determines the emotion associated with the voice signal. Finally, an output device 816 such as a computer screen or printer outputs the detected emotion. Statistical data can optionally be output as well.

Figure 9 shows a simpler apparatus for producing visible records in accordance with an embodiment of the present invention. The sound signal is converted into electrical signals by a microphone 900 and recorded magnetically by a tape recording device 902. The signal can then be processed by the remaining equipment at any time and at various speeds; the playback signal is connected to a conventional diode 904 for rectification. The rectified signal is connected to a conventional amplifier 906 and also to the movable contact of a selector switch indicated at 908. The movable contact of switch 908 can be moved to any one of several fixed contacts, each of which is connected to a capacitor.
Figure 9 shows a set of four capacitors 910, 912, 914, and 916, each with one terminal connected to a fixed contact of the switch and the other terminal grounded. The output of amplifier 906 is connected to a processor 918.

The tape recorder used in this particular assembly was a Uher model 4000 four-speed tape unit with a built-in amplifier. The values of capacitors 910 to 916 were 0.5, 3, 10, and 50 microfarads (µF), respectively, and the input impedance of amplifier 906 was about 10,000 ohms (10 kΩ). As will be seen later, various other components could be, or have been, used in this apparatus.

In the operation of the circuit of Figure 9, the waveform rectified by diode 904 is integrated to the desired degree.
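The integration stage of Figure 9 can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not stated in the source, that the selected capacitor and the roughly 10 kΩ amplifier input impedance behave as a first-order RC integrator, so that the time constant is simply R times C. The function names are illustrative.

```python
# Assumption: the selected capacitor and the amplifier input impedance form a
# simple first-order RC integrator, so tau = R * C. Diode and source
# resistance are ignored in this sketch.
AMPLIFIER_INPUT_OHMS = 10_000  # about 10 kOhm, from the text
CAPACITORS_UF = {910: 0.5, 912: 3.0, 914: 10.0, 916: 50.0}  # microfarads, from the text


def time_constant_ms(resistance_ohms, capacitance_uf):
    """Return the RC time constant in milliseconds (ohm * uF = microseconds)."""
    return resistance_ohms * capacitance_uf / 1000.0


def time_constants_ms():
    """Time constant for each capacitor position of switch 908."""
    return {
        ref: time_constant_ms(AMPLIFIER_INPUT_OHMS, value)
        for ref, value in CAPACITORS_UF.items()
    }


if __name__ == "__main__":
    for ref, tau in time_constants_ms().items():
        print(f"capacitor {ref}: tau = {tau:g} ms")
```

Under that assumption the four switch positions give time constants of 5, 30, 100, and 500 ms. For comparison, one cycle of an 8 to 12 Hz modulation lasts roughly 83 to 125 ms at normal playback speed, so the smaller capacitors follow the individual excursions while the larger ones smooth them.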
The time constant is chosen so that the effect of the frequency-modulated infrasonic wave appears as a slowly varying DC level that roughly follows the line representing the "center of mass" of the waveform. The excursions shown in that particular figure are relatively rapid, indicating that the switch was connected to one of the lower-value capacitors. In this embodiment, composite filtering is performed by capacitor 910, 912, 914, or 916 and, when playback speed is reduced, by the tape recorder.

Telephone Operation with Operator Feedback

Figure 10 illustrates an embodiment of the present invention that monitors emotions in voice signals and provides operator feedback based on the detected emotions. First, in operation 1000, a voice signal representing a component of a conversation between at least two subjects is received. In operation 1002, an emotion associated with the voice signal is determined. Finally, in operation 1004, feedback is provided to a third party based on the determined emotion.

The conversation may take place over a telecommunications network or, when used with Internet telephony, over a wide area network. Optionally, the emotions can be screened.
Feedback is then provided only when the emotion is determined to be a negative emotion selected from the group of negative emotions consisting of anger, sadness, and fear. The same can be done with groups of positive or neutral emotions. The emotion can be determined by extracting features from the voice signal, as described in detail earlier.

The present invention is particularly suited to operation in conjunction with an emergency response system, such as the 911 system. In such a system, the invention can be used to monitor incoming calls.
The technician answering the call can determine the caller's emotion while conversing with the caller. The emotion can then be relayed by radio to the emergency response team, that is, police, fire, and/or ambulance personnel, so that they are aware of the caller's emotional state.

In another scenario, one subject is a customer, another subject is an employee, for example someone employed by a call center or customer service department, and the third party is a supervisor. Here the invention can be used to monitor conversations between customers and employees to determine whether the customer and/or the employee is becoming impatient. When a negative emotion is detected, feedback is sent to the supervisor, who can assess the situation and intervene if necessary.

Improving Emotion Recognition

Figure 11 illustrates an embodiment of the present invention that compares emotion detection in voice signals by a user and by a computer in order to improve the emotion recognition of the invention, the user, or both. First, in operation 1100, a voice signal and an emotion associated with the voice signal are provided.
In operation 1102, the emotion associated with the voice signal is determined automatically in the manner described earlier. The automatically determined emotion is stored in operation 1104, for example on a computer-readable medium. In operation 1106, a user-determined emotion associated with the voice signal, as judged by a user, is received. In operation 1108, the automatically determined emotion is compared with the user-determined emotion.

The voice signal may be emitted from or received by the invention. Optionally, the emotion associated with the voice signal is identified from the emotion that was provided. In that case, it should be determined whether the automatically determined emotion or the user-determined emotion matches the identified emotion.
If the user-determined emotion matches the identified emotion, the user is awarded a prize. In addition, the emotion can be determined automatically by extracting at least one feature from the voice signal, as described earlier.

To help a user recognize emotions, an emotion recognition game can be played in accordance with an embodiment of the present invention. The game lets a user compete against the computer or against another person to see who can most accurately recognize the emotions in recorded speech. One practical application of the game is to help socially withdrawn people develop better emotional skills by recognizing emotions in speech.
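The comparison step of Figure 11 and the game built on it can be sketched in a few lines. This is only an illustration of the bookkeeping: the function names, the dictionary keys, and the case-insensitive label matching are assumptions, and the automatic emotion detector itself is represented by a plain string supplied by the caller.

```python
def _norm(label):
    """Normalize an emotion label so that 'Anger ' and 'anger' compare equal."""
    return label.strip().lower()


def score_round(confirmed, user_guess, computer_guess):
    """Compare the user's and the computer's judgments with the confirmed emotion.

    Returns a dict reporting who matched the confirmed emotion. Following the
    text, the user earns a prize when the user's judgment matches.
    """
    user_ok = _norm(user_guess) == _norm(confirmed)
    computer_ok = _norm(computer_guess) == _norm(confirmed)
    return {
        "user_correct": user_ok,
        "computer_correct": computer_ok,
        "agree": _norm(user_guess) == _norm(computer_guess),
        "prize": user_ok,
    }


def play(rounds):
    """Tally correct answers over (confirmed, user_guess, computer_guess) triples."""
    totals = {"user": 0, "computer": 0}
    for confirmed, user_guess, computer_guess in rounds:
        result = score_round(confirmed, user_guess, computer_guess)
        totals["user"] += int(result["user_correct"])
        totals["computer"] += int(result["computer_correct"])
    return totals


if __name__ == "__main__":
    demo = [
        ("anger", "Anger", "fear"),
        ("sadness", "fear", "sadness"),
        ("fear", "fear", "fear"),
    ]
    print(play(demo))
```

Keeping the per-round result separate from the running tally makes it easy to use the same comparison either for a game or for tuning the automatic detector against user judgments.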
According to an embodiment of the present invention, an apparatus can be used to create data about voice signals that can in turn be used to improve emotion recognition. In such an embodiment, the apparatus accepts vocal sound through a transducer such as a microphone or a sound recorder. The physical sound wave, once converted into an electrical signal, is applied in parallel to a bank of standard commercially available filters that covers the audio frequency range. Setting the center frequency of the lowest filter to any value that passes the electrical energy representing the amplitude of the voice signal, including the lowest vocal frequency, establishes the center values of all subsequent filters up to the last one that passes energy, generally between 8 kHz and 16 kHz or between 10 kHz and 20 kHz, and also determines the exact number of filters. The specific value of the first filter's center frequency is not important, as long as the lowest tones of the human voice, at about 70 Hz, are captured. Essentially any commercially available filter bank can be used, provided it can be interfaced to a commercially available digitizer and microcomputer. The specification section describes the specific set of center frequencies and the microprocessor used in the preferred embodiment. The refinement algorithm disclosed in this specification brings filters of average quality to acceptable frequency and amplitude values, so the quality of the filters is not particularly important either. Once the center frequencies have been calculated, a ratio of 1/3 defines the bandwidth of all the filters.

After this segmenting by the filters, the filter output voltages are digitized by a commercially available digitizer, or by a multiplexer and digitizer.
In the disclosed preferred embodiment, the digitizer is built into the identified commercially available filter bank, which eliminates interfacing circuitry and hardware. The quality of the digitizer in terms of conversion speed or discrimination is also not very important, because the correction algorithm (see the specification) and the low sampling rate required for this application make demands that digitizers of average commercial quality easily exceed.

Any complex sound that carries constantly changing information can be approximated, with a reduction in the number of bits of information, by capturing the frequency and amplitude of the peaks of the signal. Doing this with speech signals is, of course, old knowledge. In speech research, however, several specific regions where such peaks often occur have been labeled "formant" regions. These approximate regions do not always coincide with each speaker's peaks under all circumstances. Speech researchers and prior inventions have tended to measure only those peaks that fall within the typical formant frequency regions and to call them the "correct" peaks, as if their definition involved absolutes rather than estimates. As a result, many research projects and formant-measuring devices have artificially excluded pertinent peaks that are needed to represent a complex, highly variable sound wave adequately in real time. Because the present discussion is intended to apply to animal sounds as well as to all human languages, artificial restrictions such as formants are not of interest here.
The sound wave is instead treated as a complex and highly variable wave, so that any sound can be analyzed.

To normalize and simplify peak identification regardless of variations in filter bandwidth, filter quality, and digitizer discrimination, the values actually stored for amplitude and frequency are "representative values". Thus the broad bandwidth of the high-frequency filters is represented numerically in the same way as the narrow bandwidth of the low-frequency filters. Each filter is simply given a consecutive value from 1 to 25, and loudness is expressed on a scale of 1 to 40 for ease of display on a CRT screen. The frequency representative value is corrected by adjusting the filter number upward by a decimal fraction toward the next integer when the output amplitude of the filter to the right of the peak filter is greater than that of the filter to the left. The details of a preferred embodiment of this algorithm are described in the specification of this disclosure. This correction must take place before the compression process, while the amplitude values of all filters are still available.

Rather than slowing down the sampling rate, the preferred embodiment stores all filter amplitude values for a speech sample of about 10 to 15 seconds, taken at 10 to 15 samples per second, before this correction and compression process. If computer memory space is more critical than sweep speed, the correction and compression can instead be performed between sweeps, which avoids the need for a large data storage memory. Because most commercially available small computers of average price have sufficient memory, the preferred embodiment discussed here stores all the data and processes it afterwards.
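The idea of replacing physical frequencies with a linear 1 to 25 filter number, and the amount of raw data stored before compression, can be illustrated as follows. This is a sketch under stated assumptions: the 63 Hz lowest center frequency is taken from the working embodiment described later in the text, the filters are assumed to follow ideal base-2 one-third-octave spacing (real analyzers use rounded nominal values), and the continuous mapping in `representative_value` is a hypothetical stand-in, not the neighbor-amplitude correction that the source leaves to its specification.

```python
import math

NUM_FILTERS = 25          # filters are numbered 1..25 in the text
LOWEST_CENTER_HZ = 63.0   # lowest center frequency of the working embodiment


def center_frequency_hz(filter_number):
    """Ideal one-third-octave center frequency for filter 1..25 (assumed spacing)."""
    return LOWEST_CENTER_HZ * 2.0 ** ((filter_number - 1) / 3.0)


def representative_value(frequency_hz):
    """Map a frequency onto the linear 1..25 filter-number scale.

    Equal steps on this scale correspond to equal one-third-octave steps,
    so wide high-frequency bands and narrow low-frequency bands count the same.
    """
    value = 1.0 + 3.0 * math.log2(frequency_hz / LOWEST_CENTER_HZ)
    return min(max(value, 1.0), float(NUM_FILTERS))


def raw_values_stored(samples_per_second, seconds):
    """Number of filter amplitudes held in memory before compression."""
    return NUM_FILTERS * samples_per_second * seconds


if __name__ == "__main__":
    print(f"filter 25 center: {center_frequency_hz(25):.0f} Hz")
    print(f"raw values, 10 samples/s for 10 s: {raw_values_stored(10, 10)}")
    print(f"raw values, 15 samples/s for 15 s: {raw_values_stored(15, 15)}")
```

With ideal spacing the twenty-fifth filter lands at 63 × 2^8 = 16,128 Hz, close to the 16,000 Hz upper center frequency quoted for the working embodiment, and the uncompressed sample holds between 2,500 and 5,625 amplitude values.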
Most vocal animal signals of interest, including those of humans, contain one peak of largest amplitude that is unlikely to lie at either end of the frequency domain. This peak can be found by any simple, common numerical sorting algorithm, as is done in this invention. Its amplitude and frequency representative values are then placed in the third of six sets of memory locations that hold the amplitudes and frequencies of six peaks.

The highest-frequency peak above 8 kHz is placed in the sixth memory location and labeled the high-frequency peak. The lowest peak is placed in the first set of memory locations. The other three peaks are chosen from between these two. After this compression, the voice signal is represented by an amplitude and a frequency representative value for each of six peaks, plus the total energy amplitude of the unfiltered signal, taken, for example, ten times per second over a ten-second sample. This gives a total of 1,300 values.

The algorithms allow for variation in sample length in case the operator overrides the sample length switch with the stop override switch to halt sampling during an unexpected noise interruption. They do so by using averages that are not very sensitive to changes in the number of samples beyond four or five seconds of sound signal. A larger speech sample is used where possible because it captures the speaker's average "style" of speech, which is typically evident within 10 to 15 seconds of speech.
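The size of the compressed representation follows directly from the description above: six peaks with two values each, plus one total-energy value, per sweep. A small sketch makes the count explicit; the `CompressedScan` container and the function names are illustrative, not part of the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

PEAKS_PER_SCAN = 6  # six peak slots, from the text


@dataclass
class CompressedScan:
    """One sweep after compression (hypothetical container for illustration)."""
    peaks: List[Tuple[float, float]]  # (amplitude, frequency representative value) per peak
    total_energy: float               # amplitude of the total unfiltered signal

    def values(self) -> List[float]:
        """Flatten the sweep into the list of stored numbers."""
        flat = [x for pair in self.peaks for x in pair]
        return flat + [self.total_energy]


def values_per_scan() -> int:
    """Six peaks times two values, plus one total-energy value."""
    return PEAKS_PER_SCAN * 2 + 1


def values_per_sample(scans_per_second: int = 10, seconds: int = 10) -> int:
    """Total stored values for a whole speech sample."""
    return values_per_scan() * scans_per_second * seconds


if __name__ == "__main__":
    print(values_per_scan(), values_per_sample())
```

Ten sweeps per second over ten seconds therefore yields 13 × 100 = 1,300 values, compared with 2,500 raw filter amplitudes for the same sample from a 25-filter bank.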
The output of this compression function is passed to the element assembly and storage algorithm, which assembles the following: (a) four voice quality values, described below; (b) a sound "pause", or sound-on to sound-off ratio; (c) "variability", the difference between each peak's amplitude in the current sweep and in the previous sweep; (d) a "syllable change approximation", obtained as the ratio of the number of times the second peak changes by more than 0.4 between sweeps to the total number of sweeps containing sound; and (e) "high frequency analysis", the proportion of sound-on sweeps that contain a non-zero value in the sixth peak amplitude. This makes a total of 20 elements available per sweep. These are then passed to the dimension assembly algorithm.

The four voice quality values used as elements are: (1) "spread", the sample mean, over all sweeps, of the difference between the average of the frequency representative values above the maximum-amplitude peak and the average of those below it; (2) "balance", the sample mean, over all sweeps, of the average amplitude of peaks 4, 5, and 6 divided by the average amplitude of peaks 1 and 2; (3) "envelope flatness high", the sample mean, over all sweeps, of the average of the amplitudes above the largest peak divided by the largest peak; and (4) "envelope flatness low", the sample mean, over all sweeps, of the average of the amplitudes below the largest peak divided by the largest peak.

The voice-style dimensions are labeled "resonance" and "quality" and are assembled by an algorithm that applies a coefficient matrix to selected elements.
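Two of the steps above lend themselves to a short sketch: computing one element ("balance") from the compressed sweeps, and combining elements into a dimension with a row of coefficients. The source does not give the coefficient values here, so the numbers in the demo are placeholders, and skipping sweeps whose low peaks are zero is an assumption made to avoid division by zero.

```python
def balance(scans):
    """Sample mean of (mean amplitude of peaks 4-6) / (mean amplitude of peaks 1-2).

    `scans` is a list of sweeps, each a sequence of six peak amplitudes with
    peak 1 first. Sweeps whose peaks 1 and 2 are both zero are skipped
    (an assumption; the source does not say how silent sweeps are handled).
    """
    ratios = []
    for amps in scans:
        low = (amps[0] + amps[1]) / 2
        high = (amps[3] + amps[4] + amps[5]) / 3
        if low > 0:
            ratios.append(high / low)
    return sum(ratios) / len(ratios) if ratios else 0.0


def dimension_score(elements, coefficients):
    """Weighted sum of named elements: one row of a coefficient matrix.

    Elements that have no coefficient in the row are ignored, which models
    a dimension that is built from 'selected elements' only.
    """
    return sum(weight * elements[name] for name, weight in coefficients.items())


if __name__ == "__main__":
    sweeps = [[2, 2, 5, 1, 1, 1], [4, 4, 9, 2, 2, 2]]
    elements = {"balance": balance(sweeps), "spread": 2.0}
    # Placeholder coefficients, purely illustrative.
    print(dimension_score(elements, {"balance": 2.0, "spread": -1.0}))
```

Representing each dimension as a named row of weights also matches the later remark that the user can alter any coefficient from the keypad to redefine a dimension.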
The "speech-style" dimensions are labeled "variability-monotone", "choppy-smooth", "staccato-sustain", "attack-soft", and "affectivity-control". These five dimensions, whose names refer to the two ends of each dimension, are measured and assembled by an algorithm that applies a coefficient matrix to 15 of the 20 sound elements; see Table 6 and the specification section for details.

The perceptual-style dimensions are labeled "eco-structure", "invariant sensitivity", "other-self", "sensory-internal", "hate-love", "independence-dependency", and "emotional-physical". These seven perceptual dimensions, whose names refer to the ends of the dimensions, are measured and assembled by an algorithm that applies a coefficient matrix to selected sound elements of voice and speech; see Table 7 and the specification section for details.

A standard commercially available computer keyboard or keypad allows the user of the invention to alter any or all of the coefficients and so redefine any assembled speech, voice, or perceptual dimension for research purposes. Selection switches allow any or all element or dimension values to be displayed for a given subject's voice sample. The digital processor controls the analog-to-digital conversion of the sound signal and also controls the reassembly of the vocal sound elements into numerical values of the voice, speech, and perceptual dimensions.

The microcomputer also interacts with the algorithms that assemble the voice, speech, and perceptual dimensions.
This interaction is no less important than the operator's keypad input, the selection of the output display, and the selection of the coefficient matrix. The output selection switch simply directs the output to any or all of the output jacks, which are suitable for feeding the signal to commercially available standard monitors, modems, or printers, or by default to a built-in light-emitting readout array.
By developing group profile standards with this invention, researchers can list findings in publications by occupation, dysfunction, task, hobby, culture, language, gender, age, animal species, and so on. Alternatively, users can compare their own values with values published by others or built into the machine.
Referring now to FIG. 12, a vocal utterance is introduced into the voice analyzer through microphone 1210 and amplified by microphone amplifier 1211, or a pre-recorded utterance is introduced from tape through tape input jack 1212. Input level control 1213 adjusts the level of the voice signal sent to filter driver amplifier 1214. The filter driver amplifier 1214 amplifies the signal and applies it to V.U. meter 1215, which measures the correct operating signal level.
The scan rate per second and the number of scans per sample are set by the operator with the scan rate and sample time switch 1216. The operator starts sampling with the sample start switch and stop override 1217. The override feature lets the operator manually override the preset sampling time and stop sampling, to prevent unexpected sound interference, such as simultaneous speakers, from contaminating the sample.
This switch also connects the microprocessor to, and disconnects it from, the 110-volt power input prongs.
The output of the filter driver amplifier 1214 is also applied to a commercially available microprocessor-controlled filter bank and digitizer 1218, which divides the signal into 1/3-octave bands across the audio range of the organism being sampled and digitizes the voltage output of each filter. A specific working embodiment of the invention used the 25 1/3-octave filters of an Eventide spectrum analyzer, with filter center frequencies ranging from 63 Hz to 16,000 Hz. An AKAI microphone and a tape recorder with a built-in amplifier were also used as the input to the filter bank and digitizer 1218. The filter bank performs about ten scans per second. Other microprocessor-controlled filter banks and digitizers may operate at different speeds.
Many commercially available microprocessors are suitable for controlling the aforementioned filter bank and digitizer.
As with any complex sound, the amplitude across the audio range within a 1/10-second time slice is not constant or flat; there are peaks and valleys. The frequency-representative values of the peaks of this signal, 1219, are made more accurate by noting the amplitude values on either side of each peak and adjusting the peak value toward the adjacent filter with the larger amplitude. This is done because, as is characteristic of adjacent 1/3-octave filters, part of the energy at a given frequency spills into the adjacent filters, to an extent that depends on each filter's cutoff characteristics.
To minimize this effect, the frequency of a peak filter is taken to be the filter's center frequency only if the amplitudes of the two adjacent filters are within 10% of their average. To guarantee discrete, equally spaced values that linearize and normalize the unequal frequency intervals, each of the 25 filters is assigned a number from 1 to 25, and these numbers are used throughout the rest of the processing. In this way the 3,500 Hz difference between filters 24 and 25 becomes a value of 1, which in turn equals the 17 Hz difference between the first and second filters.
To allow no more than five subdivisions of each filter number, and to keep equal steps between the subdivisions of filter numbers 1 to 25, the numbers are divided into steps of 0.2 and further assigned as follows. If the amplitude difference between the two filters adjacent to a peak filter is greater than 30% of their average, the peak filter's number is assumed to be nearer to the halfway point to the next filter number than to the peak filter itself.
This causes the filter number of a peak filter, say filter number 6.0, to be increased to 6.4 or decreased to 5.6, depending on whether the larger adjacent filter represents a higher or a lower frequency, respectively. All other peak filters are automatically assigned their filter number plus 0.2 or minus 0.2, depending on whether the larger of the adjacent filter amplitudes represents a higher or a lower frequency, respectively.
After the aforementioned frequency correction 1220, the segmented and digitized vocal utterance signal 1219 is compressed by discarding all but six amplitude peaks. The inventor found that six peaks are sufficient to capture the style characteristics, provided the following conditions are observed: at least one peak is near the fundamental frequency; exactly one peak is kept in the region between the fundamental frequency and the maximum-amplitude peak, namely the one nearest the maximum peak; and the first two peaks above the maximum peak are kept, together with the peak nearest the 16,000 Hz end, or the 25th filter if it is above 8 kHz, for a total of six peaks stored in microprocessor memory. This guarantees that the maximum peak is always the third peak stored in memory, that the sixth stored peak can be used for high frequency analysis, and that the first peak is the lowest and the nearest to the fundamental.
The compressed signal comprises one full-band amplitude value plus the filter number and amplitude of each of the six peaks, thirteen values in all; these are stored for each of ten scans per second over a 10-second sample (1,300 values). After this compression, sound element assembly begins (1221 of FIG. 12).
To arrive at the voice style quality elements, the invention uses the relationships between the lower and the higher sets of frequencies in the vocal utterance.
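The filter numbering and the 0.2-step frequency correction described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the list of nominal 1/3-octave center frequencies is the standard series between the stated 63 Hz and 16,000 Hz end points, the intermediate case (neighbor difference between 10% and 30% of the average) is assumed to move the number by a single 0.2 step, and the function name `corrected_filter_number` is hypothetical.

```python
# Nominal 1/3-octave centre frequencies in Hz (assumed standard series; the
# text itself gives only the end points and the 17 Hz and 3,500 Hz gaps).
NOMINAL_HZ = [63, 80, 100, 125, 160, 200, 250, 315, 400, 500, 630, 800, 1000,
              1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000,
              12500, 16000]

def corrected_filter_number(number, lower_amp, upper_amp):
    """Return a peak's filter number adjusted in steps of 0.2.

    number    -- filter number (1..25) of the peak filter
    lower_amp -- amplitude of the adjacent lower-frequency filter
    upper_amp -- amplitude of the adjacent higher-frequency filter
    """
    average = (lower_amp + upper_amp) / 2
    difference = abs(upper_amp - lower_amp)
    if average == 0 or difference <= 0.10 * average:
        return float(number)      # neighbours balanced: keep the centre
    # More than 30% imbalance moves the peak 0.4 toward the larger
    # neighbour; anything in between moves it 0.2 (assumed).
    step = 0.4 if difference > 0.30 * average else 0.2
    direction = 1 if upper_amp > lower_amp else -1
    return round(number + direction * step, 1)
```

Working in filter numbers rather than hertz makes every step between adjacent filters equal to 1, which is the normalization the description relies on. The sketch does not treat filters 1 and 25, which have only one neighbor.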
The speech style elements, by contrast, are determined by a combination of measurements relating to the pattern of vocal energy events, such as pauses and decay rates. The voice style quality elements emerge from the spectrum analysis of FIG. 13, 1330, 1331, and 1332. The speech style elements emerge from the other four analysis functions, shown in FIG. 12, 1233, 1234, 1235, and 1236, and in Table 6.
The stored voice style quality analysis elements are named and derived as follows: (1) the spectrum "spread", the sample mean of the distance, in filter numbers, between the average of the peak filter numbers above the maximum peak and the average of the peak filter numbers below it, for each scan (FIG. 13, 1330); (2) the spectrum's energy "balance", the sample mean over all scans of the ratio of the sum of the amplitudes of the peaks above the maximum peak to the sum of the amplitudes of those below it (1331); and (3) the spectrum envelope "flatness", the arithmetic mean, for each sample, of each of two sets of per-scan ratios: the ratio of the average amplitude of the peaks above the maximum peak (high) to the maximum peak, and the ratio of the average amplitude of the peaks below the maximum peak (low) to the maximum peak (1332).
The stored speech style elements are named and derived as follows: (1) spectrum variability, the six means, over an utterance sample, of the numerical differences between each peak's filter number in one scan and the corresponding peak's filter number in the next scan, together with the six amplitude differences for these six peaks and the full-spectrum amplitude difference for each scan, giving a total of 13 means per sample (1333); (2) utterance pause ratio analysis, the ratio of the number of scans in the sample in which the full-energy amplitude was a pause (below two units of amplitude) to the number of scans that had sound energy (greater than one unit) (1334); (3) syllable change approximation, the ratio of the number of scans in which the third peak's number changed by more than 0.4 to the number of scans with sound in the sample (1335); and (4) high frequency analysis, the ratio of the number of scans in the sample in which the sixth peak had an amplitude value to the total number of scans (1336).
The method and apparatus of this invention divide sound styles into seven dimensions, as shown in Table 6. These were determined to be the most sensitive to the associated set of seven perceptual or cognitive style dimensions listed in Table 7.
The procedure for relating the sound style elements to the voice, speech, and perceptual dimensions for output (FIG. 12, 1228) uses equations that determine each dimension as a function of selected sound style elements (FIG. 13, 1330 through 1336). Table 6 relates the speech style elements, 1333 through 1336 of FIG. 13, to the speech style dimensions.
Table 7 shows the relationship between the seven perceptual style dimensions and the sound style elements 1330 through 1336. Again, the purpose of an optional input coefficient array containing zeros is to let the operator of the apparatus switch or key in changes to these coefficients for research purposes (1222, 1223). An astute operator can develop different perceptual dimensions, or even personality or cognitive dimensions, or factors if that term is preferred, which require entirely different coefficients. This is done by keying in the desired set of coefficients and noting which dimension (1226) they are associated with.
For example, a researcher may not want the other-self dimension of Table 7 and may wish to replace it with a user-defined perceptual dimension named introvert-extrovert. By replacing the coefficient set for the other-self dimension with trial sets until an acceptably high correlation exists between the chosen combination of weighted sound style elements and the externally determined introvert-extrovert dimension, the researcher can use that slot for the new introvert-extrovert dimension, effectively renaming it. This works to the extent that the sound elements of this invention are sensitive to the user's introvert-extrovert dimension and the researcher's coefficient set reflects the appropriate relationship. It is possible to a useful degree for a great many user-defined dimensions, which makes the invention highly productive in research settings where new perceptual dimensions related to sound style elements are being explored, developed, or validated.
Table 6
Speech style dimensions' (DSj)(1) coefficients
Elements (differences) ESi(2)   CSi1  CSi2  CSi3  CSi4  CSi5
No. 1                            0     0     0     0     0
Amp 1                            0     0     0     0     0
No. 2                            1     0     0     0     1
Amp 2                            1     0     0     1     0
No. 3, Amp 3, No. 4, Amp 4, No. 5, Amp 5, No. 6, Amp 6, Amp 7, Pause, Peak 6 (row boundaries lost in the source): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 -1 0 1
(1) ##STR1## DS1 = variability-monotone; DS2 = choppy-smooth; DS3 = staccato-sustain; DS4 = attack-soft; DS5 = affectivity-control.
(2) No. 1 to No. 6 = peak filter differences 1-6; Amp 1 to Amp 6 = peak amplitude differences 1-6; Amp 7 = full band-pass amplitude differences.
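The four groups of speech style elements named in this description (spectrum variability, pause ratio, syllable change approximation, and high frequency analysis) can be sketched as below. This is a hedged illustration, not the patent's code: the scan layout (a full-band amplitude plus six (filter number, amplitude) pairs), the use of absolute differences between consecutive scans, and the function name `speech_style_elements` are assumptions. At least two scans are required.

```python
from statistics import mean

def speech_style_elements(scans):
    """scans: list of dicts {"full": amplitude, "peaks": six (number, amp) pairs}."""
    pairs = list(zip(scans, scans[1:]))  # each scan with the one after it

    def mean_change(value):
        # Mean absolute change of one quantity between consecutive scans.
        return mean([abs(value(b) - value(a)) for a, b in pairs])

    # 6 filter-number means + 6 amplitude means + 1 full-band mean = 13.
    variability = (
        [mean_change(lambda s, i=i: s["peaks"][i][0]) for i in range(6)]
        + [mean_change(lambda s, i=i: s["peaks"][i][1]) for i in range(6)]
        + [mean_change(lambda s: s["full"])]
    )
    pauses = sum(1 for s in scans if s["full"] < 2)      # below two units
    with_sound = sum(1 for s in scans if s["full"] > 1)  # above one unit
    third_peak_moves = sum(
        1 for a, b in pairs if abs(b["peaks"][2][0] - a["peaks"][2][0]) > 0.4
    )
    sixth_peak_present = sum(1 for s in scans if s["peaks"][5][1] > 0)
    return {
        "variability": variability,
        "pause_ratio": pauses / with_sound if with_sound else 0.0,
        "syllable_change": third_peak_moves / with_sound if with_sound else 0.0,
        "high_frequency": sixth_peak_present / len(scans),
    }
```

The two thresholds for pauses and for sound follow the wording of the description literally; returning 0.0 when no scan carries sound is a choice made only for this sketch.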
Table 7
Perceptual style dimensions' (DPj)(1) coefficients
Elements (differences) EPi   CPi1  CPi2  CPi3  CPi4  CPi5  CPi6  CPi7
Spread                        0     0     0     0     0     0     0
Balance                       1     1     0     0     0     0     0
Env-H                         0     1     0     0     0     0     0
Env-L                         1     0     0     0     0     0     0
No. 1                         0     0     0     0     0     0     0
Amp 1                         0     0     0     0     0     0     0
No. 2                         0     0     1     0     0     0     1
Amp 2                         0     0     1     0     0     1     0
No. 3                         0     0     0     0     0     0     0
Amp 3                         0     0     0     0     0     0     0
No. 4                         0     0     0     0     0     0     0
Amp 4                         0     0     0     0     0     0     0
No. 5                         0     0     0     0     0     0     1
Amp 5                         0     0     0     0    -1     0     0
No. 6                         0     0     0     0     0     0     0
Amp 6                         0     0     0     0     0     0     0
Amp 7                         0     0     0     1     1     0    -1
Pause                         0     0     0     1     1     0     0
Peak 6                        0     0     0     0    -1    -1     1
(1) ##STR2## DP1 = eco-structure high-low; DP2 = invariant sensitivity high-low; DP3 = other-self; DP4 = sensory-internal; DP5 = hate-love; DP6 = dependency-independency; DP7 = emotional-physical.
(2) No. 1 to No. 6 = peak filter differences 1-6; Amp 1 to Amp 6 = peak amplitude differences 1-6; Amp 7 = full band-pass amplitude differences.
The primary results available to the user of this invention are the dimension values 1226, which can be selected with switch 1227 for display on a standard light display, or for output to a monitor, printer, modem, or other standard output device 1228. These values can be used to determine how close the subject's voice is, on any or all of the voice or perceptual dimensions, to built-in, published, or personally developed controls or standards, and the result can then be used to help improve emotion recognition.
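As a sketch of how a coefficient matrix such as Table 7 might turn measured elements into dimension values, the example below forms each dimension as a weighted sum of the elements. The weighted sum is an assumption: the equation referenced by the ##STR2## placeholder is not reproduced in this text. The names `CP` and `perceptual_dimensions` are hypothetical; the coefficients are those of Table 7, with all-zero rows omitted.

```python
# Non-zero rows of Table 7; columns are DP1..DP7. Elements not listed here
# have all-zero coefficients in the table.
CP = {
    "Balance": [1, 1, 0, 0, 0, 0, 0],
    "Env-H":   [0, 1, 0, 0, 0, 0, 0],
    "Env-L":   [1, 0, 0, 0, 0, 0, 0],
    "No. 2":   [0, 0, 1, 0, 0, 0, 1],
    "Amp 2":   [0, 0, 1, 0, 0, 1, 0],
    "No. 5":   [0, 0, 0, 0, 0, 0, 1],
    "Amp 5":   [0, 0, 0, 0, -1, 0, 0],
    "Amp 7":   [0, 0, 0, 1, 1, 0, -1],
    "Pause":   [0, 0, 0, 1, 1, 0, 0],
    "Peak 6":  [0, 0, 0, 0, -1, -1, 1],
}

def perceptual_dimensions(elements, coefficients=CP):
    """Return [DP1..DP7] as weighted sums of the measured elements.

    elements     -- dict mapping element name to its measured value
    coefficients -- dict mapping element name to its seven coefficients
    """
    dims = [0.0] * 7
    for name, value in elements.items():
        for j, c in enumerate(coefficients.get(name, [0] * 7)):
            dims[j] += c * value
    return dims
```

Passing a different `coefficients` table corresponds to the operator keying in a new coefficient set, for instance to turn the other-self slot into an introvert-extrovert dimension.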
In another exemplary embodiment of the present invention, biosignals received from the user are used to help determine the emotions in the user's speech. The recognition rate of a speech recognition system is improved by compensating for changes in the user's speech caused by factors such as emotion, anxiety, or fatigue. The speech signal derived from the user's utterance is modified by a preprocessor and provided to the speech recognition system to improve the recognition rate. The speech signal is modified based on a biosignal that indicates the user's emotional state.
FIG. 14 shows the speech recognition system in more detail. The speech signal from microphone 1418 and the biosignal from bio-monitor 1430 are received by preprocessor 1432. The signal sent from bio-monitor 1430 to preprocessor 1432 is a biosignal indicating the impedance between two points on the surface of the user's skin. Bio-monitor 1430 measures this impedance using contact 1436, attached to one of the user's fingers, and contact 1438, attached to another finger. A bio-monitor such as the model 63-664 biofeedback monitor sold under the trade name MICRONATA® BIOFEEDBACK MONITOR by Radio Shack, which belongs to Tandy Corporation, may be used. The contacts may also be attached to other positions on the user's skin. When the user becomes excited or anxious, the impedance between points 1436 and 1438 decreases; monitor 1430 detects this and produces a biosignal indicating the reduced impedance. Preprocessor 1432 uses the biosignal from bio-monitor 1430 to modify the speech signal received from microphone 1418, so as to compensate for the changes in the user's speech that result from fatigue or a change in emotional state. For example, when bio-monitor 1430 indicates that the user is in an excited state, preprocessor 1432 may lower the pitch of the speech signal from microphone 1418; when bio-monitor 1430 indicates that the user is in a less excited state, such as when fatigued, preprocessor 1432 may raise the pitch of the speech signal from microphone 1418. Preprocessor 1432 then provides the modified speech signal to sound card 1416 in a conventional manner.
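The direction of the pitch compensation described above can be illustrated with a small sketch. It assumes the preprocessor compares the measured skin impedance with a baseline taken in the user's normal state; the function name `pitch_adjustment_percent`, the 5% step, and the tolerance band are hypothetical values chosen only for illustration.

```python
def pitch_adjustment_percent(impedance, baseline, step=5.0, tolerance=0.05):
    """Signed pitch change, in percent, to apply to the speech signal.

    impedance -- current skin impedance between the two contacts
    baseline  -- impedance measured in the user's normal state (must be > 0)
    step      -- size of the correction in percent (hypothetical value)
    tolerance -- relative band around the baseline treated as "no change"
    """
    relative = (impedance - baseline) / baseline
    if relative < -tolerance:
        return -step  # lower impedance: user more excited, so lower the pitch
    if relative > tolerance:
        return step   # higher impedance: user less excited, so raise the pitch
    return 0.0
```

A fixed step keeps the example short; a real preprocessor would more plausibly scale the correction with the size of the deviation.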
Preprocessor 1432 communicates with PC 1410 through an interface such as an RS232 interface, for purposes such as initialization and calibration. User 1434 communicates with preprocessor 1432 by observing display 1412 and by entering commands with keyboard 1414, keypad 1439, or a mouse.
The biosignal can also be used to preprocess the speech signal by controlling the gain and/or frequency response of microphone 1418. The microphone's gain or amplification can be increased or decreased in response to the biosignal. The biosignal can also be used to change the microphone's frequency response. For example, if microphone 1418 is a model ATM71 from AUDIO-TECHNICA U.S., Inc., the biosignal can be used to switch between a relatively flat response and a rolled-off response, where the rolled-off response provides less gain to low-frequency speech signals.
If bio-monitor 1430 is the above-referenced monitor from Radio Shack, the biosignal takes the form of a series of ramp-like signals, each ramp lasting approximately 0.2 msec. FIG. 15 illustrates the biosignal, in which a series of ramp-like signals 1542 are separated by a time T. The amount of time T between ramps 1542 is related to the impedance between contacts 1438 and 1436. When the user is in a more excited state, the impedance between contacts 1438 and 1436 decreases and time T decreases. When the user is in a less excited state, the impedance between contacts 1438 and 1436 increases and time T increases.
The biosignal from the bio-monitor may also take forms other than a series of ramp-like signals.
For example, the biosignal can be an analog signal that varies in amplitude and/or frequency according to the measurements made by the bio-monitor, or it can be a digital value based on the conditions measured by the bio-monitor.
Bio-monitor 1430 contains the circuit of FIG. 16, which produces the biosignal indicating the impedance between contacts 1438 and 1436. The circuit consists of two parts. The first part senses the impedance between contacts 1438 and 1436. The second part is an oscillator that produces a series of ramp signals at output connector 1648; its oscillation frequency is controlled by the first part.
The first part controls the collector current Ic,Q1 and voltage Vc,Q1 of transistor Q1 according to the impedance between contacts 1438 and 1436. In this embodiment, impedance sensor 1650 is simply contacts 1438 and 1436 placed on the speaker's skin. Because the impedance between contacts 1438 and 1436 changes slowly compared with the oscillation frequency of the second part, the collector current Ic,Q1 and voltage Vc,Q1 are virtually constant as far as the second part is concerned. Capacitor C3 further stabilizes these currents and voltages.
The second part acts as an oscillator. Reactive components L1 and C1 turn transistor Q3 on and off to produce the oscillation. When power is first turned on, Ic,Q1 turns on Q2 by drawing base current Ib,Q2. Similarly, Ic,Q2 turns on transistor Q3 by providing base current Ib,Q3. Initially there is no current through inductor L1. When Q3 turns on, the voltage Vcc, less a small saturated-transistor voltage
The voltage V_c,Q3 is applied across L1. As a result, the current I_L1 through L1 increases. As I_L1 increases, the current I_C1 through capacitor C1 also increases. Because the current I_c,Q1 is virtually constant, the increase in I_C1 reduces the base current I_b,Q2 of transistor Q2. This in turn reduces the currents I_c,Q2, I_b,Q3 and I_c,Q3. As a result, more of the current I_L1 flows through capacitor C1, which reduces I_c,Q3 still further. This feedback turns transistor Q3 off.
Eventually, capacitor C1 is fully charged and the currents I_L1 and I_C1 drop to zero, which allows the current I_c,Q1 to draw base current I_b,Q2 once again. Transistors Q2 and Q3 turn on, and the oscillation cycle restarts.

The current I_c,Q1 depends on the impedance between contacts 1438 and 1436, and it controls the cycle frequency of the output signal. As the impedance between contacts 1438 and 1436 decreases, the time T between ramp signals decreases; as the impedance increases, T increases.

The circuit is powered by a three-volt battery 1662, which is connected to the circuit through switch 1664. A variable resistor 1666 sets the operating point of the circuit. It is best to set variable resistor 1666 near the middle of its adjustment range. The circuit then varies from this operating point, as described above, according to the impedance between contacts 1438 and 1436. The circuit also includes switch 1668 and speaker 1670. When no mating connector is inserted into connector 1648, switch 1668 routes the circuit output to speaker 1670 instead of connector 1648.

FIG. 17 is a block diagram of preprocessor 1432. Analog-to-digital (A/D) converter 1780 receives the speech or utterance signal from microphone 1418, and A/D converter 1782 receives the bio-signal from bio-monitor 1430. The output of A/D 1782 is sent to microprocessor 1784.
Microprocessor 1784 monitors the signal from A/D 1782 to determine what action digital signal processor (DSP) 1786 should take. Microprocessor 1784 uses memory 1788 for program storage and scratch-pad operations, and it communicates with personal computer (PC) 1410 over an RS232 interface. The software that controls the interface between PC 1410 and microprocessor 1784 may run on PC 1410 in a multi-application environment, using a software package such as the program sold by Microsoft Corporation under the trade name WINDOWS. After DSP 1786 has modified the signal from A/D 1780 as directed by microprocessor 1784, digital-to-analog (D/A) converter 1790 converts the DSP output back to an analog signal, which is sent to audio card 1416. Microprocessor 1784 can be any widely available microprocessor, such as those from Intel Corporation, and DSP 1786 can be any widely available digital signal processing chip, such as the Texas Instruments TMS320CXX series of devices.

Bio-monitor 1430 and preprocessor 1432 can be placed on a single interface card that plugs into an empty slot in PC 1410. The functions of microprocessor 1784 and DSP 1786 can also be performed by PC 1410 itself, without dedicated hardware.

Microprocessor 1784 monitors the bio-signal from A/D 1782 to determine what action DSP 1786 should take.
When the signal from A/D 1782 indicates that the user is in a more excited state, microprocessor 1784 directs DSP 1786 to process the signal from A/D 1780 so that the pitch of the speech signal is lowered. When the signal from A/D 1782 indicates that the user is in a less excited or fatigued state, microprocessor 1784 directs DSP 1786 to raise the pitch of the speech signal.

DSP 1786 modifies the pitch by building a model of the speech signal and then using the model to recreate the speech signal with a modified pitch. The speech model is built with one of the well-known linear predictive coding techniques. One such technique is described on pages 355-372 of the Analog Devices, Inc. application book "Digital Signal Processing Applications Using the ADSP 2100 Family", published by Prentice-Hall, Englewood Cliffs, N.J., 1992. The technique models the speech signal as an FIR (finite impulse response) filter with time-varying coefficients, excited by a train of impulses. The time T between impulses is a measure of the pitch, or fundamental frequency. The time-varying coefficients can be computed with a technique such as the Levinson-Durbin recursion, which is described in the Analog Devices, Inc. book mentioned above. The time T between the impulses of the train that excites the filter can be computed with an algorithm such as John D. Markel's SIFT (simplified inverse filter tracking) algorithm, which is described in "The SIFT Algorithm for Fundamental Frequency Estimation" by John D.
Markel, IEEE Transactions on Audio and Electroacoustics, Vol. AU-20, No. 5, December 1972. DSP 1786 modifies the pitch, or fundamental frequency, of the speech signal by changing the time T between impulses when it excites the FIR filter to recreate the speech signal. For example, decreasing the time T between impulses by 1% raises the pitch by 1%.

Note that the speech signal can be modified in ways other than a change in pitch. For example, pitch, amplitude, frequency and/or the signal spectrum can be modified, and part or all of the signal spectrum can be attenuated or amplified.

Bio-signals other than one indicating the impedance between two points on the user's skin can also be monitored. Signals indicative of autonomic activity can be used as bio-signals, for example blood pressure, pulse rate, brain waves or other electrical activity, pupil size, skin temperature, transparency or reflectivity to a particular electromagnetic wavelength, or other signals indicative of the user's emotional state.

FIG. 18 shows the pitch-modification curves that microprocessor 1784 uses to instruct DSP 1786 to change the pitch of the speech signal according to the time period T associated with the bio-signal. Horizontal axis 1802 represents the time T between ramps 1442 of the bio-signal, and vertical axis 1804 represents the percentage change in pitch introduced by DSP 1786.

FIG. 19 is a flowchart of the commands executed by microprocessor 1784 to establish the operating curve shown in FIG. 18. After initialization, step 1930 establishes a line that is collinear with axis 1802.
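As a rough illustration of the two ideas above, the sketch below treats a pitch-modification curve as a function from the bio-signal period T to a percentage pitch change, and applies that change by rescaling the spacing of the excitation impulses. The function names and the 8 ms example spacing are assumptions made for illustration and are not taken from the patent.

```python
def zero_curve(bio_period_ms: float) -> float:
    """Starting curve of step 1930: no pitch change for any bio-signal period T."""
    return 0.0


def rescale_pulse_spacing(spacing_ms: float, pitch_change_percent: float) -> float:
    """Return the impulse spacing that changes the pitch by the given percentage.

    Pitch is the reciprocal of the spacing, so raising the pitch by p percent
    divides the spacing by (1 + p / 100).
    """
    factor = 1.0 + pitch_change_percent / 100.0
    if factor <= 0.0:
        raise ValueError("pitch change must be greater than -100 percent")
    return spacing_ms / factor


def pitch_change_from_spacing(old_ms: float, new_ms: float) -> float:
    """Percentage pitch change produced by going from old_ms to new_ms spacing."""
    return (old_ms / new_ms - 1.0) * 100.0


if __name__ == "__main__":
    spacing = 8.0  # hypothetical impulse spacing in ms (125 Hz pitch)
    # With the zero curve the spacing is left unchanged.
    print(rescale_pulse_spacing(spacing, zero_curve(0.6)))
    # Shortening the spacing by 1% raises the pitch by roughly 1%.
    print(pitch_change_from_spacing(spacing, spacing * 0.99))
```

Because pitch is the reciprocal of the impulse spacing, a 1% shorter spacing gives a pitch increase of about 1.01%, which agrees with the 1% figure in the text to first order.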
This line indicates that no pitch change is introduced for any value of T from the bio-signal. After step 1930, decision step 1932 is executed, in which microprocessor 1784 determines whether a modify command has been received from keyboard 1414 or keypad 1439. If no modify command has been received, microprocessor 1784 waits in a loop for one. If a modify command has been received, step 1934 determines the value T = T_ref1 that will be used to establish a new reference point Ref1. T_ref1 equals the present value of T obtained from the bio-signal; for example, T_ref1 may equal 0.6 msec. After determining T_ref1, microprocessor 1784 executes step 1938, which asks the user to speak an utterance so that a pitch sample can be taken in step 1940. A pitch sample is desirable because it serves as the basis for the percentage changes in pitch indicated along axis 1804. In step 1942, microprocessor 1784 instructs DSP 1786 to raise the pitch of the speech signal by an amount equal to the present pitch change associated with point Ref1 plus an increment of five percent; smaller or larger increments may also be used. (At this point the pitch change associated with Ref1 is zero; recall step 1930.) In step 1944, microprocessor 1784 asks the user to run a recognition test by speaking several commands to the speech recognition system, to determine whether an acceptable recognition rate has been achieved. When the test is finished, the user can tell microprocessor 1784 so by entering a command such as "end" on keyboard 1414 or keypad 1439.
After executing step 1944, microprocessor 1784 executes step 1946, in which it instructs DSP 1786 to lower the pitch of the incoming speech signal by an amount equal to the pitch change associated with point Ref1 minus a decrement of five percent; smaller or larger decrements may also be used. (Note that after step 1930 the pitch change associated with Ref1 is zero.) In step 1948, microprocessor 1784 asks the user to perform another speech recognition test and to enter the "end" command when it is finished. In step 1950, microprocessor 1784 asks the user to vote for the first or second test, indicating which one gave better recognition. In step 1952, the user's vote selects between steps 1954 and 1956. If test 1 is voted best, step 1956 is executed and the new percentage change associated with Ref1 is set equal to the prior value for Ref1 plus five percent, or plus whatever increment was used in step 1942. If test 2 is voted best, step 1954 is executed and the new percentage change associated with Ref1 is set equal to the old value for Ref1 minus five percent, or minus whatever decrement was used in step 1946. Determining the percentage change associated with T_ref1 establishes the new reference point Ref1. For example, if test 1 is voted best, Ref1 is located at point 1858 in FIG. 18. Once the position of point 1858, the newly established Ref1, has been fixed, line 1860 is established in step 1962. Line 1860 is the initial pitch-modification line used to calculate the percentage pitch change for different values of T from the bio-signal.
Initially this line may be given a slope of, for example, plus five percent per millisecond; other slopes may also be used.

After establishing this initial modification line, microprocessor 1784 enters a wait loop that executes steps 1964 and 1966. In step 1964 it checks for a modify command, and in step 1966 it checks for a disable command. If no modify command is received in step 1964, the processor checks for a disable command in step 1966. If no disable command is received, the microprocessor returns to step 1964; if a disable command is received, it executes step 1930, which sets the pitch change to zero for all values of T from the bio-signal. The processor stays in this loop, checking for modify and disable commands, until the user becomes dissatisfied with the recognition rate obtained by preprocessing the speech signal with curve 1860.

If a modify command is received in step 1964, step 1968 is executed. Step 1968 determines the value of T and checks whether it is equal, or nearly equal, to the value T_ref1 of point Ref1. If the value of T corresponds to Ref1, step 1942 is executed. If it does not, step 1970 is executed. Step 1970 establishes the value T_ref2 for a new reference point Ref2. For purposes of illustration, assume T_ref2 = 1.1 msec. Referring to FIG. 18, this establishes point 1872 on line 1860 as point Ref2.
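The calibration of Ref1 and the initial modification line can be sketched as follows. This is a minimal illustration that assumes the example values given in the description (T_ref1 = 0.6 msec, a five percent step, a slope of five percent per millisecond, and test 1 voted best); the `RefPoint` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RefPoint:
    t_ms: float            # bio-signal period T at the reference point
    change_percent: float  # pitch change associated with that period


def apply_vote(ref: RefPoint, test1_best: bool, step_percent: float) -> RefPoint:
    """Steps 1950-1956: keep the raised pitch if test 1 won, else the lowered one."""
    delta = step_percent if test1_best else -step_percent
    return RefPoint(ref.t_ms, ref.change_percent + delta)


def line_through(ref: RefPoint, slope_percent_per_ms: float):
    """Step 1962: pitch-modification line through one reference point."""
    def curve(t_ms: float) -> float:
        return ref.change_percent + slope_percent_per_ms * (t_ms - ref.t_ms)
    return curve


if __name__ == "__main__":
    ref1 = RefPoint(t_ms=0.6, change_percent=0.0)   # on the zero line of step 1930
    ref1 = apply_vote(ref1, test1_best=True, step_percent=5.0)
    line_1860 = line_through(ref1, slope_percent_per_ms=5.0)
    print(ref1)             # Ref1 after the vote
    print(line_1860(1.1))   # pitch change read off the line at T_ref2 = 1.1 msec
```

Under these assumptions, Ref1 sits at (0.6 msec, +5%), and the line gives a pitch change of about 7.5% at T_ref2 = 1.1 msec, which would be the starting value for point Ref2.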
In step 1974, microprocessor 1784 instructs DSP 1786 to raise the pitch change associated with point Ref2 by 2.5 percent (other percentages may be used). In step 1976, the user is asked to perform a recognition test and to enter the "end" command when it is finished. In step 1978, microprocessor 1784 instructs DSP 1786 to lower the pitch of the speech signal by an amount equal to the pitch change associated with Ref2 minus 2.5 percent. In step 1980, the user is again asked to perform a recognition test and to enter the "end" command when finished. In step 1982, the user is asked to indicate whether the first or second test gave the better results. In step 1984, microprocessor 1784 branches on the user's choice: step 1986 is executed if test 1 was chosen, and step 1988 if test 2 was chosen. In step 1986, microprocessor 1784 sets the percentage change associated with point Ref2 to the prior value for Ref2 plus 2.5 percent, or plus the increment used in step 1974. In step 1988, the percentage change associated with Ref2 is set to the prior value for Ref2 minus 2.5 percent, or minus the decrement used in step 1978. After step 1986 or 1988, step 1990 is executed. Step 1990 establishes a new pitch-modification line, which uses the point associated with Ref1 and the new point associated with Ref2. For example, if the user selected test 1 in step 1984, the new point associated with Ref2 is point 1892 in FIG. 18.
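Step 1990 amounts to fitting a straight line through two reference points. The sketch below is one way to write it, assuming hypothetical coordinates for points 1858 and 1892 derived from the example values in the description (Ref1 at 0.6 msec and +5%, Ref2 at 1.1 msec and 7.5% + 2.5%). The optional `limit_percent` cap is an assumption of this sketch, not a value from the patent.

```python
def line_through_points(p1, p2, limit_percent=None):
    """Step 1990: pitch-modification line through two (T in ms, percent) points.

    limit_percent, if given, caps the result at +/- that value (an abrupt limit).
    """
    (t1, c1), (t2, c2) = p1, p2
    if t1 == t2:
        raise ValueError("reference points need different T values")
    slope = (c2 - c1) / (t2 - t1)

    def curve(t_ms):
        change = c1 + slope * (t_ms - t1)
        if limit_percent is not None:
            change = max(-limit_percent, min(limit_percent, change))
        return change

    return curve


if __name__ == "__main__":
    ref1 = (0.6, 5.0)    # hypothetical coordinates for point 1858
    ref2 = (1.1, 10.0)   # hypothetical coordinates for point 1892
    line_1898 = line_through_points(ref1, ref2, limit_percent=12.0)
    for t in (0.6, 1.1, 2.0):
        print(t, line_1898(t))
```

With these numbers the new line has a slope of about 10% per millisecond, so each later adjustment of Ref2 steepens or flattens the mapping from bio-signal period to pitch change while Ref1 stays fixed.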
The new pitch-modification line is then line 1898, which passes through points 1892 and 1858. After executing step 1990, microprocessor 1784 returns to the loop formed by steps 1964 and 1966.

Note that a linear modification line has been used here, but non-linear modification lines are also possible. One way is to use points 1858 and 196 to set the slope of the line to the right of point 1858, and to use another reference point to the left of point 1858 to set the slope of the line to the left of point 1858. Positive and negative limits can also be placed on the maximum percentage pitch change. As the pitch-modification line approaches these limits, it can approach them asymptotically or simply change abruptly at the point where it meets the limit.

It is also possible to use a fixed modification curve, such as curve 1800, and then adjust variable resistor 1666 until an acceptable recognition rate is achieved.

Voice Messaging System

FIG. 20 illustrates an embodiment of the present invention that manages voice messages according to the emotional characteristics of the messages. Operation 2000 receives a plurality of voice messages transmitted over a telecommunications network. Operation 2002 stores the voice messages on a storage medium, such as the tape recorder mentioned above or a hard disk. Operation 2004 determines the emotion associated with the voice signals of the voice messages. The emotion can be determined by any of the methods described above.
The voice messages are then organized in operation 2006 according to the determined emotion. For example, messages in which the voice shows negative emotions such as sadness, anger or fear can be grouped together in a mailbox and/or database. Operation 2008 provides access to the organized voice messages.

The voice messages may follow a telephone call. Optionally, voice messages with similar emotions can be organized together. Also optionally, the voice messages can be organized in real time, immediately upon receipt. Preferably, the manner in which the voice messages are organized is identified, to facilitate access to the organized messages. Also preferably, the emotion is determined by extracting at least one feature from the voice signals, as discussed earlier.

In an exemplary embodiment of a voice messaging system according to the present invention, pitch and LPC parameters (and usually other excitation information as well) are encoded for transmission and/or storage, and are then decoded to provide a close replica of the original speech input.

The present invention is particularly related to linear predictive coding (LPC) systems for (and methods of) analyzing or encoding human speech signals. In LPC modeling, each sample in a series of samples is generally modeled (in the simplified model) as a linear combination of preceding samples plus an excitation function:
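$$s_k = \sum_{j=1}^{N} a_j s_{k-j} + u_k$$

where $u_k$ is the LPC residual signal. That is, $u_k$ represents the residual information in the input speech signal that the LPC model does not predict. Note that only $N$ prior samples are used for prediction. The model order (typically around 10) can be increased to give better prediction, but in any normal speech-modeling application some information will always remain in the residual signal $u_k$.

Within the general framework of LPC modeling, many particular implementations of voice analysis can be chosen. Many of them require the pitch of the input speech signal to be determined. That is, in addition to the formant frequencies, which in effect correspond to resonances of the vocal tract, the human voice also contains a pitch, modulated by the speaker, that corresponds to the frequency at which the larynx modulates the air stream. In other words, the human voice can be regarded as an excitation function applied to a passive acoustic filter. The excitation function generally appears in the LPC residual, while the characteristics of the passive acoustic filter (the resonance characteristics of the mouth, nasal cavity, chest and so on) are modeled by the LPC parameters. During unvoiced speech the excitation function has no well-defined pitch and is best modeled as broadband white noise or pink noise.

Estimating the pitch period is not entirely trivial. One problem is that the first formant often occurs at a frequency close to the pitch. For this reason pitch estimation is often performed on the LPC residual signal, because the LPC estimation process in effect deconvolves the vocal-tract resonances from the excitation information, so that the residual contains relatively less of the vocal-tract resonances (formants) and relatively more of the excitation information (pitch). Such residual-based pitch estimation has its own difficulties, however. The LPC model itself normally introduces high-frequency noise into the residual signal, and portions of this noise may have a higher spectral density than the actual pitch that is to be detected. One solution is simply to low-pass filter the residual signal at around 1000 Hz. This removes the high-frequency noise, but it also removes the legitimate high-frequency energy present in the unvoiced regions of speech and makes the residual signal virtually useless for voicing decisions.

The cardinal criterion in voice messaging applications is the quality of the reproduced speech. Prior-art systems have had many shortcomings in this respect; in particular, many of them cannot accurately detect the pitch and voicing of the input speech signal.

It is very easy to estimate the pitch period incorrectly as twice or half its actual value. For example, with correlation methods, a good correlation at period P guarantees a good correlation at period 2P, and also means that the signal is more likely to show a good correlation at period P/2. Such doubling and halving errors produce a very annoying degradation in voice quality. Erroneous halving of the pitch period tends to produce a squeaky voice, and erroneous doubling tends to produce a coarse voice. Moreover, pitch-period doubling or halving is likely to occur intermittently, so that the synthesized voice tends to crack or grate intermittently.

The present invention filters the residual signal with an adaptive filter. A time-varying filter with a single pole at the first reflection coefficient ($k_1$ of the speech input) removes the high-frequency noise from the voiced periods of speech but retains the high-frequency information in the unvoiced periods. The adaptively filtered residual signal is then used as the input to the pitch decision.

The high-frequency information in unvoiced periods must be retained to permit better voiced/unvoiced decisions. That is, the "unvoiced" decision is normally made when no strong pitch is found, that is, when no correlation lag of the residual signal gives a high normalized correlation value. However, if only a low-pass-filtered portion of the residual signal is tested during unvoiced periods, that partial segment of the residual may show spurious correlations. The danger is that the truncated residual produced by the fixed low-pass filter of the prior art does not contain enough data to show reliably that no correlation exists during unvoiced periods; the additional bandwidth provided by the high-frequency energy of unvoiced periods is needed to exclude reliably the spurious correlation lags that might otherwise be found.

Improved pitch and voicing decisions are particularly important for voice messaging systems, but they are also desirable in other applications. For example, a word recognizer that uses pitch information needs a good pitch estimation procedure. Similarly, pitch information is sometimes used for speaker verification, particularly over telephone lines, where high-frequency information is partially lost. Moreover, it would be desirable for future recognition systems to take account of the syntactic information conveyed by pitch. Likewise, a good voicing analysis would be desirable for some advanced speech recognition systems, such as speech-to-text systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio of the signal. See R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch Estimator for Speech and Additive Noise," Technical Note 1979-28, Lincoln Labs, June 11, 1979, which is incorporated here by reference. When $k_1$ is close to $-1$, the signal contains more low-frequency energy than high-frequency energy, and the reverse holds when $k_1$ is close to 1. Thus, when $k_1$ is used to set the pole of a one-pole de-emphasis filter, the residual signal is low-pass filtered during voiced speech periods and high-pass filtered during unvoiced periods. This means that the formant frequencies are excluded from the pitch computation during voiced periods, while the necessary high-bandwidth information is retained during unvoiced periods so that the absence of pitch correlation can be detected accurately.

Preferably, a post-processing dynamic programming technique is used to provide not only an optimal pitch value but also an optimal voicing decision. That is, pitch and voicing are both tracked from frame to frame, and a cumulative penalty for a sequence of frame pitch/voicing decisions is accumulated along various tracks to find the track that gives the optimal pitch and voicing decisions. The cumulative penalty is obtained by imposing a frame error in going from one frame to the next. The frame error preferably penalizes not only large deviations in pitch period from frame to frame, but also pitch hypotheses with a relatively poor correlation "goodness" value, and changes in the voicing decision when the spectrum is relatively unchanged from frame to frame. This last feature of the frame transition error forces voicing transitions toward the points of maximal spectral change.

In the voice messaging system of the present invention, a speech input signal, shown as a time series $s_i$, is provided to an LPC analysis block. The LPC analysis can be performed with a wide variety of conventional techniques; the end product is a set of LPC parameters and a residual signal $u_i$. Background on LPC analysis in general, and on various methods of extracting LPC parameters, can be found in many well-known references, including Markel and Gray, Linear Prediction of Speech (1976), and Rabiner and Schafer, Digital Processing of Speech Signals (1978), and the references cited in them, all of which are incorporated here by reference.

In the presently preferred embodiment, the analog speech waveform is sampled at 8 kHz with 16-bit precision to produce the input time series $s_i$. The invention does not depend on the sampling rate or precision used, of course, and applies to speech sampled at any rate and with any degree of precision.

In the presently preferred embodiment, the set of LPC parameters includes a plurality of reflection coefficients $k_i$, and a 10th-order LPC model is used (that is, only reflection coefficients $k_1$ through $k_{10}$ are extracted, and higher-order coefficients are not). However, as those skilled in the art well know, other model orders or other equivalent sets of LPC parameters can be used, for example the LPC predictor coefficients $a_k$ or the impulse response estimates $e_k$. The reflection coefficients $k_i$ are the most convenient, however.

In the presently preferred embodiment, the reflection coefficients are extracted with the Leroux-Gueguen procedure, described for example in IEEE Transactions on Acoustics, Speech and Signal Processing, p. 257 (June 1977), which is incorporated here by reference. Other algorithms well known to those skilled in the art, such as Durbin's, could also be used to compute the coefficients.

A by-product of computing the LPC parameters is typically a residual signal $u_k$. If the parameters are computed by a method that does not automatically produce $u_k$ as a by-product, the residual can be found simply by using the LPC parameters to configure a finite-impulse-response digital filter that computes the residual series $u_k$ directly from the input series $s_k$.

The residual time series $u_k$ is then put through a very simple digital filtering operation that depends on the LPC parameters of the current frame. The speech input signal $s_k$ is a time series whose value can change once per sample, at a sampling rate of, for example, 8 kHz. The LPC parameters, however, are normally recomputed only once per frame period, at a frame rate of, for example, 100 Hz. The residual signal $u_k$ also has a period equal to the sampling period. The digital filter, whose characteristic depends on the LPC parameters, is therefore preferably not readjusted at every residual sample $u_k$. In the presently preferred embodiment, about 80 values of the residual time series $u_k$ pass through filter 14 before a new set of LPC parameters is generated and a new characteristic for filter 14 takes effect.

More specifically, the first reflection coefficient $k_1$ is extracted from the set of LPC parameters provided by LPC analysis section 12. Where the LPC parameters are themselves the reflection coefficients $k_i$, it is only necessary to look up the first reflection coefficient $k_1$. Where other LPC parameters are used, the transformation that yields the first reflection coefficient is usually very simple, for example

$$k_1 = a_1 / a_0$$

Although the present invention preferably uses the first reflection coefficient to define a one-pole adaptive filter, the invention is not limited to this principal preferred embodiment. The filter need not be a single-pole filter; it can be configured as a more complex filter with one or more poles and/or one or more zeros, some or all of which can be varied adaptively according to the present invention.

Note also that the characteristic of the adaptive filter need not be determined by the first reflection coefficient $k_1$. There are many other equivalent sets of LPC parameters, and the parameters in those sets can also provide the desired filtering characteristics. In particular, in any set of LPC parameters the lowest-order parameters are the most likely to carry information about the gross spectral shape. Thus, an adaptive filter according to the present invention could use $a_1$ or $e_1$ to define a pole, can have a single pole or multiple poles, and can be used alone or in combination with other zeros and/or poles. Moreover, the pole (or zero) that is defined adaptively by an LPC parameter need not coincide exactly with that parameter, as it does in the presently preferred embodiment, but can be shifted in magnitude or phase.

The one-pole adaptive filter thus filters the residual time series $u_k$ to produce a filtered time series $u'_k$. As discussed above, the high-frequency energy of this filtered series $u'_k$ is greatly reduced during voiced speech segments, but nearly the full bandwidth is retained during unvoiced segments. The filtered residual signal $u'_k$ is then processed further to extract the pitch candidates and the voicing decision.

Many methods exist for extracting pitch information from a residual signal, and any of them can be used. Many are discussed in the book by Markel and Gray cited above.

In the presently preferred embodiment, the candidate pitch values are obtained by finding the peaks in the normalized correlation function of the filtered residual signal, which is defined as follows: $\sum_j u'_j\,u'_{j-k}$ …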
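The front end described above (residual computation, a one-pole filter set by $k_1$, and a search for correlation peaks) can be sketched in a few lines. The correlation formula is cut off in the text, so the normalization used here is a common textbook form and is an assumption, as are the sign convention of the filter, the function names, and the synthetic test signal.

```python
import math


def lpc_residual(s, a):
    """Residual u_k = s_k - sum_j a[j-1] * s_{k-j}; samples before k = 0 count as 0."""
    n = len(a)
    u = []
    for k in range(len(s)):
        pred = sum(a[j - 1] * s[k - j] for j in range(1, n + 1) if k - j >= 0)
        u.append(s[k] - pred)
    return u


def one_pole_filter(u, k1):
    """y_k = u_k - k1 * y_{k-1}: pole at z = -k1 (sign convention assumed here).

    k1 near -1 gives a low-pass response, k1 near +1 a high-pass response.
    """
    y, prev = [], 0.0
    for x in u:
        prev = x - k1 * prev
        y.append(prev)
    return y


def normalized_correlation(x, lag):
    """Correlation of x with itself delayed by `lag`, normalized to [-1, 1]."""
    a, b = x[lag:], x[:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def best_pitch_lag(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest normalized correlation."""
    return max(range(min_lag, max_lag + 1), key=lambda k: normalized_correlation(x, k))


if __name__ == "__main__":
    fs, frame_rate = 8000, 100
    print(fs // frame_rate)  # residual samples filtered per LPC frame
    # Synthetic "voiced" excitation: one impulse every 40 samples (200 Hz at 8 kHz).
    pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(400)]
    speech = one_pole_filter(pulses, k1=-0.5)      # crude one-tap stand-in for the vocal tract
    residual = lpc_residual(speech, [0.5])         # inverse filter recovers the pulses
    smoothed = one_pole_filter(residual, k1=-0.9)  # k1 near -1: low-pass, as for voiced speech
    print(best_pitch_lag(smoothed, 20, 60))
    print(normalized_correlation(smoothed, 80))    # the doubled period also scores high
```

The last line illustrates the doubling problem mentioned earlier: a lag of twice the true period correlates almost as well as the true period, which is why the search range and any later tracking stage matter.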
The order of the models (usually around 10) can be increased to provide better predictions, but in the application of general discourse models, some information must be stored in the residual signal uk. In the general framework of the LPC model, many specific speech analysis implementations can be selected. Many of them must judge the tone of the input speech signal. That is, in addition to the formant frequency that is actually consistent with the resonance of the vocal tract, human speech also includes a tone that is tuned by the speaker and corresponds to the frequency of the laryngeal adjustment of the airflow. In other words, human vocalization can be regarded as the excitation function applied to the auditory passive filter. This excitation function usually appears in the LPC residual function, and the characteristics of the passive auditory filter (resonance characteristics of the oral cavity, nasal cavity, and chest cavity, etc.) ) Is shaped by LPC parameters. When there is no sound, the tone of the excitation function is not clear, and its optimization model is broadband white noise or pink noise. 4Hickman200021tw; AND1P115.TW 72 ^ The paper size is applicable to the Chinese National Standard (CNS) A4 (210 X 297 mm). -------- Order --------- (Please read the back first Please pay attention to this page before filling in this page) 548631 A7 B7 V. Description of invention (rj3) It is estimated that the period of ten sounds and g weeks is not a trivial matter and many problems will be encountered. One of them is that the first formant usually appears near the tone frequency. Therefore, the tonal estimation is often performed using the LPC residual signal, because the LPC estimation process is actually expanding the channel resonance from the excitation information, so that the residual signal contains relatively few channel resonances (formants) and contains relatively more excitations. Information (tone). 
However, such residual-based pitch estimation has its own difficulties. The LPC model itself usually introduces high-frequency noise into the residual signal, and parts of this noise may have a higher spectral density than the actual pitch to be detected. One solution to this difficulty is to low-pass filter the residual signal at around 1000 Hz. This removes the high-frequency noise, but it also removes the legitimate high-frequency energy present in the unvoiced regions of speech, making the residual signal almost useless for voicing decisions.
A cardinal criterion in voice messaging applications is the quality of the reproduced speech. Prior art systems had many shortcomings in this regard; in particular, many of them could not accurately detect the pitch and voicing of the input speech signal.
It is easy to estimate the pitch period incorrectly at twice or half its actual value. For example, with correlation methods, a good correlation at period P guarantees a good correlation at period 2P, and the signal is also likely to show a good correlation at period P/2. Such doubling and halving errors degrade voice quality in a very annoying way: erroneous halving of the pitch period tends to produce a squeaky voice, and erroneous doubling tends to produce a coarse voice. Moreover, doubling or halving of the pitch period is likely to occur intermittently, so that the synthesized voice tends to crack or grate intermittently.
The present invention uses an adaptive filter to filter the residual signal. A time-varying filter with a single pole at the first reflection coefficient ($k_1$ of the speech input) removes the high-frequency noise from the voiced periods of speech but retains the high-frequency information in the unvoiced periods. The adaptively filtered residual signal is then used as the input for the pitch decision.
The high-frequency information in unvoiced periods must be retained to permit better voiced/unvoiced decisions. An "unvoiced" decision is normally made when no strong pitch is found, that is, when no correlation lag of the residual signal yields a high normalized correlation value. However, if only a low-pass filtered portion of the residual is tested during unvoiced periods, that partial segment may show spurious correlations. The danger is that the truncated residual produced by the fixed low-pass filter of the prior art does not contain enough data to show reliably that no correlation exists during unvoiced periods; the additional bandwidth provided by the high-frequency energy of unvoiced periods is needed to exclude reliably the spurious correlation lags that might otherwise be found.
Improved pitch and voicing decisions are particularly important for voice messaging systems, but they also benefit other applications. For example, a word recognizer that uses pitch information needs a good pitch estimation procedure. Similarly, pitch information is sometimes used for speaker verification, particularly over a telephone line, where the high-frequency information is partially lost.
In addition, future recognition systems should be able to take into account the syntactic information conveyed by pitch. Similarly, some advanced speech recognition systems, such as speech-to-text systems, would benefit from good voicing analysis.
The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio of a signal. See R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch Estimator for Speech and Additive Noise," Technical Note 1979-28, Lincoln Labs, June 11, 1979, which is incorporated herein by reference. When $k_1$ is close to -1, the signal contains more low-frequency energy than high-frequency energy; when $k_1$ is close to 1, the reverse holds. Therefore, by using $k_1$ to set the pole of a one-pole filter, the residual signal is low-pass filtered during voiced speech and high-pass filtered during unvoiced speech. The formant frequencies are thus excluded from the pitch computation during voiced periods, while the high-bandwidth information needed to detect accurately that no pitch correlation exists is retained during unvoiced periods.
Preferably, a post-processing dynamic programming technique provides both an optimal pitch value and an optimal voicing decision. Pitch and voicing are tracked from frame to frame, a cumulative penalty for a sequence of frame pitch/voicing decisions is accumulated for each track, and the track that gives the best pitch and voicing decisions is selected.
The cumulative penalty is obtained by imposing a frame error in going from one frame to the next. The frame error penalizes not only large deviations in pitch period from frame to frame, but also pitch hypotheses with a relatively poor correlation "goodness" value, and changes in the voicing decision when the spectrum changes relatively little from frame to frame. This last feature of the frame transition error forces voicing transitions toward the points of maximal spectral change.
In the voice messaging system of the present invention, a speech input signal, shown as a time series $s_k$, is provided to an LPC analysis block. The LPC analysis can be performed by a wide variety of conventional techniques; the end product is a set of LPC parameters and a residual signal $u_k$. Background on LPC analysis and on methods for extracting LPC parameters is found in many well-known references, including Markel and Gray, Linear Prediction of Speech (1976), and Rabiner and Schafer, Digital Processing of Speech Signals (1978), and the references cited therein, all incorporated herein by reference.
In the presently preferred embodiment, the analog speech waveform is sampled at 8 kHz with 16-bit precision to produce the input time series $s_k$. Of course, the present invention does not depend on the sampling rate or precision used, and applies to speech sampled at any rate and with any degree of precision.
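As a rough illustration of one conventional way to produce the two outputs of the LPC analysis block (a set of LPC parameters and a residual), the sketch below applies Durbin's recursion to the autocorrelation of a frame and then inverse-filters the frame. It is a minimal sketch under stated assumptions, not the procedure of the preferred embodiment: the function names, the absence of windowing and pre-emphasis, and the sign convention $k_1 = -r_1/r_0$ (chosen so that a frame dominated by low frequencies gives $k_1$ near -1) are all assumptions.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame (samples outside the frame count as zero)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Durbin's recursion. Returns (a, ks, err): prediction-error filter
    coefficients a[0..order] with a[0] == 1, reflection coefficients k_1..k_order,
    and the final prediction-error energy. Assumes r[0] > 0 (a non-silent frame)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # assumed sign convention: k_1 = -r[1] / r[0]
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # error energy shrinks at every order
    return a, ks, err


def lpc_residual(x, a):
    """Residual u[n] = sum_j a[j] * x[n - j]: an FIR filter built from the LPC parameters."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0) for n in range(len(x))]
```

With this convention a slowly varying frame yields a first reflection coefficient close to -1 and a rapidly alternating frame yields one close to +1, which matches the behaviour of $k_1$ described earlier in the text.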
In the presently preferred embodiment, the LPC parameter set consists of reflection coefficients $k_i$, and a tenth-order LPC model is used (that is, only the reflection coefficients $k_1$ through $k_{10}$ are extracted, and higher-order coefficients are not). However, as those skilled in the art know, other model orders or other equivalent LPC parameter sets can be used, for example the LPC predictor coefficients $a_k$ or the impulse response estimates $e_k$. The reflection coefficients $k_i$ are nevertheless the most convenient.
In the presently preferred embodiment, the reflection coefficients are extracted according to the Leroux-Gueguen procedure, described for example in IEEE Transactions on Acoustics, Speech and Signal Processing, p. 257 (June 1977), incorporated herein by reference. Other algorithms well known to those skilled in the art, such as Durbin's, can also be used to compute the coefficients.
A residual signal $u_k$ is usually a by-product of computing the LPC parameters. If the parameters are computed by a method that does not produce $u_k$ automatically, the residual can be found by using the LPC parameters to configure a finite-impulse-response digital filter that computes the residual series $u_k$ directly from the input series $s_k$.
The residual time series $u_k$ is then passed through a very simple digital filtering operation that depends on the LPC parameters of the current frame. The speech input signal $s_k$ is a time series whose value can change once every sample, at a sampling rate of, for example, 8 kHz.
However, the LPC parameters are normally recomputed only once per frame period, at a frame rate of, for example, 100 Hz. The residual signal $u_k$ also has a period equal to the sampling period. The digital filter, whose characteristic depends on the LPC parameters, is therefore preferably not readjusted at every residual sample $u_k$. In the presently preferred embodiment, about 80 values of the residual time series $u_k$ pass through the filter 14 before a new set of LPC parameter values is generated and a new characteristic for the filter 14 is therefore implemented.
More specifically, the first reflection coefficient $k_1$ is extracted from the set of LPC parameters provided by the LPC analysis section 12. Where the LPC parameters are themselves the reflection coefficients $k_i$, it is only necessary to look up the first reflection coefficient $k_1$. Where other LPC parameters are used, the transformation that yields the first reflection coefficient is usually very simple, for example $k_1 = a_1/a_0$.
Although the present invention preferably uses the first reflection coefficient to define a one-pole adaptive filter, the invention is not limited to this principal preferred embodiment. The filter need not be a single-pole filter; it can be a more complex filter with one or more poles and/or one or more zeros, some or all of which may be varied adaptively according to the present invention.
The characteristic of the adaptive filter also need not be determined by the first reflection coefficient $k_1$. As is well known, there are many other equivalent LPC parameter sets.
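The frame-rate update of the one-pole filter can be sketched as follows. This is a minimal sketch under stated assumptions, not the filter 14 itself: the text gives no difference equation, so the recursion $y_n = u_n - k_1 y_{n-1}$ is assumed here because, with the convention that $k_1$ near -1 marks a frame dominated by low frequencies, it acts as a low-pass filter on such frames and as a high-pass filter when $k_1$ is near +1. The function name and the default of 80 samples per frame (8 kHz sampling at a 100 Hz frame rate) are illustrative.

```python
def adaptive_one_pole(residual, k1_per_frame, frame_len=80):
    """Filter `residual` with a one-pole recursion whose coefficient is refreshed
    once per frame from that frame's first reflection coefficient.
    `k1_per_frame` must contain at least one value, each with |k1| < 1."""
    filtered = []
    prev_out = 0.0
    for n, u in enumerate(residual):
        # Samples beyond the last supplied frame reuse the last coefficient.
        frame = min(n // frame_len, len(k1_per_frame) - 1)
        k1 = k1_per_frame[frame]
        # Assumed recursion: y[n] = u[n] - k1 * y[n-1], i.e. H(z) = 1 / (1 + k1 z^-1).
        prev_out = u - k1 * prev_out
        filtered.append(prev_out)
    return filtered
```

Under this assumed recursion, $k_1 = -0.9$ gives a steady-state gain of $1/(1-0.9) = 10$ for a constant input and $1/(1+0.9) \approx 0.53$ for an input that alternates in sign every sample; with $k_1 = +0.9$ the two gains swap.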
The parameters in other LPC parameter sets can also provide the desired filtering characteristics. In particular, in any LPC parameter set the lowest-order parameters are the most likely to provide information about the overall spectral shape. An adaptive filter according to the present invention could therefore use $a_1$ or $e_1$ to define a pole, could have a single pole or multiple poles, and could be used alone or in combination with other zeros and/or poles. Moreover, the pole (or zero) defined adaptively by an LPC parameter need not coincide exactly with that parameter, as it does in the presently preferred embodiment, but can be shifted in magnitude or phase.
The one-pole adaptive filter thus filters the residual time series $u_k$ to produce the filtered time series $u'_k$. As discussed above, the high-frequency energy of $u'_k$ is greatly reduced during voiced segments, while nearly the full bandwidth is retained during unvoiced segments. The filtered residual signal $u'_k$ is then processed further to extract pitch candidates and the voicing decision.
Many methods exist for extracting pitch information from a residual signal, and any of them can be used. Many are described in the book by Markel and Gray cited above.
In the presently preferred embodiment, the pitch candidates are obtained by finding the peaks of the normalized correlation function of the filtered residual signal, defined as follows:
$$C(k) = \frac{\sum_{j=0}^{m-1} u'_j\, u'_{j-k}}{\left(\sum_{j=0}^{m-1} u'^{2}_{j}\right)^{1/2} \left(\sum_{j=0}^{m-1} u'^{2}_{j-k}\right)^{1/2}}, \qquad k_{min} \le k \le k_{max}$$
where $u'_j$ is the filtered residual signal, $k_{min}$ and $k_{max}$ define the boundaries of the correlation lag $k$, and $m$ is the number of samples in one frame period (80 in the preferred embodiment), which defines the number of samples to be correlated. The pitch candidates are the lags $k^*$ at which $C(k)$ takes a local maximum, and the scalar value $C(k^*)$ is used as the "goodness" value of each candidate $k^*$.
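A minimal sketch of this goodness measure follows. The function names and the `start` argument (the index of the frame's first sample inside a longer buffer, so that the lagged samples $u'_{j-k}$ exist) are assumptions made for illustration; the text does not say how the lagged samples are buffered.

```python
import math


def normalized_correlation(u, start, m, k):
    """C(k) for the m samples beginning at u[start], correlated against the same
    samples delayed by k. Requires start - k >= 0 and start + m <= len(u)."""
    cur = u[start:start + m]
    lagged = u[start - k:start - k + m]
    num = sum(a * b for a, b in zip(cur, lagged))
    den = math.sqrt(sum(a * a for a in cur)) * math.sqrt(sum(b * b for b in lagged))
    # A silent segment has no defined correlation; report zero goodness.
    return num / den if den > 0.0 else 0.0


def goodness_by_lag(u, start, m, k_min, k_max):
    """Map every lag in [k_min, k_max] to its goodness value C(k)."""
    return {k: normalized_correlation(u, start, m, k) for k in range(k_min, k_max + 1)}
```

For a signal with an exact period of 20 samples, both C(20) and C(40) come out essentially equal to 1, which illustrates why correlation methods are prone to the period-doubling errors discussed earlier.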
Optionally, a threshold $C_{min}$ is imposed on the goodness measure $C(k)$, and local maxima of $C(k)$ that do not exceed $C_{min}$ are ignored. If no $k^*$ exists for which $C(k^*)$ is greater than $C_{min}$, the frame is necessarily unvoiced.
Alternatively, the goodness threshold $C_{min}$ can be dispensed with, and the normalized autocorrelation function 1112 can simply be controlled to report a given number of candidates with the best goodness values, for example the 16 pitch period candidates with the largest values of $C(k)$.
In one embodiment, no threshold at all is imposed on the goodness value $C(k)$, and no voicing decision is made at this stage. Instead, the 16 pitch period candidates $k^*_1$, $k^*_2$, and so on are reported together with the corresponding goodness value $C(k^*_i)$ of each. In the presently preferred embodiment, no voicing decision is made at this stage even if all the $C(k)$ values are extremely low; the voicing decision is made in the subsequent dynamic programming step discussed below.
In the presently preferred embodiment, a variable number of pitch candidates is identified by a peak-finding algorithm. The "goodness" value $C(k)$ is tracked as a function of the candidate pitch period $k$, and each local maximum is identified as a possible peak.
However, a peak at an identified local maximum is not confirmed until the function has subsequently dropped by a constant amount. The confirmed local maximum then provides one of the pitch period candidates. After a peak candidate has been identified in this way, the algorithm looks for a valley: each local minimum is identified as a possible valley, but is not confirmed as a valley until the function has subsequently risen by a predetermined constant amount. Valleys are not reported separately, but a confirmed valley is required after a confirmed peak before a new peak can be identified. In the presently preferred embodiment, where the goodness values are bounded by +1 and -1, the constant required to confirm a peak or a valley is set to 0.2, although this can be varied widely. This stage therefore outputs a variable number of pitch candidates, from zero to 15.
In the presently preferred embodiment, the set of pitch period candidates provided by the preceding steps is then passed to a dynamic programming algorithm. This algorithm tracks both pitch and voicing decisions to provide, for each frame, a pitch and voicing decision that is optimal in the context of its neighbors.
Given the pitch candidates and their goodness values $C(k)$, dynamic programming is used to obtain an optimal pitch contour that includes an optimal voicing decision for each frame. Dynamic programming requires several frames of a speech segment to be analyzed before the pitch and voicing of the first frame of the segment can be decided. At each frame of the speech segment, every pitch candidate is compared with the pitch candidates retained from the previous frame.
Every pitch candidate retained from the previous frame carries a cumulative penalty, and every comparison between a new pitch candidate and a retained pitch candidate yields a new distance measure. Thus, for each pitch candidate in the new frame there is a smallest penalty, which represents the best match with one of the retained pitch candidates of the previous frame. Once the smallest cumulative penalty has been calculated for a new candidate, the candidate is retained together with its cumulative penalty and a back pointer to its best match in the previous frame. The back pointers thus define a trajectory whose cumulative penalty is the cumulative penalty value listed for its last frame. The optimal trajectory for any given frame is obtained by choosing the trajectory with the smallest cumulative penalty. The unvoiced state is defined as a pitch candidate at each frame. The penalty function preferably includes voicing information, so that the voicing decision is a natural outcome of the dynamic programming strategy.
In the presently preferred embodiment, the dynamic programming strategy is 16 wide and 6 deep. That is, 15 candidates (or fewer) plus the "unvoiced" decision (stated for convenience as a zero pitch period) are identified as possible pitch periods at each frame, and all 16 candidates, together with their goodness values, are retained for the 6 previous frames.
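The bookkeeping of cumulative penalties and back pointers can be sketched as follows. This is a minimal sketch, not the implementation of the preferred embodiment: the function name is hypothetical, `transition_cost` is a hypothetical callable standing in for whatever frame-to-frame error is used, the cumulative penalties of the oldest frame are assumed to start at zero, and the whole window is recomputed on each call instead of being updated incrementally.

```python
def decide_oldest_frame(frames, transition_cost):
    """frames: candidate pitch periods per frame, oldest first (for example
    6 frames of up to 16 periods each, with 0 standing for "unvoiced").
    Every frame must hold at least one candidate.
    transition_cost(prev_period, cur_period, frame_index) -> float (hypothetical).
    Returns the pitch period selected for the oldest frame, frames[0]."""
    # cum[i][c]: smallest cumulative penalty of any trajectory ending at
    # candidate c of frame i; back[i][c]: index of its best match in frame i - 1.
    cum = [[0.0] * len(frames[0])]
    back = [[None] * len(frames[0])]
    for i in range(1, len(frames)):
        row_cost, row_back = [], []
        for cur in frames[i]:
            costs = [
                cum[i - 1][j] + transition_cost(prev, cur, i)
                for j, prev in enumerate(frames[i - 1])
            ]
            best_j = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[best_j])
            row_back.append(best_j)
        cum.append(row_cost)
        back.append(row_back)
    # Pick the cheapest trajectory at the newest frame, then follow the back
    # pointers to the oldest frame, the only one whose decision is made final.
    idx = min(range(len(frames[-1])), key=cum[-1].__getitem__)
    for i in range(len(frames) - 1, 0, -1):
        idx = back[i][idx]
    return frames[0][idx]
```

Calling this once per new frame with the six most recent candidate lists reproduces the "16 wide and 6 deep" arrangement; a streaming version would keep the cumulative penalties between calls rather than rebuilding them.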
The pitch and voicing decisions are made final only for the oldest frame held in the dynamic programming algorithm. That is, the pitch and voicing decision accepts the pitch candidate at frame $F_{K-5}$ whose current trajectory cost is minimal. Of the 16 (or fewer) trajectories ending at the most recent frame $F_K$, the candidate in frame $F_K$ with the lowest cumulative trajectory cost identifies the optimal trajectory. This optimal trajectory is then followed back and used to make the pitch/voicing decision for frame $F_{K-5}$. Note that no final decision is made for the pitch candidates in the subsequent frames ($F_{K-4}$ and so on), because the optimal trajectory may no longer be optimal once more frames have been evaluated. Of course, as those familiar with numerical optimization know, the final decision in such a dynamic programming algorithm can alternatively be made at other times, for example at the next-to-last frame held in the buffer. The width and depth of the buffer can also be varied widely. For example, as many as 64 pitch candidates could be evaluated, or as few as two; the buffer could retain as few as one previous frame, or 16 or more; and those skilled in the art can make other modifications and variations.
The dynamic programming algorithm is defined by the transition error between a pitch period candidate in one frame and a pitch period candidate in the following frame.
In the presently preferred embodiment, this transition error is defined as the sum of three parts: an error $E_P$ due to pitch deviation, an error $E_S$ due to pitch candidates with a low "goodness" value, and an error $E_T$ due to the voicing transition.
The pitch deviation error $E_P$ is a function of the current pitch period and the previous pitch period. If both frames are voiced, it is given by:
$$E_P = \min\left\{ A_D + B_P\left|\ln\frac{\tau}{\tau_p}\right|,\; A_D + B_P\left|\ln\frac{\tau}{\tau_p} + \ln 2\right|,\; A_D + B_P\left|\ln\frac{\tau}{\tau_p} + \ln\frac{1}{2}\right| \right\}$$
otherwise $E_P = B_P \cdot D_N$. Here $\tau$ is the candidate pitch period of the current frame, $\tau_p$ is a retained pitch period of the previous frame with respect to which the transition error is computed, and $B_P$, $A_D$, and $D_N$ are constants. Note that the minimum function includes provision for pitch period doubling and pitch period halving. This provision is not strictly necessary in the present invention, but it is believed to be advantageous. Similar provision could optionally be included for pitch period tripling, and so on.
The voicing state error $E_S$ is a function of the "goodness" value $C(k)$ of the current-frame pitch candidate under consideration. For the unvoiced candidate, which is always included among the 16 or fewer pitch period candidates considered for each frame, the goodness value $C(k)$ is set equal to the maximum of $C(k)$ over all the other 15 pitch period candidates in the same frame.
The voicing state error $E_S$ is given by $E_S = B_S (R_V - C(\tau))$ if the current candidate is voiced, and by $E_S = B_S (C(\tau) - R_U)$ otherwise, where $C(\tau)$ is the "goodness" value of the current pitch candidate $\tau$, and $B_S$, $R_V$, and $R_U$ are constants.
The voicing transition error $E_T$ is defined in terms of a spectral difference measure $T$, which indicates, for each frame, roughly how different its spectrum is from that of the preceding frame. Many definitions could be used for such a measure; in the presently preferred embodiment it is defined as follows:
$$T = \left(\log\frac{E}{E_p}\right)^2 + \sum_N \left(L(N) - L_p(N)\right)^2$$
where $E$ is the RMS energy of the current frame, $E_p$ is the energy of the previous frame, $L(N)$ is the $N$th log area ratio of the current frame, and $L_p(N)$ is the $N$th log area ratio of the previous frame. The log area ratio $L(N)$ is calculated directly from the $N$th reflection coefficient $k_N$ as follows:
$$L(N) = \ln\frac{1 - k_N}{1 + k_N}$$
The voicing transition error $E_T$ is then defined as a function of the spectral difference measure $T$: if the current and previous frames are both unvoiced, or both voiced, $E_T$ is set to 0; otherwise $E_T = G_T + A_T/T$, where $T$ is the spectral difference measure of the current frame. The definition of the voicing transition error can also be varied widely.
The key feature of the voicing transition error as defined here is that, whenever the voicing state changes (voiced to unvoiced, or unvoiced to voiced), a penalty is assessed that is a decreasing function of the spectral difference between the two frames. That is, a change in voicing state is disfavored unless a significant spectral change occurs at the same time.
Such a definition of the voicing transition error provides significant advantages in the present invention, because it reduces the processing time needed to provide good voicing state decisions.
The other errors $E_S$ and $E_P$ that make up the transition error in the presently preferred embodiment can also be defined in various ways. The voicing state error can be defined in any fashion that generally favors pitch period hypotheses that fit the data in the current frame well over those that fit it less well. Similarly, the pitch deviation error $E_P$ can be defined in any fashion that corresponds generally to changes in the pitch period. The pitch deviation error need not include provision for doubling and halving, although such provision is desirable.
A further optional feature of the invention is that, when the pitch deviation error includes provisions to track pitch across doublings and halvings, it may be desirable, once the optimal trajectory has been identified, to double (or halve) the pitch period values along it to make them as consistent as possible.
It should also be noted that not all three of the identified components of the transition error need be used.
For example, the voicing state error could be omitted if some previous stage screened out pitch hypotheses with a low "goodness" value, or if the pitch periods were rank-ordered by "goodness" value so that pitch periods with a higher goodness value would be preferred. Similarly, other components can be included in the transition error definition as desired.

It should also be noted that the dynamic programming method taught by the present invention need not be applied to pitch period candidates extracted from an adaptively filtered residual signal, nor even to pitch period candidates derived from the LPC residual signal at all. It can be applied to any set of pitch period candidates, including pitch period candidates extracted directly from the original input speech signal.

These three errors are then summed to provide the total error between a given pitch candidate in the current frame and a given pitch candidate in the previous frame. As noted above, these transition errors are then summed cumulatively to provide the cumulative penalty for each trajectory in the dynamic programming algorithm.

This dynamic programming method for finding both pitch and voicing simultaneously is itself novel, and need not be used only in combination with the presently preferred method of finding pitch period candidates. Any method of finding pitch period candidates can be used in combination with this novel dynamic programming algorithm. Whatever method is used to find the pitch period candidates, the candidates are simply provided as input to the dynamic programming algorithm.
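As a rough illustration of the dynamic programming search described above, the following Python sketch accumulates the three error terms along each trajectory of pitch period candidates (rendered "tone period" in parts of this translation) and backtracks the cheapest one. Only the form of the voicing transition error, E_T = G_T + A_T / T, comes from the text. The `Candidate` class, the particular voicing state and pitch deviation errors, and the default constants are assumptions chosen for illustration, not the definitions used in the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical container for one pitch/voicing hypothesis in a frame."""
    pitch: Optional[float]  # pitch period in samples; None means unvoiced
    goodness: float         # 0..1, how well the hypothesis fits the frame


def transition_error(prev: Candidate, cur: Candidate, spectral_diff: float,
                     g_t: float = 1.0, a_t: float = 1.0) -> float:
    """E_T: zero when the voicing state is unchanged, else G_T + A_T / T."""
    if (prev.pitch is None) == (cur.pitch is None):
        return 0.0
    return g_t + a_t / max(spectral_diff, 1e-6)


def pitch_deviation_error(prev: Candidate, cur: Candidate) -> float:
    """E_P (assumed form): absolute log ratio of successive pitch periods."""
    if prev.pitch is None or cur.pitch is None:
        return 0.0
    return abs(math.log(cur.pitch / prev.pitch))


def voicing_state_error(cur: Candidate) -> float:
    """E_S (assumed form): penalise hypotheses that fit the frame poorly."""
    return 1.0 - cur.goodness


def track(frames: List[List[Candidate]],
          spectral_diffs: List[float]) -> List[Candidate]:
    """Return the candidate sequence with the lowest cumulative error.

    spectral_diffs[i] is the measure T between frame i and frame i - 1
    (the entry for frame 0 is unused).
    """
    costs = [[voicing_state_error(c) for c in frames[0]]]
    back = []  # back[i - 1][j]: best predecessor of candidate j in frame i
    for i in range(1, len(frames)):
        row, ptr = [], []
        for cur in frames[i]:
            totals = [
                costs[-1][k]
                + transition_error(prev, cur, spectral_diffs[i])
                + pitch_deviation_error(prev, cur)
                for k, prev in enumerate(frames[i - 1])
            ]
            best = min(range(len(totals)), key=totals.__getitem__)
            row.append(totals[best] + voicing_state_error(cur))
            ptr.append(best)
        costs.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(costs[-1])), key=costs[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [frames[i][j] for i, j in enumerate(path)]
```

Because the candidates are simply passed in as lists, the same search works regardless of how the candidates were extracted, which is the point the text makes about combining the algorithm with any candidate-finding method.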
The presently preferred embodiment of the invention uses a minicomputer and high-precision sampling, so this system is not economical for large-volume applications. The preferred mode of practicing the invention in the future is therefore expected to be an embodiment using a microcomputer-based system, such as the TI Professional Computer. This professional computer, when equipped with a microphone, a loudspeaker, and a speech processing board including a TMS 320 numerical processing microprocessor and data converters, is sufficient hardware to practice the present invention.

Voice Authentication for Data Access

FIG. 21 illustrates an embodiment of the present invention that identifies a user through voice verification to allow the user to access data on a network. When a user requests access to data, such as a website, the user is prompted for a voice sample in operation 2100. In operation 2102, the voice sample from the user is received over the network. In operation 2104, registration information about the user is retrieved. It should be noted that the information may be retrieved from a local storage device or over the network. The registration information includes a voice scan of the user's voice. In operation 2106, the voice sample is compared with the voice scan in the registration information to verify the user's identity. Operation 2106 is discussed in more detail below. If the identity of the user is verified in operation 2106, access to the data is granted to the user in operation 2108. If the identity of the user is not verified in operation 2106, access to the data is denied in operation 2110.
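The flow of operations 2104 to 2110 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: voice samples and voice scans are represented as plain feature vectors, the comparison is a toy cosine similarity, and the names `request_access`, `registrations`, and `threshold` are hypothetical. The text does not specify how the comparison in operation 2106 is computed.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy comparison of two equal-length feature vectors (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def request_access(user_id: str,
                   voice_sample: Sequence[float],
                   registrations: Mapping[str, Sequence[float]],
                   threshold: float = 0.8) -> bool:
    """Return True to grant access (operation 2108) or False to deny (2110)."""
    # Operation 2104: retrieve the registration information (the voice scan).
    voice_scan = registrations.get(user_id)
    if voice_scan is None:
        return False  # no registration on file, so nothing to compare against
    # Operation 2106: compare the received sample with the stored voice scan.
    return cosine_similarity(voice_sample, voice_scan) >= threshold


if __name__ == "__main__":
    registrations = {"alice": [1.0, 0.0, 1.0]}  # hypothetical stored voice scan
    print(request_access("alice", [0.9, 0.1, 1.1], registrations))
    print(request_access("alice", [0.0, 1.0, 0.0], registrations))
```

The `registrations` mapping stands in for either a local store or a lookup over the network, since the text allows both.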
This embodiment is particularly useful in the field of electronic commerce, because it can eliminate the need for authentication certificates and for the third parties that issue them. A more detailed description of processes and apparatus for performing these operations is given below, with particular reference to FIGS. 22-27 and 29-34.

In one embodiment of the present invention, the user's voice is recorded to create the voice scan, which is then stored. This may form part of a registration process. For example, when prompted to provide a voice sample during registration, the user could speak into a microphone connected to his or her computer. The resulting voice data would be sent over a network, such as the Internet, to a website, where it would be stored for later retrieval during a verification process. Then, when the user wants to access the website, or a particular portion of it, the user is prompted for a voice sample, which is received and compared with the voice data stored at the website. As an option, the voice scan could include a password of the user.

Preferably, the voice scan includes more than one phrase spoken by the user, for added security. In such an embodiment, for example, multiple passwords could be stored as part of the voice scan, and the user would be required to give a voice sample of all of the passwords. Alternatively, different phrases could be required for different levels of access or for different portions of the data. Different phrases could also be used as navigation controls.
For example, phrases could be associated with particular pages of the website: the user is prompted for a password, and the page associated with the received password is then displayed.

Allowing the voice scan to include more than one phrase also permits identity verification by comparing alternate phrases, for example by prompting the user to speak an additional phrase if the user's identity is not verified with the first phrase. If the user's voice sample almost matches the voice scan, but the discrepancy between the two is above a predetermined threshold, the user can be asked to speak another phrase, which is then used to verify the user's identity. This gives the user more than one opportunity to access the data, and is particularly useful when the user's voice has changed slightly because of an illness such as a cold. Optionally, the user's voice sample and/or the time and date at which the voice sample was received may be recorded.

With reference to operation 2106 of FIG. 21, an exemplary embodiment of the present invention is a system and method for establishing a positive or negative identity of a speaker that employs at least two different voice authentication devices and can be used to supervise controlled access to a security system. In particular, the present invention can be used to provide voice authentication with exceptionally low false acceptance and false rejection rates.

As used herein, the term "security system" refers to any website, system, device, etc., that allows access or use only by authorized individuals, who must be positively authenticated or identified each time they seek to access or use the system or device.
The principles and operation of a system and method for voice authentication according to the present invention may be better understood with reference to the drawings and the accompanying descriptions.

Referring now to the drawings, FIG. 22 illustrates the basic concept of a voice authentication system used for controlling access to a security system. A speaker 2220 communicates, either simultaneously or sequentially, with a security system 2222 and a security center 2224. The voice of speaker 2220 is analyzed for authentication by security center 2224. If security center 2224 establishes a positive authentication, it sends a communication command to security system 2222, a positive identification (ID) of speaker 2220 is established, as indicated by 2226, and speaker 2220 is allowed to access security system 2222.

The prior-art system of FIG. 22 employs a single voice authentication algorithm. Such a system therefore suffers from the tradeoff between false acceptance and false rejection rates, resulting in a false acceptance rate and/or a false rejection rate that is too high, which renders the system insecure and/or inefficient, respectively.

The present invention is a system and method for establishing the identity of a speaker using at least two different voice authentication algorithms.
Selecting voice authentication algorithms that differ significantly from each other (for example, text-dependent and text-independent algorithms) ensures that the algorithms are not fully correlated statistically with respect to false acceptance and false rejection events, i.e., r < 1.0, where "r" is the statistical correlation coefficient.

Assume that two different voice authentication algorithms are completely decorrelated (r = 0) and that the false rejection threshold of each algorithm is set to a low value, say 0.5%. Then, according to the tradeoff rule, and as predicted by FIG. 1 of J. Gauvain, L. Lamel and B. Prouts (March 1995), LIMSI 1995 scientific report, the false acceptance rate of each algorithm is expected to be very high, on the order of 8% in this case.

However, if a positive identity is established only when both algorithms positively authenticate the speaker, the combined false acceptance rate is expected to be (8%)^2, or about 0.6%, whereas the combined false rejection rate is expected to be 0.5% × 2, or 1%.

As the degree of correlation between the two algorithms increases, the expected combined false acceptance rate increases and the expected combined false rejection rate decreases. At full correlation (r = 1.0), the combined values in this example revert to 0.5% and 8%.

Note that the best EER value characterizing the algorithm employed by B. Prouts is 3.5%.
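The arithmetic in the worked example above can be checked with a short Python sketch. It assumes, as the text does for this example, that the two algorithms are fully independent (r = 0) and that the speaker is accepted only when both algorithms accept. The function name `joint_error_rates` is hypothetical. The exact joint false rejection rate is 1 - (1 - FR1)(1 - FR2); the text's "0.5% × 2" is the usual small-rate approximation of that expression.

```python
from typing import Tuple


def joint_error_rates(fa1: float, fr1: float,
                      fa2: float, fr2: float) -> Tuple[float, float]:
    """Joint false acceptance and false rejection for an AND rule, r = 0.

    An impostor is accepted only if both algorithms accept, so the false
    acceptance rates multiply. A genuine speaker is rejected if either
    algorithm rejects, so the joint false rejection rate is the complement
    of both algorithms accepting.
    """
    joint_fa = fa1 * fa2
    joint_fr = 1.0 - (1.0 - fr1) * (1.0 - fr2)
    return joint_fa, joint_fr


if __name__ == "__main__":
    fa, fr = joint_error_rates(0.08, 0.005, 0.08, 0.005)
    print(f"joint false acceptance: {fa:.2%}")
    print(f"joint false rejection:  {fr:.2%}")
```

With the figures from the example (8% false acceptance and 0.5% false rejection per algorithm) this gives 0.64% and about 1.00%. The same function applied to the extrapolated figures quoted next in the text (4.6% and 0.3%) gives roughly 0.2% and 0.6%.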
Extrapolating the plots of B. Prouts to represent an algorithm with an EER value of 2% (the present state of the art), one may choose to set the false rejection rate at 0.3%, in which case the false acceptance rate falls at about 4.6%, giving a combined false acceptance rate of 0.2% and a combined false rejection rate of 0.6%.

Thus, the term "different algorithms", as used in this specification and in the claims section below, refers to algorithms having a correlation of r < 1.0.

Referring now to FIG. 23, there is shown a system for establishing the identity of a speaker according to the present invention, referred to below as system 2350. System 2350 includes a computerized system 2352, which in turn includes at least two voice authentication algorithms 2354; two are shown, marked 2354a and 2354b.
Algorithms 2354 are selected to be different from each other, and each serves to analyze the speaker's voice independently and to obtain an independent positive or negative authentication. If every one of algorithms 2354 provides a positive authentication, the speaker is positively identified; if at least one of algorithms 2354 provides a negative authentication, the speaker is negatively identified (that is, identified as an impostor).

Both text-dependent and text-independent voice authentication algorithms may be employed. Examples include feature extraction followed by pattern matching, as described in U.S. Pat. No. 5,666,466; neural network voice authentication, as described in U.S. Pat. No. 5,461,697; the Dynamic Time Warping (DTW) algorithm, as described in U.S. Pat. No. 5,625,747; the Hidden Markov Model (HMM) algorithm, as described in U.S. Pat. No. 5,526,465; and the vector quantization (VQ) algorithm, as described in U.S. Pat. No. 5,640,490. All of the patents cited above are incorporated herein by reference.

According to a preferred embodiment of the present invention, the false rejection threshold of each of algorithms 2354 is set at or below 0.5%, preferably at or below 0.4%, more preferably below 0.3%, and most preferably at or below 0.2%, or at about 0.1%.
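The decision rule stated at the start of this passage (positive identification only when every algorithm authenticates the speaker) can be sketched as follows. This is an illustrative sketch only: the `Verifier` type, the `identify_speaker` function, and the two toy verifiers are hypothetical stand-ins for real text-dependent and text-independent algorithms, and the feature-vector representation of a voice sample is an assumption.

```python
from typing import Callable, Dict, Mapping, Sequence, Tuple

# A verifier takes a voice sample and returns its own accept/reject decision.
Verifier = Callable[[Sequence[float]], bool]


def identify_speaker(sample: Sequence[float],
                     verifiers: Mapping[str, Verifier]
                     ) -> Tuple[bool, Dict[str, bool]]:
    """Positively identify the speaker only if every verifier accepts."""
    if len(verifiers) < 2:
        raise ValueError("at least two different verifiers are required")
    # Each algorithm analyzes the sample independently of the others.
    results = {name: verify(sample) for name, verify in verifiers.items()}
    return all(results.values()), results


def toy_text_dependent(sample: Sequence[float]) -> bool:
    """Placeholder decision; a real algorithm would go here."""
    return sample[0] > 0.5


def toy_text_independent(sample: Sequence[float]) -> bool:
    """Placeholder decision; a real algorithm would go here."""
    return sum(sample) / len(sample) > 0.4


if __name__ == "__main__":
    verifiers = {"2354a": toy_text_dependent, "2354b": toy_text_independent}
    print(identify_speaker([0.9, 0.6], verifiers))
    print(identify_speaker([0.9, -0.5], verifiers))
```

Returning the per-algorithm results alongside the overall decision is a design choice made here for inspection; the text only requires the combined positive or negative identification.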
Depending on the application, the speaker's voice may be accepted by system 2352 either directly or via a remote communication mode.

Thus, according to a preferred embodiment, the speaker's voice is accepted for analysis by computerized system 2352 via a remote communication mode 2356. Remote communication mode 2356 may be, for example, wired or cellular telephone communication, computer telephone communication (e.g., over the Internet or an intranet), or radio communication. These communication modes are symbolized in FIG. 23 by a universal telephone symbol, which communicates, as indicated by the broken lines, with at least one receiver 2358 (two are shown, marked 2358a and 2358b) in computerized system 2352.

According to another preferred embodiment of the present invention, computerized system 2352 includes at least two hardware installations 2360 (two are shown, 2360a and 2360b), each of which serves to run one of voice authentication algorithms 2354. Hardware installations 2360 may be of any type, including, but not limited to, a personal computer (PC) platform or equivalent, a dedicated board in a computer, etc. The hardware installations 2360 may be remote from one another. As used herein, "remote" refers to a situation in which the installations 2360 communicate with one another via a remote communication medium.
In one application of the present invention, at least one of hardware installations 2360, say 2360a, is implemented in a security system 2362, while at least one other of hardware installations 2360, say 2360b, is implemented in a security center 2364. In a preferred embodiment, hardware installation 2360b in security center 2364 communicates with hardware installation 2360a in security system 2362, so that the positive or negative identification data of the speaker is ultimately established in security system 2362.

The term "security center", as used in this specification and in the claims section below, refers to a computer system that runs at least one voice authentication algorithm and therefore carries out part of the process of positively or negatively identifying the speaker.

According to a preferred embodiment of the invention, computerized system 2352 further includes a voice recognition algorithm 2366. Algorithm 2366 serves to recognize the verbal data spoken by the speaker (as opposed to identifying the speaker from his or her voice) and to operate security system 2362 accordingly. Algorithm 2366 further serves to positively or negatively recognize the verbal data.
Once a positive identity has been established via algorithms 2354 as described above, algorithm 2366 determines whether the verbal data is positively or negatively associated with the authenticated speaker, and only if this association is positive is the speaker granted access to security system 2362.

The verbal data spoken by the speaker may include any spoken phrase (at least one word), such as, but not limited to, a name, an identification number, or a request.

In a preferred embodiment of the invention, a single security center 2364, having one voice authentication algorithm 2354 implemented in it, communicates with a plurality of security systems 2362, each of which has a different (second) voice authentication algorithm 2354, so that a speaker, once authenticated, can choose to access any one or a subset of the security systems 2362.

Examples

Reference is now made to the following examples, which, together with the above descriptions, illustrate the invention in a non-limiting fashion.
FIGS. 24-27 illustrate a preferred embodiment of the system and method according to the present invention.

Thus, as shown in FIG. 24, using his or her voice alone, or in combination with a communication device such as, but not limited to, a computer connected to a network, a wired telephone, a cellular telephone, a computer telephone, a transmitter (e.g., a radio transmitter), or any other remote communication medium, a user, such as speaker 2420, communicates with a security center 2424 and with one or more security systems 2422, such as, but not limited to, a computer network (security system No. 1), a voice mail system (security system No. 2), and/or a bank's computer system (security system No. N).

In a preferred embodiment, the speaker uses a telephone communication mode, and all of the security systems 2422 and security center 2424 share the same telephone number or, where a radio communication mode is used, the same frequency and modulation. In any case, the user preferably communicates with security systems 2422 and security center 2424 simultaneously. In a preferred embodiment of the invention, each security system 2422 includes only a receiver for the purpose of voice verification and authentication, and has no transmitter.

FIG. 25 illustrates the next step in the process. Security center 2424 analyzes the incoming voice using (i) a prior-art voice authentication algorithm 2530 and (ii) a conventional verbal recognition algorithm 2532, which covers, for example, verbal identification of the access code of the security system 2422 (No. 1, 2, ..., or N) requested by speaker 2420 (which also forms the request), a password, and a social security number. The false rejection threshold is set to a low level, say below 0.5%, preferably below 0.3%, which keeps the false acceptance rate at about 4.6%.

After positive identification of the incoming voice has been established, security center 2424 acknowledges the speaker's identity, for example by transmitting a tone 2536. Tone 2536 is received both by speaker 2420 and by the specific security system 2422 (as determined, for example, by the system access code used by speaker 2420).

FIG. 26 shows the next step.
Security center 2424, or preferably security system 2422, authenticates the incoming voice using a second voice authentication algorithm 2638. This second algorithm differs from voice authentication algorithm 2530 used by security center 2424, as described above with respect to FIG. 25.

For example, voice authentication algorithm 2638 may be a neural network voice authentication algorithm, as described in U.S. Pat. No. 5,461,697.

Again, the false rejection threshold is set to a low level, say below 0.5%, preferably 0.3% or 0.1%. Following the above reasoning and calculations, for an algorithm with an EER value of about 2%, the false acceptance level at a false rejection level of, for example, 0.3% is about 4.6%.

In a preferred embodiment of the invention, security center 2424 and security system 2422 are physically remote from one another. Since the identification process in security center 2424 takes some preselected time interval, the simultaneous voice verification in security system 2422 is activated at t = ΔT after tone 2536 is received at security system 2422. This time delay ensures that no identification is issued before the acknowledgment from security center 2424 has been received.

As shown in FIG. 27, final speaker identification 2740 is established only when identifications 2742a and 2742b have been established by both security center 2424 and security system 2422, whereupon the speaker is granted access to security system 2422.
Thus, only if both security center 2424 and security system 2422 have established a positive voice verification is the speaker positively identified, the process positively completed, and access to security system 2422 allowed, as indicated by 2744.

If either of systems 2422 and 2424 fails to verify the speaker's voice, the process is not positively completed and access to security system 2422 is denied.

Voice-Based System for Managing Border Crossings

FIG. 28 depicts a method for determining, based on voice signals, the eligibility of a user to cross a border. First, in operation 2800, voice signals are received from a user attempting to cross a border. The voice signals are then analyzed in operation 2802 to determine whether the user meets predetermined criteria for crossing the border. Then, in operation 2804, an indication is output as to whether the user meets the predetermined criteria. A more detailed description of processes and apparatus for performing these operations follows.

In one embodiment of the present invention described in FIG. 28, the identity of the user is determined from the voice signals. This embodiment could be used to allow approved users to cross a border into another country without having to present documentary identification. In such an embodiment, the predetermined criteria may include having an identity that appears on a list of persons allowed to cross the border. For more detail on processes and apparatus for identifying a person by voice, see the section entitled "Voice Authentication for Data Access" above, as well as the methods and apparatus described above with reference to FIGS. 22-27 and below with reference to FIGS. 29-34.

The voice signals of the person are compared with a plurality of stored voice samples to determine the person's identity. Each of the voice samples is associated with the identity of a person. If the person's identity is determined from the comparison of the voice signals with the voice samples, the identity is output. In addition to the identity, the output could include a display to a border guard indicating that the person is allowed to pass. Alternatively, where a gate or turnstile blocks persons from crossing the border or otherwise hinders entry into the country, the output could be used to open the gate or turnstile.

In another embodiment of the present invention described in FIG. 28, emotion is detected in the voice signals of the person. Here, the predetermined criteria could include emotion-based criteria designed to help detect smuggling and other illegal activities, and to help catch persons holding forged documents. For example, fear and anxiety could be detected in the voice of a person as he or she answers questions asked by a customs officer. Another emotion that could be detected is the person's level of nervousness. For more detail on detecting emotion in voice signals, see the preceding sections, which explain how such an embodiment works.

FIG. 29 depicts a method of speaker recognition according to one aspect of the present invention.
Operation 2900 stores predetermined first final voice characteristic information at a first site. Voice data is then input at a second site in operation 2902. In operation 2904, the voice data is processed at the second site to generate intermediate voice characteristic information. Operation 2906 transmits the intermediate voice characteristic information from the second site to the first site. Operation 2908 further processes, at the first site, the intermediate voice characteristic information received from the second site to generate second final voice characteristic information. Operation 2910 determines at the first site whether the second final voice characteristic information completely matches the first final voice characteristic information, and generates a determination signal indicating the result.
Figure 30 illustrates a speaker recognition method according to a second aspect of the invention. Operation 3000 stores multiple sets of first final voice characteristic information and the corresponding identification information at a first site. Operation 3002 inputs voice data and one item of identification information at a second site. Operation 3004 transmits that identification information to the first site. Operation 3006 transmits to the second site the first final voice characteristic information corresponding to that identification information, together with a determination factor. The voice data is then processed at the second site in operation 3008 to generate second final voice characteristic information. Operation 3010 determines at the second site, according to the determination factor, whether the second final voice characteristic information completely matches the first final voice characteristic information, and generates a determination signal indicating the result.
According to a third aspect of the present invention, a speaker recognition system includes: a registration unit that processes voice data to generate standard voice characteristic information and stores it; a first processing unit that inputs test voice data and processes it to generate intermediate test voice characteristic information; and a second processing unit that is communicatively connected to the first processing unit to receive the intermediate test voice characteristic information and further processes it to generate test voice characteristic information, the second processing unit being connected to the registration unit to determine whether the test voice characteristic information completely matches the standard voice characteristic information.
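The split between intermediate and final processing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say which features are used, so per-frame RMS energy stands in for the intermediate information, peak normalisation stands in for the final processing, and the function names, frame length, and threshold are all hypothetical.

```python
import math

def intermediate_features(samples, frame_len=4):
    """Capture side: reduce raw samples to per-frame RMS energy.
    Assumption: RMS energy stands in for the 'intermediate' information."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in f) / frame_len) for f in frames]

def final_features(intermediate):
    """Matching side: normalise the intermediate values to unit peak.
    Assumption: normalisation stands in for the 'further processing'."""
    peak = max(intermediate, default=0.0) or 1.0
    return [v / peak for v in intermediate]

def matches(test, reference, threshold=0.1):
    """Matching side: accept when the mean absolute difference is small.
    The threshold is a hypothetical tuning value."""
    if len(test) != len(reference) or not test:
        return False
    distance = sum(abs(a - b) for a, b in zip(test, reference)) / len(test)
    return distance <= threshold

# Registration: the stored reference is produced once.
enrolled = final_features(intermediate_features([0, 1, 0, -1, 0, 2, 0, -2]))

# Test: only the compact intermediate values would cross the network.
sent_over_network = intermediate_features([0, 1, 0, -1, 0, 2, 0, -2])
decision = matches(final_features(sent_over_network), enrolled)
```

The point of the split is that only the short list of intermediate values, not the raw samples, needs to travel between the two units.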
According to a fourth aspect of the present invention, a speaker recognition system includes: a first processing unit that processes voice data to generate standard voice characteristic information and stores the standard voice characteristic information together with associated ID information; and a second processing unit that is selectively connected to the first processing unit, inputs the associated ID information and test voice data, transmits the ID information to the first processing unit, generates test voice characteristic information from the test voice data, and determines whether the standard voice characteristic information substantially matches the test voice characteristic information.
Referring now to the drawings, and in particular to FIG. 31, the basic components of speaker recognition are described. The user speaks into microphone 3101 to input his or her voice. Voice periodic sampling unit 3103 samples the voice input data at a predetermined frequency, and voice characteristic information extraction unit 3104 extracts predetermined voice characteristic information, or a final voice characteristic pattern, from each set of sampled voice data.
When the above input and extraction are performed for registration or initial processing, mode selection switch 3108 is closed to connect registration unit 3106, and the voice characteristic information is stored in speaker recognition information storage unit 3105 as the speaker's standard voice characteristic information, together with the speaker's identification information.
Referring now to FIG. 32, the information stored in speaker recognition information storage unit 3105 is illustrated. The speaker identification information includes the speaker's name, identification number, date of birth, social security number, and so on. The stored information also includes the standard voice characteristic information of the speaker corresponding to that identification information. As described above, the standard voice characteristic information is generated by voice processing units 3103 and 3104, which extract the voice characteristic pattern from the predetermined voice data that the speaker inputs during registration. The final voice characteristic information, or voice characteristic pattern, comprises a series of the voice parameters mentioned above.
Returning to FIG. 31, when the mode selection switch is closed to connect speaker recognition unit 3107, the speaker recognition process is performed. To be recognized as a registered speaker, the user must first enter his or her speaker identification information, such as a number, through identification input device 3102. Based on this identification information, registration unit 3106 selects the corresponding standard voice characteristic information, or final voice characteristic pattern, stored in speaker recognition information storage unit 3105 and transmits it to speaker recognition unit 3107.
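The storage and lookup by identification information described above might look like the following minimal sketch. The record fields follow the items the text lists for FIG. 32; the class names, method names, and sample values are hypothetical and only illustrate the idea of a store keyed by speaker ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SpeakerRecord:
    """One stored entry: identification data plus standard features."""
    speaker_id: str
    name: str
    date_of_birth: str
    standard_features: List[float]

class SpeakerStore:
    """Hypothetical stand-in for the storage unit, keyed by speaker ID."""

    def __init__(self) -> None:
        self._records: Dict[str, SpeakerRecord] = {}

    def register(self, record: SpeakerRecord) -> None:
        # Registration mode: store (or overwrite) the standard features.
        self._records[record.speaker_id] = record

    def lookup(self, speaker_id: str) -> Optional[List[float]]:
        # Recognition mode: fetch the reference selected by the entered ID.
        record = self._records.get(speaker_id)
        return record.standard_features if record else None

store = SpeakerStore()
store.register(SpeakerRecord("1001", "A. Speaker", "1970-01-01", [0.5, 1.0]))
reference = store.lookup("1001")   # features handed to the comparison step
unknown = store.lookup("9999")     # None: no such registered speaker
```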
The user also speaks predetermined words into microphone 3101 to input voice data. The input voice data is processed by voice periodic sampling unit 3103 and voice characteristic parameter extraction unit 3104 to generate test voice characteristic information. Speaker recognition unit 3107 compares the test voice characteristic information with the standard voice characteristic information selected above to determine whether the two completely match, and then generates a determination signal indicating the result of that comparison.
The above and other elements of the speaker recognition concept are implemented, according to the present invention, on a computer or telephone network. A computer-network-based speaker recognition system is assumed to have a large number of local processing units and at least one administrative processing unit. The network is also assumed to share a common database, typically located at the central administrative processing unit. In general, computer-network-based speaker recognition systems span a spectrum. At one end of the spectrum, the voice input is processed largely at the local unit; at the other end, it is processed largely at the central unit.
In other words, speaker recognition is accomplished by processing the voice input primarily at the local processing unit, at the central processing unit, or at a combination of the two, to determine whether it substantially matches the previously registered voice data for the specified speaker. However, the computer network used in the present invention is not necessarily limited to the central-to-terminal arrangement described above, and may include other systems such as distributed systems.
Referring now to FIG. 33, a specific embodiment of the speaker recognition system according to the present invention is described. Local processing units 3331-1 to 3331-n are connected to administrative central processing unit 3332 by network lines 3333-1 to 3333-n, respectively. Each of local processing units 3331-1 to 3331-n includes a microphone, a voice periodic sampling unit 3103, a voice characteristic parameter extraction unit 3104, and a speaker recognition unit 3107. Each of local processing units 3331-1 to 3331-n can input voice data and process the voice input to determine whether its characteristic pattern substantially matches the corresponding standard voice characteristic pattern. Administrative central processing unit 3332 includes a speaker recognition data management unit 3310, which performs administrative functions including registering and updating the standard voice characteristic information.
Referring now to FIG. 34, the preferred embodiment of the speaker recognition system described above is explained in further detail.
For simplicity, the additional components are described in detail for only one local processing unit, 3331-1. So that local processing unit 3331-1 can communicate with administrative processing unit 3332 over communication line 3333-1, local processing unit 3331-1 includes a first communication input/output (I/O) interface unit 3334-1. Similarly, administrative processing unit 3332 includes a second communication I/O interface unit 3435 at the other end of communication line 3333-1. The registration and recognition processes are outlined below using this preferred embodiment.
To register standard voice characteristic information, the user speaks a predetermined set of words into microphone 3101 to input voice data and enters a user identification code through ID input device 3102. Mode switch 3108 is placed in the registration mode, and the processed voice characteristic information is transmitted to registration unit 3106 through interfaces 3334-1 and 3435 and communication line 3333-1. Registration unit 3106 controls speaker recognition information storage unit 3105 so that it stores the voice characteristic information together with the speaker identification code.
To perform the speaker recognition process later, the user specifies his or her ID information through user ID input device 3102. The input information is transmitted to administrative processing unit 3332 through interfaces 3334-1 and 3435 and communication line 3333-1.
In response, administrative processing unit 3332 transmits the standard voice characteristic information corresponding to the specified user ID to speaker recognition unit 3107. The mode selection switch is set to the speaker recognition mode to connect speaker recognition unit 3107. The user also inputs his or her voice through microphone 3101; periodic sampling unit 3103 and voice characteristic information extraction unit 3104 process the voice input to generate test voice characteristic information and output it to speaker recognition unit 3107. Finally, speaker recognition unit 3107 determines whether the test voice characteristic information substantially matches the selected standard voice characteristic information. The determination is indicated by an output determination signal, which authorizes local processing unit 3331-1 to proceed with further transactions involving administrative processing unit 3332. In summary, the preferred embodiment described above processes the input voice data essentially at the local processing unit.
Voice Control and Navigation on the Internet
Figure 35 illustrates a method for recognizing voice commands in order to manipulate data on the Internet. First, in operation 3500, data is provided on a website. In operation 3502, a voice signal is received from a user accessing the website. In operation 3504, the voice signal is interpreted to determine a navigation command. In operation 3506, selected data of the website is output based on the navigation command.
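The four operations of FIG. 35 can be sketched as a small lookup pipeline. This is an illustrative sketch only: it assumes the voice signal has already been turned into text by some recognizer that is out of scope here, and the page names, command phrases, and function names are hypothetical.

```python
from typing import Optional

# Operation 3500: data provided on the website (hypothetical pages).
SITE_DATA = {
    "home": "Welcome page",
    "balance": "Account balance page",
    "mail": "Inbox page",
}

# Hypothetical mapping from spoken phrases to navigation commands.
COMMANDS = {
    "go home": "home",
    "show balance": "balance",
    "read mail": "mail",
}

def interpret(utterance: str) -> Optional[str]:
    """Operation 3504: turn the (already transcribed) utterance into a
    navigation command, or None when the phrase is not in the vocabulary."""
    return COMMANDS.get(utterance.strip().lower())

def handle_voice_request(utterance: str) -> str:
    """Operations 3502 to 3506: receive, interpret, output selected data."""
    page = interpret(utterance)
    if page is None:
        return "Command not recognised"
    return SITE_DATA[page]

reply = handle_voice_request("Show Balance")
```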
In one embodiment of the present invention, the data includes a voice-activated application. In this embodiment, the navigation command can control execution of the application. In one example application of the invention, Internet banking can be conducted by voice signal.
The user may access the website from a computer, a telephone, or both. Optionally, the selected data is output to a telephone. Such an embodiment can be used for messaging services. For example, speech-to-text technology can be used to "write" e-mail over the telephone without a display, and text-to-speech technology can be used to "read" e-mail over the telephone.
The language can be determined from the voice signal. The voice signal is then interpreted in the language spoken by the user to determine the command. This is particularly useful in an international customer service system on the Internet. Optionally, artificial intelligence can be used to interact with the user, including spoken replies and the like.
Voice-Controlled Content and Applications
FIG. 36 is a general block diagram of an information system 3610 for controlling content and applications over a network using voice signals, according to an embodiment of the present invention. Information system 3610 includes an information distribution center 3612, which receives information from one or more remote information providers 3614-1, ..., 3614-n and supplies or broadcasts this information to terminal unit 3616.
As used here, "information" includes, but is not limited to, analog video, analog audio, digital video, digital audio, text services such as news reports, sports scores, stock market quotations and weather reports, electronic messages, electronic program guides, database information, software including game programs, and wide area network data. Alternatively or in addition, information distribution center 3612 may generate information locally and supply this locally generated information to terminal unit 3616.
The information transmitted by information distribution center 3612 to terminal unit 3616 includes vocabulary data representing a vocabulary of spoken sounds or words (utterances). This vocabulary provides, for example, spoken control of a device 3618 and spoken control of access to the information transmitted by information distribution center 3612. Specifically, terminal unit 3616 receives vocabulary data from information distribution center 3612 and speech (utterance) data from the user. Terminal unit 3616 includes a processor that executes a speech recognition algorithm, comparing the vocabulary data with the spoken command data in order to recognize, for example, commands for controlling device 3618 or commands for accessing the information transmitted by information distribution center 3612. Terminal unit 3616 then generates an appropriate command to control device 3618 or to access the information transmitted by information distribution center 3612. As used here, a speech recognition algorithm is an algorithm that converts spoken audio input into text or corresponding commands. A speaker verification algorithm is an algorithm that verifies a claimed speaker identity based on a voice sample from the claimant. A speaker identification algorithm is an algorithm that identifies a speaker from a previously sampled list of candidates based on the speaker's audio input.
Speaker identification algorithms can be used, for example, to limit the ability to control the device and/or access information to particular speakers.
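The difference between verifying a claimed identity and identifying a speaker from a list of candidates can be illustrated with a short sketch. It assumes each speaker is represented by a fixed-length feature vector and uses Euclidean distance with a hypothetical threshold; the text does not prescribe a particular feature set or distance measure.

```python
import math
from typing import Dict, List, Optional

def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(sample: List[float], claimed_id: str,
           enrolled: Dict[str, List[float]], threshold: float = 0.5) -> bool:
    """Verification: one comparison against the claimed speaker only."""
    reference = enrolled.get(claimed_id)
    return reference is not None and distance(sample, reference) <= threshold

def identify(sample: List[float], enrolled: Dict[str, List[float]],
             threshold: float = 0.5) -> Optional[str]:
    """Identification: search all candidates and return the closest one,
    or None when even the closest candidate is too far away."""
    if not enrolled:
        return None
    best = min(enrolled, key=lambda sid: distance(sample, enrolled[sid]))
    return best if distance(sample, enrolled[best]) <= threshold else None

# Hypothetical enrolled speakers and a test sample.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
sample = [0.9, 0.1]
claim_ok = verify(sample, "alice", enrolled)
who = identify(sample, enrolled)
```

Restricting control of a device to particular speakers then amounts to checking the identified speaker against an allow list before acting on the command.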
The information transmitted from information distribution center 3612 to terminal unit 3616 may be, for example, phoneme data. A phoneme is the smallest unit of sound that distinguishes one utterance from another in a language or dialect. Each sound or spoken word in the vocabulary can therefore be represented by a set of phonemes. Alternatively, the vocabulary data may be template data generated by having one person or a group of people speak each sound or word. Each spoken sound or word in the vocabulary is then represented by a corresponding template. It should be noted that although the system of FIG. 36 shows the information from information providers 3614-1, ..., 3614-n and the vocabulary data being transmitted over the same communication link, the invention is not limited in this respect. The information from information providers 3614-1, ..., 3614-n and the vocabulary data may instead be transmitted over different communication links.
Many different arrangements can be used to provide speech data to terminal unit 3616.
In a first illustrative, non-limiting arrangement, a remote control is provided that includes a wireless microphone or related transducer for transmitting the sounds or words spoken by the user to terminal unit 3616 by electrical, optical, or radio-frequency signals. Terminal unit 3616 then includes a receiver, an analog front end that conditions the received signal, a codec that performs analog-to-digital conversion of the conditioned signal, and an interface circuit connected to the processor. Conditioning means modifying the electrical signal originating from a voice transducer by noise cancellation, noise reduction, filtering, and other known techniques.
In a second illustrative, non-limiting arrangement, the remote control has a microphone, an analog receiver that conditions the microphone's sound signal, a codec that performs analog-to-digital conversion of the conditioned signal, and a transmitter that sends the digitized sound data signal to terminal unit 3616 using, for example, infrared or radio signals. Terminal unit 3616 then includes a receiver for the digitized sound data signal and an interface circuit connected to the processor. The digitized sound data signal typically requires a data rate of at least 64 kbit/s.
In a third illustrative, non-limiting arrangement, the remote control has a microphone, an analog receiver that conditions the microphone's sound signal, a codec that performs analog-to-digital conversion of the conditioned signal, a digital signal processor that analyzes the digitized sound signal to extract spectral data, and a transmitter that sends the spectral data to terminal unit 3616 using, for example, infrared signals. Terminal unit 3616 then includes a receiver for the spectral data and an interface circuit connected to the processor. Because spectral data is transmitted in the third arrangement, rather than the digitized sound data of the second arrangement, the data rate is much lower, that is, below 3610 kbit/s. Because the spectral analysis is performed in the remote control, the load on the processor of terminal unit 3616 during recognition is reduced by about 30-50% compared with the second arrangement.
In a fourth illustrative arrangement, terminal unit 3616 has a microphone, an analog front end that conditions the microphone's sound signal, a codec that performs analog-to-digital conversion of the conditioned signal, and an interface circuit connected to the processor.
In a fifth illustrative arrangement, terminal unit 3616 has a microphone, an analog front end that conditions the microphone signal, a codec that performs analog-to-digital conversion of the conditioned signal, a digital signal processor that analyzes the digitized signal to extract spectral data, and an interface circuit connected to the processor bus. Compared with the fourth arrangement, the digital signal processor in the fifth arrangement serves to reduce the load on the processor of terminal unit 3616. These arrangements are only examples; other arrangements for providing speech data to terminal unit 3616 may be used within the scope of the invention.
The vocabulary data transmitted by the information distribution center can define the commands that the user speaks to control device 3618. Device 3618 may be any device capable of operating in response to commands provided by the user, and the invention is not limited in this respect. Device 3618 may therefore be, for example, a television, a stereo receiver, a video cassette recorder, a tape recorder, a CD player, a video disc player, a video game console, or a computer. As an example, assume that device 3618 is a computer plugged into a switchable power outlet on terminal unit 3616, and that the user is to turn the computer on and off by speaking the commands "power on" and "power off", respectively. Information distribution center 3612 then transmits to terminal unit 3616 phoneme or template vocabulary data defining a command vocabulary that includes the words "power on" and "power off".
When the user says "power on" or "power off" and the speech data for that command is provided to terminal unit 3616 by any of the arrangements described above, the processor of terminal unit 3616 executes the speech recognition algorithm, comparing the spoken command with the phoneme or template data representing the command vocabulary in order to recognize the spoken command. Terminal unit 3616 then controls device 3618 accordingly, that is, it switches the computer's power on or off. Because the computer is plugged into the switchable power outlet of terminal unit 3616 as described above, the switching is carried out inside terminal unit 3616. However, the invention also applies where the recognized command is transmitted to device 3618 over a communication link for execution.
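The comparison of a spoken command against the downloaded command vocabulary can be sketched as a nearest-match search over phoneme strings. This is a rough illustration, not the recognition algorithm of the source: it assumes the utterance has already been decoded into a phoneme sequence, uses edit distance as a stand-in similarity measure, and the phoneme symbols, tolerance, and class names are hypothetical.

```python
from typing import Dict, List, Optional

def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute
        prev = cur
    return prev[-1]

def recognise(observed: List[str], vocabulary: Dict[str, List[str]],
              max_distance: int = 1) -> Optional[str]:
    """Return the closest vocabulary command, or None if nothing is close."""
    best = min(vocabulary, key=lambda w: edit_distance(observed, vocabulary[w]))
    return best if edit_distance(observed, vocabulary[best]) <= max_distance else None

class SwitchedOutlet:
    """Hypothetical stand-in for the switchable power outlet."""
    def __init__(self) -> None:
        self.powered = False

    def apply(self, command: Optional[str]) -> None:
        if command == "power on":
            self.powered = True
        elif command == "power off":
            self.powered = False

# Hypothetical phoneme transcriptions sent as vocabulary data.
VOCABULARY = {
    "power on": ["P", "AW", "ER", "AA", "N"],
    "power off": ["P", "AW", "ER", "AO", "F"],
}

outlet = SwitchedOutlet()
outlet.apply(recognise(["P", "AW", "ER", "AA", "M"], VOCABULARY))  # one phoneme off
```

Unrecognised utterances return None and leave the outlet state unchanged, which keeps stray speech from toggling the device.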
I I n ϋ .ϊ、 a— ϋ ϋ -ϋ ϋ ^1 I I 經濟部智慧財產局員工消費合作社印製 經濟部智慧財產局員工消費合作社印製 548631 A7 B7 五、發明說明(U0) 連結而執行的情形。舉行而言,這類通訊連結可以是網際 網路、紅外線連結、RF連結、同軸電纜、電話網路、衛 星系統或光纖,而且本發明並不受限於這些型態。 詞彙資料也可以選擇或另外定義使用者存取資訊分配中心 3612所傳輸的資訊時所說的單字和指令。這項特徵允許 使用者執行以功能表方式操作的使用者介面難以執行的作 業。例如,可以使用這項特徵,利用"搜尋鍵”指令對資訊 分配中心3612傳來的新聞報導標題進行關鍵字搜尋。特 別是資訊分配中心3612會判斷哪些個別單字是作爲關鍵 字,並且產生將這些關鍵字對映至音素或樣版的音素或模 版「字典」。資訊分配中心3612將新聞報導和字典傳輸至 終端機單元3616,終端機單元再將這些資料儲存在記憶 體裡。對於每個關鍵字,終端機單元3616會利用字典產 生對應的音素或模版字串。字串再以語音辨識演算法「登 錄」爲單一的可辨識發音,也就是會變成語音辨識演算系 統詞彙的基本部分。登錄包括爲音素或樣版字串指定識別 碼(identifier),可以是數値或關鍵字本身。當使用者接著 說出”搜尋關鍵字’’指令時,和終端機單元3616聯結的顯 示器或連接至終端機單元3616的電腦上就會顯示此指令 的專用畫面。使用者可以再說出”只有關鍵字”指令,將終 端機單元3616搜尋資訊分配中心3612傳來的新聞報導的 範圍限定在標題中有所說的關鍵字的新聞。接著使用者可 以再說出其他的關鍵字,指示更詳細的搜尋,或者檢視標II n ϋ .ϊ, a— ϋ ϋ -ϋ ϋ ^ 1 II Printed by the Employees 'Cooperatives of the Intellectual Property Bureau of the Ministry of Economic Affairs Printed by the Employees' Cooperatives of the Intellectual Property Bureau of the Ministry of Economic Affairs Printed by 548631 A7 B7 V. Implementation of the invention (U0) situation. For communication purposes, this type of communication link may be the Internet, infrared link, RF link, coaxial cable, telephone network, satellite system, or optical fiber, and the present invention is not limited to these types. The vocabulary data may also select or otherwise define the words and instructions that the user said when accessing the information transmitted by the information distribution center 3612. This feature allows users to perform tasks that are difficult to perform with a user interface that operates in a menu mode. For example, you can use this feature to use the "Search Key" command to perform a keyword search on news report titles from the information distribution center 3612. In particular, the information distribution center 3612 will determine which individual words are keywords and generate These keywords map to the phoneme or template phoneme or template "dictionary." 
The information distribution center 3612 transmits the news reports and the dictionary to the terminal unit 3616, and the terminal unit stores the data in the memory. For each keyword, the terminal unit 3616 uses the dictionary to generate a corresponding phoneme or template string. The string then uses the speech recognition algorithm "register" as a single recognizable pronunciation, that is, it will become a basic part of the speech recognition algorithm vocabulary. Registration includes assigning identifiers to phonemes or pattern strings, which can be numbers or keywords themselves. When the user then says the "search keyword" command, the display connected to the terminal unit 3616 or the computer connected to the terminal unit 3616 will display a dedicated screen for this command. The user can say "only the key" "Word" command to limit the scope of the news reports from the terminal unit 3616 search information distribution center 3612 to the news with the keyword in the title. Then the user can say other keywords to indicate a more detailed search , Or view the target
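The keyword registration and title search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the phoneme symbols, the sample titles, and the names `Recognizer`, `register_keywords`, and `search_titles` are all hypothetical, and an exact dictionary lookup stands in for real acoustic matching.

```python
from dataclasses import dataclass, field

# Hypothetical dictionary from the distribution center: keyword -> phoneme string.
# The phoneme symbols are illustrative, not taken from the source.
DICTIONARY = {
    "storm": ("S", "T", "AO", "R", "M"),
    "budget": ("B", "AH", "JH", "AH", "T"),
    "election": ("IH", "L", "EH", "K", "SH", "AH", "N"),
}

# Hypothetical news titles stored in terminal-unit memory.
TITLES = [
    "Storm closes coastal roads",
    "Election budget debate continues",
    "Council approves budget",
]


@dataclass
class Recognizer:
    """Stand-in for the recognizer's vocabulary: phoneme string -> identifier."""
    vocabulary: dict = field(default_factory=dict)

    def register(self, phonemes, identifier):
        # Registration makes the whole string one recognizable utterance.
        self.vocabulary[tuple(phonemes)] = identifier

    def recognize(self, phonemes):
        # Exact lookup replaces real acoustic matching in this sketch.
        return self.vocabulary.get(tuple(phonemes))


def register_keywords(recognizer, dictionary):
    """Register every keyword, using the keyword itself as the identifier."""
    for keyword, phonemes in dictionary.items():
        recognizer.register(phonemes, keyword)


def search_titles(titles, keywords):
    """Return the titles that contain every spoken keyword."""
    wanted = {k.lower() for k in keywords}
    return [t for t in titles if wanted <= set(t.lower().split())]


recognizer = Recognizer()
register_keywords(recognizer, DICTIONARY)
spoken = recognizer.recognize(DICTIONARY["budget"])
print(search_titles(TITLES, [spoken]))
```

Passing more than one keyword to `search_titles` narrows the result, which corresponds to the user speaking additional keywords to refine the search.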
Clearly, performing such tasks with a conventional menu-driven user interface would be very difficult.
FIGS. 37A, 37B, and 37C are block diagrams of a subscriber television system in which the present invention is implemented. The present invention may of course be applied to information systems other than a subscriber television system, and its application is not limited in this respect. The subscriber television system provides information to a plurality of subscriber locations, for example 3720-1, ..., 3720-n (see FIG. 37C). The information may include, but is not limited to, analog video, analog audio, digital video, digital audio, text services such as news articles, sports scores, stock quotes, and weather reports, electronic messages, electronic program guides, database information, software including game programs, and wide area network data.
Referring to FIG. 37A, the subscriber television system includes a plurality of information providers 3714-1, ..., 3714-n, each of which may supply one or more of the information types identified above. For example, the information provider 3714-2 includes an information source 3715 that provides an analog television signal to a transmitter 3718. The transmitter 3718 is coupled to an Internet uplink 3721, which transmits the analog television signal 3722-2. The information providers 3714-1 and 3714-3 each supply digital information from an information source 3715 to a respective encoder 3716, which generates an encoded data stream for transmission. The information sources 3715 of the information providers 3714-1 and 3714-3 may be memories for storing information, such as optical memories. If either of the information providers 3714-1 and 3714-3 supplies a variety of information, for example many different game programs, different types of text services, or a number of digital television or audio programs, the encoder 3716 may multiplex the information to generate a multiplexed data stream for transmission. The data stream from the encoder 3716 is supplied to a transmitter 3718 and then to an Internet uplink 3721. In the example of FIG. 37A, the encoder 3716 operated by the information provider 3714-1 generates the digital data signal 3722-1, and the encoder 3716 operated by the information provider 3714-3 generates the digital data signal 3722-3. Each of the signals 3722-1, 3722-2, and 3722-3 is transmitted over the Internet 3723 to the headend installation 3725 (see FIG. 37B). It will be understood that the system of the present invention may include many information providers, and many signals may therefore be transmitted over the Internet 3723 to locations such as the headend installation 3725. Although not shown, signals may also be received at locations other than a headend installation, for example at the premises of a direct broadcast service (DBS) subscriber. In addition, although the link between the information providers and the headend installation is shown as a network link, the invention is not limited in this respect. The link may be, for example, a coaxial cable, a telephone network, a satellite system, the Internet, a radio frequency (RF) link, an optical fiber, or any combination of these media. Furthermore, although the information providers in FIG. 37A are remote from the headend installation 3725, one or more information providers may be physically located at the same site as the headend installation 3725.
Referring to FIG. 37B, a downlink 3724 at the headend installation 3725 provides the received signals 3722-1, 3722-2, and 3722-3. The headend installation 3725 serves as a communications hub, interfacing with the various information providers and connecting them, on a conditional basis, to the subscriber locations 3720-1, ..., 3720-n. For example, the received digital data signal 3722-1 is supplied to a receiver 3726-1 and then to a modulator 3728-1.
The modulator 3728-1 modulates the signal onto a distinct cable channel and may use any suitable modulation technique, such as quadrature partial response (QPR) modulation. The received analog television signal 3722-2 is supplied to a receiver 3726-2, then to the encoder 3730 for scrambling, and then to a modulator 3728-2, where it is modulated onto a different cable channel. As discussed in detail below, the encoder 3730 also inserts in-band data into the analog television signal 3722-2. Additional receivers, modulators, and, optionally, encoders (not shown) may similarly be provided, locally or remotely, for digital and analog information signals received from other information providers.
The received digital data signal 3722-3 is supplied to an information signal processor (ISP) 3742 so that it can be transmitted using so-called in-band or out-of-band transmission. Data streams from other information providers (not shown) may also be supplied to the ISP 3742. The ISP 3742 is responsible for receiving one or more data signals and transmitting the data to the subscriber terminal locations described below. The ISP 3742 supplies data to the encoder 3730. The ISP 3742 may also supply data to additional encoders, depending on factors such as the amount of data to be transmitted and the speed at which the data must be supplied and updated.
Data is transmitted repeatedly by the encoder 3730. If there is only one encoder and a large amount of data, the repetition rate will be relatively slow; using more than one encoder allows the repetition rate to increase.
Specifically, the encoder 3730 places data in-band for transmission to subscribers and also scrambles the associated analog television signal 3722-2. In one arrangement, the data is placed in the vertical blanking interval of the television signal, but it may be placed elsewhere in the signal, and the invention is not limited in this respect. For example, it is well known that data can be amplitude modulated onto the sound carrier. As discussed here, in-band transmission means the transmission of data within the video television channel comprising both the audio and video carriers. Thus, data from the ISP 3742 may be transmitted by amplitude modulation on the sound carrier, hereinafter called in-band audio data, or in the vertical or horizontal blanking intervals of the analog television signal, hereinafter called in-band video data. The ISP 3742 may also be arranged to supply data for transmission in unused portions of a digital data stream, such as an MPEG compressed video data stream.
The ISP 3742 may also receive and/or generate information locally. For example, the ISP 3742 may generate messages for transmission to subscribers concerning upcoming events or service interruptions or changes. Information received from an information provider may be transmitted by the ISP 3742 as received, or reformatted by the ISP 3742, and then supplied to the encoder 3730 for transmission to subscribers.
The ISP 3742 also passes information to a headend controller ("HEC") 3732. The HEC is connected to the encoder 3730 and to an out-of-band transmitter 3734. Although the HEC 3732 is described as being connected to the same encoder as the ISP 3742, it may in fact be connected to one or more different encoders. The HEC 3732 may be, for example, a Scientific-Atlanta Model 8658, which controls the transmission of data to the encoder 3730 and the out-of-band transmitter 3734. As noted above, the encoder 3730 places data in-band for transmission to subscribers and scrambles the associated television signal. The out-of-band transmitter 3734 transmits information on a separate carrier, that is, not within a channel. In one embodiment, the out-of-band carrier is at 108.2 MHz, but other out-of-band carriers may also be used. The information transmitted under the control of the HEC 3732 may be, for example, scrambling data. In one arrangement, information is inserted in each vertical blanking interval to indicate the type of scrambling used in the next video field. Scrambling systems are well known. For example, sync suppression scrambling, video inversion scrambling, and the like, or combinations of scrambling techniques, may be used. Authorization information may also be transmitted. Authorization information authorizes subscribers to receive particular channels or programs.
Information received from the ISP 3742 and/or the HEC 3732 may also be transmitted on non-scrambled channels as in-band audio or video data via data repeaters (not shown), such as the Scientific-Atlanta Model 8556-100 data repeater.
Some of the transmitted information is global, that is, it is transmitted to every subscriber. For example, scrambling data may be transmitted globally. However, the fact that every subscriber receives the scrambling data does not mean that every subscriber terminal unit can descramble a received signal. Rather, only authorized subscriber terminal units are able to descramble the received signal. Other information transmissions, by contrast, are addressed. For example, authorization information is usually addressed to individual subscribers. That is, when transmitted, the data has an address associated with it (for example, the serial number of a subscriber terminal unit). The addressed subscriber terminal unit receives the information and responds accordingly, while other subscriber terminal units ignore the data. Group-addressed data, which affects the terminal units of a whole group of subscribers, may also be used.
The outputs of the modulators 3728-1 and 3728-2, of any additional modulators, and of the out-of-band transmitter 3734 are supplied to a combiner 3736. The combiner merges the individual channels into a single wideband signal, which is then transmitted via a distribution network 3738 to the plurality of subscriber locations 3720-1, ..., 3720-n (see FIG. 37C).
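The three delivery modes described above (global, addressed, and group-addressed) can be pictured with a small filter that each terminal unit applies to incoming messages. This is only a sketch: the `Message` and `TerminalUnit` types, the field names, and the group label are hypothetical, and the source does not specify a message format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Message:
    payload: str
    serial: Optional[int] = None  # set for a message addressed to one unit
    group: Optional[str] = None   # set for a group-addressed message


@dataclass
class TerminalUnit:
    serial: int
    groups: FrozenSet[str] = frozenset()

    def accepts(self, message: Message) -> bool:
        """Decide whether this unit acts on the message or ignores it."""
        if message.serial is not None:
            return message.serial == self.serial
        if message.group is not None:
            return message.group in self.groups
        return True  # no address at all: a global message


units = [TerminalUnit(1001, frozenset({"north"})), TerminalUnit(1002)]
authorization = Message("authorize channel 7", serial=1002)
print([u.serial for u in units if u.accepts(authorization)])
```

Note that accepting a global message is separate from being authorized to descramble: in the text, every unit receives the scrambling data, but only authorized units can use it.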
The distribution network 3738 may include, for example, one or more optical transmitters 3740, one or more optical receivers 3742, and a coaxial cable 3744.
As shown in FIG. 37B, the subscriber television system may include a plurality of headend installations, each of which provides information to locations within a particular city or geographic region. A central control 3746 may be used to coordinate the operation of the various headend installations in the subscriber television system. The central control 3746 is usually associated with the central office of a multiple-service operator and may communicate with and control headend installations in many cities. The central control 3746 includes a system control computer 3748 that directs the other components of the central control 3746. One example of a system control computer 3748 is the Scientific-Atlanta System Manager 3610 network controller. The central control 3746 may, for example, provide billing services for the service provider, including billing for pay-per-view events. A billing computer 3750 stores billing data and may also format and print bills. Communication between the system control computer 3748 and the HEC 3732 may take place via modem, although the invention is not limited in this respect. Authorization data may be transmitted from the system control computer 3748 to the HEC 3732. The HEC 3732 then formats the authorization data appropriately and transmits it to the subscriber terminal units, either in-band through the encoder 3730 or out-of-band through the out-of-band data transmitter 3734, as described above.
The headend installation 3725 also includes an RF processor 3752 that receives reverse-path data communications from the subscriber locations 3720-1, ..., 3720-n. These data communications may include pay-per-view purchase information, which is forwarded to the system control computer 3748, and may also include subscriber requests for database information maintained at the headend installation 3725. For example, a database server 3754, such as an Oracle® database server, may provide subscribers with access to reference materials such as encyclopedias, atlases, dictionaries, and the like. The subscriber request is forwarded from the RF processor 3752 to an information request processor 3756, which accesses the database 3754 for the requested information and forwards it to the requesting subscriber, for example via an addressed in-band or out-of-band transaction as described above. In addition, the information request processor 3756 may access a communication network 3758 to give subscribers access to other services, such as banking services.
As the amount of data transmitted between the headend installation and the subscriber locations increases, greater use may be made of out-of-band and digital transmission. For example, 50 MHz of bandwidth may be dedicated to digital data (non-video) transmission on both the forward channel (to the subscriber terminal unit) and the reverse channel (from the subscriber terminal unit). 200 MHz or more may be allocated to digital video, and 300 MHz to 500 MHz may be allocated to analog video. Accordingly, although various illustrative transmission techniques are discussed above, the invention is not limited by the manner in which information is communicated between the headend installation and the subscriber locations.
Referring to FIG. 37C, each subscriber location 3720-1, ..., 3720-n includes a subscriber terminal unit 3760 connected to the distribution network 3738. As used here, "subscriber location" means any location remote from the headend installation 3725. According to the present invention, a subscriber terminal may be located, for example, in a home, a classroom, a hotel room, a hospital, or an office. Each subscriber terminal unit 3760 may be connected to one or more devices 3762-1, ..., 3762-n. The devices 3762-1, ..., 3762-n may include devices capable of operating in response to commands given by the user, and the invention is not limited in this respect. Thus, the devices may include televisions, stereo receivers, video cassette recorders (VCRs), audio tape recorders, CD players, video disc players, video game players, computers, and the like. Certain devices may be connected to operate together. Thus, as shown in FIG. 37C, the device 3762-1 is connected to the device 3762-2. For example, the device 3762-2 may be a television and the device 3762-1 a VCR. For purposes of illustration, it is assumed that the device 3762-1 is a VCR and the device 3762-2 is a television. One or more of the devices 3762-1, ..., 3762-n may be plugged into switchable power outlets of the subscriber terminal unit 3760, so that the subscriber terminal unit 3760 can switch these devices on and off internally. A remote control unit 3766 exchanges information with the subscriber terminal unit 3760 over a communication link 3768. The communication link 3768 may be, for example, an infrared link.
Language Translation
The system of the present invention translates languages using a lexicon and a limited set of grammatical rules. The lexicon contains linguistic units divided into four classes.
Each linguistic unit is (1) a single word, such as "dog" or "government"; or (2) a combination of words, such as "parking space" or "prime minister"; or (3) a proper name; or (4) a word with a definition unique to the present invention; or (5) one form of a word that has multiple meanings. In the last case, each definition of the word represents a different linguistic unit, and the various definitions may appear as entries in different form classes. To facilitate automated processing, each definition is distinguished, for example, by the number of periods appearing at the end of the word. The entry for the first (arbitrarily designated) definition is listed without a period, the entry for the second definition is listed with one period, and so on. Alternatively, different word senses may be distinguished by numerals, for example as subscripts.
Words unique to the present invention make up only a small part of the overall lexicon, and none of these words is specific to the invention or differs in meaning from the natural language on which it is based. Instead, invention-specific words are broadened in meaning to limit the total number of terms in the lexicon. For example, in a preferred embodiment, the word "use" is broadened to denote employing any object for its primary purpose, so that in the sentence "Jake use book" the word "use" means reading. "On" can be used to denote time (for example, "(i go-to ballgame) on yesterday"). If ease of use is desired, however, the invention-specific words can be eliminated altogether and the lexicon expanded accordingly.
The present invention divides the global lexicon of allowed terms into four classes: "things", or nominal terms that denote, for example, people, places, items, activities, or ideas, represented by T; "connectors", which specify relationships between two (or more) nominal terms (including words typically regarded as prepositions and conjunctions, as well as words that describe relationships in terms of action, being, or states of being), represented by C; "descriptors", which modify one or more nominal terms (including words typically regarded as adjectives, adverbs, and intransitive verbs), represented by D; and "logical connectors", which establish sets of nominal terms, also represented by C. The preferred logical connectors are "and" and "or".
Naturally, the lexicon cannot and does not contain a list of possible proper names; instead, proper names, like other words not recognized by the invention, are returned inside angle brackets to indicate that no translation took place. The system also does not recognize verb tenses; connectors are all expressed in the present tense, because tense is easily understood from context. Tense may nonetheless be indicated by specifying a time, a particular day, and/or a date.
Sentences according to the present invention are constructed from terms in the lexicon according to four expansion rules. The most basic sentences proceed from one of the following three structures (any of which can be built from a T term in accordance with the expansion rules discussed below).
These structures represent the smallest possible sets of words that can convey information, and they are the building blocks for constructing more complex sentences. Their simple structure makes them quick to translate into conversational natural-language sentences; with the present invention, even compound sentences can be converted into equivalent natural language through modular analysis of their more basic sentence components (a process made easier by the preferred representations discussed below).
Basic structure 1 (BS1) places a descriptive word after a noun to form the structure TD. BS1 sentences such as "dog brown" and "Bill swim" translate easily into the English sentence "the dog is brown" (or the phrase "the brown dog") and "Bill swims".
Basic structure 2 (BS2) places a connective word between two nouns to form the structure TCT. BS2 sentences such as "dog eat food" translate easily into equivalent English sentences.
Basic structure 3 (BS3) places a logical connective between two nouns to form a series represented by the structure TCT.... The series can be a single conjunction, such as "Bob and Ted", or a compound structure, such as "Bob and Ted and Al and Jill" or "red or blue or green".
One or more of the basic structures above can be expanded using the following rules:
Rule I: Add a descriptive word to a noun (T -> TD).
According to Rule I, any language unit of the noun class can be expanded into the original item plus a new item of the descriptive class that modifies it. For example, "dog" becomes "dog big". Like all rules of the present invention, Rule I is not limited to an isolated noun (although that is how BS1 sentences are formed); it can be applied to any noun, regardless of the noun's position in a longer sentence. Therefore, according to Rule I, TD1 -> (TD2)D1. For example, "dog big" becomes "(dog brown) big" (corresponding to the English sentence "the brown dog is big").
When adjectives are used in succession, the order in which they are added may or may not matter, because each one modifies T independently; for example, in "(dog big) brown", the adjective "big" has already distinguished this dog from other dogs, and "brown" can describe a feature that may not yet be known. When the D terms are intransitive verbs, the order of addition matters. For example, adding the descriptive word "fast" to the TD sentence "dog run" (corresponding to "the dog runs" or "the running dog") forms, according to Rule I, "(dog fast) run" (corresponding to "the fast dog runs"). To express "the dog runs fast", the TD sentence "dog fast" must instead be expanded with the descriptive word "run", giving "(dog run) fast".
Applying expansion Rule I to structure BS2 yields TCT -> (TD)CT. For example, "dog eat food" becomes "(dog big) eat food". Rule I can also be applied to compound nouns of the form TCT, so that a BS3 structure becomes TCT -> (TCT)D. For example, "mother and father" becomes "(mother and father) drive". In this way, multiple nouns can be combined, conjunctively or alternatively, for modification. Note also that verbs with transitive senses, such as "drive", are included in the database both as connective words and as descriptive words. Another example is the verb "capsize", which can be intransitive ("boat capsize") as well as transitive ("captain capsize boat").
Rule IIa: Add a connective word and another noun to a noun (T -> TCT).
According to Rule IIa, any language unit of the noun class can be replaced by a connective word surrounded by two nouns, one of which is the original language unit. For example, "house" becomes "house on hill". Applying expansion Rule IIa to BS1 yields TD -> (TCT)D; for example, "gloomy house" becomes "(house on hill) gloomy", or "the house on the hill is gloomy".
Rule IIa can be used to add a transitive verb and its object. For example, the compound noun "mother and father" can be expanded to "(mother and father) drive car".
Rule IIb: Add a logical connective and another noun to a noun (T -> TCT).
According to Rule IIb, any language unit of the noun class can be replaced by a connective surrounded by two nouns, one of which is the original language unit. For example, "dog" becomes "dog and cat".
Likewise, when Rules IIa and IIb are applied, the noun can itself be a compound of two or more nouns joined by a connective. For example, the expanded sentence "(john and bill) go-to market" satisfies Rule IIa. Applying Rule I next expands this sentence further to "((john and bill) go-to market) together".
Rule III: Add a logical connective and another descriptive word to a descriptive word (D -> DCD).
According to Rule III, a descriptive word can be replaced by a logical connective surrounded by two descriptive words, one of which is the original. For example, "big" becomes "big and brown". Applying expansion Rule III to BS1 yields TD -> T(DCD); for example, "dog big" (equivalent to "the dog is big" or "the big dog") becomes "dog (big and brown)" (equivalent to "the dog is big and brown" or "the big brown dog").
The manner in which these rules are applied to form acceptable sentences according to the present invention is shown in FIG. 38. Beginning with a noun such as cat, shown at 3810, any of the three basic structures can be formed according to expansion Rules I, IIa, and IIb, as shown at 3812, 3814, and 3816 respectively, to produce "cat striped" (BS1), "cat on couch" (BS2), or "cat and Sue" (BS3).
Repeated application of expansion Rule IIa at 3818 and 3820 produces structures of the form TC1 T1 -> (TC1 T1)C2 T2, or "((cat on couch) eat mouse)", and (TC1 T1)C2 T2 -> ((TC1 T1)C2 T2)C3 T3, or "(((cat on couch) eat mouse) with tail)". Expansion Rule I can be applied at any point to a T language unit, as shown at 3822 (modifying the original T, cat, to produce "(happy cat) on couch") and at 3824 (modifying "eat mouse"). Rule III can also be applied, as shown at 3826 (further modifying cat to produce "(((happy and striped) cat) on couch)") and at 3828 (further modifying "eat mouse").
Expansion Rule I can be applied repeatedly, as shown at 3812 and 3830, to further modify the original T (although, as emphasized at 3830, a descriptive word need not be an adjective). Expansion Rule IIa can be used to show action by the modified T (as shown at 3832), and Rule I can be used to modify the newly introduced T (as shown at 3834). Rule I can also be used to modify (in the broad sense of the invention) a compound subject formed by Rule IIb, as shown at 3836.
The order in which language units are combined strongly affects meaning. For example, the expansion TC1 T1 -> (TC1 T1)C2 T2 can take several forms. The construct "cat hit (ball on couch)" conveys a different meaning from "cat hit ball (on couch)". In the former, the ball is definitely on the couch; in the latter, the action takes place on the couch. The sentence "(john want car) fast" indicates that the action should be completed quickly, whereas "(john want (car fast))" indicates that the car should move fast.
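The effect of grouping on meaning can be made concrete with a small sketch. The Python below is an illustrative assumption, not the implementation of the invention: the class names `TD`, `TCT`, and `DCD` and the function `render` are hypothetical, chosen to mirror the structure codes used above. Bracket placement follows the (TC1 T1)C2 T2 form, so the second reading of the couch example prints as "(cat hit ball) on couch".

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DCD:
    """Rule III: two descriptive words joined by a logical connective."""
    left: "Desc"
    connective: str
    right: "Desc"


@dataclass(frozen=True)
class TD:
    """Rule I: a noun expression followed by a descriptive expression."""
    noun: "Expr"
    descriptor: "Desc"


@dataclass(frozen=True)
class TCT:
    """Rules IIa/IIb: two noun expressions joined by a connective."""
    left: "Expr"
    connective: str
    right: "Expr"


Desc = Union[str, DCD]
Expr = Union[str, TD, TCT]


def render(expr, top: bool = True) -> str:
    """Print an expression, bracketing every nested group."""
    if isinstance(expr, str):
        return expr
    if isinstance(expr, TD):
        parts = [render(expr.noun, False), render(expr.descriptor, False)]
    else:  # TCT or DCD: left, connective, right
        parts = [render(expr.left, False), expr.connective,
                 render(expr.right, False)]
    text = " ".join(parts)
    return text if top else f"({text})"


if __name__ == "__main__":
    # The ball is on the couch.
    print(render(TCT("cat", "hit", TCT("ball", "on", "couch"))))
    # The hitting takes place on the couch.
    print(render(TCT(TCT("cat", "hit", "ball"), "on", "couch")))
    # Rule I applied to an existing TD sentence.
    print(render(TD(TD("dog", "brown"), "big")))
    # Rule III applied to a BS1 sentence.
    print(render(TD("dog", DCD("big", "and", "brown"))))
```

Because each expansion wraps an existing expression in a new node, the two couch sentences use the same words yet form different trees, which is the distinction the bracketed notation is meant to preserve.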
The following table gives a more detailed example of the expansion rules described above, illustrating how the present invention represents natural language:
Table 8
Zairian health officials said 97 people have died from the Ebola virus so far. Jean Tamfun, a virologist, who helped identify the virus in 1976, criticized the government's quarantines and roadblocks as ineffective. On Saturday the quarantine on the Kikwit region was officially lifted.
health-official/s of zaire *say* people 97 *dead *because-of* virus named ebola
jean-tamfun be* virologist in zaire
he help* scientist/s identify* virus named ebola *in 1976
jean-tamfun criticize* government of zaire
he say* quarantine/s ineffective *and* roadblock/s ineffective
government end* quarantine of* region named kikwit *on Saturday
FIG. 39 shows a representative hardware embodiment of the present invention. As shown there, the system includes a main bidirectional bus 3900 over which all system components communicate. The main sequence of instructions implementing the invention, together with the databases discussed below, resides on a mass storage device 3902 (such as a hard disk or optical storage unit) and, during operation, in the main system memory 3904. Execution of these instructions, and thus the functions of the invention, is carried out by a central processing unit ("CPU") 3906.
The user interacts with the system using a keyboard 3910 and a position-sensing device (such as a mouse) 3912.
The output of either device can be used to designate information or to select a particular area of a screen display 3914, directing the functions to be performed by the system.
The main memory 3904 contains a group of modules that control the operation of CPU 3906 and its interaction with the other hardware components. An operating system 3920 directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of mass storage device 3902. At a higher level, an analysis module 3925, implemented as a series of stored instructions, directs execution of the primary functions performed by the invention, as discussed in detail below; and instructions defining a user interface 3930 allow direct interaction over the screen display 3914. The user interface 3930 generates words or graphical images on the display 3914 to prompt action by the user, and accepts user commands from the keyboard 3910 and/or the position-sensing device 3912.
The main memory 3904 also includes a partition defining a series of databases capable of storing the language units of the invention, denoted by reference numerals 3935₁, 3935₂, 3935₃, and 3935₄. These databases 3935 may be physically distinct (stored as separate files in different memory partitions on storage device 3902) or logically distinct (stored in a single memory partition as a structured list in which the individual databases are identified by addressing). Each contains the language units corresponding to a particular class in at least two languages. In other words, each database is organized as a table whose columns each list all the language units of that class in a single language, so that each row contains the same language unit expressed in the different languages the system is able to translate. In the illustrated embodiment, nouns are held in database 3935₁, and a representative example of the contents of that database in a single language (English), that is, the contents of one column of the multi-column working database, appears in Table 9; connective words are held in database 3935₂, an example column of which appears in Table 10; descriptive words are held in database 3935₃, an example column of which appears in Table 11; and logical connectives (most simply, "and" and "or") are held in database 3935₄.
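The table organization just described (one table per class, one row per language unit, one column per language) can be sketched as follows. This is a minimal illustration under stated assumptions, not the stored format of databases 3935: the names `DATABASES`, `translate_unit`, and `translate` are hypothetical, the French column is invented for the example (the tables in this description list English only), and sentences are treated as flat space-separated units without brackets. Units found in no table are returned in angle brackets, as described earlier for proper names and other unrecognized words.

```python
from typing import Dict, List

# Hypothetical miniature databases: one table per class, one row per
# language unit, one column per language. The "fr" column is an
# assumption added for illustration only.
DATABASES: Dict[str, List[Dict[str, str]]] = {
    "nouns": [
        {"en": "dog", "fr": "chien"},
        {"en": "food", "fr": "nourriture"},
    ],
    "connectives": [{"en": "eat", "fr": "manger"}],
    "descriptors": [{"en": "big", "fr": "grand"}],
    "logical": [{"en": "and", "fr": "et"}, {"en": "or", "fr": "ou"}],
}


def translate_unit(unit: str, source: str, target: str) -> str:
    """Find the row holding `unit` in the source column and return the
    target column of that row; unknown units come back in angle brackets."""
    for rows in DATABASES.values():
        for row in rows:
            if row.get(source) == unit:
                return row[target]
    return f"<{unit}>"


def translate(sentence: str, source: str = "en", target: str = "fr") -> str:
    """Translate a flat sentence one language unit at a time."""
    units = sentence.split()
    return " ".join(translate_unit(u, source, target) for u in units)


if __name__ == "__main__":
    print(translate("dog eat food"))   # every unit is in a table
    print(translate("bob eat food"))   # "bob" is not, so it is bracketed
```

Keeping each language in its own column means that adding a language only adds a column to every row; the lookup logic itself does not change.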
avalanche behavior brake ambassador capital baby belgium brass captain amount back bell brazil car animal backpack belt bread cardboard ankle bag benefit breakfast cargo answer baker beverage breath carpenter ant balcony bicycle brick carpet apartment ball bill bridge carrot appetite banana billiard broom cash apple bandage bird brother cat appointment bank birth brush cattle barley birthday building cauliflower apricot barn bladder bulgaria cellar april barrel blanket bullet cemetery architect basket blood bus chain argentina
bath blouse butcher chair cheek copy dinner export germany cheese corkscrew direction eye gift chemistry corn disease face girl cherry cost dish factory glass chess cotton distance fall glasses chest couch document family glove chicken country dog farm glue child courage donkey father goat chile cousin door february god chin cow drawing ferry gold china cracker dream fig goose chocolate crane dress finger government Christmas cream driver fingernail grape church crib drum finland grapefruit cigar crime duck fire grass cigarette cuba dust fish greece circle cucumber eagle fist group
citizen cup ear flea guard clock curtain earring flood guest clothing czechoslovakia earthquake floor guide cloud ecuador flour gun clove damage education flower gymnastics club dance eel flute hail coal danger egg fly hair coat date egypt food hairdresser cockroach daughter elbow foot half cocoa day electricity football hammer coffee death elevator forest hand collar debt end fork handkerchief Colombia december enemy fox color decision energy france harbor comb degree engine friday harvest comfort denmark engineer friend hat competition dentist england frog he computer departure entrance front head concert desert envelope fruit health condition dessert ethiopia funeral heart connection diarrhea europe game heel conversation dictionary excuse garden here digestion exhibition cook dining-room exit copper holland key honey kidney horse kind horse-race king hospital kitchen expense luggage lunch lung highway hole gauge holiday movie pain mushroom painting mustard pair garlic gasoline machine nail pakistan magazine nail-file
pancake hotel knee magic name panic hour knife maid nature pants house kuwait mail neck paper hungary lace malaysia necklace parachute husband ladder malta needle parents I lake man neighbor parking ice lamb map nepal part ice-cream language march netherlands partridge iceland lawyer market new-zealand passport idea lead marriage pea import leaf match newspaper peace india leather mattress nicaragua pear Indonesia lebanon may nigeria peasant information leg meat night pen ink lemon medicine noodle pencil insect letter insurance liberia interpreter library invention libya meeting noon people iran iraq license life ireland light melon north-america pepper member persia memorial north-pole peru metal norway pharmacy mexico nose Philippines middle november physician iron light-bulb milk number piano island lightning minute nurse picture israel lime mistake nut pig it linen monday oak pigeon italy lion money oar pillow january lip monkey oats pilot japan liquid month October pin jewel liver moon office pine-tree job living-room morning oil pipe joke lobster morocco olive plant
jordan lock mosquito onion platform juice look mother orange play July loom mountain ore playing-card june love mouse ox kenya luck mouth package pleasure plum room skin story tin pocket root skis stove tire poison rope sky street toast poland rubber sled student tobacco police-officer rumania smell subway today russia smoke sugar toe porter rust snake summer toilet portugal saddle snow sun tomato post-office sadness soap Sunday tomorrow postcard safety socks surprise tongue pot safety-belt soda swamp tool potato sailor soldier Sweden tooth powder salt solution Switzerland toothbrush prison sand son syria top problem Saturday property sauce purse saudi-arabia quarter queen sausage question scale rabbit scarf song sound table towel tail tailor town toy soup south-africa taste train south-america tax tree trip tea south-pole teacher trouble radio school soviet-union telephone truth television tuesday tent rag science rain raincoat rat scissors space Scotland spain screw spice tunisia spoon test turkey thailand tv-show theater typewriter razor sea receipt self spring they umbrella record-player September staircase thief uncle shape stamp thigh united-states refrigerator she star thing religion sheep starch thirst Uruguay rent shirt station thread us restaurant shoe steak throat vaccination result shoulder steel thumb vegetable rice side stick thunder velvet ring signature stock-market thursday Venezuela risk silk ticket victim river silver stomach tie view rocket sister stone tiger village roll situation store time vinegar roof size storm timetable violin voice water weight window work waiter we wheat winter year wall weather where? woman yesterday war wedding who? wood you waste Wednesday wife wool Yugoslavia watch week wind word
Table 10
Connective words
able-to call from mix shoot about called from more-than should above capsize fry move sing across capture give near smell afraid-of carry go-in need speak after catch go-through occupy steal against cause go-to of sting allow change hang on stop answer climb hate outside study arrest close have pay take arrive-at cook hear play teach ask count help prepare throw at cut hit print to bake deal-with hunt promise touch be decrease if prove translate because defeat in pull try
五、發明說明(丨VO become deliver in-front-of push turn-off before discuss in-order-to put turn-on begin down include read under behind drink increase reduce understand believe drive kill refuse until bet drop kiss remember use betray eat know repeat value between examine learn ride visit blame explain leave roast want bother find like say wash break finish live-in see while bring fix look-for sell win burn for made-of send with but for make sew work-for buy forget meet shave write --------------------訂---------線 (請先閱讀背面之注意事項再填寫本頁) 經濟部智慧財產局員工消費合作社印製 4HICKMAN/200021TW; AND1P115.TW 140 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 548631 Α7 Β7 五、發明說明(\+\) 表11 描述字 經濟部智慧財產局員工消費合作社印製 abroad clean flat long round absent clear fly malignant run again cold forbidden maybe sad agree complain foreign mean safe alive continue fragile more short all correct free much sick almost cough fresh mute similar alone crazy fun mutual sit also cry funny my sleep always curious glad nervous slow angry damp good neutral slowly another dangerous goodbye never small any dark green new smile argue dead grey next soft artificial deaf grow nice some automatic decrease guilty north sometimes 4HICKMAN/200021TW; AND1P115.TW 141 (請先閱讀背面之注意事項再填寫本頁) -If 訂---------線· 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 548631 A7 B7 五、發明說明(\o available deep hang backward defective happen bad different happy bashful difficult hard beautiful not sour now often south special okay stand 經濟部智慧財產局員工消費合作社印制衣 dirty healthy old strong begin drop heavy open sweet black drown hungry our swim blind dry illegal permitted talk blond early important pink tall blue east increase play thanks boil easy intelligent please there boring empty interesting poor thick born enough jealous portable thin brave expensive kiss possible thinl broken expire large previous tiredV. 
Description of the invention (丨 VO becomes deliver in-front-of push turn-off before discuss in-order-to put turn-on begin down include read under behind drink increase reduce understand believe drive kill refuse until bet drop kiss remember use betray eat know repeat value between examine learn ride visit blame explain leave roast want bother find like say wash break finish live-in see while bring fix look-for sell win burn for made-of send with but for make sew work-for buy forget meet shave write -------------------- Order --------- line (Please read the precautions on the back before filling in this page) Intellectual Property of the Ministry of Economic Affairs 4HICKMAN / 200021TW; AND1P115.TW 140 printed by the Bureau ’s Consumer Cooperatives. This paper size applies to Chinese National Standard (CNS) A4 (210 X 297 mm) 548631 Α7 Β7 V. Description of the invention (\ + \) Table 11 Descriptive Word Economy Printed by the Consumer Cooperatives of the Ministry of Intellectual Property Bureau, broad clean flat long round absent clear fly malignant run again cold forbidden maybe sad agree complain for eign mean safe alive continue fragile more short all correct free much sick almost cough fresh mute similar alone crazy fun mutual sit also cry funny my sleep always curious glad nervous slow angry damp good neutral slowly another dangerous goodbye never small any dark green new smile argue dead grey next soft artificial deaf grow nice some automatic decrease guilty north sometimes 4HICKMAN / 200021TW; AND1P115.TW 141 (Please read the precautions on the back before filling in this page) -If order --------- line · this paper Standards are applicable to China National Standard (CNS) A4 specifications (210 X 297 mm) 548631 A7 B7 V. 
Description of the invention (\ o available deep hang backward defective happen bad different happy bashful difficult hard beautiful not sour now often south special okay stand Ministry of Economic Affairs Intellectual Property Bureau employee consumer cooperative printed clothes dirty healthy old strong begin drop heavy open sweet black drown hungry our swim blind dry illegal permitted blond early important pink tall blue east increas e play thanks boil easy intelligent please there boring empty interesting poor thick born enough jealous portable thin brave expensive kiss possible thinl broken expire large previous tired
4HICKMAN/200021TW; AND1P115.TW 142 (請先閱讀背面之注意事項再填寫本頁) 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) A7 548631 B7 五、發明說明 brown extreme last quiet together burn far late red too-much capsize fast laugh rest transparent careful fat lazy rich travel change few left right ugly cheap first legal ripe upstairs urgent warm wet worry young wait weak white wrong your walk west why? yellow (請先閱讀背面之注音心事項再填寫本頁) 0 -丨線· 經濟部智慧財產局員工消費合作社印制衣 輸入緩衝區3940透過鍵盤3910接收使用者的輸入句子, 最好是根據本發明結構組成,而且格式遵照下面的說明。 這時候,分析模組3925會開始檢查輸入句子是否符合結 構。模組3925會以互動方式處理輸入句子的單一語言單 位,尋找資料庫以找出和指定語言中每一個語言單位對應 的項目,以及目標語言中的對應項目。分析模組3925會 翻譯句子,使用目標語言的項目取代輸入項目,將翻譯結 果送入輸出緩衝區3945,然後顯示在顯示器3914的畫面 上。 雖然主記憶體3904的模組分開討論,但也只是爲了能夠 淸楚表達;只要系統能夠執行一切必要的功能,模組在系 4HICKMAN/200021TW; AND1P115.TW 14^ 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) A7 548631 ____B7 ____ 五、發明說明(丨蚪) 統裡的分配方式和其程式設計結構並不重要。 爲了讓模組3925方便分析,輸入句子最好是以能夠方便 處理的獨特格式構成,以便於進行個別語言單位的直接識 別,以及簡單驗證確定連串的單元是合乎本發明擴充規則 的合理句子。一種應用方法(直式)裡,句子的每一個語 言單位都顯示成個別的一行。如果有套用擴充規則,會使 用星號(*)標記擴充的部分;也就是利用*將基本句子 結構連接在一起,構成更長的句子。例如,圖1中各項目 的圖面, cat striped *hit* ball red 代表步驟132和134的結果。 或者可以使用代數(橫式)格式表示句子,以括弧括住擴 充詞來識別擴充部分: (cat striped) hit (ball red) 4HICKMAN/200021TW; AND1P115.TW 144 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) ------------dr (請先閱讀背面之注意事項再填寫本頁) 訂---------線 經濟部智慧財產局員工消費合作社印製 A7 548631 B7__ 五、發明說明(H5) 不論是哪一種情況,使用者的輸入都會當成字元字串處 理,並且使用標準的字串分析常式,模組3925會識別各 個語言單位和擴充點。然後再拿這些單位和對應至允許的 擴充規則的樣版比較,驗證句子,接著就是執行資料庫查 尋和翻譯。如果句子不符合本發明的規則,模組3925會 透過顯示器3914警告使用者。 根據上述各種表現格式,英語單數名詞會在字尾加上n/s" 構成複數(例如"nation/s”)。至於其他語言,會使用最通 用的方法形成複數;例如,法語會和英語一樣加上W, 但是義大利語則會加上"/i”。數字會以數値表示。 或者,可以組態分析模組3925,使其處理未格式化的輸 入句子。模組3925在執行時會查尋資料庫3935中每一個 輸入字(或者一組字),然後根據組成字的語言類別建構 一個句子表示,也就是以語言類別符號來取代每一個單 位。然後模組3925再評估產生的類別順序是否是依照允 許的擴充規則產生,是的話就將語言單位歸類,以利查尋 和翻譯。輸出是以對應至輸入的無結構格式提供,或者以 前面提到的格式之一提供。較佳的是後者的輸出形式,因 爲只靠取代的話,一種語言的字串很難完全對應至另外一 種語言的字串;通常把語言單位隔離,強調擴充式的輸出 格式會比較容易了解。 4HICKMAN/200021TW: AND1P115.TW 145 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) --------------------訂---------"mp (請先閱讀背面之注意事項再填寫本頁) 經濟部智慧財產局員工消費合作社印製 548631 a? 
____ Β7 發明說明(丨从) 本發明可以再加入其他特性以簡化作業。例如前面提到 的’利用字尾的句點區分有多種意義的字;特定字義後面 的句點數目代表任意的選擇。因此,另一個資料庫3935 可以構成具有多種意義字詞的字典,在各種定義後設定本 發明能夠辨識的每一種字義格式。使用者介面3930會將 使用者在定義上按鍵的動作解讀爲選擇該定義,然後將字 的適當編碼輸入輸入緩衝區3940。 同樣地,由於作業經濟規模和速度的考量限制了整體資料 庫的大小,因此可以將其中一個資料庫3935設定爲同義 字典,爲無法辨識的輸入字提供最接近本發明能夠辨識的 語言單位。如果分析模組3925的作業嘗試失敗,想要找 出資料庫中的字的話,可以排定模組3925查閱同義字資 料庫3935,然後傳回語言資料庫中存在的一組字詞淸單。 模組3925也可以包含能夠識別及修正(例如在使用者認 可後)句子結構上常出現的錯誤的特定公用程式。例如, 本發明通常會使用動詞"to have1’表示指定的人的所有格; 例如’’PauPs computer is fast"這個句子會(以代數格式)表 示成’’paul have (computer fast)”或"(computer of paui) fast” ;如果未指名的話,會使用一般的所有格代名詞(例 如"(computer my) fast")。所以,可以組態模組3925 ,使 其辨識’’Paul’s”這樣的結構,並且根據本發明傳回適當的 結構。4HICKMAN / 200021TW; AND1P115.TW 142 (Please read the precautions on the back before filling this page) This paper size is applicable to China National Standard (CNS) A4 (210 X 297 mm) A7 548631 B7 V. Invention description brown extreme last quiet together burn far late red too-much capsize fast laugh rest transparent careful fat lazy rich travel change few left right ugly cheap first legal ripe upstairs urgent warm wet worry young wait weak white wrong your walk west why? yellow (please read the first (Please note the phonetic matters and fill in this page again.) 0-丨 Line · The employee's consumer clothing cooperative printed clothing input buffer 3940 of the Ministry of Economic Affairs receives the user's input sentence through the keyboard 3910. It is preferably composed according to the structure of the present invention and the format is The following description. At this time, the analysis module 3925 starts to check whether the input sentence conforms to the structure. Module 3925 will process the single-language unit of the input sentence interactively, and search the database to find the item corresponding to each language unit in the specified language and the corresponding item in the target language. 
The analysis module 3925 translates the sentence by replacing the input entries with the corresponding target-language entries, sends the translation to the output buffer 3945, and displays it on the screen of the display 3914.

Although the modules of main memory 3904 have been described separately, this is for clarity of presentation only; as long as the system performs all necessary functions, how the modules are distributed within the system and how their programs are structured is immaterial.

To facilitate analysis by module 3925, the input sentence is preferably written in a characteristic, easily processed format that allows both direct identification of the individual linguistic units and simple verification that the sequence of units is a legitimate sentence under the expansion rules of the invention. In one approach (the vertical format), each linguistic unit of a sentence appears on a separate line. If an expansion rule has been applied, an asterisk (*) marks the expansion; that is, the * joins basic sentence structures together to form longer sentences. For example, using the entries of FIG. 1,

cat striped *hit* ball red

represents the results of steps 132 and 134.

Alternatively, the sentence can be expressed in an algebraic (horizontal) format, in which expansions are identified by enclosing the expansion terms in parentheses:

(cat striped) hit (ball red)
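The relationship between the two input formats can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `parse_algebraic`, `flatten`, and `to_vertical` are hypothetical, and the grouping of words onto lines in the vertical format (one line per parenthesized group, with the joining word starred) is an assumption, since the flattened text above does not preserve the original line breaks.

```python
import re

def parse_algebraic(sentence):
    """Turn '(cat striped) hit (ball red)' into [['cat', 'striped'], 'hit', ['ball', 'red']]."""
    tokens = re.findall(r"[()]|[^\s()]+", sentence)
    stack = [[]]                      # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new expansion group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)   # attach the closed group to its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]

def flatten(node):
    """Collect the words of a (possibly nested) group in order."""
    words = []
    for item in node:
        words.extend(flatten(item) if isinstance(item, list) else [item])
    return words

def to_vertical(tree):
    """One line per top-level element; words outside any group are starred.

    Assumed layout, not confirmed by the source text.
    """
    lines = []
    for node in tree:
        if isinstance(node, list):
            lines.append(" ".join(flatten(node)))
        else:
            lines.append(f"*{node}*")
    return lines

tree = parse_algebraic("(cat striped) hit (ball red)")
print(tree)
print("\n".join(to_vertical(tree)))
```

The conversion is only meaningful for sentences whose expanded parts are parenthesized; a sentence with no groups would have every word starred, so a fuller implementation would need the expansion rules themselves.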
In either case, the user's input is treated as a character string, and module 3925 identifies the individual linguistic units and the expansion points using standard string-analysis routines. It then compares them with templates corresponding to the allowed expansion rules to validate the sentence, after which it performs the database lookup and translation. If the sentence does not conform to the rules of the invention, module 3925 alerts the user via the display 3914.

In each of these representation formats, plural English nouns are formed by appending "/s" to the singular form (e.g., "nation/s"). In other languages, the most common method of forming plurals is used; in French, for example, "/s" is appended as in English, whereas in Italian "/i" is appended. Numbers are expressed numerically.

Alternatively, analysis module 3925 can be configured to process unformatted input sentences. To do so, module 3925 looks up each input word (or group of words) in the databases 3935 and builds a representation of the sentence in terms of the linguistic classes of its constituents; that is, it replaces each unit with its class symbol. Module 3925 then assesses whether the resulting sequence of classes could have been generated by the allowed expansion rules and, if so, groups the linguistic units to facilitate lookup and translation. The output is provided either in an unstructured format corresponding to the input or in one of the formats described above. The latter is preferred: a word string in one language rarely corresponds exactly to a word string in another language by substitution alone, so an output format that isolates the linguistic units and highlights the expansions is usually easier to understand.
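The class-symbol check for unformatted input might be sketched as follows. This is a minimal Python sketch under loud assumptions: the class symbols (`N` for a nominal term, `C` for a connector, `D` for a descriptor), the tiny `LEXICON`, and the two reduction patterns in `REDUCTIONS` are all hypothetical placeholders. The actual expansion rules are defined elsewhere in the specification and are not reproduced in this section.

```python
# Hypothetical class symbols: N = nominal term, C = connector, D = descriptor.
# "hit" appears in Table 10 and "red" in Table 11; the other entries are assumed.
LEXICON = {"cat": "N", "ball": "N", "hit": "C", "red": "D", "striped": "D"}

# Placeholder reductions standing in for the real expansion rules:
# a nominal term followed by a descriptor acts as a nominal term, and
# two nominal terms joined by a connector act as a nominal term.
REDUCTIONS = [(("N", "D"), "N"), (("N", "C", "N"), "N")]

def classify(words, lexicon=LEXICON):
    """Replace each word with its class symbol; unknown words raise KeyError."""
    symbols = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"unknown linguistic unit: {word}")
        symbols.append(lexicon[word])
    return symbols

def is_valid(symbols):
    """Greedily apply the placeholder reductions; valid if one 'N' remains."""
    seq = list(symbols)
    changed = True
    while changed:
        changed = False
        for pattern, result in REDUCTIONS:
            n = len(pattern)
            for i in range(len(seq) - n + 1):
                if tuple(seq[i:i + n]) == pattern:
                    seq[i:i + n] = [result]   # collapse the matched run
                    changed = True
                    break
            if changed:
                break                          # restart from the first pattern
    return seq == ["N"]

symbols = classify("cat striped hit ball red".split())
print(symbols, is_valid(symbols))
```

Greedy reduction is enough for these two toy patterns, but it is not a general parser; a fuller version would match the class sequence against the templates the specification actually defines.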
The invention can incorporate additional features to simplify operation. For example, as noted above, words with multiple meanings are distinguished by trailing periods; the number of periods following a particular sense of the word is an arbitrary choice. Accordingly, another of the databases 3935 can comprise a dictionary of multi-sense words, with the format recognized by the invention for each sense set after the corresponding definition. The user interface 3930 interprets the user's click or key press on a definition as a selection of that definition and enters the properly encoded form of the word into the input buffer 3940.

Similarly, because considerations of economy and speed of operation limit the overall size of the databases, one of the databases 3935 can be set up as a thesaurus that gives, for an unrecognized input word, the closest linguistic unit that the invention does recognize. If module 3925 fails to find an input word in the databases, it can be programmed to consult the thesaurus database 3935 and return a list of words that do appear in the linguistic-unit databases.

Module 3925 can also include utilities that recognize and correct (e.g., after user approval) frequently made errors in sentence construction.
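The trailing-period sense encoding and the thesaurus fallback described above can be sketched in Python. The names `split_sense` and `resolve`, the sample `VOCABULARY` (drawn from Table 11), and the `THESAURUS` entries are hypothetical and chosen only for illustration; the specification does not say how these databases are stored.

```python
def split_sense(token):
    """Split an encoded word such as 'play..' into its base form and period count."""
    base = token.rstrip(".")
    return base, len(token) - len(base)

# Illustrative data only: a few descriptors from Table 11 and an invented thesaurus.
VOCABULARY = {"fast", "glad", "happy", "large", "small"}
THESAURUS = {"quick": ["fast"], "joyful": ["happy", "glad"], "huge": ["large"]}

def resolve(word, vocabulary=VOCABULARY, thesaurus=THESAURUS):
    """Return (unit, []) for a recognized word, else (None, suggested units)."""
    key = word.lower()
    if key in vocabulary:
        return key, []
    # Offer only synonyms that the linguistic-unit vocabulary actually contains.
    suggestions = [s for s in thesaurus.get(key, []) if s in vocabulary]
    return None, suggestions

print(split_sense("play.."))
print(resolve("fast"))
print(resolve("Joyful"))
```

Returning the suggestions rather than substituting one automatically leaves the final choice with the user, in line with the list of words the text says is returned.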
As one example of such error correction, the invention ordinarily indicates possession by a named person using the verb "to have"; thus the sentence "Paul's computer is fast" is represented (in algebraic format) as "paul have (computer fast)" or "(computer of paul) fast". If the person is not named, the usual possessive pronouns are used (e.g., "(computer my) fast"). Module 3925 can therefore be configured to recognize constructions such as "Paul's" and return the appropriate construction in accordance with the invention.
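A possessive-rewriting utility of this kind could look like the Python sketch below. It is a deliberately narrow illustration: the function name `rewrite_possessive` and the single pattern it handles (owner, apostrophe-s, one noun, "is", one descriptor) are assumptions, not the specification's method.

```python
import re

# Matches sentences shaped like "<Owner>'s <noun> is <descriptor>".
# Both the straight and the typographic apostrophe are accepted.
POSSESSIVE = re.compile(r"^(\w+)['’]s\s+(\w+)\s+is\s+(\w+)$", re.IGNORECASE)

def rewrite_possessive(sentence):
    """Return both algebraic renderings, or None if the pattern does not apply."""
    match = POSSESSIVE.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    owner, thing, descriptor = (part.lower() for part in match.groups())
    return [
        f"{owner} have ({thing} {descriptor})",
        f"({thing} of {owner}) {descriptor}",
    ]

print(rewrite_possessive("Paul's computer is fast"))
print(rewrite_possessive("The cat hit the ball"))
```

Because the text says corrections may be applied after user approval, the sketch returns candidate rewrites rather than silently replacing the input.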
It will therefore be seen that the foregoing represents a convenient and fast approach to translation among multiple languages. The terms and expressions employed herein are used as terms of description and not of limitation, and their use is not intended to exclude any equivalents of the features shown and described, or portions thereof; it is recognized that various modifications are possible within the scope of the invention claimed. For example, the various modules of the invention can be implemented on a general-purpose computer using appropriate software instructions, as hardware circuits, or as a combination of hardware and software.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not as limitations. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the exemplary embodiments described above, but should be defined only in accordance with the following claims and their equivalents.
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US38774599A | 1999-08-31 | 1999-08-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
TW548631B true TW548631B (en) | 2003-08-21 |
Family
ID=23531225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW89117686A TW548631B (en) | 1999-08-31 | 2001-02-08 | System, method, and article of manufacture for a voice recognition system for identity authentication in order to gain access to data on the Internet |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU7115400A (en) |
TW (1) | TW548631B (en) |
WO (1) | WO2001016940A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI398793B (en) * | 2009-11-24 | 2013-06-11 | Nat Univ Chin Yi Technology | Method for performing computerized document editing by the respiration and the applied device thereof |
TWI416904B (en) * | 2010-04-07 | 2013-11-21 | Hon Hai Prec Ind Co Ltd | System and method for restricting a user to read content of web pages |
US9320068B2 (en) | 2012-04-20 | 2016-04-19 | Wistron Corporation | Information exchange method and information exchange system |
TWI569176B (en) * | 2015-01-16 | 2017-02-01 | 新普科技股份有限公司 | Method and system for identifying handwriting track |
US9691393B2 (en) | 2010-05-24 | 2017-06-27 | Microsoft Technology Licensing, Llc | Voice print identification for identifying speakers at an event |
TWI677852B (en) * | 2017-07-20 | 2019-11-21 | 大陸商北京三快在線科技有限公司 | A method and apparatus, electronic equipment, computer readable storage medium for extracting image feature |
TWI742562B (en) * | 2019-03-18 | 2021-10-11 | 德商贏創運營有限公司 | Speech-to-text conversion of unsupported technical language |
TWI749683B (en) * | 2020-08-04 | 2021-12-11 | 香港商女媧創造股份有限公司 | Interactive companion system and method thereof |
TWI794342B (en) * | 2018-01-25 | 2023-03-01 | 南韓商三星電子股份有限公司 | Application processor supporting low power echo cancellation, electronic device including the same and method of operating the same |
TWI833328B (en) * | 2022-08-16 | 2024-02-21 | 乂迪生科技股份有限公司 | Reality oral interaction evaluation system |
CN118351576A (en) * | 2024-06-14 | 2024-07-16 | 四川大学 | Diffusion model fine tuning method and system for identity maintenance |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7222075B2 (en) | 1999-08-31 | 2007-05-22 | Accenture Llp | Detecting emotions using voice signal analysis |
US7027986B2 (en) | 2002-01-22 | 2006-04-11 | At&T Corp. | Method and device for providing speech-to-text encoding and telephony service |
US8265931B2 (en) | 2002-01-22 | 2012-09-11 | At&T Intellectual Property Ii, L.P. | Method and device for providing speech-to-text encoding and telephony service |
US7088220B2 (en) | 2003-06-20 | 2006-08-08 | Motorola, Inc. | Method and apparatus using biometric sensors for controlling access to a wireless communication device |
EP1708172A1 (en) * | 2005-03-30 | 2006-10-04 | Top Digital Co., Ltd. | Voiceprint identification system for E-commerce |
GB2514943A (en) * | 2012-01-24 | 2014-12-10 | Auraya Pty Ltd | Voice authentication and speech recognition system and method |
US9251792B2 (en) | 2012-06-15 | 2016-02-02 | Sri International | Multi-sample conversational voice verification |
CN107123427B (en) * | 2016-02-21 | 2020-04-28 | 珠海格力电器股份有限公司 | Method and device for determining noise sound quality |
GB2552722A (en) | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
GB2552723A (en) * | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
CN107393541B (en) * | 2017-08-29 | 2021-05-07 | 百度在线网络技术(北京)有限公司 | Information verification method and device |
CN108022600B (en) * | 2017-10-26 | 2021-08-17 | 珠海格力电器股份有限公司 | Equipment control method and device, storage medium and server |
CN111179943A (en) * | 2019-10-30 | 2020-05-19 | 王东 | Conversation auxiliary equipment and method for acquiring information |
EP4091164A4 (en) * | 2020-01-13 | 2024-01-24 | The Regents Of The University Of Michigan | Secure automatic speaker verification system |
CN116698680B (en) * | 2023-08-04 | 2023-09-29 | 天津创盾智能科技有限公司 | Automatic monitoring method and system for biological aerosol |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6266640B1 (en) * | 1996-08-06 | 2001-07-24 | Dialogic Corporation | Data network with voice verification means |
WO1998023062A1 (en) * | 1996-11-22 | 1998-05-28 | T-Netix, Inc. | Voice recognition for information system access and transaction processing |
-
2000
- 2000-08-31 AU AU71154/00A patent/AU7115400A/en not_active Abandoned
- 2000-08-31 WO PCT/US2000/024365 patent/WO2001016940A1/en active Application Filing
-
2001
- 2001-02-08 TW TW89117686A patent/TW548631B/en not_active IP Right Cessation
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI398793B (en) * | 2009-11-24 | 2013-06-11 | Nat Univ Chin Yi Technology | Method for performing computerized document editing by the respiration and the applied device thereof |
TWI416904B (en) * | 2010-04-07 | 2013-11-21 | Hon Hai Prec Ind Co Ltd | System and method for restricting a user to read content of web pages |
US9691393B2 (en) | 2010-05-24 | 2017-06-27 | Microsoft Technology Licensing, Llc | Voice print identification for identifying speakers at an event |
US9320068B2 (en) | 2012-04-20 | 2016-04-19 | Wistron Corporation | Information exchange method and information exchange system |
TWI547808B (en) * | 2012-04-20 | 2016-09-01 | 緯創資通股份有限公司 | Information exchange method and information exchange system |
TWI569176B (en) * | 2015-01-16 | 2017-02-01 | 新普科技股份有限公司 | Method and system for identifying handwriting track |
TWI677852B (en) * | 2017-07-20 | 2019-11-21 | 大陸商北京三快在線科技有限公司 | A method and apparatus, electronic equipment, computer readable storage medium for extracting image feature |
TWI794342B (en) * | 2018-01-25 | 2023-03-01 | 南韓商三星電子股份有限公司 | Application processor supporting low power echo cancellation, electronic device including the same and method of operating the same |
TWI742562B (en) * | 2019-03-18 | 2021-10-11 | 德商贏創運營有限公司 | Speech-to-text conversion of unsupported technical language |
TWI749683B (en) * | 2020-08-04 | 2021-12-11 | 香港商女媧創造股份有限公司 | Interactive companion system and method thereof |
TWI833328B (en) * | 2022-08-16 | 2024-02-21 | 乂迪生科技股份有限公司 | Reality oral interaction evaluation system |
CN118351576A (en) * | 2024-06-14 | 2024-07-16 | 四川大学 | Diffusion model fine tuning method and system for identity maintenance |
Also Published As
Publication number | Publication date |
---|---|
WO2001016940A1 (en) | 2001-03-08 |
AU7115400A (en) | 2001-03-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent | ||
MK4A | Expiration of patent term of an invention patent |