KR20150067139A

KR20150067139A - Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)

Info

Publication number: KR20150067139A
Application number: KR1020157005988A
Authority: KR
Inventors: Jeannie Yang; Nicholas M. Kruge; Gregory C. Thompson; Perry Cook
Original assignee: Smule, Inc.
Priority date: 2012-08-07
Filing date: 2013-08-06
Publication date: 2015-06-17
Also published as: JP2015534095A; WO2014025819A1; KR102246623B1; JP6371283B2

Abstract

Despite the many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances can be captured and, in some cases or embodiments, pitch-corrected and/or processed according to a user-selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real time at the mobile device according to pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur users/performers with imperfect pitch are encouraged to take a shot at "stardom," and/or to take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale, and/or, in some cases or embodiments, to initiate revenue-generating in-application transactions.

Description

SOCIAL MUSIC SYSTEM AND METHOD WITH CONTINUOUS, REAL-TIME PITCH CORRECTION OF VOCAL PERFORMANCE AND DRY VOCAL CAPTURE FOR SUBSEQUENT RE-RENDERING BASED ON SELECTIVELY APPLICABLE VOCAL EFFECT(S) SCHEDULE(S)

The present invention relates generally to the capture and/or processing of vocal performances and, more particularly, to techniques suitable for selectively applying vocal effects schedules to captured vocals.

The installed base of mobile phones and other portable computing devices grows in sheer number and computational power each day. Ubiquitous and deeply entrenched in the lives of people around the world, these devices transcend nearly every cultural and economic barrier. Computationally, today's mobile phones offer speed and storage capabilities comparable to those of desktop computers from less than ten years ago, which makes them surprisingly suitable for real-time sound synthesis and other musical applications. Partly as a result, some modern mobile phones, such as the iPhone® handheld digital device available from Apple Inc., support audio and video playback quite capably.

Like traditional acoustic instruments, mobile phones can be intimate sound-producing devices. However, compared with most traditional instruments, they are somewhat limited in acoustic bandwidth and power. Despite these disadvantages, mobile phones have the advantages of ubiquity, strength in numbers, and extreme mobility, making it feasible (at least in theory) to bring artists together for jam sessions, rehearsals, and even performances almost anywhere, anytime. The field of mobile music has been explored by several research and development groups. See generally G. Wang, Designing Smule's iPhone Ocarina, presented at the 2009 New Interfaces for Musical Expression, Pittsburgh (June 2009). Moreover, experience with applications such as the Ocarina™, Leaf Trombone: World Stage™, and I Am T-Pain™ applications, available from Smule, Inc. for iPhone®, iPad®, iPod Touch® and other iOS® devices, has shown that advanced digital acoustic techniques can be delivered in ways that provide a compelling user experience. iPhone, iPad and iPod Touch are trademarks of Apple, Inc. iOS is a trademark of Cisco Technology, Inc., used by Apple under license.

Digital acoustic researchers seek to transition their innovations to commercial applications deployable on modern handheld devices, such as the iPhone® handheld and other platforms. These applications must operate within the real-world constraints imposed by the processor, memory and other limited computational resources of such devices, and/or within the communications bandwidth and transmission latency constraints typical of wireless networks, and significant practical challenges present themselves as a result. Improved techniques, functional capabilities and user experiences are desired.

It has been discovered that, despite the many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances can be captured and, in some cases or embodiments, pitch-corrected and/or processed according to a user-selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real time at the mobile device (or, more generally, at a portable computing device such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) according to pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur users/performers with imperfect pitch are encouraged to take a shot at "stardom," and/or to take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale, and/or, in some cases or embodiments, to initiate revenue-generating in-application transactions.

In some cases or embodiments, such transactions may include the purchase or license of a computer-readable encoding of an artist-, song- and/or performance-specific vocal effects schedule that may be selectively applied to captured vocals. In some cases or embodiments, the vocal effects schedule is specific to a musical genre. In some cases or embodiments, transactions may include the purchase or license of a computer-readable encoding of lyrics, timing and/or pitch correction settings or plug-ins. In some cases or embodiments, transactions may include the purchase of a "do over" or retake of all or a portion of a vocal performance. In some cases or embodiments, in addition to (or instead of) in-application purchase-type transactions, access to computer-readable encodings of vocal effects schedules, lyrics, timing, pitch correction settings and/or retakes may be earned according to vocal achievement (e.g., based on correspondence of pitch, timing or other attributes with a target score or another vocal performance) or based on successful traversal of game play logic.

As with vocal effects schedule transactions, social interactions mediated by the application or social network infrastructure, such as forming groups, joining groups, sharing performances, or initiating an open call, generate applicable currency or credits for transactions involving "do over" or retake entitlements. In some cases, a user's viewing of advertising content may generate the applicable currency or credits for such transactions.

In some cases or embodiments, pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases or embodiments, pitch correction settings include a score-coded melody and/or harmony sequence supplied with, or for association with, the lyrics and backing track. Harmony notes or chords may be coded as explicit targets, or relative to the score-coded melody, or even, if desired, relative to the actual pitches sounded by the vocalist. In some cases or embodiments, vocal effects schedules and/or pitch correction settings supplied with, or for association with, the lyrics and backing track may pertain to only a portion of a coordinated vocal performance (e.g., to lead vocals, backup singer vocals, a chorus or refrain, or a portion of a duet or three-part harmony).

In these ways, user performances (typically those of amateur vocalists) can be significantly improved in tonal or performance quality, the user can be provided with immediate and encouraging feedback and, in some cases or embodiments, the user can emulate or take on the style or persona of a favored artist, iconic performance or musical genre. Typically, feedback may include both the pitch-corrected vocals themselves and visual reinforcement (during vocal capture) when the user/vocalist hits a correct note. In general, "correct" notes are those that are consistent with the key and that correspond to the score-coded melody or harmony expected at a particular point in the performance. That said, in a cappella modes that have no operant score, to facilitate ad-libbing off score, or when certain pitch correction settings are disabled, the pitches sounded in a given vocal performance may optionally be corrected solely to the nearest notes of a particular key or scale (e.g., C major, C minor, E flat major, etc.). In each case, vocal sounding of "correct" notes may earn the user-vocalist points (e.g., in a game play sequence) and/or credits (e.g., in an in-application transaction framework). In general, such points or credits may be applied (using transaction handling logic implemented, in part, at the handheld device) to the purchase or license of additional vocal scores and lyrics, of additional artist-, song-, performance-, or musical genre-specific vocal effects schedules, or even of vocal capture "do overs" for a user-selectable portion of a previously captured vocal performance.
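The scale-only correction mode described above can be illustrated with a minimal sketch. It assumes equal temperament with A4 = 440 Hz and represents a key or scale as a set of pitch classes; the function names, the `SCALES` table and the tuning reference are illustrative assumptions, not details taken from the specification.

```python
import math

A4_HZ = 440.0    # assumed tuning reference
A4_MIDI = 69     # MIDI note number of A4

# Pitch classes (0 = C) for a few example scales; illustrative only.
SCALES = {
    "C major": {0, 2, 4, 5, 7, 9, 11},
    "C minor": {0, 2, 3, 5, 7, 8, 10},
    "E-flat major": {3, 5, 7, 8, 10, 0, 2},
}

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)

def midi_to_hz(midi_note):
    """Convert a MIDI note number to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)

def snap_to_scale(freq_hz, scale_name):
    """Return (target_hz, shift_ratio) for the in-scale note nearest to freq_hz."""
    allowed = SCALES[scale_name]
    sounded = hz_to_midi(freq_hz)
    center = round(sounded)
    # Any scale with at least one pitch class has a member within 6 semitones.
    candidates = [n for n in range(center - 6, center + 7) if n % 12 in allowed]
    target = min(candidates, key=lambda n: abs(n - sounded))
    target_hz = midi_to_hz(target)
    return target_hz, target_hz / freq_hz
```

The returned ratio is the factor by which a pitch shifter would scale the sounded pitch. A slightly sharp A4 at 450 Hz, for example, maps to 440 Hz in C major with a ratio a little below 1.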

Based on the compelling and transformative nature of the pitch-corrected vocals and of the artist-, song-, performance- or musical genre-specific vocal effects, users/vocalists may overcome an otherwise natural shyness or angst associated with sharing their vocal performances. Instead, even mere amateurs are encouraged to share with friends and family, or to collaborate and contribute vocal performances as part of virtual "glee clubs" or "open calls." In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as the aforementioned portable computing devices, a content server (or service) can mediate such virtual glee clubs or open calls by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists. Depending on the goals and implementation of a particular system, uploads may include (i) dry vocals versions of a user's captured vocal performance, suitable for application (or re-application) of a vocal effects schedule and/or pitch correction, (ii) pitch-corrected vocal performances (with or without harmonies), and/or (iii) control tracks or other indications of the user's key, pitch correction and/or vocal effects schedule selections. Including dry vocals in the upload affords significant flexibility for post-processing (at the content server or service) with a selectable vocal effects schedule, and for mixing, cross-fading and/or pitch shifting of the respective vocal contributions into appropriate score or performance template slots or positions.

Virtual glee clubs or open calls can be mediated in any of a variety of ways. For example, in some cases or embodiments, a first user's vocal performance, captured against a backing track at a portable computing device (and pitch-corrected according to score-coded melody and/or harmony cues for the benefit of the performing user-vocalist), is supplied to other potential vocal performers via a content server or service. Typically, the captured vocal performance is supplied as dry vocals with, or in an encoded form associable with, pitch correction and/or vocal effects schedule settings or selections. A vocal effects schedule may be selectively applied (at the content server or service or, optionally, at the portable computing device) to the supplied vocal performance (or portions thereof), and the result is mixed with backing instrumentals/vocals to form a second-generation backing track against which a second user's vocals may be captured.

In some cases, successive vocal contributors are geographically separated and may be unknown to each other (at least a priori), yet the intimacy of the vocals together with the collaborative experience itself tends to minimize this physical separation. In other cases, open calls may be posted to a group of potential contributors selected by, or otherwise associable with, the initiating user-vocalist. As successive vocal performances are captured (e.g., at respective portable computing devices) and accreted in response to an open call or as part of a virtual glee club, the backing track against which respective vocals are captured may evolve to include the previously captured vocals of other "members" or open call respondents. In some cases, storing or maintaining dry vocals versions of the captured vocal performances may facilitate application of changeable (or later selectable) vocal effects schedules.

Depending on the goals and implementation of a particular system, a vocal effects (EFX) schedule may include (in a computer-readable media encoding) settings and/or parameters for one or more of spectral equalization, audio compression, pitch correction, stereo delay and reverberation effects, for application to one or more respective portions of the user's vocal performance. In some cases or embodiments, a vocal effects schedule may be characteristic of an artist, song or performance and may be applied to an audio encoding of the user's captured vocal performance so that a derivative audio encoding or audible rendering takes on characteristics of the selected artist, song or performance.
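One way to picture such a computer-readable encoding is as a small record of effect settings serialized to JSON. The sketch below is only an assumption for illustration: the specification does not define a file format, and every field name and value here (`eq_gains_db`, `reverb_mix`, and so on) is hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class EffectSettings:
    """Hypothetical settings for the effect types named in the description."""
    eq_gains_db: dict = field(default_factory=dict)  # band name -> gain in dB
    compression_ratio: float = 1.0                   # 1.0 means no compression
    pitch_correction: bool = False
    stereo_delay_ms: float = 0.0
    reverb_mix: float = 0.0                          # 0.0 dry .. 1.0 fully wet

@dataclass
class VocalEffectsSchedule:
    """A named schedule, e.g. one characteristic of an artist or genre."""
    name: str
    settings: EffectSettings

def encode_schedule(schedule):
    """Serialize a schedule to a JSON string (the assumed encoding)."""
    return json.dumps(asdict(schedule), sort_keys=True)

def decode_schedule(text):
    """Rebuild a schedule from its JSON encoding."""
    raw = json.loads(text)
    return VocalEffectsSchedule(
        name=raw["name"],
        settings=EffectSettings(**raw["settings"]),
    )

example = VocalEffectsSchedule(
    name="example-pop-ballad",
    settings=EffectSettings(
        eq_gains_db={"low": -2.0, "presence": 3.0},
        compression_ratio=4.0,
        pitch_correction=True,
        stereo_delay_ms=120.0,
        reverb_mix=0.25,
    ),
)
```

Because the encoding is plain data, the same schedule could in principle be fetched, purchased, stored, or applied on either the device or a server, which is the flexibility the description relies on.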

In the context of this description, it will be understood that the term vocal effects schedule is meant to encompass, in at least some cases or embodiments, an enumerated and operant set of vocal EFX to be applied to some or all of a captured vocal performance (typically, to a dry vocals version thereof). Thus, differing vocal effects schedules may be acquired or transacted and applied to captured dry vocals to provide a "Katy Perry effect" or a "T-Pain effect." In some cases, social interactions mediated by the application or social network infrastructure, such as forming groups, joining groups, sharing performances, or initiating an open call, generate applicable currency or credits for such transactions. In some cases, a user's viewing of advertising content may generate the applicable currency or credits for such transactions.

In some cases, differing vocal effects schedules may be applied to the user's captured dry vocals to imbue a derivative audio encoding or audible rendering with the characteristics of a studio or "live" performance of a particular artist or song. In at least some cases or embodiments, the term vocal effects schedule further encompasses an enumerated set of vocal EFX that varies in correspondence with the temporal or template structure of a vocal score (e.g., with distinct vocal EFX sets for the pre-chorus and chorus portions of a song, and/or with distinct vocal effects sets for the respective portions of a duet or other multi-vocalist performance). Likewise, respective portions of a single vocal effects schedule (or, for that matter, a pair of distinct vocal effects schedules) may be used for respective vocal performance captures, providing appropriate EFX for a vocal performance capture of a first portion of a duet performed by a first user and for a separate vocal performance capture of a second portion of the duet performed by a second user.
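A schedule that varies with the temporal structure of the score and with the vocalist's part can be sketched as a sorted list of section boundaries, each mapping a part to an effect set. The section times, part labels and effect set names below are hypothetical and serve only to show the lookup.

```python
from bisect import bisect_right

# Each entry: (section start in seconds, {part label: effect set name}).
# Entries must be sorted by start time. All values are illustrative.
SCHEDULE = [
    (0.0,  {"part1": "verse_dry",       "part2": "verse_dry"}),
    (32.0, {"part1": "prechorus_delay", "part2": "prechorus_delay"}),
    (48.0, {"part1": "chorus_reverb",   "part2": "chorus_harmony"}),
]

def effect_set_at(schedule, time_s, part):
    """Return the effect set name in force for `part` at `time_s` seconds."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, time_s) - 1  # last section starting at or before time_s
    if index < 0:
        raise ValueError("time precedes the first scheduled section")
    return schedule[index][1][part]
```

With this layout, the two halves of a duet simply query the same schedule with different part labels, so each capture receives its own effects even when the two vocalists record separately.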

In some cases or embodiments, captivating visual animations and/or facilities for listener comment and ranking, as well as open call management or vocal performance accretion logic, are provided in association with an audible rendering of a vocal performance (e.g., one captured at another similarly configured mobile device) mixed with backing instrumentals and/or vocals. Synthesized harmonies and/or additional vocals (e.g., vocals captured from another vocalist at yet another location and optionally pitch-shifted to harmonize with other vocals) may also be included in the mix. Geocoding of captured vocal performances (or of individual contributions to a combined performance) and/or of listener feedback may facilitate animations or display artifacts in ways that suggest a performance or endorsement emanating from a particular geographic locale on a user-manipulable globe. In these ways, implementations of the described functionality can transform otherwise mundane mobile devices into social instruments that foster a unique sense of global connectivity, collaboration and community.

In some embodiments of the present invention, a method includes using a portable computing device for vocal performance capture, the portable computing device having a touch screen, a microphone interface and a communications interface. The method includes, in response to a user selection on the touch screen, retrieving via the communications interface a vocal score that is temporally synchronized with a corresponding backing track and lyrics. At the portable computing device, the backing track is audibly rendered and, concurrently, corresponding portions of the lyrics are presented on a display in temporal correspondence with the backing track. In temporal correspondence with the backing track, the user's vocal performance is captured via the microphone interface, and a dry vocals version of the user's captured vocal performance is stored at the portable computing device. According to the vocal score, the portable computing device performs continuous, real-time pitch shifting of at least some portions of the user's captured vocal performance and mixes the resulting pitch-shifted vocal performance into the audible rendering of the backing track. The method further includes applying at least one vocal effects schedule to the user's captured vocal performance. The vocal effects schedule includes a computer-readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, stereo delay and reverberation effects, for application to one or more respective portions of the user's vocal performance.

In some cases, the vocal effects schedule codes differing effects for application to respective portions of the user's vocal performance in temporal correspondence with the backing track or lyrics. In some cases, the vocal effects schedule is characteristic of a particular musical genre. In some cases, the vocal effects schedule is characteristic of a particular artist, song or performance.

In some embodiments, the method further includes transacting, from the portable computing device, a purchase or license of at least a portion of the vocal effects schedule. In some embodiments, the method includes, in furtherance of the transacting, retrieving via the communications interface a computer-readable encoding of the vocal effects schedule, or unlocking a previously stored instance of that encoding. In some embodiments, the method further includes computationally evaluating the correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user a license for, or access to, at least a portion of the vocal effects schedule.
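The threshold-based award can be sketched as follows. The specification does not say how the figure of merit is computed, so this example assumes one simple possibility: the fraction of score notes that the singer matched within a tolerance, with pitches expressed as MIDI note numbers. The function names, tolerance and threshold are all illustrative assumptions.

```python
def pitch_accuracy(score_notes, sung_notes, tolerance_semitones=0.5):
    """Fraction of score notes matched by the sung pitch within the tolerance.

    Both sequences hold MIDI note numbers, one per score note; a sung entry
    of None means no pitch was detected for that note.
    """
    if not score_notes or len(score_notes) != len(sung_notes):
        raise ValueError("score and sung note lists must be non-empty and equal in length")
    hits = sum(
        1
        for target, sung in zip(score_notes, sung_notes)
        if sung is not None and abs(sung - target) <= tolerance_semitones
    )
    return hits / len(score_notes)

def grant_access(score_notes, sung_notes, threshold=0.8):
    """Award access when the figure of merit reaches the threshold."""
    return pitch_accuracy(score_notes, sung_notes) >= threshold
```

A real system would likely also weigh timing and note duration; this sketch only shows the shape of the evaluate-then-threshold decision.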

In some cases, the vocal effects schedule is subsequently applied to the dry vocals version of the user's captured vocal performance. In some cases, the subsequent application to the dry vocals is performed at the portable computing device, and the method further includes audibly re-rendering, at the portable computing device, the user's captured vocal performance with the vocal effects and pitch shifting applied. In some embodiments, the method includes transmitting an audio signal encoding of the dry vocals version of the user's captured vocal performance, via the communications interface, to a remote service or server for subsequent application of the vocal effects schedule at the remote service or server.

In some embodiments, the method further includes transmitting, in association with (or for association with) the transmitted audio signal encoding of the dry vocals, an open call indication that the user's captured vocal performance constitutes but one of a plurality of vocal performances to be combined at the remote service or server. In some cases, the open call indication directs the remote service or server to solicit, from one or more other vocalists, one or more additional vocal performances to be mixed for audible rendering with the user's vocal performance. In some cases, the solicitation is directed to (i) an enumerated set of potential other vocalists specified by the user; (ii) members of an affinity group defined or recognized by the remote service or server; or (iii) a set of the user's social network relations. In some cases, the open call indication specifies, for at least one additional vocalist position, a second vocal score and second lyrics for supply to a responding additional vocalist. In some cases, the open call indication further specifies, for the at least one additional vocalist position, a second vocal effects schedule for application to the vocal performance of the responding additional vocalist.

In some embodiments, the method further includes receiving, from the remote service or server, a version of the user's captured vocal performance processed in accordance with the vocal effects schedule; and audibly re-rendering, at the portable device, the user's captured vocal performance with the vocal effects applied.

In some cases, the vocal effects schedule is applied at the portable computing device in a rendering pipeline that includes the continuous, real-time pitch shifting, such that the audible rendering includes the scheduled vocal effects.

In some embodiments, the method further includes transacting, from the portable computing device, for an entitlement to initiate vocal recapture of a user-selected portion of the previously captured vocal performance. In some embodiments, the method further includes computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user an entitlement to initiate vocal recapture of a user-selected portion of the previously captured vocal performance.

In some cases, the pitch shifting is based on continuous time-domain estimation of pitch for the user's captured vocal performance. In some cases, the continuous time-domain pitch estimation includes computing a lag-domain periodogram for a current block of a sampled signal corresponding to the user's captured vocal performance, and computing the lag-domain periodogram includes evaluating, for an analysis window of the sampled signal, an average magnitude difference function (AMDF) or an autocorrelation function over a range of lags.
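For illustration only, the lag-domain evaluation described above can be sketched as follows. This is a minimal sketch rather than the implementation of any embodiment: the function name `amdf_pitch`, the default frequency bounds and the choice of the single lowest-valued lag are all assumptions made for the example.

```python
import math


def amdf_pitch(block, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the pitch (Hz) of one block of samples using the AMDF.

    The AMDF is evaluated over a range of lags; the lag with the smallest
    average magnitude difference is taken as the pitch period (an assumption
    of this sketch; no octave-error guard is included).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    window = len(block) - lag_max  # analysis window length
    if window <= 0:
        raise ValueError("block too short for the requested lag range")

    best_lag, best_score = lag_min, float("inf")
    for lag in range(lag_min, lag_max + 1):
        total = 0.0
        for n in range(window):
            total += abs(block[n] - block[n + lag])
        score = total / window
        if score < best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: a 200 Hz sine sampled at 8 kHz has a period of exactly 40 samples.
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(512)]
estimate = amdf_pitch(tone, 8000, f_min=150.0, f_max=400.0)
```

A plain minimum search of this kind can lock onto a multiple of the true period when the lag range spans more than one period, which is one reason a practical detector would likely narrow the lag range or add further checks.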

In some embodiments, the method further includes, responsive to the user selection, also retrieving the backing track via the data communications interface. In some cases, the backing track resides in storage local to the portable computing device, and the retrieving identifies the vocal score, temporally synchronizable with the corresponding backing track and lyrics, using an identifier ascertainable from the locally stored backing track. In some cases, the backing track includes either or both of instruments and backing vocals and is rendered in multiple versions: the version of the backing track audibly rendered in correspondence with the lyrics is a monophonic scratch version, and the version mixed with pitch-corrected vocal versions of the user's vocal performance is a polyphonic version of higher quality or fidelity than the scratch version.

In some embodiments, the portable computing device is selected from the group of: a mobile phone; a personal digital assistant; a media player or gaming device; and a laptop computer, notebook computer, tablet computer or netbook. In some embodiments, the display includes a touch screen. In some embodiments, the display is wirelessly coupled to the portable computing device.

In some embodiments, the method further includes geocoding the transmitted audio signal encoding of the dry vocals. In some embodiments, the method further includes receiving, from the remote service or server via the communications interface, an audio signal encoding that includes a second vocal performance captured at a remote device; and displaying a geographic origin for the second vocal performance in correspondence with an audible rendering that includes the second vocal performance. In some cases, the display of geographic origin is by display animation suggestive of a performance emanating from a particular location on a globe.

In some embodiments in accordance with the present invention(s), a method includes (i) using a portable computing device for vocal performance capture, the portable computing device having a touch screen, a microphone interface and a communications interface; (ii) responsive to a user selection on the touch screen, retrieving via the communications interface a vocal score temporally synchronized with a corresponding backing track and lyrics, the vocal score encoding a sequence of target notes for at least part of a vocal performance against the backing track; (iii) at the portable computing device, audibly rendering the backing track and concurrently presenting corresponding portions of the lyrics on a display in temporal correspondence therewith; (iv) capturing, via the microphone interface and in temporal correspondence with the backing track, a vocal performance of the user; and (v) transmitting, via the communications interface, an audio signal encoding of a dry vocals version of the user's captured vocal performance to a remote service or server, together with a selection of at least one vocal effects schedule to be applied to the user's captured vocal performance.

In some embodiments, the method further includes applying the selected vocal effects schedule at the remote service or server. In some embodiments, the method further includes performing, at the portable computing device and in accord with the vocal score, continuous real-time pitch shifting of at least some portions of the user's captured vocal performance, and mixing the resulting pitch-shifted vocal performance of the user into the audible rendering of the backing track.

In some cases, the selected vocal effects schedule includes a computer readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, pitch correction, stereo delay and reverb effects, for application to one or more respective portions of the user's vocal performance. In some cases, the vocal effects schedule is specific to a musical genre. In some cases, the vocal effects schedule is characteristic of a particular artist, song or performance.

In some embodiments, the method further includes transacting, from the portable computing device, a purchase or license of at least a portion of the vocal effects schedule. In some embodiments, the method further includes computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user a license or access to at least a portion of the vocal effects schedule. In some embodiments, the method further includes transacting, from the portable computing device, for an entitlement to recapture a selected portion of the vocal performance. In some embodiments, the method further includes computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user an entitlement to recapture a selected portion of the vocal performance.
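The specification does not define how the figure of merit is computed. As one hypothetical reading, offered as a sketch only, correspondence could be scored as the fraction of voiced frames whose detected pitch lies within a tolerance of the score-coded target note. The names `cents_off`, `figure_of_merit` and `grants_access`, the 50-cent tolerance and the 0.8 threshold are all assumptions of this example.

```python
import math


def cents_off(freq_hz, target_midi):
    """Signed distance in cents between a sung frequency and a MIDI note."""
    target_hz = 440.0 * 2.0 ** ((target_midi - 69) / 12.0)
    return 1200.0 * math.log2(freq_hz / target_hz)


def figure_of_merit(sung_hz, target_midi, tolerance_cents=50.0):
    """Fraction of voiced frames that land within tolerance of the score.

    Frames with a non-positive frequency are treated as unvoiced and ignored.
    """
    voiced = [(f, t) for f, t in zip(sung_hz, target_midi) if f and f > 0]
    if not voiced:
        return 0.0
    hits = sum(1 for f, t in voiced if abs(cents_off(f, t)) <= tolerance_cents)
    return hits / len(voiced)


def grants_access(sung_hz, target_midi, threshold=0.8):
    """True when the performance meets the (hypothetical) threshold."""
    return figure_of_merit(sung_hz, target_midi) >= threshold


# Usage: three of four voiced frames hit A4 (MIDI 69); one is a semitone sharp.
frames = [440.0, 440.0, 466.16, 0.0, 440.0]
targets = [69, 69, 69, 69, 69]
score = figure_of_merit(frames, targets)
```

The same score could gate either award described above (a license to part of a vocal effects schedule, or an entitlement to recapture); only the threshold and the thing awarded would differ.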

In some embodiments in accordance with the present invention(s), a portable computing device includes a microphone interface, an audio transducer interface, a data communications interface, user interface code, pitch correction code and a rendering pipeline. The user interface code is executable on the portable computing device to capture user interface gestures selective for a backing track and to initiate retrieval of at least a vocal score corresponding thereto, the vocal score encoding a sequence of note targets for at least part of a vocal performance against the backing track. The user interface code is further executable to capture user interface gestures to initiate (i) audible rendering of the backing track, (ii) concurrent presentation of lyrics on a display, (iii) capture of the user's vocal performance using the microphone interface and (iv) storage of a dry vocals version of the captured vocal performance to computer readable storage. The pitch correction code is executable on the portable computing device to, concurrent with the audible rendering, continuously and in real time pitch correct the captured vocal performance in accord with the vocal score. The rendering pipeline is executable to mix the user's pitch-corrected vocal performance into the audible rendering of the backing track against which the user's vocal performance is captured. The rendering pipeline is further executable to apply vocal effects schedules to the user's captured vocal performance, the vocal effects schedules being selectable by the user and including a computer readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, stereo delay and reverb effects, for application to one or more respective portions of the user's vocal performance.

In some embodiments, the portable computing device includes the display. In some embodiments, the data communications interface provides a wireless interface to the display.

In some embodiments, the user interface code is further executable to capture user interface gestures indicative of a user selection of a vocal effects schedule and, responsive thereto, to transmit an audio signal encoding of the dry vocals version of the user's captured vocal performance to a remote service or server via the data communications interface, for subsequent application of the selected vocal effects schedule at the remote service or server. In some cases, the transmission includes, in or for association with the audio signal encoding of the dry vocals, an open call indication that the user's captured vocal performance constitutes but one of plural vocal performances to be combined at the remote service or server.

In some embodiments, the portable computing device further includes code executable on the portable computing device to evaluate correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, to award the user a license or access to at least a portion of the vocal effects schedule. In some embodiments, the portable computing device further includes code executable on the portable computing device to evaluate correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, to award the user an entitlement to recapture a selected portion of the vocal performance.

In some embodiments, the portable computing device further includes local storage, and the initiated retrieval includes checking instances of vocal score information, if any, in the local storage against instances available from a remote server, and retrieving from the remote server if the instances in the local storage are unavailable or out of date.
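A minimal sketch of such a check-then-retrieve step is given below. The source does not say how staleness is determined; comparing integer version numbers is an assumption of this example, as are the names `ScoreInstance`, `retrieve_score`, `remote_version` and `fetch_remote`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ScoreInstance:
    song_id: str
    version: int      # hypothetical staleness marker
    payload: bytes    # encoded vocal score information


def retrieve_score(
    song_id: str,
    local: Dict[str, ScoreInstance],
    remote_version: Callable[[str], int],
    fetch_remote: Callable[[str], ScoreInstance],
) -> ScoreInstance:
    """Return a current score instance, fetching only when needed."""
    cached = local.get(song_id)
    latest = remote_version(song_id)  # check against the remote server
    if cached is not None and cached.version >= latest:
        return cached                 # local instance is present and current
    fresh = fetch_remote(song_id)     # unavailable or out of date: retrieve
    local[song_id] = fresh
    return fresh
```

Here `remote_version` stands in for whatever lightweight query the server offers, so that a full download happens only when the local copy is missing or stale.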

In some embodiments in accordance with the present invention, a computer program product is encoded in one or more non-transitory media and includes instructions executable on a processor of the portable computing device to cause the portable computing device to perform the steps of one of the methods described above.

These and other embodiments in accordance with the present invention will be understood with reference to the description and appended claims that follow.

The present invention is illustrated by way of example and not limitation with reference to the accompanying figures, in which like references generally indicate similar elements or features.
FIG. 1 depicts information flows amongst illustrative mobile phone-type portable computing devices and a content server in accordance with some embodiments of the present invention.
FIG. 2 is a flow diagram illustrating, for a captured vocal performance, real-time continuous pitch correction and harmony generation based on score-coded pitch or harmony cues, together with storage and/or upload of a dry vocals version of the captured vocal performance for local and/or remote application of a vocal effects schedule, in accordance with some embodiments of the present invention.
FIG. 3 is a functional block diagram of hardware and software components executable at an illustrative mobile phone-type portable computing device to facilitate real-time continuous pitch correction and transmission of dry vocals for application of a vocal effects schedule at a remote content server, in accordance with some embodiments of the present invention.
FIG. 4 illustrates features of a mobile device that may serve as a platform for execution of software implementations in accordance with some embodiments of the present invention.
FIG. 5 is a network diagram that illustrates cooperation of exemplary devices in accordance with some embodiments of the present invention.
FIGS. 6A and 6B present, in flow diagram form, complementary (and in some cases cooperative) deployments of a signal processing architecture for application of a vocal effects schedule in accordance with respective and illustrative embodiments of the present invention. Specifically, FIG. 6A illustrates a content server-centric deployment of the signal processing architecture, including interactions with a client application (e.g., portable computing device hosted) vocal capture platform. FIG. 6B analogously illustrates a client application-centric deployment (e.g., portable computing device hosted) of the signal processing architecture, including interactions with a content server.
Skilled artisans will appreciate that elements or features in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions or prominence of some of the illustrated elements or features may be exaggerated relative to other elements or features in an effort to help improve understanding of embodiments of the present invention.

Techniques have been developed to facilitate the capture, pitch correction, harmonization, vocal effects (EFX) processing, encoding and audible rendering of vocal performances on handheld or other portable computing devices. Building on these techniques, mixes that include such vocal performances can be prepared for audible rendering on targets that include these handheld or portable computing devices as well as desktops, workstations, gaming stations and even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.

Pitch detection and correction of a user's vocal performance are performed continuously and in real time with respect to the audible rendering of the backing track at the handheld or portable computing device. In this way, pitch-corrected vocals may be mixed with the audible rendering to overlay (in real time) the very instrumentals and/or vocals of the backing track against which the user's vocal performance is captured. In some implementations, pitch detection builds on time-domain pitch correction techniques that employ average magnitude difference function (AMDF) or autocorrelation-based techniques together with zero-crossing and/or peak picking techniques to identify differences between the pitch of a captured vocal signal and score-coded target pitches. Based on the detected differences, pitch correction based on pitch synchronous overlap-add (PSOLA) and/or linear predictive coding (LPC) techniques allows captured vocals to be pitch shifted in real time to "correct" notes in accord with pitch correction settings that code score-coded melody targets and harmonies. Frequency domain techniques, such as FFT peak picking for pitch detection and phase vocoding for pitch shifting, may be used in some implementations, particularly when offline processing is employed or when computational facilities substantially exceed those of typical current-generation mobile devices. Pitch detection and shifting (e.g., for pitch correction, harmonies and/or preparation of composite multi-vocalist, virtual glee club mixes) may also be performed in a post-processing mode.
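As a small worked example of the "detected difference" on which pitch shifting operates, the sketch below computes the frequency ratio (and its equivalent in cents) needed to move a detected pitch onto a score-coded target. Representing targets as MIDI note numbers with A4 = 440 Hz is an assumption of the example, as are the names `midi_to_hz` and `correction`; the PSOLA or LPC resynthesis that would consume the ratio is not shown.

```python
import math


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def correction(detected_hz, target_midi):
    """Return (ratio, cents) that shifts the detected pitch to the target."""
    ratio = midi_to_hz(target_midi) / detected_hz
    cents = 1200.0 * math.log2(ratio)
    return ratio, cents


# Usage: a singer at 220 Hz (A3) with a score-coded target of A4 (MIDI 69)
# needs a shift ratio of 2.0, i.e. one octave (1200 cents) up.
ratio, cents = correction(220.0, 69)
```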

In general, "correct" notes are those notes that are consistent with a specified key or scale or that, in some embodiments, correspond to a score-coded melody (or harmony) expected at a particular point in the performance. Accordingly, a cappella modes without an operant score (or modes that allow a user, during vocal capture, to dynamically vary the pitch correction settings of an existing score) may be provided in some implementations to facilitate ad-libbing. For example, user interface gestures captured at a mobile phone (or other portable computing device) may, for particular lyrics, allow the user to (i) switch off (and on) the score-coded note targets, (ii) dynamically switch back and forth between melody and harmony note sets as the operant pitch correction settings, and/or (iii) selectively fall back (at gesture-selected points in the vocal capture) to settings that cause sounded pitches to be corrected solely to the nearest notes of a particular key or scale (e.g., C major, C minor, E-flat major, etc.). In short, user interface gesture capture and dynamically variable pitch correction settings can provide a freestyle mode for advanced users.
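Correction to the nearest note of a key or scale, as in option (iii) above, can be illustrated with the following sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz and expresses notes as MIDI numbers; the names `hz_to_midi` and `snap_to_scale` and the interval tables are chosen for the example and do not come from the specification.

```python
import math

# Scale intervals in semitones above the tonic.
MAJOR = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR = (0, 2, 3, 5, 7, 8, 10)


def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency (A4 = 69 = 440 Hz)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def snap_to_scale(freq_hz, tonic_pitch_class, intervals):
    """Return the MIDI note in the given scale nearest to freq_hz.

    tonic_pitch_class is 0 for C, 1 for C sharp, ... 11 for B.
    """
    allowed = {(tonic_pitch_class + i) % 12 for i in intervals}
    midi = hz_to_midi(freq_hz)
    center = round(midi)
    # Every scale here has a note within six semitones of any pitch.
    candidates = [n for n in range(center - 6, center + 7) if n % 12 in allowed]
    return min(candidates, key=lambda n: abs(n - midi))


# Usage: 450 Hz lies between A4 (69) and B-flat 4 (70).
in_c_major = snap_to_scale(450.0, 0, MAJOR)        # A is in C major
in_e_flat_major = snap_to_scale(450.0, 3, MAJOR)   # A is not; B-flat is
```

Switching between keys or scales during capture then amounts to changing the `tonic_pitch_class` and `intervals` arguments at gesture-selected points.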

In some cases, pitch correction settings may be selected to distort the captured vocal performance in accord with a desired effect, such as the pitch correction effects popularized by a particular musical performance or a particular artist. In some embodiments, pitch correction may be based on techniques that computationally simplify autocorrelation calculations as applied to a variable window of samples from a captured vocal signal, such as plug-in implementations of the Auto-Tune® technology popularized by, and available from, Antares Audio Technologies.

Depending on the goals and implementation of a particular system, a user selectable vocal effects (EFX) schedule may include (in a computer readable media encoding) settings and/or parameters for one or more of spectral equalization, audio compression, pitch correction, stereo delay and reverb effects, for application to one or more respective portions of the user's vocal performance. In some cases or embodiments, a vocal effects schedule may be characteristic of an artist, song or performance and may be applied to an audio encoding of the user's captured vocal performance so that a derivative audio encoding or audible rendering takes on characteristics of the selected artist, song or performance.

Thus, one vocal effects schedule may be characteristic of, for example, a studio recording of lead vocals by the artist Michael Jackson performing "P.Y.T. (Pretty Young Thing)," while another vocal effects schedule may be characteristic of a cover version of the same song by another artist, T-Pain. In such a case, the first vocal effects schedule (corresponding to the original performance by Michael Jackson) may encode, in computer readable form, EFX that include (in terms often used by studio engineers) bass roll-off, moderate compression and digital plate reverb. More specifically, the first vocal effects schedule may encode parameters or settings for a 12 dB/octave high pass filter at 120 Hz; a tube compressor with a 4:1 ratio and a threshold of -10 dB; and a digital reverb with a warm plate setting, a 30 ms pre-delay and a 15% wet/dry mix. By contrast, the second vocal effects schedule (corresponding to the cover version by T-Pain) may encode, in computer readable form, EFX that include (again in terms often used by studio engineers) high-pass equalization, pop compression, fast pitch correction, vocal doubling on some words and light reverb for "airiness." More specifically, the second vocal effects schedule may encode parameters or settings for a 24 dB/octave high pass filter at 200 Hz; digital compression with a 4:1 ratio and a threshold of -15 dB; pitch correction with a 0 ms attack; a stereo chorus with a rate of 0.3 Hz, 100% depth and 100% mix (to emulate doubling of words, such as "pretty young thing," at particular score-coded positions); and an impulse-response-based reverb for a concert hall with high pass filtering at 300 Hz, a length of 2.5 seconds and a 10% wet/dry mix.
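To make "computer readable form" concrete, the two schedules just described might be encoded along the following lines. This is an illustrative sketch only: the class names, field names and the JSON serialization are assumptions of the example, and only the numeric settings are taken from the text above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Union


@dataclass
class Effect:
    kind: str
    params: Dict[str, Union[float, str]] = field(default_factory=dict)


@dataclass
class VocalEffectsSchedule:
    name: str
    effects: List[Effect]

    def to_json(self) -> str:
        """Serialize the schedule, e.g. for upload alongside dry vocals."""
        return json.dumps(asdict(self), sort_keys=True)


FIRST_SCHEDULE = VocalEffectsSchedule(
    name="first schedule (studio lead vocal)",
    effects=[
        Effect("high_pass", {"slope_db_per_octave": 12.0, "cutoff_hz": 120.0}),
        Effect("compressor", {"style": "tube", "ratio": 4.0, "threshold_db": -10.0}),
        Effect("reverb", {"style": "warm plate", "pre_delay_ms": 30.0,
                          "wet_dry_mix": 0.15}),
    ],
)

SECOND_SCHEDULE = VocalEffectsSchedule(
    name="second schedule (cover version)",
    effects=[
        Effect("high_pass", {"slope_db_per_octave": 24.0, "cutoff_hz": 200.0}),
        Effect("compressor", {"style": "digital", "ratio": 4.0, "threshold_db": -15.0}),
        Effect("pitch_correction", {"attack_ms": 0.0}),
        Effect("stereo_chorus", {"rate_hz": 0.3, "depth": 1.0, "mix": 1.0}),
        Effect("reverb", {"style": "concert hall impulse response",
                          "high_pass_hz": 300.0, "length_s": 2.5,
                          "wet_dry_mix": 0.10}),
    ],
)

encoded = FIRST_SCHEDULE.to_json()
```

Because the schedule is plain data, the same dry vocals can be re-rendered later under a different schedule without recapturing the performance.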

Similarly, in some cases or embodiments, a vocal effects schedule may be characteristic of a particular musical genre. For example, one vocal effects schedule may be characteristic of a dance genre (e.g., encoding parameters or settings for a 24 dB/octave high pass filter at 250 Hz; a digital compressor with a 6:1 ratio and a threshold of -15 dB; a stereo delay with a left channel [200 ms delay, 15% wet/dry mix, 40% feedback coefficient] and a right channel [260 ms delay, 15% wet/dry mix, 40% feedback coefficient]; and a digital reverb with a bright plate setting and a 15% wet/dry mix), while another vocal effects schedule may be characteristic of a ballad genre (e.g., encoding parameters or settings for a 12 dB/octave high pass filter at 120 Hz; a digital compressor with a 4:1 ratio and a threshold of -8 dB; and a digital reverb with a large concert hall setting, a 30 ms pre-delay and a 20% wet/dry mix). Specific parameterizations of musical genre-specific vocal effects schedules are, in general, implementation dependent; however, based on the description herein, persons of ordinary skill in the art will appreciate suitable variations and other parameterizations of vocal effects schedules for a variety of musical genres. The dance and ballad genres are merely illustrative.
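To show what the ratio and threshold settings mean in practice, the sketch below evaluates the static gain curve of a compressor at the dance and ballad settings. A hard-knee characteristic is assumed for simplicity (the text does not specify the knee for these compressors), and the function name `compressor_gain_db` is hypothetical.

```python
def compressor_gain_db(level_db, threshold_db, ratio):
    """Gain change in dB applied by a hard-knee compressor.

    Below the threshold the signal passes unchanged; above it, every
    `ratio` dB of input overshoot yields 1 dB of output overshoot.
    """
    if level_db <= threshold_db:
        return 0.0
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db


# Dance settings (6:1, -15 dB): a -3 dB peak overshoots by 12 dB, which is
# reduced to 2 dB of overshoot, i.e. 10 dB of gain reduction.
dance_gain = compressor_gain_db(-3.0, -15.0, 6.0)

# Ballad settings (4:1, -8 dB): a -4 dB peak overshoots by 4 dB, which is
# reduced to 1 dB of overshoot, i.e. 3 dB of gain reduction.
ballad_gain = compressor_gain_db(-4.0, -8.0, 4.0)
```

The higher ratio and lower threshold of the dance setting therefore act on more of the signal and squeeze it harder, consistent with the denser sound typical of that genre.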

In the context of the present disclosure, it will be understood that the term vocal effects schedule is meant to encompass, in at least some cases or embodiments, an enumerated and operant set of vocal EFX to be applied to some or all of a captured vocal performance (typically, a dry vocals version thereof). Thus, differing vocal effects schedules may be transacted for and applied to captured dry vocals to provide a "Katy Perry effect" or a "T-Pain effect." Likewise, differing vocal effects schedules may be transacted for and applied to captured dry vocals to imbue a derivative audio encoding or audible rendering with a musical genre-specific effect. In some cases, differing vocal effects schedules may be transacted for and alternatively applied to the user's captured dry vocals to imbue the derivative audio encoding or audible rendering with studio or "live" performance characteristics. Although artist-, song- or performance-specific vocal EFX schedules are described separately from musical genre-specific vocal EFX schedules, it will be appreciated that, in some cases or embodiments, a particular vocal EFX schedule may blend artist-, song- or performance-specific and/or musical genre-specific aspects.

In at least some cases or embodiments, the term vocal effects schedule may further encompass an enumerated set of vocal EFX that varies in temporal or template correspondence with portions of a vocal score (e.g., distinct vocal EFX sets for the pre-chorus and chorus portions of a song, and/or distinct vocal effect sets for respective parts of a duet or other multi-vocalist performance). Thus, in a vocal effects schedule for Cher's iconic performance of "Believe," particular score-aligned portions corresponding to the pre-chorus section of the performance may encode, in computer-readable form, EFX that include (in terms often used by studio engineers) spectral equalization, medium compression, hard pitch correction and light stereo delay, while portions corresponding to the chorus sections of the performance may encode EFX that include bass roll-off, pop compression, long high-passed stereo delay and rich/warm reverb. In more technical terms, the pre-chorus section EFX of the vocal effects schedule may encode parameters and settings for a 24 dB/octave high-pass filter at 400 Hz and a 12 dB/octave low-pass filter at 2.2 kHz, a digital soft-knee compressor with a 3:1 ratio and a threshold of -10 dB, pitch correction with a 0 ms attack, and a delay synced to a quarter note on the left channel and offset by an eighth note on the right channel, with 33% feedback and a 15% wet/dry mix on both channels. In contrast, the chorus section EFX of the vocal effects schedule may encode parameters or settings for a 12 dB/octave high-pass filter at 120 Hz, a tube compressor with a 4:1 ratio and a threshold of -15 dB, a delay synced to a half note on the left channel and offset by 20 ms on the right channel, with 45% feedback and a 25% wet/dry mix on both channels, and an impulse-response-based reverb characteristic of a concert hall, with 200 Hz high-pass filtering, a 4.5 second length and an 18% wet/dry mix.
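A section-dependent schedule of this kind implies two small computations: finding which score section a given playback time falls in, and converting note-synced delay values into milliseconds for a given tempo. The sketch below illustrates both under stated assumptions. The section boundaries, the function names and the reading of "offset by an eighth note" as "left delay plus an eighth note" are all assumptions, not details given in the disclosure.

```python
import bisect

def synced_delay_ms(bpm, beats):
    """Delay time in ms for a note value given in quarter-note beats."""
    return 60000.0 / bpm * beats

# Hypothetical section start times (seconds) for one song.
SECTION_STARTS = [0.0, 30.0, 45.0]
SECTION_NAMES = ["verse", "pre_chorus", "chorus"]

def section_at(t_seconds):
    """Name of the score section that contains time t_seconds."""
    i = bisect.bisect_right(SECTION_STARTS, t_seconds) - 1
    return SECTION_NAMES[max(i, 0)]

def delay_settings(section, bpm):
    """Stereo delay settings for a section, using the values from the text."""
    if section == "pre_chorus":
        left = synced_delay_ms(bpm, 1.0)              # quarter note
        right = left + synced_delay_ms(bpm, 0.5)      # assumed: plus an eighth
        return {"left_ms": left, "right_ms": right,
                "feedback": 0.33, "wet": 0.15}
    if section == "chorus":
        left = synced_delay_ms(bpm, 2.0)              # half note
        return {"left_ms": left, "right_ms": left + 20.0,
                "feedback": 0.45, "wet": 0.25}
    return None  # no delay specified for other sections in this sketch
```

For example, at an assumed tempo of 120 beats per minute, `delay_settings(section_at(31.0), 120)` yields a 500 ms left delay and a 750 ms right delay for the pre-chorus.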

Similarly, respective portions of a single vocal effects schedule (or, for that matter, a pair of distinct vocal effects schedules) may be employed for respective vocal performance captures, so as to provide suitable respective EFX for a vocal performance capture of a first part of a duet performed by a first user and for a separate vocal performance capture of a second part of the duet performed by a second user.

Based on the compelling and transformative nature of pitch-corrected vocals and selectable vocal effects (EFX), user/vocalists typically overcome an otherwise natural shyness or angst associated with sharing their vocal performances. Even mere amateurs are encouraged to share with friends and family, or to collaborate and contribute vocal performances as part of an affinity group. In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and through invitations to join a group performance or virtual choir. Using uploaded vocals captured at clients such as the aforementioned portable computing devices, a content server (or service) can mediate such affinity groups by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists. Depending on the goals and implementation of a particular system, uploads may include pitch-corrected vocal performances, dry (i.e., uncorrected) vocals, and/or control tracks of user key and/or pitch correction selections, etc.

Often, first and second encodings (often of differing quality or fidelity) of the same underlying audio source material may be employed. For example, use of first and second encodings of an accompaniment (e.g., one at the handheld or other portable computing device at which vocals are captured, and one at the content server) can allow the respective encodings to be adapted to data transfer bandwidth constraints or to needs at the particular device/platform at which they are employed. In some embodiments, a first encoding of the accompaniment, audibly rendered at a handheld or other portable computing device as an audio backdrop to vocal capture, may be of lesser quality or fidelity than a second encoding of that same accompaniment used at the content server to prepare a mixed performance for audible rendering. In this way, high quality mixed audio content may be provided while limiting data bandwidth requirements to a handheld device used for capture and pitch correction of a vocal performance.

Nonetheless, the accompaniment encodings employed at the portable computing device may, in some cases, be of equivalent or even better quality/fidelity than those at the content server. For example, in embodiments or situations in which a suitable encoding of the accompaniment already exists at the mobile phone (or other portable computing device), such as from a music library resident thereon or based on a prior download from the content server, download data bandwidth requirements may be quite low. Lyrics, timing information and applicable pitch correction settings may be retrieved for association with the existing accompaniment using any of a variety of identifiers ascertainable, e.g., from audio metadata, track title, an associated thumbnail or even fingerprinting techniques applied to the audio, if desired.

Karaoke-Style Vocal Performance Capture

Although embodiments of the present invention are not necessarily limited thereto, mobile phone-hosted, pitch-corrected, karaoke-style vocal capture provides a useful descriptive context. For example, in some embodiments such as illustrated in FIG. 1, an iPhone™ handheld available from Apple, Inc. (or more generally, handheld 101) hosts software that executes in coordination with a content server to provide vocal capture and continuous real-time, score-coded pitch correction and harmonization of the captured vocals. As is typical of karaoke-style applications (such as the "I am T-Pain" application for iPhone, originally released in September 2009, or the later "Glee" application, both available from Smule, Inc.), an accompaniment of instrumentals and/or vocals can be audibly rendered for a user/vocalist to sing against. In such cases, lyrics may be displayed (102) in correspondence with the audible rendering so as to facilitate a karaoke-style vocal performance by the user. In some cases or situations, backing audio may be rendered from a local store, such as from content of an iTunes™ library resident on the handheld.

User vocals 103 are captured at handheld 101, pitch-corrected continuously and in real time (again at the handheld) and audibly rendered (see 104, mixed with the accompaniment) to provide the user with an improved tonal quality rendition of his or her own vocal performance. Pitch correction is typically based on score-coded note sets or cues (e.g., pitch and harmony cues 105), which provide continuous pitch-correction algorithms with performance-synchronized sequences of target notes in a current key or scale. In addition to performance-synchronized melody targets, score-coded harmony note sequences (or sets) provide pitch-shifting algorithms with additional targets (typically coded as offsets relative to a lead melody note track and typically scored only for selected portions thereof) for pitch-shifting to harmony versions of the user's own captured vocals. In some cases, pitch correction settings may be characteristic of a particular artist, such as the artist that performed vocals associated with the particular accompaniment.

In the illustrated embodiment, backing audio (here, one or more instrumental and/or vocal tracks), lyrics and timing information, and pitch/harmony cues are all supplied (or demand updated) from one or more content servers or hosted service platforms (here, content server 110). For a given song and performance, such as "Hot N Cold," several versions of the backing track may be stored, e.g., on the content server. For example, in some implementations or deployments, versions may include:

· Uncompressed stereo wav format accompaniment,

· Uncompressed mono wav format accompaniment, and

· Compressed mono m4a format accompaniment.

In addition, lyrics, melody and harmony track note sets and related timing and control information may be encapsulated as a score coded in an appropriate container or object (e.g., in a Musical Instrument Digital Interface, MIDI, or JavaScript Object Notation, json, type format) for supply together with the backing track(s). Using such information, handheld 101 may display lyrics and even visual cues related to target notes, harmonies and currently detected vocal pitch in correspondence with an audible performance of the accompaniment(s), so as to facilitate a karaoke-style vocal performance by the user.
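To make the idea of a json-type score container concrete, the sketch below parses a small example and looks up the lyric line and the melody target for a given playback time. The disclosure does not specify a layout, so every field name (`control`, `lyrics`, `pitch`, `harmony`, `t`, `dur`, `note`, `offset`) and the song data itself are hypothetical.

```python
import json

# Hypothetical score layout; all field names are assumptions.
SCORE_JSON = """
{
  "title": "ExampleSong",
  "control": [{"t": 0.0, "marker": "Key : Em"}],
  "lyrics":  [{"t": 0.0, "text": "first line"},
              {"t": 4.0, "text": "second line"}],
  "pitch":   [{"t": 0.0, "dur": 2.0, "note": 64},
              {"t": 2.0, "dur": 2.0, "note": 67}],
  "harmony": [[{"t": 0.0, "dur": 2.0, "offset": 4}]]
}
"""

def latest_event(events, t):
    """Return the last event that starts at or before time t, else None."""
    current = None
    for event in events:
        if event["t"] <= t:
            current = event
    return current

def target_note(score, t):
    """MIDI note number of the melody target sounding at time t, else None."""
    for event in score["pitch"]:
        if event["t"] <= t < event["t"] + event["dur"]:
            return event["note"]
    return None

score = json.loads(SCORE_JSON)
```

With a structure like this, a client could call `latest_event(score["lyrics"], t)` on each display refresh to choose the lyric line to show, and `target_note(score, t)` to choose the current pitch-correction target.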

Thus, if an aspiring vocalist selects on the handheld device "Hot N Cold" as originally popularized by the artist Katy Perry, HotNCold.json and HotNCold.m4a may be downloaded from the content server (if not already available or cached based on a prior download) and, in turn, used to provide background music, synchronized lyrics and, in some situations or embodiments, score-coded note tracks for continuous, real-time pitch-correction shifts while the user sings. Optionally, at least for certain embodiments or genres, harmony note tracks may be score coded for harmony shifts to captured vocals. Typically, a captured pitch-corrected (possibly harmonized) vocal performance is saved locally on the handheld device as one or more wav files and is subsequently compressed (e.g., using the lossless Apple Lossless Encoder, ALE, or the lossy Advanced Audio Coding, AAC, or Vorbis codec) and encoded for upload (106) to content server 110 as an MPEG-4 audio, m4a, or ogg container file. MPEG-4 is an international standard for the coded representation and transmission of digital multimedia content for the Internet, mobile networks and advanced broadcast applications. OGG is an open standard container format often used in association with the Vorbis audio format specification and codec for lossy audio compression. Other suitable codecs, compression techniques, coding formats and/or containers may be employed if desired.

Depending on the implementation, encodings of dry vocals and/or pitch-corrected vocals may be uploaded (106) to content server 110. In general, such vocals (encoded, e.g., as wav, m4a, ogg/Vorbis content or otherwise), whether already pitch-corrected or pitch-corrected at content server 110, can then be mixed (111), e.g., with backing audio and other captured (and possibly pitch-shifted) vocal performances, to produce files or streams of quality or coding characteristics selected in accord with the capabilities or limitations of a particular device (e.g., handheld 120) or network. For example, pitch-corrected vocals can be mixed with both the stereo and mono wav files to produce streams of differing quality. In some cases, a high quality stereo version can be produced for web playback and a lower quality mono version can be produced for streaming to devices such as the handheld device itself.

As described elsewhere herein, performances of multiple vocalists may be accreted in response to an open call. In some embodiments, one set of vocals (for example, in the illustration of FIG. 1, main vocals captured at handheld 101) may be accorded prominence (e.g., as lead vocals). In general, a user-selectable vocal effects schedule may be applied (112) to each captured and uploaded encoding of a vocal performance. For example, initially captured dry vocals may be processed (e.g., 112) at content server 100 in accord with a vocal effects schedule characteristic of Katy Perry's studio performance of "Hot N Cold." In some cases or embodiments, processing may include pitch correction (at server 100) in accord with the previously described pitch cues 105. In some embodiments, a resulting mix (e.g., captured, pitch-corrected main vocals that are EFX-applied and mixed with a compressed mono m4a format accompaniment and one or more additional vocals, the additional vocals themselves EFX-applied and pitch-shifted into respective harmony positions above or below the main vocals) may be supplied to another user at a remote device (e.g., handheld 120) for audible rendering (121) and/or for use as a second-generation accompaniment for capture of additional vocal performances.

Score-Coded Pitch Shifts and Vocal Effect Schedules

FIG. 2 is a flow diagram illustrating real-time continuous score-coded pitch correction and/or harmony generation for a captured vocal performance in accordance with some embodiments of the present invention. As previously described, as well as in the illustrated configuration, a user/vocalist sings along with an accompaniment, karaoke style. Vocals captured (251) from a microphone input 201 are continuously pitch-corrected (252) in real time to main vocal pitch cues or, in some cases, to corresponding harmony cues, for mix (253) with the accompaniment that is audibly rendered at one or more acoustic transducers 202. In some cases or embodiments, the audible rendering of captured vocals pitch-corrected to the "main" melody may optionally be mixed (254) with synchronized harmonies (HARMONY1, HARMONY2) derived from the captured vocals in accord with score-coded offsets.

As will be appreciated by persons of ordinary skill in the art, it is generally desirable to limit feedback loops from the transducer(s) 202 to microphone 201 (e.g., through use of headphones or earphones). Indeed, while much of the illustrative description herein builds upon features and capabilities that are familiar in mobile phone contexts and, in particular, relative to the Apple iPhone handheld, even portable computing devices without built-in microphone capabilities may act as a platform for vocal capture with continuous, real-time pitch correction and harmonization if headphone/microphone jacks are provided. The Apple iPod Touch handheld and the Apple iPad tablet are two such examples.

Pitch corrections (to main or harmony pitches) and optionally added harmonies are selected to correspond to a score 207 which, in the illustrated configuration, is wirelessly communicated (261) to the device at which vocal capture and pitch correction are to be performed (e.g., referring again to FIG. 1, from content server 110 to the iPhone handheld 101 or other portable computing device), together with lyrics 208 and an audio encoding of the accompaniment 209. One challenge faced in some designs and implementations is that harmonies may tend to sound good only if the user chooses to sing the expected melody of the song. If the user wants to embellish the song or sing his or her own version of it, the harmonies may sound suboptimal. To address this challenge, relative harmonies are pre-scored and coded for particular content (e.g., for a particular song and selected portions thereof). Target pitches for the harmonies are selected at runtime based both on the score and on what the user is singing. This approach has resulted in a compelling user experience.

In some embodiments of the techniques described herein, the note (in a current scale or key) that is closest to that sounded by the user/vocalist is determined from the score. This closest note may typically be the main pitch corresponding to the score-coded vocal melody, but it need not be. Indeed, in some cases, the user/vocalist may intend to sing a harmony, and the sounded notes may more closely approximate a harmony track. In either case, pitch corrector 252 and/or harmony generator 255 may synthesize the other portions of the desired score-coded chord by generating appropriately pitch-shifted versions of the captured vocals (even if the user/vocalist is intentionally singing a harmony). A dry vocals version of the user's captured vocal performance and, optionally, one or more of the resulting pitch-shifted versions, combined (254) or aggregated for mix (253) with the audibly rendered accompaniment, may be wirelessly communicated (262) to content server 110 or to a remote device (e.g., handheld 120).
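The closest-note selection described above can be sketched as follows, assuming that pitches are compared on the MIDI note scale (69 = A4 = 440 Hz) and that harmony targets are coded as semitone offsets from the melody note. The function names are hypothetical, and the sketch covers only target selection and the resulting frequency ratios, not pitch detection or the pitch-shifting signal processing itself.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def candidate_notes(melody_note, harmony_offsets):
    """Melody target plus the harmony targets coded as offsets from it."""
    return [melody_note] + [melody_note + off for off in harmony_offsets]

def closest_target(sung_hz, melody_note, harmony_offsets):
    """Score-coded note (melody or harmony) nearest to the sung pitch."""
    sung = hz_to_midi(sung_hz)
    return min(candidate_notes(melody_note, harmony_offsets),
               key=lambda note: abs(note - sung))

def shift_ratios(sung_hz, melody_note, harmony_offsets):
    """Frequency ratio needed to move the sung pitch onto each chord note."""
    sung = hz_to_midi(sung_hz)
    return {note: 2.0 ** ((note - sung) / 12.0)
            for note in candidate_notes(melody_note, harmony_offsets)}
```

If the vocalist sings near a harmony note, `closest_target` returns that harmony note as the correction target, and the remaining entries of `shift_ratios` give the ratios by which the captured vocals would be shifted to fill in the other notes of the chord, including the melody.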

Although content server 100-side application of vocal effects has been described, it will be appreciated that user-selectable vocal effects (EFX) schedules may similarly be applied in signal processing flows 250 implemented at a portable computing device (e.g., 101, 120). As before, a selected vocal effects (EFX) schedule, which in this case may be encoded and included in wireless transmission 261, includes settings and/or parameters for one or more of spectral equalization, audio compression, pitch correction, stereo delay and reverberation effects, for application to one or more respective portions of the user's captured vocal performance. In the illustrated configuration, an optional signal processing flow is provided for an audio signal encoding of dry vocals that is stored in local storage and mixed (253) with the previously described accompaniment for audible rendering using acoustic transducer 202. Typically, application of a user-selected vocal effects (EFX) schedule at the portable computing device is a post-processing application, although, depending on the nature and computational complexity of the selected EFX, real-time continuous processing (including score-coded pitch correction) may be provided in some embodiments.

Persons of ordinary skill in the art will appreciate that any of a variety of score-coding frameworks may be employed; however, the illustrative implementations described herein build on extensions to the widely used and standardized Musical Instrument Digital Interface (MIDI) data formats. Building on that framework, scores may be coded as a set of tracks represented in a MIDI file, data structure or container, which in some implementations or deployments includes:

· Control track: key changes, gain changes, pitch correction controls, harmony controls, etc.

· One or more lyrics tracks: lyric events, with display customizations.

· Pitch track: main melody (conventionally coded).

· One or more harmony tracks: harmony voice 1, 2... Depending on control track events, notes specified in a given harmony track may be interpreted as absolute scored pitches or as relative to the user's current pitch, corrected or uncorrected (depending on current settings).

· Chord track: although desired harmonies are set in the harmony tracks, if the user's pitch differs from the scored pitch, relative offsets may be maintained proximate to the note set of the current chord.

Building on the foregoing, significant score-coded specializations can be defined to establish runtime behaviors of pitch corrector 252 and/or harmony generator 255 and thereby provide a user experience and pitch-corrected vocals that (for a wide range of vocal skill levels) exceed what is achievable with conventional static harmonies.

Turning specifically to control track features, in some embodiments, the following text markers may be supported:

· Key : <string> : Notates the key (e.g., G sharp major g#M, E minor Em, B flat major BbM, etc.) to which sounded notes are corrected. Defaults to C.

· PitchCorrection : {ON, OFF} : Codes whether to correct the user/vocalist's pitch. Default is ON. May be turned ON and OFF at temporally synchronized points in the vocal performance.

· SwapHarmony : {ON, OFF} : Codes whether, if the pitch sounded by the user/vocalist corresponds most closely to a harmony, it is acceptable to pitch correct to the harmony rather than to the melody. Default is ON.

· Relative : {ON, OFF} : When ON, harmony tracks are interpreted as relative offsets from the user's current pitch (corrected in accord with other pitch correction settings). Offsets from the harmony tracks are their offsets relative to the scored pitch track. When OFF, harmony tracks are interpreted as absolute pitch targets for harmony shifts.

· Relative : {OFF, <+/- N> ... <+/- N>} : Unless OFF, harmony offsets (as many as desired) are relative to the scored pitch track, subject to any operant key or note sets.

· RealTimeHarmonyMix : {value} : Codes changes in mix ratio, at temporally synchronized points in the vocal performance, of main voice and harmonies in the audibly rendered harmony/main vocal mix. 1.0 is all harmony voices. 0.0 is all main voice.

· RecordedHarmonyMix : {value} : Codes changes in mix ratio, at temporally synchronized points in the vocal performance, of main voice and harmonies in the uploaded harmony/main vocal mix. 1.0 is all harmony voices. 0.0 is all main voice.
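The control track markers listed above lend themselves to a small parser that folds marker events into a running settings state. The sketch below is one possible reading, not the disclosed implementation. It assumes markers arrive as plain `Name : value` strings, that Relative offsets are whitespace-separated signed integers, and that the mix value blends linearly between main voice and harmonies. Only the defaults stated in the text (Key C, PitchCorrection ON, SwapHarmony ON) are used.

```python
# Defaults stated in the marker descriptions; other markers have no stated
# default and are therefore absent until a marker sets them.
DEFAULTS = {"Key": "C", "PitchCorrection": True, "SwapHarmony": True}

MIX_MARKERS = ("RealTimeHarmonyMix", "RecordedHarmonyMix")

def parse_marker(text):
    """Parse one 'Name : value' text marker into a (name, value) pair."""
    name, _, raw = text.partition(":")
    name, raw = name.strip(), raw.strip()
    if raw.upper() in ("ON", "OFF"):
        return name, raw.upper() == "ON"
    if name in MIX_MARKERS:
        return name, float(raw)
    if name == "Relative":
        # Assumed syntax: signed semitone offsets such as "+3 -4".
        return name, [int(token) for token in raw.split()]
    return name, raw

def apply_markers(markers):
    """Fold a time-ordered list of marker strings into a settings dict."""
    state = dict(DEFAULTS)
    for marker in markers:
        name, value = parse_marker(marker)
        state[name] = value
    return state

def harmony_mix(main_sample, harmony_sample, ratio):
    """Assumed linear blend: 0.0 is all main voice, 1.0 is all harmonies."""
    return (1.0 - ratio) * main_sample + ratio * harmony_sample
```

For example, `apply_markers(["Key : Em", "PitchCorrection : OFF"])` leaves SwapHarmony at its default of ON while switching the key and disabling correction from that point in the performance.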

Chord track events, in some embodiments, include text markers that notate a root and quality (e.g., C min7 or Ab maj) and allow a note set to be defined. Although desired harmonies are set in the harmony track(s), if the user's pitch differs from the scored pitch, relative offsets may be maintained proximate to the notes that are in the current chord. As used relative to a chord track of the score, the term "chord" will be understood to mean a set of available pitches, since chord track events need not encode standard chords in the usual sense. These and other score-coded pitch correction settings may be employed to further the inventive techniques described herein.
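One way to keep a relative harmony offset close to the current chord is to snap the offset pitch to the nearest note whose pitch class belongs to the chord's note set. The sketch below assumes MIDI note numbers and represents the note set as pitch classes 0 to 11. The function name, the downward tie-break and the example note set for C min7 are illustrative assumptions rather than details given in the disclosure.

```python
def snap_to_chord(note, pitch_classes):
    """Nearest MIDI note to `note` whose pitch class is in `pitch_classes`.

    Ties are resolved downward. If the set is empty, the note is returned
    unchanged.
    """
    for delta in range(0, 7):          # any pitch class is within 6 semitones
        for candidate in (note - delta, note + delta):
            if candidate % 12 in pitch_classes:
                return candidate
    return note

# Hypothetical note set for a "C min7" chord event: C, Eb, G, Bb.
C_MIN7 = {0, 3, 7, 10}

def relative_harmony(user_note, offset, pitch_classes):
    """Apply a relative offset to the user's pitch, then snap to the chord."""
    return snap_to_chord(user_note + offset, pitch_classes)
```

With this sketch, a user singing MIDI note 62 with a scored offset of +4 would nominally land on 66, which is outside the C min7 note set, so `relative_harmony(62, 4, C_MIN7)` moves the harmony to 67.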

Computational Techniques for Pitch Detection, Correction and Shifts

As will be appreciated by persons of ordinary skill in the art having the benefit of the present description, pitch-detection and correction techniques may be employed both for correction of a captured vocal signal to a target pitch or note and for generation of harmonies as pitch-shifted variants of the captured vocal signal. FIGS. 2 and 3 illustrate basic signal processing flows (250, 350) in accord with certain implementations suitable for an iPhone™ handheld, e.g., that illustrated as mobile device 101, to generate pitch-corrected and optionally harmonized vocals for audible rendering (locally and/or at a remote target device).

Based on the description herein, persons of ordinary skill in the art will appreciate suitable allocations of signal processing techniques (sampling, filtering, decimation, etc.) and data representations to functional blocks (e.g., decoder(s) 352, digital-to-analog (D/A) converter 351, capture 253 and encoder 355) of software executable to provide the signal processing flows 350 illustrated in FIG. 3. Likewise, relative to the signal processing flows 250 and illustrative score-coded note targets (including harmony note targets), persons of ordinary skill in the art will appreciate suitable allocations of signal processing techniques and data representations to functional blocks and signal processing constructs (e.g., decoder(s) 258, capture 251, digital-to-analog (D/A) converter 256, mixers 253, 254 and encoder 257) as in FIG. 2, implemented at least in part as software executable on a handheld or other portable computing device.

Building on any of a variety of suitable implementations of the foregoing signal processing constructs, we turn next to pitch detection and correction/shifting techniques that may be employed in the various embodiments described herein, including in furtherance of the pitch correction, harmony generation and combined pitch correction/harmonization blocks (252, 255 and 354) illustrated in FIGS. 2 and 3.

As will be appreciated by persons of ordinary skill in the art, pitch-detection and pitch-correction have a rich technological history in the music and voice coding arts. Indeed, a wide variety of feature picking, time-domain and even frequency-domain techniques have been employed in the art and may be employed in some embodiments in accord with the present invention. The present description does not seek to exhaustively inventory the wide variety of signal processing techniques that may be suitable in various designs or implementations in accord with the present description; rather, it summarizes certain techniques that have proved workable in implementations (such as mobile device applications) that contend with CPU-limited computational platforms.

Accordingly, in view of the above and without limitation, certain exemplary embodiments operate as follows:

1) Get a buffer of audio data containing the sampled user vocals.

2) Downsample from a 44.1 kHz sample rate by low-pass filtering and decimation to 22k (for use in pitch detection and correction of the sampled vocals as a main voice, typically to a score-coded melody note target) and to 11k (for pitch detection and shifting of harmony variants of the sampled vocals).

3) Call a pitch detector (PitchDetector::CalculatePitch()), which first checks whether the sampled audio signal is of sufficient amplitude and whether that sampled audio is not too noisy (excessive zero crossings) to proceed. If the sampled audio is acceptable, the CalculatePitch() method calculates an average magnitude difference function (AMDF) and executes logic to pick a peak that corresponds to an estimate of the pitch period. Additional processing refines that estimate. For example, in some embodiments, parabolic interpolation of the peak and adjacent samples may be employed. In some embodiments, and given adequate computational bandwidth, an additional AMDF may be run at a higher sample rate around the peak sample to get better frequency resolution.

4) Shift the main voice to a score-coded target pitch by using a pitch-synchronous overlap add (PSOLA) technique at a 22 kHz sample rate (for higher quality and overlap accuracy). The PSOLA implementation (Smola::PitchShiftVoice()) is called with data structures and class variables that contain the information (detected pitch, pitch target, etc.) needed to specify the desired correction. In general, the target pitch is selected based on score-coded targets (which change frequently in correspondence with a melody note track) and in accord with current scale/mode settings. Scale/mode settings may be updated in the course of a particular vocal performance, but usually not too often based on score-coded information, or in an a cappella or Freestyle mode based on user selections.

PSOLA techniques facilitate resampling of a waveform to produce a pitch-shifted variant while reducing aperiodic effects of a splice, and are well known in the art. PSOLA techniques build on the observation that it is possible to splice two periodic waveforms at similar points in their periodic oscillation (for example, at positive-going zero crossings, ideally with roughly the same slope) with a much smoother result if one cross fades between them during a segment of overlap. For example, given a quasi-periodic sequence such as

a b c d e d c b a b.1 c.1 d.1 e.1 d.1 c.1 b.1 a b.1 c.2

with samples {a, b, c, ...} and indices 0, 1, 2, ... (where the .1 symbols represent deviations from periodicity), and desiring to jump back or forward somewhere, one might pick the positive-going c-d transitions at indices 2 and 10 and, instead of just jumping, ramp

(1*c + 0*c.1), (7/8*d + 1/8*d.1), (6/8*e + 2/8*e.1), ...

until reaching (0*c + 1*c.1) at index 10/18, having jumped forward one period (8 indices) but made the aperiodicity less evident at the edit point. This is pitch synchronous because it is done at 8 samples, the closest period to what can be detected. Note that this cross fade is a linear/triangular overlap-add, but (more generally) complementary cosine, 1-cosine or other functions may be employed as desired.

5) Generate the harmony voices using a method that employs both PSOLA and linear predictive coding (LPC) techniques. The harmony notes are selected based on the current settings, which change often according to the score-coded harmony targets or which, in Freestyle, can be changed by the user. These are target pitches as described above; however, given the generally larger pitch shift for harmonies, a different technique may be employed. The main voice (now at 22k, or optionally 44k) is pitch-corrected to target using PSOLA techniques such as described above. Pitch shifts to the respective harmonies are likewise performed using PSOLA techniques. Linear predictive coding (LPC) is then applied to each to generate a residue signal for each harmony. LPC is applied to the main un-pitch-corrected voice at 11k (or optionally 22k) in order to derive a spectral template to apply to the pitch-shifted residues. This tends to avoid the head-size modulation problem (chipmunk or munchkinification for upward shifts, or making people sound like Darth Vader for downward shifts).

6) Finally, the residues are mixed together and used to re-synthesize the respective pitch-shifted harmonies using the filter defined by the LPC coefficients derived for the main un-pitch-corrected voice signal. The resulting mix of pitch-shifted harmonies is then mixed with the pitch-corrected main voice.

7) The resulting mix is upsampled back to 44.1k and mixed with the backing track (except in Freestyle mode), or with an improved-fidelity variant thereof, and buffered for handoff to the audio subsystem for playback.

As will be appreciated by persons of skill in the art, AMDF calculations are but one time-domain computational technique suitable for measuring periodicity of a signal. More generally, the term lag-domain periodogram describes a function that takes as input a time-domain function or series of discrete-time samples x(n) of a signal, and compares that function or signal to itself at a series of delays (i.e., in the lag domain) to measure periodicity of the original function x. This is done at lags of interest. Therefore, relative to the techniques described herein, examples of suitable lag-domain periodogram computations for pitch detection include subtracting, for a current block, the captured vocal input signal x(n) from a lagged version of the same (a difference function), taking the absolute value of that subtraction (AMDF), or multiplying the signal by its delayed version and summing the values (autocorrelation).

AMDF will show valleys at periods that correspond to frequency components of the input signal, while autocorrelation will show peaks. If the signal is non-periodic (e.g., noise), periodograms will show no clear peaks or valleys, except at the zero-lag position. Mathematically,

$$\mathrm{AMDF}(k) = \sum_{n} \lvert x(n) - x(n-k) \rvert$$

$$\mathrm{autocorrelation}(k) = \sum_{n} x(n)\, x(n-k)$$
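The pitch-detection step can be sketched as follows. This is a minimal illustration, not the CalculatePitch() implementation: the function names, the gating thresholds and the search range are assumptions, the AMDF is normalized by the number of summed terms, and the period estimate is taken at the deepest AMDF valley and refined by parabolic interpolation through that sample and its two neighbours.

```python
import math


def amdf(x, lag):
    """Average magnitude difference between x and itself delayed by `lag` samples."""
    n = len(x) - lag
    return sum(abs(x[i + lag] - x[i]) for i in range(n)) / n


def estimate_pitch(x, sample_rate, fmin=80.0, fmax=1000.0,
                   min_amplitude=0.01, max_zero_cross_rate=0.3):
    """Return an estimated pitch in Hz, or None if the block is rejected.

    All thresholds are illustrative assumptions.
    """
    # Reject blocks that are too quiet.
    if max(abs(s) for s in x) < min_amplitude:
        return None
    # Reject blocks with excessive zero crossings (a crude noisiness check).
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    if crossings / len(x) > max_zero_cross_rate:
        return None

    # Evaluate the AMDF only at lags of interest.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(x) // 2, int(sample_rate / fmin))
    values = [amdf(x, lag) for lag in range(min_lag, max_lag + 1)]
    k = min(range(len(values)), key=values.__getitem__)  # deepest valley
    lag = float(min_lag + k)

    # Parabolic interpolation through the valley and its two neighbours.
    if 0 < k < len(values) - 1:
        y0, y1, y2 = values[k - 1], values[k], values[k + 1]
        curvature = y0 - 2.0 * y1 + y2
        if curvature > 0:
            lag += 0.5 * (y0 - y2) / curvature
    return sample_rate / lag
```

Restricting the lag range keeps the cost proportional to the block length times the number of candidate lags, which is one reason the computation suits the decimated 11k and 22k signals.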

For implementations described herein, AMDF-based lag-domain periodogram calculations can be efficiently performed even using the computational facilities of current-generation mobile devices. Nonetheless, based on the description herein, persons of skill in the art will appreciate implementations that build on any of a variety of pitch detection techniques that may now, or in the future, become computationally tractable on a given target device or platform.
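The cross-faded splice described for the PSOLA step (step 4) can also be sketched in a few lines. The sketch below assumes a linear ramp and a known integer period; it jumps forward one period by blending the samples starting at a chosen index into the samples one period later. The function name and argument layout are hypothetical.

```python
def crossfade_splice(x, start, period):
    """Remove one period from x by cross fading across the edit point.

    Samples x[start + k] are faded out while samples x[start + period + k]
    are faded in, for k = 0 .. period, using a linear (triangular) ramp.
    The output is `period` samples shorter than the input.
    """
    if start + 2 * period >= len(x):
        raise ValueError("not enough samples after the splice point")
    out = list(x[:start])
    for k in range(period + 1):
        w = k / period  # ramps from 0.0 to 1.0 across the overlap
        out.append((1.0 - w) * x[start + k] + w * x[start + period + k])
    out.extend(x[start + 2 * period + 1:])
    return out
```

With `start=2` and `period=8` this reproduces the ramp of the worked example: the first blended sample is entirely the sample at index 2, and the last is entirely the sample one period after index 10. For a perfectly periodic input the splice is inaudible, since the output equals the input with one period removed.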

Accretion of Vocal Performances in Response to an "Open Call"

Once a vocal performance is captured at the handheld device, the captured vocal performance audio (typically dry vocals, but optionally pitch corrected) is compressed using an audio codec (e.g., an Advanced Audio Coding (AAC) or Ogg Vorbis codec) and uploaded to a content server. FIGS. 1, 2 and 3 each depict such uploads. In general, the content server (e.g., content server 110, 310) then processes (112, 312) the uploaded dry vocals in accord with a selected vocal effects (EFX) schedule and applicable score-coded pitch correction sets. The content server then remixes (111, 311) this captured, pitch-corrected, EFX-applied vocal performance encoding with other content. For example, the content server may mix such vocals with a high-quality or high-fidelity instrumental (and/or background vocal) track to create a high-fidelity master audio of the mixed performance. Other captured vocal performances may also be mixed in, as illustrated in FIG. 1 and described herein.

In general, the resulting master may, in turn, be encoded using an appropriate codec (e.g., an AAC codec) at various bit rates and/or with selected vocals afforded prominence, to produce compressed audio files that are suitable for streaming back to the capturing handheld device (and/or other remote devices) and for streaming/playback via the web. In general, relative to the capabilities of commonly deployed wireless networks, it can be desirable from an audio data bandwidth perspective to limit the uploaded data to that necessary to represent the vocal performance, while mixing when and where needed. In some cases, data streamed for playback or for use as a second (or Nth) generation backing track may separately encode vocal tracks for mix with a first-generation backing track at the audible rendering target. In general, vocal and/or backing track audio exchange between the handheld device and the content server may be adapted to the quality and capabilities of an available data communication channel.

Relative to certain social network constructs that, in some embodiments of the present invention, facilitate open call handling, additional or alternative mixes may be desirable. For example, in some embodiments, an accretion of pitch-corrected, EFX-applied vocals captured from an initial or prior contributor may form the basis of a backing track used in a subsequent vocal capture from another user/vocalist (e.g., at another handheld device). Accordingly, where supply and use of backing tracks is illustrated and described herein, it will be understood that vocals captured, pitch-corrected and EFX-applied (and possibly, though not typically, harmonized) may themselves be mixed to produce a "backing track" used to motivate, guide or frame subsequent vocal capture.

In general, additional vocalists may be invited to sing a particular part (e.g., tenor, part B in a duet, etc.) or simply to sign up, whereupon content server 110 may pitch shift and place their captured vocals into one or more positions within an open call or virtual chorus. Typically, the user-vocalist who initiates an open call selects the slots or positions (characterized temporally, by performance template/blueprint, by applicable pitch cues and/or by applied EFX) into which subsequently accreted vocal performances are slotted or placed. Although mixed vocals may be included in such a backing track, it will be understood that, because the illustrated and described systems separately capture individual vocal performances and apply vocal effects schedules and pitch correction, the content server (e.g., content server 110) is in a position to manipulate (112) mixes in ways that further the objectives of a virtual chorus or accommodate the sensibilities of the user-vocalist who initiated the open call.

For example, in some embodiments of the present invention, alternative mixes of three different contributing vocalists may be presented in a variety of ways. Mixes provided to (or for) a first contributor may feature that first contributor's vocals more prominently than those of the other two (e.g., as lead vocals with appropriate pitch correction to the main melody and with application of an artist-, song-, performance- or musical genre-specific vocal effects (EFX) schedule). In general, content server 110 may alter the mixes to make one vocal performance more prominent than others by manipulating the pitch corrections and EFX applied to the various captured vocals therein.
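One simple way to picture per-contributor mixes is with gain weights, as in the sketch below. This is only an assumption made for illustration: the text describes prominence as being achieved by manipulating the pitch corrections and EFX applied to each vocal, and does not prescribe gain-based mixing. The function name and default gains are hypothetical.

```python
def featured_mix(tracks, featured, featured_gain=1.0, other_gain=0.4):
    """Mix equal-rate vocal tracks so that one contributor is more prominent.

    `tracks` maps a contributor name to a list of samples. Gains are
    normalized so that the summed weights equal 1.0.
    """
    if featured not in tracks:
        raise KeyError(featured)
    gains = {name: (featured_gain if name == featured else other_gain)
             for name in tracks}
    total = sum(gains.values())
    length = min(len(samples) for samples in tracks.values())
    return [sum(gains[name] * tracks[name][i] for name in tracks) / total
            for i in range(length)]
```

Calling `featured_mix` once per contributor, each time with a different `featured` name, yields the alternative mixes in which each vocalist hears his or her own performance as the lead.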

World Stage

Although much of the description herein has focused on vocal performance capture, pitch correction and use of respective first and second encodings of a backing track relative to capture and mix of a user's own vocal performances, it will be understood that facilities for audible rendering of remotely captured performances of others may be provided in some situations or embodiments. In such situations or embodiments, vocal performance capture occurs at another device and, after a corresponding encoding of the captured (and typically pitch-corrected) vocal performance is received at the present device, it is audibly rendered in association with a visual display animation suggestive of the vocal performance emanating from a particular location on a globe. FIG. 1 illustrates a snapshot of such a visual display animation at handheld 120, which, for purposes of the present illustration, will be understood as another instance of a programmed mobile phone (or other portable computing device) such as described and illustrated with reference to handheld device instances 101 and 301 (see FIG. 3), except that (as depicted in the snapshot) handheld 120 is operating in a play (or listener) mode, rather than the capture and pitch-correction mode described at length above.

When a user executes the handheld application and accesses this play (or listener) mode, a world stage is presented. More specifically, a network connection is made to content server 110, reporting the handheld's current network connectivity status and playback preference (e.g., random global, most popular, my performances, etc.). Based on these parameters, content server 110 selects a performance (e.g., a pitch-corrected, EFX-applied vocal performance such as may have been initially captured at handheld device instance 101 or 301) and transmits metadata associated therewith. In some implementations, the metadata includes a uniform resource locator (URL) that allows handheld 120 to retrieve the actual audio stream (high quality or low quality depending on the size of the pipe), as well as additional information such as the geocoded (using GPS) location of the vocal performance capture (including geocodes for additional vocal performances included as harmonies or backup vocals) and attributes of other listeners who have loved, tagged or left comments for the particular performance. In some embodiments, listener feedback is itself geocoded. During playback, the user may tag the performance and leave his or her own feedback or comments for a subsequent listener and/or for the original vocal performer. Once a performance is tagged, a relationship may be established between the performer and the listener. In some cases, the listener may be allowed to filter for additional performances by the same performer, and the server is also able to more intelligently provide "random" new performances for the user to listen to, based on an evaluation of user preferences.
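The kind of metadata described above might be represented as in the following sketch. Every field name, the two quality labels and the bandwidth threshold are hypothetical, chosen only to mirror the items the text lists (a stream URL, capture geocodes, and geocoded listener feedback); the actual wire format is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Geocode = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class ListenerFeedback:
    kind: str                          # e.g. "love", "tag" or "comment"
    text: str = ""
    geocode: Optional[Geocode] = None  # where the feedback was sent from


@dataclass
class PerformanceMetadata:
    stream_urls: Dict[str, str]        # quality label ("high"/"low") -> URL
    capture_geocode: Geocode           # where the lead vocal was captured
    harmony_geocodes: List[Geocode] = field(default_factory=list)
    feedback: List[ListenerFeedback] = field(default_factory=list)

    def stream_url(self, bandwidth_kbps, high_quality_floor_kbps=256):
        """Pick the high- or low-quality stream based on available bandwidth."""
        quality = "high" if bandwidth_kbps >= high_quality_floor_kbps else "low"
        return self.stream_urls[quality]
```

A listener-mode client could use `capture_geocode` and the feedback geocodes to place markers on the globe, and `stream_url` to choose which encoding to fetch.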

Although not specifically illustrated in the snapshot, it will be appreciated that geocoded listener feedback indications are, or may optionally be, presented on the globe (e.g., as stars, "thumbs up" or the like) at positions that suggest, consistent with the geocoded metadata, the respective geographic locations from which the corresponding listener feedback was transmitted. It will be further appreciated that, in some embodiments, the visual display animation is interactive and subject to viewpoint manipulation in correspondence with user interface gestures captured at a touch screen display of handheld 120. For example, in some embodiments, travel of a finger or stylus across a displayed image of the globe in the visual display animation causes the globe to rotate around an axis generally orthogonal to the direction of finger or stylus travel. Both the visual display animation suggestive of the vocal performance emanating from a particular location on the globe and the listener feedback indications are presented in such an interactive, rotating globe user interface at positions consistent with their respective geotags.
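The rotation behaviour can be sketched with a little vector arithmetic. The example below assumes screen coordinates with the viewing direction along z, derives an axis in the screen plane that is orthogonal to the drag, and rotates a point with Rodrigues' rotation formula. The sensitivity constant, sign convention and function names are assumptions for illustration.

```python
import math


def rotation_from_drag(dx, dy, radians_per_pixel=0.01):
    """Map a drag of (dx, dy) pixels to a unit rotation axis and an angle.

    The axis lies in the screen plane and is orthogonal to the drag direction.
    """
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return (0.0, 1.0, 0.0), 0.0  # no travel, no rotation
    axis = (-dy / distance, dx / distance, 0.0)
    return axis, distance * radians_per_pixel


def rotate(point, axis, angle):
    """Rotate a 3-D point about a unit axis using Rodrigues' formula."""
    px, py, pz = point
    kx, ky, kz = axis
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * px + ky * py + kz * pz
    cross = (ky * pz - kz * py, kz * px - kx * pz, kx * py - ky * px)
    return (px * c + cross[0] * s + kx * dot * (1.0 - c),
            py * c + cross[1] * s + ky * dot * (1.0 - c),
            pz * c + cross[2] * s + kz * dot * (1.0 - c))
```

Applying `rotate` to each geotagged marker position (converted to a point on the unit sphere) keeps performance and feedback indications attached to the globe as it turns.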

An Exemplary Mobile Device

FIG. 4 illustrates features of a mobile device that may serve as a platform for execution of software implementations in accordance with some embodiments of the present invention. More specifically, FIG. 4 is a block diagram of a mobile device 400 that is generally consistent with commercially available versions of an iPhone™ mobile digital device. Although embodiments of the present invention are not limited to iPhone deployments or applications (or even to iPhone-type devices), the iPhone device, together with its rich complement of sensors, multimedia facilities, application programmer interfaces and wireless application delivery model, provides a highly capable platform on which to deploy certain implementations. Based on the description herein, persons of ordinary skill in the art will appreciate a wide range of additional mobile device platforms that may be suitable (now or hereafter) for a given implementation or deployment of the inventive techniques described herein.

Summarizing briefly, mobile device 400 includes a display 402 that can be sensitive to haptic and/or tactile contact with a user. Touch-sensitive display 402 can support multi-touch features, processing multiple simultaneous touch points, including processing data related to the pressure, degree and/or position of each touch point. Such processing facilitates gestures and interactions with multiple fingers, chording, and other interactions. Of course, other touch-sensitive display technologies can also be used, e.g., a display in which contact is made using a stylus or other pointing device.

Typically, mobile device 400 presents a graphical user interface on the touch-sensitive display 402, providing the user access to various system objects and conveying information. In some implementations, the graphical user interface can include one or more display objects 404, 406. In the example shown, the display objects 404, 406 are graphic representations of system objects. Examples of system objects include device functions, applications, windows, files, alerts, events, or other identifiable system objects. In some embodiments of the present invention, applications, when executed, provide at least some of the digital acoustic functionality described herein.

Typically, the mobile device 400 supports network connectivity including, for example, both mobile radio and wireless internetworking functionality, to enable the user to travel with the mobile device 400 and its associated network-enabled functions. In some cases, the mobile device 400 can interact with other devices in the vicinity (e.g., via Wi-Fi, Bluetooth, etc.). For example, mobile device 400 can be configured to interact with peers or a base station for one or more devices. As such, mobile device 400 may grant or deny network access to other wireless devices.

Mobile device 400 includes a variety of input/output (I/O) devices, sensors and transducers. For example, a speaker 460 and a microphone 462 are typically included to facilitate audio, such as the capture of vocal performances and audible rendering of backing tracks and mixed pitch-corrected vocal performances as described elsewhere herein. In some embodiments of the present invention, speaker 460 and microphone 462 may provide appropriate transducers for techniques described herein. An external speaker port 464 can be included to facilitate hands-free voice functionalities, such as speaker phone functions. An audio jack 466 can also be included for use of headphones and/or a microphone. In some embodiments, an external speaker and/or microphone may be used as a transducer for the techniques described herein.

Other sensors can also be used or provided. A proximity sensor 468 can be included to facilitate detection of user positioning of the mobile device 400. In some implementations, an ambient light sensor 470 can be utilized to facilitate adjusting the brightness of the touch-sensitive display 402. An accelerometer 472 can be utilized to detect movement of the mobile device 400, as indicated by directional arrow 474. Accordingly, display objects and/or media can be presented according to a detected orientation, e.g., portrait or landscape. In some implementations, the mobile device 400 may include circuitry and sensors for supporting a location-determining capability, such as that provided by the Global Positioning System (GPS) or other positioning systems (e.g., systems using Wi-Fi access points, television signals, cellular grids, or Uniform Resource Locators (URLs)), to facilitate the geocodings described herein. The mobile device 400 can also include a camera lens and sensor 480. In some implementations, the camera lens and sensor 480 can be located on the back surface of the mobile device 400. The camera can capture still images and/or video for association with captured pitch-corrected vocals.

The mobile device 400 can also include one or more wireless communication subsystems, such as an 802.11b/g communication device and/or a Bluetooth™ communication device 488. Other communication protocols can also be supported, including other 802.x communication protocols (e.g., WiMAX, Wi-Fi, 3G), code division multiple access (CDMA), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), etc. A port device 490, e.g., a Universal Serial Bus (USB) port, a docking port or some other wired port connection, can be included and used to establish a wired connection to other computing devices, such as other communication devices 400, network access devices, a personal computer, a printer, or other processing devices capable of receiving and/or transmitting data. The port device 490 may also allow the mobile device 400 to synchronize with a host device using one or more protocols, such as, for example, TCP/IP, HTTP, UDP and any other known protocol.

FIG. 5 illustrates respective instances (501 and 520) of a portable computing device, such as the mobile device 400, programmed with user interface code, pitch correction code, an audio rendering pipeline and playback code in accord with the functional descriptions herein. Device instance 501 operates in a vocal capture and continuous pitch correction mode, while device instance 520 operates in a listener mode. Both communicate via wireless data transport and intervening networks 504 with a server 512 or service platform that hosts the storage and/or functionality explained herein with regard to content server 110, 210. Captured, pitch-corrected vocal performances may (optionally) be streamed from and audibly rendered at laptop computer 511.

Other Embodiments

While the invention(s) is (are) described with reference to various embodiments, it will be understood that these embodiments are illustrative and that the scope of the invention(s) is not limited to them. Many variations, modifications, additions and improvements are possible. For example, while pitch-corrected vocal performances captured in accord with a karaoke-style interface have been described, other variations will be appreciated. Furthermore, while certain illustrative signal processing techniques have been described in the context of certain illustrative applications, persons of ordinary skill in the art will recognize that it is straightforward to modify the described techniques to accommodate other suitable signal processing techniques and effects.

Embodiments in accordance with the present invention may take the form of, and/or be provided as, a computer program product encoded in a machine-readable medium as instruction sequences and other functional constructs of software, which may in turn be executed in a computational system (such as an iPhone handheld, a mobile or portable computing device, or a content server platform) to perform the methods described herein. In general, a machine-readable medium can include tangible articles that encode information in a form (e.g., as applications, source or object code, functionally descriptive information, etc.) readable by a machine (e.g., a computer, or the computational facilities of a mobile device or portable computing device), as well as tangible storage incident to transmission of the information. A machine-readable medium may include, but is not limited to, magnetic storage media (e.g., disks and/or tape storage); optical storage media (e.g., CD-ROM, DVD, etc.); magneto-optical storage media; read-only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of media suitable for storing electronic instructions, operation sequences, functionally descriptive information encodings, etc.

In general, plural instances may be provided for components, operations or structures described herein as a single instance. Boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in the exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions and improvements may fall within the scope of the invention(s).

Claims

1. A method comprising:
using a portable computing device for vocal performance capture, the portable computing device having a touch screen, a microphone interface and a communication interface;
responsive to a user selection on the touch screen, retrieving via the communication interface a vocal score temporally synchronized with a corresponding accompaniment and lyrics, the vocal score encoding a sequence of target notes for at least part of a vocal performance against the accompaniment;
at the portable computing device, audibly rendering the accompaniment and concurrently presenting corresponding portions of the lyrics on a display in temporal correspondence with the accompaniment;
capturing via the microphone interface, and in temporal correspondence with the accompaniment, a vocal performance of the user;
storing at the portable computing device a dry vocal version of the user's captured vocal performance while, in accord with the vocal score, performing at the portable computing device continuous real-time pitch shifting of at least some portions of the user's captured vocal performance and mixing the resulting pitch-shifted vocal performance of the user into the audible rendering of the accompaniment; and
applying at least one vocal effect schedule to the user's captured vocal performance, wherein the vocal effect schedule includes a computer-readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, stereo delay and reverberation effects.
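By way of illustration only, the relationship between the vocal score and the pitch shifting recited above might be sketched as follows. The sketch assumes, hypothetically, that target notes are encoded as (onset, duration, MIDI note number) tuples and that a pitch estimate in Hz is already available for the current block; neither the `VocalScore` class nor the `shift_ratio` helper appears in the specification, and the resampling or synthesis that actually applies the ratio is out of scope here.

```python
import bisect
from typing import List, Optional, Tuple


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to Hz (assumes A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


class VocalScore:
    """Hypothetical encoding: (onset_s, duration_s, midi_note) per target note."""

    def __init__(self, notes: List[Tuple[float, float, int]]):
        self._notes = sorted(notes)
        self._onsets = [n[0] for n in self._notes]

    def target_at(self, t_s: float) -> Optional[int]:
        """Return the target note sounding at time t_s, or None during a rest."""
        i = bisect.bisect_right(self._onsets, t_s) - 1
        if i < 0:
            return None
        onset, duration, note = self._notes[i]
        return note if t_s < onset + duration else None


def shift_ratio(detected_hz: float, score: VocalScore, t_s: float) -> float:
    """Frequency ratio that would move the detected pitch onto the target note.

    Returns 1.0 (no shift) when the score has no target at t_s or when the
    block is unvoiced (detected_hz <= 0).
    """
    target = score.target_at(t_s)
    if target is None or detected_hz <= 0.0:
        return 1.0
    return midi_to_hz(target) / detected_hz
```

A block sung at 220 Hz while the score calls for A4 would yield a ratio of 2.0, whereas a block falling in a rest is passed through unshifted.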

2. The method of claim 1,
wherein the vocal effect schedule codes differing effects for application to respective portions of the user's vocal performance in temporal correspondence with the accompaniment or lyrics.
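As a non-limiting sketch, a vocal effect schedule that codes differing effects for respective portions of a performance could be represented as a list of time-bounded segments, each carrying its own settings. The class names, field names and setting keys below (`reverb_mix`, `compression_ratio`, `stereo_delay_ms`) are hypothetical and chosen only for illustration; the specification does not prescribe a particular encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EffectSegment:
    """Settings that apply from start_s (inclusive) to end_s (exclusive)."""
    start_s: float
    end_s: float
    settings: Dict[str, float]


@dataclass
class VocalEffectSchedule:
    """Hypothetical schedule: an ordered list of time-bounded effect settings."""
    name: str
    segments: List[EffectSegment] = field(default_factory=list)

    def settings_at(self, t_s: float) -> Optional[Dict[str, float]]:
        """Return the settings in force at time t_s, or None if unscheduled."""
        for seg in self.segments:
            if seg.start_s <= t_s < seg.end_s:
                return seg.settings
        return None


# Example: a light reverb in the verse, heavier reverb plus delay in the chorus.
schedule = VocalEffectSchedule(
    name="demo",
    segments=[
        EffectSegment(0.0, 15.0, {"reverb_mix": 0.10, "compression_ratio": 2.0}),
        EffectSegment(15.0, 30.0, {"reverb_mix": 0.35, "stereo_delay_ms": 120.0}),
    ],
)
```

Because the segments are keyed to performance time, the same lookup can be driven by the accompaniment clock during live rendering or replayed later against a stored dry vocal.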

3. The method of claim 1 or 2,
wherein the vocal effect schedule includes at least one of [...]

4. The method of claim 1 or 2,
wherein the vocal effect schedule is characteristic of a particular artist, song or performance.

5. The method of any one of claims 1 to 4,
further comprising transacting, from the portable computing device, a purchase or license of at least a portion of the vocal effect schedule.

6. The method of claim 5,
further comprising, in furtherance of the transacting, retrieving via the communication interface, or unlocking a preexisting stored instance of, the computer-readable encoding of the vocal effect schedule.

7. The method of any one of claims 1 to 6, further comprising:
computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score; and
based on a threshold figure of merit, awarding the user a license or access to at least a portion of the vocal effect schedule.
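One possible way to evaluate such a correspondence, offered only as a sketch, is to score the fraction of voiced frames whose detected pitch lies within a tolerance of the target pitch. The 50-cent tolerance, the 0.8 threshold and the function names are assumptions made for illustration; the claims do not specify how the figure of merit is computed.

```python
import math
from typing import Sequence


def cents_off(detected_hz: float, target_hz: float) -> float:
    """Signed pitch error in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(detected_hz / target_hz)


def figure_of_merit(
    detected_hz: Sequence[float],
    target_hz: Sequence[float],
    tolerance_cents: float = 50.0,
) -> float:
    """Fraction of scored frames sung within tolerance of the target pitch.

    Frames where either value is <= 0 (unvoiced input or a rest in the score)
    are skipped rather than counted as misses.
    """
    if len(detected_hz) != len(target_hz):
        raise ValueError("frame sequences must have equal length")
    scored = hits = 0
    for d, t in zip(detected_hz, target_hz):
        if d <= 0.0 or t <= 0.0:
            continue
        scored += 1
        if abs(cents_off(d, t)) <= tolerance_cents:
            hits += 1
    return hits / scored if scored else 0.0


def unlocks_effect(fom: float, threshold: float = 0.8) -> bool:
    """Award access only when the figure of merit meets the threshold."""
    return fom >= threshold
```

The same gate could equally be applied to the re-capture authorization recited in later claims by swapping what is awarded when the threshold is met.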

8. The method of any one of claims 1 to 7,
wherein the vocal effect schedule is subsequently applied to the dry vocal version of the user's captured vocal performance.

9. The method of claim 8,
wherein the subsequent application to the dry vocals is performed at the portable computing device,
the method further comprising audibly re-rendering, at the portable computing device, the user's captured vocal performance with the pitch shifting and the vocal effects applied.

10. The method of claim 8,
further comprising transmitting an audio signal encoding of the dry vocal version of the user's captured vocal performance to a remote service or server via the communication interface for the subsequent application of the vocal effect schedule at the remote service or server.

11. The method of claim 10,
further comprising including, in connection or association with the transmitted audio signal encoding of the dry vocals, an open call indication that the user's captured vocal performance constitutes only one of a plurality of vocal performances to be combined at the remote service or server.

12. The method of claim 11,
wherein the open call indication directs the remote service or server to request, from one or more other vocalists, one or more additional vocal performances to be mixed for audible rendering with the user's vocal performance.

13. The method of claim 11,
wherein the request is directed to one or more of:
an enumerated set of potential other vocalists identified by the user;
members of an affinity group defined or recognized by the remote service or server; and
a set of the user's social network relations.

14. The method of claim 11,
wherein the open call indication identifies, for at least one additional vocalist position, a second vocal score and a second utterance for provision to a responding additional vocalist.

15. The method of claim 14,
wherein the open call indication further identifies, for the at least one additional vocalist position, a second vocal effect schedule for application to the vocal performance of the responding additional vocalist.

16. The method of claim 10, further comprising:
receiving from the remote service or server a version of the user's captured vocal performance processed in accordance with the vocal effect schedule; and
audibly re-rendering, at the portable computing device, the user's captured vocal performance with the vocal effects applied.

17. The method of any one of claims 1 to 16,
wherein the vocal effect schedule is applied at the portable computing device in a rendering pipeline that includes the continuous real-time pitch shifting, such that the audible rendering includes the scheduled vocal effects.

18. The method of any one of claims 1 to 17,
further comprising transacting, from the portable computing device, authorization to initiate vocal re-capture of a user-selected portion of the previously captured vocal performance.

19. The method of any one of claims 1 to 18,
further comprising computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user authorization to initiate vocal re-capture.

20. The method of any one of claims 1 to 19,
wherein the continuous real-time pitch shifting is based on continuous time-domain estimation of pitch for the user's captured vocal performance.

21. The method of claim 20,
wherein the continuous time-domain pitch estimation includes computing a lag-domain periodogram for a current block of a sampled signal corresponding to the user's captured vocal performance, the lag-domain periodogram computation including, for an analysis window of the sampled signal, evaluation of an average magnitude difference function (AMDF) or an autocorrelation function over a range of lags.
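For illustration, a lag-domain periodogram based on the AMDF can be sketched as below. This is a minimal, unoptimized sketch under stated assumptions: the search range in Hz, the function names and the simple "smallest AMDF value wins" rule are not taken from the specification, and a practical estimator would add safeguards against octave errors and unvoiced input.

```python
import math
from typing import List, Sequence


def amdf(window: Sequence[float], min_lag: int, max_lag: int) -> List[float]:
    """Average magnitude difference function over lags min_lag..max_lag.

    Entry k holds the mean of |x[i] - x[i + lag]| for lag = min_lag + k.
    A periodic signal produces a deep dip at lags equal to its period.
    """
    n = len(window)
    if not (0 < min_lag <= max_lag < n):
        raise ValueError("need 0 < min_lag <= max_lag < len(window)")
    values = []
    for lag in range(min_lag, max_lag + 1):
        total = sum(abs(window[i] - window[i + lag]) for i in range(n - lag))
        values.append(total / (n - lag))
    return values


def estimate_pitch_hz(
    window: Sequence[float],
    sample_rate: float,
    min_hz: float = 80.0,
    max_hz: float = 1000.0,
) -> float:
    """Pick the lag with the smallest AMDF value and convert it to Hz."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(window) - 1, int(sample_rate / min_hz))
    values = amdf(window, min_lag, max_lag)
    best = min(range(len(values)), key=values.__getitem__)
    return sample_rate / (min_lag + best)
```

Restricting the lag range to the plausible vocal range keeps the per-block cost bounded, which matters when the estimate must be refreshed continuously on a mobile device. An autocorrelation-based variant would search for a maximum over the same lags instead of a minimum.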

22. The method of any one of claims 1 to 21,
further comprising, responsive to the user selection, also retrieving the accompaniment via the communication interface.

23. The method of any one of claims 1 to 22,
wherein the accompaniment resides in storage local to the portable computing device, and
wherein the retrieving uses an identifier ascertainable from the locally stored accompaniment to identify the vocal score temporally synchronizable with the corresponding accompaniment and lyrics.

24. The method of any one of claims 1 to 23,
wherein the accompaniment includes one or both of instrumentals and backing vocals and is rendered in multiple versions, and
wherein the version of the accompaniment audibly rendered in correspondence with the lyrics is a monophonic scratch version, and the version of the accompaniment mixed with the pitch-corrected version of the user's vocal performance is a polyphonic version of higher quality or fidelity than the monophonic scratch version.

25. The method of any one of claims 1 to 24,
wherein the portable computing device is selected from the group of:
a mobile phone;
a personal digital assistant;
a media player or gaming device; and
a laptop computer, notebook computer, tablet computer or netbook.

26. The method of any one of claims 1 to 25,
wherein the display includes the touch screen.

27. The method of any one of claims 1 to 26,
wherein the display is wirelessly coupled to the portable computing device.

28. The method of claim 11,
further comprising geocoding the transmitted audio signal encoding of the dry vocals.

29. The method of claim 28, further comprising:
receiving via the communication interface, from the remote service or server, an audio signal encoding that includes a second vocal performance captured at a remote device; and
displaying a geographic origin for the second vocal performance in correspondence with an audible rendering that includes the second vocal performance.

30. The method of claim 29,
wherein the display of the geographic origin is by a display animation suggestive of a performance emanating from a particular location on the globe.

31. A method comprising:
using a portable computing device for vocal performance capture, the portable computing device having a touch screen, a microphone interface and a communication interface;
responsive to a user selection on the touch screen, retrieving via the communication interface a vocal score temporally synchronized with a corresponding accompaniment and lyrics, the vocal score encoding a sequence of target notes for at least part of a vocal performance against the accompaniment;
at the portable computing device, audibly rendering the accompaniment and concurrently presenting corresponding portions of the lyrics on a display in temporal correspondence with the accompaniment;
capturing via the microphone interface, and in temporal correspondence with the accompaniment, a vocal performance of the user; and
transmitting to a remote service or server, via the communication interface, an audio signal encoding of a dry vocal version of the user's captured vocal performance together with a selection of at least one vocal effect schedule to be applied to the user's captured vocal performance.

32. The method of claim 31,
further comprising applying the selected vocal effect schedule at the remote service or server.

33. The method of claim 31 or 32,
further comprising, at the portable computing device and in accord with the vocal score, performing continuous real-time pitch shifting of at least some portions of the user's captured vocal performance and mixing the resulting pitch-shifted vocal performance of the user into the audible rendering of the accompaniment.

34. The method of any one of claims 31 to 33,
wherein the selected vocal effect schedule includes a computer-readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, pitch correction, stereo delay and reverberation effects, for application to one or more respective portions of the user's vocal performance.

35. The method of any one of claims 31 to 34,
wherein the vocal effect schedule is characteristic of a particular artist, song or performance.

36. The method of any one of claims 31 to 35,
wherein the vocal effect schedule includes at least one of [...]

37. The method of any one of claims 31 to 36,
further comprising transacting, from the portable computing device, a purchase or license of at least a portion of the vocal effect schedule.

38. The method of any one of claims 31 to 37,
further comprising computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user a license or access to at least a portion of the vocal effect schedule.

39. The method of any one of claims 31 to 38,
further comprising transacting, from the portable computing device, authorization to re-capture a selected portion of the vocal performance.

40. The method of any one of claims 31 to 38,
further comprising computationally evaluating correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, awarding the user authorization to re-capture a selected portion of the vocal performance.

41. A portable computing device comprising:
a microphone interface, an audio transducer interface and a data communication interface;
user interface code executable on the portable computing device to capture a user interface gesture selective for an accompaniment and to initiate retrieval of at least a vocal score corresponding thereto, the user interface code further executable to capture user interface gestures to initiate (i) audible rendering of the accompaniment; (ii) concurrent presentation of lyrics on a display; (iii) capture of the user's vocal performance using the microphone interface; and (iv) storage of a dry vocal version of the captured vocal performance to computer-readable storage;
pitch correction code executable on the portable computing device to continuously, and in real time, pitch correct the captured vocal performance in accord with the vocal score; and
a rendering pipeline executable to mix the user's pitch-corrected vocal performance into the audible rendering of the accompaniment against which the user's vocal performance is captured, the rendering pipeline applying a vocal effect schedule to the user's captured vocal performance, wherein the vocal effect schedule is selectable by the user and includes a computer-readable encoding of settings and/or parameters for one or more of spectral equalization, audio compression, stereo delay and reverberation effects, for application to one or more portions of the user's vocal performance.

42. The portable computing device of claim 41,
further comprising the display.

43. The portable computing device of claim 41 or 42,
wherein the data communication interface provides a wireless interface to the display.

44. The portable computing device of any one of claims 41 to 43,
wherein the user interface code is further executable to capture a user interface gesture indicative of a user selection of a vocal effect schedule and, responsive thereto, to transmit an audio signal encoding of the dry vocal version of the user's captured vocal performance to a remote service or server via the data communication interface for subsequent application of the selected vocal effect schedule at the remote service or server.

45. The portable computing device of claim 44,
wherein the transmission includes, in connection or association with the audio signal encoding of the dry vocals, an open call indication that the user's captured vocal performance constitutes only one of a plurality of vocal performances to be combined at the remote service or server.

46. The portable computing device of any one of claims 41 to 45,
further comprising code executable on the portable computing device to evaluate correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, to award the user a license or access to at least a portion of the vocal effect schedule.

47. The portable computing device of any one of claims 41 to 45,
further comprising code executable on the portable computing device to evaluate correspondence of at least a portion of the user's captured vocal performance with the vocal score and, based on a threshold figure of merit, to award the user authorization to re-capture a selected portion of the vocal performance.

48. The portable computing device of any one of claims 41 to 47,
further comprising local storage,
wherein the initiated retrieval includes checking instances, if any, of the vocal score information in the local storage against instances available from a remote server, and retrieving from the remote server if the instance in the local storage is unavailable or out of date.
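The retrieval behavior recited in the preceding claim can be sketched, by way of example only, as a version check against the remote server before a locally stored instance is used. The `ScoreInstance` type, the integer `version` field and the two callables standing in for the remote server are hypothetical; the claim does not state how staleness is determined.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ScoreInstance:
    """Hypothetical stored instance of vocal score information."""
    song_id: str
    version: int
    payload: bytes


def retrieve_vocal_score(
    song_id: str,
    local_store: Dict[str, ScoreInstance],
    remote_version: Callable[[str], int],
    remote_fetch: Callable[[str], ScoreInstance],
) -> ScoreInstance:
    """Use the local instance if it is current; otherwise fetch and cache.

    remote_version and remote_fetch are stand-ins for calls to the remote
    server (assumed here; any transport could sit behind them).
    """
    cached = local_store.get(song_id)
    if cached is not None and cached.version >= remote_version(song_id):
        return cached  # local instance exists and is not out of date
    fresh = remote_fetch(song_id)  # missing or stale: go to the server
    local_store[song_id] = fresh
    return fresh
```

Checking only a version number before fetching keeps the common case cheap on a mobile data connection, since the full score payload is transferred only when the local copy is missing or stale.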