I implemented two things: the ChatGPT integration I had planned from the start, and a skill that plays my own music.

1. Using ChatGPT

k4l1sh/alexa-gpt: A tutorial on how to use ChatGPT in Alexa was an easy-to-follow how-to for the setup. The only place I got stuck was that the API is billed separately, even if you are already on a paid ChatGPT plan.

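As for how it feels in use: it is limited by Alexa's speech recognition, so it is considerably more stressful than having the ChatGPT iOS app listen to me directly. Still, it is useful for simple things, so I do use it a fair amount.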

2. Playing my own music

This one just uploads my own mp3 files to a server and plays them on a specific command.

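It is not really finished, but the intent JSON is below. Because the speech recognition is basically poor, I specify songs as t or h. I also set up Japanese utterances, but it has never once recognized them in Japanese. The invocationName is recognized in Japanese, so I do not know why the rest is not. Unless you compose or transcribe the music yourself, you tend to run into copyright or DRM issues, so I suspect there is not much demand for this kind of skill. (I have a large pile of mp3s that I have owned for years.)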

{
    "interactionModel": {
        "languageModel": {
            "invocationName": "イヤーワーム",
            "intents": [
                {
                    "name": "ChooseSongIntent",
                    "slots": [
                        {
                            "name": "SongChoice",
                            "type": "SONG_OPTIONS"
                        }
                    ],
                    "samples": [
                        "Play {SongChoice}",
                        "I want to hear {SongChoice}",
                        "Play song {SongChoice}",
                        "Start {SongChoice}",
                        "Play the song {SongChoice}",
                        "{SongChoice} 再生",
                        "{SongChoice} で"
                    ]
                },
                {
                    "name": "AMAZON.HelpIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.StopIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.CancelIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.PauseIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.ResumeIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.LoopOffIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.LoopOnIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.NavigateHomeIntent",
                    "samples": []
                }
            ],
            "types": [
                {
                    "name": "SONG_OPTIONS",
                    "values": [
                        {
                            "name": {
                                "value": "t",
                                "synonyms": [
                                    "ティー",
                                    "トウキョー"
                                ]
                            }
                        },
                        {
                            "name": {
                                "value": "h",
                                "synonyms": [
                                    "エイチ"
                                ]
                            }
                        }
                    ]
                }
            ]
        }
    }
}
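
A note on the synonyms in SONG_OPTIONS above. As far as I understand the request format, when a synonym such as ティー is matched, the slot's value field carries the text Alexa heard, while the canonical value (t) is reported under resolutions.resolutionsPerAuthority. Code that compares the slot's value directly against "t" could therefore miss a synonym match. Whether this explains the Japanese utterances failing here is not confirmed. The helper below is a minimal sketch under that assumption; resolveSlotValue is a hypothetical name and is not part of the skill code in this post.

```javascript
// Sketch: prefer the canonical slot value from entity resolution over the raw text.
// resolveSlotValue is a hypothetical helper, not an ask-sdk-core function.
function resolveSlotValue(slot) {
    if (!slot) {
        return undefined;
    }
    const authorities = (slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
    for (const authority of authorities) {
        const matched = authority.status && authority.status.code === 'ER_SUCCESS_MATCH';
        if (matched && authority.values && authority.values.length > 0) {
            // Canonical value defined in the interaction model, e.g. "t"
            return authority.values[0].value.name;
        }
    }
    // No successful resolution: fall back to whatever Alexa heard
    return slot.value;
}
```

In a handler, this could be called as resolveSlotValue(handlerInput.requestEnvelope.request.intent.slots.SongChoice) and the result compared against "t" or "h".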

And here is index.js, written in Node.js.

const Alexa = require('ask-sdk-core');
const urlHead = '<your own server URL>'; // Placeholder: base URL of your own server, ending with a slash

// Helper function to get the playback information from session attributes
function getPlaybackInfo(handlerInput) {
    const sessionAttributes = handlerInput.attributesManager.getSessionAttributes();
    if (!sessionAttributes.playbackInfo) {
        sessionAttributes.playbackInfo = {
            token: 'default-audio-token',
            offsetInMilliseconds: 0,
            inPlaybackSession: false,
            hasPreviousPlaybackSession: false,
            songUrl: ""  // Placeholder for the song URL
        };
        handlerInput.attributesManager.setSessionAttributes(sessionAttributes);
    }
    return sessionAttributes.playbackInfo;
}

// Helper function to set the playback information to session attributes
function setPlaybackInfo(handlerInput, playbackInfoObject) {
    const sessionAttributes = handlerInput.attributesManager.getSessionAttributes();
    sessionAttributes.playbackInfo = playbackInfoObject;
    handlerInput.attributesManager.setSessionAttributes(sessionAttributes);
}

// LaunchRequestHandler to handle the initial invocation of the skill
const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        const speakOutput = '準備ができました！';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt('say play t or play h')
            .getResponse();
    }
};

// ChooseSongIntentHandler to handle the selection of a song
const ChooseSongIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               Alexa.getIntentName(handlerInput.requestEnvelope) === 'ChooseSongIntent';
    },
    handle(handlerInput) {
        try {
            const songChoice = handlerInput.requestEnvelope.request.intent.slots.SongChoice.value;
            let playbackInfo = getPlaybackInfo(handlerInput);
            let songUrl;

            if (songChoice === "t") {
                songUrl = `${urlHead}tsr_alexa.mp3`;
            } else if (songChoice === "h") {
                songUrl = `${urlHead}hc_alexa.mp3`;
            } else {
                return handlerInput.responseBuilder
                    .speak("すみません、その曲はわかりません。t か h で選んでください")
                    .reprompt("t か h で選んでください")
                    .getResponse();
            }

            playbackInfo.songUrl = songUrl;
            setPlaybackInfo(handlerInput, playbackInfo);

            const speakOutput = `選択された曲を再生します。`;
            return handlerInput.responseBuilder
                .speak(speakOutput)
                .addAudioPlayerPlayDirective('REPLACE_ALL', songUrl, playbackInfo.token, 0)
                .getResponse();
        } catch (error) {
            return handlerInput.responseBuilder
                .speak(`エラーが発生しました: ${error.message}`)
                .getResponse();
        }
    }
};

// Handler for stopping the audio
const CancelAndStopIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.CancelIntent' ||
                Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.StopIntent');
    },
    handle(handlerInput) {
        const speakOutput = '再生を停止します。';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .addAudioPlayerStopDirective()
            .withShouldEndSession(true)
            .getResponse();
    }
};

// Handler for pausing the audio
const PauseIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.PauseIntent';
    },
    handle(handlerInput) {
        const speakOutput = '再生を一時停止します。';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .addAudioPlayerStopDirective()
            .withShouldEndSession(false)
            .getResponse();
    }
};

// Handler for resuming the audio
const ResumeIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ResumeIntent';
    },
    handle(handlerInput) {
        const playbackInfo = getPlaybackInfo(handlerInput);
        const speakOutput = '再生を再開します。';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .addAudioPlayerPlayDirective('REPLACE_ALL', playbackInfo.songUrl, playbackInfo.token, playbackInfo.offsetInMilliseconds)
            .withShouldEndSession(false)
            .getResponse();
    }
};

// Handler for the LoopOnIntent
const LoopOnIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOnIntent';
    },
    handle(handlerInput) {
        let playbackInfo = getPlaybackInfo(handlerInput);
        playbackInfo.isLooping = true;
        setPlaybackInfo(handlerInput, playbackInfo);

        const speakOutput = 'ループ再生をオンにしました。';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .getResponse();
    }
};

// Handler for the LoopOffIntent
const LoopOffIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
               Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOffIntent';
    },
    handle(handlerInput) {
        let playbackInfo = getPlaybackInfo(handlerInput);
        playbackInfo.isLooping = false;
        setPlaybackInfo(handlerInput, playbackInfo);

        const speakOutput = 'ループ再生をオフにしました。';
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .getResponse();
    }
};

// ErrorHandler to catch any unexpected errors
const ErrorHandler = {
    canHandle() {
        return true;
    },
    handle(handlerInput, error) {
        console.error(`Error handled: ${error.message}`);
        return handlerInput.responseBuilder
            .speak('申し訳ありませんが、問題が発生しました。')
            .getResponse();
    }
};

// Skill builder configuration
exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        ChooseSongIntentHandler,
        CancelAndStopIntentHandler,
        PauseIntentHandler,
        ResumeIntentHandler,
        LoopOnIntentHandler,
        LoopOffIntentHandler
    )
    .addErrorHandlers(ErrorHandler)
    .lambda();
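
One caveat about the code above: session attributes only last for the current session, and playbackInfo.offsetInMilliseconds is never updated, so the resume handler may find an empty songUrl or restart from the beginning. A possible alternative, sketched below, is to read the last token and offset from context.AudioPlayer in the incoming request. This assumes the Play directive is sent with the song key ("t" or "h") as its token instead of 'default-audio-token'. SONG_FILES and buildResumeArgs are hypothetical names introduced for illustration.

```javascript
// Hypothetical mapping from stream token to file name.
// Assumes the token passed to addAudioPlayerPlayDirective is the song key.
const SONG_FILES = { t: 'tsr_alexa.mp3', h: 'hc_alexa.mp3' };

// Sketch: build the arguments for a resume Play directive from the request context.
// Returns null when there is no known stream to resume.
function buildResumeArgs(requestEnvelope, urlHead) {
    const player = (requestEnvelope.context && requestEnvelope.context.AudioPlayer) || {};
    if (!Object.prototype.hasOwnProperty.call(SONG_FILES, player.token)) {
        return null;
    }
    return {
        url: `${urlHead}${SONG_FILES[player.token]}`,
        token: player.token,
        offsetInMilliseconds: player.offsetInMilliseconds || 0
    };
}
```

A resume handler could then pass url, token, and offsetInMilliseconds to addAudioPlayerPlayDirective('REPLACE_ALL', ...) without relying on session attributes.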

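I actually wanted loop playback and various other playback controls, but when I tried to follow 今更ながらAlexaのAudio Playerをきちんと理解してみる③ 〜複数の曲を扱ってみる〜 - kun432's blog, I could not receive AudioPlayer.PlaybackNearlyFinished. The event did not seem to be fired at all in the test feature, so I gave up.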
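
For reference, a handler for that event might look roughly like the sketch below. It is not the code from the linked blog post and was not verified on a device. NEXT_TRACK, TRACK_FILES, and URL_HEAD are hypothetical, and it assumes each stream's token is the song key ("t" or "h").

```javascript
// Hypothetical play order and file names; URL_HEAD is a placeholder base URL.
const NEXT_TRACK = { t: 'h', h: 't' };
const TRACK_FILES = { t: 'tsr_alexa.mp3', h: 'hc_alexa.mp3' };
const URL_HEAD = 'https://example.com/';

// Sketch: when the current stream is about to end, enqueue the next one.
const PlaybackNearlyFinishedHandler = {
    canHandle(handlerInput) {
        return handlerInput.requestEnvelope.request.type === 'AudioPlayer.PlaybackNearlyFinished';
    },
    handle(handlerInput) {
        // Token of the stream that is about to finish
        const currentToken = handlerInput.requestEnvelope.request.token;
        const nextToken = NEXT_TRACK[currentToken];
        if (!nextToken) {
            // Unknown token: respond with nothing so playback simply ends
            return handlerInput.responseBuilder.getResponse();
        }
        // ENQUEUE needs the token of the stream it follows (expectedPreviousToken)
        return handlerInput.responseBuilder
            .addAudioPlayerPlayDirective('ENQUEUE', `${URL_HEAD}${TRACK_FILES[nextToken]}`, nextToken, 0, currentToken)
            .getResponse();
    }
};
```

The handler would also need to be added to addRequestHandlers. Responses to AudioPlayer requests cannot include speech, which is why there is no speak call here.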

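m3u is said to be supported as well, but an m3u that simply lists song mp3s did not work. This comes from documentation for a service called MyVideo, but I suspect the restriction "Each .m3u8 file should contain only a single stream link." applies here too, so it looks like you would need to set up a streaming server.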

If you just want to play a short song (4 minutes or less) once, SSML alone is enough, and it is easier to implement. (Reference: Alexa上で音源（MP3）を制御するための知見 [SSML, AudioPlayer] #AlexaSkillsKit - Qiita)
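
As a minimal sketch of the SSML route: the helper below builds an audio tag that can be passed to speak(). buildAudioSsml is a hypothetical name. The sketch assumes the file is served over HTTPS and already meets Alexa's encoding requirements for SSML audio, which may mean re-encoding the mp3.

```javascript
// Sketch: build an SSML <audio> tag for a short clip.
// buildAudioSsml is a hypothetical helper, not part of ask-sdk-core.
function buildAudioSsml(url) {
    if (!/^https:\/\//.test(url)) {
        throw new Error('SSML audio needs an HTTPS URL');
    }
    // Escape characters that are special inside an XML attribute value
    const escaped = url
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;');
    return `<audio src="${escaped}"/>`;
}

// Possible use inside a handler (urlHead as in index.js):
//   return handlerInput.responseBuilder
//       .speak(buildAudioSsml(`${urlHead}tsr_alexa.mp3`))
//       .getResponse();
```

With this approach there is no AudioPlayer directive, so pause, resume, and loop are not available. The clip simply plays as part of the spoken response.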

Overall impressions

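Looking at the Alexa skill store, frankly there is not much worthwhile published. I think there are three reasons: the details of the spec are hard to figure out, as with the m3u issue; simple debugging along the lines of console.log is not possible (or at least I do not know how to do it); and, as reviews of already-published skills also point out, the speech recognition is too poor for many kinds of apps to be feasible.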
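
On the console.log point, one approach that may help, assuming the skill code runs on AWS Lambda (where console.log output normally ends up in CloudWatch Logs), is a request interceptor that logs every incoming request. This is only a sketch and was not tried against the skill in this post. LogRequestInterceptor is a hypothetical name.

```javascript
// Sketch: log each incoming request so it can be inspected in the function's logs.
const LogRequestInterceptor = {
    process(handlerInput) {
        const request = handlerInput.requestEnvelope.request;
        console.log(`Incoming request: ${request.type}`);
        console.log(JSON.stringify(handlerInput.requestEnvelope, null, 2));
    }
};

// Registration, assuming the Alexa.SkillBuilders chain from index.js:
//   Alexa.SkillBuilders.custom()
//       .addRequestInterceptors(LogRequestInterceptor)
//       .addRequestHandlers(/* ... */)
//       .lambda();
```

Logging the request type first makes it easy to check whether events such as AudioPlayer.PlaybackNearlyFinished reach the skill at all.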

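As basic features, the timer and clock are handy, and it is not bad simply as something to say "good morning", "I'm off", and "I'm home" to. During the Prime free trial it played the song I asked for as the first track, so it was excellent as a music player too. Once the free trial ended, though, it gradually stopped playing what I asked for. That may be fine for people used to streaming services, but someone like me, who thinks "I want to hear this piece right now!" whether it is pop or classical, has little choice but to subscribe to Amazon Music Unlimited. Since I am basically in front of my PC anyway, I am better off using Alexa as a Bluetooth speaker and playing music from there.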

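As reported in Amazonのデバイス事業が少なくとも約4兆円の損失を出していたことが判明、Alexaから収益を上げる計画は崩壊 - GIGAZINE, the poor speech recognition mentioned above is apparently going to get an LLM-based update through a subscription service called Alexa Plus, and depending on price and performance I might sign up. Buying smart home appliances is a hassle and brings all sorts of constraints, so for now I would just like the interface side to be supported conveniently, with something like a Raspberry Pi plus infrared. At the moment, one good use is an ADHD-style hack: along with "have a good day", it makes an announcement that gets me to check things like locking up. It would be nice if checks such as turning off the lights could be automated at that point, but that is about the extent of it, so here too the current features may well be enough.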