Build Your Own Voice Assistant in 80 Lines of JavaScript

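In this tutorial, we will build a virtual assistant (like Siri or Google Assistant) that runs in the browser, using 80 lines of JavaScript. You can try the app here. It listens for the user's voice commands and replies with a synthesized voice.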

What you need:

Google Chrome (version 25 or later)
A text editor

Because the Web Speech API is still experimental, the app only runs in supported browsers: Chrome (version 25 or later) and Edge (version 79 or later).

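Which components do we need to build?

To build this web app, we need to implement four components:

A simple user interface that displays what the user said and the assistant's reply.
Speech-to-text conversion.
Text processing and action execution.
Text-to-speech conversion.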

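User interface

The first step is to create a simple user interface: a button that triggers the assistant, a div that displays the user's commands and the assistant's responses, and a p element that displays processing status.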

const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);

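Speech to text

We need a component that captures voice commands and converts them to text for further processing. In this tutorial, we use SpeechRecognition from the Web Speech API. Because this API is only available in supported browsers, we show a warning and remove the Start button in browsers that do not support it.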

// Chrome exposes the constructor under a webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (typeof SpeechRecognition === "undefined") {
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download latest chrome.</b>";
}
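
The check above only covers recognition, while the assistant also needs speech synthesis later on. As a minimal sketch, the same idea can be wrapped in a helper that reports both. The name `detectSpeechSupport` and its `scope` parameter are assumptions made for illustration; they are not part of the original app.

```javascript
// Hypothetical helper: reports which parts of the Web Speech API exist on a
// given global object. Taking the object as a parameter (instead of reading
// `window` directly) keeps the function usable outside a browser.
function detectSpeechSupport(scope) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return {
    recognition: typeof Recognition === "function",
    synthesis:
      typeof scope.speechSynthesis === "object" &&
      scope.speechSynthesis !== null &&
      typeof scope.SpeechSynthesisUtterance === "function",
  };
}

// Usage with a stand-in object; in the browser you would pass `window`.
const chromeLike = {
  webkitSpeechRecognition: function () {},
  speechSynthesis: {},
  SpeechSynthesisUtterance: function () {},
};
console.log(detectSpeechSupport(chromeLike));
console.log(detectSpeechSupport({}));
```

With a result like this, the page could hide the button when `recognition` is false and simply skip the spoken reply when only `synthesis` is missing.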

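Next, we create a SpeechRecognition instance, which exposes several properties for customizing recognition. In this app, we set continuous and interimResults to true so that the recognized text is displayed in real time.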

const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
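
Besides continuous and interimResults, SpeechRecognition also has `lang` and `maxAlternatives` properties. The sketch below groups these settings into one place. The helper name `configureRecognition` and the `"en-US"` default are assumptions (the commands in this tutorial happen to be English); the tutorial itself sets only the two properties above.

```javascript
// Hypothetical helper: applies standard SpeechRecognition properties to an
// instance, letting the caller override any of the defaults.
function configureRecognition(recognition, options = {}) {
  const settings = {
    continuous: true,      // keep listening after each phrase
    interimResults: true,  // deliver partial transcripts while the user speaks
    lang: "en-US",         // assumed default; the tutorial does not set a language
    maxAlternatives: 1,    // one transcript per result is enough here
    ...options,
  };
  return Object.assign(recognition, settings);
}

// Usage with a plain object standing in for `new SpeechRecognition()`.
const configured = configureRecognition({}, { lang: "en-GB" });
console.log(configured.lang, configured.continuous);
```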

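We add a handler for the speech API's onresult event. In this handler, we display the user's voice command as text and call the process function to perform the action. The process function is implemented in the next step.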

// placeholder; the real implementation comes in the next step
function process(speech_text) {
  return "....";
}
recognition.onresult = event => {
  // the most recent result is the last entry in the list
  const last = event.results.length - 1;
  const res = event.results[last];
  const text = res[0].transcript;
  if (res.isFinal) {
    processing.innerHTML = "processing ....";
    const response = process(text);
    const p = document.createElement("p");
    p.innerHTML = `You said: ${text}<br>Siri said: ${response}`;
    processing.innerHTML = "";
    result.appendChild(p);
    // add text to speech later
  } else {
    // interim result: show what has been heard so far
    processing.innerHTML = `listening: ${text}`;
  }
};
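
The handler above looks only at the last entry in event.results. The event also carries a `resultIndex` property, the lowest index that changed, so a single event may cover more than one result. As a hedged sketch of an alternative, the function below walks every changed result and separates final text from interim text. The name `splitTranscripts` is hypothetical, and the stand-in event only mimics the shape of a real SpeechRecognitionEvent.

```javascript
// Hypothetical helper: collects final and interim transcripts from all results
// that changed in this event, starting at event.resultIndex.
function splitTranscripts(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const transcript = result[0].transcript; // best alternative
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}

// Stand-in for a real event: each result is array-like and has an isFinal flag.
const makeResult = (transcript, isFinal) =>
  Object.assign([{ transcript }], { isFinal });
const sampleEvent = {
  resultIndex: 1,
  results: [
    makeResult("hello", true), // older result, not part of this event
    makeResult(" what time is it", true),
    makeResult(" stop", false),
  ],
};
console.log(splitTranscripts(sampleEvent));
```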

We also need to wire the UI button to the recognition object so that it starts and stops speech recognition.

let listening = false;
// start or stop recognition and keep the button label in sync
const toggleBtn = () => {
  if (listening) {
    recognition.stop();
    startBtn.textContent = "Start listening";
  } else {
    recognition.start();
    startBtn.textContent = "Stop listening";
  }
  listening = !listening;
};
startBtn.addEventListener("click", toggleBtn);
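
One thing the toggle above does not cover: recognition can also end without a call to stop(), and the `listening` flag would then be out of date. SpeechRecognition fires an `end` event whenever the service disconnects, so one possible refinement is to update the flag from that event. The sketch below is an assumption-laden variant, not the tutorial's code: `createListeningToggle` and `onChange` are made-up names, and the fake recognition object exists only so the example runs outside a browser.

```javascript
// Hypothetical variant of the toggle: the flag flips to false only when the
// recognizer reports `end`, whether or not stop() was called by us.
function createListeningToggle(recognition, onChange) {
  let listening = false;
  const setListening = value => {
    listening = value;
    onChange(listening);
  };
  recognition.addEventListener("end", () => setListening(false));
  return () => {
    if (listening) {
      recognition.stop(); // the `end` handler updates the flag afterwards
    } else {
      recognition.start();
      setListening(true);
    }
  };
}

// Minimal fake recognizer for demonstration; it fires `end` right after stop().
const fakeRecognition = {
  handlers: {},
  addEventListener(type, fn) { this.handlers[type] = fn; },
  start() {},
  stop() { this.handlers.end(); },
};
const toggle = createListeningToggle(fakeRecognition, isOn =>
  console.log(isOn ? "Stop listening" : "Start listening")
);
toggle(); // logs "Stop listening"
toggle(); // logs "Start listening"
```

In the app, `onChange` would set startBtn.textContent, replacing the two label assignments inside the original toggle.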

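Process the text and perform actions

In this step, we build simple conversational logic and handle a few basic actions. The assistant can reply to "hello", "what's your name?", and "how are you?", tell the current time, "stop" listening, or open a new tab to search for anything it cannot answer. You can extend the process function further with AI libraries to make the assistant smarter.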

function process(rawText) {
  // remove space and lowercase text
  let text = rawText.replace(/\s/g, "");
  text = text.toLowerCase();
  let response = null;
  switch (text) {
    case "hello":
      response = "hi, how are you doing?"; break;
    case "what'syourname":
      response = "My name's Siri."; break;
    case "howareyou":
      response = "I'm good."; break;
    case "whattimeisit":
      response = new Date().toLocaleTimeString(); break;
    case "stop":
      response = "Bye!!";
      toggleBtn(); // stop listening
      break;
  }
  if (!response) {
    // no canned answer: search the web, encoding the query so special characters survive
    const query = encodeURIComponent(rawText.replace("search", "").trim());
    window.open(`https://www.google.com/search?q=${query}`, "_blank");
    return "I found some information for " + rawText;
  }
  return response;
}
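
As the list of commands grows, the switch statement can become hard to maintain. One possible alternative, sketched here under stated assumptions, is a lookup table keyed by a normalized phrase. The names `commands`, `normalize`, `buildSearchUrl`, and `lookupReply` are all hypothetical, the normalization only handles English letters and digits, and the "stop" command is left out because it needs access to the toggle function.

```javascript
// Hypothetical command table: normalized phrase -> function producing a reply.
const commands = new Map([
  ["hello", () => "hi, how are you doing?"],
  ["whatsyourname", () => "My name's Siri."],
  ["howareyou", () => "I'm good."],
  ["whattimeisit", () => new Date().toLocaleTimeString()],
]);

// Lowercase the text and drop everything except letters and digits, so that
// "What's your name?" and "whats your name" map to the same key.
function normalize(rawText) {
  return rawText.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Build the fallback search URL. encodeURIComponent keeps characters such as
// "&" or "#" in the spoken text from breaking the query string.
function buildSearchUrl(rawText) {
  const query = rawText.replace(/\bsearch\b/i, "").trim();
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

// Returns the canned reply, or null when the caller should fall back to search.
function lookupReply(rawText) {
  const handler = commands.get(normalize(rawText));
  return handler ? handler() : null;
}

console.log(lookupReply("What's your name?"));
console.log(buildSearchUrl("search cats & dogs"));
```

Adding a command then means adding one entry to the table, and the search fallback stays in a single place.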

Text to speech

In the final step, we give the assistant a voice using the speechSynthesis controller from the Web Speech API. The API is simple and straightforward.

speechSynthesis.speak(new SpeechSynthesisUtterance(response));
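
The one-liner uses the browser's default voice settings. A SpeechSynthesisUtterance also has `lang`, `rate`, and `pitch` properties, and speechSynthesis has a cancel() method that clears anything still queued. The following is a minimal sketch of a wrapper around those calls. The function name `speak`, its option names, and the default values are assumptions; the synthesizer and utterance constructor are passed in as options only so the example can run with stand-ins outside a browser.

```javascript
// Hypothetical wrapper around speechSynthesis.speak with a few voice options.
function speak(text, {
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
  lang = "en-US", // assumed default language
  rate = 1,       // 1 is normal speed
  pitch = 1,      // 1 is normal pitch
} = {}) {
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = rate;
  utterance.pitch = pitch;
  synth.cancel(); // drop replies that are still queued or being spoken
  synth.speak(utterance);
  return utterance;
}

// Stand-ins for demonstration; in the browser, speak(response) is enough.
class DemoUtterance {
  constructor(text) { this.text = text; }
}
const demoSynth = {
  cancel() {},
  speak(u) { console.log(`speaking "${u.text}" at rate ${u.rate}`); },
};
speak("I'm good.", { synth: demoSynth, Utterance: DemoUtterance, rate: 1.2 });
```

Calling cancel() first is a design choice: it cuts off the previous reply so that answers do not pile up, which may or may not be what you want.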

That's it! We now have a cool assistant in just 80 lines of code. A demo of the program can be found here. The complete source follows.

// UI comp
const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);
// speech to text
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
let toggleBtn = null;
if (typeof SpeechRecognition === "undefined") {
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download latest chrome.</b>";
} else {
  const recognition = new SpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = event => {
    const last = event.results.length - 1;
    const res = event.results[last];
    const text = res[0].transcript;
    if (res.isFinal) {
      processing.innerHTML = "processing ....";
      const response = process(text);
      const p = document.createElement("p");
      p.innerHTML = `You said: ${text}<br>Siri said: ${response}`;
      processing.innerHTML = "";
      result.appendChild(p);
      // text to speech
      speechSynthesis.speak(new SpeechSynthesisUtterance(response));
    } else {
      processing.innerHTML = `listening: ${text}`;
    }
  };
  let listening = false;
  toggleBtn = () => {
    if (listening) {
      recognition.stop();
      startBtn.textContent = "Start listening";
    } else {
      recognition.start();
      startBtn.textContent = "Stop listening";
    }
    listening = !listening;
  };
  startBtn.addEventListener("click", toggleBtn);
}
// processor
function process(rawText) {
  let text = rawText.replace(/\s/g, "");
  text = text.toLowerCase();
  let response = null;
  switch (text) {
    case "hello":
      response = "hi, how are you doing?"; break;
    case "what'syourname":
      response = "My name's Siri."; break;
    case "howareyou":
      response = "I'm good."; break;
    case "whattimeisit":
      response = new Date().toLocaleTimeString(); break;
    case "stop":
      response = "Bye!!";
      toggleBtn();
      break;
  }
  if (!response) {
    const query = encodeURIComponent(rawText.replace("search", "").trim());
    window.open(`https://www.google.com/search?q=${query}`, "_blank");
    return `I found some information for ${rawText}`;
  }
  return response;
}