Integrate Azure Cognitive Services Speech to Text into your React.JS App


Azure Cognitive Services offers text to speech, and the good thing is that it also offers the reverse: a speech to text feature. In this blog post, we'll add that feature to your React.JS application as well.

Aug. 31, 2022
Mark Deanil Vicente - dnilvincent.net
Introduction

Speech-to-text, also known as speech recognition, enables real-time and batch transcription of audio or voice into text. Given a reference text as input, speech-to-text can also perform real-time pronunciation assessment and give speakers feedback on the accuracy and fluency of their spoken audio.

Below are some commonly cited advantages of this feature:

  • Ease of Communication - Less need to type or write by hand, and fewer illegible notes.
  • Increased Efficiency & Less Paperwork
  • Produce Documents Faster
  • Solves Inefficiencies and Reduces Wasted Time
  • Flexibility to Share Across Different Devices, and more

There are also some common disadvantages: background noise can interfere with recognition, and no voice recognition software puts the words on the screen with complete accuracy every time. That's why it's important to review the generated text and correct any mistakes.

The good thing about Azure Cognitive Services Speech to Text is that it is designed to reduce these common voice recognition issues.

To learn more, you can visit the official Microsoft Docs documentation: Speech-to-Text Cognitive Services.

In this blog post, we'll integrate Speech-to-Text into our React.JS application.

FTF (First Things First)
Prerequisites

1. A machine with your text editor/IDE

2. A Microsoft Azure account (Try it for free)

3. A React.JS application

1. Create a Cognitive Services Resource in the Azure Portal

1.1 Create a resource in the Azure Portal. (Make sure you already have an Azure subscription, either free or paid.)


Below is a sample. Click "Create", then once the creation is done, click the "Go to resource" button.


1.2 Click "Click here to manage keys" to navigate to the Keys section.


1.3 Save the keys and note the resource's region (location), because we are going to need both in our React.JS configuration.
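If you'd rather not paste the key straight into App.tsx, one option is to read it from environment variables. The sketch below assumes a Create React App project, where variables prefixed with REACT_APP_ are exposed through process.env at build time. The variable names REACT_APP_SPEECH_KEY and REACT_APP_SPEECH_REGION and the getSpeechCredentials helper are hypothetical. Keep in mind that such variables are embedded in the built bundle, so the key is still visible to anyone who loads the app.

```typescript
// Hypothetical helper (not part of the Speech SDK): pulls the Speech key and
// region out of an env-like object and fails early if either is missing.
interface SpeechCredentials {
  key: string;
  region: string;
}

type Env = Record<string, string | undefined>;

function getSpeechCredentials(env: Env): SpeechCredentials {
  // Assumed variable names; pick whatever names your project uses.
  const key = env["REACT_APP_SPEECH_KEY"]?.trim();
  const region = env["REACT_APP_SPEECH_REGION"]?.trim();
  if (!key || !region) {
    throw new Error(
      "Missing REACT_APP_SPEECH_KEY or REACT_APP_SPEECH_REGION. Copy them from the Keys page of your resource."
    );
  }
  return { key, region };
}

// Example with an inline object; in a Create React App project you would pass process.env.
const creds = getSpeechCredentials({
  REACT_APP_SPEECH_KEY: "YOUR_KEY_FROM_YOUR_COGNITIVE_SERVICE",
  REACT_APP_SPEECH_REGION: "westus2",
});
console.log(creds.region); // prints "westus2"
```

In the component you would then call sdk.SpeechConfig.fromSubscription(creds.key, creds.region) instead of using hard-coded strings. Failing early with a clear message is easier to debug than the generic CANCELED error the recognizer reports when the key or region is wrong.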

2. Install Package in React.JS App

2.1 Run npm i microsoft-cognitiveservices-speech-sdk

https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk
3. Configure the React.JS App

In my sample, I'm using the React.JS TypeScript project template. For this demo, I'm overwriting the App.tsx file.

3.1 Import the package.

import * as sdk from "microsoft-cognitiveservices-speech-sdk";

3.2 Inside your component function, configure the Speech SDK:

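const key = "YOUR_KEY_FROM_YOUR_COGNITIVE_SERVICE";
// Use the region (location) of your own resource, e.g. "westus2".
const region = "westus2";
const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);

// Create the speech recognizer and listen to the default microphone.
const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);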

3.3 Create a function that invokes recognizeOnceAsync on the SDK's SpeechRecognizer. It listens for a single utterance and reports the outcome through a callback.

const test = () => {
  speechRecognizer.recognizeOnceAsync(
    (result: any) => {
      switch (result.reason) {
        case sdk.ResultReason.RecognizedSpeech:
          console.log(`RECOGNIZED: Text=${result.text}`);
          break;
        case sdk.ResultReason.NoMatch:
          console.log("NOMATCH: Speech could not be recognized.");
          break;
        case sdk.ResultReason.Canceled: {
          const cancellation = sdk.CancellationDetails.fromResult(result);
          console.log(`CANCELED: Reason=${cancellation.reason}`);

          if (cancellation.reason === sdk.CancellationReason.Error) {
            console.log(`CANCELED: ErrorCode=${cancellation.ErrorCode}`);
            console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
            console.log(
              "CANCELED: Did you set the speech resource key and region values?"
            );
          }
          break;
        }
      }
      // After close(), this recognizer can't be used again.
      speechRecognizer.close();
    },
    (error: string) => {
      // Called when the recognition itself fails to run.
      console.log(`ERROR: ${error}`);
      speechRecognizer.close();
    }
  );
};
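recognizeOnceAsync reports its outcome through callbacks. If you prefer async/await, the callbacks can be wrapped in a Promise. The sketch below is not part of the SDK: OnceRecognizer and recognizeOnce are hypothetical names, and the interface only mirrors the two SpeechRecognizer methods used in this post (recognizeOnceAsync with a success and an error callback, and close). A fake recognizer stands in for the real one so the sketch runs without a microphone or Azure key.

```typescript
// Hypothetical structural type: just the parts of a recognizer we rely on.
interface OnceRecognizer<R> {
  recognizeOnceAsync(
    onResult: (result: R) => void,
    onError: (error: string) => void
  ): void;
  close(): void;
}

// Wraps the callback-based call in a Promise and closes the recognizer either way.
function recognizeOnce<R>(recognizer: OnceRecognizer<R>): Promise<R> {
  return new Promise<R>((resolve, reject) => {
    recognizer.recognizeOnceAsync(
      (result) => {
        recognizer.close();
        resolve(result);
      },
      (error) => {
        recognizer.close();
        reject(new Error(error));
      }
    );
  });
}

type FakeResult = { text: string };

// Fake recognizer for illustration: succeeds with a result or fails with a message.
class FakeRecognizer implements OnceRecognizer<FakeResult> {
  closed = false;
  private outcome: FakeResult | string;

  constructor(outcome: FakeResult | string) {
    this.outcome = outcome;
  }

  recognizeOnceAsync(
    onResult: (result: FakeResult) => void,
    onError: (error: string) => void
  ): void {
    if (typeof this.outcome === "string") {
      onError(this.outcome);
    } else {
      onResult(this.outcome);
    }
  }

  close(): void {
    this.closed = true;
  }
}

recognizeOnce(new FakeRecognizer({ text: "Hello world." })).then((result) =>
  console.log(result.text) // prints "Hello world."
);
```

With the real SDK, something like const result = await recognizeOnce(speechRecognizer); inside an async click handler should work, assuming SpeechRecognizer satisfies the interface structurally; you would then switch on result.reason exactly as in the function above. Closing inside the helper keeps the "one recognizer per run" rule in a single place.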

3.4 Call the function inside a useEffect hook, or create a UI with a button that triggers the function and a text area that shows the result.

Below is sample code that you can paste and play around with on your machine.

import { useState } from "react";
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

// Replace with your own key, and use the region (location) of your resource.
const key = "YOUR_KEY_FROM_YOUR_COGNITIVE_SERVICE";
const region = "westus2";

function App() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  const test = () => {
    // Create a fresh recognizer for every run: it is closed when the run ends
    // and can't be reused. Creating it here, rather than in the component body,
    // also avoids building a new recognizer on every render.
    const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
    const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
    const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

    setLoading(true);
    speechRecognizer.recognizeOnceAsync(
      (result: sdk.SpeechRecognitionResult) => {
        switch (result.reason) {
          case sdk.ResultReason.RecognizedSpeech:
            setText(result.text);
            break;
          case sdk.ResultReason.NoMatch:
            console.log("NOMATCH: Speech could not be recognized.");
            break;
          case sdk.ResultReason.Canceled: {
            const cancellation = sdk.CancellationDetails.fromResult(result);
            console.log(`CANCELED: Reason=${cancellation.reason}`);

            if (cancellation.reason === sdk.CancellationReason.Error) {
              console.log(`CANCELED: ErrorCode=${cancellation.ErrorCode}`);
              console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
              console.log(
                "CANCELED: Did you set the speech resource key and region values?"
              );
            }
            break;
          }
        }
        speechRecognizer.close();
        setLoading(false);
      },
      (error: string) => {
        // Called when the recognition itself fails to run.
        console.log(`ERROR: ${error}`);
        speechRecognizer.close();
        setLoading(false);
      }
    );
  };

  return (
    <div className="App">
      {loading ? (
        "Listening..."
      ) : (
        <button onClick={test}>Run Speech To Text</button>
      )}
      <br />
      {/* Controlled textarea: shows the recognized text and lets you correct it. */}
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
    </div>
  );
}

export default App;
Additional Info

Here's the official documentation from Microsoft, where you can learn more about real-time speech-to-text, batch speech-to-text, and Custom Speech: Speech-to-text documentation

Speech-to-text FAQ

If you have any questions or comments, please drop them below 👇 :)
