Google Speech Recognition HTTP API: Tutorial with Live Demo

Converting Audio to Formats Accepted by Google:

For the demonstration, I recorded some audio on my phone and converted it to FLAC and to 16-bit linear PCM (LINEAR16) for sending to the API.

To convert the audio to FLAC, I used the online tool CONVERT TO FLAC.

To convert the audio to LINEAR16, I used Audacity, a popular audio editor that is free to use.
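Before sending a LINEAR16 file, it can help to confirm that the WAV exported from Audacity really is 16-bit PCM at the sample rate you will declare in the request. Below is a minimal sketch using Python's standard `wave` module. It assumes a mono recording and the 16000 Hz rate used in the example request later in this post; `describe_wav` and `is_linear16` are hypothetical helper names, not part of any Google library.

```python
import wave


def describe_wav(path_or_file):
    """Return (channels, bits_per_sample, sample_rate_hz) for a PCM WAV file.

    wave.open accepts a filename or a binary file object. It raises
    wave.Error if the file is not an uncompressed PCM WAV.
    """
    with wave.open(path_or_file, "rb") as wav:
        # getsampwidth() is in bytes per sample, so convert to bits.
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def is_linear16(path_or_file, expected_rate_hz=16000):
    """True if the WAV is 16-bit mono PCM at the expected sample rate.

    ASSUMPTION: mono audio; adjust the channel check for multi-channel input.
    """
    channels, bits, rate = describe_wav(path_or_file)
    return channels == 1 and bits == 16 and rate == expected_rate_hz
```

For example, `is_linear16("recording.wav")` returns `False` if Audacity exported the file at 44100 Hz, in which case either resample it or set `sampleRateHertz` to match.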

Encoding the File Contents for the HTTP Request:

Once the files are in one of the formats above, the binary audio data must be Base64-encoded before it can be sent in an HTTP request. Base64 represents binary data as plain ASCII text, which can be safely placed in the body of the request.

I used the Gift of Speed Base64 encoder, which takes a file as input and returns the Base64-encoded version of its content. If the file is larger than 100 KB, the conversion might take some time.
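As an alternative to an online encoder, the same encoding can be done locally. A minimal sketch with Python's standard `base64` module; the helper names `encode_audio` and `encode_audio_file` are hypothetical:

```python
import base64


def encode_audio(data: bytes) -> str:
    """Base64-encode raw audio bytes into an ASCII string for a JSON body."""
    # b64encode returns bytes; decode to str so it can go into JSON.
    return base64.b64encode(data).decode("ascii")


def encode_audio_file(path: str) -> str:
    """Read an audio file (e.g. FLAC or WAV) in binary mode and Base64-encode it."""
    with open(path, "rb") as f:
        return encode_audio(f.read())
```

Because this reads the whole file into memory, it is fine for short clips like the ones in this demo, and it avoids uploading your recording to a third-party site.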

Testing the API:

The documentation for the Google Speech API we are using is located at:

However, when sending a request from an external application, you need to include an API key provided by Google. You can generate your own key in the Google Developers Console; it is free for limited use.

The request URL will then look like this:
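A small sketch of building that URL in code. It assumes the Cloud Speech-to-Text v1 REST endpoint `https://speech.googleapis.com/v1/speech:recognize` (an assumption; use whichever URL the documentation gives you) and passes the API key as the `key` query parameter. `build_request_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

# ASSUMPTION: the Cloud Speech-to-Text v1 recognize endpoint.
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_request_url(api_key: str, endpoint: str = SPEECH_ENDPOINT) -> str:
    """Append the API key as the `key` query parameter, URL-encoding it."""
    return f"{endpoint}?{urlencode({'key': api_key})}"
```

Keep the key out of source control, for example by reading it from an environment variable before calling the helper.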

Once the data is ready, we make an HTTP POST request with a JSON body, as shown below:

{
  "audio": {
    "content": "//8CAP//AAD//wIA/v8C/AgD+/wIA/////wIA/f8EAP3/....dummy.../wMA/f8DAP7/AAAAAAAAAQD//wA"
  },
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  }
}
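The same request can be built and sent from a script instead of a REST client. A minimal sketch using only the Python standard library; `build_request_body` and `send_recognize_request` are hypothetical helpers, and the defaults simply mirror the example body above:

```python
import base64
import json
import urllib.request


def build_request_body(audio_bytes: bytes, encoding: str = "LINEAR16",
                       sample_rate_hz: int = 16000,
                       language_code: str = "en-US") -> dict:
    """Build the JSON body: an `audio` object and a `config` object."""
    return {
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
    }


def send_recognize_request(url: str, body: dict, timeout: float = 60.0) -> dict:
    """POST the body as JSON and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses,
    e.g. an invalid API key or a malformed config.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a FLAC file, call `build_request_body(data, encoding="FLAC", sample_rate_hz=...)` with the rate the file was actually recorded at, then pass the result to `send_recognize_request` along with the request URL that includes your API key.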

The POST request body above contains two objects:

Audio object: provides the audio to be transcribed. You can either give a link to a file hosted on Google Cloud Storage or provide the content directly. In our case, we provide the Base64-encoded content directly. Note that this is not the full audio content; I have pasted some clipped dummy data for demonstration.

Config object: indicates what kind of audio is being sent, namely its encoding (FLAC, LINEAR16, etc.), the sample rate at which it was recorded, and its language.
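Since the audio object takes either inline content or a Cloud Storage link, a helper can enforce that exactly one is given. A sketch assuming the v1 field names `content` and `uri` (with Cloud Storage links of the form `gs://bucket/object`); `make_audio_object` is a hypothetical name:

```python
import base64
from typing import Optional


def make_audio_object(content: Optional[bytes] = None,
                      gcs_uri: Optional[str] = None) -> dict:
    """Build the `audio` object from inline bytes OR a Cloud Storage URI, not both."""
    if (content is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of content or gcs_uri")
    if content is not None:
        # Inline audio must be Base64-encoded text inside the JSON body.
        return {"content": base64.b64encode(content).decode("ascii")}
    if not gcs_uri.startswith("gs://"):
        raise ValueError("Cloud Storage URIs look like gs://bucket/object")
    return {"uri": gcs_uri}
```

Inline content is the simplest option for short clips like this demo; a Cloud Storage link avoids putting large files into the request body.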

That's how simple it is. Just put in your audio content, set the parameters as shown above, and send the request. Google replies with the corresponding transcribed text, along with a confidence score indicating how sure it is of the transcription.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "check check",
          "confidence": 0.92078114
        }
      ]
    }
  ]
}

Here, the transcript “check check” in the response is the text version of the audio I sent to the API, and the confidence of 0.92078114 means Google is about 92% confident in the transcription.
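To use the result in a program, you can pull the transcript and confidence out of the response. A minimal sketch assuming the response shape shown above, where the first alternative of each result is the most likely one; `best_transcripts` is a hypothetical helper. It uses `.get` so that a response with no `results` key yields an empty list instead of an error:

```python
import json


def best_transcripts(response: dict) -> list:
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]  # first alternative = most probable
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


# Usage with the sample response from this post.
sample = json.loads(
    '{"results": [{"alternatives": '
    '[{"transcript": "check check", "confidence": 0.92078114}]}]}'
)
transcript, confidence = best_transcripts(sample)[0]
print(f"{transcript!r} ({confidence:.0%} confidence)")  # 'check check' (92% confidence)
```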

For a live demo of these steps, watch the video below.

