  • Yogi

Software for Dictation - Not what you think



In my earlier post on whether dictation software can replace your stenographer/typist, I gave reasons why software cannot fully replace a stenographer.


I am super excited to say that I am now WRONG, all thanks to OpenAI’s revolutionary open-source speech recognition model called ‘Whisper’.



HOW CAN ONE USE WHISPER

A simple search for 'how to use Whisper' can be confusing. While Whisper itself can seem intricate for non-techies, several consumer-friendly platforms are built on it.


If you're on a Mac, try macwhisper. For a more universal solution, there's audiopen.ai, compatible with all operating systems.


Thanks to its multilingual training, Whisper recognizes a wide range of non-English words. I tested it with some region-specific names and places, and it identified most of them accurately.


Being an AI model, its accuracy and capabilities are likely to improve over time.



RAW TRANSCRIPTION TO MAGIC

Whisper's transcription is accurate but can be literal. Say "next line", and it'll transcribe it as “next line”.


But the magic happens when you paste this transcription into ChatGPT for correction.


I am sure most of you have used ChatGPT, or heard of its prowess, by now. It can understand natural language and respond in kind. It can also follow instructions given in natural language to carry out tasks.


This instruction-following ability can be put to use to correct the transcription.


So whenever you see an anomaly in the transcribed content that would otherwise need manual correction, just frame the correction as a Rule and give it to ChatGPT in natural language. It will apply the rule to later dictations in the same conversation, and you can modify any rule just as easily, all in everyday language.
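If you would rather assemble the rules and the transcription in one go than type them out each time, the idea can be sketched roughly as follows. This is only an illustration: the function name build_correction_prompt, the wording of the prompt and the sample transcript are assumptions made for this example, not something ChatGPT requires. The sketch simply numbers the rules and puts them above the raw transcription so the whole text can be pasted into ChatGPT.

```python
def build_correction_prompt(rules, transcript):
    """Combine plain-language rules and a raw transcript into one block of text.

    The prompt wording here is an assumption for illustration only.
    """
    lines = ["Correct the dictation transcript below by applying these rules:"]
    # Number the rules so they can be referred to (and changed) later by number.
    for number, rule in enumerate(rules, start=1):
        lines.append(f"Rule {number}: {rule}")
    lines.append("")
    lines.append("Transcript:")
    lines.append(transcript.strip())
    return "\n".join(lines)


# Hypothetical rules and transcript, written in everyday language.
rules = [
    'Replace "Honourable" and "Hon." with "Hon\'ble".',
    'If I say "next line" or "next paragraph", start a new paragraph.',
]
prompt = build_correction_prompt(
    rules,
    "The Honourable Court heard the matter next line The case was adjourned",
)
print(prompt)
```

Keeping the rules in a numbered list makes it easy to say "change Rule 2" later without repeating everything.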



SOME WORKFLOW SUGGESTIONS


This part of the blog post is descriptive. Please bear with me as I explain the use-case scenarios. You can skip this section if you find it lengthy.


Here are a few examples of the Rules I initially gave to ChatGPT:

  • Rule 1: Replace "Honourable", "Hon." with "Hon'ble".

  • Rule 2: If I say full stop, it is a punctuation. If full stop is already added at that place, remove the word full stop

  • Rule 3: If I say next line or Next paragraph, you should replace by taking it to next paragraph

  • Rule 4: Rupees should be written as Rs. and after the number following the Rupees add /-.

  • Rule 5: All dates should be in DD.MM.YYYY format. Even if the transcript has the whole date.

  • Rule 6: All amounts should be written in numerals, not words, with commas placed as per Indian system of accounting


You can also get very specific to suit your style. For example, in most of my matters, area is denoted in Acres and Guntas. In this regard, I have given the following instruction:

  • Rule 10: In my dictation area is given in Acres and Guntas. Guntas is a local term. Whenever there is acres and guntas, it should be written as "Ac.3-10 Guntas". In this 3 stands for Acres and 10 stands for guntas.
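To make the expected output of the rupee, date, amount and area rules concrete, here is a small sketch of what the corrected text should look like. It does not show how ChatGPT works internally; the function names are hypothetical, the space after "Rs." is an assumption, and the comma helper assumes non-negative whole numbers.

```python
from datetime import date


def indian_commas(amount):
    """Group digits the Indian way: last three digits, then pairs (12,50,000)."""
    digits = str(amount)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])


def format_rupees(amount):
    """Rules 4 and 6: 'Rs.' before the numeral and '/-' after it."""
    return f"Rs. {indian_commas(amount)}/-"


def format_date(d):
    """Rule 5: DD.MM.YYYY."""
    return d.strftime("%d.%m.%Y")


def format_area(acres, guntas):
    """Rule 10: the 'Ac.3-10 Guntas' style."""
    return f"Ac.{acres}-{guntas} Guntas"


print(format_rupees(1250000))          # Rs. 12,50,000/-
print(format_date(date(2018, 8, 25)))  # 25.08.2018
print(format_area(3, 10))              # Ac.3-10 Guntas
```

A few known outputs like these are handy for checking whether ChatGPT has applied your rules the way you meant them.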


ChatGPT can also follow complicated instructions that completely restructure the dictation. For example, I gave the following instructions to produce a chronological list of dates:

  • Rule 1: Each date should be in a bullet point

  • Rule 2: The date could be in full DD.MM.YYYY format or simply a month and year or just year. If it is a full date, irrespective of how the date is transcribed, it should be written in DD.MM.YYYY format. If just month and year are available, then it should be month in words and year in numerals. If only year, then year in numerals.

  • Rule 3: the list of dates should be arranged in chronological sequence.

  • Rule 4: There could also be dates with NIL date, but they fall between certain date sequence. Nil date sequence should be kept at the correct place chronologically with date as NIL

  • Rule 5: Each bullet list should first contain the date followed by the event narrated for that date

  • Rule 6: In a given date, There could be important points, which should be listed as sub points of that date

  • Rule 7: In case of date range, the first date should be used to put in appropriate place in chronological sequence.

  • Rule 8: In case of clash between dates of only year, or month and exact date, first should be year, followed by month and year and then exact date. For example, if I gave 3 different events, one with 2018, then with August 2018 and then with 25.08.2018, they are 3 different events and the order should be 2018, then August 2018 and then 25.08.2018 with respective content

  • Rule 9: Also in the middle of dictation, I could refer to some content of a previous date. I will use the command insert in so and so date, then such insertion should be inserted at the appropriate date.

And it followed every instruction exactly to make me a perfect chronological list of dates irrespective of the sequence I dictated in.
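The ordering described in Rules 2, 3 and 8 can be sketched in a few lines. This is only a rough illustration of the expected result, not how ChatGPT does it: the events are made up, the function names are hypothetical, the ": " separator is an assumption, and NIL dates, sub-points and insertions (Rules 4, 6 and 9) are left out. Each entry is (year, month, day, event), with None where the month or day was not dictated.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def sort_key(entry):
    """Rule 8: a missing month or day sorts before any known one in that year."""
    year, month, day, _event = entry
    return (year, month or 0, day or 0)


def format_date(year, month, day):
    """Rule 2: year only, month in words plus year, or DD.MM.YYYY."""
    if month is None:
        return str(year)
    if day is None:
        return f"{MONTHS[month - 1]} {year}"
    return f"{day:02d}.{month:02d}.{year}"


def chronological_list(entries):
    """Rules 1, 3 and 5: one bullet per date, in order, date first then event."""
    ordered = sorted(entries, key=sort_key)  # stable: ties keep dictated order
    return [f"• {format_date(y, m, d)}: {event}" for y, m, d, event in ordered]


# Hypothetical events, dictated out of sequence.
entries = [
    (2018, 8, 25, "Sale deed executed"),
    (2018, None, None, "Negotiations began"),
    (2017, 3, None, "Notice issued"),
    (2018, 8, None, "Agreement drafted"),
]
for line in chronological_list(entries):
    print(line)
```

For a date range (Rule 7), the first date of the range would be used as the entry's year, month and day.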



THE COST


All three tools mentioned above (audiopen.ai, macwhisper and ChatGPT) have free versions for you to try out. In most cases, the free versions are sufficient to get the work done.


However, if you want to fully adopt them for your dictation, I strongly recommend the paid versions, which come with greater accuracy.

  • ChatGPT will cost you $20 per month.

  • audiopen.ai will cost you around $150 for an outright purchase.

  • macwhisper will cost you around €20 for an outright purchase.
