In this article, I provide a brief overview of the uses of voice transcription technology, including its use in eLearning development and delivery. Currently, most voice transcription depends on human transcribers. However, the use of dictation software and automated voice transcription is increasing, along with interest in artificial intelligence (AI).
Artificial intelligence and voice transcription are two technologies that are increasingly important for eLearning, and in a certain sense they are interdependent. To succeed, AI must be able to interpret data in the form of text, spoken words recorded on audio or video media, and live speech. Voice transcription must be able to convert spoken words, whether recorded or live, to text. These are challenging goals, and meeting them will facilitate many applications, from machine learning and eLearning to the creation of new text and new media.
Depending on the product you choose, transcription software can take your audio (live or recorded) and turn it into accurate, editable text. In some cases, you will need a good-quality voice recorder to capture your interview or your dictated article. You may be able to connect the recorder directly to your computer, or to take the memory card out of the recorder and insert it into your computer. The transcription software will then transcribe your voice and your interviewee’s voice (or the voice of a subject matter expert) from the audio files saved on the recorder or the memory card.
There are also applications that support dictation from a mobile phone, returning the transcription to the user via email or through online services such as Dropbox.
AI features in transcription software can analyze those audio files and add information to the transcription to:
- Track who is speaking
- Detect tone of voice
- Provide analytics about the content
Key features to look for when selecting transcription software include:
- Support for audio and video playback
- Transcription into multiple languages
- Ability to send and receive voice notes and transcribed text between team members
- Real-time collaboration
- Removal of "filler" sounds ("um," "ah," "er")
- Integration with word processing and calendaring software
- Ability to search transcriptions
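Two of the features above, filler removal and transcript search, are simple enough to sketch in a few lines of Python. This is only a minimal illustration of the idea, not how any particular product works: the function names, the filler list (taken from the three examples in the list above), and the line-by-line search are all assumptions made for the example.

```python
import re

# Filler sounds named in the list above. Real products likely use larger,
# language-specific lists; this short tuple is an assumption for illustration.
FILLERS = ("um", "ah", "er")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(r"\b(?:%s)\b,?\s*" % "|".join(FILLERS), re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Return text with filler words removed and extra spaces collapsed."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def search_transcript(lines: list[str], term: str) -> list[int]:
    """Return the 1-based numbers of the lines that contain term (case-insensitive)."""
    needle = term.lower()
    return [i for i, line in enumerate(lines, start=1) if needle in line.lower()]
```

For example, `remove_fillers("Um, I think, er, the course is ready")` returns `"I think, the course is ready"`. Because the pattern matches whole words only, a word such as "umbrella" is left alone. The sketch does not restore capitalization when a filler opens a sentence, which a real product would need to handle.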
Some additional uses of voice transcription technologies
There are other categories of voice transcription that eLearning designers and developers may find useful. Voice-to-text transcription is important in the medical, legal, and media fields, where it is often done by human transcribers who charge $3 or $4 per minute. Human transcribers include medical scribes and court reporters, along with closed captioners for television and other media. AI-supported transcription is less costly, may be faster, and is rapidly becoming more accurate.
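To put the per-minute rates in perspective, here is a small worked example. It assumes the rate is charged per minute of recorded audio (the usual convention, though not stated above), and the function name is hypothetical.

```python
def human_transcription_cost(audio_minutes: int, rate_per_minute: int) -> int:
    """Cost in whole dollars, assuming the rate applies per minute of recorded audio."""
    if audio_minutes < 0 or rate_per_minute < 0:
        raise ValueError("minutes and rate must be non-negative")
    return audio_minutes * rate_per_minute


# A one-hour recorded interview at the two rates quoted above.
low = human_transcription_cost(60, 3)
high = human_transcription_cost(60, 4)
print(f"One hour of audio: ${low} to ${high}")  # prints "One hour of audio: $180 to $240"
```

At those rates, a single hour-long interview costs $180 to $240 to transcribe, which helps explain the interest in lower-cost AI-supported alternatives.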
Some transcription services are offered online and are not necessarily, or not exclusively, performed by humans. For example, newspaper reporters and magazine writers (including me) sometimes use online services such as Rev.com and others that are a hybrid of software transcription and human transcribers. Another choice is Otter.ai, which, as the name implies, is handled by software rather than humans. The quality of Otter.ai’s transcriptions depends in part on the dictation skills of the user, but it is very useful for recording and transcribing Zoom calls. Prices and features vary among these services.
Dictation software is not really intended for use as computer voice control or for transcription of interviews: the software learns one person’s voice and does not adjust to multiple users. There are a number of choices in this category, and they come in versions designed for different specialties (medical, legal, etc.). Among the best known are Nuance Communications’ Dragon products, including NaturallySpeaking (multiple versions) and Dragon Anywhere. These are very accurate (again, assuming the user knows how to dictate), but they take time to learn to use to good advantage. I have used Dragon Home, Dragon Professional for Individuals (there is also a Dragon Professional for Groups, which I have not used), and Dragon Anywhere for almost 30 years for books, magazine articles, business documents, and other purposes. If you are willing to pay for the Dragon products (and they do offer pretty frequent "deals"), they are about the best you will find, with up to 90% accuracy out of the box. They get more accurate as you get better at dictation and as the software learns your voice and your manner of speaking. If you prefer, free speech-to-text features are available in Google Docs and in the Microsoft and Apple operating systems.
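If you want to check an accuracy figure against your own dictation, one common approach (an assumption here, since vendors do not always say how they measure accuracy) is word error rate: the number of word-level edits needed to turn the software's output into a correct reference transcript, divided by the number of words in the reference. The sketch below is a minimal version with a hypothetical function name. It ignores punctuation handling and simply lowercases and splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # matching or substituted word
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With a ten-word reference and one misrecognized word, the function returns a word error rate of about 0.1, which corresponds to roughly 90% word accuracy.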