Live transcription Archives - Epiphan Video
https://www.epiphan.com/blog/topic/live-transcription/

AI speech-to-text accuracy: Is it good enough for your live events?
https://www.epiphan.com/blog/ai-speech-to-text-accuracy/
Thu, 22 Oct 2020 10:34:52 +0000

Professional transcriptionists who work in real time are in short supply and command premium prices. Today’s automatic transcription technology offers a viable alternative for live events – and we have the data on AI speech-to-text accuracy to prove it.

The post AI speech-to-text accuracy: Is it good enough for your live events? appeared first on Epiphan Video.

Does today’s automatic transcription technology offer a viable alternative to traditional live transcription services? Short answer: Yes. With advancements in speech recognition technology, AI speech-to-text accuracy has reached a level that’s suitable for live events – from conference presentations and corporate meetings to university lectures and church sermons.

By no means is this a groundless conclusion. It’s based on our own research looking into the performance of leading speech recognition APIs to determine their “real-time readiness.” Read on for a breakdown of those results.

    Get the best AI speech-to-text accuracy

    Capture the crystal-clear audio that’s essential for accurate AI transcription with Epiphan LiveScrypt, a dedicated automatic transcription device with inputs for professional audio (XLR/TRS) and many more powerful features.

    Methods: Assessing AI speech-to-text accuracy

    We compared three leading speech recognition application programming interfaces (APIs) – Amazon Transcribe, Google Cloud Speech-to-Text, and IBM Watson Speech to Text – to human transcriptionists on a number of criteria:

    • Accuracy: The rate at which the solution makes mistakes in transcribing uttered words, measured as the Word Error Rate (WER [Transcript, Reference] = [Substitutions + Deletions + Insertions] / Words in Reference).
    • First-hypothesis latency: The time between the utterance of a word and the output of text.
    • Stable-hypothesis latency: The time between the utterance of a word and the output of correct text.
    • Cost: The fee for use of the associated service.
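
    The WER formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring code used for these tests: the function name is an assumption, and it relies on the standard word-level edit distance to count substitutions, deletions, and insertions together.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the
    # reference words seen so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from the transcript
            insertion = curr[j - 1] + 1   # extra word in the transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

    Because insertions count as errors, WER can exceed 1.0 when a transcript contains many more words than the reference.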

    To evaluate automatic transcription performance, we fed each API over 1,500 sample phrases from a test set made available by Texas Instruments and the Massachusetts Institute of Technology (TIMIT). We compared the results to the reference transcriptions included with the test set and measured latency. Ultimately, we decided against adjusting transcription timings for round-trip time (RTT) since RTT made up a relatively small portion of overall latency in every case.

    To establish a baseline for human transcription performance, we drew and generalized results from multiple academic sources.

    A note about terminology

    By “transcriptionist” we mean a professional who transcribes speech using a computer keyboard versus a stenographer, who would be capable of typing at higher speeds using a stenograph. The corporate, education, and special events markets tend to use transcriptionists because stenographers charge considerably higher rates.

    Regarding the TIMIT test set, the recording of those samples took place in a noise-controlled environment. We normalized the reference transcriptions by converting capital letters to lowercase, removing punctuation, and spelling out numerical terms. Then we calculated the word error rate (WER) for every utterance. Based on the complete test set, for each engine we also calculated a mean WER and WER confidence interval (two-sided, 95% confidence, t-distribution, if you want to get specific).
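
    As a rough sketch of those two steps, the snippet below normalizes a transcript and then summarizes a list of per-utterance WER values. It is not the code used for these tests. The digit lookup is a hypothetical stand-in for a full number speller, and the interval uses a normal approximation instead of the t-distribution, which is a close substitute only because the test set has more than 1,500 utterances.

```python
import math
import re
import statistics

# Hypothetical lookup for illustration only; a real normalizer would
# spell out every numeral, not just single digits.
SPELLED = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
           "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and spell out single digits."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)  # keep letters, digits, apostrophes
    return " ".join(SPELLED.get(word, word) for word in text.split())


def mean_with_ci(values, confidence=0.95):
    """Return (mean, lower, upper) for a two-sided confidence interval.

    Assumption: normal approximation to the t-distribution, reasonable
    for large samples.
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


print(normalize("Don't ask me, 2 carry!"))
print(mean_with_ci([0.0, 0.1, 0.2]))
```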

    Our data set did include some variance since the test phrases were spoken by a variety of people at different rates. But this reflects the range of speaking rates, pitches, and other speech differences you’d find in real-world settings. None of the speakers were instructed to talk slowly into a microphone to make an accurate transcription more likely. Given all these precautions, we’re confident the aggregated data is a close reflection of each API’s true accuracy.

    It’s also worth noting that our testing was in English only. English is the most widely used language in the applications we analyzed, which may mean English gets the lion’s share of developer focus. In any case, we suspect there would be only minor variances between languages.

    Results: AI and human transcription compared

                          Accuracy     First-hypothesis   Stable-hypothesis   Cost per hour
                          (mean WER)   latency (s)        latency (s)         (USD)
    Human (generalized)   0.04–0.09    4.2 (one latency value given)          60–200
    Amazon                0.088        2.956              3.034               1.44
    Google                0.085        0.576              0.738               1.44
    Google (Enhanced)     0.06         0.605              0.761               2.16

    *Recorded January 2020

    It’s important to note that these results reflect the state of each API in January 2020, when testing took place. Performance would likely be better if we ran the same tests today, since speech recognition technology, being built on machine learning, tends to improve over time.

    Conclusion: AI speech-to-text accuracy is comparable to humans

    Each API achieved a level of accuracy and latency suitable for real-time captioning. The latency of Amazon’s API was higher than that of IBM’s and Google’s engines (about 3 seconds in our tests, versus under 1 second for Google), but the three are comparable when it comes to accuracy and cost. We also tested each engine for noise resilience (transcription accuracy in the presence of noise) and found that audio equipment quality, microphone placement, and other factors are essential for acceptable performance.

    What does all this mean in practical terms? These APIs are ready for use in live event scenarios – but how can organizations actually leverage them?

    One option is to build a solution in-house around one of these APIs. This would require developing:

    • An automatic speech recognition edge agent to capture and stream audio data to the cloud
    • A digital signage platform and agent to receive, render, and display transcriptions
    • A Web portal or mobile application to accommodate users who are seated far from in-room monitors or who have visual impairments or vision loss

    And so on. The other, less burdensome option is to use an off-the-shelf dedicated automatic transcription device.

    Accurate, affordable, and automatic live transcription

    Epiphan LiveScrypt converts speech to text in real time for display on monitors and mobile devices during live events, improving accessibility and participant engagement affordably.

    Get the best of automatic transcription technology today

    Powered by Google’s advanced speech recognition technology, LiveScrypt features professional audio inputs (XLR, TRS) so you can capture crystal-clear audio that’s conducive to high AI speech-to-text accuracy. LiveScrypt also includes HDMI and SDI inputs to capture embedded audio, a built-in screen for configuration, and a QR code system for easy streaming, simplifying setup and making for fewer points of failure.

    Visit https://epiphan.com/products/livescrypt to learn more about how our dedicated automatic transcription device can help make your live events more accessible and engaging.

    Real-time captioning: Four options for your live event
    https://www.epiphan.com/blog/real-time-captioning/
    Fri, 28 Feb 2020 15:17:27 +0000

    With today’s AI-powered speech-to-text technology, adding real-time transcription to your live event is easier than ever. We'll explore four different ways to transcribe event proceedings in real time.

    The post Real-time captioning: Four options for your live event appeared first on Epiphan Video.

    Live event planners have been facing increasing demand for real-time transcription or captioning. In the past, real-time captioning has been a pricey proposition, requiring organizers to budget for the cost of hiring a transcriptionist. Happily, advances in speech-to-text software and automatic transcription have widened the field, giving organizations a range of choices for adding live captions. But faced with an array of options, how do you decide on a transcription solution? In this post, we’ll run down some of the pros and cons of four different ways to add live captions to your event.

    Four options for real-time captioning

    When it comes down to it, there are four ways to add real-time captioning to your live event:

      1. Hire a transcriptionist

      In the past, hiring a professional transcriptionist was the only option for captioning in real time. This approach involves hiring an individual who listens to proceedings (on site or remotely) and transcribes on the fly.

      There are advantages to human transcriptionists. A human can listen more closely to someone who is speaking softly and still discern what they mean, while an AI-based system may not be able to reach the same level of precision. Some medical or legal events may require the transcriptionist to carry certain professional certifications. For example, a certified health documentation specialist credential indicates the transcriptionist understands clinical health terminology and can apply that knowledge to correct commonly confused medical terms to ensure an accurate medical record. In other instances, an experienced professional may be able to parse industry-specific terminology or slang that some automatic transcription solutions may struggle with.

      But human transcriptionists are also highly variable in quality and reliability. Someone transcribing a single 20-minute speech could be highly accurate, but that accuracy rate could change if you’re asking them to transcribe four hours of lectures. Similarly, that transcriptionist could be taken out of commission by inclement weather, unexpected illness, or personal emergencies. Finally, not all transcriptionists carry the equipment needed to share captions in real time. In addition to booking someone with gear that can connect to AV equipment, it will likely be on you to find a way to ensure transcripts can be shared with your audience in real time.

      Price:

      Highly variable. Prices range from $90 to $180 an hour, with experienced or credentialed transcriptionists coming in at the higher end of the scale. Transcriptionists may bill you at an overtime rate for longer events, further increasing costs.

      Pros

      • Humans are better at understanding poor quality audio
      • Experienced transcriptionists can parse industry-specific terms, slang, or informal language

      Cons

      • Expensive, especially those with specialized skills
      • Reliability is variable
      • Sharing transcript live requires equipment and know-how
      • Low availability, high demand

      Bottom line:

      While there are definitely places where a human transcriptionist is required, the high price point can be prohibitive. High demand for real-time transcription services only continues to drive that price up, and it may mean a professional transcriptionist isn’t available on the day or time of your event.

      2. Buy a hardware solution

      A relatively new entry to the speech-recognition market, hardware solutions provide a simplified real-time transcription option. A hardware device includes a way to capture audio live, convert that speech to text, and share that transcription with guests. Typically, these devices connect directly to a local audio source to ensure a clean audio feed and also include some kind of standardized video output to share the transcription on a monitor in real time.

      A dedicated hardware solution also removes possible points of failure present in an AI-transcription solution that relies on a computer or mobile device. A dedicated hardware transcriber will not suffer from a blue screen of death, receive unexpected text messages during an important presentation, or require the same ongoing care and maintenance that other solutions might.

      A purpose-built hardware solution will also include extra features depending on the hardware developer. LiveScrypt, Epiphan Video’s own dedicated automatic real-time transcription solution, supports transcription in over 20 languages and dialects, and includes additional features like profanity filters and text scaling to ensure visibility on connected monitors.

      These solutions have a higher initial investment cost, expressed as the cost of hardware itself. This cost may be steep for some, but organizations and people in regular need of transcription break even quickly.

      Examples include a college or university intending to caption several lectures a day, or a convention planner who wants to transcribe dozens of speakers at each event they stage. Even after the cost of hardware is accounted for, the per-hour cost of transcription remains far below the cost of professional transcriptionists for those organizations and groups.

      Price:

      Variable. People and organizations buying into a hardware solution will need to pay for the hardware itself as well as the ongoing subscription costs of transcription. However, the cost of these services still remains far below the cost of hiring a transcriptionist, and value for money increases the more the hardware is used.

      Pros

      • Affordability
      • Reliability
      • Speed
      • Built-in professional audio connections to ensure high-quality transcription
      • Standardized video output connection to share transcripts live
      • Simple setup

      Cons

      • Initial investment cost

      Bottom line:

      For live events that require real-time captioning or transcription of proceedings, hardware solutions are the most hassle-free option on the market.

      Simplified real-time automatic transcription

      Provide real-time transcription at your next live event the easy way, with LiveScrypt.

      3. Build a cloud-based transcription solution

      Services like Google Speech-to-Text, Amazon Transcribe, and IBM Watson Speech to Text all use very similar technology to convert speech into text. In brief, automatic transcription services take a digital audio signal, break that signal into smaller segments of sound, and compare those segments (also called phonemes) to an existing database of language sounds. When a match is found, the service then determines what word those phonemes are constructing and returns the result as text.

      The process typically requires a lot of computing power, which is why these services use cloud computing to deliver quick results. The accuracy of AI transcription services is often comparable to human typists, with the gap between the two narrowing daily.

      Consistency and lack of downtime are a couple of the clear benefits of using a cloud-based automatic transcription service. While it’s unreasonable to expect a human to transcribe hours of speech without a break, that kind of task is well-suited for AI-powered speech recognition services. Modern AI-based transcription services can yield word error rates low enough for real-time event captioning.

      The cost of these services is also significantly lower than working with a professional transcriptionist, making it attractive for longer events with many hours of speech to be transcribed and for organizations staging multiple live events throughout a year.

      The low price also means it’s possible to offer real-time captioning from end to end. A conference organizer using a professional transcriptionist may be forced to limit captioning to one or two keynote speeches for budgetary reasons. But for a fraction of that price, an automatic transcription service could caption the entire event – from the keynote presentation to the final word.
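
      A quick back-of-the-envelope comparison shows the scale of the difference. The hourly rates below are the ones quoted in this post ($90 to $180 for a transcriptionist, $1.44 for a cloud service); the event lengths are hypothetical, and the cloud figure leaves out the computer and development time discussed next.

```python
def captioning_cost(hours: float, rate_per_hour: float) -> float:
    """Cost of captioning a given number of hours at a flat hourly rate."""
    return hours * rate_per_hour


HUMAN_RATE_RANGE = (90.0, 180.0)  # USD per hour, from this post
CLOUD_RATE = 1.44                 # USD per hour, from this post

keynote_hours = 2   # hypothetical: two one-hour keynotes
event_hours = 16    # hypothetical: two full days of sessions

human_low, human_high = (captioning_cost(keynote_hours, r) for r in HUMAN_RATE_RANGE)
cloud_full_event = captioning_cost(event_hours, CLOUD_RATE)

print(f"Transcriptionist, keynotes only: ${human_low:.2f} to ${human_high:.2f}")
print(f"Cloud service, entire event:     ${cloud_full_event:.2f}")
```

      Under these assumptions, captioning all 16 hours automatically costs about $23, while a transcriptionist covering just the two keynotes costs $180 to $360.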

      But cloud services also require a degree of computer competency that is beyond many organizations. These services provide a way for digital audio to be converted to text, but a coder is required to develop a program which interacts with that cloud service. That process takes time, and will require testing, patching, and updating as problems emerge.

      You will also need some form of local console that can convert an analog audio signal to a digital one, send that signal to the cloud, and receive your transcription. This can be some kind of personal computer, though the general-purpose nature of these machines presents a few challenges. These include unplanned system updates and unwitting bystanders who interrupt the transcription by using the computer to charge their phones.
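
      To give a sense of what the sending side involves, here is a minimal sketch that splits already-digitized audio into short chunks and hands each one to a sender. It assumes 16-bit mono PCM at 16 kHz, and the `send` callback is a hypothetical placeholder for whatever streaming client a given cloud service provides; none of this is a real vendor API.

```python
from typing import Callable, Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
               chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw mono PCM audio."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def stream_audio(pcm: bytes, send: Callable[[bytes], None]) -> int:
    """Pass each chunk to `send` (hypothetical uploader); return the chunk count."""
    count = 0
    for chunk in pcm_chunks(pcm):
        send(chunk)
        count += 1
    return count


# One second of silence stands in for captured audio.
one_second = bytes(16000 * 2)
sent = []
print(stream_audio(one_second, sent.append), "chunks sent")
```

      Short chunks keep latency down because the service can start recognizing speech before the speaker finishes a sentence. A real agent would also need to handle reconnects and read audio from a capture device instead of a byte string.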

      Most personal computers also do not natively have a way to receive professional audio feeds, such as through XLR connections. It’s possible to add that capacity using expansion cards or an external sound card, but this adds complexity to the job. Any computer enthusiast will tell you that care and maintenance of a computer – even a single-purpose one – is an ongoing task.

      Price:

      This is among the most affordable options: prices range from $0.96 an hour for Google Speech-to-Text to $1.44 an hour for Amazon Transcribe. Price can also come down with volume. IBM, for instance, offers discounted rates for users who need to transcribe over 250,000 minutes, 500,000 minutes, or one million minutes of speech.

      You will also need a computer to send audio to the cloud, receive the transcription, and share it with your audience. While some organizations may have that gear handy, building a PC for this purpose increases cost.

      Pros

      • Low cost
      • High reliability
      • Accuracy
      • Speed

      Cons

      • Setup is complex
      • Requires some kind of local interface to use cloud service
      • Computer could be expensive if not already available

      Bottom line:

      Low cost makes this an attractive option, but cloud services still rely on you to find a way to gather audio and share the transcription live. The added complications associated with sourcing a local console capable of doing this may make this option unappealing for people and organizations looking for a simple way to add live transcription to an event.

      4. Find a speech-to-text app

      While phone-based speech-recognition apps have many effective uses, they’re limited by the hardware they’re tied to. Smartphones and tablets are limited by storage and processing capacity, while microphone audio on a smart device can be variable in quality. This may mean their best applications are typically in one-on-one interviews or small meetings rather than a large lecture in a hall where the speaker may be far from the transcribing phone.

      App-based solutions are also dependent on the app developer to add functionality and resolve issues. More popular apps will be responsive to user needs with developers rolling out regular updates, but an app developed by an independent firm or individual user could see updates stopped, or be abandoned completely.

      Users will also need a way to share the transcript. Smartphones and tablets capable of using these apps are not typically designed with audiences in mind; sending the transcript to a large screen will require additional setup. Plus, solutions that rely on a smartphone are vulnerable to unexpected phone calls, instant messages, and software updates.

      Price:

      Variable. Many apps are free for individual users but require you to pay for a monthly or by-the-minute plan after exceeding a certain number of minutes. Some services have a monthly minutes cap, which could be a dealbreaker for people with a lot of audio that needs transcribing.

      Pros

      • Audio capture usually done natively
      • Simple setup

      Cons

      • Can get expensive once free minutes run out
      • Audio quality is variable, affecting transcription accuracy
      • Limited by phone hardware
      • Support is dependent on app developer
      • Some apps include a minutes cap
      • No easy way to share transcription

      Bottom line:

      Cost remains relatively low and transcription quality is typically fairly high, but difficulties gathering audio and sharing the transcription with a wide audience remain a barrier for live event organizers.

      Simplify your real-time captioning setup

      Only you will be able to determine which of these solutions is best suited for your live event. Smaller events may be able to use a smartphone-based app without problems, while more tech-savvy users may enjoy the idea of building a computer with pro audio connections to use a cloud-based solution. However, the additional functionality and abilities built into hardware solutions mean organizers who are looking to add transcription to their live events on a regular basis should definitely take a long look at a dedicated hardware option.

      LiveScrypt is geared toward real-time transcription for a broad range of events, offering additional features like a profanity filter and support for over 20 languages. LiveScrypt is also supported by Epiphan’s developers and in-house technical support team, ensuring new updates are constantly being produced and problems you encounter are handled by a human being ready to address the issue.

      LiveScrypt has a simple setup process and easy operation, minimizing the technological complexity of your live event.

      Contact our sales team to learn more about how LiveScrypt can benefit your organization or to arrange a live demo.

      Automatic transcription is ready for your live events
      https://www.epiphan.com/blog/automatic-transcription/
      Mon, 03 Feb 2020 08:01:11 +0000

      Automatic transcription has come a long way – but is it ready for your live event? Find out how AI-driven transcription compares to transcription by humans.

      The post Automatic transcription is ready for your live events appeared first on Epiphan Video.

      A lot can limit what audience members take away from your live event. Maybe in some sections it’s tough to hear what’s being said on stage due to audio issues or chatty table neighbors. And for people who are deaf or hard of hearing, your event might be totally inaccessible. Happily, there’s a solution to these challenges: live transcription. In some cases it’s even a legal requirement. Question is, do you enlist human help, or machine?

      Machine-driven transcription, or automatic transcription, isn’t a new invention. It’s one of many applications for automatic speech recognition (ASR) technology, which has been around for over half a century. ASR technology has come very far over the years. No, transcriptionist hasn’t gone the way of elevator operator or bowling alley pinsetter as a needless human occupation. But with recent advances in artificial intelligence (AI) and machine learning, automatic transcription technology is ready for prime time.

      How automatic transcription works

      Automatic transcription services link the sounds that make up human speech to words in a digital dictionary. When these sounds have multiple possible matches – homophones, for example, or due to unclear speech or audio – the auto transcription software examines the overall context and assigns each possible word a probability, selecting the word it deems the most likely fit. Deep learning algorithms drive this analysis, informed by a broad range of inputs that vary between solutions.
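
      The idea of weighing candidates by context can be shown with a toy example. The sketch below swaps the deep learning models these services actually use for a tiny hand-written lookup table, and every probability in it is made up for illustration.

```python
# Hypothetical probabilities P(word | previous word); the numbers are invented.
BIGRAM = {
    ("over", "there"): 0.60, ("over", "their"): 0.05, ("over", "they're"): 0.01,
    ("lost", "their"): 0.50, ("lost", "there"): 0.05, ("lost", "they're"): 0.01,
}


def pick_word(previous: str, candidates: dict[str, float]) -> str:
    """Choose the candidate with the best combined score.

    `candidates` maps each possible word to an acoustic score, i.e. how
    well it matches the sound that was heard.
    """
    def score(word: str) -> float:
        # Unseen word pairs get a small fallback probability.
        return candidates[word] * BIGRAM.get((previous, word), 1e-4)

    return max(candidates, key=score)


# Three homophones that sound nearly identical to the recognizer.
homophones = {"there": 0.34, "their": 0.33, "they're": 0.33}
print(pick_word("over", homophones))
print(pick_word("lost", homophones))
```

      With nearly equal acoustic scores, the surrounding word decides the outcome: "over" favors "there", while "lost" favors "their".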

      The same basic process is at work when you interact with Siri, Alexa, Cortana, or Google, only in this case the system outputs its conclusions as text.

      Most automatic transcription solutions on the market today are built for post-production. Some work by having you upload an audio recording. Services of this sort will run your audio file through automatic transcription software and send you the result. Processing typically takes place in the cloud, but local speech-to-text solutions are also available. Of course, post-production solutions like these aren’t suitable for live events, whether it’s a conference, a court hearing, a legislative assembly, a corporate town hall, or a sermon.

      Two ways to transcribe live events

      If your goal is to deliver subtitles in real time, you have two options:

      1. Hire one or more human transcriptionists (to work on-site or remotely)
      2. Use an auto transcription service capable of analyzing speech and outputting subtitles quickly enough to keep pace with speakers.

      The first option is pretty straightforward. Working on-site or from home, human transcriptionists capture what presenters are saying in real time. The tricky part is figuring out how to display the text on a monitor, tablet, or other device. Live transcription is a whole different game from working with pre-recorded audio, so you’ll want someone who has a degree or certification in court reporting or captioning to ensure they can keep up.

      The second option is a bit more complex from a technical standpoint but does offer significant advantages over human-based transcription. You can find live transcription solutions from big names like Google, Amazon, and IBM.

      Live transcription set-up

      On the surface, AI-driven live transcription doesn’t look all that different from human-based transcription. Imagine a speaker on stage delivering a keynote address. The microphone they’re speaking into is connected to a laptop or other device that’s running cloud-based automatic transcription software. Everything the speaker says is projected through the conference hall speaker system but also sent as audio to the cloud. In the cloud, natural language processing technology matches the various sounds with words in a digital dictionary. The software then sends back the text to display on a monitor so anyone can follow along. The data the solution uploads and downloads is tiny, so all of this happens very quickly.

      Subtitles and captions: What’s the difference?

      It’s important to note that many automatic transcription solutions generate subtitles rather than captions. A lot of people use these terms interchangeably, but they do refer to slightly different things. Subtitles provide a text alternative for speech or dialogue, whether it’s a translation or in the same language. Closed captions render speech and dialogue as text too, but they also describe background music and sound effects (e.g., a phone or a doorbell ringing).

      Why does this matter? It could factor in if your live event must adhere to a set of accessibility standards. For instance, there are the international Web Content Accessibility Guidelines (WCAG) and, in the United States, the Americans with Disabilities Act (ADA) and Section 508 of the Rehabilitation Act. Some standards distinguish between subtitles and closed captions, requiring the latter to give people who are deaf or hard of hearing a fuller experience. This might not be an issue for a conference, sermon, or any other event where there’s a single person delivering a presentation or speaking. But it’s something to investigate if a mandate is driving your interest in live transcription.

      Automatic transcription versus human transcription

      Like many things, there’s a bit of give and take when deciding between human and AI-driven transcription. Yes, humans are still better at some things. We’ve all dealt with self-checkout machines that insist there’s an item in the bagging area when there’s no item in sight, only to be rescued by a dutiful (and very human) self-checkout attendant. But machines often win out when it comes to core business concerns like cost and convenience.

      We’ll compare human and auto transcription on five key criteria:

      1. Accuracy
      2. Cost
      3. Convenience
      4. Consistency
      5. Privacy

      1. Accuracy

      Research suggests human transcription accuracy is around 95 percent. That’s one mistake in every 20 words transcribed. Speech recognition researchers are aiming for an error rate that’s on par.

      Both Microsoft and IBM claim to have met a level of accuracy nearing this with their own speech-to-text solutions. But AI-based transcription doesn’t always fare so well outside the ideal conditions of a corporate laboratory. Background noise, poor acoustics, heavy accents and dialects, specialized vocabulary, and subpar recording equipment can all hamper the accuracy of automatic transcription. In truly unfavorable conditions you might end up with “word salad”, puzzling (or drawing laughter from) anyone in the audience who’s following along.

      Humans tend to do better at transcription, particularly when multiple speakers are involved. Machines struggle with this, which may or may not be an issue depending on the nature of your event. (But machines are closing the gap in this regard; see, for example, Google’s AI speaker diarization technology, which will make live automatic transcription of panel discussions and other multi-participant formats possible.)

      Don’t discount automatic transcription just yet. Thanks to the deep neural networks that power speech recognition technology, machine-driven transcription is improving by the day. You can prime some solutions before an event to more accurately interpret a specific speaker, potentially dealing with difficult accents or dialects more effectively than a human transcriptionist. With others, it’s possible to add words and terms to the solution’s dictionary to aid recognition. This feature is invaluable for events that feature specialized language and jargon, such as a conference for engineers or medical practitioners. It’s even possible to improve transcription accuracy of industry-specific terms by identifying North American Industry Classification System (NAICS) code lists.

      AI’s accuracy edge doesn’t end there. Recall that speech recognition solutions analyze context to help resolve word use ambiguity. Machine-driven live transcription software can make corrections on the fly as a speaker finishes a thought (at the same time giving the system more context to work with). Humans certainly aren’t immune to mixing up homophones or similar sounding words; we may even be more likely to do so when the pressure is on to keep up with speakers. The difference is that human transcriptionists don’t have time to fix these mistakes – unless they’re willing to risk falling behind.
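
      On the display side, on-the-fly correction usually means showing a provisional guess and overwriting it once the system settles on a final version. The sketch below is a minimal illustration of that pattern; the class name and the `(text, is_final)` update shape are assumptions, and each real service reports interim and final results in its own format.

```python
class CaptionBuffer:
    """Keep finalized text fixed while the latest interim guess stays revisable."""

    def __init__(self) -> None:
        self.final_segments: list[str] = []
        self.interim = ""

    def update(self, text: str, is_final: bool) -> str:
        """Apply one recognition update and return the text to display."""
        if is_final:
            self.final_segments.append(text)
            self.interim = ""
        else:
            # A newer interim guess replaces the previous one outright.
            self.interim = text
        return self.display()

    def display(self) -> str:
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


captions = CaptionBuffer()
print(captions.update("I scream", is_final=False))
print(captions.update("ice cream is on sale", is_final=True))
print(captions.update("come", is_final=False))
```

      The first guess, "I scream", is replaced once more context arrives, and viewers only ever see the corrected phrase followed by the next provisional words.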

      2. Cost

      Live events can be expensive affairs. The costs of venue rental, catering, and travel and accommodations for guest speakers can leave little in the budget for much else. This can present problems if you’d like to (or must) provide live subtitles for audience members.

      Human transcriptionist pay rates and pay models vary. Some transcription services charge by the minute, others by the hour. Transcriptionists who can keep up with live speakers will command a higher price than those who work with audio files or videos. Travel expenses might factor in if the transcriptionist isn’t local and needs to be on site. Fees can also be tied to on-site time rather than transcription time, in which case you’re paying the transcriptionist even when the show isn’t on (e.g., during lunch and networking breaks). And if a session runs long? That’s right: overtime fees.

      Whatever the case, transcription fees can really climb when you’re relying on human help, especially when your event takes place over multiple days or includes sessions that run in parallel. When budgets are tight, organizations sometimes have to limit subtitles to select speeches or sessions. This can put event planners in an uncomfortable position, as guest speakers may wonder why their talk isn’t important enough to ensure it’s accessible to everyone.

      An automatic transcription solution can help you avoid issues like these. AI-driven services still charge transcription fees, but these are significantly lower than the average pay rate for a human. You can run the service only when there’s transcribing to do. And with the lower cost of AI-based transcription, it’s less likely you’ll have to pick and choose which sessions will feature subtitling. The potential savings are even more impressive if you hold or produce multiple events a year.
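To make the cost comparison above concrete, here is a small Python sketch of the arithmetic. Every rate in it is a made-up placeholder, not a real quote from any transcriptionist or service. The function names, the overtime multiplier, and the per-minute pricing model are also assumptions chosen for illustration, so substitute your own figures.

```python
def human_cost(onsite_hours, scheduled_hours, hourly_rate,
               overtime_multiplier=1.5, travel_expenses=0.0):
    """Estimate a fee tied to on-site time, with overtime and travel."""
    regular_hours = min(onsite_hours, scheduled_hours)
    overtime_hours = max(onsite_hours - scheduled_hours, 0)
    return (regular_hours * hourly_rate
            + overtime_hours * hourly_rate * overtime_multiplier
            + travel_expenses)


def automatic_cost(transcribed_minutes, per_minute_rate):
    """Estimate a fee that is charged only while transcription is running."""
    return transcribed_minutes * per_minute_rate


# Hypothetical one-day event: 9 hours on site (1 hour over schedule),
# but only 6 hours of actual sessions that need subtitles.
human_total = human_cost(onsite_hours=9, scheduled_hours=8, hourly_rate=100.0,
                         travel_expenses=300.0)
automatic_total = automatic_cost(transcribed_minutes=6 * 60, per_minute_rate=0.05)

print(f"Human transcriptionist (placeholder rates): ${human_total:,.2f}")
print(f"Automatic transcription (placeholder rates): ${automatic_total:,.2f}")
```

The structure of the two functions is the point here: the human fee grows with on-site time, overtime, and travel, while the automatic fee grows only with the minutes actually transcribed.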

      3. Convenience

      It’s not always possible to bring in human help for live captioning or subtitling. Maybe you’ve scheduled a meeting on short notice and you’d really like to send participants away with a transcript for review, or it’s Sunday and the volunteer who usually subtitles your sermons is swaddled in bed with a nasty head cold. Perhaps there are other conferences happening at the same time as yours and no transcriptionist with the right skill set is available. And what happens if the transcriptionist you hired can’t make your event because they’re sick or their flight is delayed until the next day?

      No need to worry about any of this with AI-based transcription. Machines don’t have busy professional lives like people do. At a moment’s notice, you can set up your automatic transcription service and it’ll do its thing. You can test it before the event to gauge accuracy, which is difficult to do with humans (and potentially costly). You can even customize it to recognize industry-specific words.

      Automatic transcription services are more flexible as well. Many support multiple languages, eliminating the need to search for a transcriptionist with the right knowledge.
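If you do run a pre-event test, one common way to put a number on accuracy is word error rate (WER): the count of word substitutions, insertions, and deletions needed to turn the machine transcript into a trusted reference, divided by the number of words in the reference. The Python sketch below is a minimal version under some assumptions: it lowercases and splits on whitespace only, it does not strip punctuation, and the function name is our own.

```python
def word_error_rate(reference, hypothesis):
    """Return WER: word-level edit distance divided by reference length."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# One substituted word and one dropped word out of four reference words.
print(word_error_rate("the quick brown fox", "the quack brown"))
```

A lower value is better, and 0.0 means the transcript matched the reference word for word. Running the same short recording through the service before each event gives you a simple baseline to compare against.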

      4. Consistency

      Transcription ability varies widely between people (a matter of experience, most often). Performance can vary in the same individual, too – for example, if the person you hired slept poorly the night before your event.

      This variability is cause for concern. Will the person you hired (or their replacement) be up for the task? Will they be at their best on event day? Are they familiar enough with the subject matter? No such trouble with automatic transcription services. Of course, environmental factors like background noise and the quality of the AV equipment you’re using will affect the software’s ability to transcribe speech. But with these things controlled, you’ll get consistent transcription from one event to the next.

      5. Privacy

      Transcripts are great for anyone who missed the big meeting and a convenient reference for anyone who was there. But what if that meeting included discussions about unpatented technology or other company secrets? No business wants outsiders privy to such things, but it can’t be avoided if you’re bringing in an external transcriptionist to caption or subtitle the event. Non-disclosure agreements help, though you can never be too careful; leaks happen all the time.

      Opting for an automatic transcription service will reduce such privacy risks. It won’t necessarily eliminate them, since many services send audio to the cloud for processing. Even so, the risk of a breach is much lower, which makes AI-driven transcription the way to go for subtitling private events.

      Get the best of automatic transcription today

      Automatic transcription is a feasible alternative for live subtitling of conferences, meetings, and other events – under the right conditions. Epiphan LiveScrypt makes it easier than ever to get those conditions just right. Powered by Google’s advanced speech recognition technology, LiveScrypt features professional audio inputs (XLR, TRS) so you can capture crystal-clear audio conducive to accurate AI transcription. Our automatic transcription solution also includes HDMI and SDI inputs, a built-in screen for configuration, and a QR code system for easy streaming. These features simplify setup for automatic transcription and make for fewer points of failure.

      LiveScrypt diagram

      Learn more about how Epiphan LiveScrypt can help you maximize the benefits of today’s automatic transcription technology. And if you have any questions, just ask our product specialists.

      The post Automatic transcription is ready for your live events appeared first on Epiphan Video.

      ]]>
      https://www.epiphan.com/blog/automatic-transcription/feed/ 0