How to use whisper to handle long video? – API


Hello everyone. I currently want to use Whisper to transcribe the speech in videos, but I’ve encountered a few issues.

  1. I am a Plus user. I’ve split a video into one file per minute and batch-processed the files through the paid API using the code below. However, the code uses model="whisper-1". How can I modify it to use the latest Whisper v3?
from openai import OpenAI

client = OpenAI()
audio_file = open("/path/to/file/german.mp3", "rb")
transcript = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)
  2. Why are some instances of Whisper shared for free on the internet, while I have to pay to use Whisper through the API?

  3. Does the current version of Whisper still have a limitation where each analysis can only process files up to 25 MB?

If my 40-minute speech file is over 500 MB, does that mean I have to split it up and process it in batches?

I have split files into batches for processing before, but there were issues combining the timestamps in the batched SRT files. How can I fix the timestamp alignment when merging them?

Thank you.

Thank you very much

I have visited GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision

But I’m still not quite sure how to install it. Is it possible to install it using Python code? Are there any sample codes I can refer to?

Also, does using Whisper here mean there are no file size limitations?

Additionally, how do I find the names of different model types to call in the code, such as base, medium, large, and the latest V3?

Thank you.

It means you need to encode it to a voice format that doesn’t waste so much data.

Whisper is open source, meaning that it can be used or recoded by anyone, and is light enough to be run on a 4GB GPU or slowly on CPU.

A person could, for example, use Google’s cloud Tensor Processing Unit (TPU) ASICs and transcribe 50x faster than OpenAI can (https://huggingface.co/spaces/sanchit-gandhi/whisper-jax), or make an API for it.

Other variants you can run on your own capable hardware offer word-level timestamps or are oriented towards video transcription.

The OpenAI API runs whisper large-v2, but it could be upgraded to v3 without you knowing, as the newly released model is the same size.




To install whisper 3

pip install git+https://github.com/openai/whisper.git 

You also need ffmpeg installed on your system


# macos

brew install ffmpeg

# windows using chocolatey

choco install ffmpeg

# windows using scoop

scoop install ffmpeg

To use whisper from the command line in your terminal:

whisper audio1.mp3 audio2.mp3 --model medium
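If you’d rather call it from Python instead of the command line, here is a minimal sketch (the file name is a placeholder; model names such as "tiny", "base", "small", "medium", "large" and, in recent versions, "large-v3" are what you pass to load_model):

# Minimal sketch: transcribe one file with the open-source whisper package.
# "audio1.mp3" is a placeholder; pick a model size your hardware can handle.
import whisper

model = whisper.load_model("medium")    # or "large-v3" if your install includes it
result = model.transcribe("audio1.mp3")

print(result["text"])                   # full transcript as plain text
for seg in result["segments"]:          # per-segment timestamps, in seconds
    print(seg["start"], seg["end"], seg["text"])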

Thank you for your reply. So, are you saying that it’s possible to convert audio files into node.js format?

Thank you for your reply. So, are you saying that I can simply open the terminal and install it directly? I’m using Windows 11 with Python 3.10.6, but I don’t have a dedicated GPU, only a CPU.


PS C:\User\XXX> python --version

Then, can I directly install the program you provided?


pip install git+https://github.com/openai/whisper.git

For this line, I have some questions:


whisper audio1.mp3 audio2.mp3 --model medium

Can it process two audio files at once? And it seems to use "--model medium", not "large V3". Is that correct? Thank you.

There is no “node.js” format.

You know how mp3 can take a CD and make it 1/10th the size? That’s 25-year-old technology now. Opus, for example, is a codec optimized for voice, and by limiting the input to just phone-call quality, where voice audio lives, you can even improve the transcription.
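As a rough sketch of that idea (the file names and the 24 kbps bitrate are just illustrative choices), you can have ffmpeg re-encode a long recording to mono 16 kHz Opus from Python before uploading it anywhere; at that rate a 40-minute talk lands around 7 MB:

# Rough sketch: shrink a big recording into a small voice-optimized Opus file.
# Requires ffmpeg installed on the system; names and bitrate are examples only.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "speech.mp4",  # input video/audio file (placeholder name)
    "-vn",                         # drop the video stream, keep audio only
    "-ac", "1",                    # mono is enough for speech
    "-ar", "16000",                # 16 kHz covers the voice band
    "-c:a", "libopus",             # Opus codec, tuned for voice
    "-b:a", "24k",                 # ~24 kbps: a 40-minute talk is roughly 7 MB
    "speech.opus",                 # output file (placeholder name)
], check=True)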

Given how many questions you have and not much foundation yet, the OpenAI service would be a good place to start, although it doesn’t timestamp or make subtitle files.

Yes, you can install it from the terminal. I have a very old Mac and it can translate/transcribe audio files, but of course it takes a very long time and I can’t do anything close to real-time transcription. But it’s free, so I can’t complain lol.

You may need to update your Python version as noted on the GitHub page. Then you can install it directly. Refer to the GitHub page for the complete installation procedure, including what to do if you hit a snag.

Yes, you can process more than one file at once. You are already using Whisper 3, so just using "large" is enough. But like I mentioned previously, if your system isn’t that powerful, maybe start with "tiny" first.
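If you do this from Python instead, one loaded model can loop over several files. The snippet below is only a sketch (the file names and the fmt() helper are illustrative, not part of whisper itself); it writes a simple SRT next to each input using the segment timestamps whisper returns:

# Sketch: transcribe several files with one model and write a basic SRT for each.
import whisper

def fmt(seconds: float) -> str:
    # Format seconds as an SRT timestamp, e.g. 00:01:02,500
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}".replace(".", ",")

model = whisper.load_model("tiny")             # start small on a CPU-only machine
for path in ["audio1.mp3", "audio2.mp3"]:      # placeholder file names
    result = model.transcribe(path)
    with open(path + ".srt", "w", encoding="utf-8") as srt:
        for i, seg in enumerate(result["segments"], start=1):
            srt.write(f"{i}\n")
            srt.write(f"{fmt(seg['start'])} --> {fmt(seg['end'])}\n")
            srt.write(seg["text"].strip() + "\n\n")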

If you want to use this inside a node.js application, just use exec:

import { exec } from 'child_process'

const command = `whisper './${filepath}' --language ${language} --temperature ${temperature} --model ${model} --output_dir '${outputDir}'`

exec(command, (error, stdout, stderr) => {
    if (error) {
        console.log(error.message)
        return
    }

    if (stderr) {
        console.log(stderr)
        return
    }

    console.log(stdout)
})




