How to use whisper to handle long video? – API


Hello everyone. I currently want to use Whisper to transcribe the speech in videos, but I’ve encountered a few issues.

  1. I am a Plus user. I’ve split a video into one file per minute and batch-processed the files through the paid API using the code below. However, the code uses model="whisper-1". How can I modify it to use the latest Whisper v3?
from openai import OpenAI

client = OpenAI()
audio_file = open("/path/to/file/german.mp3", "rb")
transcript = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)
  2. Why are some instances of Whisper shared for free on the internet, while I have to pay to use Whisper through the API?

  3. Does the current version of Whisper still have a limitation where each analysis can only process files up to 25 MB?

If my 40-minute speech file is over 500 MB, does that mean I have to split it up and process it in batches?

I have split files into batches for processing before, but there were issues combining the timestamps in the batched SRT files. How can I fix the timestamp alignment when merging them?

Thank you.

Thank you very much

I have visited GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision

But I’m still not quite sure how to install it. Is it possible to install it using Python code? Are there any sample codes I can refer to?

Also, does using Whisper here mean there are no file size limitations?

Additionally, how do I find the names of different model types to call in the code, such as base, medium, large, and the latest V3?

Thank you.

It means you need to encode it to a voice format that doesn’t waste so much data.

Whisper is open source, meaning that it can be used or recoded by anyone, and is light enough to be run on a 4GB GPU or slowly on CPU.

A person could, for example, use Google’s cloud Tensor Processing Unit (TPU) ASICs and transcribe 50x faster than OpenAI can (https://huggingface.co/spaces/sanchit-gandhi/whisper-jax), or make an API for it.

Other variants you can run on your own capable hardware offer word-level timestamps or are oriented towards video transcription.

The OpenAI API runs whisper large-v2, but it could be upgraded to v3 without you knowing, as the newly released model is the same size.




To install whisper 3

pip install git+https://github.com/openai/whisper.git 

You also need ffmpeg installed on your system


# macos

brew install ffmpeg

# windows using chocolatey

choco install ffmpeg

# windows using scoop

scoop install ffmpeg

To use whisper from the command line in your terminal:

whisper audio1.mp3 audio2.mp3 --model medium
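If you’d rather call it from Python instead of the command line, here is a minimal sketch (the file name is a placeholder; model names such as "tiny", "base", "small", "medium", "large" and, in recent versions, "large-v3" are what you pass to load_model):

# Minimal sketch: transcribe one file with the open-source whisper package.
# "audio1.mp3" is a placeholder; pick a model size your hardware can handle.
import whisper

model = whisper.load_model("medium")    # or "large-v3" if your install includes it
result = model.transcribe("audio1.mp3")

print(result["text"])                   # full transcript as plain text
for seg in result["segments"]:          # per-segment timestamps, in seconds
    print(seg["start"], seg["end"], seg["text"])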

Thank you for your reply. So, are you saying that it’s possible to convert audio files into node.js format?

Thank you for your reply. So, are you saying that I can simply open the terminal and install it directly? I’m using Windows 11 with Python 3.10.6, but I don’t have a dedicated GPU, only a CPU.


PS C:\User\XXX> python --version

Then, can I directly install the program you provided?


pip install git+https://github.com/openai/whisper.git

For this line, I have some questions:


whisper audio1.mp3 audio2.mp3 --model medium

Can it process two audio files at once? And it seems to use "--model medium", not "large V3". Is that correct? Thank you.

There is no “node.js” format.

You know how mp3 can take a CD and make it 1/10th the size? That’s 25-year-old technology now. Opus, for example, is a codec optimized for voice, and by limiting the input to just phone-call quality, where voice audio lives, you can even improve the transcription.
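As a rough sketch of that idea (the file names and the 24 kbps bitrate are just illustrative choices), you can have ffmpeg re-encode a long recording to mono 16 kHz Opus from Python before uploading it anywhere; at that rate a 40-minute talk lands around 7 MB:

# Rough sketch: shrink a big recording into a small voice-optimized Opus file.
# Requires ffmpeg installed on the system; names and bitrate are examples only.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "speech.mp4",  # input video/audio file (placeholder name)
    "-vn",                         # drop the video stream, keep audio only
    "-ac", "1",                    # mono is enough for speech
    "-ar", "16000",                # 16 kHz covers the voice band
    "-c:a", "libopus",             # Opus codec, tuned for voice
    "-b:a", "24k",                 # ~24 kbps: a 40-minute talk is roughly 7 MB
    "speech.opus",                 # output file (placeholder name)
], check=True)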

Given how many questions you have and not much foundation yet, the OpenAI service would be a good place to start, although it doesn’t timestamp or make subtitle files.

Yes, you can install it from the terminal. I have a very old Mac and it can translate/transcribe audio files, but of course it takes a very long time and I can’t do anything close to real-time transcription. But it’s free, so I can’t complain lol.

You may need to update your Python version as noted on the GitHub page. Then you can install it directly. Refer to the GitHub page for the complete installation procedure, including what to do if you hit a snag.

Yes, you can process more than one file at once. You are already using Whisper 3, so just using "large" is enough. But like I mentioned previously, if your system isn’t that powerful, maybe start with "tiny" first.
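If you do this from Python instead, one loaded model can loop over several files. The snippet below is only a sketch (the file names and the fmt() helper are illustrative, not part of whisper itself); it writes a simple SRT next to each input using the segment timestamps whisper returns:

# Sketch: transcribe several files with one model and write a basic SRT for each.
import whisper

def fmt(seconds: float) -> str:
    # Format seconds as an SRT timestamp, e.g. 00:01:02,500
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}".replace(".", ",")

model = whisper.load_model("tiny")             # start small on a CPU-only machine
for path in ["audio1.mp3", "audio2.mp3"]:      # placeholder file names
    result = model.transcribe(path)
    with open(path + ".srt", "w", encoding="utf-8") as srt:
        for i, seg in enumerate(result["segments"], start=1):
            srt.write(f"{i}\n")
            srt.write(f"{fmt(seg['start'])} --> {fmt(seg['end'])}\n")
            srt.write(seg["text"].strip() + "\n\n")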

If you want to use this inside a node.js application, just use exec:

import { exec } from 'child_process'

const command = `whisper './${filepath}' --language ${language} --temperature ${temperature} --model ${model} --output_dir '${outputDir}'`

exec(command, (error, stdout, stderr) => {
    if (error) {
        console.log(error.message)
        return
    }

    if (stderr) {
        console.log(stderr)
        return
    }

    console.log(stdout)
})




