Updated on 2025-01-18 at 13:42; some content may be time-sensitive. Please leave a comment if outdated.
Overview
RealtimeSTT is a low-latency Python library for real-time speech-to-text conversion, with built-in speech activity detection and wake word activation. Developed by Kolja Beigel, it targets applications that need fast, accurate transcription, such as voice assistants and dictation tools.
Feature List
- Real-time Speech-to-Text: Converts speech to text in real-time, suitable for various application scenarios.
- Speech Activity Detection: Automatically detects when the user starts and stops speaking to enhance transcription accuracy.
- Wake Word Activation: Supports wake word functionality, allowing users to activate the system with specific words.
- Low Latency: Keeps the delay between spoken input and transcribed text short.
- Multi-platform Support: Compatible with various operating systems and platforms for easy integration.
- Open Source Code: Provides complete open-source code for developers to further develop and customize.
Usage Guide
Installation Process
- Clone the project repository:
git clone https://github.com/KoljaB/RealtimeSTT.git
- Enter the project directory:
cd RealtimeSTT
- Install dependencies:
pip install -r requirements.txt
- (Optional) Install GPU support:
pip install -r requirements-gpu.txt
Usage Method
Start the Server
- Start the speech-to-text server:
stt-server
- After the server starts, wait for the prompt "speak now".
Client Usage
- Start the client and connect to the server:
stt
- After the client starts, begin speaking; the system will transcribe speech to text in real-time.
Main Functional Operation Process
Real-time Speech-to-Text
- Import the AudioToTextRecorder class:
from RealtimeSTT import AudioToTextRecorder
- Define a function to process text:
def process_text(text):
    print(text)
- Start recording and process text:
if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
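The loop above runs until the process is killed. A minimal sketch of a variant that stops cleanly on Ctrl+C follows. It assumes the recorder object exposes a shutdown() method, as described in the project README at the time of writing; check this against your installed version. The helper names run_transcription and main, and the max_utterances parameter, are hypothetical and exist only for illustration.

```python
def run_transcription(recorder, on_text, max_utterances=None):
    """Call recorder.text(on_text) repeatedly until interrupted.

    `recorder` is any object with text(callback) and shutdown() methods
    (assumed here to match AudioToTextRecorder). Returns the number of
    completed text() calls.
    """
    count = 0
    try:
        # Loop forever when max_utterances is None, otherwise stop after N calls.
        while max_utterances is None or count < max_utterances:
            recorder.text(on_text)
            count += 1
    except KeyboardInterrupt:
        # Ctrl+C ends the loop instead of printing a traceback.
        pass
    finally:
        # Release the recorder's resources on every exit path.
        recorder.shutdown()
    return count


def main():
    # Imported lazily so the helper above can be exercised without the library.
    from RealtimeSTT import AudioToTextRecorder

    print("Wait until it says 'speak now'")
    run_transcription(AudioToTextRecorder(), print)
```

Call main() from the script's entry point. Because run_transcription only relies on text() and shutdown(), it can be exercised with a stand-in object before a microphone is involved.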
Speech Activity Detection
- The system automatically detects when the user starts and stops speaking without additional configuration.
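To make the idea of detecting speech start and stop concrete, here is a deliberately simplified sketch based on frame energy. It is not RealtimeSTT's own detection method, which is handled inside the library. The function names, the threshold, and the allowed number of silent frames are illustrative assumptions.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(frames, threshold=0.1, max_silent_frames=2):
    """Return (start, end) frame-index pairs, with end exclusive.

    A segment opens at the first frame whose RMS reaches `threshold` and
    closes once more than `max_silent_frames` consecutive quiet frames
    follow. Short pauses inside an utterance therefore do not split it.
    """
    segments = []
    start = None   # index where the current segment began, or None
    silent = 0     # consecutive quiet frames seen inside a segment
    for i, frame in enumerate(frames):
        loud = frame_rms(frame) >= threshold
        if start is None:
            if loud:
                start = i
                silent = 0
        elif loud:
            silent = 0
        else:
            silent += 1
            if silent > max_silent_frames:
                # The segment ends at the first frame of the quiet run.
                segments.append((start, i - silent + 1))
                start = None
                silent = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing quiet frames.
        segments.append((start, len(frames) - silent))
    return segments
```

The tolerance for a few silent frames is the key design choice: without it, every brief pause between words would end the utterance and trigger a separate transcription.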
Wake Word Activation
- Configure the wake word functionality to allow users to activate the system with specific words. Refer to the project documentation for detailed configuration.
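As a hedged sketch of what that configuration can look like: the project README describes a wake_words argument on the AudioToTextRecorder constructor, with "jarvis" listed among the supported keywords, and multiple wake words given as one comma-separated string. Argument names and supported keywords may differ between versions, so treat these details as assumptions to verify against the documentation. The helper names wake_word_options and listen_after_wake_word are hypothetical.

```python
def wake_word_options(*words):
    """Build constructor keyword arguments for wake word activation.

    Assumption: the recorder accepts a `wake_words` string in which
    multiple wake words are separated by commas.
    """
    cleaned = [w.strip() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("at least one wake word is required")
    return {"wake_words": ",".join(cleaned)}


def listen_after_wake_word(*words):
    """Wait for a wake word, then return the next transcribed utterance."""
    # Imported lazily so wake_word_options can be used without the library.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**wake_word_options(*words))
    print("Say the wake word, then speak.")
    return recorder.text()
```

For example, listen_after_wake_word("jarvis") would block until the wake word is heard and then return the transcription of what follows, provided the installed version supports that keyword.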
Detailed Operation Example
Typing Out Everything Said
- Import AudioToTextRecorder and pyautogui:
from RealtimeSTT import AudioToTextRecorder
import pyautogui
- Define a function to process text:
def process_text(text):
    pyautogui.typewrite(text + " ")
- Start recording and process text:
if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)