RealtimeSTT: A Detailed Guide to the Real-Time Speech-to-Text Library

Updated on 2025-01-18 at 13:42; some content may be time-sensitive.

Overview

RealtimeSTT is an efficient, low-latency library for real-time speech-to-text conversion, with built-in speech activity detection and wake word activation. Developed by Kolja Beigel, it targets applications that need fast, accurate transcription of live speech, such as voice assistants and dictation tools.


Feature List

  • Real-time Speech-to-Text: Converts live speech to text with minimal delay, so it can drive interactive applications.
  • Speech Activity Detection: Automatically detects when the user starts and stops speaking, so recording starts and stops on its own and only actual speech is transcribed.
  • Wake Word Activation: Supports wake words, so the system starts listening only after the user says a specific word.
  • Low Latency: Keeps the delay between speaking and receiving text short, which matters for interactive use.
  • Multi-platform Support: Runs on multiple operating systems, making it easy to integrate into different projects.
  • Open Source Code: Provides complete open-source code for developers to further develop and customize.


Usage Guide

Installation Process

  1. Clone the project repository:
   git clone https://github.com/KoljaB/RealtimeSTT.git
  2. Enter the project directory:
   cd RealtimeSTT
  3. Install dependencies:
   pip install -r requirements.txt
  4. (Optional) Install GPU support:
   pip install -r requirements-gpu.txt
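After installing, a quick way to confirm that Python can find the packages used in this guide is to ask the import system for them. The sketch below uses only the standard library (`importlib.util.find_spec`); `check_packages` is a hypothetical helper, not part of RealtimeSTT, and `pyautogui` is only needed for the typing example later on.

```python
# Post-install sanity check (a sketch, not part of RealtimeSTT itself).
# find_spec only locates a top-level module; it does not import it,
# so the check is fast and has no side effects such as loading models.
import importlib.util


def check_packages(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # RealtimeSTT for transcription, pyautogui for the typing example
    status = check_packages(["RealtimeSTT", "pyautogui"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as MISSING usually means it was installed into a different Python environment than the one running the check.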

Usage

Start the Server

  1. Start the speech-to-text server:
   stt-server
  2. After the server starts, wait for the prompt "speak now".

Client Usage

  1. Start the client and connect to the server:
   stt
  2. After the client starts, begin speaking; the system transcribes your speech to text in real time.

Using the Main Features

Real-time Speech-to-Text

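  1. Import the AudioToTextRecorder class:
   from RealtimeSTT import AudioToTextRecorder
  2. Define a function to process text:
   def process_text(text):
       # Called with each finished transcription
       print(text)
  3. Start recording and process text:
   if __name__ == '__main__':
       # The main guard is needed because the library starts worker processes
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           # Waits for the next utterance, then passes its text to process_text
           recorder.text(process_text)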
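The loop above runs until the process is killed. One way to end it from speech is to stop when a chosen word is heard. The sketch below assumes, as the project README shows, that calling `text()` without a callback returns the finished transcription as a string. The recorder is passed in so the logic can be exercised without a microphone; `run_until_stop_word` and `FakeRecorder` are hypothetical helpers, not library APIs.

```python
# Sketch: stop the transcription loop when a stop word is heard.
# Any object with a .text() method returning one finished utterance works.
# With RealtimeSTT you would pass an AudioToTextRecorder instead of FakeRecorder.


def run_until_stop_word(recorder, handle, stop_word="stop"):
    """Pass each utterance to `handle` until one contains `stop_word`.

    Returns the handled utterances; the utterance with the stop word is excluded.
    """
    handled = []
    while True:
        text = recorder.text()              # one finished utterance
        if stop_word in text.lower():       # case-insensitive substring check
            return handled
        handle(text)
        handled.append(text)


class FakeRecorder:
    """Hypothetical stand-in that replays canned transcriptions."""

    def __init__(self, utterances):
        self._utterances = iter(utterances)

    def text(self):
        return next(self._utterances)


if __name__ == "__main__":
    fake = FakeRecorder(["Hello there.", "How are you?", "Stop."])
    run_until_stop_word(fake, print)
```

Because the check is a plain substring match, a word such as "unstoppable" would also end the loop; splitting the text into words first is a stricter alternative.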

Speech Activity Detection

  1. The system automatically detects when the user starts and stops speaking without additional configuration.
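According to the project README, RealtimeSTT relies on dedicated voice activity detection models (WebRTC VAD and Silero VAD) for this. The sketch below does not reproduce that. It uses a much simpler energy threshold to show the underlying start/stop logic: speech starts on the first loud frame and ends only after several consecutive quiet frames (a "hangover"), so brief pauses do not split an utterance. The threshold and hangover values are arbitrary assumptions.

```python
# Simplified energy-based voice activity detection (VAD) sketch.
# Illustrates start/stop logic only; this is not RealtimeSTT's implementation.
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_segments(frames, threshold=500.0, hangover=3):
    """Return (start, end) frame indices of speech segments, end exclusive.

    A segment starts at the first frame whose RMS exceeds `threshold` and ends
    after `hangover` consecutive quiet frames, so short pauses do not split it.
    """
    segments = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        loud = frame_rms(frame) > threshold
        if start is None:
            if loud:
                start, quiet = i, 0
        elif loud:
            quiet = 0
        else:
            quiet += 1
            if quiet >= hangover:
                # End the segment at the first frame of the quiet run
                segments.append((start, i - hangover + 1))
                start = None
    if start is not None:  # speech was still ongoing when the input ended
        segments.append((start, len(frames)))
    return segments
```

For reference, a 16 kHz stream split into 10 ms frames gives 160 samples (320 bytes) per frame.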

Wake Word Activation

  1. Configure the wake word functionality to allow users to activate the system with specific words. Refer to the project documentation for detailed configuration.
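Conceptually, wake word activation is a gate in front of transcription: audio is ignored until a wake-word detector fires, then speech is passed through until the user has been silent for a while. The project README describes wake word support through a `wake_words` parameter of `AudioToTextRecorder`, backed by dedicated wake-word models; check it for the exact options. The sketch below only models the gating state machine. `WakeWordGate` is hypothetical, and the per-frame detector scores and speech flags are assumed inputs from a wake-word model and a VAD.

```python
# Sketch of wake-word gating (hypothetical, not part of RealtimeSTT).
# Inputs per frame: a wake-word detector score and a VAD speech flag.


class WakeWordGate:
    def __init__(self, threshold=0.5, timeout=50):
        self.threshold = threshold  # minimum detector score that counts as the wake word
        self.timeout = timeout      # consecutive silent frames tolerated before sleeping
        self.active = False
        self._silent = 0

    def update(self, wake_score: float, is_speech: bool) -> bool:
        """Process one frame; return True if this frame should be transcribed."""
        if not self.active:
            if wake_score >= self.threshold:
                self.active = True
                self._silent = 0
            return False  # the wake word itself is not passed on
        if is_speech:
            self._silent = 0
        else:
            self._silent += 1
            if self._silent >= self.timeout:
                self.active = False  # go back to waiting for the wake word
                return False
        return True
```

Not forwarding the wake word keeps it out of the transcription, which is usually what a voice assistant wants.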

Worked Example

Typing Out Everything Said

  1. Import AudioToTextRecorder and pyautogui:
   from RealtimeSTT import AudioToTextRecorder
   import pyautogui
  2. Define a function to process text:
   def process_text(text):
       # Type the transcription into the focused window, followed by a space
       pyautogui.typewrite(text + " ")
  3. Start recording and process text:
   if __name__ == '__main__':
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           # Each finished utterance is typed out via process_text
           recorder.text(process_text)
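A slightly more defensive version of the typing callback skips empty or whitespace-only transcriptions, and takes the output function as a parameter so the logic can be tested without sending real keystrokes. `make_typing_callback` is a hypothetical helper; in real use you would pass `pyautogui.typewrite` as `type_fn` and hand the returned callback to `recorder.text()`.

```python
# Sketch of a defensive typing callback (hypothetical helper).
# type_fn is injected: pyautogui.typewrite in real use, list.append in tests.


def make_typing_callback(type_fn):
    """Return a callback that types each non-empty transcription plus a space."""

    def process_text(text):
        cleaned = text.strip()
        if cleaned:                  # ignore empty or whitespace-only results
            type_fn(cleaned + " ")   # trailing space separates consecutive utterances

    return process_text


if __name__ == "__main__":
    # Real use: import pyautogui; callback = make_typing_callback(pyautogui.typewrite)
    typed = []
    callback = make_typing_callback(typed.append)
    for utterance in ["Hello world.", "   ", "Second sentence."]:
        callback(utterance)
    print("".join(typed))
```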
