Dhravani - Manual


Overview

Dhravani is a web-based application developed under the "Center of Indian Language Data" project for creating speech corpora for Automatic Speech Recognition (ASR). The platform streamlines the creation and management of audio datasets by facilitating recording, managing, and organizing voice recordings with their transcriptions.

Users record audio from provided transcripts; the transcripts and the recording metadata are stored in PostgreSQL tables. Moderators then verify the recordings for quality, and validated content is transferred to HuggingFace, either on a manual trigger or at a scheduled synchronization interval.

Installation

Prerequisites

Steps

  1. Clone the repository:
    git clone 
    cd dataset-preparation-tool
  2. Set up the environment:
  3. Configure environment variables:

    Create a .env file in the root directory with the following configuration:

    # Security
    FLASK_SECRET_KEY=your_secure_secret_key
    JWT_SECRET_KEY=${FLASK_SECRET_KEY}  # Defaults to FLASK_SECRET_KEY
    SUPER_ADMIN_PASSWORD=your_secure_admin_password
    SUPER_USER_EMAILS=admin1@example.com,admin2@example.com
    ENABLE_AUTH=true
    
    # Database and Services
    POSTGRES_URL=postgresql://user:password@localhost:5432/dataset_db
    POCKETBASE_URL=http://localhost:8090
    HF_TOKEN=your_huggingface_token
    HF_REPO_ID=your_username/your_dataset
    
    # Storage Configuration
    SAVE_LOCALLY=true
    DATASET_BASE_DIR=/app/datasets
    TEMP_FOLDER=./temp
    
    # Batch Processing
    TRANSCRIPT_BATCH_SIZE=100
    SYNC_MEMORY_LIMIT_MB=1024
    UPLOAD_CHUNK_SIZE=8388608  # 8MB in bytes
    UPLOAD_BATCH_SIZE=10
    MAX_UPLOAD_WORKERS=4
    MAX_UPLOAD_RETRIES=3
    
    # Network Settings
    NETWORK_TIMEOUT=30  # seconds
    FLASK_PORT=7860
    
    # Sync Schedule (UTC)
    SYNC_HOUR=2
    SYNC_MINUTE=0
    SYNC_TIMEZONE=UTC

    Important: Replace the placeholder values with your actual configuration parameters. Never commit sensitive credentials to version control.
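
As an illustration of how these variables might be consumed, here is a minimal Python sketch that reads a subset of them from the environment. The `Settings` class and `load_settings` function are hypothetical names, not part of Dhravani, and using the example values above as fallbacks is an assumption; only the JWT_SECRET_KEY fallback is stated in the configuration itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'true', '1', or 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names mirror a subset of the .env keys.
    flask_secret_key: str
    jwt_secret_key: str
    enable_auth: bool
    save_locally: bool
    upload_chunk_size: int
    max_upload_workers: int
    flask_port: int
    sync_hour: int
    sync_minute: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping."""
    secret = env["FLASK_SECRET_KEY"]  # required; raises KeyError if missing
    settings = Settings(
        flask_secret_key=secret,
        # JWT_SECRET_KEY falls back to FLASK_SECRET_KEY, as the .env comment notes.
        jwt_secret_key=env.get("JWT_SECRET_KEY") or secret,
        # The fallbacks below reuse the example values (an assumption).
        enable_auth=_as_bool(env.get("ENABLE_AUTH", "true")),
        save_locally=_as_bool(env.get("SAVE_LOCALLY", "true")),
        upload_chunk_size=int(env.get("UPLOAD_CHUNK_SIZE", str(8 * 1024 * 1024))),
        max_upload_workers=int(env.get("MAX_UPLOAD_WORKERS", "4")),
        flask_port=int(env.get("FLASK_PORT", "7860")),
        sync_hour=int(env.get("SYNC_HOUR", "2")),
        sync_minute=int(env.get("SYNC_MINUTE", "0")),
    )
    if not (0 <= settings.sync_hour <= 23 and 0 <= settings.sync_minute <= 59):
        raise ValueError("SYNC_HOUR or SYNC_MINUTE is out of range")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.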

Usage

Running the application

Key Functionalities

  1. Data Recording: Record audio through the web interface. Recordings are saved as WAV files after fade-in, trimming, and fade-out are applied.
  2. Transcription: Upload transcriptions in .txt or .csv format through the admin interface. Transcription data is stored in the PostgreSQL database.
  3. Validation: Moderators validate recordings and transcriptions in a dedicated web interface that can filter by language and validation status.
  4. Synchronization: Synchronize the dataset with Hugging Face Hub automatically or manually. Synchronization covers hash calculation, Parquet preparation, and file uploads.
  5. User Management: Manage user roles (user, moderator, admin) from the admin interface through PocketBase. Super admins can manage other admin accounts.
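
The audio post-processing named in item 1 can be sketched in plain Python. This is not Dhravani's implementation: the function names, the silence threshold, the linear fade shape, the fade length, and the order of operations (trim first, then fade both ends) are all assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of fade_len samples each."""
    out = list(samples)  # copy so the caller's list is not modified
    n = len(out)
    fade_len = min(fade_len, n // 2)  # keep the two fades from overlapping
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain          # ramp up from silence at the start
        out[n - 1 - i] *= gain  # ramp down to silence at the end
    return out


def process(samples: List[float], sample_rate: int, fade_ms: float = 10.0) -> List[float]:
    """Trim silence, then fade both ends over fade_ms milliseconds."""
    trimmed = trim_silence(samples)
    fade_len = int(sample_rate * fade_ms / 1000)
    return apply_fades(trimmed, fade_len)
```

Short fades of this kind are commonly used to avoid audible clicks at clip boundaries; the processed samples would then be written out as a WAV file.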

Accessing the application

Open your preferred web browser and navigate to http://localhost:7860 (or the appropriate Docker address, if applicable).

Architecture

The application adopts a three-tier architecture:

Architecture Diagram

[Figure: system architecture diagram showing the user authentication (A), data processing (B), and dataset publishing (C) flows]

Flow Description:

User Authentication (A): The authentication flow supports a four-tier hierarchy. Super Admins have complete system access and can add Admins by supplying the SUPER_ADMIN_PASSWORD. Admins manage moderators and system processes. Moderators validate content, and regular users contribute recordings through the platform.
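
The four-tier hierarchy can be modeled as an ordered enumeration. The sketch below is illustrative only: the `Role` enum and both helper functions are hypothetical names, and it assumes that each tier inherits the access of the tiers below it, which the manual does not state explicitly.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered from least to most privileged.
    USER = 1
    MODERATOR = 2
    ADMIN = 3
    SUPER_ADMIN = 4


def has_access(actual: Role, required: Role) -> bool:
    """Return True if `actual` is at least as privileged as `required`."""
    return actual >= required


def can_assign(actor: Role, target: Role) -> bool:
    """Return True if `actor` may give another account the `target` role.

    Super admins add admins; admins and above manage moderators and users.
    """
    if target is Role.ADMIN:
        return actor is Role.SUPER_ADMIN
    if target in (Role.MODERATOR, Role.USER):
        return actor >= Role.ADMIN
    return False  # nobody assigns SUPER_ADMIN through this path
```

For example, `can_assign(Role.ADMIN, Role.MODERATOR)` is allowed while `can_assign(Role.ADMIN, Role.ADMIN)` is not, matching the description that only Super Admins add Admins.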

Data Processing (B): PostgreSQL tables store both the transcripts and the recording metadata. The application organizes audio files in language-specific directories, and moderators run the quality-control workflow on them. This phase also prepares validated data for HuggingFace synchronization.
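
To make the language-specific organization concrete, the following sketch builds a storage path per language and computes a content hash that a sync step could compare to detect changed files. The directory layout `<base>/<language>/<filename>`, the choice of SHA-256, and both function names are assumptions for illustration; the manual does not specify them.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Iterable


def recording_path(base_dir: str, language: str, filename: str) -> PurePosixPath:
    """Return <base_dir>/<language>/<filename> (hypothetical layout)."""
    for part in (language, filename):
        # Reject values that could escape the language directory.
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(base_dir) / language / filename


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash data incrementally so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

Feeding the hash function with fixed-size reads from an open file keeps memory use flat regardless of recording length.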

Dataset Publishing (C): Validated recordings are organized in language-specific directories, and the system generates and maintains Parquet metadata files for them. Content is synchronized with HuggingFace either by a scheduled automated process or by a manual trigger, which makes the verified datasets publicly accessible.
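
For the scheduled path, a scheduler needs to know when the next run is due. The sketch below computes that time from an hour and minute such as SYNC_HOUR and SYNC_MINUTE. It assumes a once-daily schedule in UTC, which is an interpretation of the configuration rather than something the manual states, and `next_sync_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def next_sync_time(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Return the next daily run at hour:minute UTC, strictly after `now`.

    `now` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

A manual trigger would bypass this calculation and start the same synchronization routine immediately.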

API Reference

Authentication Endpoints (auth_middleware.py, PocketBase)

Data Recording Endpoints (app.py)

Validation Endpoints (validation_route.py, moderator access required)

Admin Endpoints (admin_routes.py, admin access required)

Super Admin Endpoints (super_admin.py, super admin access required)

Data Models

User:

{
    "id": "string",
    "email": "string",
    "name": "string",
    "role": "user" | "moderator" | "admin",
    "is_moderator": boolean,
    "gender": "M" | "F" | "O" | null,
    "age_group": "Teenagers" | "Adults" | "Elderly" | null,
    "country": "string" | null,
    "state_province": "string" | null,
    "city": "string" | null,
    "accent": "Rural" | "Urban" | null,
    "language": "string" | null
}
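
A record shaped like the User model can be checked against the allowed values listed above. This is a minimal sketch, not Dhravani's validation code: `validate_user` is a hypothetical helper, and treating a missing nullable field the same as null is an assumption.

```python
from typing import Any, Dict, List

# Allowed values per field, taken from the User model; None marks a nullable field.
ALLOWED_VALUES = {
    "role": ("user", "moderator", "admin"),
    "gender": ("M", "F", "O", None),
    "age_group": ("Teenagers", "Adults", "Elderly", None),
    "accent": ("Rural", "Urban", None),
}
REQUIRED_STRINGS = ("id", "email", "name")
NULLABLE_STRINGS = ("country", "state_province", "city", "language")


def validate_user(user: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    errors: List[str] = []
    for field in REQUIRED_STRINGS:
        if not isinstance(user.get(field), str):
            errors.append(f"{field}: expected a string")
    for field in NULLABLE_STRINGS:
        value = user.get(field)
        if value is not None and not isinstance(value, str):
            errors.append(f"{field}: expected a string or null")
    for field, allowed in ALLOWED_VALUES.items():
        if user.get(field) not in allowed:
            errors.append(f"{field}: unexpected value {user.get(field)!r}")
    if not isinstance(user.get("is_moderator"), bool):
        errors.append("is_moderator: expected a boolean")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a form show all invalid fields in a single response.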

PocketBase API Rules:

# List/Search rule - Only admins can list all users, users can only see their own record
(@request.auth.role = "admin") || (@request.auth.id = id)

# View rule - Only admins can view any user, users can only view their own record
(@request.auth.role = "admin") || (@request.auth.id = id)

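# Update rule - Admins can update any user; users and moderators can update only their own record and cannot change their role
# (@request.data applies to PocketBase versions before v0.23; later versions name it @request.body)
(
   @request.auth.role = "admin"
) || (
  (@request.auth.role = "user" || @request.auth.role = "moderator") &&
   @request.auth.id = id &&
   (@request.data.role:isset = false || @request.data.role = role)
)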
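
The intent stated in the Update rule comment can be expressed as an ordinary function, which is useful for unit-testing the policy outside PocketBase. This sketch mirrors the described behavior and is not code from the project; `can_update` and its parameters are hypothetical.

```python
from typing import Any, Dict


def can_update(
    auth_id: str,
    auth_role: str,
    record_id: str,
    current_role: str,
    submitted: Dict[str, Any],
) -> bool:
    """Decide whether the authenticated account may apply `submitted` to a user record."""
    if auth_role == "admin":
        return True  # admins can update any user
    if auth_role in ("user", "moderator") and auth_id == record_id:
        # Own record only, and the role must stay unchanged if it is submitted at all.
        return submitted.get("role", current_role) == current_role
    return False
```

With this policy, a user editing their own name is accepted, while the same user submitting `{"role": "admin"}` is refused.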

Recording:

{
    "id": integer,
    "user_id": "string",
    "audio_filename": "string",
    "transcription_id": integer,
    "speaker_name": "string",
    "speaker_id": "string",
    "audio_path": "string",
    "sampling_rate": integer,
    "duration": float,
    "language": "string",
    "gender": "string",
    "country": "string",
    "state": "string",
    "city": "string",
    "status": "pending" | "verified" | "rejected",
    "verified_by": "string" | null,
    "username": "string",
    "age_group": "string",
    "accent": "string",
    "transcription": "string"
}
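
The status field implies a small review lifecycle. The sketch below shows one way a moderator decision could be applied to a Recording. It assumes that only pending recordings can be reviewed and that verified_by records the reviewing moderator for both outcomes; neither rule is stated in the manual, and `review_recording` is a hypothetical helper.

```python
from typing import Any, Dict

REVIEW_OUTCOMES = ("verified", "rejected")


def review_recording(
    recording: Dict[str, Any], moderator_id: str, outcome: str
) -> Dict[str, Any]:
    """Return a copy of `recording` with the moderator's decision applied."""
    if outcome not in REVIEW_OUTCOMES:
        raise ValueError(f"outcome must be one of {REVIEW_OUTCOMES}")
    if recording.get("status") != "pending":
        raise ValueError("only pending recordings can be reviewed")
    updated = dict(recording)  # leave the caller's dictionary untouched
    updated["status"] = outcome
    updated["verified_by"] = moderator_id
    return updated
```

Only recordings whose status ends up as "verified" would then be candidates for the synchronization step.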

Key Classes and Functions