Hadithi - Generate High-Quality Video Datasets for Language Models
Hadithi is an open-source, Bash-based command-line tool developed by QET Lab. It serves as a data factory for generative video models: AI and ML developers use it to build high-quality video datasets for fine-tuning large language models (LLMs).
Hadithi can organize videos and rename them with timestamps, segment them into smaller clips, and detect scenes. It can also remove audio where needed, filter out short videos, resize videos, extract frames, and validate image counts. It can create videos from images at the proper frame rate, and it supports batch processing of videos. Together, these features cover the main steps of preparing video data for training language models.
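To make a few of these operations concrete, here is a minimal Bash sketch of how timestamped renaming, short-video filtering, segmenting, audio removal, and frame extraction might look when built on ffmpeg and ffprobe. It is an illustration under stated assumptions, not Hadithi's actual code: every function name below is hypothetical, and the use of ffmpeg is assumed rather than confirmed by the project.

```shell
#!/usr/bin/env bash
# Illustrative sketch only. These helpers are hypothetical and are NOT
# Hadithi's real commands; ffmpeg/ffprobe usage is an assumption.

# Build a timestamped file name from a path and a timestamp string.
# The timestamp is passed in so the function stays deterministic.
timestamped_name() {
  local file="$1" stamp="$2"
  printf '%s_%s\n' "$stamp" "$(basename "$file")"
}

# Print a video's duration in seconds (requires ffprobe).
video_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeed if a duration (seconds, may be fractional) meets the minimum.
is_long_enough() {
  local duration="$1" min="$2"
  awk -v d="$duration" -v m="$min" 'BEGIN { exit !(d >= m) }'
}

# Split a video into fixed-length clips without re-encoding.
segment_video() {
  local input="$1" seconds="$2" outdir="$3"
  mkdir -p "$outdir"
  ffmpeg -i "$input" -c copy -map 0 -f segment \
    -segment_time "$seconds" -reset_timestamps 1 "$outdir/clip_%03d.mp4"
}

# Copy the video stream and drop the audio track.
strip_audio() {
  ffmpeg -i "$1" -c:v copy -an "$2"
}

# Extract frames at a fixed rate (frames per second) into a directory.
extract_frames() {
  local input="$1" fps="$2" outdir="$3"
  mkdir -p "$outdir"
  ffmpeg -i "$input" -vf "fps=$fps" "$outdir/frame_%05d.png"
}
```

A typical use of the filter would be a loop such as: for each file, read its length with video_duration and keep it only if is_long_enough reports success for the chosen minimum. Passing the timestamp into timestamped_name, rather than calling date inside it, keeps the renaming step reproducible across a batch.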
Developers can use Hadithi to streamline the data preprocessing stage so that video datasets are ready for training generative video models. Its command-line interface exposes these functions in one place, which makes generating video datasets for AI and ML applications more efficient.
To explore Hadithi and start generating video datasets for language models, visit the GitHub repository at qet-lab/hadithi.