
Research that advances open science

Every project we run produces open datasets, open code, open benchmarks, or open weights — ideally all four. Here is what we are currently building.


BashaEval

Status: In development
Type: Benchmark / Evaluation framework
License: Apache 2.0 + CC BY-SA 4.0

Tags: Benchmarks · Indic Languages · WER / MOS / Latency

A systematic evaluation framework for Indic Voice AI

Voice AI systems are evaluated almost exclusively on English benchmarks: LibriSpeech, Common Voice English, VCTK. When researchers report results, they are reporting performance on a language spoken fluently by only a fraction of India's population. BashaEval changes this.

BashaEval is a systematic evaluation framework providing standardized test sets, evaluation metrics (WER, MOS, speaker similarity, latency), and baseline results across Indian languages. Every component will be fully open-sourced so any research team can run it.
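To make one of those metrics concrete, here is a minimal sketch of word error rate (WER) computed as word-level edit distance. The function and the Telugu example strings are illustrative assumptions only, not BashaEval's actual API or test data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One word out of four is wrong, so WER = 0.25.
print(wer("నమస్కారం మీరు ఎలా ఉన్నారు", "నమస్కారం మీరు ఎలా ఉన్నరు"))
```

A standardized implementation matters because small choices, such as text normalization before splitting, can shift reported WER across papers; pinning those choices down is exactly what a shared benchmark harness provides.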

We are currently designing the benchmark methodology, curating initial test sets, and establishing baseline measurements using existing open-source models. Full release details will be announced when the first version is ready.

Interested in collaborating? →

Voice Corpus Project

Status: Planning & collection
Type: Dataset
License: CC BY-SA 4.0

Tags: Speech Data · Regional Dialects · Ethics First

Ethical, high-quality speech data for underrepresented dialects

The single largest bottleneck in multilingual Voice AI is data. English TTS models can draw on millions of hours of training speech. Telugu, Bhojpuri, and Marwari, languages that hundreds of millions of people speak at home, have almost nothing publicly available, especially conversational speech with natural prosodic variation.

The Voice Corpus Project collects high-quality, ethically sourced speech data for regional Indian languages and dialects. All data is collected under explicit informed consent protocols, fully documented, and released under CC BY-SA 4.0.
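As an illustration of what "fully documented" could mean in practice, here is a hypothetical per-recording metadata record. Every field name and value below is an assumption made for this sketch; the project's actual schema has not been published.

```python
import json

# Hypothetical per-recording metadata record for the Voice Corpus
# Project. All field names and values are illustrative assumptions,
# not the project's actual schema.
record = {
    "recording_id": "vcp-000001",           # hypothetical ID scheme
    "language": "Bhojpuri",
    "dialect_region": "self-reported by speaker",
    "style": "conversational",              # vs. read or prompted speech
    "consent": {
        "informed_consent": True,           # explicit, documented consent
        "consent_form_version": "v1",       # hypothetical versioning
        "permitted_uses": ["research", "model training", "redistribution"],
    },
    "license": "CC BY-SA 4.0",
    "audio": {"sample_rate_hz": 48000, "channels": 1, "format": "wav"},
}

print(json.dumps(record, indent=2, ensure_ascii=False))
```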

We are actively seeking linguists, community partners, and volunteer speakers. If your community speaks a language or dialect underrepresented in existing datasets, we want to hear from you.

Volunteer or partner with us →

VoiceLoop-X

Status: Phase 2 — Core Optimizations
Type: Systems research
License: Apache 2.0

Tags: On-Device · CPU Inference · Latency Research

On-device voice agents for resource-constrained environments

Most Voice AI research assumes cloud infrastructure. We believe the most impactful Voice AI is the kind that works on an affordable Android phone without an internet connection. VoiceLoop-X is a systematic research project into on-device, CPU-first voice agent pipelines.

We are profiling latency at every stage of the pipeline (VAD → ASR → LLM → TTS), running systematic quantization studies, and building efficient KV cache and speculative decoding implementations optimized for edge inference.
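For a sense of what per-stage latency measurement looks like, here is a minimal profiling sketch with stubbed-out pipeline stages. The stage functions, their sleep durations, and the reported percentiles are placeholders for this illustration, not VoiceLoop-X's actual profiler.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)

@contextmanager
def stage(name: str):
    """Accumulate wall-clock latency (ms) for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append((time.perf_counter() - start) * 1000.0)

# Stub stages standing in for real models; each sleeps to simulate work.
def vad(audio): time.sleep(0.005); return audio
def asr(audio): time.sleep(0.120); return "transcript"
def llm(text):  time.sleep(0.300); return "reply"
def tts(text):  time.sleep(0.150); return b"audio"

def run_turn(audio):
    with stage("vad"):
        speech = vad(audio)
    with stage("asr"):
        text = asr(speech)
    with stage("llm"):
        reply = llm(text)
    with stage("tts"):
        out = tts(reply)
    return out

for _ in range(10):
    run_turn(b"\x00" * 16000)

for name, samples in timings.items():
    samples.sort()
    p50 = samples[len(samples) // 2]  # rough median over 10 runs
    print(f"{name}: p50={p50:.1f} ms, max={max(samples):.1f} ms")
```

Measuring each stage separately is what makes optimization tractable: if the LLM stage dominates the turn latency, quantization and KV cache work pays off first; if ASR dominates, streaming recognition does.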

Research phases

Phase 1 — Foundation (Complete): Latency profiler, benchmark suite, cross-platform validation
Phase 2 — Core Optimizations (Active): KV cache, GPU acceleration, streaming ASR, quantization study
Phase 3 — Advanced Features (Planned): Speculative decoding, adaptive VAD, memory-efficient streaming
Phase 4 — Full Publication (Planned): VoiceBench standard, INTERSPEECH/ICASSP submissions, open release

Code, weights, and full methodology will be released publicly when the research phases are complete.


We are not a wrapper project

It is easy to call an existing API and build a demo. That is not research. CogniHuman publishes reproducible methodologies, creates novel datasets with proper annotation standards, and open-sources weights and code — not just papers.

If a result cannot be reproduced independently, we do not claim it. If a benchmark is saturated, we build a better one. This is how science is supposed to work.

View our publications →