Building a Robust Voicemail Detection System at Bland

Improve AI call center efficiency with advanced voicemail detection. Learn how Bland AI built high-accuracy models to optimize outbound calls and reduce errors.

Voicemail detection is a surprisingly tricky challenge in telephony. Unlike other core phone system functions, voicemail isn’t a standardized feature—it’s more of an add-on, varying widely across carriers and devices. This lack of consistency makes it difficult for AI systems to reliably detect when a call has reached voicemail.

Humans can identify voicemail with ease—we recognize greeting messages, gauge call duration, and pick up on familiar voicemail patterns. AI, on the other hand, doesn’t have that intuition, making misclassification a real risk. If an AI-powered call assistant mistakenly thinks it’s speaking to a person when it’s actually leaving a voicemail, it could result in awkward or unintended messages.

To prevent this, we need a highly accurate voicemail detection system—one that ensures AI calls disconnect the moment they encounter voicemail, minimizing errors and improving call efficiency.

An AI call center relies on accurate voicemail detection to keep outbound calling efficient. Without it, time is wasted: the system may leave unintended voicemails or misidentify human responses, which leads to dropped calls and a poor user experience. With reliable detection, call centers can streamline workflows and make sure capacity is spent only on productive conversations.

Existing Voicemail Detection Solutions and Their Limitations

One common approach to voicemail detection is Twilio's built-in Answering Machine Detection (AMD), which attempts to recognize voicemail tones. However, this method has limitations:

Variability in Voicemail Systems – Many voicemail systems allow users to record their own greetings, which makes them harder to detect using fixed rules.
IVR and Smart Voicemail Systems – Some voicemail systems now incorporate interactive voice response (IVR) or AI call screeners, which makes them harder to distinguish from real human pick-ups.
Accuracy Issues – Twilio's voicemail detection is not always reliable and produces both false positives (humans flagged as voicemail) and false negatives (voicemail treated as a human).
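For context, when AMD is enabled on a Twilio call (through the `MachineDetection` parameter), Twilio reports its verdict in an `AnsweredBy` value such as `human`, `machine_start`, `fax` or `unknown`. The sketch below shows how an application might route on that value. The function name and the three-way policy are hypothetical; the point is that `unknown` leaves the application without a decision, one practical face of the accuracy issues listed above.

```python
from typing import Literal

Decision = Literal["hang_up", "talk", "undecided"]

# AnsweredBy values Twilio documents for machine pick-ups
# ("machine_start" with MachineDetection=Enable, "machine_end_*" with DetectMessageEnd).
MACHINE_VALUES = {"machine_start", "machine_end_beep", "machine_end_silence", "machine_end_other"}


def route_on_answered_by(answered_by: str) -> Decision:
    """Hypothetical routing policy on top of Twilio's AMD verdict."""
    if answered_by in MACHINE_VALUES or answered_by == "fax":
        return "hang_up"
    if answered_by == "human":
        return "talk"
    # "unknown" (or any unexpected value): AMD could not make a call
    return "undecided"
```

An application would need its own fallback for the `undecided` branch, which is exactly the gap a dedicated classifier is meant to fill.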

Given these challenges, we needed to develop our own machine learning-based approach for detecting voicemail with higher accuracy.

As AI call center technology continues to evolve, businesses need voicemail detection systems that adapt to changing voicemail patterns and IVR responses. The ability to distinguish between a voicemail system and a live recipient is critical for call automation and AI sales engagement. By improving voicemail detection, AI call centers can ensure that every interaction is targeted, reducing wasted call attempts and increasing successful customer connections.

How We Built Our Voicemail Detection Models

At Bland, we have access to a large dataset of call recordings, including cases where calls hit voicemail and where they do not. To tackle the problem, we manually labeled a portion of this dataset and explored two primary machine learning models.

1. Fine-Tuned Wave2Vec for Voicemail Detection

Wave2Vec (wav2vec 2.0) is a speech model pretrained with self-supervision on unlabeled audio and originally designed for speech recognition. Our approach was as follows:

We fine-tuned Wave2Vec specifically for voicemail detection.
We extracted the first two seconds of audio from each call.
Since Twilio's audio is typically sampled at 8 kHz, we upsampled it to 16 kHz, the rate Wave2Vec requires.
The resulting two-second, 16 kHz clip served as the input to our fine-tuned model.
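The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming mono 8 kHz input; it uses linear interpolation as a simple stand-in for the proper polyphase resampler a production pipeline would likely use, and it applies the zero-mean, unit-variance normalization that Hugging Face's `Wav2Vec2FeatureExtractor` performs when `do_normalize=True`. The function and constant names are hypothetical.

```python
import numpy as np

SOURCE_SR = 8_000   # assumed telephony sample rate of the incoming audio
TARGET_SR = 16_000  # sample rate wav2vec 2.0 checkpoints expect
WINDOW_S = 2.0      # seconds of audio the classifier sees


def prepare_wav2vec_input(audio_8k: np.ndarray) -> np.ndarray:
    """Keep the first two seconds of an 8 kHz call and upsample to 16 kHz."""
    n_src = int(SOURCE_SR * WINDOW_S)
    clip = np.asarray(audio_8k, dtype=np.float32)[:n_src]
    if clip.size < n_src:
        # Zero-pad calls shorter than the window so every input has the same length
        clip = np.pad(clip, (0, n_src - clip.size))

    n_dst = int(TARGET_SR * WINDOW_S)
    src_t = np.arange(n_src) / SOURCE_SR
    dst_t = np.arange(n_dst) / TARGET_SR
    # Linear interpolation onto the 16 kHz time grid (simplified resampling)
    upsampled = np.interp(dst_t, src_t, clip).astype(np.float32)

    # Zero-mean, unit-variance normalization, as in Wav2Vec2FeatureExtractor
    return (upsampled - upsampled.mean()) / np.sqrt(upsampled.var() + 1e-7)
```

The output is a fixed-length array of 32,000 samples, ready to be batched and passed to a sequence-classification head.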

This method performed exceptionally well, achieving 98.5% accuracy in detecting voicemails.

2. CNN-Based Voicemail Detection

In parallel, we trained a Convolutional Neural Network (CNN) model:

This model used the first four seconds of each call recording.
It converted audio into Mel spectrograms to extract low-level spectral features, allowing it to differentiate between voicemail and live human responses.
The CNN-based approach achieved a solid 97% accuracy.
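To make the CNN's input concrete, the sketch below turns the first four seconds of a call into a log-mel spectrogram using only NumPy. The front-end settings (8 kHz audio, 256-point FFT, 10 ms hop, 64 mel bands, HTK-style mel scale) are assumptions for illustration, since the post does not state them; libraries such as librosa and torchaudio provide tested implementations of the same transform.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(audio, sr=8000, seconds=4.0, n_fft=256, hop=80, n_mels=64):
    """First `seconds` of audio -> log-mel spectrogram of shape (n_mels, n_frames)."""
    n = int(sr * seconds)
    x = np.asarray(audio, dtype=np.float64)[:n]
    x = np.pad(x, (0, n - x.size))  # zero-pad short calls to a fixed length
    window = np.hanning(n_fft)
    n_frames = 1 + (n - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                          # image-like input for a 2-D CNN
```

Treating the result as a single-channel image is what lets a standard 2-D CNN pick up the spectral patterns that separate recorded greetings from live answers.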

While slightly less accurate than the Wave2Vec model, the CNN approach remains a viable and lightweight alternative and is part of Bland’s Model Ensemble.
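The post does not describe how Bland's Model Ensemble combines its members. One common option is soft voting: a weighted average of each model's voicemail probability, compared against a threshold. The minimal sketch below assumes each model outputs a probability in [0, 1]; the function names, weights, and 0.5 threshold are hypothetical.

```python
from typing import Optional, Sequence


def ensemble_voicemail_score(probs: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-model voicemail probabilities (soft voting)."""
    if not probs:
        raise ValueError("need at least one model probability")
    if weights is None:
        weights = [1.0] * len(probs)
    if len(weights) != len(probs):
        raise ValueError("one weight per model is required")
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def is_voicemail(probs: Sequence[float], weights: Optional[Sequence[float]] = None, threshold: float = 0.5) -> bool:
    """Hang-up decision: True when the combined score reaches the threshold."""
    return ensemble_voicemail_score(probs, weights) >= threshold
```

For example, `is_voicemail([0.9, 0.4], weights=[3, 1])` weights the first model (say, the Wave2Vec classifier) more heavily. In practice, the weights and threshold would be tuned on held-out labeled calls.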

By integrating voicemail detection into AI call center platforms, companies can significantly enhance outbound and inbound calling strategies. AI call centers powered by machine learning can automatically filter out calls that hit voicemail, redirecting agents to active leads and reducing time spent on ineffective interactions. This lets AI call systems scale outreach while maintaining high engagement and conversion rates.

Next Steps: Using Silence to Improve Voicemail Detection

Beyond our current models, we’re investigating a new approach based on silence detection.

Voicemail recordings often contain static, machine-generated background noise, while live calls tend to carry natural ambient noise from the recipient's surroundings.
Our hypothesis is that by analyzing the silent portions of a call, we can detect whether we are speaking to an automated system or a real human.
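One way to prototype this hypothesis is to measure how steady the background level is during the quietest parts of a clip. The sketch below is not Bland's method. It assumes 8 kHz audio, uses 20 ms frames with a 10 ms hop, and applies a hypothetical rule that the quietest 20% of frames count as silence. A low spread of levels across those frames would be consistent with steady, machine-generated noise, while a larger spread would be consistent with fluctuating ambient noise.

```python
import numpy as np


def frame_levels_db(x: np.ndarray, frame: int = 160, hop: int = 80) -> np.ndarray:
    """Per-frame RMS level in dB (160/80 samples = 20 ms / 10 ms at 8 kHz)."""
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
    return 20.0 * np.log10(rms)


def silence_features(audio, quiet_percentile: float = 20.0) -> dict:
    """Summarize background noise in the quietest frames of a clip."""
    levels = frame_levels_db(np.asarray(audio, dtype=np.float64))
    # Hypothetical rule: the quietest 20% of frames are treated as "silence"
    threshold = np.percentile(levels, quiet_percentile)
    quiet = levels[levels <= threshold]
    return {
        "noise_floor_db": float(np.mean(quiet)),
        # Small spread: steady background (hypothesized machine-like)
        # Larger spread: fluctuating ambient noise (hypothesized live room)
        "noise_spread_db": float(np.std(quiet)),
    }
```

Features like these would be an extra input alongside the model outputs rather than a standalone detector, in line with the "additional layer" framing below.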

This is an experimental approach, but if successful, it could provide an additional layer of robustness to our voicemail detection pipeline.

Try It Yourself

To help others experiment with voicemail detection, we've open-sourced both of our models. You can find links to the Hugging Face models below to test them yourself.

🔗 Wave2Vec Voicemail Detection Model

🔗 CNN-Based Voicemail Detection Model

While we’ve kept the training data closed-source to protect customer privacy, we’ve made the model weights publicly available for further research and improvement.

Conclusion

By leveraging fine-tuned Wave2Vec and CNN-based models, we’ve built a highly accurate voicemail detection system that significantly outperforms traditional solutions. As voicemail systems continue to evolve, we’ll keep iterating on our approach—especially in exploring silence detection as an additional signal.

If you have any thoughts, feedback, or want to collaborate, feel free to reach out. We’d love to hear from the community!

Jake Downie

I'm an AI Engineer at Bland AI, working at the intersection of machine learning and voice automation. Previously, I built AI-powered workflow tools at Respell and co-founded a startup focused on eDiscovery software for law firms.
