For Platform 2.0 Only

Clova Speech Recognition (CSR)

Converts human speech into text so that it can be used in a wide range of speech recognition services.

Speech Recognition API with the Highest Speech Recognition Rate for the Korean Language

The Speech Recognition API can be used to develop services such as voice-operated personal assistant applications, chatbots, and voice memos. Speech data is sent to the Clova Speech Recognition (CSR) engine through the API, where it is recognized and converted into text.

High Recognition Rate and Multi-language Support
CSR offers the highest speech recognition rate for the Korean language in Korea, the result of NAVER's years of research and development. English, Japanese, and Chinese (Simplified) are also supported. You get the same level of service as the speech recognition technology used in NAVER's Clova service.
Performance Improvement through Learning
Recognition quality is continuously improved through machine learning. CSR becomes more accurate as it is trained on the rich data obtained through NAVER's services, and you can use it to enhance your own services.
Safe Personal Information Management
To safely protect the information entered for speech recognition, Clova Speech Recognition (CSR) complies with Korea's privacy laws, such as the Personal Information Protection Act and the Act on Promotion of Information and Communications Network Utilization and Information Protection, Etc.
Easy Use
You can handle management tasks such as service registration (register, change, and delete), setting usage limits (daily/monthly), and checking usage statistics from the web-based console. Because both a mobile SDK and a REST API are provided, you can offer the service in a range of environments, such as server and mobile.

Detailed Features

CSR provides a high-quality speech recognition service that learns from NAVER's abundant data.

Provided Services

When a human voice is sent through the CSR mobile SDK or the API, the server returns the recognition result as text.
Two methods are provided: an HTTP-based REST API and a mobile SDK that implements its own streaming protocol.

| Method of Use | Supported Platforms | Recognizable Languages | Recognizable Time | Recognizable Audio File Format | Data Sent to CSR Engine |
|---|---|---|---|---|---|
| Mobile SDK | Android 10 or later, iOS 8 or later | Korean, English, Japanese, Chinese | 60 secs | - | Voice input with microphone |
| REST API | - | Korean, English, Japanese, Chinese | 60 secs | mp3, aac, ac3, ogg, flac, wav | Sound recording file |
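
The constraints listed for the REST API can be checked on the client before a file is uploaded. The following is a minimal sketch, not part of the CSR SDK: the function name `check_audio_file` is hypothetical, the file extension is used only as a rough proxy for the actual encoding, and the caller is assumed to already know the audio duration.

```python
from pathlib import Path
from typing import List

# Formats listed for the REST API in the table above.
SUPPORTED_FORMATS = {"mp3", "aac", "ac3", "ogg", "flac", "wav"}
# "Recognizable Time" from the table above.
MAX_RECOGNITION_SECONDS = 60


def check_audio_file(filename: str, duration_seconds: float) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # The extension is only a hint; it does not prove how the audio is encoded.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if duration_seconds > MAX_RECOGNITION_SECONDS:
        problems.append(
            f"audio exceeds the {MAX_RECOGNITION_SECONDS}-second recognition limit"
        )
    return problems
```

For example, `check_audio_file("memo.wav", 30)` returns an empty list, while a 65-second `.wma` file is reported with two problems.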

Using API

Applications built with the CSR mobile SDK authenticate with the client ID and the package name of the Android application, while the REST API authenticates with the client ID and the client secret key. Encryption can be applied to every data transfer segment, from authentication through API use.
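
As an illustration of REST API authentication, the sketch below builds (but does not send) an HTTPS request that carries the client ID and client secret key as headers. The endpoint URL, the header names, the `lang` query parameter, and its `Kor` value are assumptions not stated on this page; check them against the current CSR API reference before use. The helper name `build_csr_request` is hypothetical.

```python
import urllib.request

# Assumed endpoint; verify against the current CSR API reference.
CSR_ENDPOINT = "https://naveropenapi.apigw.ntruss.com/recog/v1/stt"


def build_csr_request(
    audio_bytes: bytes,
    client_id: str,
    client_secret: str,
    lang: str = "Kor",  # assumed language code
) -> urllib.request.Request:
    """Build a POST request carrying the audio and the authentication headers."""
    return urllib.request.Request(
        url=f"{CSR_ENDPOINT}?lang={lang}",
        data=audio_bytes,
        method="POST",
        headers={
            # Assumed header names for the client ID and client secret key.
            "X-NCP-APIGW-API-KEY-ID": client_id,
            "X-NCP-APIGW-API-KEY": client_secret,
            "Content-Type": "application/octet-stream",
        },
    )
```

The request could then be sent with `urllib.request.urlopen(request)`. Using an `https://` URL keeps both the credentials and the audio encrypted in transit.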

Use Cases

Services that previously could be operated only through text input or physical controls can now be controlled by voice.
CSR can be applied to home appliances, apartment home network control, delivery orders, financial services, and more.
With the mobile SDK, you can implement a variety of personal assistant services in smartphone applications. You can also use the REST API to convert accumulated audio files into text.

Service Usage Limit

- The usage limit of the Clova Speech Recognition API can be adjusted directly from the service console.
- You can set the limit per application yourself, choosing from options that range from 300,000 seconds a month (10,000 seconds a day) to 30 million seconds a month (10 million seconds a day).
- Please note that service usage is rounded up to the next 15-second unit for billing.
- The last call that exceeds the usage limit is still allowed. (The usage fee is calculated based on actual usage.)
E.g., you have set the monthly usage limit to 1,000 seconds. After you have used 990 seconds, you can still make a call that uses 15 seconds. Your total usage is then 1,005 seconds, and you will be billed for 1,005 seconds.
- If you need to increase the service limit, please contact Customer Support.
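
The "last call may exceed the limit" rule can be modeled as follows. This is only a sketch of the behavior described above, not the service's actual implementation; the class `UsageMeter` and its method names are hypothetical.

```python
class UsageMeter:
    """Tracks billed seconds against a usage limit."""

    def __init__(self, limit_seconds: int, used_seconds: int = 0):
        self.limit_seconds = limit_seconds
        self.used_seconds = used_seconds

    def try_call(self, billed_seconds: int) -> bool:
        """Accept a call as long as the limit has not been reached yet.

        The accepted call may push total usage past the limit; that
        overshoot is still counted (and therefore billed).
        """
        if self.used_seconds >= self.limit_seconds:
            return False
        self.used_seconds += billed_seconds
        return True


# The example from the list above: limit 1,000 seconds, 990 already used.
meter = UsageMeter(limit_seconds=1000, used_seconds=990)
accepted = meter.try_call(15)   # allowed, usage becomes 1,005 seconds
rejected = meter.try_call(15)   # refused, the limit is already exceeded
```

After the first call the meter holds 1,005 seconds, which matches the billed amount in the example.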

Pricing Information

The CSR usage fee is charged in 15-second units based on the speech recognition usage time.

| Charge (Monthly) | Unit | Usage Fee (in KRW) |
|---|---|---|
| Speech recognition usage time (speech recognition request - termination) | Usage time (per 15 seconds) | 4 KRW per 15 seconds |
  • Usage time is rounded up to the next 15-second unit.
  • The maximum speech recognition time per API call is 60 seconds.
  • Usage time is measured from the speech recognition request until termination. Periods without speech, such as silent intervals in the middle of a file or of a streaming recognition request, are therefore included in the usage time.
  • The same rate applies to the mobile SDK and the REST API.

Pricing Example

Case 1) Using 10 seconds in 1 API call: rounded up to 15 seconds, so 4 KRW is charged.
Case 2) Using 32 seconds in 1 API call: rounded up to 45 seconds, so 12 KRW is charged.
Case 3) Using CSR for 65 seconds: only the first 60 seconds (the time limit) are recognized, so 16 KRW is charged.
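
The three cases can be reproduced with a short calculation. This is a sketch based only on the rules on this page (15-second units, 4 KRW per unit, 60-second cap per call); the function name `call_fee_krw` is hypothetical, and the handling of non-positive durations is an assumption because the page does not cover it.

```python
import math

UNIT_SECONDS = 15          # billing unit
FEE_PER_UNIT_KRW = 4       # fee per 15-second unit
MAX_SECONDS_PER_CALL = 60  # recognition limit per API call


def call_fee_krw(elapsed_seconds: float) -> int:
    """Return the fee in KRW for one API call of the given duration."""
    if elapsed_seconds <= 0:
        # Assumption: the page does not say how zero-length calls are billed.
        return 0
    # Only the first 60 seconds of a call are recognized and billed.
    billable_seconds = min(elapsed_seconds, MAX_SECONDS_PER_CALL)
    # Round up to the next whole 15-second unit.
    units = math.ceil(billable_seconds / UNIT_SECONDS)
    return units * FEE_PER_UNIT_KRW


fees = [call_fee_krw(seconds) for seconds in (10, 32, 65)]
print(fees)  # [4, 12, 16]
```

Because usage time runs from request to termination, the duration passed in should include any silent intervals.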
