API Overview
All endpoints are available through https://gateway.respeecher.com.
Before making any authenticated requests you must first choose an authentication method. It's recommended you use API key authentication due to its simplicity and ease of use.
As an example of how to integrate with the API, we have an official Python client for our TTS API, available at https://github.com/respeecher/respeecher_tts.
API Key Authentication
To acquire an API key, go to https://marketplace.respeecher.com/account and generate one with a suitable expiration date. Once you have an API key, you must supply it to all authenticated endpoints via the api-key header.
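A minimal sketch of attaching the header in Python, using only the standard library. The helper name build_request and the placeholder key are illustrative assumptions, and the path is just one of the endpoints described in this overview.

```python
import urllib.request

BASE_URL = "https://gateway.respeecher.com"


def build_request(path, api_key, method="GET", body=None):
    """Return an unsent request that carries the api-key header."""
    request = urllib.request.Request(BASE_URL + path, data=body, method=method)
    request.add_header("api-key", api_key)
    return request


# "YOUR_API_KEY" is a placeholder; nothing is sent until urlopen() is called.
request = build_request("/api/v2/voices", "YOUR_API_KEY")
```

Passing the returned object to urllib.request.urlopen() would perform the actual call.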
Session Cookie Authentication
The POST /api/login endpoint is used to obtain a session_id cookie and a CSRF token. The request body must be formatted as JSON and the Content-Type header must be set to application/json.
If successful, the status code will be 200, which is the default success code for all endpoints.
For all requests you make to authenticated endpoints, you must include the session_id cookie and send the CSRF token in the X-Csrf-Token header.
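As a rough sketch, the two pieces of session state can be turned into request headers like this. The function name and the sample values are placeholders. An HTTP client with a cookie jar would normally send the cookie for you, so setting the Cookie header by hand is only one possible approach.

```python
def session_headers(session_id, csrf_token):
    """Headers for an authenticated request using session cookie authentication."""
    return {
        "Cookie": f"session_id={session_id}",
        "X-Csrf-Token": csrf_token,
    }


# Placeholder values standing in for what a successful login returns.
headers = session_headers("abc123", "token456")
```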
Projects
Once you have chosen an authentication method you can create a new project using the POST /api/projects endpoint. In the request body, owner is either a user ID or a group ID. Your user ID is returned as part of the login response.
Folders
Next you must create a folder for your project. Folders provide a way to group conversions and recordings within your project. For example you might want to group recordings that relate to a specific scene within a game.
Folders are created with the POST /api/folders endpoint. Both name and project_id are required in the request body.
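As an illustration, the JSON body for a folder could be assembled as follows. Only the two field names come from this page. The helper name, the sample folder name and the all-zero ID are placeholders.

```python
import json


def folder_create_body(name, project_id):
    """Serialize the JSON body for POST /api/folders; both fields are required."""
    if not name or not project_id:
        raise ValueError("both name and project_id are required")
    return json.dumps({"name": name, "project_id": project_id}).encode("utf-8")


body = folder_create_body("Scene 1", "00000000-0000-0000-0000-000000000001")
```

The bytes would be sent with Content-Type set to application/json, like the other JSON endpoints.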
Recordings
Now that you have a folder you can upload an original voice recording with the POST /api/recordings endpoint. Unlike the endpoints mentioned so far, POST /api/recordings accepts requests formatted using multipart form data, so the Content-Type header must be set to multipart/form-data.
Here is a list of the form data fields that POST /api/recordings accepts:
data - The audio binary data. Currently the supported audio formats are wav, ogg, mp3 and flac.
parent_folder_id - The ID of a parent folder.
microphone - Name of the microphone used to capture the audio. If uploading a file, set this field to file.
label (Optional) - If label is not provided, the uploaded file's name is used.
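The sketch below shows one way to encode these fields as multipart/form-data with the Python standard library. The encoder is a generic illustration rather than official client code. The file name and bytes are placeholders, and the part-level Content-Type of application/octet-stream is an assumption. Note that a multipart Content-Type header must also carry the boundary parameter.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (content_type, body). The boundary is random per call.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"parent_folder_id": "00000000-0000-0000-0000-000000000001", "microphone": "file"},
    file_field="data",
    filename="take1.wav",
    file_bytes=b"RIFF....",  # placeholder bytes, not a real wav file
)
```

The returned content_type goes in the Content-Type header and body is the request payload.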
TTS Recordings
If you would rather create a recording with text to speech you can use the POST /api/v2/recordings/tts endpoint. Here is an example request:
{
  "parent_folder_id": "00000000-0000-0000-0000-000000000001",
  "text": "string",
  "label": "string"
}
parent_folder_id - The ID of a parent folder.
text - The text to be converted to speech.
label (Optional) - A label to display in the UI.
If you are using Python it is recommended that you use our dedicated Python API client for TTS. The installation instructions and example code can be found here https://github.com/respeecher/respeecher_tts.
Calibration
The next optional step is calibration. The calibration process determines the mean pitch of your voice, enabling the model to accurately adjust the pitch when converting your voice to another.
You can choose to skip the calibration process if you want to make use of the automatic calibration feature, though it's advisable to create a calibration for better results. Similarly, if you're solely generating TTS conversions you may also skip the calibration process.
Calibration is done with the POST /api/calibration endpoint. Here is a list of the multipart/form-data fields it expects:
name - Unique name for the calibration.
data - The audio binary data. The supported audio formats are the same as for recording upload.
It could take a few minutes to process the calibration. You can get the status of your calibrations with the GET /api/calibration endpoint.
Voices, Accents and Narration Styles
The final step before converting your original recording is selecting the desired voice, together with an accent or narration style, for the conversion. The available voices can be listed with the GET /api/v2/voices endpoint.
Accents are used for speech-to-speech (STS) conversions, and narration styles are used for text-to-speech (TTS) conversions.
The following optional query parameters can be supplied to the GET /api/v2/voices endpoint:
Name | Description | Options |
---|---|---|
limit | Limit the number of voices in the response. | Defaults to 25 |
offset | Offset the location where the list starts. | Defaults to 0 |
sort | Sort by parameter. | name, pitch, rating or created_at |
direction | The sort direction. | asc or desc |
visibility | Filter by visibility. | public, paid, private or kids |
species | Filter by species. | human, animal or other |
gender | Filter by gender. | male or female |
age_group | Filter by age group. | child, young, adult or senior |
pitch_group | Filter by pitch group. | low, mid or high |
nationality | Filter by nationality. | Search string |
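A small sketch of building the query string from these parameters in Python. The helper name voices_url is illustrative. The parameter names and values are taken from the table above.

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.respeecher.com"


def voices_url(**filters):
    """Build a GET /api/v2/voices URL, skipping filters left as None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/api/v2/voices" + (f"?{query}" if query else "")


url = voices_url(limit=10, sort="rating", direction="desc", gender="female")
```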
Here is an example of what the response might look like:
{
  "list": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "owner_id": "00000000-0000-0000-0000-000000000001",
      "name": "string",
      "slug": "string",
      "visibility": "public",
      "species": "human",
      "artist": "string",
      "verified_artist": true,
      "gender": "male",
      "pitch": 0,
      "age_group": "child",
      "pitch_group": "low",
      "nationality": "string",
      "image_url": "string",
      "thumbnail_url": "string",
      "description": "string",
      "rating": 0,
      "active": true,
      "created_at": "2024-02-28T11:21:32.401Z",
      "favorite": false,
      "available": false,
      "accents": [
        {
          "id": "00000000-0000-0000-0000-000000000001",
          "is_default": true,
          "native": true,
          "info": {
            "name": "string",
            "short_name": "string",
            "locale": "string"
          },
          "tiers": [
            {
              "name": "string"
            }
          ],
          "available": false,
          "settings": {
            "favorite": false,
            "semitones_correction": 0
          }
        }
      ],
      "narration_styles": [
        {
          "id": "00000000-0000-0000-0000-000000000001",
          "is_default": true,
          "info": {
            "name": "string"
          },
          "settings": {
            "favorite": false
          }
        }
      ],
      "tiers": [
        {
          "name": "string"
        }
      ]
    }
  ],
  "pagination": {
    "count": 0,
    "limit": 0,
    "offset": 0
  }
}
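To pick an accent or narration style from such a response, one option is to look for the entries flagged is_default. The sketch below does this on a trimmed-down sample with the same shape as the response above. The helper name and the short IDs are placeholders.

```python
def default_ids(voice):
    """Return (accent_id, narration_style_id) flagged is_default, or None if absent."""
    def pick(items):
        return next((item["id"] for item in items if item.get("is_default")), None)

    return pick(voice.get("accents", [])), pick(voice.get("narration_styles", []))


# Trimmed-down sample response; real entries carry many more fields.
response = {
    "list": [
        {
            "id": "voice-1",
            "accents": [{"id": "accent-1", "is_default": True}],
            "narration_styles": [{"id": "style-1", "is_default": True}],
        }
    ],
    "pagination": {"count": 1, "limit": 25, "offset": 0},
}
choices = {voice["id"]: default_ids(voice) for voice in response["list"]}
```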
Conversion Order
You can now create a conversion order from your original audio recording or TTS recording with POST /api/v2/orders. Here is an example conversion order request:
{
  "original_id": "00000000-0000-0000-0000-000000000001",
  "conversions": [
    {
      "voice_id": "00000000-0000-0000-0000-000000000001",
      "narration_style_id": "00000000-0000-0000-0000-000000000001",
      "accent_id": "00000000-0000-0000-0000-000000000001",
      "semitones_correction": 0,
      "label": "string"
    }
  ],
  "calibration_id": "00000000-0000-0000-0000-000000000001",
  "use_calibration": false
}
original_id - The ID of the original recording to create a conversion from.
conversions - A list of the voices and accents you wish to convert your original voice recording to. Each list item has the following properties: voice_id, narration_style_id, accent_id, semitones_correction and label.
voice_id - The ID of the voice. A list of the available voices can be obtained with the GET /api/v2/voices endpoint.
narration_style_id (Required for TTS) - The ID of the narration style to use. Each voice has a number of narration styles that you can choose from. Narration styles are used in text-to-speech conversions.
accent_id (Optional) - The ID of the accent to use. Each voice has a number of accents that you can choose from. Accents are used in speech-to-speech conversions.
semitones_correction (Optional) - The number of semitones by which to shift the converted output. By default most voices will shift your voice by a number of semitones. The resultant pitch will be your voice's pitch + default voice shift + semitones_correction. Used only in speech-to-speech conversion.
label (Optional) - A label to identify the conversion.
use_calibration (Optional) - If you wish to use a specific calibration, set this to true and specify the calibration_id; otherwise, if it's false, it will default to the currently enabled calibration on your account. Used only in speech-to-speech conversion.
calibration_id (Optional) - The ID of the calibration to use. Used only in speech-to-speech conversion.
Note: When making TTS conversions, you must specify a narration_style_id, otherwise the resulting conversion will be robotic and unrealistic.
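The rules above can be sketched as a small Python helper that builds the order body for a single conversion. The function name and the tts flag are assumptions made for this example. The flag is a local convenience for enforcing the narration style rule, not an API field. The short IDs in the usage lines are placeholders.

```python
import json


def conversion_order_body(original_id, voice_id, *, narration_style_id=None,
                          accent_id=None, semitones_correction=None,
                          calibration_id=None, tts=False):
    """Build the JSON body for a single-conversion order."""
    if tts and narration_style_id is None:
        raise ValueError("narration_style_id is required for TTS conversions")
    conversion = {"voice_id": voice_id}
    optional = {
        "narration_style_id": narration_style_id,
        "accent_id": accent_id,
        "semitones_correction": semitones_correction,
    }
    # Leave out optional fields that were not supplied.
    conversion.update({k: v for k, v in optional.items() if v is not None})
    body = {"original_id": original_id, "conversions": [conversion]}
    if calibration_id is not None:
        # A specific calibration needs use_calibration set to true plus its ID.
        body["use_calibration"] = True
        body["calibration_id"] = calibration_id
    return json.dumps(body)


tts_body = conversion_order_body("rec-1", "voice-1",
                                 narration_style_id="style-1", tts=True)
sts_body = conversion_order_body("rec-2", "voice-1", accent_id="accent-1",
                                 semitones_correction=2, calibration_id="cal-1")
```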
Downloading conversions
After the conversion is complete, its state will be set to done and the url property will contain a link.
{
  "id": "a0663dbc-42c9-438b-9cbd-2aa1de3a7738",
  "project_id": "d88084a0-b616-49a4-abe9-9b4a5720a2c9",
  "parent_folder_id": "5e1ca71d-d331-4c80-bcd2-78a1e9fa74c7",
  "type": "converted",
  "url": "/storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav",
  "state": "done",
  "original_id": "1589a392-aac7-4c56-8b0e-b39739f3de7a",
  ...
}
You can then download the result with GET /storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav.
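A minimal sketch of turning a recording object into a download link once it is ready. It assumes the /storage path is served from the same gateway host as the other endpoints. The helper name is illustrative.

```python
BASE_URL = "https://gateway.respeecher.com"


def download_url(recording):
    """Return the absolute download URL, or None while the conversion is not done."""
    if recording.get("state") != "done" or not recording.get("url"):
        return None
    return BASE_URL + recording["url"]


recording = {
    "id": "a0663dbc-42c9-438b-9cbd-2aa1de3a7738",
    "url": "/storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav",
    "state": "done",
}
link = download_url(recording)
```

A script that waits for conversions can call this helper each time it re-fetches the recording and proceed once it returns a link.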
Exporting
After the conversions are complete you can export the project to a .zip archive with GET /api/projects/{project_id}/export?starred_only={starred_only}. If starred_only is set to true, only recordings marked as starred will be exported. The response will have a Content-Type of application/zip, and the Content-Disposition header will be an attachment with the filename directive set to the name of the exported .zip file. The response body contains the .zip file.
If you want to export just the conversions for a specific recording you can use the GET /api/recordings/export?original_id={original_id}&starred_only={starred_only} endpoint. The response is the same as for project export.
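The following sketch builds a project export URL and reads the archive name from a Content-Disposition header value. The helper names and the sample header are illustrative, and serializing the boolean as lowercase true or false is an assumption.

```python
from email.message import Message
from urllib.parse import urlencode

BASE_URL = "https://gateway.respeecher.com"


def project_export_url(project_id, starred_only=False):
    """Build the export URL; the flag is sent as the string true or false."""
    query = urlencode({"starred_only": str(starred_only).lower()})
    return f"{BASE_URL}/api/projects/{project_id}/export?{query}"


def export_filename(content_disposition):
    """Extract the filename directive from a Content-Disposition header value."""
    message = Message()
    message["Content-Disposition"] = content_disposition
    return message.get_filename()


url = project_export_url("d88084a0-b616-49a4-abe9-9b4a5720a2c9", starred_only=True)
name = export_filename('attachment; filename="project.zip"')
```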
Rate limits
Rate limits apply to most endpoints. All endpoints are broken into two categories: fast and slow.
Default limits per user:
fast - 500 requests every 300 seconds
slow - 100 requests every 300 seconds
If you get a 429 HTTP status code, please adjust your script to send fewer requests. A 429 response includes a Retry-After response header that indicates the number of seconds you should wait before attempting to submit a new request.
We can increase limits for a particular user upon request.
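One way a script could honor Retry-After is sketched below. The transport is passed in as a function so the retry logic stays independent of any HTTP library. The function name, the attempt cap and the one-second fallback when the header is missing are all assumptions of this example.

```python
import time


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send() must return a (status_code, headers_dict, body) tuple.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        # Retry-After gives the number of seconds to wait; fall back to 1.
        sleep(float(headers.get("Retry-After", 1)))


# Demonstration with a fake transport: one 429 response, then a 200.
responses = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits = []
result = send_with_retry(lambda: next(responses), sleep=waits.append)
```

In the demonstration the sleep function is replaced with a list append, so the recorded waits can be inspected without actually pausing.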
fast - lightweight requests like the majority of GET endpoints.
slow - heavy requests such as creating conversions.
By default endpoints are in the fast category. Here is a list of the endpoints in the slow category:
Endpoint |
---|
POST /api/recordings |
POST /api/recordings/tts |
POST /api/recordings/conversion-order |
POST /api/recordings/conversion-redo/{recording_id} |
POST /api/calibration |