API Overview

All endpoints are available through https://gateway.respeecher.com.

Before making any authenticated requests, you must choose an authentication method. We recommend API key authentication because it is the simplest to use.

As an example of how to integrate with the API, see our official Python client for the TTS API at https://github.com/respeecher/respeecher_tts.

API Key Authentication

To acquire an API key, go to https://marketplace.respeecher.com/account and generate one with a suitable expiration date. Once you have an API key, you must supply it to all authenticated endpoints via the api-key header:

api-key: AYvUIp3BANi-H_R5EjwgVa
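As a minimal sketch of attaching the header, the snippet below builds a request with Python's standard library without sending it. The helper name build_request is hypothetical, and the key is the placeholder from the example above.

```python
import urllib.request

BASE_URL = "https://gateway.respeecher.com"


def build_request(method, path, api_key, data=None):
    """Build (but do not send) a request that carries the api-key header."""
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("api-key", api_key)
    return request


# Placeholder key from the example above; substitute your own.
request = build_request("GET", "/api/v2/voices", "AYvUIp3BANi-H_R5EjwgVa")
print(request.get_method(), request.full_url)
# Sending is left out of the sketch: urllib.request.urlopen(request)
```

Note that urllib stores header names in capitalized form (Api-key); HTTP header names are case-insensitive, so this should not matter to the server.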

Alternatively, you can authenticate with a session. The POST /api/login endpoint is used to obtain a session_id cookie and a CSRF token. The request body must be JSON, and the Content-Type header must be set to application/json. Here is an example login request body:

{
  "email": "user@example.com",
  "password": "secret password 123"
}

If the login succeeds, the status code will be 200, which is the default success code for all endpoints. The response will have the following format:

{
  "user": {...},
  "csrf_token": "string"
}

When using session authentication, every request to an authenticated endpoint must include the session_id cookie and the CSRF token in the X-Csrf-Token header.
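The session flow can be sketched offline as follows. The helper names are hypothetical, and the session_id value is assumed to have been read from the login response's cookie by whatever HTTP client you use; the sketch only shows how the two pieces combine into request headers.

```python
import json


def build_login_body(email, password):
    """JSON body for POST /api/login (send with Content-Type: application/json)."""
    return json.dumps({"email": email, "password": password}).encode("utf-8")


def session_headers(login_response_text, session_id):
    """Headers for later authenticated requests, built from the login response."""
    csrf_token = json.loads(login_response_text)["csrf_token"]
    return {
        "Cookie": f"session_id={session_id}",
        "X-Csrf-Token": csrf_token,
    }


# Canned response in the documented shape; no network call is made here.
sample_response = '{"user": {}, "csrf_token": "example-token"}'
headers = session_headers(sample_response, session_id="example-session")
print(headers)
```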

Projects

Once you have chosen an authentication method you can create a new project using the POST /api/projects endpoint. Here is an example project create request:

{
  "name": "My Project Name",
  "owner": "00000000-0000-0000-0000-000000000001"
}

owner is either a user or a group ID. Your user ID is returned as part of the login response.

Folders

Next, you must create a folder for your project. Folders provide a way to group conversions and recordings within your project. For example, you might want to group recordings that relate to a specific scene within a game.

Folders are created with the POST /api/folders endpoint. Here is an example request:

{
  "name": "My Folder",
  "project_id": "00000000-0000-0000-0000-000000000001"
}

Both name and project_id are required.
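A small sketch of preparing this request is shown below, checking the two required fields before anything is sent. The function name folder_request is hypothetical, and authentication headers are omitted for brevity.

```python
import json

BASE_URL = "https://gateway.respeecher.com"


def folder_request(name, project_id):
    """Return (url, headers, body) for POST /api/folders."""
    if not name or not project_id:
        raise ValueError("both name and project_id are required")
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"name": name, "project_id": project_id}).encode("utf-8")
    return BASE_URL + "/api/folders", headers, body


url, headers, body = folder_request(
    "My Folder", "00000000-0000-0000-0000-000000000001"
)
print(url, body.decode("utf-8"))
```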

Recordings

Now that you have a folder, you can upload an original voice recording with the POST /api/recordings endpoint. Unlike the endpoints mentioned so far, POST /api/recordings accepts requests formatted as multipart form data, so the Content-Type header must be set to multipart/form-data.

Here is a list of the form data fields that POST /api/recordings accepts:

data - The audio binary data. Currently the supported audio formats are wav, ogg, mp3 or flac.
parent_folder_id - The ID of a parent folder.
microphone - Name of the microphone used to capture the audio. If uploading a file, set this field to file.
label (Optional) - A label for the recording. If label is not provided, the uploaded file's name is used.
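To illustrate the multipart layout, here is a hand-rolled encoder using only the standard library. It is a sketch, not the official client: the function name is hypothetical, and the per-part Content-Type of application/octet-stream is an assumption (the API may expect a specific audio type). In practice an HTTP library that handles multipart encoding for you is usually more convenient.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


content_type, body = encode_multipart(
    {"parent_folder_id": "00000000-0000-0000-0000-000000000001", "microphone": "file"},
    file_field="data",
    filename="take1.wav",
    file_bytes=b"RIFF placeholder audio bytes",
)
print(content_type, len(body))
```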

TTS Recordings

If you would rather create a recording with text to speech you can use the POST /api/v2/recordings/tts endpoint. Here is an example request:

{
  "parent_folder_id": "00000000-0000-0000-0000-000000000001",
  "text": "string",
  "label": "string" (Optional)
}

parent_folder_id - The ID of a parent folder.
text - The text to be converted to speech.
label (Optional) - A label to display in UI.
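Because the example above carries an annotation that is not valid JSON, the following sketch shows one way to produce a well-formed body, leaving label out entirely when it is not supplied. The function name tts_request_body is hypothetical.

```python
import json


def tts_request_body(parent_folder_id, text, label=None):
    """JSON body for POST /api/v2/recordings/tts; label is sent only if given."""
    body = {"parent_folder_id": parent_folder_id, "text": text}
    if label is not None:
        body["label"] = label
    return json.dumps(body)


print(tts_request_body("00000000-0000-0000-0000-000000000001", "Hello there"))
```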

If you are using Python, we recommend our dedicated Python API client for TTS. Installation instructions and example code are available at https://github.com/respeecher/respeecher_tts.

Calibration

The next optional step is calibration. The calibration process determines the mean pitch of your voice, enabling the model to accurately adjust the pitch when converting your voice to another.

You can choose to skip the calibration process if you want to make use of the automatic calibration feature, though it's advisable to create a calibration for better results. Similarly, if you're solely generating TTS conversions you may also skip the calibration process.

Calibration is done with the POST /api/calibration endpoint. Here is a list of the multipart/form-data fields it expects:

name - Unique name for the calibration.
data - The audio binary data. The supported audio formats are the same as for recording upload.

It could take a few minutes to process the calibration. You can get the status of your calibrations with the GET /api/calibration endpoint.

Voices, Accents and Narration Styles

The final step before converting your original recording is selecting the voices you want, along with an accent or narration style for each. You can list the available voices with the GET /api/v2/voices endpoint. Accents are used for speech-to-speech (STS) conversions, and narration styles are used for text-to-speech (TTS) conversions.

The following optional query parameters can be supplied to the GET /api/v2/voices endpoint:

Name	Description	Options
`limit`	Limit the number of voices in the response.	Defaults to 25
`offset`	Offset the location where the list starts.	Defaults to 0
`sort`	Sort by parameter.	`name`, `pitch`, `rating` or `created_at`
`direction`	The sort direction.	`asc` or `desc`
`visibility`	Filter by visibility.	`public`, `paid`, `private` or `kids`
`species`	Filter by `species`.	`human`, `animal` or `other`
`gender`	Filter by `gender`.	`male` or `female`
`age_group`	Filter by age group.	`child`, `young`, `adult` or `senior`
`pitch_group`	Filter by pitch group.	`low`, `mid` or `high`
`nationality`	Filter by nationality.	Search string
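The table above can be turned into a small URL builder that rejects values outside the listed options. This is a sketch with a hypothetical function name; the client-side validation is a convenience and not something the API requires.

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.respeecher.com"

# Allowed values copied from the table above.
ALLOWED = {
    "sort": {"name", "pitch", "rating", "created_at"},
    "direction": {"asc", "desc"},
    "visibility": {"public", "paid", "private", "kids"},
    "species": {"human", "animal", "other"},
    "gender": {"male", "female"},
    "age_group": {"child", "young", "adult", "senior"},
    "pitch_group": {"low", "mid", "high"},
}
FREE_FORM = {"limit", "offset", "nationality"}


def voices_url(**params):
    """Build the GET /api/v2/voices URL, validating the enumerated parameters."""
    for key, value in params.items():
        if key in ALLOWED:
            if value not in ALLOWED[key]:
                raise ValueError(f"invalid value for {key}: {value!r}")
        elif key not in FREE_FORM:
            raise ValueError(f"unknown parameter: {key}")
    query = urlencode(params)
    return f"{BASE_URL}/api/v2/voices" + (f"?{query}" if query else "")


print(voices_url(limit=10, gender="female", sort="rating", direction="desc"))
```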

Here is an example of what the response might look like:

{
  "list": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "owner_id": "00000000-0000-0000-0000-000000000001",
      "name": "string",
      "slug": "string",
      "visibility": "public",
      "species": "human",
      "artist": "string",
      "verified_artist": true,
      "gender": "male",
      "pitch": 0,
      "age_group": "child",
      "pitch_group": "low",
      "nationality": "string",
      "image_url": "string",
      "thumbnail_url": "string",
      "description": "string",
      "rating": 0,
      "active": true,
      "created_at": "2024-02-28T11:21:32.401Z",
      "favorite": false,
      "available": false,
      "accents": [
        {
          "id": "00000000-0000-0000-0000-000000000001",
          "is_default": true,
          "native": true,
          "info": {
            "name": "string",
            "short_name": "string",
            "locale": "string"
          },
          "tiers": [
            {
              "name": "string"
            }
          ],
          "available": false,
          "settings": {
            "favorite": false,
            "semitones_correction": 0
          }
        }
      ],
      "narration_styles": [
        {
          "id": "00000000-0000-0000-0000-000000000001",
          "is_default": true,
          "info": {
            "name": "string"
          },
          "settings": {
            "favorite": false
          }
        }
      ],
      "tiers": [
        {
          "name": "string"
        }
      ]
    }
  ],
  "pagination": {
    "count": 0,
    "limit": 0,
    "offset": 0
  }
}
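Given a response in this shape, one plausible way to pick IDs for a conversion is to take the accent and narration style flagged is_default. The helper below is a hypothetical sketch that works on an already-parsed response; the sample data is trimmed to the fields it reads.

```python
def default_ids(voice):
    """Return (accent_id, narration_style_id) flagged is_default, or None for each."""

    def pick(items):
        for item in items:
            if item.get("is_default"):
                return item["id"]
        return None

    return pick(voice.get("accents", [])), pick(voice.get("narration_styles", []))


# Trimmed sample in the documented shape (IDs are placeholders).
response = {
    "list": [
        {
            "id": "voice-1",
            "name": "string",
            "accents": [
                {"id": "accent-1", "is_default": False},
                {"id": "accent-2", "is_default": True},
            ],
            "narration_styles": [{"id": "style-1", "is_default": True}],
        }
    ],
    "pagination": {"count": 1, "limit": 25, "offset": 0},
}

for voice in response["list"]:
    print(voice["id"], default_ids(voice))
```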

Conversion Order

You can now create a conversion order from your original audio recording or text recording with POST /api/v2/orders. Here is an example conversion order request:

{
  "original_id": "00000000-0000-0000-0000-000000000001",
  "conversions": [
    {
      "voice_id": "00000000-0000-0000-0000-000000000001",
      "narration_style_id": "00000000-0000-0000-0000-000000000001" (Required for TTS),
      "accent_id": "00000000-0000-0000-0000-000000000001" (Optional),
      "semitones_correction": 0 (Optional),
      "label": "string" (Optional)
    }
  ],
  "calibration_id": "00000000-0000-0000-0000-000000000001" (Optional),
  "use_calibration": false (Optional)
}

original_id - The ID of the original recording to create a conversion from.
conversions - A list of the voices and accents you wish to convert your original voice recording to. Each list item has the following properties:
voice_id - The ID of the voice. A list of the available voices can be obtained with the GET /api/v2/voices endpoint.
narration_style_id (Required for TTS) - The ID of the narration style to use. Each voice has a number of narration styles that you can choose from. Narration styles are used in text-to-speech conversions.
accent_id (Optional) - The ID of the accent to use. Each voice has a number of accents that you can choose from. Accents are used in speech-to-speech conversions.
semitones_correction (Optional) - The number of semitones to shift the converted output by. By default, most voices shift your voice by a number of semitones. The resulting pitch will be your voice's pitch + default voice shift + semitones_correction. Used only in speech-to-speech conversion.
label (Optional) - A label to identify the conversion.
use_calibration (Optional) - To use a specific calibration, set this to true and specify calibration_id. If it is false, the currently enabled calibration on your account is used. Used only in speech-to-speech conversion.
calibration_id (Optional) - The ID of the calibration to use. Used only in speech-to-speech conversion.

Note: When making TTS conversions, you must specify a narration_style_id; otherwise the resulting conversion will sound robotic and unrealistic.
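The annotated example above is not valid JSON as written, so here is a sketch of assembling a well-formed order body in which optional fields are simply omitted when unset. The helper names are hypothetical; the pairing of use_calibration with calibration_id follows the field descriptions above.

```python
import json


def conversion_item(voice_id, narration_style_id=None, accent_id=None,
                    semitones_correction=None, label=None):
    """One entry of the conversions list; unset optional fields are left out."""
    optional = {
        "narration_style_id": narration_style_id,
        "accent_id": accent_id,
        "semitones_correction": semitones_correction,
        "label": label,
    }
    item = {"voice_id": voice_id}
    item.update({key: value for key, value in optional.items() if value is not None})
    return item


def order_body(original_id, conversions, calibration_id=None):
    """JSON body for POST /api/v2/orders."""
    body = {"original_id": original_id, "conversions": conversions}
    if calibration_id is not None:
        # A specific calibration is requested, so use_calibration is set to true.
        body["calibration_id"] = calibration_id
        body["use_calibration"] = True
    return json.dumps(body)


# TTS conversion: narration_style_id is required. IDs are placeholders.
tts_item = conversion_item("voice-1", narration_style_id="style-1")
print(order_body("recording-1", [tts_item]))
```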

Downloading conversions

After the conversion is complete, its state will be set to done and the url property will contain a link to the result:

{
  "id": "a0663dbc-42c9-438b-9cbd-2aa1de3a7738",
  "project_id": "d88084a0-b616-49a4-abe9-9b4a5720a2c9",
  "parent_folder_id": "5e1ca71d-d331-4c80-bcd2-78a1e9fa74c7",
  "type": "converted",
  "url": "/storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav",
  "state": "done",
  "original_id": "1589a392-aac7-4c56-8b0e-b39739f3de7a",
  ...
}

With GET /storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav you can get your result.
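A polling loop for this step might look like the sketch below. How you fetch the recording's current JSON is not covered here, so fetch_recording is a hypothetical callable you supply; the sketch also assumes the url property is relative to the gateway base URL. Only the done state is documented, so any other value is treated as "keep waiting".

```python
import time

BASE_URL = "https://gateway.respeecher.com"


def wait_for_download_url(fetch_recording, interval=5.0, timeout=600.0,
                          sleep=time.sleep):
    """Poll until the recording's state is done, then return its full URL."""
    deadline = time.monotonic() + timeout
    while True:
        recording = fetch_recording()
        if recording.get("state") == "done":
            return BASE_URL + recording["url"]
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not reach the done state in time")
        sleep(interval)


# Offline demonstration with canned records instead of HTTP calls.
records = iter([
    {"state": "not-done-yet"},  # placeholder value; only "done" is documented
    {"state": "done", "url": "/storage/a0663dbc-42c9-438b-9cbd-2aa1de3a7738.wav"},
])
print(wait_for_download_url(lambda: next(records), sleep=lambda seconds: None))
```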

Exporting

After the conversions are complete you can export the project to a .zip archive with GET /api/projects/{project_id}/export?starred_only={starred_only}. If starred_only is set to true only recordings marked as starred will be exported. The response will have a Content-Type of application/zip and the Content-Disposition header will be an attachment with the filename directive set to the name of the exported .zip file. The response body contains the .zip file.

If you want to export just the conversions for a specific recording, you can use the GET /api/recordings/export?original_id={original_id}&starred_only={starred_only} endpoint. The response is the same as for project export.
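The two export URLs and the filename lookup can be sketched with the standard library as follows. The function names are hypothetical, and the sample Content-Disposition value is made up for illustration.

```python
from email.message import Message
from urllib.parse import urlencode

BASE_URL = "https://gateway.respeecher.com"


def project_export_url(project_id, starred_only=False):
    """URL for exporting a whole project as a .zip archive."""
    query = urlencode({"starred_only": str(starred_only).lower()})
    return f"{BASE_URL}/api/projects/{project_id}/export?{query}"


def recording_export_url(original_id, starred_only=False):
    """URL for exporting the conversions of one original recording."""
    query = urlencode({
        "original_id": original_id,
        "starred_only": str(starred_only).lower(),
    })
    return f"{BASE_URL}/api/recordings/export?{query}"


def attachment_filename(content_disposition):
    """Extract the filename directive from a Content-Disposition header value."""
    message = Message()
    message["Content-Disposition"] = content_disposition
    return message.get_filename()


print(project_export_url("d88084a0-b616-49a4-abe9-9b4a5720a2c9", starred_only=True))
print(attachment_filename('attachment; filename="my_project.zip"'))
```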

Rate limits

Rate limits apply to most endpoints. All endpoints are divided into two categories: fast and slow.

Default limits per user:

fast - 500 requests every 300 seconds
slow - 100 requests every 300 seconds

If you get a 429 HTTP status code, please adjust your script to send fewer requests. A 429 response includes a Retry-After header that indicates the number of seconds you should wait before attempting to submit a new request.
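One way to honor Retry-After is a small retry wrapper like the sketch below. The send callable is hypothetical and is assumed to return a (status, headers, body) tuple; the one-second fallback when the header is missing and the cap of five attempts are also assumptions.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status, waiting Retry-After seconds."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Retry-After carries the number of seconds to wait.
            sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after all attempts")


# Offline demonstration with canned responses instead of HTTP calls.
responses = iter([
    (429, {"Retry-After": "2"}, b""),
    (200, {}, b"ok"),
])
waits = []
print(send_with_retry(lambda: next(responses), sleep=waits.append), waits)
```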

We can increase limits for a particular user upon request.

fast - lightweight requests like the majority of GET endpoints.
slow - heavy requests such as creating conversions.

By default endpoints are in the fast category. Here is a list of the endpoints in the slow category:

Endpoint
`POST /api/recordings`
`POST /api/recordings/tts`
`POST /api/recordings/conversion-order`
`POST /api/recordings/conversion-redo/{recording_id}`
`POST /api/calibration`