
✍️ Create

Use the ✍️ Create endpoint to write text according to a provided prompt.

Available at

💸️ Pricing

You will be billed for the total number of tokens sent in your request plus the number of tokens generated by the API. Note that when using n_completions to return multiple possibilities, you will be charged for all of them. The number of tokens used by a call is returned in the "costs" entry of the response JSON as "total_tokens_used".
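As a minimal sketch of reading the billed token count from a response, the snippet below sums "total_tokens_used" over the entries of "costs". The helper name `total_tokens_billed` is hypothetical, and the dictionary is a trimmed-down response holding only the cost fields named on this page.

```python
def total_tokens_billed(response: dict) -> int:
    """Sum "total_tokens_used" over every entry of the "costs" object."""
    costs = response.get("costs", {})
    return sum(entry["total_tokens_used"] for entry in costs.values())


# Trimmed-down response carrying only the fields used here.
example_response = {
    "costs": {
        "orion-fr@default": {
            "total_tokens_used": 29,
            "total_tokens_input": 4,
            "total_tokens_generated": 25,
            "batch_size": 1,
        }
    }
}

print(total_tokens_billed(example_response))  # 4 input + 25 generated tokens
```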

Example request

curl -X 'POST' \
'' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'X-Model: orion-fr' \
-d '{"text": "Il était une fois", "params": {"mode": "nucleus", "n_tokens": 25, "p": 0.9}}'

Response (JSON)

{
  "request_id": "c2346fc0-630d-4dd1-b0b4-142249b62f22",
  "outputs": [
    {
      "input_text": "Il était une fois",
      "completions": [
        {
          "output_text": "un brave étudiant en droit à Harvard. Il venait d'obtenir son diplôme lorsque son père, le Professeur Edmund Corne",
          "score": {
            "logprob": -67.8477555513382,
            "normalized_logprob": -2.713910222053528,
            "token_logprobs": null
          },
          "execution_metadata": {
            "cost": {
              "tokens_used": 29,
              "tokens_input": 4,
              "tokens_generated": 25,
              "cost_type": "orion-fr@default",
              "batch_size": 1
            },
            "finish_reason": "length"
          }
        }
      ]
    }
  ],
  "costs": {
    "orion-fr@default": {
      "total_tokens_used": 29,
      "total_tokens_input": 4,
      "total_tokens_generated": 25,
      "batch_size": 1
    }
  }
}


  • text string/array[string] ⚠️ required

    The input(s) that will be used by the model for generation, also known as the prompt. They can be provided either as a single string or as an array of strings for batch processing.

  • params object null

    A set of parameters to control the model output. The following parameters are supported:

    • n_tokens int 20

      Number of tokens to generate. Generation may stop earlier if one of the provided stop_words is encountered.

    ⚠️ Maximum content length

    Our models can process sequences of 1,024 tokens at most (length of prompt + n_tokens). Requests overflowing this maximum length will see their prompt truncated from the left to fit.
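The left truncation described above can be sketched as follows. This is an illustration only: the function name `truncate_prompt_left` is hypothetical, the prompt is assumed to be already tokenized, and returning an empty prompt when n_tokens alone fills the window is an assumption, since this page does not describe that case.

```python
MAX_CONTEXT = 1024  # limit stated above: prompt tokens + n_tokens


def truncate_prompt_left(prompt_tokens: list, n_tokens: int,
                         max_context: int = MAX_CONTEXT) -> list:
    """Keep only the rightmost prompt tokens that fit next to n_tokens."""
    budget = max_context - n_tokens  # room left for the prompt
    if budget <= 0:
        # Assumption: nothing of the prompt can be kept in this case.
        return []
    if len(prompt_tokens) <= budget:
        return list(prompt_tokens)
    return prompt_tokens[-budget:]  # drop the oldest (leftmost) tokens


tokens = list(range(1100))              # stand-in for 1,100 token ids
kept = truncate_prompt_left(tokens, n_tokens=100)
print(len(kept), kept[0], kept[-1])
```

Because the beginning of the prompt is what gets dropped, instructions placed at the very start of a long prompt are the first thing to be lost.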

    • n_completions int 1

      Number of different completion proposals to return for each prompt.

    💸️ Additional costs

    You will be charged for the total number of tokens generated (n_completions * n_tokens), so stay reasonable!

    • best_of int null ⚠️ smaller than n_completions

      Among n_completions, only return the best_of ones. Completions are selected according to how likely they are, summing the log-likelihood over all tokens generated.
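A minimal sketch of this selection, assuming the "logprob" field of each completion's score holds the summed log-likelihood of its generated tokens. The helper name `select_best` and the sample completions are hypothetical.

```python
def select_best(completions: list, best_of: int) -> list:
    """Return the best_of completions with the highest summed log-likelihood."""
    ranked = sorted(completions,
                    key=lambda c: c["score"]["logprob"],
                    reverse=True)  # least negative (most likely) first
    return ranked[:best_of]


candidates = [
    {"output_text": "A", "score": {"logprob": -70.2}},
    {"output_text": "B", "score": {"logprob": -55.1}},
    {"output_text": "C", "score": {"logprob": -61.7}},
]
best = select_best(candidates, best_of=2)
print([c["output_text"] for c in best])
```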


    • mode (greedy, topk, nucleus) nucleus

      How the model will decide which token to select at each step.

      • Greedy: the model will always select the most likely token. This generation mode is deterministic and only suited for applications in which there is a ground truth the model is expected to return (e.g. question answering).
      • Nucleus: the model will only consider the most likely tokens with total probability mass p. We recommend this setting for most applications.
      • Top-k: the model will only consider the k most likely tokens. For some models, in particular lyra-fr, this mode is a very good alternative to nucleus sampling.
    • temperature float 1.0 ⚠️ only in topk/nucleus mode

      How risky the model will be in its choice of tokens. A temperature of 0 corresponds to greedy sampling. We recommend a value around 1 for most creative applications, and closer to 0 when a ground truth exists.

    • p float 0.9 ⚠️ only in nucleus mode

      Total probability mass of the most likely tokens considered when sampling in nucleus mode.

    • k int 5 ⚠️ only in topk mode

      Number of most likely tokens considered when sampling in top-k mode. Lower values are more deterministic, and k=1 is equivalent to greedy.
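To make the interplay of mode, temperature, p and k concrete, here is a small sketch of one sampling step over a toy vocabulary. It is not the service's implementation: the function names are hypothetical, and applying temperature before the top-k or nucleus cut is an assumption.

```python
import math
import random


def softmax(logits: dict, temperature: float = 1.0) -> dict:
    """Turn raw scores into probabilities, scaled by temperature."""
    m = max(logits.values())
    exps = {t: math.exp((v - m) / temperature) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def candidates(logits: dict, mode: str = "nucleus", temperature: float = 1.0,
               p: float = 0.9, k: int = 5) -> dict:
    """Return the renormalized distribution over the tokens kept by the mode."""
    if mode == "greedy" or temperature == 0:
        return {max(logits, key=logits.get): 1.0}
    probs = softmax(logits, temperature)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if mode == "topk":
        kept = ranked[:k]
    elif mode == "nucleus":
        kept, mass = [], 0.0
        for tok, pr in ranked:  # add tokens until mass p is covered
            kept.append((tok, pr))
            mass += pr
            if mass >= p:
                break
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}


def sample(logits: dict, seed=None, **params) -> str:
    """Draw one token; a fixed seed makes the draw reproducible."""
    dist = candidates(logits, **params)
    tokens = list(dist)
    rng = random.Random(seed)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(candidates(toy_logits, mode="nucleus", p=0.9))
print(candidates(toy_logits, mode="topk", k=1))
```

With k=1 the top-k distribution collapses onto the single most likely token, which is why it behaves like greedy mode.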


    • biases map<string, float> null

      Bias the provided words to appear more or less often in the generated text. Values should be between -100 and +100, with negative values making words less likely to occur. Extreme values such as -100 will completely forbid a word, while values between 1 and 5 will make the word more likely to appear. We recommend experimenting to find a good fit for your use case.

      💡 Avoiding repetitions

      When generating longer samples with biases, the model may repeat positively biased words too often. Combine this option with presence_penalty and frequency_penalty to achieve best results. If you generate a first completion, and then use it as a prompt for a new completion, you probably want to turn off the word bias encouraging a certain word once it has been produced to avoid too much repetition.

      ⚙️ Technical details

      The provided bias is directly added to the log-likelihood predicted by the model at a given step, before performing the sampling operation. You can use the return_logprobs option or the Analyse endpoint to access the log-probabilities of samples and get an idea of the range of likelihood values in your specific use case.

      The bias is actually applied at the token level, and not at the word level. For words made of multiple tokens, the bias only applies to the first token (and may thus impact other words).
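The additive bias described above can be sketched on a toy vocabulary. The helper names are hypothetical, and the sketch works directly on per-token scores, so it sidesteps the question of how words map to tokens.

```python
import math


def apply_biases(logits: dict, biases: dict) -> dict:
    """Add each token's bias to its log-likelihood before sampling."""
    return {tok: score + biases.get(tok, 0.0) for tok, score in logits.items()}


def to_probs(logits: dict) -> dict:
    """Softmax over the (possibly biased) scores."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


scores = {"chat": 1.0, "chien": 0.5, "oiseau": 0.0}
before = to_probs(scores)
after = to_probs(apply_biases(scores, {"chat": -100.0, "chien": 3.0}))
print(before)
print(after)
```

A bias of -100 pushes the token's probability so close to zero that it is effectively forbidden, while a bias of a few units is enough to make a token dominate.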

    • presence_penalty float 0.0

      How strongly tokens should be prevented from appearing again. This is a one-off penalty: tokens are penalized after their first appearance, but not further if they appear repeatedly. Use frequency_penalty if you want the penalty to grow with each repetition. Use values between 0 and 1. Values closer to 1 encourage variation in the topics generated.

      ⚙️ Technical details

      Once a token has appeared at least once, presence_penalty is subtracted from its log-likelihood at every later step.

    • frequency_penalty float 0.0

      How strongly should tokens be prevented from appearing again if they have appeared repetitively. Contrary to presence_penalty, this penalty scales with how often the token already occurs. Use values between 0 and 1. Values closer to 1 discourage repetition, especially useful in combination with biases.

      ⚙️ Technical details

      frequency_penalty * n_T will be subtracted from the log-likelihood of a token, where n_T is how many times it occurs in the text already.
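Both penalties can be sketched together as below. The function name `penalize` is hypothetical, and two points are assumptions: that the two penalties simply add up, and which tokens count as "already in the text" (the caller decides what to pass as `seen_tokens`).

```python
from collections import Counter


def penalize(logits: dict, seen_tokens: list,
             presence_penalty: float = 0.0,
             frequency_penalty: float = 0.0) -> dict:
    """Subtract presence and frequency penalties from already-seen tokens."""
    counts = Counter(seen_tokens)
    adjusted = {}
    for tok, score in logits.items():
        n_t = counts[tok]  # how many times the token already occurs
        if n_t > 0:
            # One-off presence penalty plus a penalty growing with n_t.
            score -= presence_penalty + frequency_penalty * n_t
        adjusted[tok] = score
    return adjusted


step_logits = {"le": 2.0, "chat": 1.0, "dort": 0.5}
adjusted = penalize(step_logits, ["le", "chat", "le"],
                    presence_penalty=0.5, frequency_penalty=0.25)
print(adjusted)
```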

    • stop_words array[string] null

      Encountering any of these strings will halt generation immediately.
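The effect of stop_words on a finished text can be approximated client-side as shown here. The helper name `cut_at_stop_word` is hypothetical, and excluding the stop word itself from the result is an assumption, since this page does not say whether it is returned.

```python
def cut_at_stop_word(text: str, stop_words: list) -> str:
    """Cut text at the earliest occurrence of any stop word."""
    positions = [i for i in (text.find(w) for w in stop_words if w) if i != -1]
    return text[:min(positions)] if positions else text


print(cut_at_stop_word("Bonjour. Au revoir!", [".", "!"]))
```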


    • concat_prompt boolean false

      The original prompt will be concatenated with the generated text in the returned response.

    • return_logprobs bool false

      Returns the log-probabilities of the generated tokens.

    • seed int null

      Make sampling deterministic by setting a seed used for random number generation. Useful for strictly reproducing Create calls.


    • skill string null

      Specify a 🤹 Skill to use to perform a specific task or to tailor the generated text.

Response (outputs)

An array of outputs shaped like your batch.

  • input_text string

    The input text (prompt) used for generation.

  • completions array[object]

    One entry for each n_completions requested.

    • output_text string

      Text generated by the model. May be concatenated with the input_text if concat_prompt=True.

    • score Score

      A Score structure.

    • execution_metadata ExecutionMetadata

      An Execution metadata structure.

⚙️ Token representations

Tokens are currently returned as they are represented by the tokenizer, which includes special characters such as Ġ for spaces and possible encoding oddities (such as Ã© for é).
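If readable text is needed, raw tokens can be mapped back to characters. The sketch below assumes the tokenizer uses a GPT-2-style byte-level encoding, which matches the Ġ and Ã© symptoms described here but is not confirmed on this page. The function names and sample tokens are hypothetical.

```python
def byte_to_char_table() -> dict:
    """Rebuild the GPT-2-style mapping from byte values to printable characters."""
    printable = (list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD))
                 + list(range(0xAE, 0x100)))
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are moved above U+00FF (space becomes U+0120).
            table[b] = chr(256 + shift)
            shift += 1
    return table


CHAR_TO_BYTE = {c: b for b, c in byte_to_char_table().items()}


def decode_tokens(tokens: list) -> str:
    """Join raw tokens and decode them back to regular UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[c] for c in "".join(tokens))
    return raw.decode("utf-8", errors="replace")


raw_tokens = ["Il", "\u0120\u00c3\u00a9tait", "\u0120une", "\u0120fois"]
print(decode_tokens(raw_tokens))
```

Decoding works on the joined sequence rather than token by token, because a multi-byte character may be split across two tokens.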