On Thursday, OpenAI announced a new service option for developers using its application programming interface (API). Called Flex processing, the new tier cuts AI usage costs by 50 percent compared to standard rates, in exchange for slower response times and occasional unavailability of resources. The feature is currently in beta and is limited to select reasoning-focused large language models (LLMs).
OpenAI Unveils New Service Tier in API
The details of the Flex processing service tier are listed on OpenAI's support page. The option is available in beta for the Chat Completions and Responses APIs, and works specifically with the o3 and o4-mini AI models. To enable the mode, developers set the service tier parameter to "flex" in their API requests.
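As a rough illustration of what opting in looks like, the snippet below builds a Chat Completions request body using only Python's standard library. The field names follow OpenAI's published request format, but the model choice and message content here are placeholders, not a recommendation:

```python
import json

# Build a Chat Completions request body that opts into the Flex tier.
# The "service_tier": "flex" field is what selects the discounted mode;
# the model name and message content are illustrative placeholders.
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "Summarize this quarterly report."}
    ],
    "service_tier": "flex",
}

# Serialized, this is the JSON body that would be POSTed to the
# Chat Completions endpoint along with an API key.
body = json.dumps(payload)
print(body)
```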
The main trade-off for the lower pricing is slower processing. OpenAI tells developers to expect slower response times and sporadic resource unavailability when using Flex processing, and API requests may hit timeout errors, particularly on longer or more complex prompts. According to the company, the mode is best suited to non-production or low-priority tasks such as model evaluations, data enrichment, and asynchronous workflows.
Importantly, OpenAI notes that developers can mitigate timeout errors by adjusting the default timeout setting. These APIs are typically configured to time out after 10 minutes, but with Flex processing, complex or lengthy prompts may need more time. The company advises increasing the timeout to reduce the likelihood of encountering errors.
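One simple way to act on that advice is to pick the request timeout based on the tier in use. The sketch below assumes the 10-minute default mentioned above; the 15-minute value for Flex requests is an illustrative choice, not an official recommendation:

```python
# Default API timeout is about 10 minutes; for Flex processing, a longer
# window reduces spurious timeout errors on large prompts.
DEFAULT_TIMEOUT_S = 10 * 60   # standard default: 600 seconds
FLEX_TIMEOUT_S = 15 * 60      # raised limit for Flex requests: 900 seconds

def request_timeout(service_tier: str) -> int:
    """Pick a request timeout (in seconds) based on the service tier."""
    return FLEX_TIMEOUT_S if service_tier == "flex" else DEFAULT_TIMEOUT_S
```

The returned value would then be passed as the timeout on each outgoing API request.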
Moreover, Flex processing may occasionally run short of resources, which triggers a "429 Resource Unavailable" error code. In that situation, developers are encouraged to retry the request using an exponential backoff strategy, or to fall back to the standard service tier if they need timely responses. OpenAI has confirmed that requests which fail with this error are not charged.
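Exponential backoff simply means doubling the wait between retries. A minimal, generic sketch of the pattern is below; the exception class stands in for the API's 429 response, and the delay values are illustrative:

```python
import random
import time

class ResourceUnavailable(Exception):
    """Stand-in for the API's 429 'Resource Unavailable' response."""

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    """Retry `request` on 429-style errors, doubling the wait each time.

    `request` is any zero-argument callable that performs the API call.
    A small random jitter is added so many clients don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ResourceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of retries; fall back to the standard tier
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

With `base_delay=1.0`, the waits grow roughly as 1, 2, 4, 8 seconds before the final attempt.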
In terms of pricing, the standard mode for the o3 AI model is set at $10 (approximately Rs. 854) per million input tokens and $40 (approximately Rs. 3,418) per million output tokens. With Flex processing, these costs drop to $5 (approximately Rs. 427) for input tokens and $20 (approximately Rs. 1,709) for output tokens. Similarly, the o4-mini AI model costs $0.55 (approximately Rs. 47) for input tokens and $2.20 (approximately Rs. 188) for output tokens in Flex mode, down from the standard rates of $1.10 (approximately Rs. 94) for input and $4.40 (approximately Rs. 376) for output.
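As a quick sanity check on the 50 percent figure, the cost of a workload can be computed from the per-million-token rates above. This short snippet uses the o3 rates from the article and a hypothetical workload of one million input and one million output tokens:

```python
def token_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in dollars for `tokens` tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million

# o3 rates from the article: standard $10/$40, Flex $5/$20 (input/output),
# applied to a hypothetical 1M-input / 1M-output workload.
standard = token_cost(1_000_000, 10) + token_cost(1_000_000, 40)
flex = token_cost(1_000_000, 5) + token_cost(1_000_000, 20)
print(standard, flex)  # Flex comes out to exactly half the standard price
```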