On Thursday, OpenAI announced a new service option for developers through its application programming interface (API). Called Flex processing, the feature cuts AI usage costs by 50% compared with standard pricing, in exchange for slower response times and occasional resource unavailability. Flex processing is currently in beta and is available only for select reasoning-focused large language models (LLMs).
OpenAI Introduces New API Service Tier
In a support page outlining the new offering, OpenAI provided details on Flex processing, which is available in beta for the Chat Completions and Responses APIs and is compatible with the o3 and o4-mini AI models. Developers can enable Flex processing by setting the service_tier parameter to flex in their API requests.
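As a minimal sketch of the step above, the request body below opts a Chat Completions call into the discounted tier by setting service_tier to "flex". Only the payload construction is shown; actually sending it requires an API key and an HTTP client, which are omitted here.

```python
import json

# Sketch: a Chat Completions request body that opts into Flex
# processing via the service_tier parameter. The message content
# is an arbitrary illustrative prompt.
payload = {
    "model": "o3",
    "messages": [{"role": "user", "content": "Summarize this dataset."}],
    "service_tier": "flex",  # opt into the discounted Flex tier
}

# Serialized form, as it would be sent in the HTTP request body.
body = json.dumps(payload)
```

Leaving service_tier unset (or setting it to the default) keeps the request on standard processing and pricing.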
A key consideration for developers using the more affordable API option is the increase in processing time. OpenAI cautions that opting for Flex processing may result in slower response times and unpredictability in resource availability. Users could also encounter timeout issues with API requests, particularly when prompts are lengthy or complex. The service is designed for non-production or low-priority tasks, such as model evaluations, data enrichment, or handling asynchronous workloads.
To mitigate timeout errors, OpenAI advises developers to raise the default timeout setting. The APIs are preset to time out after 10 minutes, but with Flex processing, complex requests may need longer. Extending this timeout reduces the likelihood of encountering such errors.
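A small sketch of that adjustment, assuming a client where the timeout is passed per request (as with Python's stdlib urllib). The 15-minute value is an illustrative choice, not an official recommendation.

```python
# The APIs default to a 10-minute timeout; Flex requests on long or
# complex prompts may need more headroom.
DEFAULT_TIMEOUT_S = 10 * 60  # 600 seconds, the documented default
FLEX_TIMEOUT_S = 15 * 60     # illustrative extended value for Flex

# With a stdlib HTTP client, the timeout is supplied per call, e.g.:
#   urllib.request.urlopen(request, timeout=FLEX_TIMEOUT_S)
extra_headroom_s = FLEX_TIMEOUT_S - DEFAULT_TIMEOUT_S
```

Most HTTP clients and SDKs expose an equivalent per-request or per-client timeout setting; the key point is simply to raise it above the 10-minute default when using Flex.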
Furthermore, Flex processing may occasionally lack sufficient resources to fulfill requests, returning a “429 Resource Unavailable” error code. In these scenarios, developers can either retry their requests with exponential backoff or, if timely completion is essential, fall back to the default service tier. OpenAI has indicated that it will not bill developers for requests that return this error.
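The retry strategy above can be sketched as two small helpers: an exponential backoff delay with full jitter, and a check for when to give up and fall back to the default tier. The base delay, cap, and retry count are illustrative choices, not values prescribed by OpenAI.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential growth (base * 2**attempt) with full jitter, capped,
    a common pattern for handling 429 responses.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def should_fallback(attempt: int, max_retries: int = 5) -> bool:
    """After exhausting retries, switch the request back to the
    default service tier if completion is essential."""
    return attempt >= max_retries
```

In a request loop, a 429 response would trigger a sleep of backoff_delay(attempt) before the next try; since OpenAI does not bill for these failed requests, retrying is cheap, and the fallback to the default tier is only needed when latency matters.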
Under the standard pricing model, the o3 AI model costs $10 (approximately Rs. 854) per million input tokens and $40 (around Rs. 3,418) per million output tokens. With Flex processing, these costs drop to $5 (about Rs. 427) and $20 (approximately Rs. 1,709), respectively. For the o4-mini AI model, the new tier charges $0.55 (roughly Rs. 47) per million input tokens and $2.20 (around Rs. 188) per million output tokens, compared with standard rates of $1.10 (approximately Rs. 94) for input and $4.40 (around Rs. 376) for output.
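The arithmetic behind those figures can be captured in a short cost calculator. The per-million-token rates below are the USD prices quoted above; the function and its name are illustrative, not part of any OpenAI SDK.

```python
# USD per million tokens, (input_rate, output_rate), from the figures
# above. Flex is a flat 50% discount on the standard tier.
PRICES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}

def cost_usd(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a given token count at the chosen tier."""
    in_rate, out_rate = PRICES[model][tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 2M input + 1M output tokens on o3 costs $60 on the
# standard tier and $30 on Flex.
```

The same halving applies uniformly across both models, so switching a batch workload to Flex always cuts the token bill in half, at the cost of the latency and availability trade-offs described earlier.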