Page 23
Introducing SWE-bench Verified
OpenAI has released a human-reviewed version of SWE-bench, a benchmark that evaluates AI models' ability to solve real software engineering problems. The validated subset provides more dependable performance metrics by incorporating expert verification. This effort helps ensure that AI coding tools are assessed fairly and accurately against practical, real-world challenges.
Zico Kolter Joins OpenAIβs Board of Directors
OpenAI has brought Zico Kolter onto its board of directors, leveraging his background in AI safety and alignment to enhance the company's governance. Kolter will additionally participate in the Safety & Security Committee, reinforcing OpenAI's focus on responsible AI development oversight.
GPT-4o System Card External Testers Acknowledgements
GPT-4o system card external testers acknowledgements
GPT-4o System Card
OpenAI released a detailed safety report documenting the protective measures implemented for GPT-4o before its launch, including external security testing and risk evaluations based on the company's Preparedness Framework. The assessment outlines multiple mitigation strategies designed to address identified vulnerabilities while maintaining the model's capabilities. This transparent documentation demonstrates OpenAI's commitment to responsible AI development practices.
Enabling a data-driven workforce
ChatGPT Enterprise enables employees across organizations to independently analyze data and uncover business insights without requiring specialized technical skills. By making advanced data analysis more accessible through natural language interactions, the platform helps democratize intelligence gathering and accelerates decision-making across departments.
Pairing data with APIs to unlock customer value
Rakuten Pairs Data with AI to Unlock Customer Insights and Value
Introducing Structured Outputs in the API
OpenAI has launched Structured Outputs for its API, enabling developers to specify JSON schemas that model responses will reliably follow. This eliminates the need for workarounds and post-processing by guaranteeing that outputs conform to developer-defined data structures. The feature simplifies integration and reduces validation complexity for production applications. ---