OpenAI has unveiled CriticGPT, a new AI model designed to catch errors in code generated by ChatGPT. CriticGPT will serve as an AI assistant for the human reviewers who evaluate that code.


According to a new OpenAI paper, “LLM Critics Help Catch LLM Bugs,” CriticGPT is designed to act as an AI assistant for the human reviewers who inspect software code generated by ChatGPT. Built on the GPT-4 family of large language models (LLMs), CriticGPT analyzes code and points out potential errors, making it easier for reviewers to spot flaws that might otherwise slip past them. The researchers trained CriticGPT on a dataset of code samples with intentionally introduced bugs, teaching it to recognize and flag a wide range of errors.

The researchers found that for 63% of naturally occurring LLM errors, annotators preferred CriticGPT’s critiques over human-written ones. Teams using CriticGPT also wrote more comprehensive reviews than people working without the AI assistant, while producing fewer confabulations (fabricated facts and hallucinations).

Developing the automated “critic” involved training the model on a large set of inputs with deliberately introduced errors. Experts were asked to tamper with code written by ChatGPT by inserting bugs, and then to write example feedback as if they had just discovered those bugs. This process taught the model to identify and critique many different types of coding errors.
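
OpenAI has not published the data format behind this tampering pipeline, but the description above maps naturally onto records pairing original code, tampered code, and a reference critique. The sketch below is a hypothetical illustration of one such training record; all field names are assumptions, not OpenAI’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class TamperingExample:
    """One hypothetical training record for a critic model.

    A human expert takes code produced by ChatGPT, inserts a subtle
    bug ("tampering"), and writes the critique a good reviewer should
    produce. Field names are illustrative only.
    """
    original_code: str       # code as ChatGPT generated it
    tampered_code: str       # the same code with a bug inserted by the expert
    bug_description: str     # the expert's note on what was broken and where
    reference_critique: str  # the critique the critic model should learn to write


example = TamperingExample(
    original_code="def mean(xs):\n    return sum(xs) / len(xs)\n",
    tampered_code="def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n",
    bug_description="Divisor changed to len(xs) - 1, corrupting the mean.",
    reference_critique=(
        "The function divides by len(xs) - 1 instead of len(xs), so it "
        "does not compute the arithmetic mean of the input list."
    ),
)
```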

In experiments, CriticGPT caught both deliberately inserted bugs and naturally occurring errors in ChatGPT’s responses. The researchers also introduced a new technique, “Force Sampling Beam Search” (FSBS), which helps CriticGPT write more detailed code reviews by letting them adjust how exhaustively the model searches for problems while controlling the false-positive rate.
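
The full FSBS procedure is not published as code, but the description suggests a simple selection rule: sample several critiques in which the model is forced to highlight sections of the code, then keep the candidate that maximizes a reward-model score plus a bonus per highlighted section. The sketch below mocks both the critic and the reward model (neither is public); the function names and the exact scoring formula are assumptions for illustration.

```python
import random

def sample_critique_with_forced_highlights(code: str, rng: random.Random) -> dict:
    """Mock: sample one candidate critique, forced to quote ("highlight")
    at least one section of the code under review. In the real system
    this would be the LLM critic itself."""
    n = rng.randint(1, 4)
    return {"text": f"critique quoting {n} code snippet(s)", "num_highlights": n}

def reward_model_score(critique: dict, rng: random.Random) -> float:
    """Mock: a learned reward model would rate critique quality here."""
    return rng.random()

def fsbs_select(code: str, num_candidates: int = 8,
                length_modifier: float = 0.5) -> dict:
    """Sample several forced-highlight candidates, then keep the one
    maximizing reward-model score plus a per-highlight bonus.

    Raising length_modifier favors longer, more thorough critiques
    (more real bugs caught, but more false positives); lowering it
    favors precision. The scoring rule is an assumption based on the
    paper's description, not OpenAI's published code."""
    rng = random.Random(0)
    candidates = [sample_critique_with_forced_highlights(code, rng)
                  for _ in range(num_candidates)]
    return max(candidates,
               key=lambda c: reward_model_score(c, rng)
               + length_modifier * c["num_highlights"])

print(fsbs_select("def mean(xs):\n    return sum(xs) / len(xs)\n"))
```

In this toy version, sweeping length_modifier from low to high reproduces the trade-off the researchers describe: more thorough critiques at the cost of a higher false-positive rate.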

Interestingly, CriticGPT’s capabilities go beyond simple code checking. The researchers applied the model to a set of ChatGPT training data that human annotators had previously rated as flawless. CriticGPT flagged errors in 24% of those cases, and human experts later confirmed the flagged errors were real. OpenAI believes this demonstrates the model’s potential beyond purely technical tasks and highlights its ability to catch subtle mistakes that elude even careful human review.

Despite the promising results, CriticGPT, like all AI models, has limitations. The model was trained on relatively short ChatGPT responses, which may not fully prepare it to evaluate the longer, more complex tasks that future AI systems may face. The team acknowledges that the model is most effective at detecting errors that can be pinpointed to one specific location in the code. Real-world errors in AI output, however, are often scattered across multiple parts of a response, which presents a challenge for future iterations of the model.

Furthermore, while CriticGPT reduces confabulation, it does not eliminate it entirely, and human experts can still make mistakes if they act on the model’s false claims.
