FREMONT, CA: According to research by computer scientists at Stanford University, programmers who accept assistance from AI tools like GitHub Copilot produce less secure code than those who work alone. The researchers also found that AI assistance frequently misleads developers about the quality of the code they write.
According to the authors' findings, participants who had access to an AI assistant frequently introduced more security flaws than those who did not, with the results for string encryption and SQL injection being especially noteworthy. At the same time, participants with access to an AI assistant were more likely than those without one to believe they had written secure code.
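For context, SQL injection is the class of flaw in which untrusted input is spliced directly into a query string. The sketch below is a generic Python illustration using the standard sqlite3 module, not code taken from the study's prompts or participants:

    import sqlite3

    def find_user_unsafe(conn, username):
        # Vulnerable: the attacker-controlled value is interpolated into the SQL text,
        # so an input such as "x' OR '1'='1" rewrites the query's logic.
        query = "SELECT id, name FROM users WHERE name = '%s'" % username
        return conn.execute(query).fetchall()

    def find_user_safe(conn, username):
        # Parameterised query: the value is passed separately from the SQL,
        # so the driver never parses it as SQL syntax.
        return conn.execute("SELECT id, name FROM users WHERE name = ?", (username,)).fetchall()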
Previous studies conducted by NYU researchers have demonstrated how frequently AI-based programming suggestions are insecure. Their assessment of the security of GitHub Copilot's code contributions found that, given 89 scenarios, about 40 per cent of the programmes generated with Copilot contained potentially exploitable flaws.
According to the Stanford authors, that study is limited in scope because it considers only a constrained set of prompts corresponding to 25 vulnerabilities, and only Python, C, and Verilog as programming languages.
The Stanford researchers also reference Security Implications of Large Language Model Code Assistants: A User Study, a follow-up study from some of the same NYU researchers, as the only other user study of a similar nature that they are aware of. They point out, however, that their work differs in focusing on OpenAI's more powerful codex-davinci-002 model rather than the less powerful codex-cushman-001 model. Both play a role in GitHub Copilot, itself a fine-tuned descendant of a GPT-3 language model.
The Security Implications paper only examines functions in the C programming language, whereas the Stanford study covers multiple programming languages: Python, JavaScript, and C. The Stanford researchers speculate that the inconclusive results in the Security Implications paper may stem from its exclusive focus on C, which they say was the only language in their own, larger investigation to yield mixed conclusions.
The Stanford user study involved 47 participants with varying levels of expertise, including undergraduate students, graduate students, and industry professionals. Participants used a standalone Electron app built with React to respond to five prompts while being observed by the study administrator. The first prompt asked them to write two functions in Python, one that encrypts and one that decrypts a given string using a given symmetric key.
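Purely as an illustration, and not the study's reference solution, one reasonable way to answer that prompt is with an authenticated symmetric scheme such as Fernet from the third-party cryptography package:

    from cryptography.fernet import Fernet

    def encrypt(message: str, key: bytes) -> bytes:
        # Fernet provides authenticated symmetric encryption (AES-CBC plus an HMAC),
        # so tampering with the ciphertext is detected at decryption time.
        return Fernet(key).encrypt(message.encode("utf-8"))

    def decrypt(token: bytes, key: bytes) -> str:
        # Raises InvalidToken if the ciphertext was altered or the key is wrong.
        return Fernet(key).decrypt(token).decode("utf-8")

    # Example usage:
    # key = Fernet.generate_key()
    # assert decrypt(encrypt("hello", key), key) == "hello"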
On that specific question, participants who used AI assistance were more likely to produce incorrect and insecure code than the control group working without automated tools. Only 67 per cent of the assisted group provided a correct answer, compared with 79 per cent of the control group.
Additionally, those in the assisted group were significantly more likely to provide an insecure solution (p < 0.05, using Welch's unequal-variances t-test), significantly more likely to use trivial cyphers, such as substitution cyphers (p < 0.01), and significantly more likely not to conduct an authenticity check on the final returned value.
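For readers unfamiliar with the test named above, Welch's unequal-variances t-test is available in SciPy. The sketch below uses made-up, hypothetical group outcomes purely to show the call, not the study's data:

    from scipy import stats

    # Hypothetical per-participant outcomes (1 = insecure solution), not the study's data.
    assisted = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    control = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

    # equal_var=False selects Welch's unequal-variances t-test.
    t_stat, p_value = stats.ttest_ind(assisted, control, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")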