ChatGPT And Google Gemini Pass Cybersecurity Exams
ChatGPT and Google Gemini's performance on CEH exam questions evaluated by Prasad Calyam at the University of Missouri.
Ashish Khaitan July 13, 2024
The University of Missouri, in collaboration with Amrita University, India, has released a new paper on how large language models (LLMs) like ChatGPT and Google Gemini, formerly known as Bard, can contribute to ethical hacking practices—a critical domain in safeguarding digital assets against malicious cyber threats.
The study, titled “ChatGPT and Google Gemini Pass Ethical Hacking Exams,” investigates the potential of AI-driven tools to enhance cybersecurity defenses. Led by Prasad Calyam, Director of the Cyber Education, Research and Infrastructure Center at the University of Missouri, the research evaluates how AI models perform when challenged with questions from the Certified Ethical Hacker (CEH) exam.
This cybersecurity exam, administered by the EC-Council, tests professionals on their ability to identify and address vulnerabilities in security systems.
ChatGPT and Google Gemini Pass the Certified Ethical Hacker (CEH) Exam
Ethical hacking, akin to its malicious counterpart, aims to preemptively identify weaknesses in digital defenses. The study utilized questions from the CEH exam to gauge how effectively ChatGPT and Google Gemini could explain and recommend protections against common cyber threats. For instance, both models successfully elucidated concepts like the man-in-the-middle attack, where a third party intercepts communication between two systems, and proposed preventive measures.
Key findings from the research indicated that both models achieved high accuracy rates (80.8% for ChatGPT and 82.6% for Google Gemini), with Gemini edging out ChatGPT in overall accuracy. However, ChatGPT exhibited strengths in the comprehensiveness, clarity, and conciseness of its responses, highlighting its utility in providing detailed explanations that are easy to understand.
The study also introduced confirmation queries to enhance accuracy further. When prompted with “Are you sure?” after initial responses, both AI systems often corrected themselves, highlighting the potential for iterative query processing to refine AI effectiveness in cybersecurity applications.
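The confirmation-query flow can be sketched in code. The snippet below is a hypothetical illustration only: `ask_model` is a stub standing in for a real chat-completion API call (the study's actual prompting setup is not reproduced here), and the stub is hard-coded to "revise" its answer when challenged, mirroring the behavior the researchers observed.

```python
# Hypothetical sketch of the "Are you sure?" confirmation-query protocol.
# `ask_model` is a stand-in for a real LLM API call; here it is stubbed so
# the example is self-contained and deterministic.

def ask_model(history):
    """Stand-in for an LLM call: returns an answer given the chat history."""
    # A real implementation would call a chat-completion API here. This
    # stub answers "B" at first, then "revises" to "C" when challenged.
    if any("Are you sure?" in turn for turn in history):
        return "C"
    return "B"

def answer_with_confirmation(question):
    """Ask a question, then challenge the model and keep its final answer."""
    history = [question]
    first = ask_model(history)
    history.append(first)
    history.append("Are you sure?")   # the study's confirmation query
    final = ask_model(history)
    revised = final != first          # did the model change its answer?
    return first, final, revised

first, final, revised = answer_with_confirmation(
    "Which OSI layer does ARP spoofing target? A) L1 B) L4 C) L2 D) L7"
)
print(first, final, revised)  # B C True
```

In the study, this second pass frequently led both models to correct an initially wrong answer, which is why the researchers highlight iterative querying as a cheap accuracy boost.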
Calyam emphasized the role of AI tools as complementary rather than substitutive to human expertise in cybersecurity. “These AI tools can be a good starting point to investigate issues before consulting an expert,” he noted. “They can also serve as valuable training tools for IT professionals or individuals keen on understanding emerging threats.”
Despite their promising performance, Calyam cautioned against over-reliance on AI tools for comprehensive cybersecurity solutions. He highlighted the criticality of human judgment and problem-solving skills in devising robust defense strategies. “In cybersecurity, there’s no room for error,” he warned. Relying solely on potentially flawed AI advice could leave systems vulnerable to attacks, posing significant risks.
Establishing Ethical Guidelines for AI in Cybersecurity
The study’s implications extend beyond performance metrics. It highlighted the use and misuse of AI in the cybersecurity domain, advocating for further research to enhance the reliability and usability of AI-driven ethical hacking tools. The researchers identified areas such as improving AI models’ handling of complex queries, expanding multi-language support, and establishing ethical guidelines for their deployment.
Looking ahead, Calyam expressed optimism about the future capabilities of AI models in bolstering cybersecurity measures. “AI models have the potential to significantly contribute to ethical hacking,” he remarked. With continued advancements, they could play a pivotal role in fortifying digital infrastructure against evolving cyber threats.
The study, published in the journal Computers & Security, not only serves as a benchmark for evaluating AI performance in ethical hacking but also advocates for a balanced approach that leverages AI’s strengths while respecting its current limitations.
Artificial Intelligence (AI) has become a cornerstone in the evolution of cybersecurity practices worldwide. Its applications extend beyond traditional methods, offering novel approaches to identify, mitigate, and respond to cyber threats. Within this paradigm, large language models (LLMs) such as ChatGPT and Google Gemini have emerged as pivotal tools, leveraging their capacity to understand and generate human-like text to enhance ethical hacking strategies.
The Role of ChatGPT and Google Gemini in Ethical Hacking
In recent years, the deployment of AI in ethical hacking has garnered attention due to its potential to simulate cyber attacks and identify vulnerabilities within systems. ChatGPT and Google Gemini (the latter originally known as Bard) are prime examples of LLMs designed to process and respond to complex queries related to cybersecurity. The research conducted by the University of Missouri and Amrita University explored these models’ capabilities using the CEH exam—a standardized assessment that evaluates professionals’ proficiency in ethical hacking techniques.
The study revealed that both ChatGPT and Google Gemini exhibited commendable performance in understanding and explaining fundamental cybersecurity concepts. For instance, when tasked with describing a man-in-the-middle attack, a tactic where a third party intercepts communication between two parties, both AI models provided accurate explanations and recommended protective measures.
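One of the standard protective measures against man-in-the-middle attacks is strict TLS certificate verification: an interceptor cannot present a certificate the client trusts for the target hostname. The sketch below, assuming only Python’s standard `ssl` and `socket` modules, is an illustrative client (not code from the study) that refuses connections whose certificate chain or hostname fails validation.

```python
# Minimal sketch of MITM mitigation via enforced TLS verification,
# using only Python's standard library.
import socket
import ssl

def fetch_cert_subject(host, port=443):
    """Connect over TLS with full certificate and hostname verification,
    then return the server certificate's subject fields."""
    context = ssl.create_default_context()   # trusted CA bundle by default
    context.check_hostname = True            # reject mismatched hostnames
    context.verify_mode = ssl.CERT_REQUIRED  # reject unverifiable chains
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return dict(pair[0] for pair in tls.getpeercert()["subject"])

# Against a MITM proxy presenting a self-signed certificate, this call
# raises ssl.SSLCertVerificationError instead of silently proceeding.
```

The design point is that verification must be explicit and mandatory; disabling `check_hostname` or setting `verify_mode` to `CERT_NONE` is precisely what reopens the interception window.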
The research findings revealed that Google Gemini slightly outperformed ChatGPT in overall accuracy. However, ChatGPT exhibited notable strengths in the comprehensiveness, clarity, and conciseness of its responses, demonstrating its ability to provide thorough and articulate insights into cybersecurity issues. This nuanced proficiency underscores the potential of AI models not only to simulate cyber threats but also to offer valuable guidance to cybersecurity professionals and enthusiasts.
A notable aspect of the study was the introduction of confirmation queries (“Are you sure?”) to the AI models after their initial responses. This iterative approach aimed to refine the accuracy and reliability of AI-generated insights in cybersecurity. The results showed that both ChatGPT and Google Gemini frequently adjusted their responses upon receiving confirmation queries, often correcting inaccuracies and enhancing the overall reliability of their outputs.
This iterative query processing mechanism not only improves the AI models’ accuracy but also mirrors the problem-solving approach of human experts in cybersecurity. It highlights the potential synergy between AI-driven automation and human oversight, reinforcing the argument for a collaborative approach in cybersecurity operations.
Laying the Groundwork for Future Study
While AI-driven tools like ChatGPT and Google Gemini offer promising capabilities in ethical hacking, ethical considerations loom large in their deployment. Prasad Calyam highlighted the importance of maintaining ethical standards and guidelines in leveraging AI for cybersecurity purposes. “In cybersecurity, the stakes are high,” he emphasized. “AI tools can provide valuable insights, but they should supplement—not replace—the critical thinking and ethical judgment of human cybersecurity experts.”
Looking ahead, AI’s role in cybersecurity is set to evolve significantly, driven by ongoing advancements and innovations. The collaborative research conducted by the University of Missouri and Amrita University lays the groundwork for future studies aimed at enhancing AI models’ effectiveness in ethical hacking. Key areas of exploration include improving AI’s capability to handle complex, real-time cybersecurity queries that demand deep reasoning. Additionally, there is a push towards expanding AI models’ linguistic capabilities to support diverse global cybersecurity challenges effectively.
Moreover, establishing robust legal and ethical frameworks is crucial to ensure the responsible deployment of AI in ethical hacking practices. These frameworks will not only enhance technical proficiency but also address broader societal implications and ethical challenges associated with AI-driven cybersecurity solutions. Collaboration among academia, industry stakeholders, and policymakers will play a pivotal role in shaping the future of AI in cybersecurity. Together, they can foster innovation while safeguarding digital infrastructures against emerging threats, ensuring that AI technologies contribute positively to cybersecurity practices globally.