{"id":134029,"date":"2024-06-21T05:14:51","date_gmt":"2024-06-21T05:14:51","guid":{"rendered":"https:\/\/globeecho.com\/ar\/tech\/rewrite-this-title-in-arabic-hackers-jailbreak-powerful-ai-models-in-global-effort-to-highlight-flaws\/"},"modified":"2024-06-21T05:14:51","modified_gmt":"2024-06-21T05:14:51","slug":"rewrite-this-title-in-arabic-hackers-jailbreak-powerful-ai-models-in-global-effort-to-highlight-flaws","status":"publish","type":"post","link":"https:\/\/globetimeline.com\/ar\/tech\/rewrite-this-title-in-arabic-hackers-jailbreak-powerful-ai-models-in-global-effort-to-highlight-flaws\/","title":{"rendered":"rewrite this title in Arabic Hackers \u2018jailbreak\u2019 powerful AI models in global effort to highlight flaws"},"content":{"rendered":"<p>Summarize this content to 2000 words in 6 paragraphs in Arabic Pliny the Prompter says it typically takes him about 30 minutes to break the world\u2019s most powerful artificial intelligence models.The pseudonymous hacker has manipulated Meta\u2019s Llama 3 into sharing instructions for making napalm. He made Elon Musk\u2019s Grok gush about Adolf Hitler. His own hacked version of OpenAI\u2019s latest GPT-4o model, dubbed \u201cGodmode GPT\u201d, was banned by the start-up after it started advising on illegal activities.Pliny told the Financial Times that his \u201cjailbreaking\u201d was not nefarious but part of an international effort to highlight the shortcomings of large language models rushed out to the public by tech companies in the search for huge profits.\u201cI\u2019ve been on this warpath of bringing awareness to the true capabilities of these models,\u201d said Pliny, a crypto and stock trader who shares his jailbreaks on X. 
\u201cA lot of these are novel attacks that could be research papers in their own right .\u2009.\u2009.\u2009At the end of the day I\u2019m doing work for [the model owners] for free.\u201dPliny is just one of dozens of hackers, academic researchers and cyber security experts racing to find vulnerabilities in nascent LLMs, for example through tricking chatbots with prompts to get around \u201cguardrails\u201d that AI companies have instituted in an effort to ensure their products are safe.\u00a0These ethical \u201cwhite hat\u201d hackers have often found ways to get AI models to create dangerous content, spread disinformation, share private data or generate malicious code.Companies such as OpenAI, Meta and Google already use \u201cred teams\u201d of hackers to test their models before they are released widely. But the technology\u2019s vulnerabilities have created a burgeoning market of LLM security start-ups that build tools to protect companies planning to use AI models. Machine learning security start-ups raised $213mn across 23 deals in 2023, up from $70mn the previous year, according to data provider CB Insights.\u201cThe landscape of jailbreaking started around a year ago or so, and the attacks so far have evolved constantly,\u201d said Eran Shimony, principal vulnerability researcher at CyberArk, a cyber security group now offering LLM security. \u201cIt\u2019s a constant game of cat and mouse, of vendors improving the security of our LLMs, but then also attackers making their prompts more sophisticated.\u201dThese efforts come as global regulators seek to step in to curb potential dangers around AI models. 
The EU has passed the AI Act, which creates new responsibilities for LLM makers, while the UK and Singapore are among the countries considering new laws to regulate the sector.<\/p>\n<p>California\u2019s legislature will in August vote on a bill that would require the state\u2019s AI groups \u2014 which include Meta, Google and OpenAI \u2014 to ensure they do not develop models with \u201ca hazardous capability\u201d.<\/p>\n<p>\u201cAll [AI models] would fit that criteria,\u201d Pliny said.<\/p>\n<p>Meanwhile, manipulated LLMs with names such as WormGPT and FraudGPT have been created by malicious hackers to be sold on the dark web for as little as $90 to assist with cyber attacks by writing malware or by helping scammers create automated but highly personalised phishing campaigns. Other variations have emerged, such as EscapeGPT, BadGPT, DarkGPT and Black Hat GPT, according to AI security group SlashNext.<\/p>\n<p>Some hackers use \u201cuncensored\u201d open-source models. For others, jailbreaking attacks \u2014 or getting around the safeguards built into existing LLMs \u2014 represent a new craft, with perpetrators often sharing tips in communities on social media platforms such as Reddit or Discord.<\/p>\n<p>Approaches range from individual hackers getting around filters by using synonyms for words that have been blocked by the model creators, to more sophisticated attacks that wield AI for automated hacking.<\/p>\n<p>Last year, researchers at Carnegie Mellon University and the US Center for AI Safety said they found a way to systematically jailbreak LLMs such as OpenAI\u2019s ChatGPT, Google\u2019s Gemini and an older version of Anthropic\u2019s Claude \u2014 \u201cclosed\u201d proprietary models that were supposedly less vulnerable to attacks. 
The researchers added it was \u201cunclear whether such behaviour can ever be fully patched by LLM providers\u201d.<\/p>\n<p>Anthropic published research in April on a technique called \u201cmany-shot jailbreaking\u201d, whereby hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a harmful question in the same style. The attack has been enabled by the fact that models such as those developed by Anthropic now have a bigger context window, or space for text to be added.<\/p>\n<p>\u201cAlthough current state-of-the-art LLMs are powerful, we do not think they yet pose truly catastrophic risks. Future models might,\u201d wrote Anthropic. \u201cThis means that now is the time to work to mitigate potential LLM jailbreaks before they can be used on models that could cause serious harm.\u201d<\/p>\n<p>Some AI developers said many attacks remained fairly benign for now. But others warned of certain types of attacks that could start leading to data leakage, whereby bad actors might find ways to extract sensitive information, such as data on which a model has been trained.<\/p>\n<p>DeepKeep, an Israeli LLM security group, found ways to compel Llama 2, an older Meta AI model that is open source, to leak the personally identifiable information of users. Rony Ohayon, chief executive of DeepKeep, said his company was developing specific LLM security tools, such as firewalls, to protect users.<\/p>\n<p>\u201cOpenly releasing models shares the benefits of AI widely and allows more researchers to identify and help fix vulnerabilities, so companies can make models more secure,\u201d Meta said in a statement. It added that it conducted security stress tests with internal and external experts on its latest Llama 3 model and its chatbot Meta AI.<\/p>\n<p>OpenAI and Google said they were continuously training models to better defend against exploits and adversarial behaviour. 
Anthropic, which experts say has made the most advanced efforts in AI security, called for more information-sharing and research into these types of attacks.<\/p>\n<p>Despite the reassurances, any risks will only become greater as models become more interconnected with existing technology and devices, experts said. This month, Apple announced it had partnered with OpenAI to integrate ChatGPT into its devices as part of a new \u201cApple Intelligence\u201d system.<\/p>\n<p>Ohayon said: \u201cIn general, companies are not prepared.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pliny the Prompter says it typically takes him about 30 minutes to break the world\u2019s most powerful artificial intelligence models. The pseudonymous hacker has manipulated Meta\u2019s Llama 3 into sharing instructions for making napalm. He made Elon Musk\u2019s Grok gush about Adolf Hitler. His own<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[63],"tags":[],"class_list":{"0":"post-134029","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-tech"},"_links":{"self":[{"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/posts\/134029","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/comments?post=134029"}],"version-history":[{"count":0,"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/posts\/134029\/revisions"}],"wp:attachment":[{"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/media?parent=134029"}],"wp:term":[{"tax
onomy":"category","embeddable":true,"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/categories?post=134029"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/globetimeline.com\/ar\/wp-json\/wp\/v2\/tags?post=134029"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}