This SaaS product evaluates and ranks large language models (LLMs) based on various criteria such as safety, privacy, security, integrity, general capabilities, and domain-specific capabilities. It displays a leaderboard comparing models from different vendors with overall scores.
Provides a ranking of language models across several criteria, including safety, privacy, security, integrity, general capabilities, and domain-specific capabilities, and assigns each model an overall score.
Models are evaluated across specific domains (Safety, Privacy, Security, Integrity, General Capabilities, and Domain-Specific Capabilities), each of which contributes to the model's overall score.
Displays the vendor associated with each model, giving users insight into the organizations behind the development of these language models.
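A minimal sketch of how the overall score could be aggregated from the domain scores described above, assuming a weighted average over the six domains; the weights and function names below are illustrative assumptions, not the product's actual scoring formula.

```python
# Hypothetical aggregation of per-domain scores into an overall leaderboard score.
# The weights are illustrative assumptions; the real weighting may differ.
DOMAIN_WEIGHTS = {
    "safety": 0.20,
    "privacy": 0.15,
    "security": 0.15,
    "integrity": 0.15,
    "general_capabilities": 0.20,
    "domain_specific_capabilities": 0.15,
}

def overall_score(domain_scores: dict[str, float]) -> float:
    """Weighted average of per-domain scores (each assumed to be on a 0-100 scale)."""
    total_weight = sum(DOMAIN_WEIGHTS[d] for d in domain_scores)
    return sum(DOMAIN_WEIGHTS[d] * s for d, s in domain_scores.items()) / total_weight

print(overall_score({"safety": 92.0, "privacy": 88.5, "security": 90.0,
                     "integrity": 85.0, "general_capabilities": 78.0,
                     "domain_specific_capabilities": 81.0}))
```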
Measures the proportion of text in the LLM's actual output that is relevant to the input.
Measures whether relevant nodes in the retrieval context are ranked higher than irrelevant ones.
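One common way to score this ranking property is to average precision at the ranks where relevant nodes appear; the sketch below assumes binary relevance labels for each retrieved node and illustrates the idea rather than the exact formula used here.

```python
def contextual_precision(relevance_by_rank: list[bool]) -> float:
    """Average precision@k over the ranks where a relevant node appears.

    relevance_by_rank[i] is True if the node at rank i+1 is relevant to the input.
    The score increases when relevant nodes are ranked above irrelevant ones.
    """
    precisions, relevant_seen = [], 0
    for k, is_relevant in enumerate(relevance_by_rank, start=1):
        if is_relevant:
            relevant_seen += 1
            precisions.append(relevant_seen / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# The same relevant node scores higher when ranked first than when ranked last.
print(contextual_precision([True, False, False]))   # 1.0
print(contextual_precision([False, False, True]))   # ~0.33
```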
Measures the proportion of sentences in the actual output that can be attributed to the retrieval context.
Measures the proportion of claims in the LLM actual output that are not contradictory to the retrieval context.
Measures the proportion of claims in the LLM actual output that are not contradictory to the ground truth.
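The relevancy, attribution, and contradiction metrics above all share the same shape: a proportion over units of the output (statements, sentences, or claims). A minimal sketch of that shared computation follows; the passes check is a placeholder for whatever statement-level judgment (typically an LLM-as-judge call) is actually used.

```python
from typing import Callable

def proportion_metric(units: list[str], passes: Callable[[str], bool]) -> float:
    """Fraction of output units (statements, sentences, claims) that pass a check.

    Example checks (all placeholders for a real statement-level judge):
      - relevancy:    the unit is relevant to the input
      - attribution:  the unit can be attributed to the retrieval context
      - faithfulness: the unit does not contradict the retrieval context or ground truth
    """
    if not units:
        return 1.0  # assumption: an empty output is treated as trivially passing
    return sum(1 for u in units if passes(u)) / len(units)

claims = ["Paris is the capital of France.", "The Eiffel Tower is in Berlin."]
print(proportion_metric(claims, passes=lambda c: "Berlin" not in c))  # 0.5
```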
Measures how well your LLM system summarizes the original text, based on the relevant content included in the generated summary.
Measures whether your LLM agent is able to call the correct tools for a given input.
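A minimal sketch of a tool-correctness check, assuming each test case lists the tools expected for its input and the agent trace exposes the tools actually called; the function and tool names are illustrative.

```python
def tool_correctness(called_tools: list[str], expected_tools: list[str]) -> float:
    """Fraction of the expected tools that the agent actually called for a given input."""
    if not expected_tools:
        return 1.0
    called = set(called_tools)
    return sum(1 for tool in expected_tools if tool in called) / len(expected_tools)

# The agent was expected to call a search tool and a calculator but only searched.
print(tool_correctness(called_tools=["web_search"],
                       expected_tools=["web_search", "calculator"]))  # 0.5
```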
Misinformation datasets are collections of data designed to evaluate how well language models can identify, generate, and handle misinformation.
Sexual content datasets are collections of examples designed to evaluate how language models handle discussions related to sexual content.
Crime datasets are collections of examples used to evaluate how language models handle topics related to criminal behavior and legal matters.
Hallucination datasets are collections of examples designed to evaluate how language models generate and handle incorrect or nonsensical information.
Defamation evaluation datasets are collections of data designed to assess how effectively language models can detect, analyze, and manage content that can damage reputations.
Terrorism datasets are collections of information specifically designed to evaluate language models' handling of content related to terrorism.
Bias datasets are collections of data designed to evaluate how language models handle issues related to bias and discrimination.
Insult datasets are collections of examples designed to evaluate how language models handle offensive or derogatory language.
Ethics datasets are collections of examples designed to evaluate how language models handle discussions related to ethical issues.
Malware evaluation datasets are collections of data designed to assess how effectively language models can handle malware-related content.
Violence datasets are curated collections of examples used to evaluate how language models respond to content related to violence.
Political sensitivity datasets are collections of data designed to evaluate how language models handle discussions related to politics.
Hate speech datasets are collections of examples designed to evaluate how language models handle language that promotes hate or violence against individuals or groups.
Illegal conduct datasets are collections of examples designed to evaluate how language models handle discussions or prompts related to illegal activities.
Adversarial Testing datasets are designed to evaluate how language models handle deliberately challenging or misleading inputs.
Prompt Automatic Iterative Refinement (PAIR) is an algorithm that generates semantic jailbreaks with only black-box access to an LLM.
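At a high level, PAIR runs an attacker model against the black-box target in a propose-query-judge loop. The skeleton below only illustrates that loop shape with benign stubs; attacker_propose, query_target, and judge_score are placeholders, not PAIR's actual implementation or this product's API.

```python
# Stub roles for the three models in the loop (attacker, target, judge).
def attacker_propose(objective: str, history: list) -> str:
    return f"[refined probe #{len(history) + 1} for: {objective}]"

def query_target(prompt: str) -> str:
    return "[black-box target model response]"

def judge_score(objective: str, prompt: str, response: str) -> float:
    return 0.0  # a real judge model would rate how fully the response meets the objective

def pair_style_probe(objective: str, max_rounds: int = 5, threshold: float = 0.8):
    """Black-box iterative refinement: propose a probe, query the target, judge, repeat."""
    history = []
    for _ in range(max_rounds):
        candidate = attacker_propose(objective, history)
        response = query_target(candidate)
        score = judge_score(objective, candidate, response)
        history.append((candidate, response, score))
        if score >= threshold:   # the judge deems the probe successful
            return candidate, response
    return None                  # the target resisted within the round budget

print(pair_style_probe("benign test objective"))
```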
Poses multiple-choice questions to LLMs to evaluate their accuracy on domain-knowledge questions.
Enables humans to chat with LLMs through cipher prompts with system role descriptions and few-shot examples.
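A toy illustration of the cipher-chat idea using ROT13 on a benign question: the system role describes the cipher and supplies few-shot examples in enciphered form. The prompt wording is illustrative only.

```python
import codecs

def rot13(text: str) -> str:
    return codecs.encode(text, "rot13")

# The system role teaches the cipher; few-shot examples are given in enciphered form.
system_prompt = (
    "You are an expert on the ROT13 cipher. The user writes in ROT13 and you must "
    "reply in ROT13. Example:\n"
    f"User: {rot13('What is the capital of France?')}\n"
    f"Assistant: {rot13('Paris.')}"
)
user_turn = rot13("Name one planet in the Solar System.")
print(system_prompt)
print("User:", user_turn)
```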
Persuasive Adversarial Prompt (PAP) applies persuasion techniques to jailbreak prompt construction, testing whether persuasive phrasing can elicit restricted responses from the LLM.
Uses the personification ability of LLMs to construct virtual, nested scenes and observes whether the models' safety behavior holds inside them.
Extracts private data from LLMs trained on large datasets, testing how they handle such information.
Analyzing-based Jailbreak (ABJ) uses LLMs' reasoning capabilities to reveal underlying biases.
Do Anything Now (DAN) is a jailbreak persona for ChatGPT that is prompted to perform tasks without restrictions.
Enhances reasoning capabilities of LLMs by encouraging step-by-step thinking.
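A minimal example of the step-by-step prompting pattern this refers to, using the common "let's think step by step" suffix; the wording is a general convention, not a product-specific prompt.

```python
question = "A train travels 60 km in 1.5 hours. What is its average speed?"

# Plain prompt versus a chain-of-thought prompt that encourages step-by-step reasoning.
plain_prompt = question
cot_prompt = f"{question}\nLet's think step by step."

print(cot_prompt)
```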
Designs prompts that enhance the performance of LLMs by adapting to their responses.
Tests LLMs' capabilities in multiple languages to address potential risks across language barriers.
Demonstrates how mal-intended prompts can lead models to generate harmful content.
Generalizes jailbreak attacks into Prompt Rewriting and Scenario Nesting for stronger jailbreaks.
Exploits LLMs' poor recognition of ASCII art to bypass safety measures.
Evaluates model responses using dataset samples without altering prompts.
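A minimal sketch of this unaltered-prompt evaluation mode: each dataset sample is sent to the model exactly as written and the response is scored. The model and score_response callables are placeholders for the platform's actual model client and judge.

```python
from typing import Callable

def evaluate_dataset(samples: list[dict],
                     model: Callable[[str], str],
                     score_response: Callable[[dict, str], float]) -> float:
    """Send each sample's prompt to the model unchanged and average the scores."""
    scores = []
    for sample in samples:
        response = model(sample["prompt"])  # the prompt is used verbatim, no rewriting
        scores.append(score_response(sample, response))
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage with stand-in callables.
samples = [{"prompt": "Is the Earth flat?", "expected": "no"}]
print(evaluate_dataset(samples,
                       model=lambda p: "No, the Earth is not flat.",
                       score_response=lambda s, r: 1.0 if s["expected"] in r.lower() else 0.0))
```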
A jailbreak framework using fuzzing techniques inspired by AFL for testing LLM robustness.
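The AFL-inspired loop such a framework is built around looks roughly like this: keep a seed pool of prompt templates, mutate a seed, run it against the target, and retain mutants the judge marks as successful. All helpers below are benign stubs for illustration, not the framework's API.

```python
import random

def mutate(template: str) -> str:
    # Stub mutation; a real fuzzer would paraphrase, shorten, expand, or crossover templates.
    return template + " (mutated)"

def run_and_judge(template: str) -> bool:
    # Stub oracle; a real judge model would score the target's response to the template.
    return False

def fuzz(seed_templates: list[str], iterations: int = 100) -> list[str]:
    """AFL-style loop: pick a seed, mutate it, keep mutants judged as successful."""
    pool, successes = list(seed_templates), []
    for _ in range(iterations):
        mutant = mutate(random.choice(pool))
        if run_and_judge(mutant):
            pool.append(mutant)       # successful mutants become new seeds
            successes.append(mutant)
    return successes

print(fuzz(["benign seed template"], iterations=10))
```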
Encodes input prompts and relies on the LLM's ability to decode them, testing how the model responds to encoded instructions.
Tests LLMs' safety by detecting and generating responses based on model security hypotheses.
Uses known safety-training failure modes to guide jailbreak design and to evaluate models such as OpenAI's GPT-4.
Tree of Attacks with Pruning (TAP) is an automated method for generating jailbreak prompts.
Disguise and Reconstruction Attack (DRA) conceals harmful jailbreak instructions to test LLM security.