Which Large Language Models Are We Focusing On?

Although generative AI in general can be used to build AI systems, this paper focuses on LLMs, which have attracted significant attention in recent years for their potential to augment human performance on tasks across a wide variety of domains. We therefore consider the following three types of models:

1. Large Language Models (LLMs)

These are generative AI models that support text-to-text generation, such as GPT-3.5, Claude, Gemini, Llama 2, and OpenAI o1.

2. Multi-modal Large Language Models (MLLMs)

These models handle interactions that combine text with other modalities, such as images, video, and audio. Examples include GPT-4 and GPT-4o.

3. Models with Similar Input-Output Patterns

This category includes models whose input-output patterns resemble those of LLMs, such as text-to-text or text-to-image transformations (e.g., Stable Diffusion models).
