Since most of these models are based on Mistral-7B or Mixtral-8x7B, a GPU with at least 24 GB of VRAM is recommended (e.g., RTX 4090, A6000, A100 40GB/80GB, L40S). Quantized variants (GGUF, INT4/INT8) can be served with llama.cpp on a high-end CPU or on a GPU with 16 GB of VRAM.
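
As a rough illustration, a quantized GGUF model can be loaded through the llama-cpp-python bindings for llama.cpp. This is a minimal sketch, not a complete deployment: the model filename is a placeholder for whichever GGUF file you have downloaded, and the parameter values are example settings you would tune to your hardware.

```python
# Minimal sketch: running a quantized GGUF model via llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
    n_ctx=4096,       # context window size
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(output["choices"][0]["message"]["content"])
```

With `n_gpu_layers=-1` the whole model is offloaded to the GPU; on a machine with less VRAM you can offload only part of the model (e.g., `n_gpu_layers=20`) and let the remaining layers run on the CPU.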
