Resources

Images

  • Stable Diffusion Resources – This requires its own page, as there is so much to cover.
  • DALLE-2 – The original text-to-image AI that took the world by storm. Probably still the best publicly available model at following complex prompts, but Stable Diffusion and Midjourney have surpassed it in image quality.
  • Midjourney – Paid service, now using Stable Diffusion as a base. Uses a secret combination of fine-tuning and prompt editing to ensure high-quality results for most prompts, at the expense of some flexibility. A good starting point if you are new to text-to-image generators and don’t mind paying for the service.

Text

  • OpenAI GPT-3 – API access to the OpenAI GPT-3 models. Currently still the best LLM with public access.
  • OpenAI ChatGPT – Chatbot built by OpenAI on top of the GPT-3.5 model, with fine-tuning for chat and their own proprietary pre-prompt (or possibly embedding) to make the model work well in conversational mode.
  • GPT-Neo-X – EleutherAI’s open source LLM with 20B parameters. Very capable, especially for code generation tasks. Not easy to run locally due to its size: it requires at least 40GB VRAM to run in 16-bit inference mode. In theory it should be possible to do inference on a 3090/4090 in 8-bit mode, but I haven’t tested this myself.
  • NLP Cloud – Cheap way to run open source LLMs (like GPT-Neo-X) in the cloud. Also has some great resources on how to prompt these models (much of which transfers to prompting other LLMs too).
  • Flan-T5 – T5 language model from Google, fine-tuned for instruction following (similar to the newer GPT-3 models, aka Instruct-GPT). Several model sizes are available, all small enough to fit in consumer card VRAM (though the XXL model at 11B parameters requires a 3090/4090 in 16-bit mode, or 8-bit inference mode on cards with 12GB+ VRAM). Remarkable performance for a model this small. While nowhere close to GPT-3 DaVinci in performance, this is possibly the most powerful model that can be run on consumer hardware today.
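
The VRAM figures quoted above follow from simple arithmetic: parameter count times bytes per parameter. Here is a rough sketch of that estimate in Python. The function name is my own, and it only counts the model weights, so treat the results as a lower bound; activations and framework overhead need extra headroom on top.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; ignores activations and overhead.
    """
    bytes_per_param = bits_per_param / 8
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


# Parameter counts as quoted in the list above.
models = {"GPT-Neo-X (20B)": 20.0, "Flan-T5 XXL (11B)": 11.0}

for name, size in models.items():
    for bits in (16, 8):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(size, bits):.0f} GB for weights")
```

By this estimate the 20B model needs about 40GB at 16-bit and about 20GB at 8-bit, which is why a 24GB 3090/4090 looks plausible only in 8-bit mode. The 11B model comes out at about 22GB and 11GB respectively.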

Audio

  • Tortoise TTS – Free and open source voice synthesis diffusion model. Can clone a voice from as little as 30 seconds of sample data and produce very convincing voice audio clips (with occasional glitches).