Small but Smart sLLM, Google Cloud Gemma

Since last year, large language models (LLMs) such as ChatGPT, Gemini, and HyperClovaX have been hot topics. Amid this interest, generative AI has turned into a race to develop LLMs, and at the same time, as market demand for lightweight models has grown, sLLMs are also attracting attention. This year, many industry insiders point to cost-effective sLLMs as a major trend. sLLM stands for "small large language model" — a language model that is relatively small in size. Because sLLMs are inexpensive to run and easier to keep secure, more and more companies prefer them.

Google launched the on-device sLLM 'Gemini Nano' late last year, and followed it this year with 'Gemma,' an open-source family of lightweight models built from the same research as its large language model Gemini. Google Cloud customers can now customize and build with Gemma models on Vertex AI and run them on Google Kubernetes Engine (GKE).

Gemma Open Models

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.” Accompanying our model weights, we’re also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models.

Here are the key details to know:

  • We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants.
  • A new Responsible Generative AI Toolkit provides guidance and essential tools for creating safer AI applications with Gemma.
  • We’re providing toolchains for inference and supervised fine-tuning (SFT) across all major frameworks: JAX, PyTorch, and TensorFlow through native Keras 3.0.
  • Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM, make it easy to get started with Gemma.
  • Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud with easy deployment on Vertex AI and Google Kubernetes Engine (GKE).
  • Optimization across multiple AI hardware platforms, including NVIDIA GPUs and Google Cloud TPUs, ensures industry-leading performance.
  • Terms of use permit responsible commercial usage and distribution for all organizations, regardless of size.
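Whichever framework you load the weights with, the instruction-tuned Gemma variants expect a turn-based prompt format using `<start_of_turn>` / `<end_of_turn>` control tokens, with the roles `user` and `model` (the tokenizer prepends `<bos>` itself). A minimal sketch of building such a prompt — the helper name is ours, not part of any Gemma SDK:

```python
def format_gemma_prompt(user_message: str) -> str:
    """Build a single-turn prompt for an instruction-tuned Gemma model.

    Gemma's chat format marks each turn with <start_of_turn>/<end_of_turn>
    and leaves the final model turn open so generation continues from there.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("Summarize the Gemma launch in one sentence.")
print(prompt)
```

The resulting string is what you pass to the model's `generate` call; higher-level tooling such as Hugging Face chat templates can produce the same format for you.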

Unlocking the power of Gemma in Vertex AI

Gemma joins over 130 models in Vertex AI Model Garden, including our recently announced expanded access to Gemini: Gemini 1.0 Pro, 1.0 Ultra, and 1.5 Pro models.

By using Gemma models on Vertex AI, developers can take advantage of an end-to-end ML platform that makes tuning, managing, and monitoring models simple and intuitive. With Vertex AI, builders can reduce operational overhead and focus on creating bespoke versions of Gemma that are optimized for their use case. For example, using Gemma models on Vertex AI, developers can:

  • Build generative AI apps for lightweight tasks such as text generation, summarization, and Q&A
  • Enable research and development using lightweight-but-customized models for exploration and experimentation
  • Support real-time generative AI use cases that require low latency, such as streaming text

Vertex AI makes it easy for developers to turn their own tuned models into scalable endpoints that can power AI applications of all sizes.

Scale from prototype to production with Gemma on GKE

GKE provides tools to build custom apps, from prototyping simple projects to rolling them out at enterprise scale. Today, developers can also deploy Gemma directly on GKE to create their own gen AI apps for building prototypes or testing model capabilities:

  • Deploy custom, fine-tuned models in portable containers alongside applications using familiar toolchains
  • Customize model serving and infrastructure configurations without the need to provision or maintain nodes
  • Integrate AI infrastructure quickly, with the ability to scale to meet the most demanding training and inference scenarios

GKE delivers efficient resource management, consistent ops environments, and autoscaling. In addition, it helps enhance these environments with easy orchestration of Google Cloud AI accelerators, including GPUs and TPUs, for faster training and inference when building generative AI models.
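As a rough sketch of what such a GKE deployment can look like, the manifest below serves a fine-tuned Gemma model from a container scheduled onto a GPU node pool. The image path, `MODEL_ID` environment variable, and port are illustrative placeholders, not a prescribed setup:

```yaml
# Illustrative only: image, model ID, and names are assumptions, not a
# reference configuration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
    spec:
      containers:
      - name: inference
        image: us-docker.pkg.dev/your-project/your-repo/gemma-serve:latest  # placeholder image
        env:
        - name: MODEL_ID            # hypothetical env var read by the serving image
          value: google/gemma-2b
        resources:
          limits:
            nvidia.com/gpu: "1"     # request a GPU from the node pool
---
apiVersion: v1
kind: Service
metadata:
  name: gemma-server
spec:
  selector:
    app: gemma-server
  ports:
  - port: 8080
    targetPort: 8080
```

Because the model ships as portable container workloads like this, the same manifest pattern scales from a single-replica prototype to an autoscaled production service.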

Google Cloud Premier Partner, Cloocus

Cloocus is a Google Cloud Premier Partner — the highest tier of Google Cloud's partner program — and provides comprehensive cloud services based on Google Cloud. In particular, Cloocus has been rapidly building up expertise in generative AI, which has recently been in the spotlight, through its team of skilled Data & AI specialists. If you need expert advice on deploying generative AI, please request a consultation through the button below!

If you need consulting on cloud-based data and AI services, please contact Cloocus!