Google Gemini: everything you need to know about the new generative AI platform


Google is trying to make waves with Gemini, its flagship suite of generative AI models, applications and services. But while Gemini seems promising in some respects, it falls short in others – as our informal review revealed.

So, what is Gemini? How can you use it? And how does it compare to the competition?

To make it easier for you to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models and features are released.

What is Gemini?

Gemini is Google’s long-promised family of next-generation GenAI models, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the flagship Gemini model.
  • Gemini Pro, a “light” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal,” that is, able to work with more than just text. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases, and text in different languages.

This sets Gemini apart from models such as Google’s LaMDA, which was trained exclusively on text data. LaMDA cannot understand or generate anything other than text (e.g. essays, email drafts), but this is not the case with Gemini models.

What is the difference between Gemini apps and Gemini models?

Google’s Bard. Image credits: Google

Google, proving once again that it lacks branding talent, failed to make it clear from the start that Gemini was separate and distinct from the Gemini web and mobile apps (formerly Bard). Gemini apps are simply an interface through which certain Gemini models are accessed – think of them as a client for Google’s GenAI.

Furthermore, Gemini applications and models are also completely independent of Imagen 2, Google’s text-to-image model available in some of the company’s development tools and environments. Don’t worry, you’re not the only one confused by this.

What can Gemini do?

Because Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these features have made it to the product stage yet (more on that later), but Google is promising them all – and more – in the not-too-distant future.

Of course, it’s a little hard to take the company’s word for it.

Google seriously underdelivered with the original Bard launch. And more recently, it ruffled feathers with a video that purported to show Gemini’s capabilities but turned out to have been heavily doctored and was more or less aspirational.

However, assuming Google is more or less truthful in its claims, here is what the different Gemini levels will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra — thanks to its multimodality — can be used to help with tasks such as physics homework, solving problems step by step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers and “updating” a chart from one of them by generating the formulas needed to recreate it with more recent data.

Gemini Ultra technically supports image generation, as mentioned earlier. But this capability hasn’t yet made it into the productized version of the model – perhaps because the mechanism is more complex than the way applications such as ChatGPT generate images. Rather than passing prompts to an image generator (like DALL-E 3, in the case of ChatGPT), Gemini generates images “natively,” without an intermediate step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI development platform, and AI Studio, Google’s web tool for app and platform developers. It also powers Gemini apps, but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires a subscription to the Google One AI Premium plan, priced at $20 per month.
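For a concrete sense of what that API access looks like, here is a minimal sketch using Google’s google-generativeai Python SDK with a key from AI Studio. The model name “gemini-pro” is the generally available identifier; Ultra-class model names and access are assumptions that may require allowlisting, so treat this as the general call shape rather than an Ultra-specific recipe.

```python
# Minimal sketch: calling a Gemini model through the
# google-generativeai Python SDK (the library behind AI Studio keys).
# "gemini-pro" is the generally available tier; Ultra-class access
# may use a different identifier (assumption).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained from AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain step by step how a lever works.")
print(response.text)
```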

The AI Premium plan also connects Gemini to your broader Google Workspace account: think emails in Gmail, documents in Docs, presentations in Slides, and Google Meet recordings. This is useful, for example, for summarizing emails or having Gemini take notes during a video call.

Gemini Pro

Google says Gemini Pro is an improvement over LaMDA in its reasoning, planning, and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex chains of reasoning. But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and users have found plenty of examples of bad reasoning and errors.

However, Google has promised improvements – and the first of them arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro (in preview now) is improved in a number of areas over its predecessor, perhaps most significantly in the amount of data it can process. Gemini 1.5 Pro can (in limited private preview) take in approximately 700,000 words or approximately 30,000 lines of code – 35 times the amount Gemini 1.0 Pro can handle. And – the model being multimodal – it isn’t limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of languages, albeit slowly (for example, searching for a scene in an hour-long video takes 30 seconds to a minute of processing).
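As an illustration of working with that larger context window, here is a hedged sketch using the same Python SDK. The model identifier “gemini-1.5-pro-latest” and access to it are assumptions tied to the limited preview; count_tokens is a handy way to check how much of the window a document consumes before sending it.

```python
# Hedged sketch: checking how much of Gemini 1.5 Pro's enlarged
# context window a document consumes. Assumes preview access; the
# identifier "gemini-1.5-pro-latest" may differ for your account.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

with open("long_codebase_dump.txt") as f:  # placeholder file
    document = f.read()

# Verify the input fits in the context window before paying for a call
print(model.count_tokens(document))

response = model.generate_content(
    ["Summarize the main modules in this codebase:", document]
)
print(response.text)
```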

Gemini Pro is also available via API in Vertex AI, accepting text as input and generating text as output. An additional endpoint, Gemini Pro Vision, can process text and images – including photos and videos – and output text, along the lines of OpenAI’s GPT-4 with Vision model.
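Here is a minimal sketch of what calling those two endpoints looks like with the Vertex AI Python SDK. The project ID, region, bucket and image path are placeholders, and the preview namespace reflects the SDK at the time of writing.

```python
# Minimal sketch of the two Vertex AI endpoints described above:
# gemini-pro for text in/text out, and gemini-pro-vision for
# mixed image-and-text input. Project, region and URI are placeholders.
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="your-gcp-project", location="us-central1")

# Text-only endpoint
text_model = GenerativeModel("gemini-pro")
print(text_model.generate_content("Draft a short release note.").text)

# Multimodal endpoint: pass an image alongside a text instruction
vision_model = GenerativeModel("gemini-pro-vision")
image = Part.from_uri("gs://your-bucket/photo.jpg", mime_type="image/jpeg")
print(vision_model.generate_content([image, "Describe this photo."]).text)
```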

Using Gemini Pro in Vertex AI. Image credits: Google

In Vertex AI, developers can customize Gemini Pro for specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external third-party APIs to perform particular actions.

In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model’s temperature to control the creative range of the output, provide examples to give direction on tone and style, and tune the safety settings.
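As a sketch of those knobs, the Python SDK exposes both a generation config (including temperature) and safety settings when constructing a model. The specific category and threshold shown are illustrative, not an exhaustive list.

```python
# Sketch of the tuning knobs mentioned above: temperature to control
# the creative range of the output, plus safety settings. The category
# and threshold values shown are illustrative examples.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-pro",
    generation_config={
        "temperature": 0.2,        # low temperature: more deterministic output
        "max_output_tokens": 512,
    },
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    ],
)
print(model.generate_content("Suggest three neutral headline styles.").text)
```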

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, it powers two features of the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users press a button to record and transcribe audio, includes a Gemini-powered summary of your conversations, interviews, presentations, and other recorded clips. Users get these summaries even without a signal or Wi-Fi connection – and, in the interest of privacy, no data leaves their phone in the process.

Gemini Nano is also present in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which helps suggest the next thing you want to say during a conversation in a messaging app. The feature initially only works with WhatsApp, but will be available in more apps in 2024, Google says.

Is Gemini better than OpenAI’s GPT-4?

Google has several times praised Gemini’s benchmark superiority, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of 32 academic benchmarks widely used in the research and development of large language models.” The company claims that Gemini Pro, meanwhile, is more capable of tasks like content summarization, brainstorming, and writing than GPT-3.5.

But leaving aside the question of whether benchmarks actually indicate a better model, Google’s reported scores appear only marginally better than those of OpenAI’s corresponding models. And – as previously mentioned – some early impressions haven’t been great, with users and academics pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations, and gives bad coding suggestions.

How much will Gemini cost?

Gemini Pro is free and can be used in Gemini apps and, currently, AI Studio and Vertex AI.

However, once Gemini Pro leaves preview in Vertex, the model will cost $0.0025 per input character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (around 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Suppose a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.10.
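To make the arithmetic explicit, here is a tiny Python illustration of the quoted per-character prices; the figures mirror the example above and may change once Gemini Pro leaves preview.

```python
# Worked example of the quoted Vertex prices: $0.0025 per input
# character and $0.00005 per output character (preview-era figures).
INPUT_PRICE_PER_CHAR = 0.0025
OUTPUT_PRICE_PER_CHAR = 0.00005

article_chars = 2000  # roughly a 500-word article

print(f"Summarizing it (input):  ${article_chars * INPUT_PRICE_PER_CHAR:.2f}")   # $5.00
print(f"Generating one (output): ${article_chars * OUTPUT_PRICE_PER_CHAR:.2f}")  # $0.10
```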

Ultra pricing has not yet been announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is the Gemini apps. Pro and Ultra respond to queries in multiple languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for now and supports certain regions, including Europe, as well as features such as chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Through the service, developers can iterate on Gemini-based prompts and chatbots, then obtain API keys for use in their applications – or export the code to a more fully featured IDE.

Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, now uses Gemini models. And Google has brought Gemini models to its development tools for Chrome and the Firebase mobile development platform.

Gemini Nano

Gemini Nano is on the Pixel 8 Pro – and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a preview.

