
Ahead of Google I/O 2024, there was little doubt that Google would be talking about AI. The event opened on a suitably raucous note: YouTube sensation Marc Rebillet emerged from a giant cup and kicked off the show wearing a bathrobe.
The social media star set the tone for the rest of the event by asking the audience for wild musical ideas, which he then brought to life with Google's AI DJ software. The hosts could not have gotten off to a better start. By CEO Sundar Pichai's own count, Google executives said the word "AI" 121 times.
By the end of the event, I was left with two questions that stuck with me. One: Is Google trying to solve problems that don't even exist in normal people's lives by force-feeding them Gemini gelato? Two: Is there a market for specialized AI hardware costing hundreds of dollars when the AI on our phones is acquiring amazing superpowers?
Current status of AI accessories
So far, we have cute orange AI gadgets like the Rabbit R1 and good-looking ones like the Humane AI Pin. One brand even makes AI pendants. Some of these devices just listen. Others talk to you, record videos, make phone calls, run chatty AI bots, and even try to make sense of the world around you.
I won't dwell on how poorly these devices have fared so far. Suffice it to say that Digital Trends' mobile editor Joe Maring calls the Rabbit R1 one of the worst gadgets he has ever used, and the Humane AI Pin's story isn't all that different. Granted, these are all first-generation devices of their kind, so let's cut them some slack.
But here's the reality: their future doesn't look bright, affordable, or convenient. Over the span of two days, two AI heavyweights, OpenAI and Google, made that point pretty definitively.
AI now knows the world
Let's start with vision, the ability that lets AI see the world through a camera lens and talk about what it sees. Google showcased something called Gemini Live at I/O 2024. The day before, OpenAI announced its GPT-4o, where the "o" stands for "omni." That's a fancy way of saying multimodal, meaning your AI companion can take text, audio, and visuals as input and produce them as output. Either way, the end goal for both products is the same.
Fire up the AI of your choice, point the camera at virtually anything, and it will answer questions about what it sees. You can switch to the front camera and have the AI keep track of a game of rock, paper, scissors you're playing with a friend. It can tell you if your pink shirt isn't the best outfit for a job interview.
It can look at objects and describe them in Portuguese, identify buildings like a trusted tour guide, and infer that something special is being celebrated when it spots confetti scattered on a table. Point it at a block of code and the AI will explain what the code does. And if the AI has seen your car keys lying around, it will tell you exactly where you left them.
Live demo of GPT-4o’s vision capabilities
Not all of the features mentioned above work the same way in ChatGPT (which draws heavily on GPT-4o) and Gemini Live (which has Google's Project Astra technology behind it). However, the fundamentals are shared. This is also the critical juncture where the gap widens between the AI experience on phones and the AI experience on dedicated hardware.
Hardware challenges
The Rabbit R1 and Humane AI Pin are equipped with 8-megapixel and 12MP cameras, respectively. Sure, they can see the world and make sense of it, but they can't match the visual chops of the optically stabilized, high-resolution cameras on even a half-decent current-generation smartphone.
In short, the average smartphone feeds richer visual data to a local or cloud-based AI engine, and that translates directly into better understanding. Think of it as pitting a budget vlogging camera against a flagship phone and asking each to describe everything it sees. Blurry or blown-out footage won't help much here.
Next comes computing power. The hottest AI gadgets of 2024 run on low- to mid-tier MediaTek and Qualcomm silicon. These devices aren't weighed down by a full-blown operating system, but from what we've seen so far, even middling smartphones can execute AI tasks dramatically faster than the R1 or Humane's Pin.
You don't want your AI gadget to take 15 seconds to process a request when good old Siri can do a better job. Siri is a low bar, but that's where the R1 stands. While we're on the subject of silicon, it's worth explaining why processing plays such a key role here. Generative AI tricks are pulled off in one of two ways. Most solutions send queries to cloud servers, which means you need an internet connection.
The second option is on-device (offline) processing, which is what Google's Gemini Nano model does on phones like the Pixel 8 series and select Samsung devices. The biggest advantage is that it doesn't require an internet connection. Currently, no AI thingamajig can work without one.
On-device AI is a real gem
On-device processing allows the Recorder app on your Pixel phone to transcribe and summarize audio recordings. Magic Compose takes your texting game to the next level without the need for Wi-Fi or cellular connectivity. The same goes for translation and transcription. In fact, Google used its neural machine translation technology back in 2018 to lay the foundation for reliable offline translation.
But that's just the tip of the iceberg. Later this year, Google plans to release Gemini Nano with multimodality. That means Gemini Live would not need an internet connection to see and understand what you see and hear through your phone's camera, screen, and microphone, and to provide contextual answers.
Google is also enhancing TalkBack's accessibility features with Gemini. This is a huge win for people with visual impairments who need a reliable, multimodal TalkBack companion but don't always have access to an internet connection.
Did we mention that on-device AI processing is also faster and dramatically more secure, since no data leaves your phone? More importantly, it ultimately reduces the cost of delivering generative AI capabilities.
The cost to consumers is currently one of the biggest uncertainties in the entire AI phone marketing gambit. On-device AI is a huge relief amid all this chaos: it at least tells me the bare minimum my phone will be able to do, without my having to worry too much about feature compatibility for years to come.
Gemini is doing it right
Finally, there is the all-important issue of integration. My life revolves around Gmail, Docs, Drive, Maps, Photos, Search, and more. Google has created Gems, custom Gemini-based assistants that handle specific tasks, and they work closely with other products in the ecosystem.
For example, when you ask Gemini to plan a trip, it looks through your Gmail inbox for your tickets and combines your voice or text prompt with relevant Google Search information to create a fully fleshed-out itinerary.
Those willing to pay for Gemini Advanced get even more productivity superpowers. It can handle up to 1,500 pages of PDFs, 30,000 lines of code, an hour of video, or a mix of different file formats.
Gemini processes all that input, condenses it, identifies the important points, and can even act as a tutor once it has ingested all the material. It can also turn a mundane spreadsheet into a detailed financial report with a clear breakdown of profits and related insights.
The AI also listens to calls and alerts you if the caller appears to be a scammer. Better yet, Gemini doesn't even pull you into another app. When you need it, the Gemini interface simply hovers over the app you're currently using, does its job, and disappears.
It’s hard to beat smartphones
My point is that AI should act as an assistant that strikes the right balance between breadth of features and practical convenience. It can only do that if it has access to the data that matters to me personally and professionally. And I want all these smart features delivered in the best possible way, without any additional financial overhead.
At the moment, devices like the Rabbit R1 and Humane AI Pin barely scratch the surface of these deep product integrations. On top of that, the hardware itself keeps the AI from reaching its full potential. I can't imagine Google licensing Gemini Nano for a product like the Rabbit R1, and even if it did, the experience would be hampered by the hardware.
So why pay extra and settle for a subpar experience when the phone in your pocket can do a great job for you? AI phones are here, and they're here to stay. Meanwhile, shiny orange AI trinkets are as good as dead.
