A new open-source artificial intelligence model named Obsidian, announced in an Oct. 30 Reddit post, represents a breakthrough in multimodal AI accessibility. Obsidian is billed as the first 3-billion-parameter multimodal AI model, making it compact enough to run efficiently on a regular laptop.

Multimodal AI refers to AI systems that can process and connect data from different modes, such as text, images, audio, and video. In this case, the model accepts text and pictures as input, much like OpenAI’s GPT-4V. While multimodal AI models like DALL-E 3 and GPT-4 have shown impressive capabilities, their enormous size makes them resource-intensive to run, requiring expensive high-end hardware. Their weights are also closely guarded, so they cannot be run locally even on capable hardware.

The AI model Obsidian packs multimodal intelligence into a standard laptop’s memory

Obsidian changes this by packing multimodal intelligence into a model small enough to fit into a standard laptop’s memory and run at practical speeds. At 3 billion parameters, Obsidian builds upon the Capybara-3B model architecture, which achieves state-of-the-art performance compared to similarly sized models. The developer also announced on Reddit that a multimodal model based on the highly praised open-source Mistral 7B model will soon follow.
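The claim that a 3-billion-parameter model can fit in a laptop’s memory checks out with simple arithmetic. The figures below are a rough sketch, not taken from the announcement, and assume only the raw weight storage (actual footprint varies with the runtime, activations, and context length):

```python
def model_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights, in gigabytes."""
    return params * bytes_per_param / 1e9

params = 3e9  # 3 billion parameters

# 16-bit floats use 2 bytes per parameter; 4-bit quantization uses 0.5.
print(f"fp16: {model_memory_gb(params, 2):.1f} GB")   # fp16: 6.0 GB
print(f"int4: {model_memory_gb(params, 0.5):.1f} GB") # int4: 1.5 GB
```

At 16-bit precision the weights need roughly 6 GB, and common 4-bit quantization brings that down to about 1.5 GB, which comfortably fits in the 8–16 GB of RAM found in a typical laptop.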

Obsidian’s compact size is thanks to techniques adapted from the LLaMA model architecture. According to the Reddit post announcing Obsidian, it was pre-trained on a diverse synthesized multimodal dataset, including text paired with corresponding images. This training methodology allowed it to develop strong language and vision capabilities despite its reduced parameter count.

The result is an AI assistant with conversational skills and visual understanding that can fit in your backpack. Obsidian breaks down barriers to accessing AI, opening up new possibilities for on-device intelligence.

While still an early version, Obsidian’s efficient form factor sets an exciting precedent. It demonstrates that multimodal AI does not have to be locked up in giant data centers but can be made compact enough to be distributed widely.

Featured Image Credit: From Image Creation at Aimesoft; Thank you!

Radek Zielinski

Radek Zielinski is an experienced technology and financial journalist with a passion for cybersecurity and futurology.



