Hugging Face and BigScience's AI Language Model Is Breaking Down Barriers to the Democratization of AI
A new breed of open access is reshaping AI through community.
Hugging Face has changed the Microsoft monopoly game. What is the power of volunteers, collaboration and open-access in the future of the democratization of A.I.? The BigScience project's Bloom model is really incredible.
BigTech controls so-called AI Labs that started off as non-profits, and it’s all a big sham. DeepMind on most days appears miles ahead of OpenAI, but the entire world is following close behind.
The irony is that Hugging Face is the probable corporate beneficiary of all of this.
BigScience and Open-Access in A.I. is a Movement
BigScience is showing that there is another way than BigTech’s walled garden approach to so-called “open-source”. BigScience is a movement. More than 1,000 volunteer researchers, supported by ethicists, philosophers, legal scholars and engineers from startups and large tech companies alike, spent months working toward Bloom, which rivals in scale the LLMs made by firms like OpenAI and Alphabet’s DeepMind.
While Microsoft thought it was special acquiring GitHub and bribing OpenAI for profit, what if the future of artificial intelligence is more open, democratic and really about global collaboration, and not simply more control, centralization and walled gardens? I have to admit as a Substack author, I’m getting a bit fond of the anti-BigTech sentiment couched in this.
Can Hugging Face be a catalyst for more inclusion and emphasis on A.I. ethics? Microsoft’s reputation for A.I. for Good is on shaky ground even as its Microsoft Research and Azure AI units get better at R&D with more talent coming into the ecosystem since 2020.
It’s not all about Bloom or BigScience, but about what Hugging Face can build in this new volunteer and open-access movement. What if the “open-source” gig-economy of volunteers was too influenced by BigTech, the likes of Google, Microsoft and Apple? Should programming languages and their popularity really be so influenced by huge BigTech firms? Open-access in A.I. implies anyone can do serious research as these language models scale and improve.
Hugging Face could one day become much more important for the future of artificial intelligence. But will Microsoft try to acquire them, as per its usual pattern of behavior in the ecosystem? I was pleased when Discord managed to evade such a Titan’s grasp. Microsoft is yet again becoming too powerful, and its acquisition of Activision really is a turn-off: the same old Microsoft appearing in the golden sheep’s clothing of “Cloud computing”. The same anti-competitive behavior.
Can Hugging Face show another path forward for machine learning collaboration and open-access? Will the movement stick? What will it become?
We desperately need more inclusion, A.I. ethics and agency among researchers independent of the control of BigTech to have a fairer world, a more democratic world of machine learning. In recent years it’s become clear that Google, Microsoft, Amazon and Meta are not the entities to deliver it. Their self-interest is too powerful, and they are too enmeshed in the business of commerce and greed. They are poorly regulated. Their actions and their words do not match up.
BigScience’s origins lie in discussions years ago between Hugging Face chief science officer Thomas Wolf, GENCI’s Stéphane Requena and IDRIS’ Pierre-François Lavallée. The founders envisioned creating software, datasets, LLMs and tools to explore the social impact of AI, which only in recent years has received increased attention from the research community.
In the history of A.I., 2022 could be a pivotal moment. The emergence of BigScience, Bloom and Hugging Face as a platform with potentially a new moral compass is fascinating.
Their philosophy, movement and even their Press seem sound. Can a rag-tag group of volunteers change the status quo and walled garden control of BigTech firms? Here is a bit of their mission statement (source):
The acceleration in Artificial Intelligence (AI) and Natural Language Processing (NLP) will have a fundamental impact on society, as these technologies are at the core of the tools we use on a daily basis. A considerable part of this effort currently stems in NLP from training increasingly larger language models on increasingly larger quantities of texts.
Unfortunately, the resources necessary to create the best-performing models are found mainly in the hands of big technology giants. The stranglehold on this transformative technology poses some problems, from a research advancement, environmental, ethical and societal perspective.
The BigScience effort also benefits from a wide array of contributors, including Nvidia’s Megatron and Microsoft’s DeepSpeed teams, and receives support from CNRS, the French National Centre for Scientific Research.
The amount of global collaboration in A.I. and research really does, for the most part, rise above politics and geopolitical rhetoric.
THE LAUNCH OF BLOOM
Bloom officially launched on July 12, 2022. The researchers hope developing an open-access LLM that performs as well as other leading models will lead to long-lasting changes in the culture of AI development and help democratize access to cutting-edge AI technology for researchers around the world.
From humble beginnings BigScience speaks to a better world for A.I. It speaks to a different model than the one BigTech has established. Soon, steering committees were formed to give members of BigScience — who hailed from more than 60 countries and 250 institutions — scientific and general advice, design collaborative tasks and organize workshops, hackathons and public events.
That enthusiasm is undeniable, whether you are a software programmer, an Executive or an A.I. enthusiast. Hugging Face has helped to facilitate something the world desperately needs: a more positive vision of the future of A.I., where open-access is real and not just the rhetoric of open-source communities that are basically puppets of BigTech.
At least that’s the way I see it, feel free to share your alternative opinion:
BLOOM has an architecture that is similar to OpenAI’s GPT-3 large language model, with the key difference being that BLOOM is multilingual. OpenAI’s GPT-4 should be coming out soon, and DeepMind and even Meta AI are throwing out a lot of PR and studies now.
I’m of course following this across three different Newsletters that more or less cover A.I. as it relates to breaking news, paper summaries and so forth. I sometimes write additional content here. On this Newsletter I hope to get Data Science specific columnists, as I have begun to do with Quantum Foundry.
Collaboration is the future. BigTech companies should never be the center of that. Only projects like Bloom and new regulatory oversight groups that have global authority or credibility in A.I. ethics can be trusted to move the industry forward. BigScience’s backers also hope that Bloom will spur new investigations into ways to combat the problems that plague all LLMs, including bias and toxicity.
Thanks for reading!
For the price of a cup of coffee a month, please support an indie voice like me. I may have to stop writing on Substack if I cannot make ends meet. Right now I’m barely getting by.