Don’t expect large-scale language models like the next GPT to be democratized.

This article is part of our coverage of the latest in AI research.

In early May, Meta released Open Pretrained Transformer (OPT-175B), a Large Language Model (LLM) that can perform various tasks. Large language models have become one of the most popular research areas in artificial intelligence over the past few years.

The OPT-175B is the latest entrant in the LLM arms race sparked by OpenAI's GPT-3, a deep neural network with 175 billion parameters. GPT-3 showed that LLMs can perform many tasks from just a few examples, without further training (zero- or few-shot learning). Microsoft later integrated GPT-3 into several of its products, demonstrating the commercial as well as the scientific potential of LLMs.


What makes the OPT-175B unique is Meta's commitment to "openness," as the model's name suggests. Meta has made the model available to the public (with a few caveats) and has revealed many details about its training and development process. In a post on the Meta AI blog, the company described the release of the OPT-175B as "democratizing access to large-scale language models."

Meta's move toward transparency is commendable. But the competition over large language models has reached a point where they can no longer be democratized.

Meta's release of the OPT-175B has several key features. It includes both the pre-trained models and the code needed to train and use the LLM. Pre-trained models are especially useful for organizations that don't have the computational resources to train a model themselves (training a neural network is far more resource-intensive than running it). Releasing them also helps reduce the massive carbon footprint caused by the computational resources required to train large neural networks.

Like GPT-3, OPT comes in different sizes, ranging from 125 million to 175 billion parameters (models with more parameters have more learning capacity). As of this writing, all models up to the OPT-30B are available for download. The full 175-billion-parameter model is available to select researchers and institutions that fill out a request form.
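To put those sizes in perspective, a decoder-only transformer's parameter count can be roughly estimated as 12 × layers × hidden² plus the embedding tables. This is a common rule of thumb, not Meta's code; the layer and hidden-size values below are the configurations reported in the OPT paper, and the vocabulary size is approximate:

```python
def approx_params(layers, d_model, vocab=50272, max_pos=2048):
    """Rough parameter count for a decoder-only transformer:
    each layer holds ~4*d^2 attention weights plus ~8*d^2 in the
    feed-forward block, i.e. ~12*d^2 per layer, plus the token and
    position embedding tables."""
    return 12 * layers * d_model ** 2 + (vocab + max_pos) * d_model

# Configurations reported in the OPT paper: (layers, hidden size)
print(f"OPT-125M ~ {approx_params(12, 768) / 1e6:.0f}M parameters")
print(f"OPT-175B ~ {approx_params(96, 12288) / 1e9:.0f}B parameters")
```

The estimate lands within a few percent of the advertised sizes, which is why the model names track the parameter counts so closely.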

According to the Meta AI blog, "To maintain integrity and prevent misuse, we are releasing our models under a noncommercial license to focus on research use cases. Access to the models will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world."

In addition to the models, Meta has released a full logbook that provides a detailed technical timeline of the large language model's development and training process. Published papers usually contain only information about the final model. According to Meta, the logbook provides valuable insight into "how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale."

In the blog post, Meta states that large language models are mostly accessible through "paid APIs," and that restricted access to LLMs has "limited researchers' ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity."

This is a jab at OpenAI (and, by extension, Microsoft), which released GPT-3 as a black-box API service instead of making the model's weights and source code available to the public. Among the reasons OpenAI gave for not releasing GPT-3 was controlling misuse and the development of harmful applications.

Meta believes that by making the models available to a wider audience, it will be in a better position to study and prevent the harms they can cause.

Here's how Meta describes these efforts: "We hope OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to large language model development in the field."

However, it is worth noting that “transparency and openness” is not the same as “democratizing large language models”. The cost of training, constructing, and running large-scale language models remains staggering and will continue to grow.

According to Meta's blog post, its researchers have managed to significantly cut the costs of training large language models. The company says the model's carbon footprint is one-seventh that of GPT-3. Experts I have spoken to previously estimated the cost of training GPT-3 at up to $27.6 million.

This means it will still cost several million dollars to train the OPT-175B. Fortunately, the pre-trained model spares organizations the cost of training it themselves, and Meta says it will provide the codebase needed to train and deploy the full model "using only 16 NVIDIA V100 GPUs." That's roughly the equivalent of an Nvidia DGX-2, which costs about $400,000, not a small sum for a cash-strapped research lab or an individual researcher. (According to the paper that details the OPT-175B, Meta trained its own model on 992 80GB A100 GPUs, which are significantly faster than the V100.)
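A back-of-the-envelope calculation (my own, not from Meta's post) shows why something on the order of 16 GPUs is a plausible floor just to hold the model in memory:

```python
# Serving OPT-175B: memory for the weights alone, before
# activations, batching, or any optimizer state.
params = 175e9           # parameter count
bytes_per_param = 2      # FP16 precision
weights_gb = params * bytes_per_param / 1e9   # ~350 GB of weights

v100_memory_gb = 32      # the 32 GB variant of the V100
num_gpus = 16
total_gpu_gb = v100_memory_gb * num_gpus      # 512 GB aggregate

print(f"weights: {weights_gb:.0f} GB, available: {total_gpu_gb} GB")
```

In FP16, the weights alone occupy about 350 GB, so a 16-GPU, DGX-2-class machine with 512 GB of aggregate GPU memory has room for the model plus activations but not much more. Training is far costlier still, since optimizer state and gradients multiply the memory requirement several times over.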

Meta AI's logbook further confirms that training large language models is a very complicated endeavor. The OPT-175B's timeline is full of server crashes, hardware failures, and other problems that require highly skilled technical staff. The researchers also had to restart the training process several times, tweak hyperparameters, and change loss functions. All of this incurs extra costs that small labs cannot afford.

Language models such as OPT and GPT are based on the transformer architecture. One of the transformer's main features is its ability to process large amounts of sequential data (e.g., text) in parallel and at scale.
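To illustrate that point, here is a minimal, dependency-free sketch of scaled dot-product attention, the transformer's core operation. Every output position is computed from all input positions at once, which is what makes the computation parallelizable; the function names and toy dimensions are illustrative, not from any production implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over a whole sequence at once.
    Q, K, V are lists of vectors, one per token. Each output row
    depends on every position, so all tokens can be processed in
    parallel rather than one step at a time, as in an RNN."""
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        # weighted average over all value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

In practice these loops collapse into a handful of batched matrix multiplications, exactly the workload that GPUs parallelize well, which is why scaling transformers has been primarily a hardware and budget problem.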

In recent years, researchers have shown that adding more layers and parameters to transformer models improves their performance on language tasks. Some researchers believe that reaching higher levels of intelligence is just a matter of scale. Accordingly, cash-rich labs such as Meta AI, DeepMind (owned by Alphabet), and OpenAI (backed by Microsoft) are building bigger and bigger neural networks.

Last year, Microsoft and Nvidia built Megatron-Turing (MT-NLG), a 530-billion-parameter language model. Last month, Google introduced the Pathways Language Model (PaLM), an LLM with 540 billion parameters. And there are rumors that OpenAI will release GPT-4 in the coming months.

However, bigger neural networks also require greater financial and technical resources. And while bigger language models will come with new bells and whistles (and new failure modes), they will inevitably centralize power in the hands of a handful of wealthy companies, making it even harder for small labs and independent researchers to work on large language models.

From a commercial standpoint, big tech companies will hold an even greater advantage. Running large language models is very expensive and challenging. Companies like Google and Microsoft have special servers and processors that allow them to run these models at scale and in a profitable way. For smaller companies, the overhead of running their own version of an LLM like GPT-3 is prohibitive. Just as most businesses use cloud hosting services instead of setting up their own servers and data centers, ready-to-use systems such as the GPT-3 API will gain more traction as large language models become more popular.

This, in turn, will put AI further into the hands of big tech companies. More AI research labs will have to enter partnerships with big tech to fund their research. And this will give big tech more power to shape the future direction of AI research (probably in line with their financial interests). This can come at the cost of research areas that have no short-term return on investment.

Bottom line: while we should celebrate Meta's move to bring transparency to LLMs, let's not forget that the very nature of large language models is undemocratic and favors the same companies that are promoting them.

This article was originally written by Ben Dickson and published on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the downsides of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.
