Alibaba says its new AI model rivals DeepSeek's R1, OpenAI's o1

Alibaba Cloud on Thursday launched QwQ-32B, a compact reasoning model built on its latest large language model (LLM), Qwen2.5-32B, one it says delivers performance comparable to other top models, including Chinese rival DeepSeek's R1 and OpenAI's o1, with only 32 billion parameters.

According to Alibaba's release, QwQ-32B's performance "highlights the power of reinforcement learning (RL), the core technique behind the model, when applied to a robust foundation model like Qwen2.5-32B, which is pre-trained on extensive world knowledge. By employing continuous RL scaling, QwQ-32B demonstrates significant improvements in mathematical reasoning and coding proficiency."

AWS defines RL as a "machine learning technique that trains software to make decisions to achieve the most optimal results, and mimics the trial-and-error learning process that humans use to achieve their goals. Software actions that work towards your goal are reinforced, while actions that detract from the goal are ignored."
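To make that trial-and-error loop concrete, here is a minimal tabular Q-learning sketch in Python (purely illustrative, not Alibaba's training setup): the agent wanders a five-state corridor, and actions that eventually lead to the rewarded goal state see their value estimates reinforced, while the rest stay near zero and get picked less often.

```python
import random

# Minimal tabular Q-learning on a 5-state corridor: start at state 0,
# earn a reward of 1.0 only by reaching the goal (state 4).
N_STATES, ACTIONS = 5, ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Reinforce: rewarded actions pull their value estimates upward;
        # actions that lead nowhere stay near zero and get chosen less often.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy is "right" in every non-terminal state.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```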

"Additionally," the release said, "the model was trained using rewards from a general reward model and rule-based verifiers, enhancing its general capabilities. These include better instruction-following, alignment with human preferences, and improved agent performance."
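Alibaba did not publish those verifiers, but a rule-based verifier typically reduces to deterministic checks that map a model's output to a scalar reward. The sketch below (hypothetical function names and checks, assumed purely for illustration) shows two common patterns: exact-match scoring of a math answer, and running generated code against a test to see whether it passes.

```python
import subprocess
import sys

def math_reward(model_answer: str, reference: str) -> float:
    """Rule-based verifier: reward 1.0 only if the final answer matches
    the reference exactly after whitespace normalization, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def code_reward(generated_code: str, test_snippet: str) -> float:
    """Rule-based verifier: execute the model's code followed by a test;
    reward 1.0 only if the process exits cleanly (all asserts pass)."""
    program = generated_code + "\n" + test_snippet
    try:
        result = subprocess.run([sys.executable, "-c", program],
                                capture_output=True, timeout=10)
    except subprocess.TimeoutExpired:
        return 0.0  # hung or too slow: no reward
    return 1.0 if result.returncode == 0 else 0.0

# Scoring two hypothetical model completions:
print(math_reward("42", " 42 "))                      # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))           # 1.0
```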

QwQ-32B is available as open weight on Hugging Face and ModelScope under the Apache 2.0 license, according to an accompanying Alibaba blog post, which noted that the 32-billion-parameter model achieves "performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated)."
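Because the weights are open, the model can be pulled like any other Hugging Face checkpoint. Below is a minimal sketch, assuming the hub ID Qwen/QwQ-32B and the standard transformers chat-template API; the full-precision weights run to tens of gigabytes, so this needs substantial GPU memory.

```python
# Minimal sketch: load the open weights from Hugging Face and run one prompt.
# Verify the model ID on the hub before running; requires transformers and
# accelerate installed, plus enough GPU memory for a 32B-parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto",
                                             device_map="auto")

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```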

Its authors wrote: "This marks Qwen's initial step in scaling RL to enhance reasoning capabilities. Through this journey, we have not only witnessed the immense potential of scaled RL but also recognized the untapped possibilities within pre-trained language models."

They went on to state: "As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI). Additionally, we are actively exploring the integration of agents with RL to enable long-horizon reasoning, unlocking greater intelligence with inference time scaling."

Asked for his reaction to the launch, Justin St-Maurice, technical counselor at Info-Tech Research Group, said: "Comparing these models is like comparing the performance of different NASCAR teams. Yes, they're fast, but a different one wins every lap ... so does it even matter?"

St-Maurice added, "That said, it's telling that OpenAI wants to charge $20,000/month for 'PhD-level intelligence' (whatever that means), because it's in a rat race. High-performing models out of China are challenging the assumption that LLMs have to be expensive. It's a race towards optimization, not brute-force algorithms and half-trillion-dollar data centers."

As for DeepSeek, he said, its message is that everyone else is overpriced and underpowered, and there is some truth to that when efficiency drives the advantage. But whether Chinese AI is "safe for the rest of the world" is another question.

According to St-Maurice, "All of these models push ethical boundaries in different ways. For example, framing a North American LLM such as Grok as inherently more ethical than a Chinese LLM like DeepSeek is sophistry, and a matter of opinion; it depends on who is setting the standard and through what lens you are viewing it."

The third major player in Chinese AI is Baidu, which launched its own model, Ernie, last year, although it has had little impact outside China, a situation St-Maurice said comes as no surprise.

"Baidu's site still delivers responses in Chinese, despite claiming to support English," he said. "It's safe to say that Alibaba and DeepSeek are more focused on global engagement, while Baidu appears anchored in the domestic market. Different priorities, different results."
