Chinese AI can benefit from bigger models, more data, says startup founder

The founder of a Shanghai-based artificial intelligence (AI) startup is putting his faith in the “scaling laws” of large language model (LLM) development, despite China’s disadvantages in investment and advanced chips.

Jiang Daxin, a Microsoft veteran who founded and now runs Stepfun, said at a side event for the World Artificial Intelligence Conference (WAIC) in Shanghai that LLMs will eventually reach hundreds of trillions of parameters.

Scaling laws describe the relationship between an AI model’s performance and factors such as its parameter count, training data and computing power. They generally show improved performance from larger models, more data and greater computational resources, albeit with diminishing returns. Tech giants have been splurging on the most advanced hardware – most notably Nvidia chips such as the H100 – to eke out any possible edge in performance.
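As a rough illustration of the shape such laws take, the sketch below evaluates a Chinchilla-style loss formula, where predicted loss falls as parameters and training tokens grow but with diminishing returns. The coefficient values and workloads here are illustrative assumptions, not fitted numbers from any particular model.

```python
# Illustrative sketch of a Chinchilla-style scaling law:
# predicted loss falls as parameters (N) and training tokens (D) grow,
# but with diminishing returns. All coefficients are assumed for illustration.

def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta, with illustrative coefficients."""
    return e + a / n_params**alpha + b / n_tokens**beta

if __name__ == "__main__":
    # Compare hypothetical 7B-, 70B- and 700B-parameter models, each trained
    # on 1 trillion tokens: each 10x jump in size buys a smaller improvement.
    for n in (7e9, 70e9, 700e9):
        print(f"{n:.0e} params -> predicted loss {predicted_loss(n, 1e12):.3f}")
```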

“The upgrades of OpenAI’s GPT series, which powers ChatGPT, along with the substantial investments in supercomputing centres by tech giants such as Amazon, Microsoft and Meta, all validate the effectiveness of scaling laws,” Jiang said in a talk on Saturday. “Going forward, issues like the availability of data, human resources, and concerns about return on investment may impact the pace of scaling laws.”

China’s Big Tech firms and startups alike have rushed to launch their own LLMs since OpenAI launched ChatGPT in late 2022. There are more than 200 AI models in the country, including Tongyi Qianwen by Alibaba Group Holding and Ernie by Internet search giant Baidu. Alibaba owns the South China Morning Post.

Yet few Chinese AI firms have so far been able to match US tech giants in LLM spending, and many have sought to develop client-facing applications that can generate revenue.

Since its founding in April 2023, Stepfun has focused on developing foundation models. At WAIC, it officially launched Step-2, a trillion-parameter LLM, along with the Step-1.5V multimodal model and the Step-1X image generation model.

“Apart from the scaling laws, multimodality is also crucial for constructing a world model,” Jiang said.

World models build internal representations of external environments by processing visual and other types of data. Stepfun aims to unify generative and comprehension capabilities within a single model, according to Jiang.

The company also operates consumer-facing products, including a ChatGPT-like personal assistant named Yuewen and Maopaoya, an AI companion that takes on the personalities of specific characters.

“Global AI investments reached US$22.4bil (RM105.16bil) last year, with 70 to 80% focused on companies developing large models,” Alex Zhou Zhifeng, managing partner at Qiming Venture Partners, said at another WAIC side event. Qiming was an early investor in Stepfun.

Zhou said more investment will flow into AI applications in the near future, as they benefit from decreasing token costs. In AI, a token is a fundamental unit of data processed by algorithms.
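For a sense of how falling token prices feed through to the cost of running an AI application, here is a minimal back-of-the-envelope sketch; the per-token prices and workload figures are hypothetical placeholders, not any vendor's actual rates.

```python
# Back-of-the-envelope estimate of how cheaper tokens lower the cost of
# serving an AI application. Prices per million tokens are hypothetical.

def monthly_cost(requests_per_month: int, tokens_per_request: int,
                 price_per_million_tokens: float) -> float:
    """Total spend = total tokens processed x unit price per token."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens

if __name__ == "__main__":
    # Same workload at two hypothetical price points: a 10x drop in the
    # per-token price cuts the serving bill by the same factor.
    for price in (10.0, 1.0):  # US$ per million tokens (assumed)
        cost = monthly_cost(requests_per_month=100_000,
                            tokens_per_request=1_500,
                            price_per_million_tokens=price)
        print(f"US${price:>5.2f}/M tokens -> ~US${cost:,.0f} per month")
```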

Peng Wensheng, an economist with China International Capital, said at the event that the size of China’s market for AI models is expected to reach about 5.2 trillion yuan (RM3.35 trillion or US$715.1bil) by 2030. The industrial AI market size will be about 9.4 trillion yuan, he added. – South China Morning Post