06.02.2025

What is the DeepSeek Neural Network: A Chatbot with Internet Search


What is DeepSeek V3

DeepSeek V3 is a large language model with publicly released weights. It has 671 billion parameters and was trained on 14.8 trillion tokens. It can analyze texts, translate, write essays and generate code.

What sets the model apart is its architecture and training methods. It uses:

  • Multi-token prediction (MTP). During training, the model learns to predict not only the next token but also tokens further ahead. This gives it a denser training signal, which the developers say improves performance, and the extra predictions can also be used to speed up text generation;
  • Mixture of Experts (MoE). Instead of one large feed-forward block, each MoE layer contains many smaller specialized sub-networks, or "experts", which are trained together with the rest of the model, and a router sends each token only to the experts best suited to it. DeepSeek V3 has 256 such experts per layer and activates eight of them for each token, so only about 37 billion of its 671 billion parameters do work on any given token. This cuts the computation needed for both training and generation;
  • Multi-head Latent Attention (MLA), DeepSeek's variant of the attention mechanism that large language models use to identify the most important parts of a text. Like standard multi-head attention, it examines the text through several attention "heads" at once rather than just once, so the model is less likely to miss important information. MLA additionally compresses the stored keys and values into a compact latent vector, which reduces the memory needed during generation.
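To make the Mixture-of-Experts idea concrete, here is a toy Python sketch of top-k routing using the article's numbers: 256 experts, 8 active per token. It is an illustration, not DeepSeek's code. The "experts" are hypothetical scalar functions and the router scores are random; the sigmoid scoring with gate weights renormalized over the chosen experts loosely follows DeepSeek's technical report, while the real model's shared expert and load-balancing mechanisms are left out.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per layer, as stated in the article
TOP_K = 8          # experts activated for each token


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the k experts with the highest affinity and return (index, gate weight).

    Affinities are sigmoid scores; the gate weights of the chosen experts are
    renormalized so that they sum to 1.
    """
    scores = [sigmoid(x) for x in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


def moe_layer(x: float, experts, logits: list[float], k: int = TOP_K) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route(logits, k))


# Toy setup (hypothetical): each "expert" is a scalar function, router logits are random.
random.seed(0)
experts = [lambda x, a=random.uniform(-1, 1): a * x for _ in range(NUM_EXPERTS)]
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route(logits)
print(len(selected), "of", NUM_EXPERTS, "experts used")  # 8 of 256 experts used
print(round(sum(w for _, w in selected), 6))             # 1.0
print(moe_layer(2.0, experts, logits))
```

Only 8 of the 256 expert functions are called per token, which is where the savings in computation come from.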

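Thanks to these features, training the model took only 2.788 million GPU-hours on Nvidia H800 graphics processors, or about two months of work. At the rental price the developers assumed, this comes to about $5.6 million; the figure covers only the final training run, not earlier research and experiments. For comparison, estimates put the compute cost of training OpenAI's GPT-4 at about $78 million.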
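The training figures fit together with simple arithmetic. This sketch assumes two numbers that are not in the article but are stated in DeepSeek's technical report: a cluster of 2,048 H800 GPUs and a rental price of $2 per GPU-hour. It also assumes every GPU is busy for the whole run, which is a simplification.

```python
GPU_HOURS = 2_788_000     # total H800 GPU-hours reported for training
NUM_GPUS = 2048           # assumption: cluster size given in DeepSeek's technical report
PRICE_PER_GPU_HOUR = 2.0  # assumption: USD rental price DeepSeek used for its estimate

# Wall-clock time if all GPUs run in parallel for the whole job.
wall_clock_hours = GPU_HOURS / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

# Cost is simply GPU-hours times the hourly price.
cost_usd = GPU_HOURS * PRICE_PER_GPU_HOUR

print(f"{wall_clock_days:.0f} days of wall-clock time")  # 57 days of wall-clock time
print(f"${cost_usd / 1e6:.3f} million")                  # $5.576 million
```

About 57 days matches "two months", and $5.576 million is the cost estimate DeepSeek reports.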

The developers claim that in their benchmarks the neural network outperformed OpenAI's GPT-4o, Meta's Llama 3.1 and Anthropic's Claude 3.5 Sonnet on many programming and text-processing tasks.

The model's main distinguishing feature is its openness: the weights are publicly released under a license that permits commercial use, so developers can not only build products on the technology but also adapt it to a wide range of AI tasks.

DeepSeek V3 capabilities

The model has a context window of 128 thousand tokens, the same as GPT-4o, which is enough to analyze roughly 300 pages of text. It is capable of:

  • generating texts of different lengths and genres;
  • searching the Internet for information (via the chatbot's Search mode);
  • reading text from uploaded diagrams and images (the chatbot extracts the text; the V3 model itself works only with text);
  • writing and correctly formatting code and solving complex programming problems in C++, Go, Java, JavaScript, Python and Rust. Through its API, the model can be connected to code editors;
  • step-by-step reasoning in DeepThink mode, where the chatbot switches to DeepSeek's R1 reasoning model, comparable to OpenAI's o1 and o1-mini.
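The page count quoted for the 128K-token context window is a rough conversion from tokens to pages. The sketch below assumes about 0.75 English words per token and about 300 words per page; both are rule-of-thumb values rather than properties of DeepSeek's tokenizer, and the ratio differs for other languages such as Chinese or Russian.

```python
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75  # assumption: rough ratio for English text
WORDS_PER_PAGE = 300    # assumption: a typical page of prose


def pages_for_context(tokens: int,
                      words_per_token: float = WORDS_PER_TOKEN,
                      words_per_page: float = WORDS_PER_PAGE) -> float:
    """Estimate how many pages of text fit into a context window of the given size."""
    return tokens * words_per_token / words_per_page


print(round(pages_for_context(CONTEXT_TOKENS)))  # 320
```

With these assumptions the window holds about 320 pages, in line with the "roughly 300 pages" figure.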

DeepSeek V3 is strongly multilingual: its command of Chinese and English lets you work with and translate texts without losing quality or meaning. The model also supports Russian.

A drawback of the neural network is that it cannot yet analyze content from links: it works only with uploaded files or pasted excerpts of text.
