Overview

Genpilot is an AI-powered large language model application that combines BGI's knowledge base, literature database, tool library, databases and online search function. It integrates intelligent question-answering and writing functions to provide comprehensive solutions for DCS Cloud users and researchers.

For DCS Cloud users: Genpilot uses its native model to answer basic questions and also supports Online search and Academic search to gather and synthesize additional information. DCS Cloud product manuals and FAQs are embedded, so it can act as an intelligent customer service.
For researchers: Genpilot provides intelligent assistance throughout the academic research process. With the Literature library and Intelligent writing applications, researchers can handle repetitive, time-consuming research tasks efficiently, explore new ideas and discoveries, and quickly turn final research results into written work.

Introduction to Large Language Models

A large language model (LLM) is an artificial intelligence model based on machine learning and natural language processing techniques. It can understand human language and retain a certain amount of conversational context, which allows it to generate contextually relevant answers. Although LLMs have made significant progress in text generation, they still face challenges with accuracy and with rarely asked questions. In spatial-temporal omics and the professional life sciences in particular, LLMs still suffer from outdated information and a lack of specialized language resources.

Genpilot, building upon the powerful text processing capabilities of LLMs, enhances data construction, focuses on user experience, and improves the accuracy, professionalism, and timeliness of answers. However, Genpilot still faces some technical limitations that have yet to be overcome. The following are the capabilities and limitations of Genpilot:

Capabilities:

Intelligent Q&A: Genpilot can understand user questions and make reasonable inferences and predictions based on data, providing relatively accurate answers.
Memory capability: Within the same conversation, Genpilot can remember users' historical questions and answers within a certain range, generating more coherent and logical responses.
Access to the latest information: Genpilot parses user questions and performs online searches, then combines the search results to generate answers that reflect the latest information and insights.
Access to professional information: Genpilot incorporates underlying data such as internal documents, literature, tools and knowledge bases, enabling access to more valuable and complex information.

Limitations:

Length limitation: Like other large language models, Genpilot is subject to token limits on user input, answer generation and memory, so it can take only a limited amount of context into account.
Data bias: Like other large language models, Genpilot may sometimes be unable to answer a question or may give an incorrect answer, so users should evaluate its answers and use them with caution.
Single modality: Currently, Genpilot only supports text input and output.
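
To illustrate why a token limit means that only recent context is considered, here is a minimal sketch of trimming a chat history to a token budget. It is not Genpilot's actual implementation: the names trim_history and approx_tokens, the budget value, and the assumption of roughly 4 characters per token are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One dialogue = (question, answer). This type alias is illustrative only.
Dialogue = Tuple[str, str]


def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assume ~4 characters per token."""
    return (len(text) + 3) // 4  # integer ceiling of len(text) / 4


def trim_history(history: List[Dialogue], budget: int) -> List[Dialogue]:
    """Keep the most recent dialogues whose combined size fits the budget.

    Walks backwards from the newest dialogue and stops at the first one
    that would exceed the budget, so older dialogues are dropped first.
    """
    kept: List[Dialogue] = []
    used = 0
    for question, answer in reversed(history):
        cost = approx_tokens(question) + approx_tokens(answer)
        if used + cost > budget:
            break
        kept.append((question, answer))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Under these assumptions, a long chat gradually loses its earliest dialogues from the model's view, which is why starting a new chat or restating key details can help when earlier context appears to be forgotten.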

Definition of terms

The terms below appear throughout the Genpilot User Manual and are defined here to avoid ambiguity.

Dialogue: A user's question together with the corresponding answer generated by Genpilot. A single question may have multiple answers.
Chat: A collection of dialogues.
Bot: A special tool designed to provide additional functionality and capabilities to Genpilot. By using a bot, Genpilot can access databases, APIs, etc., and integrate the obtained information to generate answers in order to better address various tasks and needs.
Application: Refers to the "Intelligent Q&A" application provided by Genpilot for DCS Cloud users, as well as the applications provided for researchers: Literature library, Intelligent writing and NFSC review. Different applications correspond to different databases and functionalities.
Token: The smallest unit of text in the large language model. A token can be a word, a punctuation mark, a number, a symbol or another language element. Each token has a corresponding encoding, which the model uses to process and generate text. For English, 1 token is approximately 4 characters or 0.75 words; for Chinese, 1 token is approximately 0.5 characters. The large language model currently has a limit on the number of tokens it can process and understand.
Streaming Output: The gradual generation of results in a continuous stream, rather than all at once. Genpilot returns answers as streaming output.
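
The rule of thumb in the Token entry can be turned into a rough estimate of how many tokens a piece of text will use. The sketch below is only an approximation under those stated ratios; it is not the tokenizer that Genpilot uses, and the function name estimate_tokens and the character range used to detect Chinese text are assumptions made for this example.

```python
import math
import re

# Approximate ratios quoted in the Token definition.
EN_CHARS_PER_TOKEN = 4.0   # English: 1 token is about 4 characters
ZH_CHARS_PER_TOKEN = 0.5   # Chinese: 1 token is about 0.5 characters

# Assumption: treat the basic CJK Unified Ideographs block as "Chinese".
_CJK = re.compile(r"[\u4e00-\u9fff]")


def estimate_tokens(text: str) -> int:
    """Return a rough, rounded-up token estimate for mixed English/Chinese text."""
    zh_chars = len(_CJK.findall(text))
    other_chars = len(text) - zh_chars
    estimate = other_chars / EN_CHARS_PER_TOKEN + zh_chars / ZH_CHARS_PER_TOKEN
    return math.ceil(estimate)


if __name__ == "__main__":
    # 40 non-Chinese characters at ~4 characters per token
    print(estimate_tokens("abcd" * 10))
    # 2 Chinese characters at ~0.5 characters per token
    print(estimate_tokens("基因"))
```

An estimate like this is useful only for judging whether a question or pasted passage is likely to approach the token limit; actual counts depend on the model's tokenizer.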

Product Form

Genpilot is currently available in two product forms: Web platform and DingTalk mini-program.
