# Dataset Upload for AI Agents in GAME Cloud: Guidelines, Common Issues, and Best Practices

{% hint style="warning" %}
**Notice:** GAME Cloud is currently **deprecated.** You can only use **GAME SDK** at this moment.
{% endhint %}

GM builders!

Welcome to this guide on custom dataset uploads. We’ll walk you through the essentials, covering:

* The motivation behind uploading datasets
* A step-by-step process for uploading datasets via GAME Cloud, and things to take note of
* Best practices to ensure your dataset works effectively
* Real-world use cases to inspire your projects

## The motivation behind uploading datasets

Uploading custom datasets is essential for tailoring your AI agent’s performance to specific needs. Let’s dive into the motivations behind this:

1️⃣ Customization for Unique Use Cases

| **Why?**     | Publicly available datasets or APIs might not fully capture the specific needs of your application (e.g., tracking niche crypto projects or analyzing specific Telegram channels).                                                            |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **When?**    | <p>• Your AI agent requires domain-specific knowledge (e.g., proprietary market reports, custom research, or industry datasets).<br>• The available public datasets contain irrelevant or generalized data not suited for your niche.<br></p> |
| **Example:** | A project that analyzes user sentiment on lesser-known altcoins might need to upload a custom dataset of domain knowledge. |

2️⃣ Enhancing AI Understanding of Proprietary Content

| **Why?**     | If your agent interacts with users based on internal documentation or proprietary content, it needs access to that content to generate accurate responses.                                                                                                 |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **When?**    | <p>• The agent is required to answer customer or team-specific queries (e.g., FAQs, internal documentation, or project-specific reports).<br>• You want the agent to provide personalized recommendations or insights based on your business data.<br></p> |
| **Example:** | A DeFi protocol may upload a dataset containing platform-specific FAQs, governance proposals, and tokenomics information to enhance user support through the agent.                                                                                        |

## How to Upload Datasets via GAME Cloud

Follow these steps to upload your dataset seamlessly:

{% stepper %}
{% step %} <mark style="background-color:yellow;">**Step 1:**</mark>&#x20;

Under Agent Knowledge, navigate to the Dataset section.

<figure><img src="/files/ySjy9UkwoT2suUGx8NsA" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 2:**</mark>&#x20;

Select Upload Datasets.

<figure><img src="/files/xvBBFH1L9FzJzgwQW5hY" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 3:**</mark>&#x20;

Choose a file in one of the supported formats listed.

<figure><img src="/files/vfdPNrliahK1fsRXZhzU" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 4:**</mark>

Upload your file.

<figure><img src="/files/T9f8kLvZrNUzhV8E4SaK" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 5:**</mark>

Use the toggle to control whether the agent uses the dataset. When enabled (toggle switched on), the uploaded dataset is used as a reference document by the agent when generating responses. When disabled (toggle switched off), the dataset remains uploaded but is ignored by the agent. This is useful if you want the agent to stop referring to a dataset temporarily without deleting it.

<figure><img src="/files/7jzfAXBVElF4P7cqD76Z" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 6:**</mark>&#x20;

In the Tweet Enrichment setup section, ensure that the “Enable Tweet Enrichment” option is selected.

<figure><img src="/files/ggbr9U2tmd8iAWU1sIvK" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/9WDi2F6HZURQPwI57UvL" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %} <mark style="background-color:yellow;">**Step 7:**</mark>

Also check that `{{retrieveknowledge}}` is enabled.

<figure><img src="/files/jX3U39oR5sRuoGp28I0n" alt=""><figcaption></figcaption></figure>

{% endstep %}
{% endstepper %}

## 🗒️ Things to Take Note of

While uploading datasets, here are some key considerations:

{% hint style="success" %}

#### **The agent may not reference the dataset correctly**

* **Cause:** The dataset may not be correctly recognized or retrieved by the agent.
* **Solutions:**
  * **`Solution 1:`** Delete the dataset and re-upload it, ensuring correct format and structure compliance.
  * **`Solution 2:`** Enable the Retrieve Knowledge option under the Tweet Enrichment segment to allow the agent to access the uploaded dataset.
  * **`Solution 3:`** Ensure that the agent's goal is properly linked to the uploaded dataset, so that the agent can reference and use the dataset when generating responses. Consider adding the sample prompt below to the goal, or adapting it:
    * `Designed to enhance tweet enrichment by leveraging uploaded datasets. When generating responses, always check if relevant information exists in the uploaded dataset.`
      {% endhint %}

{% hint style="success" %}
**Dataset Upload Constraints**

Here are some important platform limits to be aware of:

1️⃣ **Supported Formats**

The GAME Sandbox currently supports the following file formats for dataset uploads:

* `PDF`: Often used for large documents or reports.
* `TXT`: Best for simple, structured text data or logs.
* `CSV`: Ideal for tabular data with rows and columns. This format works well for numerical data, time series, and datasets requiring structured relationships.
* `HTML`: Useful when the dataset involves web-based content, such as blog articles or structured pages. HTML files can retain formatting and metadata, making them beneficial for agents focused on web scraping or content parsing.
* `XLSX`: Suitable for complex spreadsheets with multiple sheets, formulas, and structured data. This format is great for datasets that require various data types and categorization.<br>

2️⃣ **File Size Limits**

* The maximum file size for each upload is 10 MB.
* Uploading files larger than this limit may cause failures or performance degradation. For larger datasets, consider breaking the file into multiple smaller chunks.<br>

3️⃣ **File Amount Restrictions**

* To maintain system efficiency, limit the total number of uploaded files. Ideally, upload no more than **5** files.
* Uploading too many files can result in increased processing time, system lag, or errors during dataset retrieval.
  {% endhint %}
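
The constraints above can be checked locally before uploading. The following is a minimal sketch, not part of GAME Cloud: the function and constant names are hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the constraints above; all names here are illustrative.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".html", ".xlsx"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILE_COUNT = 5                 # recommended ceiling, not a hard limit


def check_files(files: dict[str, int]) -> list[str]:
    """Return a list of problems for a {filename: size_in_bytes} mapping."""
    problems = []
    if len(files) > MAX_FILE_COUNT:
        problems.append(f"{len(files)} files exceeds the recommended {MAX_FILE_COUNT}")
    for name, size in files.items():
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format '{ext}'")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 10 MB limit")
    return problems


def check_folder(folder: str) -> list[str]:
    """Collect file sizes from a local folder and run the same checks."""
    sizes = {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}
    return check_files(sizes)
```

An empty list means every file passed. For example, `check_folder("datasets")` would check a hypothetical local `datasets` folder.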

## Dataset Upload: Best Practices and Tips

Follow these best practices to ensure smooth integration and efficient dataset use:

#### **Structured Organization**

| **Why**     | AI agents rely on clearly defined sections to parse and retrieve information effectively.                                            |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| **How**     | Use consistent headers (e.g., # Section: Common Questions) to categorize different data types such as FAQs and tokenomics.           |
| **Example** | The "Common Questions" section helps the agent identify relevant answers based on user queries without scanning the entire document. |
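
To see whether a text dataset follows a consistent header convention, you can split it by header and inspect the result. The sketch below assumes the `# Section: <name>` convention shown above. `split_sections` is a hypothetical helper, and it does not reflect how the agent itself parses documents.

```python
import re

# Assumed header convention, taken from the example above: "# Section: <name>"
SECTION_RE = re.compile(r"^#\s*Section:\s*(.+?)\s*$")


def split_sections(text: str) -> dict[str, str]:
    """Map each section name to the text that follows its header.

    Text that appears before the first header is ignored.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


sample = """# Section: Common Questions
Q: How do I connect my wallet to the platform?
A: Click on the "Connect Wallet" button.

# Section: Tokenomics Overview
- Max Supply: 1,000,000,000 XYZ tokens
"""

sections = split_sections(sample)
print(list(sections))  # ['Common Questions', 'Tokenomics Overview']
```

If the returned mapping is empty or missing a section you expected, the headers in the file are probably inconsistent.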

#### Relevance to Agent Goals

| **Why**     | Excessive irrelevant data can degrade the agent's performance by increasing processing time and reducing focus. |
| ----------- | --------------------------------------------------------------------------------------------------------------- |
| **How**     | Include only the sections and data points directly tied to the agent's tasks.                                   |
| **Example** | A DeFi support agent may not need token distribution details unless governance or staking queries are common.   |

#### Support for Semantic Understanding

| **Why**     | Agents powered by NLP benefit from additional context, such as explanatory notes or related information.           |
| ----------- | ------------------------------------------------------------------------------------------------------------------ |
| **How**     | Add brief definitions or context alongside technical terms where necessary.                                        |
| **Example** | Including a short definition of a technical term helps the agent explain the concept to users unfamiliar with it. |

#### Avoidance of Redundancies

| **Why**     | Duplicate data can confuse the agent's search and retrieval mechanisms.                                |
| ----------- | ------------------------------------------------------------------------------------------------------ |
| **How**     | Perform a dataset audit to remove redundant entries or sections that repeat across multiple documents. |
| **Example** | Use unique document titles and well-organized sections to differentiate content clearly.               |
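
A dataset audit for repeated entries can be partly automated. The sketch below is one possible approach, assuming each document has already been split into entries such as questions or paragraphs. The helper names are hypothetical, and the normalization keeps ASCII letters and digits only, so it suits English text.

```python
import re
from collections import defaultdict


def normalize(entry: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical entries compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", entry.lower()).strip()


def find_duplicates(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {normalized entry: [documents containing it]} for entries seen more than once."""
    seen = defaultdict(list)
    for doc_name, entries in documents.items():
        for entry in entries:
            key = normalize(entry)
            if key:  # skip entries that are empty after normalization
                seen[key].append(doc_name)
    return {key: docs for key, docs in seen.items() if len(docs) > 1}


docs = {
    "faq.txt": ["How can I stake my tokens?", "What is $TOKEN?"],
    "guide.txt": ["how can I stake my tokens", "Fees overview"],
}
print(find_duplicates(docs))
# {'how can i stake my tokens': ['faq.txt', 'guide.txt']}
```

Entries that the audit reports are candidates for removal or merging. The audit finds exact matches after normalization, not paraphrases.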

#### Compliance and Privacy

| **Why**     | Sensitive data, such as wallet addresses or personal information, should be protected to prevent privacy breaches. |
| ----------- | ------------------------------------------------------------------------------------------------------------------ |
| **How**     | Anonymize or redact sensitive information where necessary.                                                         |
| **Example** | Replace identifiable wallet addresses with placeholders in public-facing datasets. |
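
Replacing wallet addresses with placeholders can be scripted. The sketch below assumes EVM-style addresses (`0x` followed by 40 hexadecimal characters). Other chains use different address formats and would need their own patterns. The `<WALLET_n>` placeholder format is an arbitrary choice for illustration.

```python
import re

# Assumption: EVM-style addresses ("0x" followed by exactly 40 hex characters).
# Longer hex strings such as transaction hashes are deliberately not matched.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def redact_addresses(text: str) -> str:
    """Replace each distinct address with a stable placeholder such as <WALLET_1>."""
    placeholders: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        address = match.group(0).lower()  # treat checksummed and lowercase forms as equal
        if address not in placeholders:
            placeholders[address] = f"<WALLET_{len(placeholders) + 1}>"
        return placeholders[address]

    return ADDRESS_RE.sub(replace, text)


text = (
    "User 0x" + "ab" * 20 + " sent funds to 0x" + "CD" * 20
    + ", then 0x" + "AB" * 20 + " replied."
)
print(redact_addresses(text))
# User <WALLET_1> sent funds to <WALLET_2>, then <WALLET_1> replied.
```

Using the same placeholder for repeated occurrences of an address keeps relationships in the data readable without exposing the address itself.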

***

### Real-World Example Use Cases

Here are some practical use cases to get your brain juices flowing and give you some ideas on how to leverage this feature!

#### <mark style="background-color:yellow;">Example 1: Proprietary FAQs for DeFi Protocol Support</mark>

Format: TXT (Q\&A structure)

```
# Dataset: DeFi Protocol FAQs
# Purpose: Improve support responses for a decentralized finance (DeFi) platform.

Q: What is the purpose of the $TOKEN in your protocol?
A: The $TOKEN is used for governance, staking rewards, and transaction fee discounts.

Q: How can I stake my tokens?
A: You can stake your tokens via the protocol dashboard by navigating to the "Staking" section and following the on-screen instructions.

Q: Are my funds insured in the event of a smart contract failure?
A: Currently, our protocol offers no direct insurance; however, third-party providers may offer coverage options.

Q: What are governance proposals, and how can I vote?
A: Governance proposals are community-driven initiatives that require token holders to vote. Visit the governance portal to cast your vote.
```

**Key Considerations:**

* Organize the dataset with Q\&A pairs for easy retrieval.
* Add domain-specific terminology to enhance the AI agent's understanding of industry jargon.
* Handle edge cases, such as different phrasings of the same user query.
* Clean the dataset to remove duplicates and irrelevant entries, which may confuse the agent.
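
A Q\&A file in this layout can be checked for unpaired questions or answers before upload. The parser below is a minimal sketch under the assumptions that every question and answer fits on one line and that lines starting with `#` are comments. `parse_qa` is a hypothetical helper, not a GAME function.

```python
def parse_qa(text: str) -> list[tuple[str, str]]:
    """Parse 'Q:' / 'A:' lines into (question, answer) pairs, skipping '#' comment lines."""
    pairs = []
    question = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("Q:"):
            if question is not None:
                raise ValueError(f"Question without an answer: {question!r}")
            question = line[2:].strip()
        elif line.startswith("A:"):
            if question is None:
                raise ValueError(f"Answer without a question: {line!r}")
            pairs.append((question, line[2:].strip()))
            question = None
        else:
            # Assumption: single-line entries only, so anything else is a formatting error.
            raise ValueError(f"Unrecognized line: {line!r}")
    if question is not None:
        raise ValueError(f"Question without an answer: {question!r}")
    return pairs


faq = """# Dataset: DeFi Protocol FAQs
Q: How can I stake my tokens?
A: You can stake your tokens via the protocol dashboard.

Q: What are governance proposals, and how can I vote?
A: Visit the governance portal to cast your vote.
"""

pairs = parse_qa(faq)
print(len(pairs))  # 2
```

A `ValueError` points at the first line that breaks the Q\&A structure.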

#### <mark style="background-color:yellow;">Example 2: Crypto Sentiment Analysis (Twitter/Telegram Discussions)</mark>

Format: TXT (dataset structure)

```
# Dataset: Crypto Sentiment on Altcoins
# Source: Twitter & Telegram discussions
# Format: Timestamp | Platform | Username | Text | Sentiment (Positive/Negative/Neutral)

2025-01-10 12:34:56 | Twitter | @CryptoTraderX | "Really bullish on $XYZ! Incredible project." | Positive
2025-01-11 08:23:45 | Telegram | User1234 | "This altcoin is just another scam, not touching it." | Negative
2025-01-12 15:10:00 | Twitter | @AltcoinExpert | "Holding $ABC for long-term gains. Fundamentals look strong." | Positive
2025-01-13 10:12:00 | Telegram | CryptoTalk456 | "Neutral on $LMN right now. Waiting for more updates." | Neutral
```

**Key Considerations:**

* Ensure text is structured consistently with clear delimiters (e.g., `|`).
* Add metadata, such as timestamps and platforms, to support time-based analysis.
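
Consistency of the delimited rows can be verified with a small parser. The sketch below assumes the five-field layout from the header comment in the sample, with timestamps in `YYYY-MM-DD HH:MM:SS` form. The field names and `parse_row` are illustrative.

```python
from datetime import datetime

# Field order follows the "# Format:" line of the sample dataset.
FIELDS = ["timestamp", "platform", "username", "text", "sentiment"]
SENTIMENTS = {"Positive", "Negative", "Neutral"}


def parse_row(line: str) -> dict[str, str]:
    """Split one pipe-delimited row and validate it; raises ValueError on bad rows."""
    # Limitation: a "|" inside the message text would be counted as a delimiter.
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    row = dict(zip(FIELDS, parts))
    # strptime raises ValueError if the timestamp does not match the format.
    datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    if row["sentiment"] not in SENTIMENTS:
        raise ValueError(f"unknown sentiment {row['sentiment']!r}")
    row["text"] = row["text"].strip('"')  # drop the surrounding quotes
    return row


row = parse_row(
    '2025-01-10 12:34:56 | Twitter | @CryptoTraderX | '
    '"Really bullish on $XYZ! Incredible project." | Positive'
)
print(row["platform"], row["sentiment"])  # Twitter Positive
```

Running every data line through `parse_row` before upload catches missing fields, malformed timestamps, and misspelled sentiment labels.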

#### <mark style="background-color:yellow;">Example 3: Web 3 Project Support Agent</mark>

Format: PDF (Referencing FAQs, Tutorials, User Instructions, Research Paper)

```
# Section: Common Questions
Q: How do I connect my wallet to the platform?  
A: Click on the "Connect Wallet" button on the top-right corner of the page and follow the instructions.

Q: What is slippage tolerance?  
A: Slippage is the price difference between when an order is placed and when it is executed. Slippage tolerance is the maximum slippage you accept before the transaction is cancelled.

# Section: Tokenomics Overview
- Max Supply: 1,000,000,000 XYZ tokens
- Distribution:
  - 60% Public
  - 20% Team and Development
  - 10% Ecosystem Fund
```

**Key Considerations:**

* Misinterpretation prevention: Users may phrase the same query differently (*"Why is my transaction failing?"* vs. *"My swap didn’t go through"*). Include common phrasings of each question in the dataset so the agent can match intent, not just keywords.
* Security-related questions (e.g., “Is my wallet safe?”) should be handled with caution, linking to official documentation and disclaimers rather than speculative answers.

***

## **And that’s a wrap! 🚀**

But hey, remember `quality > quantity` every time!

Your AI agent doesn’t need a data dump; all it needs is clean, structured, and relevant data! WAGMI! 🤖

***

### Stay Connected and Join the Virtuals Community! 🤖 🎈

<table data-view="cards"><thead><tr><th></th><th data-type="content-ref"></th><th></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td><mark style="background-color:blue;"><strong>X: @GAME_Virtuals</strong></mark></td><td><a href="https://x.com/GAME_Virtuals">https://x.com/GAME_Virtuals</a></td><td>For updates and to join our live streaming jam sessions every Wednesday. Stay in the loop and engage with us in real time!</td><td><a href="/files/SKftUrknnGwpVwnfOt2R">/files/SKftUrknnGwpVwnfOt2R</a></td></tr><tr><td><mark style="background-color:blue;"><strong>Discord: @Virtuals Protocol</strong></mark></td><td><a href="http://discord.gg/virtualsio">http://discord.gg/virtualsio</a></td><td>Join Discord for tech support and troubleshooting, and don’t miss our GAME Jam session every Wednesday!</td><td><a href="/files/7cncpQIKabUdDG4afQkp">/files/7cncpQIKabUdDG4afQkp</a></td></tr><tr><td><mark style="background-color:blue;"><strong>Telegram: @Virtuals Protocol</strong></mark></td><td><a href="https://t.me/virtuals">https://t.me/virtuals</a></td><td>Join our Telegram group for non-tech support! Whether you need advice, a chat, or just a friendly space, we’re here to help!</td><td><a href="/files/kuTzXgQ3hnHheSx8QSau">/files/kuTzXgQ3hnHheSx8QSau</a></td></tr></tbody></table>


