Dataset Upload for AI Agents in GAME Cloud: Guidelines, Common Issues, and Best Practices

By: Joey Lau

Notice: GAME Cloud is currently deprecated. You can only use GAME SDK at this moment.

GM builders!

Welcome to this guide on custom dataset uploads. We’ll walk you through the essentials, covering:

The motivation behind uploading datasets
A step-by-step process for uploading datasets via GAME Cloud, and things to take note of
Best practices to ensure your dataset works effectively
Real-world use cases to inspire your projects

The motivation behind uploading datasets

Uploading custom datasets is essential for tailoring your AI agent’s performance to specific needs. Let’s dive into the motivations behind this:

1️⃣ Customization for Unique Use Cases

Why?

Publicly available datasets or APIs might not fully capture the specific needs of your application (e.g., tracking niche crypto projects or analyzing specific Telegram channels).

When?

• Your AI agent requires domain-specific knowledge (e.g., proprietary market reports, custom research, or industry datasets).
• The available public datasets contain irrelevant or generalized data not suited for your niche.

Example:

A project that analyzes user sentiment on lesser-known altcoins might need to upload a custom domain-knowledge dataset.

2️⃣ Enhancing AI Understanding of Proprietary Content

Why?

If your agent interacts with users based on internal documentation or proprietary content, it needs access to that content to generate accurate responses.

When?

• The agent is required to answer customer or team-specific queries (e.g., FAQs, internal documentation, or project-specific reports).
• You want the agent to provide personalized recommendations or insights based on your business data.

Example:

A DeFi protocol may upload a dataset containing platform-specific FAQs, governance proposals, and tokenomics information to enhance user support through the agent.

How to Upload Datasets via GAME Cloud

Follow these steps to upload your dataset seamlessly:

Step 1:

Navigate to the section under Agent Knowledge called Dataset.

Step 2:

Select Upload Datasets

Step 3:

Choose a file in one of the supported formats listed.

Step 4:

Upload your file

Step 5:

Set the dataset's toggle. When enabled (toggle switched on), the uploaded dataset is used as a reference document by the agent when generating responses. When disabled (toggle switched off), the dataset remains uploaded but is ignored by the agent. This is useful when you want the agent to stop referring to a dataset temporarily without deleting it.

Step 6:

In the Tweet Enrichment setup section, ensure that the “Enable Tweet Enrichment” option is selected.

Step 7:

Also check that {{retrieveknowledge}} is enabled.

🗒️ Things to Take Note of

While uploading datasets, here are some key considerations:

The agent may not reference the dataset correctly

Cause: The dataset may not be correctly recognized or retrieved by the agent.
Solutions:
- Solution 1: Delete the dataset and re-upload it, ensuring correct format and structure compliance.
- Solution 2: Enable the Retrieve Knowledge option under the Tweet Enrichment segment to allow the agent to access the uploaded dataset.
- Solution 3: Ensure that the agent's goal is properly linked to the uploaded dataset so that the agent can reference and use it when generating responses. Consider adding a prompt along the lines of the sample below:
  - Designed to enhance tweet enrichment by leveraging uploaded datasets. When generating responses, always check if relevant information exists in the uploaded dataset.

Dataset Upload Constraints

Here are some important platform limits to be aware of:

1️⃣ Supported Formats

The GAME Sandbox currently supports the following file formats for dataset uploads:

PDF: Often used for large documents or reports.
TXT: Best for simple, structured text data or logs.
CSV: Ideal for tabular data with rows and columns. This format works well for numerical data, time series, and datasets requiring structured relationships.
HTML: Useful when the dataset involves web-based content, such as blog articles or structured pages. HTML files can retain formatting and metadata, making them beneficial for agents focused on web scraping or content parsing.
XLSX: Suitable for complex spreadsheets with multiple sheets, formulas, and structured data. This format is great for datasets that require various data types and categorization.

2️⃣ File Size Limits

The maximum file size for each upload is 10 MB.
Uploading files larger than this limit may cause failures or performance degradation. For larger datasets, consider breaking the file into multiple smaller chunks.
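
If a text-based dataset is over the limit, one way to break it up is to split at line boundaries so that no record is cut in half. The sketch below is only an illustration: the function name `split_lines_by_size` is hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes (lower the constant if you want some headroom).

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def split_lines_by_size(lines, max_bytes=MAX_BYTES):
    """Group lines into chunks whose UTF-8 size stays within max_bytes.

    Splits only at line boundaries. A single line that is larger than
    max_bytes is kept whole in a chunk of its own.
    """
    chunks, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Tiny demonstration with a 20-byte limit instead of 10 MB.
sample = ["aaaaaaaaa\n", "bbbbbbbbb\n", "ccccccccc\n"]  # 10 bytes each
parts = split_lines_by_size(sample, max_bytes=20)
print([len(p) for p in parts])  # [2, 1]
```

For a real file, read it with `open(path, encoding="utf-8")`, pass the file object as `lines`, and write each chunk to its own file. For CSV data you would likely also want to repeat the header row at the top of every chunk.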

3️⃣ File Amount Restrictions

It is recommended to limit the total number of uploaded files to maintain system efficiency. Ideally, upload no more than 5 files.
Uploading excessive numbers of files can result in increased processing time, system lag, or errors during dataset retrieval.
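
The three constraints above can be checked locally before you upload anything. This is a minimal sketch, not part of GAME Cloud: `check_upload_batch` is a hypothetical helper, the extension list is derived from the supported formats listed above, and the 10 MB limit is assumed to mean 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".html", ".xlsx"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_FILES = 5                 # recommended ceiling from the guideline above


def check_upload_batch(files):
    """Return a list of problems for a batch of (file_name, size_in_bytes) pairs."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(
            f"{len(files)} files exceeds the recommended maximum of {MAX_FILES}"
        )
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format '{ext}'")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes is over the 10 MB limit")
    return problems


batch = [("faq.txt", 2_000), ("report.docx", 50_000), ("prices.csv", 11 * 1024 * 1024)]
for problem in check_upload_batch(batch):
    print(problem)
```

An empty result means the batch passes all three checks. To run it on real files, build the pairs with `(p.name, p.stat().st_size)` for each `Path` you plan to upload.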

Dataset Upload: Best Practices and Tips

Follow these best practices to ensure smooth integration and efficient dataset use:

Structured Organization

Why

AI agents rely on clearly defined sections to parse and retrieve information effectively.

How

Use consistent headers (e.g., # Section: Common Questions) to categorize different data types such as FAQs and tokenomics.

Example

The "Common Questions" section helps the agent identify relevant answers based on user queries without scanning the entire document.
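
To see why consistent headers matter, here is a small sketch that splits a document on a `# Section:` marker. It does not describe how GAME retrieves data internally; `split_into_sections` is a hypothetical helper that only shows how predictable headers make a file easy to segment (and easy to sanity-check before uploading).

```python
def split_into_sections(text, marker="# Section:"):
    """Map each section title to the body text that follows its header line."""
    sections = {}
    title = None
    for line in text.splitlines():
        if line.startswith(marker):
            title = line[len(marker):].strip()
            sections[title] = []
        elif title is not None:
            # Lines before the first header are ignored.
            sections[title].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


doc = """# Section: Common Questions
Q: How do I connect my wallet to the platform?
A: Click on the "Connect Wallet" button.

# Section: Tokenomics Overview
- Max Supply: 1,000,000,000 XYZ tokens
"""
sections = split_into_sections(doc)
print(list(sections))  # ['Common Questions', 'Tokenomics Overview']
```

If a header is misspelled or uses a different prefix, its content silently merges into the previous section, which is a quick way to spot inconsistent formatting.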

Relevance to Agent Goals

Why

Excessive irrelevant data can degrade the agent's performance by increasing processing time and reducing focus.

How

Include only the sections and data points directly tied to the agent's tasks.

Example

A DeFi support agent may not need token distribution details unless governance or staking queries are common.

Support for Semantic Understanding

Why

Agents powered by NLP benefit from additional context, such as explanatory notes or related information.

How

Add brief definitions or context alongside technical terms where necessary.

Example

Including a short definition of a technical term helps the agent explain concepts to users unfamiliar with the term.

Avoidance of Redundancies

Why

Duplicate data can confuse the agent's search and retrieval mechanisms.

How

Perform a dataset audit to remove redundant entries or sections that repeat across multiple documents.

Example

Use unique document titles and well-organized sections to differentiate content clearly.
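
A dataset audit can start with something as simple as the sketch below, which drops repeated entries while keeping the first occurrence. The helper name `remove_duplicate_entries` is hypothetical, and the matching rule (ignore case and extra whitespace) is an assumption; near-duplicates with different wording still need a manual review.

```python
def remove_duplicate_entries(entries):
    """Keep the first occurrence of each entry.

    Entries are compared after lowercasing and collapsing whitespace.
    Blank entries are dropped.
    """
    seen = set()
    unique = []
    for entry in entries:
        key = " ".join(entry.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


entries = [
    "Q: How can I stake my tokens?",
    "q: how can I  stake my tokens?",   # same question, different case/spacing
    "Q: What is slippage tolerance?",
]
print(len(remove_duplicate_entries(entries)))  # 2
```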

Compliance and Privacy

Why

Sensitive data, such as wallet addresses or personal information, should be protected to prevent privacy breaches.

How

Anonymize or redact sensitive information where necessary.

Example

Replace identifiable wallet addresses with placeholders in public-facing datasets.
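
As a rough sketch of the placeholder approach, the snippet below redacts EVM-style addresses (`0x` followed by 40 hexadecimal characters) with a regular expression. The placeholder text and the function name are assumptions made for illustration, and other address formats (for example, Bitcoin or Solana) would need their own patterns.

```python
import re

# Matches EVM-style addresses only: "0x" plus exactly 40 hex characters.
EVM_ADDRESS = re.compile(r"\b0x[a-fA-F0-9]{40}\b")


def redact_wallet_addresses(text, placeholder="[WALLET_ADDRESS]"):
    """Replace every EVM-style wallet address in text with a placeholder."""
    return EVM_ADDRESS.sub(placeholder, text)


address = "0x" + "ab12" * 10  # made-up 40-hex-character address for the demo
print(redact_wallet_addresses(f"Sent from {address} yesterday."))
# Sent from [WALLET_ADDRESS] yesterday.
```

Because the pattern requires exactly 40 hex characters, longer hex strings such as 64-character transaction hashes are left untouched.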

Real-World Example Use Cases

Here are some practical use cases to get your brain juices flowing and give you some ideas on how to leverage this feature!

Example 1: Proprietary FAQs for DeFi Protocol Support

Format: TXT (Q&A structure)

# Dataset: DeFi Protocol FAQs
# Purpose: Improve support responses for a decentralized finance (DeFi) platform.

Q: What is the purpose of the $TOKEN in your protocol?
A: The $TOKEN is used for governance, staking rewards, and transaction fee discounts.

Q: How can I stake my tokens?
A: You can stake your tokens via the protocol dashboard by navigating to the "Staking" section and following the on-screen instructions.

Q: Are my funds insured in the event of a smart contract failure?
A: Currently, our protocol offers no direct insurance; however, third-party providers may offer coverage options.

Q: What are governance proposals, and how can I vote?
A: Governance proposals are community-driven initiatives that require token holders to vote. Visit the governance portal to cast your vote.

Key Considerations:

Organize the dataset with Q&A pairs for easy retrieval.
Add domain-specific terminology to enhance the AI agent's understanding of industry jargon.
Handle edge cases, such as variations in user queries.
Clean the dataset to remove duplicates and irrelevant entries, which may confuse the agent.
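
Before uploading a Q&A file like the one above, it can help to confirm that every question actually has an answer. The sketch below assumes the `Q:` / `A:` line prefixes used in the example dataset; `parse_qa_pairs` is a hypothetical helper, not a GAME function.

```python
def parse_qa_pairs(text):
    """Return (question, answer) tuples from lines starting with 'Q:' and 'A:'.

    Comment lines (starting with '#') and blank lines are skipped.
    A question with no following answer is not included in the result.
    """
    pairs = []
    question = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs


dataset = """# Dataset: DeFi Protocol FAQs

Q: How can I stake my tokens?
A: You can stake your tokens via the protocol dashboard.

Q: What are governance proposals, and how can I vote?
"""
pairs = parse_qa_pairs(dataset)
print(len(pairs))  # 1, because the second question has no answer yet
```

Comparing the number of pairs with the number of `Q:` lines in the file is a quick way to find unanswered questions.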

Example 2: Crypto Sentiment Analysis (Twitter/Telegram Discussions)

Format: TXT (dataset structure)

# Dataset: Crypto Sentiment on Altcoins
# Source: Twitter & Telegram discussions
# Format: Timestamp | Platform | Username | Text | Sentiment (Positive/Negative/Neutral)

2025-01-10 12:34:56 | Twitter | @CryptoTraderX | "Really bullish on $XYZ! Incredible project." | Positive
2025-01-11 08:23:45 | Telegram | User1234 | "This altcoin is just another scam, not touching it." | Negative
2025-01-12 15:10:00 | Twitter | @AltcoinExpert | "Holding $ABC for long-term gains. Fundamentals look strong." | Positive
2025-01-13 10:12:00 | Telegram | CryptoTalk456 | "Neutral on $LMN right now. Waiting for more updates." | Neutral

Key Considerations:

Ensure text is structured consistently with clear delimiters (e.g., |).
Add metadata, such as timestamps and platforms, to support time-based analysis.
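
A consistent delimiter also makes the dataset easy to validate. The sketch below parses one line of the `Timestamp | Platform | Username | Text | Sentiment` format shown above and rejects malformed rows. The function name is hypothetical, and it assumes the message text itself never contains a `|` character.

```python
from datetime import datetime

FIELDS = ("timestamp", "platform", "username", "text", "sentiment")
SENTIMENTS = {"Positive", "Negative", "Neutral"}


def parse_record(line):
    """Parse one pipe-delimited line into a dict.

    Raises ValueError when the field count, timestamp, or sentiment label is off.
    """
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # strptime raises ValueError if the timestamp does not match the format.
    datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    if record["sentiment"] not in SENTIMENTS:
        raise ValueError(f"unknown sentiment {record['sentiment']!r}")
    record["text"] = record["text"].strip('"')  # drop the surrounding quotes
    return record


row = parse_record(
    '2025-01-10 12:34:56 | Twitter | @CryptoTraderX | '
    '"Really bullish on $XYZ! Incredible project." | Positive'
)
print(row["platform"], row["sentiment"])  # Twitter Positive
```

Running every data line through a check like this before upload catches missing fields and mistyped sentiment labels early.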

Example 3: Web 3 Project Support Agent

Format: PDF (Referencing FAQs, Tutorials, User Instructions, Research Paper)

# Section: Common Questions
Q: How do I connect my wallet to the platform?  
A: Click on the "Connect Wallet" button on the top-right corner of the page and follow the instructions.

Q: What is slippage tolerance?  
A: Slippage refers to the price difference between the time an order is placed and when it is executed.

# Section: Tokenomics Overview
- Max Supply: 1,000,000,000 XYZ tokens
- Distribution:
  - 60% Public
  - 20% Team and Development
  - 10% Ecosystem Fund

Key Considerations:

Misinterpretation Prevention: Users may phrase the same query differently ("Why is my transaction failing?" vs. "My swap didn’t go through")—train AI to match intent, not just keywords.
Security-related questions (e.g., “Is my wallet safe?”) should be handled with caution, linking to official documentation and disclaimers rather than speculative answers.

And that’s a wrap! 🚀

But hey, remember quality > quantity every time!

Your AI agent doesn’t need a data dump—all it needs is clean, structured, and relevant data! WAGMI! 🤖

Stay Connected and Join the Virtuals Community! 🤖 🎈

X: @GAME_Virtuals

https://x.com/GAME_Virtuals

Follow us for updates and join our live-streamed jam sessions every Wednesday. Stay in the loop and engage with us in real time!

Discord: @Virtuals Protocol

http://discord.gg/virtualsio

Join Discord for tech support and troubleshooting, and don’t miss our GAME Jam session every Wednesday!

Telegram: @Virtuals Protocol

https://t.me/virtuals

Join our Telegram group for non-tech support! Whether you need advice, a chat, or just a friendly space, we’re here to help!

