Dataset Upload for AI Agents in GAME Cloud: Guidelines, Common Issues, and Best Practices
By: Joey Lau
GM builders!
Welcome to this guide on custom dataset uploads. We’ll walk you through the essentials, covering:
The motivation behind uploading datasets
A step-by-step process for uploading datasets via GAME Cloud, and things to take note of
Best practices to ensure your dataset works effectively
Real-world use cases to inspire your projects
The motivation behind uploading datasets
Uploading custom datasets is essential for tailoring your AI agent’s performance to specific needs. Let’s dive into the motivations behind this:
1️⃣ Customization for Unique Use Cases
Why?
Publicly available datasets or APIs might not fully capture the specific needs of your application (e.g., tracking niche crypto projects or analyzing specific Telegram channels).
When?
• Your AI agent requires domain-specific knowledge (e.g., proprietary market reports, custom research, or industry datasets).
• The available public datasets contain irrelevant or generalized data not suited for your niche.
Example:
A project that analyzes user sentiment on lesser-known altcoins might need to upload a custom domain-knowledge dataset.
2️⃣ Enhancing AI Understanding of Proprietary Content
Why?
If your agent interacts with users based on internal documentation or proprietary content, it needs access to that content to generate accurate responses.
When?
• The agent is required to answer customer or team-specific queries (e.g., FAQs, internal documentation, or project-specific reports).
• You want the agent to provide personalized recommendations or insights based on your business data.
Example:
A DeFi protocol may upload a dataset containing platform-specific FAQs, governance proposals, and tokenomics information to enhance user support through the agent.
How to Upload Datasets via GAME Cloud
Follow these steps to upload your dataset seamlessly:
Step 1:
Navigate to the section under Agent Knowledge called Dataset.
Step 2:
Select Upload Datasets
Step 3:
Choose a file in one of the supported formats listed.
Step 4:
Upload your file
Step 5:
When enabled (toggle switched on), the uploaded dataset will be used as a reference document by the agent when generating responses. When disabled (toggle switched off), the dataset remains uploaded but is temporarily ignored by the agent. This is useful when you want the agent to stop referring to the dataset for a while without deleting it.
Step 6:
In the Tweet Enrichment setup section, ensure that the “Enable Tweet Enrichment” option is selected.
Step 7:
Also check that {{retrieveknowledge}} is enabled.
🗒️ Things to Take Note of
While uploading datasets, here are some key considerations:
The agent may not reference the dataset correctly
Cause: The dataset may not be correctly recognized or retrieved by the agent.
Solutions:
Solution 1:
Delete the dataset and re-upload it, making sure it complies with the supported formats and structure.
Solution 2:
Enable the Retrieve Knowledge option under the Tweet Enrichment segment to allow the agent to access the uploaded dataset.
Solution 3:
Ensure that the agent's goal is properly linked to the uploaded dataset, so the agent can reference and use it when generating responses. Consider adding or adapting the sample prompt below:
Designed to enhance tweet enrichment by leveraging uploaded datasets. When generating responses, always check if relevant information exists in the uploaded dataset.
Dataset Upload Constraints
Here are some important platform limits to be aware of:
1️⃣ Supported Formats
The GAME Sandbox currently supports the following file formats for dataset uploads:
PDF: Often used for large documents or reports.
TXT: Best for simple, structured text data or logs.
CSV: Ideal for tabular data with rows and columns. This format works well for numerical data, time series, and datasets requiring structured relationships.
HTML: Useful when the dataset involves web-based content, such as blog articles or structured pages. HTML files can retain formatting and metadata, making them beneficial for agents focused on web scraping or content parsing.
XLSX: Suitable for complex spreadsheets with multiple sheets, formulas, and structured data. This format is great for datasets that require various data types and categorization.
2️⃣ File Size Limits
The maximum file size for each upload is 10 MB.
Uploading files larger than this limit may cause failures or performance degradation. For larger datasets, consider breaking the file into multiple smaller chunks.
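As a rough illustration of breaking a large text dataset into smaller chunks, here is a minimal Python sketch. The function name `split_text_by_size` is hypothetical, and the 10,000,000-byte threshold is a conservative reading of the 10 MB limit above (how the platform counts megabytes is an assumption). It splits only at line boundaries so that entries are not cut in half.

```python
# Assumption: 10 MB is read conservatively as 10,000,000 bytes.
MAX_BYTES = 10_000_000


def split_text_by_size(text: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Split text into chunks whose UTF-8 size stays within max_bytes.

    Splits happen only at line boundaries, so a single line that is
    longer than max_bytes ends up alone in an oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and current_size + line_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned chunk can then be written to its own file (for example `faq_part1.txt`, `faq_part2.txt`) before uploading. This approach suits TXT and CSV content; PDF and XLSX files would need format-aware tools instead.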
3️⃣ File Amount Restrictions
It is recommended to limit the total number of uploaded files to maintain system efficiency. Ideally, keep the number of files to 5 or fewer.
Uploading too many files can result in increased processing time, system lag, or errors during dataset retrieval.
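The three constraints above can be checked locally before uploading. The sketch below is not part of GAME Cloud. `check_upload_batch` is a hypothetical helper, and the byte threshold is a conservative reading of the 10 MB limit. It takes a mapping of file names to sizes in bytes and returns a list of problems.

```python
from pathlib import Path

# Values taken from the constraints in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".html", ".xlsx"}
MAX_BYTES = 10_000_000  # assumption: 10 MB counted as 10,000,000 bytes
RECOMMENDED_MAX_FILES = 5


def check_upload_batch(files: dict[str, int]) -> list[str]:
    """Return human-readable problems for a batch of {file name: size in bytes}."""
    problems: list[str] = []
    if len(files) > RECOMMENDED_MAX_FILES:
        problems.append(
            f"{len(files)} files exceeds the recommended maximum of {RECOMMENDED_MAX_FILES}"
        )
    for name, size in files.items():
        extension = Path(name).suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {extension or '(none)'}")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 10 MB limit")
    return problems
```

For real files, the sizes can be gathered with `Path(p).stat().st_size`. An empty list means the batch passes all three checks.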
Dataset Upload: Best Practices and Tips
Follow these best practices to ensure smooth integration and efficient dataset use:
Structured Organization
Why
AI agents rely on clearly defined sections to parse and retrieve information effectively.
How
Use consistent headers (e.g., # Section: Common Questions) to categorize different data types such as FAQs and tokenomics.
Example
The "Common Questions" section helps the agent identify relevant answers based on user queries without scanning the entire document.
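To sanity-check that a text dataset really uses consistent headers, a small script can split it by the header marker and list the sections it finds. This is only a local check and says nothing about how the agent retrieves content internally. The function name `split_sections` is hypothetical, and the `# Section:` marker follows the example header above.

```python
def split_sections(text: str, marker: str = "# Section:") -> dict[str, str]:
    """Group lines under the most recent header that starts with marker.

    Lines that appear before the first header are ignored.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(marker):
            current = line[len(marker):].strip()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


sample = (
    "# Section: Common Questions\n"
    "Q: What is staking?\n"
    "A: Locking tokens to earn rewards.\n"
    "# Section: Tokenomics\n"
    "Total supply is fixed.\n"
)
for name, body in split_sections(sample).items():
    print(f"{name}: {len(body.splitlines())} line(s)")
```

If an expected section is missing from the result, its header probably uses a different spelling or format than the rest of the file.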
Relevance to Agent Goals
Why
Excessive irrelevant data can degrade the agent's performance by increasing processing time and reducing focus.
How
Include only the sections and data points directly tied to the agent's tasks.
Example
A DeFi support agent may not need token distribution details unless governance or staking queries are common.
Support for Semantic Understanding
Why
Agents powered by NLP benefit from additional context, such as explanatory notes or related information.
How
Add brief definitions or context alongside technical terms where necessary.
Example
Including a short definition of a technical term helps the agent explain the concept to users unfamiliar with it.
Avoidance of Redundancies
Why
Duplicate data can confuse the agent's search and retrieval mechanisms.
How
Perform a dataset audit to remove redundant entries or sections that repeat across multiple documents.
Example
Use unique document titles and well-organized sections to differentiate content clearly.
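A basic dataset audit can be scripted. The sketch below removes entries that are duplicates after ignoring case and extra whitespace. The helper name `remove_duplicate_entries` and the normalization rule are assumptions; near-duplicates with different wording would still need a manual review.

```python
def remove_duplicate_entries(entries: list[str]) -> list[str]:
    """Keep the first occurrence of each entry, comparing entries
    case-insensitively with whitespace collapsed. Blank entries are dropped."""
    seen: set[str] = set()
    unique: list[str] = []
    for entry in entries:
        key = " ".join(entry.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


entries = [
    "What is staking?",
    "what is   staking?",  # same entry with different case and spacing
    "Swap fees are shown before you confirm.",
]
print(remove_duplicate_entries(entries))
```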
Compliance and Privacy
Why
Sensitive data, such as wallet addresses or personal information, should be protected to prevent privacy breaches.
How
Anonymize or redact sensitive information where necessary.
Example
Replace identifiable wallet addresses with placeholders in public-facing datasets.
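One possible way to do this replacement is with a regular expression. The sketch below assumes Ethereum-style addresses (`0x` followed by 40 hexadecimal characters), so other chains would need their own patterns. The function name `redact_wallets` and the `[WALLET_n]` placeholder format are hypothetical. The same address always maps to the same placeholder, so relationships in the data stay readable.

```python
import re

# Assumption: Ethereum-style addresses only (0x + 40 hex characters).
ETH_ADDRESS = re.compile(r"\b0x[a-fA-F0-9]{40}\b")


def redact_wallets(text: str) -> str:
    """Replace each distinct wallet address with a numbered placeholder."""
    placeholders: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        # Lowercase so checksummed and plain forms share one placeholder.
        address = match.group(0).lower()
        if address not in placeholders:
            placeholders[address] = f"[WALLET_{len(placeholders) + 1}]"
        return placeholders[address]

    return ETH_ADDRESS.sub(replace, text)
```

Run it over the text of a dataset before upload, then spot-check the output, since a pattern like this cannot catch names, emails, or other personal information.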
Real-World Example Use Cases
Here are some practical use cases to get your brain juices flowing and give you some ideas on how to leverage this feature!
Example 1: Proprietary FAQs for DeFi Protocol Support
Format: TXT (Q&A structure)
Key Considerations:
Organize the dataset with Q&A pairs for easy retrieval.
Add domain-specific terminology to enhance the AI agent's understanding of industry jargon.
Handle edge cases, such as variations in how users phrase the same query.
Clean the dataset to remove duplicates and irrelevant entries, which may confuse the agent.
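As an illustration of a Q&A-structured TXT file, the sketch below turns a list of entries into plain text. The `Q:` / `A:` / `Also asked as:` labels and the function name `format_qa_dataset` are assumptions chosen for readability, not a format required by GAME Cloud. Listing alternate phrasings next to each question is one simple way to cover query variations.

```python
def format_qa_dataset(entries: list[dict]) -> str:
    """Render Q&A entries as plain text, one blank line between entries.

    Each entry needs 'question' and 'answer'; 'variations' is optional.
    """
    blocks: list[str] = []
    for entry in entries:
        lines = [f"Q: {entry['question']}"]
        for variation in entry.get("variations", []):
            lines.append(f"Also asked as: {variation}")
        lines.append(f"A: {entry['answer']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


entries = [
    {
        "question": "Why is my transaction failing?",
        "variations": ["My swap didn't go through"],
        "answer": "Check your gas settings and slippage tolerance, then retry.",
    },
]
print(format_qa_dataset(entries))
```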
Example 2: Crypto Sentiment Analysis (Twitter/Telegram Discussions)
Format: TXT (dataset structure)
Key Considerations:
Ensure text is structured consistently with clear delimiters (e.g., |).
Add metadata, such as timestamps and platforms, to support time-based analysis.
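A delimiter-based record with metadata could look like the sketch below. The field order (timestamp, platform, text), the ISO 8601 timestamp format, and the function name `format_record` are assumptions. The message text is cleaned so that stray `|` characters or line breaks cannot break the one-record-per-line structure.

```python
from datetime import datetime, timezone


def format_record(timestamp: datetime, platform: str, text: str) -> str:
    """Build one pipe-delimited line: timestamp|platform|text."""
    # Replace the delimiter inside the message, then collapse whitespace
    # (including newlines) so each record stays on a single line.
    clean_text = " ".join(text.replace("|", "/").split())
    return f"{timestamp.isoformat()}|{platform}|{clean_text}"


posted = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(format_record(posted, "telegram", "gm | wagmi\nfrens"))
# prints: 2025-01-15T12:00:00+00:00|telegram|gm / wagmi frens
```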
Example 3: Web 3 Project Support Agent
Format: PDF (Referencing FAQs, Tutorials, User Instructions, Research Paper)
Key Considerations:
Misinterpretation Prevention: Users may phrase the same query differently ("Why is my transaction failing?" vs. "My swap didn’t go through")—train AI to match intent, not just keywords.
Security-related questions (e.g., “Is my wallet safe?”) should be handled with caution, linking to official documentation and disclaimers rather than speculative answers.
And that’s a wrap! 🚀
But hey, remember: quality > quantity, every time!
Your AI agent doesn’t need a data dump—all it needs is clean, structured, and relevant data! WAGMI! 🤖