Intro
A while back, the team at Galactic built a fantastic (and fast) NFT search engine called Fancy.art. Very simply, we wanted to be able to search across all the NFTs on mainnet quickly. To make it happen, we built crawlers that crawled mainnet for all ERC721 and ERC1155 NFTs and listened to all the ETH events.
It was awesome!
But also, many things weren’t awesome (some we anticipated, some we didn’t). Ultimately, we’ve decided to put Fancy.art on ice because we’re focused on something else, but we want to share our thoughts about this fascinating and challenging experiment.
Getting all the ERC721s & ERC1155s
The first big task was to index all ERC721s and ERC1155s. Crawling was a must, and we used ArchiveNode.io to crawl Ethereum backward until we had everything. It might sound simple, but there were plenty of complications along the way. Our code wasn’t great initially, and we accidentally flooded ArchiveNode with requests, but they were super helpful and worked with us to debug the problem. We’d highly recommend working with them. Eventually, we had a crawler maintaining a searchable index of all ERC721s and 1155s on mainnet.
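To make the "crawl backward without flooding the provider" idea concrete, here is a minimal sketch. It is not the crawler Fancy.art actually ran. The function names, the chunk size, and the `fetch_logs` callback are all hypothetical; in a real crawler `fetch_logs` would wrap whatever log query your node provider exposes.

```python
import time
from typing import Callable, Iterator, List, Tuple


def backward_block_ranges(latest: int, chunk: int, earliest: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (from_block, to_block) ranges, newest first."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    to_block = latest
    while to_block >= earliest:
        from_block = max(earliest, to_block - chunk + 1)
        yield from_block, to_block
        to_block = from_block - 1


def crawl_backward(
    fetch_logs: Callable[[int, int], List],  # hypothetical: returns logs for a block range
    latest: int,
    chunk: int,
    max_requests_per_second: float,
    earliest: int = 0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List:
    """Walk the chain backward in chunks, spacing requests to respect a rate cap."""
    min_interval = 1.0 / max_requests_per_second
    results: List = []
    last_call = None
    for from_block, to_block in backward_block_ranges(latest, chunk, earliest):
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # throttle so the provider is not flooded
        last_call = clock()
        results.extend(fetch_logs(from_block, to_block))
    return results
```

A fixed delay between requests is the simplest possible throttle. A production crawler would probably also want retries with backoff and a checkpoint of the last block range processed.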
Next, we needed to monitor all ETH events. As you might imagine, the number of ETH events on mainnet is enormous (and will only grow as more people use Ethereum). We listened for ERC721 and ERC1155 mint events in each new block, and whenever we discovered new ERC721s or ERC1155s, we added them to the index and kept listening for more.
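One common way to spot mints in raw logs is to look for transfer events whose sender is the zero address. The sketch below assumes logs shaped like the JSON-RPC `eth_getLogs` result (a `topics` list of hex strings). The topic hashes are the widely published signatures for `Transfer`, `TransferSingle`, and `TransferBatch`, but verify them before relying on them. `classify_mint` is a hypothetical helper, not something from the Fancy.art codebase.

```python
from typing import Optional

# A 32-byte zero topic, i.e. the zero address padded to a full word.
ZERO_TOPIC = "0x" + "0" * 64

# keccak256 of the event signatures (verify before relying on them).
TRANSFER = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
TRANSFER_SINGLE = "0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62"
TRANSFER_BATCH = "0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb"


def classify_mint(log: dict) -> Optional[str]:
    """Return "erc721" or "erc1155" if the log looks like a mint, else None."""
    topics = [t.lower() for t in log.get("topics", [])]
    if len(topics) != 4:
        # ERC20 shares the Transfer signature but indexes only from/to (3 topics).
        return None
    signature = topics[0]
    # ERC721 Transfer: topics = [signature, from, to, tokenId]
    if signature == TRANSFER and topics[1] == ZERO_TOPIC:
        return "erc721"
    # ERC1155 TransferSingle/TransferBatch: topics = [signature, operator, from, to]
    if signature in (TRANSFER_SINGLE, TRANSFER_BATCH) and topics[2] == ZERO_TOPIC:
        return "erc1155"
    return None
```

This only catches contracts that emit events the way the specs describe. Contracts that deviate from the specs will slip through and need separate handling.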
In terms of storage, we stored everything in Firestore as a document store, allowing us to have a rigid schema for on-chain events and a flexible schema for NFT assets. One thing we quickly discovered is that Firestore is cheap… until it suddenly becomes VERY expensive! We also learned that the Google Cloud Firestore tools are way better than the Firebase tools.
When it came to scaling, we used some queues, but they became a bit more complicated and expensive than they were worth. This resulted in a lot of refactoring as the scale of the problem came into focus, and we needed to make everything more sustainable.
NFTs have a metadata problem
In terms of the data itself, we started to feel pain pretty quickly. Normalizing the data was going to be a thankless task. The challenges ranged from the ERC721 and ERC1155 specs not being followed to the lack of an established metadata spec, and ultimately, any kind of bad data you can imagine was possible. We have a lot of empathy for anyone who is doing this. You are our heroes!
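To give a flavor of what normalization involves, here is a small sketch that maps a few common metadata shapes onto one structure. The alternate key names (`title`, `image_url`, `imageUrl`, `traits`, `properties`) and the output shape are illustrative assumptions, not an inventory of what actually appears on mainnet, and `normalize_metadata` is a hypothetical helper.

```python
from typing import Any, Dict, List


def normalize_metadata(raw: Any) -> Dict[str, Any]:
    """Coerce loosely structured token metadata into {name, image, traits}."""
    if not isinstance(raw, dict):
        raw = {}  # metadata that is not even a JSON object becomes an empty record

    name = raw.get("name") or raw.get("title") or ""
    image = raw.get("image") or raw.get("image_url") or raw.get("imageUrl") or ""
    attrs = raw.get("attributes") or raw.get("traits") or raw.get("properties") or []

    traits: List[Dict[str, Any]] = []
    if isinstance(attrs, dict):
        # Some projects publish traits as a flat {"color": "red"} mapping.
        for key, value in attrs.items():
            traits.append({"trait_type": str(key), "value": value})
    elif isinstance(attrs, list):
        # Others publish a list of {"trait_type": ..., "value": ...} objects.
        for item in attrs:
            if isinstance(item, dict) and "value" in item:
                label = item.get("trait_type") or item.get("name") or ""
                traits.append({"trait_type": str(label), "value": item["value"]})

    return {"name": str(name).strip(), "image": str(image).strip(), "traits": traits}
```

Every branch here corresponds to one quirk. The real work is that the list of quirks never really ends.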
By the time we crawled everything and had everything indexed, we found ourselves storing a LOT of data. In total, we had about 105TB in images and assets. The cache of the asset data was about 10TB; even the on-chain data was massive! Between the storage and constant crawling, things started to get a bit expensive to run (~$10K/month).
Search needs to be fast (and affordable)
With all this data indexed and available to us, we just needed to search it, and we landed on Typesense as our search engine. It was speedy for the end user and compatible with our desire for “instant search.” Just as importantly, it wasn’t super expensive.
We started with Algolia, but from a cost standpoint, it wasn’t sustainable for us with the amount of data we were already dealing with. We liked it, but we can’t really recommend it if you have a large, growing data set.
Assets are hard
A significant and ongoing issue is that there aren’t any widely adopted standards when it comes to asset storage. IPFS has its challenges and can feel very temporary. If an asset isn’t pinned, it’ll become inaccessible, and even if it is pinned, it can be slow to access. There were also a LOT of 404s on assets. Whether they weren’t pinned, lived in Google Docs, sat on centralized hosts, or belonged to dead projects, the assets were often unavailable, leaving the user experience inconsistent.
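As one small example of the asset plumbing involved, `ipfs://` URIs have to be rewritten into HTTP gateway URLs before a browser or a caching job can fetch them. The sketch below is an illustration under assumptions: the gateway list is just a pair of public gateways whose availability can change, and `to_gateway_urls` is a hypothetical helper.

```python
from typing import List, Sequence

# Public gateways used here for illustration only; availability can change.
DEFAULT_GATEWAYS = ("https://ipfs.io/ipfs/", "https://dweb.link/ipfs/")


def to_gateway_urls(uri: str, gateways: Sequence[str] = DEFAULT_GATEWAYS) -> List[str]:
    """Return candidate HTTP URLs for an asset URI, in the order to try them."""
    uri = uri.strip()
    if uri.startswith("ipfs://"):
        path = uri[len("ipfs://"):]
        # Some metadata uses the redundant form ipfs://ipfs/<cid>/...
        if path.startswith("ipfs/"):
            path = path[len("ipfs/"):]
        path = path.lstrip("/")
        return [gateway.rstrip("/") + "/" + path for gateway in gateways]
    # Anything else (http, https, data URIs) is passed through unchanged.
    return [uri]
```

Returning several candidates lets a fetcher fall back to the next gateway when one is slow or returns an error, which helps with slowness but does nothing for assets that were never pinned.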
In the end, we decided to host everything on our own servers. This was both good and bad. On the one hand, it meant the assets we had cached loaded much faster and without 404s. On the other hand, many images still didn’t load because the source assets were misconfigured, outdated, or no longer existed.
Onchain and Offchain Data
We pulled a lot of data together from on-chain, but it’s also clear that there’s a lot of valuable information about NFTs that’s not on-chain or stored in a normalized fashion. For example, when looking at NFTs, users often want to see floor prices and collection information. OpenSea is an excellent example of a centralized, normalized data set that lives off-chain: when users want data about a collection, many of them go to OpenSea. With that in mind, there’s a risk in anything that adds value but isn’t decentralized or open. An interesting takeaway is that the combination of on-chain and off-chain data about NFTs may be essential for users.
Overall User Experience
When creating Fancy.art, we wanted to search individual NFTs across collections based on their metadata. This was great in that we could pull traits and commonalities together. It resulted in some great ways to discover fun and individual NFTs that led us to learn about collections we didn’t know existed.
We also wanted to create a way to search NFTs without explicitly being a marketplace. However, people naturally think in terms of collections and prices, so the overall user experience felt like it was missing some context. For the average user, it may make the most sense for search and discovery tools to be collection-centric.
There’s so much more to go into here around UX and design, so we’ll save up those lessons for a future writeup as we continue to learn!
Why we’re shutting down Fancy
Fancy.art has been a fun and challenging experiment, one we hope will inspire more unique ways to discover NFTs. The response was awesome and encouraging. Searching NFTs quickly was something people enjoyed, and we got a lot of positive feedback.
However, we’re putting Fancy on ice. We’re shutting it down primarily because our focus is elsewhere (we’re excited to share what we’re up to - stay tuned), and we need to be laser-focused. And ultimately, it gets expensive to run because it covers 100% of the ERC721s and ERC1155s on mainnet (including cached images, etc.).
We’re excited about the next set of work that’ll be happening in the NFT space. So much has already been built in a short time, and we’re pumped to see what’ll happen.
That said…
If you have some ideas on what we should do with it, join our Discord to let us know! We want NFTs to be searchable and our hard work to survive. And anything is possible!