Your storefront web pages will be visited and indexed by search engines, allowing your website to be displayed in search results pages. If you want to prevent search engines from indexing parts of your storefront, you can update your search engine robots files. When switching to sitewide HTTPS, these files will automatically update so the appropriate pages are indexed.
Be careful! We do not recommend changing your Robots or HTTPS Robots text files unless you know exactly what you are doing. Changing these files will directly affect what pages are crawled by search engines, and hence affect what content users will be able to find as well as your search engine rankings.
If you choose to host your entire site on HTTPS, using our site-wide HTTPS feature, we’ll automatically back up and adjust your Robots and HTTPS Robots text files. These backup files are located in the root folder when you connect to your store via WebDAV. No updates to the files are necessary.
Viewing and Editing Your Search Engine Robots Files
To view or edit the Robots.txt file, go to Store Setup › Store Settings under the Website tab and scroll down to the Search Engine Robots section.
If you want to request that search engine robots not crawl a particular page or subdirectory, add Disallow: followed by the URL. For example:
- Disallow: /shipping-return-policy.html
- Disallow: /content/
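If you want to sanity-check rules like these before publishing them, you can feed them to Python's standard-library robots.txt parser. This is an illustrative sketch, not part of BigCommerce's tooling; the domain example.com and the page paths are placeholder assumptions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body containing the two example rules above.
robots_txt = """\
User-agent: *
Disallow: /shipping-return-policy.html
Disallow: /content/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The named page and everything under /content/ are blocked...
print(rp.can_fetch("*", "https://example.com/shipping-return-policy.html"))
print(rp.can_fetch("*", "https://example.com/content/terms.html"))
# ...while the rest of the site remains crawlable.
print(rp.can_fetch("*", "https://example.com/products.html"))
```

Note that `Disallow` rules are prefix matches, so `Disallow: /content/` blocks every URL under that subdirectory.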
These are examples from the default Robots text file for a store without sitewide HTTPS. See Sitewide HTTPS for examples of the robots.txt used on sitewide HTTPS sites. The HTTP robots.txt contains the following:
User-agent: AdsBot-Google
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/

User-agent: *
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/
There is also an HTTPS Robots text file, which prevents any secure version of a webpage from being crawled. By default, it contains the following:
User-agent: AdsBot-Google
Disallow: /account.php
Disallow: /cart.php
Disallow: /checkout.php
Disallow: /finishorder.php
Disallow: /login.php
Disallow: /orderstatus.php
Disallow: /postreview.php
Disallow: /productimage.php
Disallow: /productupdates.php
Disallow: /remote.php
Disallow: /search.php
Disallow: /viewfile.php
Disallow: /wishlist.php
Disallow: /admin/
Disallow: /__socialshop/

User-agent: *
Disallow: /

User-agent: google-xrawler
Allow: /feeds/*
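The practical effect of this file is that ordinary crawlers are shut out of the secure site entirely (`User-agent: *` / `Disallow: /`), while AdsBot-Google is only kept away from the specific pages listed for it. A small sketch using Python's standard-library parser illustrates this; the file below is abbreviated, and example.com is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated version of the default HTTPS robots rules shown above.
https_robots = """\
User-agent: AdsBot-Google
Disallow: /account.php
Disallow: /cart.php

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(https_robots.splitlines())

# Generic crawlers are blocked from the entire secure site...
print(rp.can_fetch("Googlebot", "https://example.com/products/widget.html"))
# ...while AdsBot-Google is blocked only from the listed pages.
print(rp.can_fetch("AdsBot-Google", "https://example.com/products/widget.html"))
print(rp.can_fetch("AdsBot-Google", "https://example.com/cart.php"))
```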
You can use these defaults if you want to revert to the original robots.txt files. They only apply if you are not using Sitewide HTTPS.
Files Disallowed by Default
User-agent: AdsBot-Google or * — If the value is *, the disallow rules that follow apply to ALL bots/spiders. If the value is "AdsBot-Google", the disallow rules that follow apply only to “AdsBot-Google”, the bot Google uses to crawl landing pages associated with an advertisement, typically paid search through the AdWords platform. “AdsBot-Google” is also used for display ads delivered through DoubleClick, AdWords and AdSense.
Disallow: /account.php — This line prevents AdsBot-Google from crawling the store account pages. More specifically, these pages are normally accessed when a store visitor registers with the store in order to complete a purchase, get status updates on their orders, etc. This does not relate to the store owner’s BigCommerce account.
Disallow: /cart.php — This prevents the crawling of the cart page. Since the cart page is dependent on the items a store user selects, it would be odd to have this page listed in search engines. Additionally, it would provide a poor user experience to new site visitors if they landed on a cart page with items that were selected by someone else.
Disallow: /checkout.php — This prevents the crawling of the checkout page. Like the cart page, this page is dependent on user input and would be of no value as a search result. Additionally, the checkout page could contain sensitive data such as name, email, addresses, credit card info, etc, so by preventing this page from appearing in search engines, BigCommerce protects the personal data of consumers purchasing from any store and maintains PCI compliance.
Disallow: /finishorder.php — Finishorder.php typically contains a lot of personal data. By preventing search engines from crawling this page, BigCommerce protects consumer data and maintains PCI compliance.
Disallow: /login.php — This prevents the crawling of the store customer login page. Since this page has very little content and no value to new visitors of the store, it is blocked from search engines.
Disallow: /orderstatus.php — The order status page requires a user to log in before its content can be seen. Since search engines do not have store accounts and cannot input data into text fields, this page is blocked.
Disallow: /postreview.php — Similar to the orderstatus.php page, a user is required to log in before posting a product review. Since search engines do not have store accounts and cannot input data into text fields, this page is blocked.
Disallow: /productimage.php — Productimage.php is used to create a jQuery lightbox window on product pages, typically triggered when a user clicks on a product image. Because the pop-up window is not a dedicated page with its own URL, and because it duplicates some of the text on the product page, it is blocked to prevent duplicate content, missing title tag and description warnings in Search Console (Webmaster Tools), and thin content penalties.
Disallow: /productupdates.php — No longer used.
Disallow: /remote.php — Used for the store's AJAX calls; it does not produce a page that is usable by a human.
Disallow: /search.php — This page handles searches from the search box on a store. Google has previously stated that search results pages are not something they want in their index because it creates a poor user experience (going from a search results page to another search results page instead of going directly to the result).
Disallow: /viewfile.php — Used to attach files to an order. This typically happens with digital transactions (i.e. digital downloads, PDFs, etc). Because the object being sold is a digital good, having it indexed would make it available to those who did not purchase the file.
Disallow: /wishlist.php — Wishlist.php is user dependent and would provide little to no value to searchers. Additionally, depending on how many products a user adds to a wishlist, the pages could be considered thin content and/or duplicate content. So to prevent a poor user experience and eliminate concerns around thin/duplicate content, this page is blocked.
Disallow: /admin/ — The store login path is blocked for security reasons. By making the login page hard to find for anyone other than the store owner, direct attacks by hackers are somewhat thwarted. Additionally, this page would be of no value to a searcher.
Disallow: /__socialshop/ — This path is used for the Social Shop integration with Facebook, allowing the page to be rendered in an iframe on Facebook. Since the page is configured for Facebook, allowing it to be crawled would create a less than ideal user experience and may cause the page to not function correctly.
Disallow: /?_bc_fsnf=1 — This blocks bots from following faceted search links and causing performance issues.
Disallow: /&_bc_fsnf=1 — This blocks bots from following faceted search links and causing performance issues.
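Under standard prefix matching, these two rules block crawlers from following links whose URL begins with the faceted search marker parameter. A sketch with Python's standard-library parser (example.com and the /shoes/ path are placeholders, and real search engines may match query strings slightly differently):

```python
from urllib.robotparser import RobotFileParser

# The two faceted search rules described above, applied to all bots.
rules = """\
User-agent: *
Disallow: /?_bc_fsnf=1
Disallow: /&_bc_fsnf=1
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A faceted search link carrying the marker parameter is blocked...
print(rp.can_fetch("*", "https://example.com/?_bc_fsnf=1"))
# ...while ordinary pages are unaffected.
print(rp.can_fetch("*", "https://example.com/shoes/"))
```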
Will my trial store be indexed by search engines?
- All BigCommerce trials are set to private when they are created. This keeps search engines from indexing your store while it is still under development and prevents the public from browsing your store until it's ready for launch.
Google Search Console is showing the warning "Indexed, though blocked by robots.txt" for some of my URLs. What should I do?
- By default, the robots.txt file blocks URLs that pertain to customer checkout and accounts. These should remain blocked for security reasons. If you have not altered the robots.txt file, the warnings can be disregarded; they only notify you that some URLs are blocked so you can confirm this was intentional.