Do you want users to visit your site? For that, you first need to make sure Google visits your site. Google does this with automated bots called crawlers, and a robots.txt file is how you tell those crawlers which parts of your site they may visit.
Crawlers get their name from the way they crawl through your website's pages, gathering the information Google uses to decide whether users will find your website relevant.
In this blog, we'll walk you through how to create a custom robots.txt file. It's a fairly quick and easy task on the technical SEO front, so let's get right to it!
How to create a custom robots.txt file
As an SEO agency, we’ve created this file for many of our clients, helping crawlers visit their sites and index them correctly. It’s super easy, so we’d like to share this SEO technique with you. You can get your website crawled quickly and make it every user’s dream destination for buying your products and services.
1. Create a file named robots.txt
Firstly, you need a text editor to create a robots.txt file. Almost any will do: Notepad, TextEdit, or Emacs, for example. However, you cannot use a word processor, because word processors save files in proprietary formats and may add unexpected characters, like curly quotes, which crawlers find confusing.
Also, make sure to choose UTF-8 encoding if you are prompted for an encoding in the save file dialog.
What you should know about the format and location rules:
- Make sure to name the file robots.txt.
- Your site can have only one robots.txt file.
- The robots.txt file must be placed at the root of the website host to which it applies.
For instance, to manage crawling on all URLs below https://www.sitepage.com/, the robots.txt file should be placed at https://www.sitepage.com/robots.txt.
Do not place the file in a subdirectory (for example, at https://example.com/pages/robots.txt). Yes, technical SEO can be a little tricky, so we recommend you contact your SEO agency to help you. Don’t have one yet? Connect with us today.
- A robots.txt file can also apply to subdomains (for example, https://website.subdomain.com/robots.txt) or on non-standard ports (for example, http://example.com:7272/robots.txt).
- Ensure that the robots.txt file is a UTF-8 encoded text file (UTF-8 includes ASCII). Google may ignore characters that fall outside the UTF-8 range, which could render your robots.txt rules invalid. A minimal example is shown below.
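To make the format and location rules concrete, here is a minimal sketch of a robots.txt file that allows all crawling; sitepage.com is just a placeholder host for illustration:

```text
# Lives at https://www.sitepage.com/robots.txt (the root of the host it applies to)
# A single group that applies to all crawlers and blocks nothing
User-agent: *
Disallow:
```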
2. Add rules to your robots.txt file
Do you want crawlers to crawl your entire website? You might think you do, but any expert and seasoned SEO agency would recommend against it. Here’s why: crawlers report what they find back to Google, which can make it publicly discoverable by the users you target. That’s a problem when some of your site pages contain confidential or private information, like transaction records, user account details, passwords, and so on.
That is why you need to set a few rules for your robots.txt file to make sure that the crawlers are aware of which pages to crawl and which ones to ignore.
- A robots.txt file consists of one or more groups, and each group contains one or more rules. Each group begins with a User-agent line that specifies the group’s target.
- Each group states which user agent it applies to, which directories or files that agent can access, and which it cannot.
- Crawlers process the groups from top to bottom, and a user agent can match only one rule set: usually the first, most specific group that applies to it.
- Remember that a user agent can crawl any page that is not blocked by a disallow rule, and that rules are case-sensitive. For example, disallow: /file.asp applies to https://www.sitepage.com/file.asp, but not to https://www.sitepage.com/FILE.asp.
- It’s also essential to remember that the # character marks the beginning of a comment; see the short example after this list.
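As a quick illustration of how groups and comments fit together, here’s a sketch with two groups; Googlebot and the paths shown are just placeholders:

```text
# Group 1: applies only to Googlebot
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2: applies to every other crawler
User-agent: *
Disallow: /private/
```

Googlebot matches the first, more specific group, so only /nogooglebot/ is off-limits to it; every other crawler falls through to the wildcard group.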
Now, you also need to know the directives that Google’s crawlers follow in a robots.txt file (this might get a little technical; bear with us!):
User-agent: This directive specifies the name of the search engine crawler that the rule applies to, and it is the first line of any rule group.
Disallow: A directory or page that the crawler should not crawl. The rule applies to the entire page or, if it ends with a /, to everything inside that directory.
Allow: A directory or page that the crawler may crawl, even if its parent directory is disallowed. For a single page, specify the full page name as it appears in the browser; for a directory, end the rule with a / mark.
Sitemap: The location of a sitemap for your site, which is a great way to point Google to the pages you want crawled. The sitemap URL must be a fully qualified URL, because Google does not assume or check http/https and www/non-www alternates. The example below pulls these directives together.
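Putting it all together, here’s a sketch of a robots.txt file using all four directives; the host, paths, and sitemap URL are placeholders you would swap for your own:

```text
# Block all crawlers from the checkout and account areas,
# but still allow the public help page inside /account/
User-agent: *
Disallow: /checkout/
Disallow: /account/
Allow: /account/help.html

# Point crawlers at the sitemap (must be a fully qualified URL)
Sitemap: https://www.sitepage.com/sitemap.xml
```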
3. Upload the robots.txt file to your site
You’ve added all the rules and done the technical SEO work required so far; the time has come to upload the robots.txt file. Once you save the file on your computer, make it available to search engine crawlers by placing it at the root of your site. The tool you use to upload the file will depend on your site and server architecture; one common approach is sketched below.
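Every host is different, but as one hedged example, if your host offers plain FTP access, a short Python sketch like this could push the file to the web root; the hostname, credentials, and remote directory are hypothetical placeholders, and many hosts use SFTP or a CMS file manager instead:

```python
from ftplib import FTP

# Hypothetical connection details; replace with your host's values.
FTP_HOST = "ftp.sitepage.com"
FTP_USER = "your-username"
FTP_PASS = "your-password"

with FTP(FTP_HOST) as ftp:
    ftp.login(FTP_USER, FTP_PASS)
    ftp.cwd("/public_html")  # assumed web root directory on the server
    with open("robots.txt", "rb") as f:
        # Upload so the file ends up at https://www.sitepage.com/robots.txt
        ftp.storbinary("STOR robots.txt", f)
```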
4. Test the robots.txt file
Once you have uploaded the file, test whether it is publicly accessible and whether Google can read it.
For this, open a private browsing window in your browser and navigate to the location of the robots.txt file (for example, https://www.sitepage.com/robots.txt). If you can locate and see the contents of your robots.txt file, your next step is to test the markup.
You can test it with the robots.txt report in Search Console, which works only for a robots.txt file that is already accessible on your site. Alternatively, your website developer can check out and build Google’s open source robots.txt library, which can be used to test files locally on your computer.
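As a quick local sanity check (not a substitute for Search Console or Google’s own library), you could also fetch and parse the live file with Python’s built-in urllib.robotparser; the URLs below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt location; use your own domain.
ROBOTS_URL = "https://www.sitepage.com/robots.txt"

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt file

# Check whether a couple of hypothetical URLs are crawlable for any crawler ("*")
for url in ("https://www.sitepage.com/", "https://www.sitepage.com/checkout/"):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "disallowed")
```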
Once your robots.txt file is live, Google’s crawlers will find and use it automatically when they crawl your website. From here onwards, Google will do the rest, while you can sit back with a job well done.
If you need an SEO agency to complete this technical SEO task for your website, contact us today.