Thursday 4 October 2012

External urls

If your website links to external URLs, you can tell search engines whether they should follow those links and whether pages should be indexed.

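Search engines understand two directives, "index" and "follow".

"Index" refers to whether or not you want a particular page to be indexed.

"Follow" refers to whether or not you want the spider to follow the links on that page.

For a whole page, you decide whether you want the page indexed and whether you want its links followed, then set the matching combination in the robots <META> tag (described below). For a single link, only "follow" can be controlled, by adding rel="nofollow" to the <a> tag. "index" and "noindex" are not valid rel values, so whether the linked page gets indexed is decided by that page's own robots settings.

This matters for SEO: search engines treat a nofollow link as one they should not follow or count as an endorsement of the linked page.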

For example, if http://sitejabber.com is an external link on your website, its HTML would be:

<a href="http://sitejabber.com" rel="nofollow" target="_blank">Click here</a>

or, if you are happy for search engines to follow the link (the default):

<a href="http://sitejabber.com" target="_blank">Click here</a>
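If your links are generated by a script, the same idea can be sketched in Python. This minimal sketch adds rel="nofollow" (and target="_blank") only to links whose host differs from your own. `SITE_HOST`, `is_external` and `render_link` are hypothetical names for illustration. The host comparison is exact, so `example.com` and `www.example.com` count as different sites.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical: replace with your own site's host name.
SITE_HOST = "www.example.com"


def is_external(href: str, site_host: str = SITE_HOST) -> bool:
    """True if href points to a host other than our own site."""
    # .hostname is already lowercased, and is None for relative links like "/about/"
    host = urlparse(href).hostname
    return host is not None and host != site_host.lower()


def render_link(href: str, text: str, site_host: str = SITE_HOST) -> str:
    """Build an <a> tag, adding rel="nofollow" and target="_blank" to external links."""
    attrs = [f'href="{escape(href, quote=True)}"']
    if is_external(href, site_host):
        attrs.append('rel="nofollow"')
        attrs.append('target="_blank"')
    return f"<a {' '.join(attrs)}>{escape(text)}</a>"


print(render_link("http://sitejabber.com", "Click here"))
print(render_link("/about/", "About us"))
```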

Robots meta tags


About the Robots <META> tag

In a nutshell

You can use a special HTML <META> tag to tell robots not to index the content of a page, and/or not scan it for links to follow.
For example:
<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
<body>...</body>
</html>
There are two important considerations when using the robots <META> tag:
  • robots can ignore your <META> tag. Malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, in particular will pay no attention.
  • the NOFOLLOW directive only applies to links on this page. It's entirely likely that a robot might find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at your undesired page.
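Don't confuse this NOFOLLOW with the rel="nofollow" link attribute. The META directive covers every link on the page, while rel="nofollow" applies only to the single link it is placed on.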

The details

Like the /robots.txt, the robots META tag is a de-facto standard. It originated from a "birds of a feather" meeting at a 1996 distributed indexing workshop, and was described in meeting notes.
The META tag is also described in the HTML 4.01 specification, Appendix B.4.1.
The rest of this page gives an overview of how to use the robots <META> tags in your pages, with some simple recipes. To learn more see also the FAQ.

How to write a Robots Meta Tag

Where to put it

Like any <META> tag it should be placed in the HEAD section of an HTML page, as in the example above. You should put it in every page on your site, because a robot can encounter a deep link to any page on your site.
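To check that a page really carries the tag in its HEAD section, a small sketch using Python's standard `html.parser` module could look like this. `RobotsMetaFinder` and `find_robots_meta` are hypothetical names. The sketch only looks inside an explicit `<head>` element and matches the NAME attribute case-insensitively, which is an assumption of the sketch.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects CONTENT values of <META NAME="ROBOTS"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.robots_values = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr_map = {name: (value or "") for name, value in attrs}
            if attr_map.get("name", "").lower() == "robots":
                self.robots_values.append(attr_map.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html_text):
    """Return the CONTENT of every robots META tag in the page's <head>."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.robots_values


page = ('<html><head><title>x</title>'
        '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
        '<body></body></html>')
print(find_robots_meta(page))
```

An empty list means the page has no robots META tag, so crawlers fall back to the default behaviour.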

What to put into it

The "NAME" attribute must be "ROBOTS".
Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW", so there's no need to spell that out. That leaves:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
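The Python sketch below shows how a crawler might read the CONTENT value. It is an illustration, not any search engine's actual code. `parse_robots_content` is a hypothetical name, and values are compared case-insensitively. A contradictory value such as "INDEX, NOINDEX" resolves to the more restrictive option. Case-insensitivity and this resolution rule are both assumptions of the sketch. Because INDEX and FOLLOW are the defaults, only the NO- variants change the result.

```python
from typing import Optional, Tuple


def parse_robots_content(content: Optional[str]) -> Tuple[bool, bool]:
    """Return (index, follow) for a robots META CONTENT value.

    None means there was no robots META tag at all, i.e. the default "INDEX,FOLLOW".
    """
    if content is None:
        return True, True
    # Comma-separated, surrounding spaces ignored, compared in upper case
    values = {token.strip().upper() for token in content.split(",")}
    # INDEX/FOLLOW are already the defaults, so only the NO- variants matter
    return "NOINDEX" not in values, "NOFOLLOW" not in values


for value in (None, "NOINDEX, FOLLOW", "INDEX, NOFOLLOW", "NOINDEX, NOFOLLOW"):
    print(value, parse_robots_content(value))
```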

To secure specific urls of a website

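Sometimes we want some URLs to be served over HTTPS and the rest over HTTP. With .htaccess it is easy to give search engine crawlers a separate robots.txt on the HTTPS port, so that only the chosen URLs are crawled over HTTPS.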


### AR SSL ######
Options +FollowSymlinks
RewriteEngine on
# On the HTTPS port, answer requests for robots.txt with robots_ssl.txt
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
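The rule says: if the request came in on port 443, serve robots_ssl.txt whenever robots.txt is requested. Below is a Python sketch of that decision. It escapes the dot in robots.txt so the pattern does not also match names like robotsXtxt. `robots_file_for` is a hypothetical helper, not part of Apache.

```python
import re


def robots_file_for(path: str, server_port: str) -> str:
    """Return the file Apache would serve for `path` under the SSL rewrite rule.

    `path` is the per-directory path as .htaccess sees it (no leading slash),
    and `server_port` is the value of %{SERVER_PORT}, which is a string.
    """
    # RewriteCond %{SERVER_PORT} ^443$
    on_ssl_port = re.fullmatch(r"443", server_port) is not None
    # RewriteRule ^robots\.txt$ robots_ssl.txt [L]
    if on_ssl_port and re.fullmatch(r"robots\.txt", path):
        return "robots_ssl.txt"
    return path


print(robots_file_for("robots.txt", "443"))
print(robots_file_for("robots.txt", "80"))
```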

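In robots.txt I list the folders and file types that crawlers may or may not fetch. These rules are for search engine robots, not human visitors. The * wildcard used below is an extension supported by major crawlers such as Google and Bing rather than part of the original robots.txt standard.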

robots.txt:

# rules for all crawlers
User-agent: *
Disallow: /backup/
Disallow: /site/
Disallow: /*.php
Disallow: /*%
Disallow: /*?

robots_ssl.txt:

# SSL robots
User-agent: *
Allow: /signup/
Allow: /register/
Allow: /user/forgotpass/
Disallow: /
 
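So over HTTPS, crawlers may only fetch the signup, register and forgot-password pages. Over HTTP, they may fetch everything that robots.txt does not disallow. Keep in mind that robots.txt only guides well-behaved crawlers. It does not switch a page between HTTP and HTTPS, and it does not stop visitors from opening it.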
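Crawlers differ in how they combine Allow and Disallow. Google and the robots.txt standard RFC 9309 use the most specific rule, meaning the longest matching path, and let Allow win a tie. They also treat `*` as "any characters" and a trailing `$` as "end of URL". Python's built-in `urllib.robotparser` instead takes the first matching line and does not expand `*`. For such first-match parsers, it is safest to list the Allow lines before `Disallow: /`. Below is a minimal sketch of the longest-match rule for a single `User-agent` group. `is_allowed` and the rule lists are hypothetical, and user-agent selection and percent-encoding are ignored.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" or "disallow", path pattern)


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: List[Rule], url_path: str) -> bool:
    """Longest matching rule wins; on a tie, allow wins. No match means allowed."""
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" disallows nothing
        # re.match anchors at the start of the path, like robots.txt prefixes
        if _pattern_to_regex(pattern).match(url_path):
            if len(pattern) > best_length or (
                len(pattern) == best_length and directive == "allow"
            ):
                best_length = len(pattern)
                allowed = directive == "allow"
    return allowed


HTTP_RULES = [
    ("disallow", "/backup/"),
    ("disallow", "/site/"),
    ("disallow", "/*.php"),
    ("disallow", "/*%"),
    ("disallow", "/*?"),
]
SSL_RULES = [
    ("allow", "/signup/"),
    ("allow", "/register/"),
    ("allow", "/user/forgotpass/"),
    ("disallow", "/"),
]

print(is_allowed(SSL_RULES, "/signup/"), is_allowed(SSL_RULES, "/products/"))
```

With longest-match rules, the order of the lines inside a group does not matter, which is why Google can honour `Allow: /signup/` even when it appears after `Disallow: /`.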
 
Please leave feedback if you have any.

To restrict file access on the server with .htaccess


Your web server should refuse requests for sensitive file types such as configuration files, logs, scripts, backups, archives and database dumps. This can be done in the .htaccess file:
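# Exclude file types (Apache 2.4 uses Require, 2.2 uses Order/Deny) # AR 26/6/2012
<FilesMatch "\.(htaccess|htpasswd|ini|log|sh|inc|bak|tar|gz|sql|zip)$">
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order Allow,Deny
        Deny from all
    </IfModule>
</FilesMatch>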
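The FilesMatch pattern is an ordinary regular expression tested against the file name. The Python sketch below runs the same pattern to show which names it catches. `is_blocked` is a hypothetical helper. The pattern as written is case-sensitive, so a file called DUMP.SQL would not match it.

```python
import re

# Same regular expression as in the <FilesMatch> block.
BLOCKED = re.compile(r"\.(htaccess|htpasswd|ini|log|sh|inc|bak|tar|gz|sql|zip)$")


def is_blocked(filename: str) -> bool:
    """True if the <FilesMatch> pattern matches this file name."""
    return BLOCKED.search(filename) is not None


for name in (".htaccess", "dump.sql", "site.tar.gz", "index.php", "DUMP.SQL"):
    print(name, is_blocked(name))
```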
 
Enjoy! 

Add trailing slashes to all URLs of a website

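Search engines such as Google treat a URL with a trailing slash (http://www.example.com/page/) and the same URL without it (http://www.example.com/page) as two different pages. It is therefore best to use one form consistently and 301-redirect the other to it. This is easy to do with .htaccess. Here is the code, which adds a trailing slash to every URL that is not a real file.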


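### AR add trailing slash ######
RewriteEngine On
RewriteBase /
# Leave real files (robots.txt, images, ...) alone
RewriteCond %{REQUEST_FILENAME} !-f
# Only rewrite URLs that do not already end in "/"
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]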
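The Python sketch below traces the rule's three lines step by step. `trailing_slash_redirect` is a hypothetical helper. `is_file` stands in for Apache's `-f` test on the real file system, which this sketch approximates with URL paths. www.example.com is the same placeholder host used in the rule, and query strings are ignored.

```python
from typing import Callable, Optional

# Placeholder host, same as in the rewrite rule.
BASE_URL = "http://www.example.com"


def trailing_slash_redirect(
    request_uri: str, is_file: Callable[[str], bool], base_url: str = BASE_URL
) -> Optional[str]:
    """Return the 301 target the rule would send, or None if no redirect applies."""
    # RewriteCond %{REQUEST_FILENAME} !-f : real files are left alone
    if is_file(request_uri):
        return None
    # RewriteCond %{REQUEST_URI} !(.*)/$ : URLs already ending in "/" are left alone
    if request_uri.endswith("/"):
        return None
    # RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]
    return f"{base_url}{request_uri}/"


existing_files = {"/robots.txt", "/logo.png"}  # hypothetical files on the server
print(trailing_slash_redirect("/about", existing_files.__contains__))
```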

Hope this is useful for you!