Robots META tag

**geosoft** · 26th November 2004, 11:55

Although many of you already know this, nobody is born knowing everything, so I think this short guide, and the others that will follow, will be useful to some of you.

What this meta tag does and how we can use it on our site.

First of all, this meta tag is addressed to bots, that is, to the crawlers of search engines such as Google, MSN, Yahoo, etc. This is different from excluding bots with robots.txt, which I will talk about later.

With this meta tag we tell the search engine two things:
- whether or not to index the page in question
- whether or not to follow the links on the current page.

How do we do that? Simple!

The meta tag looks like this:
<META NAME="ROBOTS" CONTENT="">

Of course, we don't leave CONTENT="" empty. As I said, we have the following options, and we pick one value from each pair, separated by a comma:
- index the page: "INDEX" | don't index the page: "NOINDEX"
- follow the links: "FOLLOW" | don't follow the links: "NOFOLLOW"

And finally, a few examples:

<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
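To make the two choices concrete, here is a minimal sketch of how a crawler might read this tag, using Python's standard html.parser module. The names RobotsMetaParser and robots_meta_policy are hypothetical, and the sketch assumes that a page without a robots meta tag is treated as index,follow. Tag and attribute names are case-insensitive in HTML, which is why <META NAME="ROBOTS"> and <meta name="robots"> behave the same.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    HTMLParser lowercases tag and attribute names, so <META NAME="ROBOTS">
    and <meta name="robots"> are handled the same way.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values keep their case, so compare them lowercased.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex,follow".
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_policy(html):
    """Return (may_index, may_follow) for a page (hypothetical helper).

    Assumption: with no robots meta tag, a crawler indexes and follows.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<html><head><META NAME="ROBOTS" CONTENT="noindex,follow"></head></html>'
print(robots_meta_policy(page))  # (False, True)
```

Only the restrictive values (NOINDEX, NOFOLLOW) change the result; INDEX and FOLLOW simply confirm the defaults.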

Of course, there are other directives we can give as well, one of them being NOARCHIVE (or ARCHIVE).
It looks like this: <META NAME="robots" CONTENT="noarchive">, and we can combine it with any of the cases in the examples above. Most search engines store a copy (a cache) of our page on one of their servers. This is useful especially when our site has run into problems and is no longer online. If we don't want the search engine to cache the page, we use NOARCHIVE. ARCHIVE is, of course, the exact opposite of NOARCHIVE.

And a small example:
<META NAME="robots" CONTENT="noindex,nofollow,noarchive">
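If you generate pages from a script, the combinations can be produced from three yes/no choices. This is a small sketch with a hypothetical helper, robots_meta_tag; it assumes caching is allowed unless we ask otherwise, so it only ever adds "noarchive" and never writes "archive".

```python
def robots_meta_tag(index=True, follow=True, archive=True):
    """Build a robots meta tag from three yes/no choices (hypothetical helper)."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    # Caching is the search engine's default, so only the opt-out is written.
    if not archive:
        directives.append("noarchive")
    return '<meta name="robots" content="%s">' % ",".join(directives)


print(robots_meta_tag(index=False, follow=False, archive=False))
# <meta name="robots" content="noindex,nofollow,noarchive">
```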

**geosoft** · 26th November 2004, 13:40

Bots (spiders), that is, the search engine crawlers we talked about, also look for a file named robots.txt in the root of the site.
If it exists, well-behaved bots follow the instructions we write there, namely which pages or directories they may crawl (download) and which they may not.
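Because the file must sit in the root, a crawler can work out its address from any page on the site. A minimal sketch using Python's urllib.parse.urljoin; robots_txt_url is a hypothetical name and www.example.com is a placeholder host.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the robots.txt address for the site serving page_url.

    "/robots.txt" is an absolute path, so urljoin keeps the scheme, host and
    port of page_url and replaces its path, query and fragment.
    """
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://www.example.com/forum/thread.html?id=42"))
# http://www.example.com/robots.txt
```

A robots.txt placed in a subdirectory (for example /forum/robots.txt) is not where bots look, so its rules would simply be ignored.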

Let's explain this with a small example:

User-agent: *
Disallow: /

- A short explanation: with "User-agent: *" we specify the search engine, and in our case * means any search engine.
- With "Disallow: /" we specified the root directory, so because of this line the bot may not crawl anything on our site. The opposite is "Disallow:" with an empty value, which allows everything.
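Python's standard library includes a robots.txt parser, urllib.robotparser, which makes it easy to check what these two rules mean. A minimal sketch; can_crawl is a hypothetical helper, and the rules are fed in as text instead of being downloaded from a site.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, path):
    """Answer "may user_agent fetch path?" for the given robots.txt text."""
    parser = RobotFileParser()
    # parse() expects the file's lines without trailing newlines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # root disallowed: nothing may be crawled
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: everything may be crawled

print(can_crawl(BLOCK_ALL, "Googlebot", "/index.html"))  # False
print(can_crawl(ALLOW_ALL, "Googlebot", "/index.html"))  # True
```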

A few small examples:

To keep bots away from a single file we use
User-agent: *
Disallow: /fisier.html

And for a directory we use
User-agent: *
Disallow: /cgi-bin/
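A Disallow value is compared as a prefix of the URL path, and that path always starts with "/". The sketch below, again with Python's urllib.robotparser and a hypothetical make_parser helper, checks a file rule and a directory rule, and shows that a rule written without the leading slash (Disallow: fisier.html) never matches in Python's parser.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = make_parser("""\
User-agent: *
Disallow: /fisier.html
Disallow: /cgi-bin/
""")

for path in ["/fisier.html", "/cgi-bin/search.cgi", "/index.html"]:
    print(path, rules.can_fetch("AnyBot", path))
# /fisier.html False
# /cgi-bin/search.cgi False
# /index.html True

# Without the leading slash the rule is never a prefix of a real path,
# so Python's parser lets the file through.
no_slash = make_parser("User-agent: *\nDisallow: fisier.html\n")
print(no_slash.can_fetch("AnyBot", "/fisier.html"))  # True
```

Because matching is by prefix, "Disallow: /cgi-bin/" also covers everything below that directory, such as /cgi-bin/search.cgi.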

Of course, we can also specify which search engine a rule applies to:

User-agent: googlebot
or
User-agent: WebCrawler

And in the end our list might look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /

User-agent: MSNBot
Disallow:

User-agent: Teoma
Disallow: /

User-agent: Gigabot
Disallow: /

User-agent: Scrubby
Disallow: /

User-agent: Robozilla
Disallow:
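A list like this can be checked with Python's urllib.robotparser. This sketch uses a shortened copy (only some of the records); the bot name SomeBot is made up to stand for any crawler without its own record. Keep in mind it shows what this one parser does: Python picks the first record whose name is a case-insensitive substring of the bot's name, and within a record the first matching rule, while Google, for example, uses the most specific user-agent group and the longest matching rule.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /

User-agent: Teoma
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Teoma", "/index.html"))        # False: blocked everywhere
print(parser.can_fetch("SomeBot", "/cgi-bin/x.cgi"))   # False: falls back to "*"
print(parser.can_fetch("SomeBot", "/index.html"))      # True
print(parser.can_fetch("Googlebot", "/images/a.gif"))  # True: its own record

# Python quirk: "googlebot" is a substring of "googlebot-image", so the
# Googlebot record is matched first and the image bot is allowed here.
print(parser.can_fetch("Googlebot-Image", "/images/a.gif"))  # True

# An empty "Disallow:" placed first in the "*" record matches every path,
# so Python's parser would stop there and allow /cgi-bin/ as well.
with_empty = ROBOTS_TXT.replace("User-agent: *\n", "User-agent: *\nDisallow:\n", 1)
parser_empty = RobotFileParser()
parser_empty.parse(with_empty.splitlines())
print(parser_empty.can_fetch("SomeBot", "/cgi-bin/x.cgi"))  # True
```

The last check is why a record should not mix an empty "Disallow:" with real Disallow lines: the empty value means "allow everything" and contradicts the rest of the record.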

**geosoft** · 26th November 2004, 13:44

User-agent: Mozilla/3.0 (compatible;miner;mailto:miner@miner.com.br)
Disallow:

User-agent: WebFerret
Disallow:

User-agent: Due to a deficiency in Java it's not currently possible to set the User-agent.
Disallow:

User-agent: no
Disallow:

User-agent: 'Ahoy! The Homepage Finder'
Disallow:

User-agent: Arachnophilia
Disallow:

User-agent: ArchitextSpider
Disallow:

User-agent: ASpider/0.09
Disallow:

User-agent: AURESYS/1.0
Disallow:

User-agent: BackRub/*.*
Disallow:

User-agent: Big Brother
Disallow:

User-agent: BlackWidow
Disallow:

User-agent: BSpider/1.0 libwww-perl/0.40
Disallow:

User-agent: CACTVS Chemistry Spider
Disallow:

User-agent: Digimarc CGIReader/1.0
Disallow:

User-agent: Checkbot/x.xx LWP/5.x
Disallow:

User-agent: CMC/0.01
Disallow:

User-agent: combine/0.0
Disallow:

User-agent: conceptbot/0.3
Disallow:

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow:

User-agent: root/0.1
Disallow:

User-agent: CS-HKUST-IndexServer/1.0
Disallow:

User-agent: CyberSpyder/2.1
Disallow:

User-agent: Deweb/1.01
Disallow:

User-agent: DragonBot/1.0 libwww/5.0
Disallow:

User-agent: EIT-Link-Verifier-Robot/0.2
Disallow:

User-agent: Emacs-w3/v[0-9.]+
Disallow:

User-agent: EmailSiphon
Disallow:

User-agent: EMC Spider
Disallow:

User-agent: explorersearch
Disallow:

User-agent: Explorer
Disallow:

User-agent: ExtractorPro
Disallow:

User-agent: FelixIDE/1.0
Disallow:

User-agent: Hazel's Ferret Web hopper,
Disallow:

User-agent: ESIRover v1.0
Disallow:

User-agent: fido/0.9 Harvest/1.4.pl2
Disallow:

User-agent: Hämähäkki/0.2
Disallow:

User-agent: KIT-Fireball/2.0 libwww/5.0a
Disallow:

User-agent: Fish-Search-Robot
Disallow:

User-agent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)
Disallow:

User-agent: Robot du CRIM 1.0a
Disallow:

User-agent: Freecrawl
Disallow:

User-agent: FunnelWeb-1.0
Disallow:

User-agent: gcreep/1.0
Disallow:

User-agent: ???
Disallow:

User-agent: GetURL.rexx v1.05
Disallow:

User-agent: Golem/1.1
Disallow:

User-agent: Gromit/1.0
Disallow:

User-agent: Gulliver/1.1
Disallow:

User-agent: yes
Disallow:

User-agent: AITCSRobot/1.1
Disallow:

User-agent: wired-digital-newsbot/1.5
Disallow:

User-agent: htdig/3.0b3
Disallow:

User-agent: HTMLgobble v2.2
Disallow:

User-agent: no
Disallow:

User-agent: IBM_Planetwide,
Disallow:

User-agent: gestaltIconoclast/1.0 libwww-FM/2.17
Disallow:

User-agent: INGRID/0.1
Disallow:

User-agent: IncyWincy/1.0b1
Disallow:

User-agent: Informant
Disallow:

User-agent: InfoSeek Robot 1.0
Disallow:

User-agent: Infoseek Sidewinder
Disallow:

User-agent: InfoSpiders/0.1
Disallow:

User-agent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html
Disallow:

User-agent: 'IAGENT/1.0'
Disallow:

User-agent: IsraeliSearch/1.0
Disallow:

User-agent: JCrawler/0.2
Disallow:

User-agent: Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk)
Disallow:

User-agent: Jobot/0.1alpha libwww-perl/4.0
Disallow:

User-agent: JoeBot,
Disallow:

User-agent: JubiiRobot
Disallow:

User-agent: jumpstation
Disallow:

User-agent: Katipo/1.0
Disallow:

User-agent: KDD-Explorer/0.1
Disallow:

User-agent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)
Disallow:

User-agent: LabelGrab/1.1
Disallow:

User-agent: LinkWalker
Disallow:

User-agent: logo.gif crawler
Disallow:

User-agent: Lycos/x.x
Disallow:

User-agent: Lycos_Spider_(T-Rex)
Disallow:

User-agent: Magpie/1.0
Disallow:

User-agent: MediaFox/x.y
Disallow:

User-agent: MerzScope
Disallow:

User-agent: NEC-MeshExplorer
Disallow:

User-agent: MOMspider/1.00 libwww-perl/0.40
Disallow:

User-agent: Monster/vX.X.X -$TYPE ($OSTYPE)
Disallow:

User-agent: Motor/0.2
Disallow:

User-agent: MuscatFerret
Disallow:

User-agent: MwdSearch/0.1
Disallow:

User-agent: NetCarta CyberPilot Pro
Disallow:

User-agent: NetMechanic
Disallow:

User-agent: NetScoop/1.0 libwww/5.0a
Disallow:

User-agent: NHSEWalker/3.0
Disallow:

User-agent: Nomad-V2.x
Disallow:

User-agent: NorthStar
Disallow:

User-agent: Occam/1.0
Disallow:

User-agent: HKU WWW Robot,
Disallow:

User-agent: Orbsearch/1.0
Disallow:

User-agent: PackRat/1.0
Disallow:

User-agent: Patric/0.01a
Disallow:

User-agent: Peregrinator-Mathematics/0.7
Disallow:

User-agent: Duppies
Disallow:

User-agent: Pioneer
Disallow:

User-agent: PGP-KA/1.2
Disallow:

User-agent: Resume Robot
Disallow:

User-agent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
Disallow:

User-agent: Robbie/0.1
Disallow:

User-agent: ComputingSite Robi/1.0 (robi@computingsite.com)
Disallow:

User-agent: Roverbot
Disallow:

User-agent: SafetyNet Robot 0.1,
Disallow:

User-agent: Scooter/1.0
Disallow:

User-agent: not available
Disallow:

User-agent: Senrigan/xxxxxx
Disallow:

User-agent: SG-Scout
Disallow:

User-agent: Shai'Hulud
Disallow:

User-agent: SimBot/1.0
Disallow:

User-agent: Open Text Site Crawler V1.0
Disallow:

User-agent: SiteTech-Rover
Disallow:

User-agent: Slurp/2.0
Disallow:

User-agent: ESISmartSpider/2.0
Disallow:

User-agent: Snooper/b97_01
Disallow:

User-agent: Solbot/1.0 LWP/5.07
Disallow:

User-agent: Spanner/1.0 (Linux 2.0.27 i586)
Disallow:

User-agent: no
Disallow:

User-agent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00
Disallow:

User-agent: Tarantula/1.0
Disallow:

User-agent: tarspider
Disallow:

User-agent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
Disallow:

User-agent: Templeton/
Disallow:

User-agent: TitIn/0.2
Disallow:

User-agent: TITAN/0.1
Disallow:

User-agent: UCSD-Crawler
Disallow:

User-agent: urlck/1.2.3
Disallow:

User-agent: Valkyrie/1.0 libwww-perl/0.40
Disallow:

User-agent: Victoria/1.0
Disallow:

User-agent: vision-search/3.0'
Disallow:

User-agent: VWbot_K/4.2
Disallow:

User-agent: w3index
Disallow:

User-agent: W3M2/x.xxx
Disallow:

User-agent: WWWWanderer v3.0
Disallow:

User-agent: WebCopy/
Disallow:

User-agent: WebCrawler/3.0 Robot libwww/5.0a
Disallow:

User-agent: WebFetcher/0.8,
Disallow:

User-agent: weblayers/0.0
Disallow:

User-agent: WebLinker/0.0 libwww-perl/0.1
Disallow:

User-agent: no
Disallow:

User-agent: WebMoose/0.0.0000
Disallow:

User-agent: Digimarc WebReader/1.2
Disallow:

User-agent: webs@recruit.co.jp
Disallow:

User-agent: webvac/1.0
Disallow:

User-agent: webwalk
Disallow:

User-agent: WebWalker/1.10
Disallow:

User-agent: WebWatch
Disallow:

User-agent: Wget/1.4.0
Disallow:

User-agent: w3mir
Disallow:

User-agent: no
Disallow:

User-agent: WWWC/0.25 (Win95)
Disallow:

User-agent: none
Disallow:

User-agent: XGET/0.7
Disallow:

User-agent: Nederland.zoek
Disallow:

User-agent: BizBot04 kirk.overleaf.com
Disallow:

User-agent: HappyBot (gserver.kw.net)
Disallow:

User-agent: CaliforniaBrownSpider
Disallow:

User-agent: EI*Net/0.1 libwww/0.1
Disallow:

User-agent: Ibot/1.0 libwww-perl/0.40
Disallow:

User-agent: Merritt/1.0
Disallow:

User-agent: StatFetcher/1.0
Disallow:

User-agent: TeacherSoft/1.0 libwww/2.17
Disallow:

User-agent: WWW Collector
Disallow:

User-agent: processor/0.0ALPHA libwww-perl/0.20
Disallow:

User-agent: wobot/1.0 from 206.214.202.45
Disallow:

User-agent: Libertech-Rover www.libertech.com?
Disallow:

User-agent: WhoWhere Robot
Disallow:

User-agent: ITI Spider
Disallow:

User-agent: w3index
Disallow:

User-agent: MyCNNSpider
Disallow:

User-agent: SummyCrawler
Disallow:

User-agent: OGspider
Disallow:

User-agent: linklooker
Disallow:

User-agent: CyberSpyder (amant@www.cyberspyder.com)
Disallow:

User-agent: SlowBot
Disallow:

User-agent: heraSpider
Disallow:

User-agent: Surfbot
Disallow:

User-agent: Bizbot003
Disallow:

User-agent: WebWalker
Disallow:

User-agent: SandBot
Disallow:

User-agent: EnigmaBot
Disallow:

User-agent: spyder3.microsys.com
Disallow:

User-agent: www.freeloader.com.
Disallow:

User-agent: Googlebot
Disallow:

User-agent: METAGOPHER
Disallow:

User-agent: *
Disallow: /
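The long list above works as an allowlist: every named bot gets an empty Disallow (allowed everywhere), and the final "User-agent: *" record with "Disallow: /" blocks every other bot. Below is a sketch that checks this pattern with Python's urllib.robotparser on a short excerpt rather than the whole list; allowed() is a hypothetical helper and SomeScraper and UnknownBot are made-up names. The last check shows a side effect of how Python matches names (case-insensitive substring of the bot's name): placeholder entries such as "User-agent: no" can let unrelated bots through. Other crawlers may match names differently.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt in the same shape as the list: named bots are allowed,
# everyone else falls through to the "*" record and is blocked.
ALLOWLIST = """\
User-agent: Googlebot
Disallow:

User-agent: WebFerret
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, path="/"):
    """Return True if user_agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(allowed(ALLOWLIST, "Googlebot/2.1"))    # True
print(allowed(ALLOWLIST, "SomeScraper/1.0"))  # False

# "no" is a substring of "unknownbot", so this entry admits that bot too.
WITH_PLACEHOLDER = "User-agent: no\nDisallow:\n\n" + ALLOWLIST
print(allowed(WITH_PLACEHOLDER, "UnknownBot/1.0"))  # True
```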

**Brindusa** · 15th January 2005, 14:29

Hi,
So, let me see if I understood correctly.
If I want all my pages to be indexed, and the links on every page to be followed, I have to include on each page
<meta name="robots" content="index,follow">
Correct?

**Razvan Pop** · 15th January 2005, 14:31

Hi,

Yes, that's correct, but it isn't necessary. If you don't add anything that blocks the spiders' access, they will index the pages and follow the links in them.

**Brindusa** · 15th January 2005, 14:32

I have something like <meta name="robots" content="all">

**Razvan Pop** · 15th January 2005, 14:36

That's fine. It means they may index everything and follow the links.
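A small sketch of why both answers hold, assuming the usual defaults (index and follow unless told otherwise): a page with no tag, content="all" and content="index,follow" all end up with the same policy. effective_robots is a hypothetical helper; None stands for a page without a robots meta tag.

```python
def effective_robots(content=None):
    """Return (may_index, may_follow) for a robots meta content value.

    content=None models a page that has no robots meta tag at all.
    """
    directives = set()
    if content is not None:
        directives = {token.strip().lower() for token in content.split(",")}
    # Only the restrictive directives change anything; "all", "index" and
    # "follow" leave the defaults (index and follow) in place.
    return "noindex" not in directives, "nofollow" not in directives


for value in [None, "all", "index,follow", "noindex,nofollow"]:
    print(value, effective_robots(value))
# None (True, True)
# all (True, True)
# index,follow (True, True)
# noindex,nofollow (False, False)
```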
