Why do repeated full Crawls using WSS / MOSS?

I just saw a good article from Mike Taghizadeh describing reasons to do a full crawl at regular intervals. I already knew about the first two reasons and the second-last one, but the third really caught my eye: the crawler does not detect updates to SharePoint ASPX pages. Update a page’s content or change a view, and the change will only get picked up during a full crawl.

When planning your search and indexing schedule, take this into account: you will probably have to set up daily full crawls for SharePoint sites (after hours, of course).

From Mike’s article (http://feeds.feedburner.com/~r/sharepointmsblogs/~3/134295959/reasons-for-a-full-crawl.aspx):
____________________________________________________________________

I have been asked a few times why MOSS Search would need to do a full crawl. The following information is taken from one of the whitepapers on TechNet and does a good job of explaining this:

Reasons for an SSP administrator to do a full crawl include:

  • One or more QFE or service pack was installed on servers in the farm. See the instructions for the hotfix or service pack for more information.
  • An SSP administrator added a new managed property.
  • To re-index ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites.

    Note: The crawler cannot discover when ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites have changed. Because of this, incremental crawls do not re-index views or home pages when individual list items are deleted. We recommend that you periodically do full crawls of sites that contain ASPX files to ensure that these pages are re-indexed.

  • To resolve consecutive incremental crawl failures. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
  • One or more crawl rules have been added or modified.
  • To repair a corrupted index.

The system does a full crawl even when an incremental crawl is requested under the following circumstances:

  • An SSP administrator stopped the previous crawl.
  • A content database was restored.
  • A full crawl of the site has never been done.
  • To repair a corrupted index. Depending upon the severity of the corruption, the system might attempt to perform a full crawl if corruption is detected in the index.

__________________________________________________________________

There’s also a comment at the bottom of the article indicating that you need a full crawl to pull down a file’s new ACL if the access list was changed but the file was not. Otherwise a user could see a search result linking to a file they do not have access to view (so the security trimming fails).

***UPDATE – SP1 and Post-SP1 Hotfix rollup*** – Incremental crawls now also check ACL settings and update them if necessary, provided you have applied the Post-SP1 hotfix rollup for WSS (KB941422).
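
If you want to turn the list above into a checklist for your own crawl planning, here is a rough sketch in Python. To be clear, this is not a SharePoint API: the FarmState class, its field names and the reasons_for_full_crawl function are all made up for illustration, and you would have to fill in the flags yourself from whatever you know about your farm. It only encodes the manual reasons quoted above, plus the ACL caveat and the KB941422 update.

```python
from dataclasses import dataclass

# Hypothetical checklist: every field is an illustrative label for one of
# the conditions quoted in the post, not a real SharePoint property.
@dataclass
class FarmState:
    patch_installed: bool = False            # QFE or service pack applied
    managed_property_added: bool = False
    aspx_pages_changed: bool = False         # page content or views edited
    incremental_crawls_failing: bool = False # consecutive failures seen
    crawl_rules_changed: bool = False
    index_corrupted: bool = False
    acl_changed_without_file_change: bool = False
    post_sp1_rollup_applied: bool = False    # KB941422 installed


def reasons_for_full_crawl(state: FarmState) -> list[str]:
    """Return the reasons (if any) that call for a manual full crawl."""
    checks = [
        (state.patch_installed, "QFE or service pack installed"),
        (state.managed_property_added, "new managed property added"),
        (state.aspx_pages_changed, "ASPX pages changed"),
        (state.incremental_crawls_failing, "consecutive incremental crawl failures"),
        (state.crawl_rules_changed, "crawl rules added or modified"),
        (state.index_corrupted, "corrupted index"),
        # Per the update, incremental crawls pick up ACL changes once the
        # Post-SP1 rollup is applied, so this only counts without it.
        (state.acl_changed_without_file_change and not state.post_sp1_rollup_applied,
         "ACL changed without a file change"),
    ]
    return [reason for triggered, reason in checks if triggered]


if __name__ == "__main__":
    state = FarmState(aspx_pages_changed=True, crawl_rules_changed=True)
    for reason in reasons_for_full_crawl(state):
        print("Full crawl needed:", reason)
```

An empty list means an incremental crawl should be enough, going by the reasons quoted here. The cases where the system forces a full crawl by itself (stopped crawl, restored content database, no previous full crawl) are left out on purpose, since you do not have to schedule those.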

Bye 🙂

About Brad Saide

I'm a SharePoint consultant. I'm also slowly going bald, seem to have a permanent spare tyre around my waist and enjoy socialising with friends over a beer or 10. The last 2 may possibly be related. Started working with SharePoint when the first version was in limited beta release (participated in the Technology Adoption Program while at Woolworths) and have been committed to the adoption of the technology as a business enabler ever since.

One Response to Why do repeated full Crawls using WSS / MOSS?

  1. Chester Thomas says:

    Great post thanks.
