I just saw a good article from Mike Taghizadeh describing reasons to do a full crawl at regular intervals. I already knew about the first two reasons and the second-last one, but the third one really caught my eye: the crawler does not detect updates to SharePoint ASPX pages. So if you update a page’s content or change a view, those changes will only get picked up during a full crawl.
Take this into account when planning your search and indexing schedule; you will probably need to set up daily full crawls of your SharePoint sites (after hours, of course).
From Mike’s article (http://feeds.feedburner.com/~r/sharepointmsblogs/~3/134295959/reasons-for-a-full-crawl.aspx):
I have been asked a few times about the reasons why MOSS Search would need to do a full crawl. The following information is taken from one of the whitepapers on TechNet and does a good job of explaining this:
Reasons for an SSP administrator to do a full crawl include:
- One or more QFEs or service packs were installed on servers in the farm. See the instructions for the hotfix or service pack for more information.
- An SSP administrator added a new managed property.
- To re-index ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites.
Note: The crawler cannot discover when ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites have changed. Because of this, incremental crawls do not re-index views or home pages when individual list items are deleted. We recommend that you periodically do full crawls of sites that contain ASPX files to ensure that these pages are re-indexed.
- To resolve consecutive incremental crawl failures. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
- One or more crawl rules have been added or modified.
- To repair a corrupted index.
The system does a full crawl even when an incremental crawl is requested under the following circumstances:
- An SSP administrator stopped the previous crawl.
- A content database was restored.
- A full crawl of the site has never been done.
- Corruption was detected in the index. Depending on the severity of the corruption, the system might attempt to perform a full crawl to repair it.
There’s also a comment at the bottom of the article indicating that you need a full crawl to pick up a file’s new ACLs when its access list was changed but the file itself was not. Otherwise, a user could see a search result linking to a file they do not have permission to view (i.e. security trimming fails).
***UPDATE – SP1 and Post-SP1 Hotfix rollup*** – Incremental crawls now also check a file’s ACL settings and update them if necessary, provided you have applied the Post-SP1 hotfix rollup for WSS – KB941422.