by Michael S. Kaplan, published on 2005/03/08 11:59 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/08/389675.aspx
Apologies for the title, I still cannot resist that sort of thing. Maybe one day....
If you have not read it yet, look at Language-specific processing #0 for more info about this series!
IFilter is one interface that you can use to lower the barriers between the engines that do the work of indexing and the data that may be sitting in proprietary formats. The documentation probably explains it better than I could here:
The IFilter interface scans documents for text and properties (also called attributes). It extracts chunks of text from these documents, filtering out embedded formatting and retaining information about the position of the text. It also extracts chunks of values, which are properties of an entire document or of well-defined parts of a document. IFilter provides the foundation for building higher-level applications such as document indexers and application-independent viewers.
Several shipping implementations of this feature will immediately come to mind: Full Text Search in SQL Server, SharePoint, Exchange, and Index Server for starters. And then there are those like MSN Desktop Search, as well. All of these are cases where search supports additional file formats. Imagine being able to get in on the fun and make sure your own format is supported for some type of indexing/searching!
This is a COM interface, so to implement it you have to implement AddRef/Release/QueryInterface as always. The additional methods you have to implement are Init, GetChunk, GetText, GetValue, and BindRegion.
The general topic about the IFilter interface has pointers to summaries, samples, instructions on building, applying and testing filters, as well as methods to bind to already existing IFilter implementations.
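For a feel of how an indexer drives a filter, here is a small sketch of the usual calling pattern: ask for the next chunk, and for each text chunk keep asking for text until the filter says there is no more. Everything in it is an assumption made for illustration. ToyFilter, FilterStatus, ChunkInfo and ExtractAllText are hypothetical stand-ins so that the sketch compiles anywhere; the real interface is declared in filter.h, derives from IUnknown, reports status through SCODE values such as FILTER_E_END_OF_CHUNKS and FILTER_E_NO_MORE_TEXT, and also has Init, GetValue and BindRegion, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins, NOT the real declarations from filter.h.
enum class FilterStatus { Ok, EndOfChunks, NoMoreText };

struct ChunkInfo { unsigned long id; bool isText; };

// Hypothetical in-memory "document" whose chunks are plain strings.
class ToyFilter {
public:
    explicit ToyFilter(std::vector<std::wstring> chunks)
        : chunks_(std::move(chunks)) {}

    // Move to the next chunk, in the spirit of IFilter::GetChunk.
    FilterStatus GetChunk(ChunkInfo* info) {
        if (next_ >= chunks_.size()) return FilterStatus::EndOfChunks;
        current_ = next_++;
        offset_ = 0;
        hasCurrent_ = true;
        info->id = static_cast<unsigned long>(current_ + 1);
        info->isText = true;
        return FilterStatus::Ok;
    }

    // Copy at most *count characters of the current chunk into buffer,
    // in the spirit of IFilter::GetText. On return *count holds the
    // number of characters written; no terminator is appended.
    FilterStatus GetText(std::size_t* count, wchar_t* buffer) {
        if (!hasCurrent_ || offset_ >= chunks_[current_].size()) {
            *count = 0;
            return FilterStatus::NoMoreText;
        }
        const std::wstring& text = chunks_[current_];
        std::size_t n = text.size() - offset_;
        if (n > *count) n = *count;      // never exceed the caller's buffer
        text.copy(buffer, n, offset_);
        offset_ += n;
        *count = n;
        return FilterStatus::Ok;
    }

private:
    std::vector<std::wstring> chunks_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    std::size_t offset_ = 0;
    bool hasCurrent_ = false;
};

// Consumer side: outer loop over chunks, inner loop over the text of each.
// A newline is appended after every chunk to keep them apart.
std::wstring ExtractAllText(ToyFilter& filter) {
    std::wstring out;
    ChunkInfo info{};
    while (filter.GetChunk(&info) == FilterStatus::Ok) {
        if (!info.isText) continue;      // a real indexer would call GetValue here
        wchar_t buffer[8];               // deliberately tiny to force several reads
        const std::size_t capacity = sizeof(buffer) / sizeof(buffer[0]);
        for (;;) {
            std::size_t count = capacity;
            if (filter.GetText(&count, buffer) != FilterStatus::Ok) break;
            assert(count <= capacity);   // trust, but verify, the reported size
            out.append(buffer, count);
        }
        out.push_back(L'\n');
    }
    return out;
}
```

The small buffer is a deliberate choice: it shows that a single chunk may take several GetText calls to drain, which is also how the real interface behaves when the caller's buffer is smaller than the chunk.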
It is also nice to see such a great effort on the security side -- links and information to help guarantee that ISVs who write code against this interface do it securely. Throughout there are good warnings:
Caution IFilters for Indexing Service run in the Local System security context. They should be written to manage buffers and to stack correctly. All string copies must have explicit checks to guard against buffer overruns. You should always verify the allocated size of the buffer. You should always test the size of the data against the size of the buffer.
That and a link to secure code practices to consider when implementing these interfaces are a welcome touch as far as I am concerned (as it does no good for Microsoft to write secure code if an ISV writes a component with a security issue!).
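To make the buffer advice in that caution a little more concrete, here is a minimal sketch of a bounded copy that checks the size of the data against the size of the buffer before writing anything. CopyTextBounded is a hypothetical helper, not part of any Microsoft API; its in/out count parameter is only modeled on the way GetText reports how many characters it wrote.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only.
// On entry *count is the capacity of buffer in characters.
// On return *count is the number of characters actually written.
// No terminator is appended. The return value is the number of source
// characters that did not fit and still remain to be copied.
std::size_t CopyTextBounded(const wchar_t* src, std::size_t srcLen,
                            wchar_t* buffer, std::size_t* count)
{
    if (count == nullptr) return srcLen;          // nowhere to report a size
    if (src == nullptr || buffer == nullptr) {    // nothing safe to copy
        *count = 0;
        return srcLen;
    }
    const std::size_t capacity = *count;
    // Test the size of the data against the size of the buffer.
    const std::size_t toCopy = srcLen < capacity ? srcLen : capacity;
    assert(toCopy <= capacity);
    if (toCopy > 0) {
        std::memcpy(buffer, src, toCopy * sizeof(wchar_t));
    }
    *count = toCopy;
    return srcLen - toCopy;
}
```

A caller that gets a nonzero return value knows the text was truncated and can come back for the rest with a fresh buffer, rather than having the copy run past the end of the one it supplied.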
Now note that this interface, this IFilter, is not really about language-specific processing as much as it is about format-specific processing. But one of the greatest strengths of a service like MS Search is the ability to apply it to different file formats. It makes IFilter a very important interface to stretch the boundaries of what can be searched.
And it gives the future topics that deal with the more linguistic aspects of language-specific processing a much wider reach than they would otherwise have. So I will give IFilter an honorary "cool" status that I would usually reserve for things more linguisticalish :-)
This post was sponsored by "F" (U+0046, a.k.a. LATIN CAPITAL LETTER F)
A letter that realized it would never get to sponsor any of the fun "F" words while I am working for Microsoft, so it thought it should take "Filter" while it was available.
# Jonathan Payne on 9 Mar 2005 12:04 AM:
# bg on 9 Mar 2005 12:13 AM:
# RIO - Randektív Informatikai Oldal on 9 Mar 2005 3:56 AM:
# Michael Kaplan on 9 Mar 2005 3:59 AM:
# Michael Kaplan on 12 Mar 2005 9:22 PM:
# Stephen on 7 Oct 2008 3:24 AM:
A lot of the MSDN links on this subject seem dead now - could someone write a little update on this subject now?
# Michael S. Kaplan on 7 Oct 2008 9:16 AM:
# Jerry Camel on 17 Dec 2008 11:08 AM:
I can't find a whole lot in the way of examples for developing an IFilter... Even the SDK references samples that don't seem to exist anymore.
I'm looking specifically for how to pass an embedded document on to its appropriate IFilter.
Can you point me to some sample code? I suspect BindIFilterFromStream will be involved, but I can't figure out exactly how...
# Michael S. Kaplan on 17 Dec 2008 2:48 PM:
I would suggest looking over at http://blogs.msdn.com/ifilter/ for more information here....
# Prakash Tandukar on 28 Dec 2008 12:39 AM:
I am able to read the properties of a *.docx file using IFilter, but it does not read any properties of a *.doc file (Microsoft Word 2003). What should be changed in the IFilter code to read the properties of *.doc files as well?