Since the early 1990s, Adobe Acrobat has been ubiquitous for distribution of documents on corporate networks and the Internet. Google’s been indexing PDF files since 2001, and has said they pass page rank, but few take the time to optimize their files for search engines after publishing them.
By taking the time to optimize, you’re more likely to find attractive printouts of your company’s white papers, tip sheets, or instructions on a C-level executive’s desk, while a competitor’s messy web-page printouts fall into the trash. Here are a few tips for PDF distribution to increase your chances of success in this growing area.
Tip 1: Convert all scanned PDFs into text using OCR. If you’ve scanned an article or something from paper, use the Adobe Acrobat Optical Character Recognition (OCR) function to make sure the file can be read by search engines. A quick test to see if your PDF file needs this is to try to select some text with your mouse. If it won’t work, you need to convert. OCR doesn’t always work perfectly – so if this is an important file, it’s worth it to have someone re-type it into a fresh document and put the PDF file back in place.
Tip 2: Give your PDF a meaningful title. Many think of their documents in the context of their website, never considering that each one will need to stand alone in search engine results pages (SERPs) or document search results. Search Google for some PDF files and take note of how they appear in the results list. Which would you click on?
Tip 3: Separate words in the PDF filenames with dashes, not spaces or underscores. Search engines prefer dashes as word breaks in filenames in their ranking algorithms. Think about the search phrase that users will most likely type to find the PDF document. For example, you might name a file “Ingersoll-Rand-Annual-Report-2008.pdf” instead of “Annual Report 08.pdf,” as it better reflects the likely search query. In addition, keep in mind that PDF files are often saved on the user’s local computer, and you want it to be easy for them to see and find that file later on. If you change the URL of a PDF file, make sure to add a 301 permanent redirect from the old URL to the new one to retain links and any ranking power the file had before.
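As a rough illustration of the renaming step, here is a minimal Python sketch. The function names `seo_pdf_filename` and `rename_map` are hypothetical, and the choice to drop every character other than letters, digits, and dashes is an assumption made for this example, not a rule from any search engine.

```python
import re
from pathlib import PurePosixPath


def seo_pdf_filename(name: str) -> str:
    """Return a bare file name with spaces and underscores turned into dashes."""
    path = PurePosixPath(name)
    # Runs of whitespace or underscores become a single dash.
    stem = re.sub(r"[\s_]+", "-", path.stem.strip())
    # Assumption for this sketch: drop anything that is not a letter, digit, or dash.
    stem = re.sub(r"[^A-Za-z0-9-]", "", stem)
    # Collapse repeated dashes and trim any left at the ends.
    stem = re.sub(r"-{2,}", "-", stem).strip("-")
    return f"{stem}{path.suffix.lower()}"


def rename_map(names):
    """Map each old name to its new name, skipping names that do not change."""
    renamed = {}
    for old in names:
        new = seo_pdf_filename(old)
        if new != old:
            renamed[old] = new
    return renamed
```

Calling `rename_map(["Annual Report 08.pdf", "deck-guide.pdf"])` should yield a single entry for the first file. A mapping like this could then feed whatever mechanism your server uses to issue the 301 redirects.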
Tip 4: Add alternative text descriptions to images and illustrations. If you’ve included photographs, charts, or other graphics in your file, make sure to enter the “Alternate Text” description for each under the Accessibility menu. This helps both search engines and people better understand your document. To reduce the size of the PDF files, look into various optimization techniques. Try this tool on a non-confidential file to see how much it reduces the file size – if the savings are large, you might want to look at your entire library for opportunities.
Tip 5: Check metadata. Take a close look at “creator” information or other hidden information in the document’s “meta text.” Make sure there is no embarrassing or confidential information hidden therein. Privacy experts say that many competitors and journalists pore over this data looking for tidbits day and night. I have found some pretty scary stuff in the metadata! Furthermore, adding tags can help you define the “theme family” that a document belongs in – very useful both on and off the web, and in association with “reading order” (see Tip 10).
Tip 6: Add appropriate password protection. For the most part, search engines can’t access password-protected files. If you plan to protect the entire document behind a password, make sure you fill in the document description thoroughly, as this will be all that search engines see. If password protection is only needed for copying/pasting or printing, then use the least restrictive password you can get by with. If only some of your PDF file is confidential, consider rewriting a version of the document with the confidential bits removed, and then add a link for the user to request the full document. Alternatively, you can divide up the PDF file, linking the non-confidential part to the password-protected confidential supplements.
Tip 7: Add an “introduction” page to help search engines. As with Tip 6, while you may not wish to alter the original document for search optimization, you can add a title page that includes information specifically designed to help. Adobe Acrobat and similar tools allow you to insert a new page as needed. This should be the first page and include the title of your document, the date of publication, and a 200-300 word description of the file. Make sure the description includes important keywords early on, as this can assist with relevant indexing.
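If you want a quick sanity check on a draft description, a small Python sketch like the one below may help. The function name `check_description` is hypothetical, and treating the first 50 words as “early” is an assumption for this example. The keyword match is a plain substring test, so “deck” would also match “decking.”

```python
import re


def check_description(text, keywords, min_words=200, max_words=300, early_words=50):
    """Report word count and which keywords are absent from the opening words."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Rebuild the opening as a normalized string so multi-word keywords can match.
    opening = " ".join(words[:early_words])
    missing_early = [kw for kw in keywords if kw.lower() not in opening]
    return {
        "word_count": len(words),
        "length_ok": min_words <= len(words) <= max_words,
        "keywords_missing_early": missing_early,
    }
```

Any keyword listed under `keywords_missing_early` is a hint to move that phrase closer to the start of the description.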
Tip 8: Target the PDF file to “current version minus one, or two”. This will prevent users from being forced to download the latest version of Acrobat to see your document. Adobe allows you to change the compatibility level, and you should use the most compatible version you can get away with while maintaining your document’s integrity. Most business documents are fine when saved in a PDF version that is two releases old. An exception would be if an older version has any major security concerns.
Tip 9: Take hyperlinks to and from your PDF file seriously. Links are an important consideration in the indexing of PDF files. By giving documents meaningful links and link text (often called “anchor text”) you give clues to the search engines about how you’d like the file ranked. Anchor text reading “click here” is useless, while “how to seal a deck” does a much better job. In addition, many usability experts recommend an indicator to tell users they’re about to open a PDF file. When you link to a separate document or web address in the file, remember that search engines will pass page rank through that link. In addition, offering diversity in the anchor text you provide can improve the indexing of your pages.
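To spot weak anchor text on pages that link to your PDFs, you could run something like the following Python sketch over a page’s HTML. It uses only the standard library’s `html.parser`. The class name `PdfLinkAuditor`, the helper `audit_pdf_links`, and the list of “generic” phrases are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical list of anchor phrases that tell a search engine nothing.
GENERIC_ANCHORS = {"click here", "here", "download", "read more", "pdf"}


class PdfLinkAuditor(HTMLParser):
    """Collect (href, anchor text, is_generic) for every link to a .pdf file."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the PDF link currently open, if any
        self._text = []     # text fragments seen inside that link
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Ignore any query string or fragment when testing the extension.
            bare = href.split("?")[0].split("#")[0]
            if bare.lower().endswith(".pdf"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            generic = (not text) or text.lower() in GENERIC_ANCHORS
            self.findings.append((self._href, text, generic))
            self._href = None


def audit_pdf_links(html):
    parser = PdfLinkAuditor()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Each finding whose third element is `True` is a link worth rewriting with descriptive text, ideally with a note such as “(PDF)” so users know what they are about to open.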
Tip 10: Use “reading order” tags. You can also improve SEO by altering the “reading order” on your tagged PDF files: indicate which areas of text capture the “essence” of the document and tag those as “first to read.” Google seems to utilize this information when crawling PDF files. This method is ideal for large PDF files, since search engines allocate a limited amount of time to each page they crawl. If they stop looking at your page before reaching the important content, it might rank poorly.
Bonus tip 11: As load time becomes an increasingly important SEO signal to Google and Bing, it pays to spend time optimizing the images and illustrations in your PDF files. You want them to look good, but very often they are presented at too high a resolution for their most likely use (viewing on screen). Size them appropriately and use a tool like JPEGmini to compress them without visible loss of quality prior to insertion in the original document. If your document is already live, you may not wish to recreate it for a small file reduction.
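The sizing arithmetic is simple: the pixels an image needs equal its placed size in inches multiplied by the target pixels per inch. The Python sketch below works through that calculation. The function names are hypothetical, and the default of 150 pixels per inch is an assumption chosen for on-screen viewing, not a figure from Google or Adobe.

```python
def target_pixel_size(width_in, height_in, ppi=150):
    """Pixels needed for an image placed at the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def downscale_factor(source_px, target_px):
    """Scale to apply to the source image; never upscale beyond 1.0."""
    return min(1.0, target_px / source_px)
```

For example, a photo placed 4 inches wide at 150 pixels per inch needs about 600 pixels of width, so a 3000-pixel-wide original could be scaled down to a fifth of its width before you insert it.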
Thinking like search engines and users for a few minutes prior to publishing will help you meet your marketing, operations, and customer support goals when documents are published and for their entire lifetime of use.
Postscript: Check out Galen De Young’s article on PDF Embedded Sharing Options.
Postscript: A new article on PDF Optimization from Aaron Egan of Bruce Clay Australasia.