Since the early 1990s, Adobe Acrobat has been ubiquitous for distributing documents on corporate networks and the Internet. Google has been indexing PDF files since 2001, and has said they pass PageRank, but few people take the time to do PDF SEO work on their files to be sure they’re visible.
By taking the time to optimize, you’re more likely to find attractive printouts of your company’s white papers, tip sheets, or instructions on a C-level executive’s desk, while a competitor’s messy web-page printouts fall into the trash. Here are a few tips for PDF distribution to increase your chances of success in this growing area.
Update: I do want to say that, if possible, I still recommend that your primary content be HTML and that PDF files be used as print-friendly versions of that content. If you do this, be sure to add a canonical directive for the PDF file to show search engines which of the two “nearly identical” pieces of content they should consider the main one. Because a PDF has no HTML head section, you must do this by having the server return a canonical Link directive in the HTTP response header whenever the PDF is requested, including requests from Googlebot and Bingbot. This usually means adjusting the server configuration slightly and then modifying the links to the PDF versions a bit. Here’s Google’s page on how to do this (scroll down to the PDF section) … it’s not that hard if you think about it in advance!
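To make the canonical header idea concrete, here is a minimal sketch using Python’s built-in web server. It is only an illustration: the paths, the example.com URLs, and the names CANONICAL_PAGES, canonical_link_header, and CanonicalPDFHandler are all made up for this example, and a production site would normally set the same header in its Apache, nginx, or CDN configuration instead.

```python
from http.server import SimpleHTTPRequestHandler

# Hypothetical mapping from PDF paths to their HTML equivalents (assumed URLs).
CANONICAL_PAGES = {
    "/downloads/white-paper.pdf": "https://www.example.com/white-paper.html",
}


def canonical_link_header(pdf_path):
    """Return the Link header value for a mapped PDF path, or None otherwise."""
    target = CANONICAL_PAGES.get(pdf_path)
    if target is None:
        return None
    return f'<{target}>; rel="canonical"'


class CanonicalPDFHandler(SimpleHTTPRequestHandler):
    """Serves files as usual, adding a canonical Link header for mapped PDFs."""

    def end_headers(self):
        # Strip any query string before looking up the path.
        value = canonical_link_header(self.path.split("?", 1)[0])
        if value:
            self.send_header("Link", value)
        super().end_headers()
```

To try it locally, you could pass CanonicalPDFHandler to http.server.HTTPServer and request the PDF with a tool that shows response headers. The header goes out on every response for a mapped PDF, so crawlers and ordinary visitors see the same thing.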
But let’s say the PDF versions of the files are all you have. These on-page PDF SEO tips can help you get your content found. I have used these tips on hundreds of PDF files, and their position and visibility increased universally:
Tip 1: Convert all scanned PDFs into text using OCR. If you’ve scanned an article or something else from paper, use the Adobe Acrobat Optical Character Recognition (OCR) function to make sure the file can be read by search engines. A quick test to see if your PDF file needs this is to try to select some text with your mouse. If you can’t, you need to convert. OCR doesn’t always work perfectly, so if this is an important file, it’s worth having someone re-type it into a fresh document and put the new PDF file in place of the old one. If the document is SO messy that it cannot be scanned, you should seriously consider how important it is to have online. If it is that important, create a new HTML or PDF file with the content in text form. If creating this content is not feasible, see Tip 7 for information on creating an introduction page.
Tip 2: Give your PDF a meaningful title that fits into 60 characters. Many people think of their documents in the context of their website, never considering that each one will need to stand alone in the search engine results pages (SERPs) or in document search results. Remember, too, that you have only 60 or so characters to work with. Many titles in Google right now are badly cut off and make no sense. Search Google for PDF files (use the filetype:pdf directive) and take note of how they appear in the results list. Those with confusing titles probably get less traffic.
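A quick way to audit titles against that limit is a small script like the sketch below. The function name check_title and the way it previews a cut-off title are assumptions made for illustration; search engines do their own truncation, and it will not match this exactly.

```python
MAX_TITLE_CHARS = 60  # approximate limit suggested in Tip 2


def check_title(title, limit=MAX_TITLE_CHARS):
    """Return (fits, preview); preview roughly imitates a cut-off listing."""
    title = " ".join(title.split())  # collapse stray whitespace
    if len(title) <= limit:
        return True, title
    # Cut at the last word boundary that leaves room for an ellipsis.
    cut = title[: limit - 3].rsplit(" ", 1)[0]
    return False, cut + "..."


if __name__ == "__main__":
    for t in ["Annual Report 2008",
              "A Very Long Working Title That Nobody Trimmed Before Publishing the File"]:
        print(check_title(t))
```

Running a list of titles through a check like this before publishing makes it easy to spot the ones that would lose their meaning when shortened.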
Tip 3: Separate words in PDF filenames with dashes, not spaces or underscores. Search engines prefer dashes as word breaks in filenames. Think about the search phrase users will most likely type when searching for your content. For example, you might name a file “Ingersol-Rand-Annual-Report-2008.pdf” instead of “Annual Report 08.pdf,” as it better reflects the likely search query. In addition, keep in mind that PDF files are often saved on users’ local computers for a long time after they are found online, and you want them to be easy to keep track of. If you change the URL of a PDF file, make sure to add a 301 permanent redirect from the old URL to the new one to retain the links and any ranking power the file had before.
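If you have many files to rename, the dash convention can be applied with a short helper. This is a sketch only: seo_filename is a hypothetical name, and the rule it applies (every run of non-alphanumeric ASCII characters becomes a single dash) is one reasonable reading of the tip, not an official standard.

```python
import re


def seo_filename(name):
    """Rewrite a filename so its words are separated by single dashes."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Spaces, underscores, and other non-alphanumeric runs become one dash.
    # Note: this also replaces accented letters, which may not be wanted.
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-")
    return f"{stem}.{ext.lower()}" if ext else stem


if __name__ == "__main__":
    print(seo_filename("Annual Report 08.pdf"))
```

Remember that a rename changes the URL, so each renamed file still needs its 301 redirect from the old address.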
Tip 4: Add alternative text descriptions to images and illustrations. If you’ve included photographs, charts, or other graphics in your file, make sure to enter the “Alternate Text” description for each under the Accessibility menu. As with any SEO project, PDF SEO rewards proper use of this tag. This helps both search engines and people better understand your document.
Tip 5: Check metadata. Take a close look at the “creator” information and any other hidden information in the document’s metadata. Make sure there is no embarrassing or confidential information hidden there. Privacy experts say that many competitors and journalists pore over this data day and night looking for tidbits. I have found some pretty scary stuff in the metadata!
Tip 6: Search engines cannot get past passwords. Search engines can’t access password-protected files. If you plan to protect the entire document behind a password, make sure you fill in the document description thoroughly, as this will be all that search engines see. If password protection is only needed to prevent copying, pasting, or printing, use the least restrictive protection you can get by with. If only some of your PDF file is confidential, consider rewriting a version of the document, with the confidential bits removed, as an introduction page in HTML and then adding a link for the user to download the full document. Alternatively, you can split the PDF file into a public document and password-protected supplements, and link from the non-confidential part to the confidential ones.
Tip 7: Add an “introduction” page to help search engines. As with Tip 6, while you may not wish to alter the original document for search optimization, you can add a title page that includes information specifically designed to help. Adobe Acrobat and similar tools allow you to insert a new page 1 when you need to. This should be the first page and should include the title of your document, the date of publication, and a 200-300 word description of the file. Placing important keywords early in the description can assist with relevant indexing.
Tip 8: Target the PDF file to “current version minus one, or two.” This will prevent users from being forced to download the latest version of Acrobat to see your document. Adobe allows you to change the compatibility level, and you should use the most compatible version you can get away with while maintaining your document’s integrity. Most business documents work fine in a PDF version that is two releases old. An exception would be if any major security concerns exist with an older version.
Tip 9: Take the hyperlinks in your PDF file seriously. Links are an important consideration in how Acrobat files are indexed. By giving documents meaningful links and link text (often called “anchor text”), you give the search engines clues about how you’d like the file ranked. Anchor text reading “click here” is useless, while “how to seal a deck” does a much better job. In addition, many usability experts recommend an indicator to tell users they’re about to open a PDF file. When you link to a separate document or web address from the file, remember that search engines will pass PageRank through that link. Finally, offering diversity in the anchor text you provide can improve the indexing of your pages.
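Once you have a list of the link texts used in a document, weak anchors are easy to catch with a few lines of code. The sketch below is illustrative: the GENERIC_ANCHORS list and the function name flag_generic_anchors are assumptions, and you would extend the list to suit your own content.

```python
# Assumed list of phrases that describe nothing about the link target.
GENERIC_ANCHORS = {"click here", "here", "read more", "download", "link", "more"}


def flag_generic_anchors(anchors):
    """Return the anchor texts that match a known generic phrase."""
    flagged = []
    for text in anchors:
        # Lowercase, collapse whitespace, and drop trailing punctuation.
        normalized = " ".join(text.lower().split()).strip(".!:")
        if normalized in GENERIC_ANCHORS:
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    print(flag_generic_anchors(["Click here", "how to seal a deck", "Read more."]))
```

Anything the check flags is a candidate for rewriting into a phrase that describes the destination.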
Tip 10: Use “reading order” tags. SEO can also be improved by altering the “reading order” of your tagged PDF files: you can indicate which areas of text capture the “essence” of the document and tag those as “first to read.” Google seems to use this information when crawling PDF files. This method is ideal for large PDF files, since search engines allocate a limited amount of time to each page they crawl. If they stop reading your file before they reach the important content, it might rank poorly.
Bonus tip 11: As load time becomes an increasingly important SEO signal to Google and Bing, it pays to spend time optimizing the images and illustrations in yet-unpublished PDF files. You want them to look good, but very often they are saved at too high a resolution for their most likely use (viewing on screen). Size them appropriately and use a tool like JPEGMini to compress them without visible quality loss prior to insertion in the original document. If your document is already live, you may not wish to recreate it for a small reduction in file size unless it contains many large images.
Thinking like search engines and users for a few minutes prior to publishing will help you meet your marketing, operations, and customer support goals when documents are published and throughout their lifetime of use. PDF files are still used heavily by people who prefer printed documents or email attachments, and your top content deserves to exist in that format as well.
Postscript: Check out Galen De Young’s article on PDF Embedded Sharing Options
Postscript: A new article on PDF Optimization by Aaron Egan of Bruce Clay Australasia.