Updated: Feb 11, 2019
Author: Tony Waldegrave
Google Scholar is Google's specialised search engine for scholarly literature, including journals, articles, theses and abstracts from academic publishers, professional societies, online repositories and universities.
Note: Adhesion does not provide any submission services for indexing articles in Google Scholar. Please use this article as a best practice guide for complying with Google's requirements for indexing in Google Scholar.
For further guidance, please visit Google Scholar Troubleshooting.
Inclusion requirements are stringent, so it is necessary to make sure your articles meet all of Google Scholar's criteria.
You can check what content is indexed in Google Scholar by completing the following search query operator in Google Scholar:
Using the search tools you can filter by date to review what content has been indexed recently.
This article provides direction on complying with Google Scholar's criteria.
If you are a solo author it is best practice to upload articles to your website, set up a publications webpage and make your articles accessible via a readily visible link.
For your articles to be indexed successfully on Google Scholar, they must be formatted as a PDF file. Once the above has been implemented, Google's crawlers will index your publications.
New content uploads may take many weeks or months to be indexed and getting updated content re-indexed can take up to several months.
If problems persist, begin troubleshooting by confirming that your hosting repository is compatible with Google Scholar and by reading Google Scholar's inclusion troubleshooting guide.
Tertiary education repositories should host articles using the latest software from EPrints, Digital Commons or DSpace.
For hosting small numbers of journal articles on a website, Google recommends the hosting services Atypon, Highwire Press or MetaPress. For hosting large numbers of journal articles on a website, Google recommends the hosting services JSTOR or SciELO. Please check with these hosting services to ensure they support indexing on Google Scholar. If you are capable of managing your own website and publications, Google recommends downloading and using Open Journal Systems.
To meet the criteria for Google Scholar indexing, the majority of your website's content must be scholarly publications, including journal articles, reports, draft papers, abstracts and the like. Note that crawlers will not index book reviews, editorials, news, magazine articles and other similar low-authority articles on Google Scholar. If the files you wish to have indexed on Google Scholar exceed 5 MB (e.g. long books), they must be uploaded to Google Books first.
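The size limit above can be checked before uploading. The following is a minimal sketch, assuming your PDFs sit in a local folder; the function name `oversized_files` is hypothetical, and treating 5 MB as 5 × 1024 × 1024 bytes is an assumption rather than something Google documents here.

```python
from pathlib import Path

# 5 MB limit from the text above; the exact byte count is an assumption.
MAX_SCHOLAR_BYTES = 5 * 1024 * 1024


def oversized_files(folder, limit=MAX_SCHOLAR_BYTES):
    """Return the sorted names of PDF files under `folder` larger than `limit` bytes."""
    return sorted(
        path.name
        for path in Path(folder).rglob("*.pdf")
        if path.stat().st_size > limit
    )
```

Any file this reports is a candidate for uploading to Google Books first.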
Scholarly content that is uploaded to Google Books automatically meets the inclusion requirements for Google Scholar.
When a searcher lands on your website, the full author-written abstract (or the entire article that the abstract is from) must be readily visible and accessible. This means searchers must not need to complete any activity (e.g. sign in, close ads or pop-ups, or scroll down) in order to access and read the complete abstract.
It is necessary to structure your website to make it easy for crawlers to determine the URLs of all your publications. For Google to crawl your content efficiently, your files must be formatted as HTML or PDF. Any PDF documents must have text such that Adobe Acrobat Reader can search for and select words.
To ensure that crawlers can locate your articles, it must be possible to navigate to every article by following no more than ten easily accessible HTML links.
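One way to audit the ten-link rule is a breadth-first search over your site's link graph. This is a sketch only: the `links` dictionary (page URL to the URLs it links to), the choice of starting page and both function names are assumptions made for illustration.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of links from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def unreachable_articles(links, start, articles, max_links=10):
    """Articles that cannot be reached from `start` within `max_links` links."""
    depths = click_depths(links, start)
    return [a for a in articles if depths.get(a, max_links + 1) > max_links]
```

An empty result means every listed article is within the limit; anything returned is either too deep or not linked at all.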
If your website contains a small number of articles, then a list of links should exist on a separate, simple and easily-accessible HTML page. These links should each point to their appropriate article. Remember this file must be a PDF of the full article.
For websites with thousands of articles, it is best practice to provide Google's crawlers with a full list of your articles by publication date. Other orderings, such as by author name or by keyword, take much longer for crawlers to index, and this could lead to your articles having a poor rank in Google Scholar search results.
If your website has hundreds of thousands of articles, you should add a page listing all papers from the last fortnight, also in order of publication date. Doing this means your papers will be re-crawled more frequently, which helps ensure your large number of publications are all indexed in Google Scholar.
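The two listings described above can be produced from the same article records. The sketch below assumes each article is a dictionary with a `published` date; the field names, the function names and the newest-first ordering are illustrative assumptions, since the text only asks for ordering by publication date.

```python
from datetime import date, timedelta


def newest_first(articles):
    """Full listing, ordered by publication date (newest first is an assumption)."""
    return sorted(articles, key=lambda a: a["published"], reverse=True)


def last_fortnight(articles, today):
    """Articles published within the 14 days up to and including `today`."""
    cutoff = today - timedelta(days=14)
    return [a for a in newest_first(articles) if a["published"] >= cutoff]


# Hypothetical records for illustration only.
articles = [
    {"title": "Older Paper", "published": date(2019, 1, 1)},
    {"title": "Recent Paper", "published": date(2019, 2, 5)},
]
recent = last_fortnight(articles, today=date(2019, 2, 11))
```

Each result can then be rendered as a simple HTML page of links, one per article.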
Crawlers browse your webpages periodically to index new or updated content. This occurs approximately weekly, monthly or less often, depending on the rank and complexity of your website, so your articles must be constantly accessible online.
You can inform crawlers of temporary page errors (e.g. temporary hosting incapacity for a large array of publications) with an HTTP 5xx status code, and of permanent page errors with an HTTP 4xx status code.
To help crawlers index publications that you've moved to new URLs, you need to redirect them using an HTTP 301 status code. The redirects must point to the new URL, or the authoritative abstract, not alternative pages such as the homepage.
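The status-code rules above can be summarised as a small decision function. This is a sketch under assumptions: the function name, the `live` and `moved` collections and the choice of 503 and 404 as the specific 5xx and 4xx codes are illustrative, not taken from Google's guidance.

```python
from http import HTTPStatus


def crawl_response(path, live, moved, overloaded=False):
    """Return (status, redirect_target) for a requested article path.

    `live` is a set of paths currently served; `moved` maps old paths
    to their new locations.
    """
    if overloaded:
        # Temporary problem: ask crawlers to retry later (5xx).
        return HTTPStatus.SERVICE_UNAVAILABLE, None
    if path in moved:
        # Permanent move: redirect to the article's new URL, not the homepage.
        return HTTPStatus.MOVED_PERMANENTLY, moved[path]
    if path in live:
        return HTTPStatus.OK, None
    # Permanent error: the page does not exist (4xx).
    return HTTPStatus.NOT_FOUND, None
```

In a real site this logic would live in your web server or framework configuration; the function only shows which code applies to which situation.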
You can regulate crawling with a robots.txt file (this is separate from a sitemap, which lists the URLs you want crawled). Firstly, make sure your robots.txt file does not restrict crawlers from accessing any of your publications.
Secondly, you should restrict crawlers from accessing expansive, dynamically generated content (e.g. the results page from a user's keyword search within your site), as this increases the time it takes for crawlers to browse your site.
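Both robots.txt checks can be tested locally with Python's standard `urllib.robotparser` module. The sketch below is illustrative: the robots.txt rules, the `example.org` URLs, the `blocked_urls` helper and the `Googlebot` agent name used for the check are all assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block on-site search results, leave articles crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt rules forbid `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


blocked = blocked_urls(
    ROBOTS_TXT,
    [
        "https://example.org/papers/smith2007.pdf",
        "https://example.org/search?q=indexing",
    ],
)
```

If any publication URL appears in the result, the rules are too strict; if search-result URLs do not appear, the rules are too loose.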
Google Scholar recognises information from your articles' bibliographies and references using automated software called "parsers". Articles can be poorly indexed if their bibliographic information is incorrect or does not match the references from other articles. Thus, it is necessary to publish articles with bibliographic and reference metadata that parsers can process.
The bibliographic information of an article must be included in its page's HTML meta tags (avoid using Dublin Core tags). These meta tags (e.g. <meta name="citation_author" content="Smith, John J.">) are specific and usually only relevant to an article's specific webpage.
Google Scholar requires the webpage of any article to have meta tags for that article's title (citation_title), author(s) (citation_author) and publication date (citation_publication_date).
Webpages of conference and journal articles require metadata so they can be identified when cited by another paper. This includes tags for an article's volume and issue number (citation_volume and citation_issue), as well as the first and last page numbers of the article being referred to (citation_firstpage and citation_lastpage).
Webpages of theses and technical reports require metadata so that they can be identified when cited by another paper. This includes tags for the thesis or report's institution (citation_dissertation_institution or citation_technical_report_institution), as well as the number of the technical report being referred to (citation_technical_report_number).
For a webpage showing only an abstract of an article, you must specify the location of the full-text PDF article using the URL meta tag (citation_pdf_url), where the content value of the URL tag is the PDF's absolute URL (e.g. <meta name="citation_pdf_url" content="URL">). Note that the URLs of the abstract and its associated article must exist under the same subdirectory.
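The meta tags described above can be generated from an article record rather than written by hand. This is a minimal sketch: the function name `scholar_meta_tags`, the sample values, the `example.org` URL and the `YYYY/MM/DD` date format are assumptions, while the tag names are the ones listed in this article.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date, pdf_url=None, **extra):
    """Build citation_* meta tags for one article page.

    `extra` keyword arguments become further citation_* tags,
    e.g. volume=24 produces citation_volume.
    """
    fields = [("citation_title", title)]
    fields += [("citation_author", author) for author in authors]  # one tag per author
    fields.append(("citation_publication_date", publication_date))
    fields += [(f"citation_{key}", str(value)) for key, value in extra.items()]
    if pdf_url:
        fields.append(("citation_pdf_url", pdf_url))  # must be an absolute URL
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    )


tags = scholar_meta_tags(
    "Indexing Scholarly Publications",
    ["Smith, John J."],
    "2007/09/30",
    pdf_url="https://example.org/papers/smith2007.pdf",
    volume=24, issue=8, firstpage=873, lastpage=876,
)
```

The resulting string belongs inside the `<head>` of the article's own page. Escaping the values keeps quotation marks in titles from breaking the markup.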
If an article does not have meta tags, it must provide its bibliographical and reference information with specifically formatted text:
Article titles must either have a font size of at least 24pt (for PDFs), be positioned inside an <h1> or <h2> tag (for HTMLs) or be identified with the CSS class "citation_title" (for HTML files with CSS formatting).
Note that one font must be used for the entire title and that the title must have the largest font of any text in the entire article.
The author(s) of an article must be printed either immediately before or after the title. The authorship of an article must either have a font size of 16-23pt (for PDFs), be positioned inside an <h3> tag (for HTMLs) or be identified with the CSS class "author" (for HTML files with CSS formatting). Note that one font must be used for all of the authors.
The font sizes of the repository, the journal and subheadings must be smaller than that of the author(s).
Titles, authors, repository and journal text must be written with title-case (e.g. "Indexing Scholarly Publications"), whereas subheadings must use sentence-case (e.g. "Indexing scholarly publications").
You must also insert a bibliographic citation on its own line, either in the header or footer of the first page (for PDFs) or just below the title and author(s) (for HTMLs). This bibliographic citation must be of a published edition of the full-text article.
Citations of published articles should be formatted explicitly (e.g. "J. Smith. Index., vol. 24, no. 8, pp. 873-876, September 2007"). If your article is not published, you must instead include that edition's date in full on its own line (e.g. September 30, 2007).
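For HTML pages without meta tags, the title, author and citation rules above can be sketched as a small template. The function name `fallback_header` and the use of a `<p>` element for the citation line are assumptions; the `<h1>` and `<h3>` choices follow the rules described in this article.

```python
from html import escape


def fallback_header(title, authors, citation):
    """Return the top-of-page HTML for an article that has no meta tags."""
    return "\n".join([
        f"<h1>{escape(title)}</h1>",               # title in an <h1> tag
        f"<h3>{escape(', '.join(authors))}</h3>",  # authors directly after the title
        f"<p>{escape(citation)}</p>",              # citation on its own line
    ])


header = fallback_header(
    "Indexing Scholarly Publications",
    ["J. Smith"],
    "J. Smith. Index., vol. 24, no. 8, pp. 873-876, September 2007",
)
```

For an unpublished article, the `citation` argument would instead carry the edition's full date, such as "September 30, 2007".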
Do not use Type 3 fonts anywhere in PDF-formatted articles. To check a font's type in Adobe Acrobat Reader, click on File, then Properties.
Lastly, the references section of an article must meet a few requirements for the article to be indexed properly on Google Scholar. The section of your article that contains references needs to have a typical heading on its own line (e.g. "Bibliography" or "References"). Each of these references needs to be numbered (e.g. 1. 2. 3., for PDFs) or placed inside ordered list tags (i.e. <ol>, for HTMLs). References must also be explicitly formatted without any informal commentary.
For Google Scholar troubleshooting, metrics, policies & more, visit Google Scholar Troubleshooting.