Each application provides statistics for the number of times a visitor views and downloads content on the website. As an administrator, you can configure what stats are collected, how the data is compiled into daily or monthly totals, and whether or not to make some statistics publicly available.
Requests to view and download files from the public website are recorded in a log file. This file is compiled into statistics on a daily basis. The application will try to filter out bots and double-clicks. Depending on the size and age of your site, the metrics tables in your database may consume a lot of disk space. You can reduce the size of these tables by configuring the application to only keep monthly data, and to record little or no geographic data.
Read the sections below to learn how to configure the statistics for your site, how to view and download stats, and how to recover lost data.
To ensure that statistics are collected correctly, make sure you have configured your application to run scheduled tasks and jobs.
As an administrator, you can restrict the type of statistics that will be collected. You may choose to do this to protect the privacy of visitors, comply with legal requirements in your jurisdiction, or reduce the disk space required for your database.
Follow the steps below to configure the statistics settings for all journals, presses or preprint servers on your site.
A manager may configure some of these settings differently for each journal, press or preprint server they manage. In such cases, the site settings act as a “ceiling”. For example, if the site has disabled geographic statistics, the manager will not be able to enable them for their journal, press or preprint server. If the site has enabled country-level statistics only, the manager will not be able to enable region and city statistics.
However, a manager can turn off statistics even if the site has enabled them. For example, if the site has enabled geographic statistics for country, region and city, the manager will be able to turn geographic statistics off, or choose to collect only country and region data.
Statistics are collected for different kinds of data and can be accessed in several different formats. Some statistics can be viewed as tables and graphs in the application. Others can only be viewed by downloading a report in CSV or JSON. The CSV format can be opened in spreadsheet software, like Excel or LibreOffice Calc.
Type | Description | Web | CSV | JSON |
---|---|---|---|---|
Publications | Views and downloads of articles, books and preprints, and their files. | ✔ | ✔ | ✔ |
Issues | Views and downloads of issues and issue galleys (OJS only). | ✔ | ✔ | ✔ |
Homepage | Views of the homepage of the journal, press or preprint server. | ✔ | ✔ | ✔ |
Geography | Views and downloads by country, region and city. | ✔ | ✔ | ✔ |
COUNTER | An industry-recognized format for distributing usage statistics. | ✘ | ✘ | ✔ |
Editorial Activity | Number of submissions accepted and rejected, the average time to a decision, and more. | ✔ | ✘ | ✘ |
Users | User profiles and roles. | ✘ | ✔ | ✘ |
Reviews | Reviewer names, due dates, and comments for all review assignments | ✘ | ✔ | ✘ |
Submissions | Titles and metadata for all submissions | ✘ | ✔ | ✘ |
Subscriptions | Data on subscriptions (OJS) | ✘ | ✔ | ✘ |
Follow these steps to get the number of views and downloads of articles, books and preprints, as well as their files.
Publication statistics can also be accessed in JSON format through the REST API.
Issue statistics are only available in OJS.
Follow these steps to get the number of views and downloads of issues and issue galley files.
Issue statistics can also be accessed in JSON format through the REST API.
Follow these steps to get the number of views of the homepage of the journal, press or preprint server.
Homepage statistics can also be accessed in JSON format through the REST API.
You must enable geographic statistics first.
Follow these steps to download a CSV file with the number of views and downloads for each city, region and country.
Geographic statistics can also be accessed in JSON format through the REST API.
COUNTER sets standards for how usage statistics should be calculated and distributed. Statistics matching the COUNTER 5 SUSHI protocol can be downloaded through the REST API. Statistics matching the COUNTER 4 protocol (Journal Report 1 and Article Report 1) can be downloaded by following these steps.
Editorial statistics can change significantly depending on the selected date range. Read the recommendations below to avoid misleading results.
Follow these steps to view stats about the editorial activity of a journal, press or preprint server, such as the number of submissions accepted and rejected, the average time it takes to record a decision, and more.
These stats are based on editorial activity recorded by the system. If your editors routinely work outside of the system, stats may not be correct. For example, if an editor asks for a review by email and does not record it in the system, that review will not be counted in the editorial statistics.
When selecting a date range, think carefully about the editorial activity you are interested in. For example, if you are looking at the last three months, the Acceptance Rate will be calculated only from submissions made in the last three months that have already received an accept or decline decision. We recommend using a date range that accounts for the duration of your editorial review and ends at least 12 months ago.
Follow these steps to download a CSV report with the user profiles and their roles in each journal, press or preprint server.
There are other ways to export user data.
Follow these steps to download a CSV report about review assignments that includes the reviewer names, due dates, comments, and more.
Follow these steps to download a CSV report about submissions that includes the titles, contributors and metadata.
There are many other ways to export submission data.
The subscriptions report is only available in OJS when subscriptions are enabled.
Follow these steps to download a CSV report with data on subscriptions.
If the system is not recording any statistics, the application may not be configured correctly. Read the configuration advice.
You may notice historical gaps in your statistics data. These can arise for several reasons. Often the application was misconfigured, the server resources were limited, or an application error prevented the logs from being processed. If the problem goes unnoticed for a while, the gap can cover a long period. When this happens, it can be difficult to restore the data, but it may be possible.
In order to recover the data, you will need to have log files that cover that period. These may be the application’s stats logs or the Apache access log files.
The sections below provide information to help you determine what log files are available and reprocess them to compile statistics.
You must understand the log files before continuing.
Before processing the logs, they must be in the format recognized by the application. A log file will typically have hundreds or thousands of lines in JSON format, like the one below.
```json
{"time":"2023-02-27 11:41:14","ip":"87d8edf8ca58ab4d3e9421b03edcd9c5a2093a79c341964179b8e379faabd324","userAgent":"Mozilla\/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko\/20100101 Firefox\/110.0","canonicalUrl":"https:\/\/example.org\/publicknowledge\/index","assocType":256,"contextId":1,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":null,"region":null,"city":null,"institutionIds":[],"version":"3.4.0.0","issueId":null,"issueGalleyId":null}
```
If your log file looks like the one below, it is a log file from an older version of the application.
```
127.0.0.1 bot 1 "2023-03-01 11:52:47" http://localhost/index.php/publicknowledge/index 200 "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.100 Safari/537.36"
```
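If you have many log files, you can check which format each one uses without opening it. In the current format every line is a JSON object, so it starts with `{`. The sketch below assumes bash and assumes that the first line is representative of the whole file. The function name `log_format` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a usage stats log file uses the current
# JSON format or the older space-separated format, based on its first line.
log_format() {
  local file=$1 first
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Read only the first line; assumes it is representative of the file.
  IFS= read -r first < "$file"
  case $first in
    '{'*) echo json ;;    # current format: one JSON object per line
    '') echo empty ;;
    *) echo legacy ;;     # older format: convert before reprocessing
  esac
}
```

For example, `for f in *.log; do [ "$(log_format "$f")" = legacy ] && echo "$f"; done` lists the files in the current directory that still need converting.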
To convert a log file from the older format to the correct format, move it into the `archive` directory and run the following command from the root directory of your application.

```shell
php lib/pkp/tools/convertUsageStatsLogFile.php <log>
```

After conversion, the old log file will be renamed `<filename>_old.log`. For example, if the file was `usage_events_20230202.log`, the old log file will be named `usage_events_20230202_old.log`.
If you use the Apache web server, you may have access logs from the period covering the historical gaps in your stats data. Run the following command to generate logs in the correct format.
```shell
php lib/pkp/tools/convertApacheAccessLogFile.php <log>
```

A log file named `apache_usage_events_YYYYMMDD.log` will be created in the `archive` directory for each day with one or more log entries from that file. If several log files are created for the same day, you may need to manually combine them into one file.
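One way to combine several converted files for the same day is to concatenate them and sort the result. The sketch below assumes bash, and it assumes that every entry begins with the `time` field, as in the example entry above. Under that assumption, a plain byte-wise sort puts the entries in chronological order. The function name and file paths are hypothetical. Check the combined file before you move it anywhere it will be processed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge several converted log files for the same day into
# one file, ordered by time.
# Assumes each JSON line starts with {"time":"YYYY-MM-DD HH:MM:SS", so a plain
# byte-wise sort orders the entries chronologically.
combine_day_logs() {
  local output=$1 f
  shift
  # Refuse to write over one of the input files.
  for f in "$@"; do
    if [ "$f" -ef "$output" ]; then
      echo "output must not be one of the inputs: $f" >&2
      return 1
    fi
  done
  LC_ALL=C sort "$@" > "$output"
}
```

For example: `combine_day_logs merged/apache_usage_events_20221015.log part1/apache_usage_events_20221015.log part2/apache_usage_events_20221015.log`.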
The command must be run by a user who has permission to read the Apache log files on the server. If you face any problems using this script, you may need to open `lib/pkp/tools/convertApacheAccessLogFile.php` and set the following variables to match your server configuration: `EGREP_PATH`, `PARSEREGEX`, `PHP_DATETIME_FORMAT`, `PHP_DATE_FORMAT`.
Once you have identified the log files you want to reprocess, move them into the `stage` directory. Then run the following command once for every month you want to reprocess.

```shell
php lib/pkp/tools/reprocessUsageStatsMonth.php YYYYMM
```
For example, if you have log files for 2022-10-01 to 2022-11-30, run the command twice:

```shell
php lib/pkp/tools/reprocessUsageStatsMonth.php 202210
php lib/pkp/tools/reprocessUsageStatsMonth.php 202211
```
If you want to have accurate monthly statistics, you will need to reprocess a whole month at a time. For example, if you are missing statistics from 2022-10-15 to 2022-11-12, you would need to reprocess logs for every day of both months in order to have accurate monthly stats for those months.
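When a gap spans several months, a small loop can save you from running the command by hand for each month. The sketch below makes several assumptions:

- It runs under bash from the application's root directory.
- The log files follow the naming convention described in this section.
- Staging one month's files just before reprocessing that month is acceptable.

The names `months_between`, `stage_month` and `reprocess_months` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reprocessing whole months of usage stats logs.

# Print every month from START to END (inclusive), both given as YYYYMM.
months_between() {
  local y=$((10#${1:0:4})) m=$((10#${1:4:2}))
  local end=$((10#${2:0:4} * 12 + 10#${2:4:2}))
  while (( y * 12 + m <= end )); do
    printf '%04d%02d\n' "$y" "$m"
    if (( m == 12 )); then
      m=1
      y=$((y + 1))
    else
      m=$((m + 1))
    fi
  done
}

# Move one month's daily log files (usage_events_YYYYMMDD.log and
# apache_usage_events_YYYYMMDD.log) from SOURCE_DIR into STAGE_DIR.
# Renamed *_old.log files do not match the pattern and are left alone.
stage_month() {
  local source_dir=$1 stage_dir=$2 month=$3 f
  for f in "$source_dir"/*usage_events_"$month"??.log; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv -- "$f" "$stage_dir"/
  done
}

# Stage and reprocess each month from START to END, one month at a time.
# Usage: reprocess_months SOURCE_DIR STAGE_DIR START END
reprocess_months() {
  local source_dir=$1 stage_dir=$2 month
  for month in $(months_between "$3" "$4"); do
    stage_month "$source_dir" "$stage_dir" "$month"
    php lib/pkp/tools/reprocessUsageStatsMonth.php "$month" || return 1
  done
}
```

For the example above, `months_between 202210 202211` prints `202210` and `202211`. The gap from 2022-10-15 to 2022-11-12 would therefore be handled as two whole months.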
Keep the following in mind when working with the log files.

- Log files must be named `usage_events_YYYYMMDD.log` or `apache_usage_events_YYYYMMDD.log`.
- If you have `disable_path_info` set to `On` in `config.inc.php`, change the `PATH_INFO_DISABLED` variable to `true` in the log conversion scripts before running the commands.
- Any file in the `stage` directory will be processed automatically. Do not move files in there unless you want them processed.

Statistics are compiled once a day, so a visit may not appear in the statistics until up to 24 hours after it has been logged. If you have visited the homepage of your journal, press or preprint server, waited more than 24 hours and still do not see those statistics, you may need to configure scheduled tasks and jobs.
You can tell if the scheduled task is being run by looking in the log directory at `<files_dir>/usageStats`. Once a log file has been processed, it will be moved to the `archive` directory. Learn more about the log files.
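You can see where log files are collecting by counting the files in each subdirectory of `<files_dir>/usageStats`. The sketch below assumes bash and reports whatever subdirectories it finds rather than assuming their names. The function name is hypothetical, and `/var/www/files` in the usage example is a placeholder for your own `files_dir`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each subdirectory of <files_dir>/usageStats with
# the number of files it holds and its most recently modified entry.
usage_stats_summary() {
  local dir count newest
  for dir in "$1"/usageStats/*/; do
    [ -d "$dir" ] || continue
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    newest=$(ls -t "$dir" | head -n 1)
    printf '%s\t%s\t%s\n' "$(basename "$dir")" "$count" "${newest:-none}"
  done
}
```

For example, `usage_stats_summary /var/www/files` prints one tab-separated line per subdirectory. If the `archive` count never grows, and its newest entry is old, the scheduled task may not be running.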
If you see log files in the `archive` directory but still do not have any statistics, inspect the URLs in the log entries. Does the URL in the log files exactly match the `base_url` in your configuration? Does it point to a published submission in a journal, press or preprint server?
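To check the URLs in bulk, you can extract the `canonicalUrl` field from a log file and list the entries that do not start with your `base_url`. The sketch below assumes bash and uses `grep`, `sed` and `awk` rather than a JSON parser. It therefore assumes that the field looks like the one in the example entry above, with slashes escaped as `\/`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list canonical URLs in a JSON usage stats log that do
# not begin with the given base_url.
# Steps: pull out each "canonicalUrl":"..." field, strip the key and quotes,
# unescape \/ to /, then keep URLs where index(url, base) is not 1 (that is,
# URLs that do not start with base_url).
urls_outside_base() {
  local base_url=$1 log_file=$2
  grep -o '"canonicalUrl":"[^"]*"' "$log_file" |
    sed -e 's/^"canonicalUrl":"//' -e 's/"$//' -e 's#\\/#/#g' |
    awk -v base="$base_url" 'index($0, base) != 1'
}
```

To summarize the results, pipe the output through `sort | uniq -c | sort -rn`. This shows which unexpected URLs appear most often.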
The application uses the visitor’s IP address to determine their location. In order for this to work, the application must have a copy of the database that maps IPs to their location. This file is located at `<files_dir>/usageStats/IPGeoDB.mmdb`. If you have properly configured the application to run scheduled tasks, this file will be updated monthly.
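You can confirm that the location database is present, and is being refreshed, by checking its modification time. The sketch below assumes bash. It treats a file not modified in more than about 35 days as stale, a threshold assumed from the monthly update schedule. The function name is hypothetical, and `/var/www/files` in the usage example is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether <files_dir>/usageStats/IPGeoDB.mmdb is
# missing, stale (last modified more than about 35 days ago, an assumed
# threshold), or ok.
geodb_status() {
  local db="$1/usageStats/IPGeoDB.mmdb"
  if [ ! -f "$db" ]; then
    echo missing
  elif [ -n "$(find "$db" -mtime +35)" ]; then
    echo stale
  else
    echo ok
  fi
}
```

For example, `geodb_status /var/www/files`. A result of `missing` or `stale` suggests that the scheduled tasks are not running as expected.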
If you have been running the application for many years, you may have periods during which no stats were recorded, for example while you were running OJS 2. You may be able to recover these stats if you have the Apache access logs from this period. Read how to convert log files.
This is a theme option. If your theme supports it, you can enable it at Settings > Website > Appearance > Theme.
The application collects statistics based on the IDs of these files. If you change the file without deleting the galley (OJS, OPS), the download counts will not be affected. However, if you delete the galley and upload the file to a new galley, the download counts will be affected.
This will not affect the publication’s overall download counts. The change only shows up for the individual submission file, in downloadable reports that break down download counts by file.
Since 3.x, the application filters views according to the COUNTER Code of Practice. Specifically, when someone views or downloads a file more than once within 30 seconds, the application registers only one view or download. Known bots and crawlers are also filtered out.
This results in lower overall usage metrics. The drop shouldn’t be significant, though it can be noticeable. The COUNTER Project regularly adds new bots to its specification, and the application updates this specification in each release.
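The 30-second rule can be illustrated with a simplified sketch. It is not the application's own implementation, and bot filtering is not shown. It assumes that each request has already been reduced to a line of the form `<epoch-seconds> <visitor> <item>`, in time order. It counts a request only if more than 30 seconds have passed since the same visitor's previous request for the same item. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration of double-click filtering (not the application's
# code). Reads lines of "<epoch-seconds> <visitor> <item>" in time order and
# prints how many requests would be counted.
count_after_double_click_filter() {
  awk '
    {
      key = $2 SUBSEP $3
      # Count the request unless the same visitor requested the same item
      # 30 seconds ago or less.
      if (!(key in last) || $1 - last[key] > 30) counted++
      # Remember the latest request, whether or not it was counted.
      last[key] = $1
    }
    END { print counted + 0 }
  '
}
```

With the input `0 A f1`, `5 B f1`, `10 A f1`, `50 A f1` (one per line), this prints `3`. The request at second 10 is dropped because visitor A requested the same item 10 seconds earlier.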
See what to do when you don’t see any stats.
This happens when the amount of data you are trying to download exceeds the server’s capacity to deliver it. You can resolve this by downloading a smaller data set, for example by reducing the date range, or by increasing the server’s resources (for example, PHP’s `memory_limit` or `max_execution_time`).
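Before raising these limits, you can check the values currently set in a `php.ini` file by searching it for the two directives. `php --ini` lists the configuration files that command-line PHP loads. The web server's PHP may use different files, so check the one it uses. The sketch below assumes bash and only reads the given file. It ignores values set elsewhere, such as in `.htaccess` or PHP-FPM pool files. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the memory_limit and max_execution_time lines that
# are active (not commented out with ";") in the given php.ini file.
php_limits() {
  grep -E '^[[:space:]]*(memory_limit|max_execution_time)[[:space:]]*=' "$1"
}
```

After changing either value, restart the web server or PHP-FPM so that the new limits take effect.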
See the section on recovering lost data.