Jump to table of contents

Project Background

The Center for Digital Systems (CeDiS) of Free University Berlin (FUB) currently implements “OJS.de” – a project funded by Deutsche Forschungsgemeinschaft (DFG), the German government’s science funding organization. The project has been set up to adapt OJS even better to the needs of OJS users in Germany and other German-speaking countries while – wherever possible – creating value for the larger OJS community, too. One of the project tasks is to implement an “optimized search function”. Three main goals should be achieved:

  1. The current OJS search function experiences problems in dealing with multilingual content. The optimized search function should be able to deal with documents in all supported OJS languages.
  2. OJS search should – where possible – benefit from additional search features provided by Lucene/solr.
  3. The current search function does not scale well. The optimized search function should work even for large OJS providers who want to provide a central search server including journals from several separate OJS installations.

Currently OJS provides custom search across article meta-data and full texts. A simple algorithm is used to process text for search: Text is first being split up according to white space (“tokenization”), then common words (“stopwords”) are being removed from the token stream. All remaining tokens will be stored as a relational database table. This yields satisfactory search results for languages like German or English. It does not work for languages that use logographic notational systems like Japanese or Chinese. Such languages should be made searchable from OJS. In addition, the new search platform should support additional advanced search features, e.g. improved ranking, faceted search or searching across several OJS installations. From an end user, administrator and OJS provider perspective the new Lucene/solr solution must implement at least all currently available OJS search functionality. Several deployment scenarios should be supported (ranging from single journal installations to large OJS provider deployments). Depending on the deployment scenario, design principles like simplicity and flexibility have to be reconciled. Both, the deployment scenarios and corresponding design principles will be detailed below. The new search function should be implemented based on the enterprise search server “solr” and the underlying “Lucene” search framework. As solr and Lucene provide complex and flexible configuration support, a detailed specification is required that defines the exact search functionality and user interface, the architecture of the search platform and the integration of OJS with solr/Lucene. Potentially conflicting project goals have to be reconciled separately for each deployment scenario. This document specifies the exact scope, requirements, user interface, design principles and technical architecture of the new search function.