2.3.5. Sphinx

Payment

On shared (virtual) hosting, the service is billed daily from the account balance; on business hosting, it is included in the plan price. When ordering the service on shared hosting, the balance must cover at least 1 month of use.

Sphinx (SQL Phrase Index) is a full-text search system with support for the morphology of various languages. It lets you quickly and flexibly search a database using arbitrary text.

  1. Open the "Sphinx" section.
  2. In the "Main data" block, click "Order".
  3. Wait approximately 15 minutes for the service to activate.
  4. Configure Sphinx and the website.
You can check whether Sphinx is working with a test script.
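
The hosting's own test script is not reproduced here. As a rough stand-in, the sketch below checks from Python whether searchd answers on its "MySQL-Sphinx Socket". It assumes the listener greets clients with a MySQL-protocol handshake that carries the server version, as MySQL-compatible servers do. The function names and the socket path are hypothetical; substitute the path shown on the Sphinx management page.

```python
import socket


def parse_handshake(packet: bytes) -> str:
    """Extract the server version string from a MySQL-protocol handshake packet."""
    # Bytes 0-3 are the packet header (3-byte length + 1-byte sequence id);
    # byte 4 is the protocol version, which is 10 for this handshake format.
    if len(packet) < 6 or packet[4] != 10:
        raise ValueError("not a MySQL protocol v10 handshake")
    # The version string follows and is terminated by a zero byte.
    end = packet.index(b"\x00", 5)
    return packet[5:end].decode("ascii", errors="replace")


def check_searchd(socket_path: str, timeout: float = 5.0) -> str:
    """Connect to the searchd Unix socket and return the version it reports.

    Raises OSError if the socket cannot be reached and ValueError if the
    greeting does not look like a MySQL-protocol handshake.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.settimeout(timeout)
        conn.connect(socket_path)
        # The greeting is small, so a single read is enough for this sketch.
        return parse_handshake(conn.recv(4096))


# Usage (hypothetical path, replace with your own):
# print(check_searchd("/path/to/searchd-mysql.sock"))
```

If the call returns a version string, the daemon is running and accepting SphinxQL connections; an exception means the socket path is wrong or the service is not active yet.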

Notes:

  • The automatic (control panel) and manual configuration files are not synchronized. If you configure Sphinx through the control panel and then switch to manual editing, the manual configuration file will not contain the changes made through the control panel, and vice versa.
  • To find out the Sphinx version, connect to the hosting via SSH and run the command /usr/local/sphinx/bin/searchd -v.
  • A detailed description of how Sphinx works and a list of all its parameters can be found in the official documentation (in English).

Access protocols for the Sphinx search daemon (searchd):

  • "MySQL-Sphinx Socket": access via SphinxQL, an SQL dialect similar to that of MySQL.
  • "Native-Sphinx socket": access via SphinxAPI, Sphinx's own API.

To use Sphinx, you need to:

  1. Configure Sphinx, either via the control panel or by manual editing.

Configuration via the control panel:

  1. In the "Main data" block, disable "Manual editing".
  2. Add a data source:
    1. In the "Databases" block, click "Connect database".
    2. Specify the name and the connection details for the MySQL database (if you select an existing database, the fields are filled in automatically) and click "Save".
  3. Add an index:
    1. In the "Indexes" block, click "Create index".
    2. Specify the index name, select the database, enter the SQL query that retrieves the data for indexing (for example, SELECT id, title, content FROM table selects the columns id, title and content from the table named table), and click "Save".
      Sphinx requires the SQL query to return a column named id. If the identifier column has a different name, use an alias: SELECT custom_id AS id.

  4. Set up the index:
    • Index fields and attributes: the list of fields and the data type of each (at least one field must be of the Fulltext type).
    • Indexing frequency: the schedule on which indexing runs. It is set in the same way as the start time in cron (for example, */15 * * * * means every 15 minutes). By default, indexing runs every 15 minutes, every day.
    • Morphology and indexing parameters:
      • Morphology: the name of the library (for example, stem_ru) used to match different forms of a word. For example, a query for "dog" also returns results containing other forms of the word, such as "dogs".
      • Indexing options: HTML stripping (html_strip), storing original word forms (index_exact_words), automatic query expansion (expand_keywords), minimum word length (min_word_len), infixes (min_infix_len), prefixes (min_prefix_len), path to a dictionary of specialized terms (wordforms), character table (charset_table), and ignored characters (ignore_chars).
  5. In the "Main data" block, click "Apply Configuration" to update the configuration on the server. Note: the "sphinx.conf" button lets you view the contents of the generated configuration file.
Configuration by manual editing:

  1. In the "Main data" block, turn on "Manual editing".
  2. In the "Sphinx Configuration" block, specify the data sources and indexing parameters, then save the changes:
    • Data source (source section): the configuration of a data source (where Sphinx gets its data from) and its name (db_source in the example):
      • Data source type (type): the kind of data source (mysql in the example).
      • Database connection details (sql_host, sql_port, sql_user, sql_pass, sql_db): the credentials for connecting to the MySQL database that supplies the data for indexing.
      • Preliminary query (sql_query_pre): a query executed before the main data query (in the example, SET NAMES utf8, which sets the UTF-8 encoding).
      • Main query (sql_query): the query that retrieves the data for indexing (in the example, select id, title, content from table, which selects the columns id, title and content from the table named table).
      • Other directives: allow you to define grouping, filtering and sorting (see the official documentation for details).
    • Indexing options (index section): the configuration of an index (how exactly Sphinx should process the data) and its name (test_index in the example):
      • Data source for indexing (source): the name of the data source that supplies the data for indexing (in the example, db_source, described above).
      • Index storage path (path): the absolute path to the index files (in the example, /home/example/.system/sphinx/test_index).
      • Morphology settings (morphology): the name of the library (stem_ru in the example) used to match different forms of a word. For example, a query for "dog" also returns results containing other forms of the word, such as "dogs".
      • Storing words in their original form (index_exact_words): when used together with the expand_keywords directive, it helps return more relevant results (in the example, 1, meaning enabled).
      • Minimum word length for indexing (min_word_len): the default is 1, but words that short usually carry no meaning (in the example, 3).
      • Other directives: allow you to define grouping, filtering and sorting (see the official documentation for details).
    • Lemmatization. To enable lemmatization support, upload the required dictionary files to the hosting account and add the following section to the configuration:
      common {
          lemmatizer_base = /home/example/path/to/sphinx/dicts/
      }
  3. After saving the configuration:
    • Sphinx will restart.
    • If the configuration contains index sections, then for each index a cron task that updates it every 15 minutes is created in the "Indexes" block, and index building starts (this may take some time).

      In the "Indexes" block:

      • Size data is cached for 10 minutes.
      • Clicking the delete button removes the index section of that index from the configuration.
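
The example configuration that the manual-editing steps refer to is not reproduced here. The sketch below assembles one from the values named in the text (db_source, test_index, mysql, SET NAMES utf8, stem_ru and so on) and prints it as sphinx.conf text. The helper render_section is hypothetical, and the connection values (localhost, 3306, db_user, db_password, db_name) are placeholders rather than values from the hosting.

```python
def render_section(kind, name, directives):
    """Render one `kind name { ... }` section of sphinx.conf.

    `directives` is a list of (key, value) pairs rather than a dict,
    because some keys (such as sql_query_pre) may legitimately repeat.
    """
    width = max(len(key) for key, _ in directives)
    body = "\n".join(f"    {key.ljust(width)} = {value}" for key, value in directives)
    return f"{kind} {name}\n{{\n{body}\n}}\n"


SOURCE = [
    ("type", "mysql"),
    ("sql_host", "localhost"),    # placeholder, an assumption
    ("sql_port", "3306"),         # placeholder, the usual MySQL port
    ("sql_user", "db_user"),      # placeholder
    ("sql_pass", "db_password"),  # placeholder
    ("sql_db", "db_name"),        # placeholder
    ("sql_query_pre", "SET NAMES utf8"),
    ("sql_query", "select id, title, content from table"),
]

INDEX = [
    ("source", "db_source"),  # must match the source section name
    ("path", "/home/example/.system/sphinx/test_index"),
    ("morphology", "stem_ru"),
    ("index_exact_words", "1"),
    ("min_word_len", "3"),
]

config = (
    render_section("source", "db_source", SOURCE)
    + "\n"
    + render_section("index", "test_index", INDEX)
)
print(config)
```

The printed text can be pasted into the "Sphinx Configuration" block after replacing the placeholders with real connection details.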

Attention!

Before using plugins that work with Sphinx, carefully check their requirements and compatibility. Most plugins have not been maintained for a long time and work only with Sphinx 2.2 or lower, while the hosting uses version 3.

Setting up a site to work with Sphinx is the responsibility of the site developer or of third-party specialists.

The task comes down to the following steps:

  1. Examine the contents of the site database and determine which data needs fast search.
  2. Configure Sphinx with the required parameters: specify which database to read from, which tables and columns to index, and the indexing rules.
  3. Add code to the site that searches the created Sphinx index instead of running the standard database search. The developer can write this code from scratch or use the ready-made plugins or modules listed on the official website.

The site connects to Sphinx via a socket. The socket path is shown on the Sphinx management page. Whether a port must be specified depends on the plugin configuration; in general, the port is not used and can be set to 0 or 9312.
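
As an illustration of step 3, here is a minimal sketch of site-side search code in Python that queries an index over the "MySQL-Sphinx Socket" using SphinxQL. It assumes the third-party PyMySQL package is installed and that an ordinary MySQL client library can talk to searchd over that socket; the function names, the index name and the socket path are hypothetical.

```python
import re


def build_search(index: str, limit: int = 20) -> str:
    """Build a SphinxQL full-text query for the given index.

    The index name cannot be passed as a bound parameter, so it is
    validated before being placed into the statement. The search text
    itself stays a %s placeholder and is escaped by the client library.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", index):
        raise ValueError(f"unsafe index name: {index!r}")
    return f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}"


def search(socket_path: str, index: str, text: str, limit: int = 20) -> list:
    """Return the ids of documents matching `text`, best matches first."""
    import pymysql  # third-party; imported here so build_search works without it

    conn = pymysql.connect(unix_socket=socket_path)
    try:
        with conn.cursor() as cur:
            cur.execute(build_search(index, limit), (text,))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


# Usage (hypothetical path and index name):
# ids = search("/path/to/searchd-mysql.sock", "test_index", "dog")
```

The returned ids can then be used to load the full rows from the site's own MySQL database, which keeps Sphinx responsible only for finding matches.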

You can view the log in several ways:

  • Control panel: in the "Sphinx" section, in the "Main data" block, click "Log". A window opens that shows the latest log entries in real time.
  • File manager: in the "Sphinx" section, in the "Main data" block, click 🔍 in the "Log" line. The log opens in the built-in editor of the file manager.
  • Console: connect to the hosting via SSH and run the required command:
    • View the full log:
      cat ~/.system/sphinx/searchd.log
    • Monitor the log in real time:
      tail -f ~/.system/sphinx/searchd.log

      To stop, press Ctrl+C.

The log files searchd.log and query.log (the connection error log and the query log) can be safely deleted if you do not need their contents.