2.3.5. Sphinx
Sphinx (SQL Phrase Index) is a full-text search system that supports the morphology of various languages. It lets you search the information in a database quickly and flexibly using arbitrary text.
Payment
On virtual (shared) hosting, the service is paid daily from the personal balance; on business hosting, it is included in the plan price. When ordering on shared hosting, the balance must be sufficient to pay for at least 1 month of the service.
Order
- Open the "Sphinx" section.
- In the "Main data" block, click "Order".
- Wait approximately 15 minutes for the service to activate.
Configuring Sphinx
Notes:
- To find out the Sphinx version, connect to the hosting via SSH and run:
/usr/local/sphinx/bin/searchd -v
- A detailed description of how Sphinx works and a list of all its parameters can be found in the official documentation (English).
Access protocols for the Sphinx search daemon (searchd):
- "MySQL-Sphinx Socket": access via SphinxQL, an SQL dialect similar to MySQL.
- "Native-Sphinx socket": access via SphinxAPI, Sphinx's own API.
To use Sphinx, you need to:
- Configure Sphinx, either through the control panel or by editing the configuration manually.
- Set up the site to work with Sphinx.
Through the control panel
- In the "Main data" block, turn off "Manual editing".
- Add a data source:
- In the "Databases" block, click "Connect database".
- Enter a name and the connection details for the MySQL database (if you select an existing database, the fields are filled in automatically), then click "Save".
- Add an index:
- In the "Indexes" block, click "Create index".
- Enter the index name, select the database, and specify the SQL query that fetches the data for indexing (for example, SELECT id, title, content FROM table selects the columns id, title, and content from the table named table), then click "Save". Sphinx requires the query to return an id column, which serves as the unique document ID. If the ID column has a different name, use an alias: SELECT custom_id AS id.
- Set up the index:
- Index fields and attributes: the list of fields and the data type of each one (at least one field must be of the Fulltext type).
- Indexing frequency: the schedule for re-indexing, set in the same format as a start time in cron. By default, indexing runs every 15 minutes (*/15 * * * * in cron notation).
- Morphology and indexing parameters:
- Morphology: the name of the morphology library used to match different forms of a word (for example, stem_ru for Russian or stem_en for English). With stem_en, a search for "dog" also returns results containing "dogs".
- Indexing options: HTML removal (html_strip), storing words in their original form (index_exact_words), automatic query expansion (expand_keywords), minimum word length (min_word_len), infixes (min_infix_len), prefixes (min_prefix_len), path to a word forms dictionary (wordforms), the character table, and ignored characters.
- In the "Main data" block, click "Apply Configuration" to update the configuration on the server. The "sphinx.conf" button shows the contents of the generated configuration file.
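Prefix and infix indexing (min_prefix_len, min_infix_len) lets partial words match, at the cost of storing extra keyword fragments and a larger index. The Python sketch below only models which fragments a single word yields for a given minimum length; it illustrates the idea and is not Sphinx's own code. The helper names prefixes and infixes are hypothetical.

```python
def prefixes(word: str, min_prefix_len: int) -> list[str]:
    """Prefixes of `word` that are at least `min_prefix_len` characters long."""
    if min_prefix_len <= 0:  # 0 means prefix indexing is disabled
        return []
    return [word[:n] for n in range(min_prefix_len, len(word) + 1)]


def infixes(word: str, min_infix_len: int) -> list[str]:
    """All substrings of `word` that are at least `min_infix_len` characters long."""
    if min_infix_len <= 0:  # 0 means infix indexing is disabled
        return []
    return [
        word[start:start + n]
        for n in range(min_infix_len, len(word) + 1)
        for start in range(len(word) - n + 1)
    ]


if __name__ == "__main__":
    print(prefixes("example", 3))     # ['exa', 'exam', 'examp', 'exampl', 'example']
    print(len(infixes("example", 3)))  # 15
```

The jump from 5 prefixes to 15 infixes for a single 7-letter word shows why infixes grow the index much faster than prefixes.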
Manual editing
- In the "Main data" block, turn on "Manual editing".
- In the "Sphinx Configuration" block, specify the data sources and indexing parameters, then save the changes:
- Data source (source section): where Sphinx gets its data, and the name of the source (db_source in the example):
- Data type (type): the data source type (mysql in the example).
- Database connection details (sql_host, sql_port, sql_user, sql_pass, sql_db): the credentials of the MySQL database that supplies the data for indexing.
- Pre-query (sql_query_pre): a query executed before the main query (in the example, SET NAMES utf8 sets the UTF-8 encoding).
- Main query (sql_query): the query that fetches the data for indexing (in the example, select id, title, content from table selects the columns id, title, and content from the table named table).
- Other directives: define attributes used for grouping, filtering, and sorting (see the official documentation for details).
- Indexing options (index section): how Sphinx works with the data, and the name of the index (test_index in the example):
- Data source (source): the name of the source that supplies the data for indexing (db_source in the example, see above).
- Index path (path): the absolute path and file name prefix for the index files (/home/example/.system/sphinx/test_index in the example).
- Morphology (morphology): the name of the morphology library (stem_ru in the example) used to match different forms of a word. With stem_en, for example, a search for "dog" also returns results containing "dogs".
- Storing words in their original form (index_exact_words): together with the expand_keywords directive, returns more relevant results (1, enabled, in the example).
- Minimum indexed word length (min_word_len): the default is 1, but such short words rarely carry meaning (3 in the example).
- Other directives: additional index settings (see the official documentation for details).
- Lemmatization: to enable lemmatization support, upload the required dictionary files to the hosting account and add the following section to the configuration:
common
{
    lemmatizer_base = /home/example/path/to/sphinx/dicts/
}
Then set the index's morphology directive to the matching lemmatizer (for example, lemmatize_ru).
- After you save the configuration:
- Sphinx restarts.
- If the configuration contains index sections, a cron task is created in the "Indexes" block for each index to update it every 15 minutes, and index building starts (this may take some time). In the "Indexes" block:
- Index size data is cached for 10 minutes.
- Clicking the delete button removes that index's index section from the configuration.
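The manual editing description refers to an example configuration. As a sketch of what such a sphinx.conf might look like, the Python snippet below assembles the source and index sections from the values named in that description. The connection credentials and the table name articles are hypothetical placeholders (TABLE is a reserved word in MySQL, so a real query cannot use it unquoted as a table name). Substitute your own data before pasting the printed text into the "Sphinx Configuration" block.

```python
def render_section(header: str, directives: list[tuple[str, str]]) -> str:
    """Render one sphinx.conf section such as `source db_source { ... }`."""
    body = "\n".join(f"    {key} = {value}" for key, value in directives)
    return f"{header}\n{{\n{body}\n}}\n"


SOURCE = [
    ("type", "mysql"),
    ("sql_host", "localhost"),         # hypothetical
    ("sql_port", "3306"),              # hypothetical (MySQL default port)
    ("sql_user", "example_user"),      # hypothetical
    ("sql_pass", "example_password"),  # hypothetical
    ("sql_db", "example_db"),          # hypothetical
    ("sql_query_pre", "SET NAMES utf8"),
    # "articles" is a hypothetical table name; TABLE is reserved in MySQL.
    ("sql_query", "SELECT id, title, content FROM articles"),
]

INDEX = [
    ("source", "db_source"),
    ("path", "/home/example/.system/sphinx/test_index"),
    ("morphology", "stem_ru"),
    ("index_exact_words", "1"),
    ("min_word_len", "3"),
]

CONFIG = (
    render_section("source db_source", SOURCE)
    + "\n"
    + render_section("index test_index", INDEX)
)

if __name__ == "__main__":
    print(CONFIG)
```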
Site settings
Attention!
Before using plugins to work with Sphinx, carefully read their requirements and compatibility notes. Most plugins have not been maintained for a long time and work with Sphinx 2.2 or earlier, whereas the hosting uses version 3.
Setting up a site to work with Sphinx is the responsibility of the site developer or third-party specialists.
The task comes down to the following steps:
- Examine the site database and determine which data needs fast search.
- Configure Sphinx with the required parameters: specify which database to get the data from, which tables and columns to index, and the indexing rules.
- Add code to the site that searches the created Sphinx index instead of running the standard database search. The developer can write this code or use ready-made plugins or modules listed on the official website.
The site connects to Sphinx through a socket; the socket path is shown on the Sphinx management page. Specify the port as the plugin configuration requires; as a rule, the port is not used for a socket connection, so you can set it to 0 or 9312.
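A minimal sketch of site-side search over the "MySQL-Sphinx Socket" (SphinxQL), assuming a Python site and the third-party PyMySQL driver. The socket path, the index name, and the helper names escape_fulltext and search are assumptions for illustration. The set of escaped characters mirrors the EscapeString helper in Sphinx's sphinxapi.py client, so that user input is matched literally instead of being parsed as query operators.

```python
import re

# Characters with a special meaning in the Sphinx full-text query syntax.
_SPECIAL = re.compile(r'([=()|\-!@~"&/\\^$<])')


def escape_fulltext(text: str) -> str:
    """Backslash-escape query-syntax characters in user input."""
    return _SPECIAL.sub(r"\\\1", text)


def search(socket_path: str, index: str, text: str, limit: int = 10) -> list[int]:
    """Return the ids of documents in `index` that match `text` (SphinxQL)."""
    import pymysql  # third-party driver (pip install pymysql), imported lazily

    # autocommit=None keeps the server default instead of sending SET AUTOCOMMIT.
    conn = pymysql.connect(unix_socket=socket_path, autocommit=None)
    try:
        with conn.cursor() as cur:
            # The index name cannot be a query parameter, so it must be trusted.
            cur.execute(
                f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT %s",
                (escape_fulltext(text), limit),
            )
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

For example, search("/path/from/the/panel/sphinx.sock", "test_index", "dog") would return the ids of matching documents (the path here is a placeholder); the site then loads the full rows from MySQL by those ids. When connecting through a Unix socket, PyMySQL does not use the port at all.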
View the log
You can view the log in several ways:
- Control panel: in the "Sphinx" section, in the "Main data" block, click "Log". A window opens showing the latest log entries in real time.
- File manager: in the "Sphinx" section, in the "Main data" block, click 🔍 in the "Log" line. The log opens in the file manager's built-in editor.
- Console: connect to the hosting via SSH and run the required command:
- View full log:
cat ~/.system/sphinx/searchd.log
- Log monitoring in real time:
tail -f ~/.system/sphinx/searchd.log
Press Ctrl+C to stop.
The files searchd.log (connection error log) and query.log (query log) can be safely deleted if you do not need the information in them.
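Deleting a file that searchd still holds open does not free its disk space until the daemon closes it, so emptying the logs in place is a gentler alternative. A small Python sketch, assuming query.log lives next to searchd.log in ~/.system/sphinx (the helper name truncate_sphinx_logs is hypothetical):

```python
from pathlib import Path


def truncate_sphinx_logs(log_dir: Path = Path.home() / ".system" / "sphinx") -> list[Path]:
    """Empty searchd.log and query.log in place; return the files truncated."""
    truncated = []
    for name in ("searchd.log", "query.log"):  # query.log location is assumed
        log = log_dir / name
        if log.is_file():
            # Opening with "w" cuts the file to zero length without removing it,
            # so the space is released even while searchd keeps the file open.
            with log.open("w"):
                pass
            truncated.append(log)
    return truncated
```

Missing files are skipped, so the function can be run safely even when query logging is not enabled.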