Control panel

This control panel is for feature demonstration only.


Total indexed documents: 0

Indexer state: Idle

Indexer settings

Seed urls

The search engine starts indexing from one or more predefined addresses (seed URLs) by collecting and examining every link those pages point to. One URL is required, but it may be necessary to add more if not every section of the website is interlinked. It may also be beneficial to add the addresses of pages that list the newest blog posts or news, as this helps new, previously unindexed URLs get indexed faster.






Minimal indexing intervals

The search index must be updated regularly if new content is added to the website. The indexer runs automatically whenever search is used, provided the previous indexing happened more than [indexing interval value] ago.

Notice: setting the indexing interval to zero disables automatic updating. You can then run the indexer periodically through CRON or leave index updating disabled altogether. Please set the value to zero only if you have access to CRON or are otherwise sure that automatic index updating is not needed.

Indexing interval defines how often the indexer is allowed to run. At every indexer run, the seed URLs are downloaded and examined for links pointing to previously unindexed web pages. These new pages, as well as pages that have expired by being older than [update interval value], will be downloaded, examined and indexed.

minutes

Update interval defines how often the previously indexed web pages are allowed to be updated.

minutes
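
The two intervals above can be sketched as simple time comparisons. This is only an illustration of the described rules, not Pickmybrain's actual code; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def indexer_may_run(last_run, now, indexing_interval_minutes):
    """Return True if an automatic indexer run is allowed at `now`.

    Assumption: an interval of zero disables automatic runs entirely,
    as described in the notice above.
    """
    if indexing_interval_minutes == 0:
        return False
    return now - last_run > timedelta(minutes=indexing_interval_minutes)

def page_is_expired(last_indexed, now, update_interval_minutes):
    """Return True if a previously indexed page is old enough to be re-indexed."""
    return now - last_indexed > timedelta(minutes=update_interval_minutes)
```

For example, with an indexing interval of 60 minutes, a search made 61 minutes after the previous run would trigger the indexer, while one made 30 minutes after it would not.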

Index through localhost

This option chooses whether pages should be loaded directly from localhost (the local server). This is highly beneficial, since unnecessary domain name lookups can be avoided. It works only if Pickmybrain is on the same server as the website being indexed. In practice, every link found is altered in the following manner before its contents are loaded:

http://www.mydomain.com/index.php => http://localhost/index.php
http://mydomain.com/index.php => http://localhost/index.php
http://subdomain.mydomain.com/index.php => http://subdomain.localhost/index.php

Notice: Original addresses will be preserved for search results.
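
The rewriting shown above could look roughly like the following sketch. The function name and the `site_domain` parameter are hypothetical, and the real implementation may handle edge cases differently.

```python
from urllib.parse import urlsplit, urlunsplit

def to_local_url(url, site_domain, local_host="localhost"):
    """Rewrite a URL on `site_domain` so that it is fetched from `local_host`.

    Assumptions: a leading "www." is dropped, any other subdomain is kept
    in front of the local host name, and URLs on other domains are
    returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == site_domain:
        new_host = local_host
    elif host.endswith("." + site_domain):
        subdomain = host[: -len(site_domain) - 1]
        new_host = subdomain + "." + local_host
    else:
        return url
    if parts.port:
        new_host += f":{parts.port}"
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))
```

The original URL would still be stored alongside the fetched content, so that search results point to the public address.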

Disabled
Enabled

A custom address can also be defined if Pickmybrain and the actual website are on different servers but still on the same local area network.

Notice #1: if this field is left blank, localhost will be used instead.
Notice #2: subdomains won't work with this option.

Allow subdomains

Only domains defined as seed URLs will be indexed. This setting chooses whether the indexer is allowed to index other subdomains of those domains, like subdomain.mydomain.com.

Notice: www.mydomain.com and mydomain.com are considered to be the same thing.

Disabled
Enabled

Honor nofollow-attributes

Choose whether to follow links with nofollow-attribute ( rel="nofollow" ).

Further filtering and categorization can be done at Selective indexing and Categories.

Ignore nofollow-attributes
Honor nofollow-attributes

Selective indexing (optional)

You can choose to index only certain web pages by filtering them by their URLs. This is done by defining one or more optional keywords. Unwanted keywords can also be defined by adding - ( hyphen ) in front of them.

Example: defined keywords news europe -economy

Explanation: Only URLs containing both keywords news and europe will be indexed. URLs containing the word economy will not be indexed in any case.
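
A minimal sketch of this filtering rule follows. It assumes plain, case-sensitive substring matching against the whole URL, which the text above does not specify; the function name is hypothetical.

```python
def url_matches(url, keywords):
    """Return True if `url` passes a filter such as "news europe -economy".

    Every keyword without a leading hyphen must occur in the URL, and no
    keyword with a leading hyphen may occur in it.
    """
    terms = keywords.split()
    wanted = [t for t in terms if not t.startswith("-")]
    unwanted = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]
    if any(word in url for word in unwanted):
        return False
    return all(word in url for word in wanted)
```

With the example keywords, a URL such as http://mydomain.com/news/europe/politics would be indexed, while http://mydomain.com/news/europe/economy would not.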

Sentiment analysis

Provided by Pickmybrain proprietary algorithms, the sentiment analysis feature analyzes the incoming textual content and enables users to search and sort results by polarity of opinions. Notice: correct language must be set.

Disabled
English
Finnish

Index PDF-files

Chooses whether PDF-files should be indexed. This feature is provided by third-party software.
Notice #1: This option works only with the exec() script execution method.
Notice #2: Copy protected PDFs will not be indexed.

Disabled
Enabled

Character set

Only predefined characters will be kept. Other characters will be ignored. Letters are case-insensitive and if defined, blend chars will be added into the character set as well.

Example: Character set 0-9a-zöäå# will match all numbers between 0-9, letters between a-z and additional characters of ö, ä, å and #.
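
As an illustration only, a character set definition like the one above could be expanded and applied as follows. The sketch assumes that characters outside the set are replaced with a space (acting as word separators) and that input is lowercased first; both are assumptions, and the function names are hypothetical.

```python
def expand_charset(spec):
    """Expand a definition such as "0-9a-zöäå#" into a set of characters."""
    chars = set()
    i = 0
    while i < len(spec):
        # "x-y" denotes an inclusive range of characters.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.update(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars

def keep_charset(text, spec):
    """Lowercase `text` and replace every character outside the set with a space."""
    allowed = expand_charset(spec)
    return "".join(ch if ch in allowed else " " for ch in text.lower())
```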

Blend chars

Words containing blend chars will be split at those characters and the parts indexed as separate words. The original token will also be preserved.
Example: If - ( hyphen ) is defined as a blend char, the word well-kept will be indexed as well, kept and well-kept.
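
The blend char behaviour can be sketched like this; the function name is hypothetical and the order of the returned tokens is an assumption.

```python
import re

def blend_tokens(word, blend_chars):
    """Split `word` at any blend char and keep the original token as well."""
    if not blend_chars:
        return [word]
    pattern = "[" + re.escape(blend_chars) + "]"
    parts = [part for part in re.split(pattern, word) if part]
    if parts == [word]:
        return [word]  # no blend char present, nothing to add
    return parts + [word]
```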

Ignore chars

Ignore chars will be ignored altogether and removed from the original document.
Example: If ' ( apostrophe ) is defined as an ignore char, the word Joe's will be indexed as Joes.

Prefixes, Postfixes and Infixes

Disabled
Prefixes
Prefixes & Postfixes
Infixes

Min. length

By enabling prefixes, postfixes and/or infixes, each word will be indexed as multiple different tokens, which greatly improves search results.
For example, the word avenues with the minimum length of 4 would be indexed as:
Disabled: avenues
Prefixes: aven, avenu, avenue, avenues
Prefixes & Postfixes: aven, avenu, avenue, avenues, nues, enues, venues
Infixes: aven, venu, enue, nues, avenu, venue, enues, avenue, venues, avenues

Thus the search term avenue would match this word in any of the enabled modes, but not in the disabled mode.
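
The token lists above can be reproduced with a short sketch. The function and mode names are hypothetical; only the resulting tokens are taken from the example.

```python
def affix_tokens(word, min_len, mode):
    """Return the tokens indexed for `word` in the given mode."""
    n = len(word)
    if mode == "disabled" or n < min_len:
        return [word]
    # Prefixes grow from min_len characters up to the whole word.
    prefixes = [word[:k] for k in range(min_len, n + 1)]
    if mode == "prefixes":
        return prefixes
    if mode == "prefixes_postfixes":
        # Postfixes exclude the whole word, which is already a prefix.
        postfixes = [word[n - k:] for k in range(min_len, n)]
        return prefixes + postfixes
    if mode == "infixes":
        # Every substring of at least min_len characters.
        return [word[i:i + k] for k in range(min_len, n + 1) for i in range(n - k + 1)]
    raise ValueError(f"unknown mode: {mode}")
```

Note how the number of tokens grows from 4 (prefixes) to 7 (prefixes and postfixes) to 10 (infixes) for this seven-letter word, so richer modes trade index size for recall.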

Dialect processing

This feature creates prefixes from words containing non-ascii alphabetical characters by replacing the characters with their ascii base forms. Notice: for this feature to work, non-ascii characters must be defined in the charset.

Examples:
räikkönen => raikkonen
à la carte => a la carte
a$ap rocky => asap rocky
Pokémon => Pokemon
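
One possible way to derive such base forms is Unicode decomposition, sketched below. This is an assumption about the technique, not a description of Pickmybrain's own algorithm; the extra mapping for $ is hypothetical and added only to cover the a$ap example.

```python
import unicodedata

# Hypothetical extra replacements for characters that Unicode
# decomposition does not cover.
EXTRA_MAP = {"$": "s"}

def strip_dialect(text):
    """Replace accented letters with their ASCII base forms where possible."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents, diaereses) left by decomposition.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_MAP.get(ch, ch) for ch in base)
```

Letters that have no Unicode decomposition, such as ø or ß, would pass through this sketch unchanged and would need their own entries in the mapping.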

Disabled
Enabled

Trim page titles

If each web page's title contains a common part, like the domain name, it can be removed with this option as it improves search results.
Example title: My photo page - mydomain.com
Example trim value: - mydomain.com
Outcome: My photo page




Separate letters from numbers with space

Disabled
Enabled

Choose whether to separate numbers and letters from each other with a space. This is beneficial when infixing is not enabled and the indexed documents include tokens that have both numbers and letters in them.

Example: input Sony KDL42W705B    output Sony KDL 42 W 705 B

With this feature disabled, searching with the query Sony 705 would not yield any results. With the feature enabled, the same query would return results.
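
The separation can be sketched with a single regular expression that inserts a space at every letter/digit boundary. The function name is hypothetical.

```python
import re

# Matches the empty position between a letter and a digit, in either order.
_BOUNDARY = re.compile(r"(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])")

def separate_letters_and_numbers(text):
    """Insert a space wherever a letter is directly followed by a digit or vice versa."""
    return _BOUNDARY.sub(" ", text)
```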

Default search (runtime) settings

These are the default runtime settings. They are always used unless the user provides their own parameters through the Pickmybrain API.

Field weights

Not every keyword match is treated equally. A keyword found in the page title can be configured to have more weight on the final results than a keyword hit in the page content.

Choose whether to use custom field weights while sorting results by positivity / negativity. This setting has effect only if sentiment analysis is enabled.

Disabled
Enabled

Keyword stemming

Chooses whether search terms given by the user are stemmed before they are matched against the search index. Example:

Input: Cars    Output: Car OR Cars

Disabled
Enabled

Dialect matching

This feature removes dialect from user-provided keywords. Either the original keyword or the processed keyword is required to match.

Example: Input: räikkönen    Output: räikkönen OR raikkonen

Disabled
Enabled

Forgive non-matching keywords

By default, like many other search engines, Pickmybrain runs in boolean search mode. This means that every provided keyword has to be found on each resulting web page; otherwise the web page is not considered a match.

Some missing keywords can be overlooked, though, as returning an empty or nearly empty result set is usually bad practice. Missing keywords affect the matching page's final score in the following manner: (number of found keywords / number of provided keywords) * normal score. In this way, even documents with missing keywords can appear in the top results if other results are poor.
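
As a small worked sketch of the formula above (the function and parameter names are hypothetical):

```python
def forgiving_score(normal_score, found_keywords, provided_keywords):
    """Scale a page's normal score by the share of query keywords it contains."""
    if provided_keywords <= 0:
        raise ValueError("at least one keyword must be provided")
    return (found_keywords / provided_keywords) * normal_score
```

A page that contains two of three provided keywords and has a normal score of 9.0 would end up with (2 / 3) * 9.0, that is about 6.0, while a page containing all three keywords keeps its full score.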

Disabled
Enabled

Enable prefix match quality scoring

If a given search term matches a prefix, postfix or infix of another word, this option chooses whether such matches are scored lower than exact matches. If this feature is disabled, every match will have a score of 1. Example:

Provided keyword: state, length: 5, quality scoring enabled
Match 1: state, 5/5 = score 1.0
Match 2: states, 5/6 = score 0.833
Match 3: statement, 5/9 = score 0.555
Match 4: estates, 5/7 = score 0.714
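
The scores in the example follow from dividing the keyword length by the length of the matched word. A minimal sketch, with a hypothetical function name:

```python
def match_quality(keyword, matched_word, scoring_enabled=True):
    """Score a prefix/postfix/infix match relative to an exact match."""
    if not scoring_enabled:
        return 1.0  # all matches are treated as equal
    return len(keyword) / len(matched_word)
```

For the keyword state this gives 5/5, 5/6, 5/9 and 5/7 for the four matches listed above. The example shows these values cut to three decimals, so 5/9 (0.5555...) appears as 0.555.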

Disabled
Enabled

Prefix/Postfix/Infix Expansion limit

Limits the number of prefixes, postfixes and infixes that the search term can match. Closest results come first.

Larger value: more results, slower
Smaller value: fewer results, faster

Query logging

As the name suggests, this feature stores all searches plus additional information such as the date, the user's IP address, the count of returned results, the selected search mode and the query processing time. This data can be valuable for improving your service.

Disabled
Enabled

Categories

Define categories (optional)

Pages can be categorized either by adding a specific HTML attribute to them or by filtering them by their URL addresses. Searches can then be limited to these user-defined categories only. Each page can have up to three different categories.

Categorizing with attributes:
Create a new element or modify an existing element and add the following attributes:
<div id="pmb-category" data-pmb-category="sports,foods,news"> </div>
The attribute data-pmb-category now contains three user-defined categories: sports, foods and news. For these categories to work, they must also be defined below, each as their own category. Set the category types as Attribute. These types of categories are case-insensitive.
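
To illustrate how such an attribute might be read from a page, here is a sketch using Python's standard HTML parser. It assumes that only the data-pmb-category attribute matters (the id is not checked), that names are lowercased because attribute categories are case-insensitive, and that anything beyond three categories is dropped. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class CategoryParser(HTMLParser):
    """Collects category names from data-pmb-category attributes."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("data-pmb-category")
        if not value:
            return
        for name in value.split(","):
            name = name.strip().lower()
            if name and name not in self.categories:
                self.categories.append(name)

def extract_categories(html, limit=3):
    """Return up to `limit` category names found in `html`."""
    parser = CategoryParser()
    parser.feed(html)
    return parser.categories[:limit]
```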

Categorizing with URLs:
Web pages can be matched to a category by giving wanted and unwanted keywords that are checked against their URLs.
Example keywords: wantedword thistoo -butnotthis
Set the category type as URL. These types of categories are case-sensitive.

Category keyword(s)
Category description
Type

General settings

Script execution method

By default, scripts are launched via the exec() function, resulting in non-blocking background processes. If this is not possible, an alternative method can be used instead.

Notice: Indexing PDF-files requires the exec() script execution method.

Use exec() ( recommended )
Alternative method

MySQL Data Directory

This setting makes it possible to store the search index ( a group of MySQL InnoDB tables ) in a custom location. Please use this setting only if you really know what you are doing. Example: You have configured your MySQL data directory on an HDD, but you also have an SSD available. You can then make the search index faster by storing its data files on the SSD.

Notice: If this setting is modified, the search index will be deleted and re-indexing is needed.

Unfortunately, it seems that your MySQL does not support this feature at the moment. Please set the global variable innodb_file_per_table to ON.