User guide's home page

Detection rules

Detection rules specify what to grab on a page when automatic detection does not work properly on a particular website.

This is a little technical, as it involves regular expressions and XPath queries, but it provides a way to grab content no matter how the site is built.

The detection rules are configured in the options screen, which is accessible from the Options menu item.
options menu item

The options screen appears.
options
Then click on the detection rules button to make them appear.
rules button

detection rules

To add a new rule click on the Add rule button.
add rule

A form appears to let you describe the rule.
add rule

First, enter a name for the rule. It is only a label that appears in the left list once the rule is added.
The second field is the regular expression. Every time content is about to be grabbed from a page (or a feed), the rule is applied if the URL of the page matches this regular expression.
In the above example, the rule applies to every page whose URL contains 'name_of_site'.
The xPath field lets you specify which part of the page should be grabbed.
In the above example, the rule grabs the nodes with an id equal to 'article': //*[@id='article']
Another example would be to grab all nodes with class 'myClass': //*[@class='myClass']
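
To make the two fields more concrete, here is a minimal sketch in Python of how such a rule could be evaluated. It is not GrabMyBooks' actual code: the rule dictionary, the sample page and the function names are assumptions made for illustration, and Python's standard library only understands a small subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule mirroring the fields described above (illustration only).
RULE = {
    "name": "My site",
    "url_regex": r"name_of_site",
    "xpath": "//*[@id='article']",
}

# A tiny, well-formed stand-in for a real page.
PAGE = """<html><body>
  <div id="menu">Home | About</div>
  <div id="article"><p>The text worth keeping.</p></div>
  <div class="myClass">First block</div>
  <div class="myClass">Second block</div>
</body></html>"""


def rule_applies(rule, url):
    """True when the URL contains a match for the rule's regular expression."""
    return re.search(rule["url_regex"], url) is not None


def select_nodes(page_source, xpath):
    """Return the elements selected by a simple XPath expression."""
    root = ET.fromstring(page_source)
    # ElementTree only evaluates paths relative to an element,
    # hence the leading dot added in front of the expression.
    return root.findall("." + xpath)


url = "http://www.name_of_site.com/news/42.html"
if rule_applies(RULE, url):
    for node in select_nodes(PAGE, RULE["xpath"]):
        print("".join(node.itertext()).strip())
```

Note that in XPath, @class='myClass' compares the whole attribute value, so an element written with class="myClass other" would not be selected by //*[@class='myClass'].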

In most cases the zone you want to grab has an id or a class. If the page does not use any id, class or other attribute that could easily identify the zone to grab, you will have to use the full XPath path to the HTML element. In that case, the xPath to set in GrabMyBooks starts after the body element. Example: for /html/body/table[2]/tbody/tr/td[6], you must enter /table[2]/tbody/tr/td[6].
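
If you obtain the full path from another tool, the conversion described above can be sketched as follows. The helper name to_rule_xpath is hypothetical and the function only handles a path that starts with exactly /html/body.

```python
def to_rule_xpath(full_xpath):
    """Drop the leading /html/body so the path starts after the body element."""
    prefix = "/html/body"
    if full_xpath.startswith(prefix + "/"):
        return full_xpath[len(prefix):]
    # Anything else is returned unchanged (assumption: it is already relative to body).
    return full_xpath


print(to_rule_xpath("/html/body/table[2]/tbody/tr/td[6]"))
# prints: /table[2]/tbody/tr/td[6]
```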

Here are some examples of how to precisely select the content to grab.

Once a rule has been added by clicking on Ok, it appears in the left list. Clicking on it lets you edit it and, more importantly, change its position in the list.

A page can match more than one regular expression, which means that more than one detection rule can apply to it. In such a case, the rule closest to the top of the list is used.
rule priority
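
The priority behaviour can be pictured with the following Python sketch. The rule list and the pick_rule function are assumptions made for illustration, not the extension's real data model.

```python
import re

# Hypothetical rules, listed from top to bottom as in the left list.
RULES = [
    {"name": "Site blog", "url_regex": r"name_of_site\.com/blog/", "xpath": "//*[@class='post']"},
    {"name": "Whole site", "url_regex": r"name_of_site", "xpath": "//*[@id='article']"},
]


def pick_rule(rules, url):
    """Return the first rule, from the top of the list, whose regex matches the URL."""
    for rule in rules:
        if re.search(rule["url_regex"], url):
            return rule
    return None  # no rule matches: automatic detection would be used


print(pick_rule(RULES, "http://www.name_of_site.com/blog/2012/post.html")["name"])
print(pick_rule(RULES, "http://www.name_of_site.com/news/42.html")["name"])
```

Both rules match the first URL, but only the top one is used. This is why it usually makes sense to place the most specific rules above the most general ones.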

Rule editor

In the options screen, rules can be defined to precisely describe which part(s) of the page must be grabbed on a particular website. This is a bit technical, as XPath is used to tell which zones to grab. The rule editor lets you define rules graphically, using only the mouse. When you are on a page for which the automatic detection doesn't suit you, click on the Rule editor menu item. The rule editor panel appears on screen.
Rule editor menu item

Rule editor panel
You can move the panel to either side of the screen by clicking on the green arrows.

As you move the mouse over the page, the parts that GrabMyBooks can grab are highlighted and the computed xPath rule is displayed in blue on the panel.
Rule editor mouse over highlighted zone
To see which zone would have been automatically detected by GrabMyBooks, check the Show default selection box. The zone appears in gray on screen.
Rule editor default selection

To select the zone(s) that you want to be part of the rule, just click on them. They are outlined in red and added to the rule editor panel.
Rule editor selection
If a zone has sub elements, you can exclude the first few or the last few of them by clicking on the related buttons.
Rule editor sub selection
The two buttons in the top left corner let you exclude the first few sub elements. The two buttons in the bottom right corner let you exclude the last few sub elements. The button in the top right corner restores the bounds of the sub selection.
Rule editor sub selection example
To deselect a zone, click on it again or close the corresponding info box in the rule editor panel.
Rule editor close selection
You can select as many zones as you want.

Test your selection by clicking on the Add to book button. The grabbed content is added to the current book.
Rule editor add to book

Then save your rule by clicking on the Save rule button. The detection rules part of the options screen appears and lets you save the rule. You can overwrite an existing rule if you want.
Rule editor save rule

Rule editor new or edit rule

Rule editor save rule in options screen
From now on, for each page of the same website, GrabMyBooks will try to detect the zones you described.

Example:
On this website the default detection doesn't suit me because the article image is not grabbed.
Rule editor example 1
So I decide to create a rule to include it. One click on the image and one click on the article text.
Rule editor example 2
I test the rule by clicking on the Add to book button. It looks fine, so the rule can be saved by clicking on the Save rule button.
Rule editor example 3
It also works with other pages from the same website.
Rule editor example 4

Please note that the rules don't work with content generated on the client side via JavaScript.

Next page rule

On some websites, articles are spread across several pages. Having to grab each separate page to get the whole article can be tedious, especially when you regularly come back to the same website to grab content.
In a rule it is possible to specify how to reach the next page of an article. When adding or editing a rule, the field dedicated to detecting the next page is named "Next page href is located at xPath".
Next page rule field
This field can contain an XPath expression that selects an attribute holding the address of the next page of the article. Example: //a[@class='next']/@href. This says that the address of the next page is found in the href of a link with the class 'next'.
Of course this varies depending on the website.
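
As a rough illustration of the idea, the Python sketch below follows such a next page expression across a two-page article. The PAGES dictionary, the URLs and both function names are made up for the example, and the attribute step (/@href) is handled by hand because Python's standard library does not evaluate it. How GrabMyBooks itself resolves the address is not described here, so treat this only as a sketch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical two-page article, keyed by URL (stands in for real downloads).
PAGES = {
    "http://example.com/story?page=1":
        "<html><body><p>Part one.</p><a class='next' href='story?page=2'>Next</a></body></html>",
    "http://example.com/story?page=2":
        "<html><body><p>Part two.</p></body></html>",
}


def attribute_values(page_source, xpath):
    """Evaluate a simple '<element path>/@<attribute>' expression."""
    element_path, attribute = xpath.rsplit("/@", 1)
    root = ET.fromstring(page_source)
    return [el.get(attribute) for el in root.findall("." + element_path)
            if el.get(attribute) is not None]


def collect_pages(start_url, next_page_xpath):
    """Follow the next page address until there is none, or a page repeats."""
    urls, url = [], start_url
    while url in PAGES and url not in urls:
        urls.append(url)
        hrefs = attribute_values(PAGES[url], next_page_xpath)
        # A relative href is resolved against the address of the current page.
        url = urljoin(url, hrefs[0]) if hrefs else None
    return urls


print(collect_pages("http://example.com/story?page=1", "//a[@class='next']/@href"))
```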

Link on page rule

This is useful for grabbing feeds. Sometimes, links in feed entries lead to an intermediate page with another link to the actual article. This can also be useful for websites that aggregate content from other websites: they display the first few lines of the article, followed by a link to the full content.
In such cases you can define a rule that grabs not the page itself but a link on the page. When adding or editing a rule, the field dedicated to detecting the link to grab instead of the page is named "Don't grab the page itself but a link on the page of which href is located at xPath".
Link on page rule field
This field can contain an XPath expression that selects an attribute holding the address of the article to grab. Example: //a/@href. This says that the address of the article to grab instead of the current page is taken from the first link encountered on the page.
Of course this varies depending on the website.
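
The sketch below shows, in Python, what taking the first link encountered on the page could look like. The sample page, its URLs and the link_to_grab function are assumptions made for the example and do not come from GrabMyBooks.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical intermediate page reached from a feed entry.
FEED_ENTRY_URL = "http://aggregator.example/item/7"
INTERMEDIATE_PAGE = """<html><body>
  <p>First few lines of the article...</p>
  <a href="http://news.example/full-article.html">Read more</a>
  <a href="/about">About</a>
</body></html>"""


def link_to_grab(page_url, page_source, xpath="//a/@href"):
    """Return the first address selected by '<element path>/@<attribute>', or None."""
    element_path, attribute = xpath.rsplit("/@", 1)
    root = ET.fromstring(page_source)
    for el in root.findall("." + element_path):  # elements come in document order
        href = el.get(attribute)
        if href:
            return urljoin(page_url, href)
    return None


print(link_to_grab(FEED_ENTRY_URL, INTERMEDIATE_PAGE))
```

On a real page the first link is often a logo or a menu entry, so a more specific expression, for instance one that filters on a class, is usually a safer choice than //a/@href.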
