Input patterns

An input pattern declares a known text structure for which the bot has a response. Input patterns that have the same output are grouped together in a rule.  Multiple rules that have something in common are grouped together in a topic. An input pattern can consist of: statics, variables, thesaurus variables, operators, sub-topics and/or sub-rules.

Statics

A static is a regular word, number or sign that has to be matched with the input. Depending on the chatbot’s properties, this has to be an exact match or a synonym. Spaces and capitals in the input patterns have no importance: spaces (also newlines and tabs) are skipped and capitals are removed. Ex:

This is a test.
Some static text.
I am Jan.

Statics have the highest precedence value, so if there are 2 rules that can both match the same input, but the first rule uses only statics and the second doesn’t, then the ‘all-statics’ rule comes first.
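
As a rough illustration of the normalization described above, the following Python sketch lower-cases text and splits it into word and sign tokens, so that spaces, tabs, newlines and capitals no longer matter. The names `tokenize` and `static_match` and the token regex are assumptions made for this example, not the engine's actual implementation. Synonym matching is left out, and the whole input is compared for simplicity.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word and sign tokens (whitespace is dropped)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def static_match(pattern, user_input):
    """True when an all-statics pattern and the input produce the same token sequence."""
    return tokenize(pattern) == tokenize(user_input)

print(tokenize("I am   Jan."))                               # ['i', 'am', 'jan', '.']
print(static_match("This is a test.", "this IS\ta test ."))  # True
```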

Variables

A variable is a named, temporary memory location that stores the input found at the position where the variable was declared in the pattern. Variables come in different flavors: with a specific length, a range or no length specification. They can be ‘space collectors’ or not, and you can optionally assign a custom weight to a variable.

Regular variables

This type of variable will continue to collect tokens until the next value as defined in the pattern is found, the end of the input is encountered or the end of a sentence is found (the end-of-sentence tokens can be defined in the project’s properties). It’s illegal to put another variable directly after a regular one, since the first wouldn’t know when to stop collecting input. Ex:

Is $name your name?

This will match any input of the form ‘Is x your name?’ where x can be of any length.
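
The collecting behaviour can be sketched in Python as follows. This is a minimal illustration, not the engine's matcher: the `match` function, the token regex and the `SENTENCE_END` set are assumptions, and the sketch also assumes that a variable has to collect at least one token and that the variable stops at the first occurrence of the next pattern item.

```python
import re

# Assumed end-of-sentence tokens; in the engine these come from the project's properties.
SENTENCE_END = {".", "!", "?"}

def tokenize(text):
    """Lower-case the text and split it into variables, words and signs."""
    return re.findall(r"\$\w+|\w+|[^\w\s]", text.lower())

def match(pattern, user_input):
    """Return the variable bindings when the input fits the pattern, else None."""
    items, tokens = tokenize(pattern), tokenize(user_input)
    bindings, pos = {}, 0
    for index, item in enumerate(items):
        if item.startswith("$"):
            # The next pattern item tells the variable where to stop.
            stop = items[index + 1] if index + 1 < len(items) else None
            start = pos
            while (pos < len(tokens) and tokens[pos] != stop
                   and tokens[pos] not in SENTENCE_END):
                pos += 1
            if pos == start:
                return None  # assumption: a variable has to collect something
            bindings[item[1:]] = tokens[start:pos]
        elif pos < len(tokens) and tokens[pos] == item:
            pos += 1
        else:
            return None
    # Only end-of-sentence tokens may remain after the pattern.
    return bindings if all(t in SENTENCE_END for t in tokens[pos:]) else None

print(match("Is $name your name?", "Is Marie Antoinette your name?"))
# {'name': ['marie', 'antoinette']}
```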

Variables with length

This case is the easiest for the engine to collect. A variable of this type will only collect the specified number of input tokens, no more, no less. As an example, take:

Is $name:1 your name?

Is $name:2 your name?

The first example will collect only 1 word in the variable called ‘name’. So the input ‘Is Aici your name?’ will match, but ‘Is Marie Antoinette your name?’ will not. I’ve added the second example to show what happens with spaces: by default, they are skipped and not counted. So ‘Is Marie Antoinette your name?’ will match the second pattern.

Any type of input part can come after this type of variable, including other variables. This is because the number of words that the variable should collect is always fixed and doesn’t depend on what comes after the variable.
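
Because the length is fixed, collecting comes down to taking a slice of the input tokens, which is also why two such variables can sit next to each other. The Python sketch below illustrates this under assumptions: `match_fixed` and both regular expressions are made up for the example, and only statics and `$name:N` variables are handled.

```python
import re

# A pattern item is a fixed-length variable ($name:N), a word or a sign.
ITEM = re.compile(r"\$\w+:\d+|\w+|[^\w\s]")

def match_fixed(pattern, user_input):
    """Match statics and fixed-length variables; return the bindings or None."""
    words = re.findall(r"\w+|[^\w\s]", user_input.lower())
    bindings, pos = {}, 0
    for item in ITEM.findall(pattern.lower()):
        if item.startswith("$"):
            name, length = item[1:].split(":")
            end = pos + int(length)
            if end > len(words):
                return None  # not enough input left for this variable
            bindings[name] = words[pos:end]
            pos = end
        elif pos < len(words) and words[pos] == item:
            pos += 1
        else:
            return None
    return bindings if pos == len(words) else None

print(match_fixed("Is $name:2 your name?", "Is Marie Antoinette your name?"))
# {'name': ['marie', 'antoinette']}
print(match_fixed("My name is $first:1 $last:1", "My name is Marie Antoinette"))
# {'first': ['marie'], 'last': ['antoinette']}
```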

Variables with a range

The next class of variables defines a range. The range determines the minimum and maximum number of words that the variable should collect. It will stop collecting input before the maximum is reached when a token is found in the input that is a known next value of the variable. For this reason, no other variable can come directly after a ranged variable: variables can collect any kind of token, so it would be impossible to know where the first variable stops and the second begins. Ex:

Is $name:1-2 your name?

This example will match both ‘Is Aici your name?’ and ‘Is Marie Antoinette your name?’.
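
One way to picture the ranged collection is a loop that first takes the minimum number of tokens and then keeps going until it reaches the maximum or sees the next value. The helper below is a hypothetical Python sketch of that idea: `collect_range` and its parameters are not part of the engine.

```python
def collect_range(tokens, start, minimum, maximum, stop):
    """Collect between `minimum` and `maximum` tokens from tokens[start:].

    Collection ends early when the `stop` token (the next value in the
    pattern) is seen. Returns (collected tokens, next position), or None
    when there are not enough tokens to reach the minimum.
    """
    end = start + minimum
    if end > len(tokens):
        return None
    while end - start < maximum and end < len(tokens) and tokens[end] != stop:
        end += 1
    return tokens[start:end], end

short = ["is", "aici", "your", "name", "?"]
long = ["is", "marie", "antoinette", "your", "name", "?"]
print(collect_range(short, 1, 1, 2, "your"))  # (['aici'], 2)
print(collect_range(long, 1, 1, 2, "your"))   # (['marie', 'antoinette'], 3)
```

When the maximum is reached and the following token is not the expected next value, the caller still has to reject the match: with a three-word name, the token after the collected words would not be ‘your’.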

Space collectors

As already mentioned, variables normally don’t collect spaces. This is primarily to make certain that double spaces are eliminated in compound words. Sometimes however, it is required to store the spaces. For instance, in file paths, spaces tend to play an important role. That’s why all 3 types of variables can be extended with ‘:CollectSpaces’ (capitals not important). This will let the pattern matcher know that it needs to collect the spaces in between the words. They won’t be counted in the length or range though. Ex:

Copy $from:3:collectspaces to $to
Copy $from:3-6:collectspaces to $to
Copy $from:collectspaces to $to

The first pattern will collect 3 words with all the spaces in between, the second between 3 and 6 words with all the spaces in between, and the last will collect any number of tokens.
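
The difference between a normal variable and a space collector can be sketched in Python as follows. The `collect_words` helper is hypothetical and only shows the counting rule: whitespace pieces are kept when requested, but never count toward the length.

```python
import re

def collect_words(text, count, keep_spaces=False):
    """Collect `count` words from the start of `text`.

    Whitespace between collected words is kept only when `keep_spaces`
    is set, and it never counts toward `count`. Returns None when the
    text holds fewer than `count` words.
    """
    collected, words = [], 0
    for piece in re.findall(r"\s+|\S+", text):
        if piece.isspace():
            if keep_spaces and collected:
                collected.append(piece)
            continue
        collected.append(piece)
        words += 1
        if words == count:
            return collected
    return None

print(collect_words("my  old file to backup", 3))
# ['my', 'old', 'file']
print(collect_words("my  old file to backup", 3, keep_spaces=True))
# ['my', '  ', 'old', ' ', 'file']
```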

Custom weights

Starting from version 1.2, the engine supports the option to assign a custom weight to variables. This is done with the ‘%’ operator. By default, a variable has very little weight (only 0.0000001 per word) while a thesaurus variable comes in at 0.5 and a static weighs 1. Sometimes however, it is useful to make a variable more important, hence this new feature. Here are some examples:

Copy $from:3%0.3 to $to
Copy $from:3-3:collectspaces%2 to $to
Copy $from%0.5 to $to
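
To make the variable syntax seen so far concrete, here is a small Python sketch that splits a variable declaration into its name, length or range, space-collector flag and weight. The regular expression, the `Variable` class and `parse_variable` are assumptions derived from the examples in this article, not the engine's parser. Only the default weight of 0.0000001 is taken from the text.

```python
import re
from dataclasses import dataclass
from typing import Optional

DEFAULT_VARIABLE_WEIGHT = 0.0000001  # default weight per word, as stated above

@dataclass
class Variable:
    name: str
    minimum: Optional[int]   # None means no length specification
    maximum: Optional[int]
    collect_spaces: bool
    weight: float

VARIABLE = re.compile(
    r"\$(\w+)"                  # name
    r"(?::(\d+)(?:-(\d+))?)?"   # optional length or range
    r"(:collectspaces)?"        # optional space collector flag
    r"(?:%(\d+(?:\.\d+)?))?",   # optional custom weight
    re.IGNORECASE,
)

def parse_variable(text):
    """Parse a declaration such as '$from:3-3:collectspaces%2'."""
    found = VARIABLE.fullmatch(text)
    if found is None:
        raise ValueError(f"not a variable: {text!r}")
    name, low, high, spaces, weight = found.groups()
    minimum = int(low) if low else None
    maximum = int(high) if high else minimum
    return Variable(name, minimum, maximum, spaces is not None,
                    float(weight) if weight else DEFAULT_VARIABLE_WEIGHT)

print(parse_variable("$from:3-3:collectspaces%2"))
```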

Thesaurus variables

A thesaurus variable is a named memory location that stores the input which was found and which is a child of the thesaurus node specified in the thesaurus variable. The thesaurus node is identified through a path. By default, a path is always followed using the ‘is a’ relationship, but this can be overridden as the first path item after the name of the variable. Next in the path is always the part-of-speech (pos) value. This can be: noun, verb, adverb (adv), adjective (adj), pronoun (pron), article (art), complementizer (comp), conjunction (conj), interjection (inter), preposition (prep), number, integer (int), double and any. The last one is a catch-all that covers all types of pos values. After the pos, you can optionally declare a series of thesaurus entries that further specify the path. This doesn’t have to start at the root of the thesaurus, as long as enough values are specified to find a single and unique item.

The EBNF for thesaurus variables is defined as:

ThesVariable = '^' identifier [ '->' relationship ] ':' pos-type { PathItem } ;
PathItem = '.' ( identifier | '(' identifier { identifier } ')' ) ;

Note that path items can be put between brackets. This is for thesaurus entries that consist of multiple words. It is required because otherwise the parser thinks that the thesaurus path has come to an end and sees the next word as a static.

Some examples:

^input:noun.name

^input:adjective.(age related)

^input->similar:adj.absolute
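
The grammar above can be approximated with two regular expressions. The Python sketch below is an assumption-laden illustration: `parse_thesaurus_variable` is a made-up helper, identifiers are taken to be plain word characters, and the part-of-speech value is not validated against the list given earlier.

```python
import re

THES_VARIABLE = re.compile(
    r"\^(\w+)"                       # variable name
    r"(?:->(\w+))?"                  # optional relationship
    r":(\w+)"                        # part-of-speech value
    r"((?:\.(?:\w+|\([^)]*\)))*)"    # zero or more path items
)
PATH_ITEM = re.compile(r"\.(?:(\w+)|\(([^)]*)\))")

def parse_thesaurus_variable(text):
    """Split a thesaurus variable into name, relationship, pos and path."""
    found = THES_VARIABLE.fullmatch(text)
    if found is None:
        raise ValueError(f"not a thesaurus variable: {text!r}")
    name, relationship, pos, path = found.groups()
    return {
        "name": name,
        "relationship": relationship or "is a",  # the default named in the text
        "pos": pos,
        "path": [word or words for word, words in PATH_ITEM.findall(path)],
    }

print(parse_thesaurus_variable("^input:adjective.(age related)"))
# {'name': 'input', 'relationship': 'is a', 'pos': 'adjective', 'path': ['age related']}
```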

Thesaurus variables have a bigger importance than regular variables, but both weigh less than statics. A thesaurus variable comes in at about half the weight of a static while a regular variable weighs only 1/100th of a static.

Operators

Operators are used to control the flow of the pattern matching process in some way.

Name            Symbol  Description
Loop            {}      With the loop operator, you can declare a section that can occur 0, 1 or more times.
Option          []      The content of an option can occur 0 or 1 times.
Group           ()      Groups items together. Usually used in conjunction with 1 or more conditionals.
Conditional     |       Forces a choice between the items on the left and right side of the line. Normally used inside an option, loop or group.
And             &&      Allows for holes in the pattern: the items on the left and right side of && don’t need to be next to each other in the input.
Start of input  |<      Indicates that the pattern can only match if it is at the start of the input.
End of input    >|      Indicates that the pattern can only match when it is at the end of the input.

Some examples:

a {b | c}  //matches: a | x a | a b | a b c | a b b | a b c b c b | …

a (b | c)  //matches a b | a c | x x x a b

a [b | c]  //matches a | a b | a c

a && b     //matches a b | a xxx b | a x x b |…

|< a b     //matches a b xxx   but not: xxx a b

a b >|     //matches xxx a b   but not: a b xxx
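
For readers who know regular expressions, the examples above can be read through rough regex analogies. The Python sketch below is only a comparison aid: the `ANALOGIES` table and `matches` helper are assumptions, the engine is not claimed to work with regular expressions, and the analogy covers just these six single-letter patterns.

```python
import re

# Each pattern from the examples, next to an approximate regex equivalent.
# (?<!\S) and (?!\S) make sure whole tokens are matched, not parts of words.
ANALOGIES = {
    "a {b | c}": r"(?<!\S)a(?: (?:b|c))*(?!\S)",   # loop: 0, 1 or more times
    "a (b | c)": r"(?<!\S)a (?:b|c)(?!\S)",        # group with a forced choice
    "a [b | c]": r"(?<!\S)a(?: (?:b|c))?(?!\S)",   # option: 0 or 1 times
    "a && b":    r"(?<!\S)a(?: \S+)* b(?!\S)",     # holes allowed between a and b
    "|< a b":    r"^a b(?!\S)",                    # only at the start of the input
    "a b >|":    r"(?<!\S)a b$",                   # only at the end of the input
}

def matches(pattern, user_input):
    """Normalize case and whitespace, then search with the analogous regex."""
    text = " ".join(user_input.lower().split())
    return re.search(ANALOGIES[pattern], text) is not None

print(matches("a && b", "a x x b"))   # True
print(matches("|< a b", "xxx a b"))   # False
```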

Most combinations of operators and variables are allowed, except that regular variables (those without a single length value) can’t be the first item of an option or loop. This includes any conditionals inside an option or loop: the conditional also can’t be followed by a regular or ranged variable. That is because it would be impossible to determine which path to follow: a regular variable can collect any kind of text. This problem does not exist with thesaurus variables.

Sub topics and rules

It is possible to reference other topics and rules within a pattern. This means that any of the patterns defined in the topic or rule can be found at that location. It is also legal for a pattern to reference its own rule or topic, which creates recursion. This is a very powerful feature, but also dangerous. If not used carefully, you can get into loops that don’t exit.
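
Since recursion is legal but risky, it can help to know which topics are recursive at all. The Python sketch below is a hypothetical helper, not an editor feature: it takes a hand-written map of which topic references which and lists the topics that can reach themselves, directly or through other topics, so they can be reviewed.

```python
def recursive_topics(references):
    """Return the sorted names of topics that can reach themselves.

    `references` maps a topic name to the set of topic names that its
    patterns reference.
    """
    def reaches(target, current, seen):
        for nxt in references.get(current, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(target, nxt, seen):
                    return True
        return False

    return sorted(topic for topic in references if reaches(topic, topic, set()))

# Made-up topic names, purely for illustration.
example = {
    "numbers": {"digits", "numbers"},   # references itself directly
    "digits": set(),
    "question": {"answer"},             # question -> answer -> question
    "answer": {"question"},
    "greeting": {"question"},           # uses a recursive topic, but is not recursive
}
print(recursive_topics(example))  # ['answer', 'numbers', 'question']
```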

The EBNF for sub topics and rules:

Sub = '~' TopicName [ '.' RuleName ] ;

TopicName = identifier | '(' identifier { identifier } ')' ;
RuleName = identifier | '(' identifier { identifier } ')' ;

Ex:

~subject                 //references one of the standard topics

~numbers.Add             //references the add rule within the numbers topic

~(hello world)           //a topic whose name contains multiple words
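
A reference can be taken apart along the lines of the grammar above. The Python sketch below is an illustration under assumptions: `parse_reference` is a made-up helper, identifiers are taken to be plain word characters, and the multi-word names in the usage example are invented.

```python
import re

NAME = r"\w+|\([^)]+\)"   # a single identifier, or several words between brackets
REFERENCE = re.compile(rf"~({NAME})(?:\.({NAME}))?")

def clean(name):
    """Drop surrounding brackets and normalize inner whitespace."""
    return " ".join(name.strip("()").split())

def parse_reference(text):
    """Return (topic, rule) for a reference; rule is None when it is omitted."""
    found = REFERENCE.fullmatch(text)
    if found is None:
        raise ValueError(f"not a topic or rule reference: {text!r}")
    topic, rule = found.groups()
    return clean(topic), (clean(rule) if rule else None)

print(parse_reference("~subject"))                  # ('subject', None)
print(parse_reference("~numbers.Add"))              # ('numbers', 'Add')
print(parse_reference("~(small talk).(say hello)")) # ('small talk', 'say hello')
```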

Because topics and rules are referenced by their name inside the patterns, each topic and every rule within each topic needs to have a unique name. The editor will verify this, but not enforce it. When you use a duplicate name for a topic, its icon will be red. Duplicate rule names will produce errors in the log.
