Appendix A - Index RecordContainer Properties

Index field types

The exact Solr configuration of the individual index field types can change and differ slightly between ADITO versions.

If you are not sure which field type fits an EntityField, the Solr AdminUI offers a way to test how values are analyzed by individual field types. The names of the field types in the schema mostly match the index field type name in lowercase, with the exceptions of TEXT and TEXT_NO_STOPWORDS, which start with 'adito_'. The exact schema name is listed under "Solr field type" in each section below.

Path example: http://localhost:8983/solr/#/test_solr9/analysis

Path pattern: http://<solr-host-and-port>/solr/#/<collection>/analysis
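The path pattern above can be filled in programmatically. The following is a minimal sketch; the helper name `analysis_url` is hypothetical and not part of ADITO or Solr.

```python
def analysis_url(host_and_port: str, collection: str) -> str:
    """Build the Solr AdminUI analysis URL from the path pattern above."""
    return f"http://{host_and_port}/solr/#/{collection}/analysis"


print(analysis_url("localhost:8983", "test_solr9"))
```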

The following section describes all currently available index field types, as they can be set in the indexFieldType property of the RecordFieldMappings of an IndexRecordContainer.

ADDRESS

Description:
Field type for address data such as country, city, postal code, or street names. This type normalizes input (e.g., by converting umlauts and special characters) and additionally generates phonetic tokens using RefinedSoundex. Synonyms are planned, but currently not active. The goal is robust match accuracy for different spellings and abbreviations, such as country codes.

Example:
Input:
Konrad-Zuse-Straße 4 DE - 84144 Geisenhausen
Tokens before phonetics:
"KonradZuseStrasse", "Konrad-Zuse-Strasse", "Konrad", "Zuse", "Strasse", "4", "DE", "-", "84144", "Geisenhausen"
Tokens with phonetics:
"k3089065030369030", "konradzusestrasse", "k308906", "konrad", "z5030", "zuse", "s369030", "strasse", "4", "d60", "de", "84144", "g403080308", "geisenhausen"

Solr field type: address
Content types: TEXT

Properties:

Attribute	Value	Notes
Type	`TextField`
Tokenizer	`WhitespaceTokenizer`
Lowercasing	yes
Stopwords	no
ASCII-Folding	yes	replaces e.g. umlauts
Normalization	yes	GermanNormalization
Word Delimiter	yes	fully active
Phonetic Analysis	yes	RefinedSoundex
Synonyms	planned	not yet active
Leading Wildcards Support	no

Solr configuration:

<!-- fieldType for Addresses -->
<fieldType name="address" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="lang/adito/address_synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="lang/adito/address_synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

BOOLEAN

Description:
Primitive field type for boolean values. Accepts true or false. Values starting with 1, t, or T are interpreted as true, all others as false. No analysis or tokenization.

Example:
Input: true
Stored value: true
Input: 0
Stored value: false
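The interpretation rule described above can be sketched as follows. This only mirrors the rule as stated in the description (first character `1`, `t`, or `T` means true); the function name `solr_bool` is hypothetical.

```python
def solr_bool(value: str) -> bool:
    """Return True if the value starts with '1', 't', or 'T', else False."""
    return value[:1] in ("1", "t", "T")


for raw in ("true", "0", "T", "yes"):
    print(raw, "->", solr_bool(raw))
```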

Solr field type: boolean, booleans
Content types: BOOLEAN

Properties:

Attribute	Value	Notes
Type	`BoolField`	Primitive field

Solr configuration:

<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>

COMMUNICATION

Description:
General field type for communication data such as phone numbers, email addresses, and URLs. The input is treated as a single token and then split into components (e.g., domain, TLD, local parts) via the WordDelimiter filter. Includes ASCII-Folding and lowercasing. For pure phone numbers, the TELEPHONE field type is recommended.

Example:
Email input:
info@adito-software.de
Tokens: "info@adito-software.de", "info", "adito", "software", "de", "infoaditosoftwarede"

Phone number input:
+49 (8743) 9664-0
Tokens: "+49 (8743) 9664-0", "49", "8743", "9664", "0", "49874396640"

URL input:
https://www.adito.de/unternehmen/philosophie.html
Tokens: "https://www.adito.de/unternehmen/philosophie.html", "https", "www", "adito", "de", "unternehmen", "philosophie", "html", "httpswwwaditodeunternehmenphilosophiehtml"

Solr field type: communication_address
Content types: TEXT, EMAIL, TELEPHONE, LINK

Properties:

Attribute	Value	Notes
Type	`TextField`
Tokenizer	`KeywordTokenizer`
Lowercasing	yes
Stopwords	no
ASCII-Folding	yes
Normalization	no
Word Delimiter	yes	full functionality
Phonetic Analysis	no
Synonyms	no
Leading Wildcards Support	yes

Solr configuration:

<!-- general fieldType for email, urls and phone-numbers -->
<fieldType name="communication_address" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

DATE

Description:
Primitive field type for date values. Supports timestamps with millisecond precision in ISO format (yyyy-MM-dd'T'HH:mm:ss[.SSS]Z). Values are stored as a point field for fast range queries.
See also: "Working with Dates" in the Solr Reference Guide.

Example:
Input: 2022-03-15T14:22:00Z
Stored value: 2022-03-15T14:22:00Z
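A value in the ISO form shown above can be produced from a timezone-aware timestamp as in the following sketch. The helper name `to_solr_date` is hypothetical, and the sketch omits the optional millisecond part.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt: datetime) -> str:
    """Convert a timezone-aware datetime to UTC and format it as yyyy-MM-dd'T'HH:mm:ssZ."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time at UTC+1 is shifted to UTC before formatting.
local = datetime(2022, 3, 15, 15, 22, 0, tzinfo=timezone(timedelta(hours=1)))
print(to_solr_date(local))
```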

Solr field type: pdate, pdates
Content types: DATE

Properties:

Attribute	Value	Notes
Type	`DatePointField`	Primitive field

Solr configuration:

<!-- KD-tree versions of date fields -->
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>

DOUBLE

Description:
Primitive field type for 64-bit floating point numbers (double). Stored as a point field for efficient range and value queries.

Example:
Input: 3.14159
Stored value: 3.14159

Solr field type: pdouble, pdoubles
Content types: NUMBER

Properties:

Attribute	Value	Notes
Type	`DoublePointField`	Primitive field

Solr configuration:

<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
<fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>

EMAIL

Description:
Field type specifically for email addresses. Non-ASCII characters are normalized, and special characters generate additional tokens. The WordDelimiter filter splits on special characters but not on case changes (CamelCase).

Example:
Input: info@adito-online.de
Tokens: "info@adito-online.de", "info", "adito", "online", "de", "infoadito", "aditoonline", "onlinede", "infoaditoonlinede"

Solr field type: email_address
Content types: TEXT, EMAIL

Properties:

Attribute	Value	Notes
Type	`TextField`
Tokenizer	`WhitespaceTokenizer`
Lowercasing	yes
Stopwords	no
ASCII-Folding	yes
Normalization	no
Word Delimiter	yes	except CamelCase
Phonetic Analysis	no
Synonyms	no
Leading Wildcards Support	yes

Solr configuration:

<fieldType name="email_address" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

HTML

Description:
Field type for HTML content. Internally treated like TEXT_NO_STOPWORDS, i.e., no stopword filtering and no special HTML analysis.

Example:
Input: <p>Willkommen bei ADITO!</p>
Tokens: "willkommen", "bei", "adito"

Solr field type: adito_text_nostopwords
Content types: TEXT, HTML

Properties:

Attribute	Value	Notes
Type	`TextField`
Tokenizer	`StandardTokenizer`
Lowercasing	yes
Stopwords	no
ASCII-Folding	yes
Normalization	yes	German
Word Delimiter	no
Phonetic Analysis	no
Synonyms	yes	Solr default
Leading Wildcards Support	no

Solr configuration:
see section TEXT_NO_STOPWORDS

INTEGER

Description:
Primitive field type for 32-bit signed integers (int). Stored as a point field for efficient range queries.

Example:
Input: 42
Stored value: 42

Solr field type: pint, pints
Content types: NUMBER

Properties:

Attribute	Value	Notes
Type	`IntPointField`	Primitive field

Solr configuration:

<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>

LOCATION

Description:
Primitive field type for geographic coordinates (latitude/longitude pairs, format: lat,lon). Supports spatial search and distance calculation. Stored as a point field.
See also: "Spatial Search" in the Solr Reference Guide.

Example:
Input: 48.123456,11.654321
Stored value: 48.123456,11.654321
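The `lat,lon` format can be produced with a small helper that also rejects coordinates outside the valid ranges. This is a minimal sketch; the name `to_solr_location` is hypothetical.

```python
def to_solr_location(lat: float, lon: float) -> str:
    """Format a coordinate pair as 'lat,lon' after a basic range check."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"


print(to_solr_location(48.123456, 11.654321))
```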

Solr field type: location
Content types: TEXT

Properties:

Attribute	Value	Notes
Type	`LatLonPointSpatialField`	Primitive field

Solr configuration:

<!-- A specialized field for geospatial search filters and distance sorting. -->
<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>

LONG

Description:
Primitive field type for 64-bit signed integers (long). Stored as a point field for efficient range queries.

Example:
Input: 12345678901234
Stored value: 12345678901234
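Whether INTEGER is sufficient or LONG is required depends only on the value range. The following sketch picks the smaller of the two types for a set of sample values; the helper name `smallest_integer_type` is hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def smallest_integer_type(values) -> str:
    """Return 'INTEGER' if all values fit into 32 bits, 'LONG' if they fit into 64 bits."""
    values = list(values)
    if all(INT32_MIN <= v <= INT32_MAX for v in values):
        return "INTEGER"
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "LONG"
    raise ValueError("value does not fit into a 64-bit signed integer")


print(smallest_integer_type([42]))
print(smallest_integer_type([42, 12345678901234]))
```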

Solr field type: plong, plongs
Content types: NUMBER, FILESIZE, DATE

Properties:

Attribute	Value	Notes
Type	`LongPointField`	Primitive field

Solr configuration:

<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>

LONG_TEXT

Description:
Field type for large text content such as PDFs with many pages or entire books.

This field type behaves like the TEXT type with three exceptions:

Stopwords are already filtered during indexing.
Words that were hyphenated across a line break are rejoined.
The large property prevents the contents from being loaded into the (Solr) cache.

IN: Eine neue Infografik über unser Logo soll …
OUT: "neue"(2), "infografik"(3), "unser"(5), "logo"(6), "soll"(7),
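The rejoining of words split by a line break can be illustrated on whitespace-separated tokens. This is a simplified sketch of the idea, not Solr's HyphenatedWordsFilter implementation; the function name `join_hyphenated` is hypothetical.

```python
def join_hyphenated(tokens):
    """Merge a token ending in '-' with the token(s) that follow it."""
    out = []
    pending = ""
    for tok in tokens:
        if len(tok) > 1 and tok.endswith("-"):
            # Remember the fragment without its trailing hyphen.
            pending += tok[:-1]
        else:
            out.append(pending + tok)
            pending = ""
    if pending:
        # A trailing fragment with no continuation keeps its hyphen.
        out.append(pending + "-")
    return out


text = "Eine neue Info-\ngrafik"
print(join_hyphenated(text.split()))
```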

Solr field type: adito_text_large

Content types: TEXT, FILE, HTML

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	WhitespaceTokenizer
Lowercasing	YES
Stopwords	YES
ASCII-Folding	YES
Normalization	YES
Word Delimiter	YES	Only special characters
Phonetic	NO
Synonyms	YES	Solr default → Internally empty
Leading Wildcards Support	NO

The large attribute prevents the contents of the field from being loaded into the Solr cache. This increases search performance, as large texts, such as the entire content of a PDF, do not "clog" the cache.
However, fields with this attribute cannot be multiValued!

Solr configuration:

<fieldType name="adito_text_large" class="solr.TextField" positionIncrementGap="100" multiValued="true" large="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" preserveOriginal="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/adito/stopwords_mixed.txt" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/adito/stopwords_mixed.txt" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

PHONETIC_NAME

Description:
Field type for phonetic content such as personal names. This type uses a phonetic filter (BeiderMorseFilter) to analyze the content. This enables matches for terms or names that sound similar, e.g., "Meier", "Maier", and "Mayer".

IN: Tim Meier
OUT: "tim"(1), "tn"(1), "mDr"(2)

IN: Tim Maier
OUT: "tim"(1), "tn"(1), "mDr"(2)

Solr field type: phonetic_name

Content types: TEXT

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	StandardTokenizer
Lowercasing	YES
Stopwords	NO
ASCII-Folding	NO
Normalization	NO
Word Delimiter	YES	CamelCase
Phonetic	YES	BeiderMorse
Synonyms	YES	Currently empty
Leading Wildcards Support	YES

Solr configuration:

<fieldType name="phonetic_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/adito/pers_name_synonyms.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/adito/pers_name_synonyms.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>

PROPER_NAME

Description:
Field type for proper names such as company names. The content is first normalized (umlauts & non-ASCII characters) and then analyzed with a simple phonetic filter (DoubleMetaphoneFilter).

IN: quick-mix
OUT: "quickmix"(1), "KKMK"(1), "quick-mix"(1), "quick"(1), "KK"(1), "mix"(1), "MKS"(1)

Solr field type: proper_name

Content types: TEXT

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	WhitespaceTokenizer
Lowercasing	YES
Stopwords	NO
ASCII-Folding	YES
Normalization	YES
Word Delimiter	YES	FULL
Phonetic	YES	DoubleMetaphone
Synonyms	NO
Leading Wildcards Support	YES

Solr configuration:

<fieldType name="proper_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory"/>
  </analyzer>
</fieldType>

STRING

Description:
Primitive field type for short strings (UTF-8). No analysis or tokenization – the value is stored as-is. Suitable for fields up to ~32 KB.

Example:
Input: ADITO123
Stored value: ADITO123

Solr field type: string, strings
Content types: TEXT, BOOLEAN, DATE

Properties:

Attribute	Value	Notes
Type	`StrField`	no analysis, stored as given

Solr configuration:

<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true"/>

TELEPHONE

Description:
Field type optimized for phone numbers.

This type can handle area codes and + signs. The digits of the number are concatenated (e.g.: +49 871 123456 → 0049871123456) and then additional sub-numbers (n-grams) are generated.

IN: +49 (8743) 9664-0
OUT: "0049874396640"(1), "004987439664"(1), "00498743966"(1), "0049874396"(1), … "049874396640"(1), "49874396640"(1), "9874396640"(1), "874396640"(1), … "004"(1), "049"(1), "498"(1) … "966"(1), "664"(1), "640"(1)
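The normalization and n-gram steps can be approximated as in the following sketch. It mirrors the three PatternReplace patterns from the configuration of this field type and then simplifies the WordDelimiter step to "concatenate all digit runs"; the separate number parts that the real filter also emits are not reproduced. The function names are hypothetical.

```python
import re


def normalize_phone(raw: str) -> str:
    """Approximate the index-time normalization of a phone number."""
    s = raw.lower()
    s = re.sub(r"^[+]", "00", s, count=1)        # leading '+' becomes '00'
    s = re.sub(r"^0([^0])", r"\1", s, count=1)   # drop a single leading '0'
    s = re.sub(r"\s", "-", s)                    # whitespace becomes a delimiter
    return "".join(re.findall(r"\d+", s))        # simplified: catenate all digit runs


def ngrams(token: str, min_size: int = 3, max_size: int = 30) -> set:
    """All substrings of the token with a length between min_size and max_size."""
    upper = min(max_size, len(token))
    return {
        token[i:i + n]
        for n in range(min_size, upper + 1)
        for i in range(len(token) - n + 1)
    }


number = normalize_phone("+49 871 123456")
print(number)
print(sorted(ngrams(number), key=len)[:5])
```

Because every substring of at least three digits is indexed, a search for a fragment such as the local part of a number can match without wildcards.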

Solr field type: phone_number

Content types: TEXT, TELEPHONE

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	KeywordTokenizer
Lowercasing	YES
Stopwords	NO
ASCII-Folding	NO
Normalization	NO
Word Delimiter	YES	Only for numbers
Phonetic	NO
Synonyms	NO
Leading Wildcards Support	NO

Solr configuration:

<fieldType name="phone_number" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[+]" replacement="00" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^0([^0])" replacement="$1" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s" replacement="-"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="0"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[+]" replacement="00" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^0([^0])" replacement="$1" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s" replacement="-"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="0"/>
  </analyzer>
</fieldType>

TEXT

Description:
Type for standard text.

The content is normalized (umlauts & non-ASCII characters). Terms with special characters and CamelCase are additionally split.

During search, stopwords are filtered. However, if the search pattern consists only of stopwords (e.g., 'AT', which is also a country code), the stopword filter is ignored.

Example: Indexing

IN: Eine neue Infografik über unser Logo soll …
OUT: "eine"(1), "neue"(2), "infografik"(3), "uber"(4), "unser"(5), "logo"(6), "soll"(7),

Example: Searching

IN: Eine neue Infografik über unser Logo soll …
OUT: "neue"(2), "infografik"(3), "logo"(6), "soll"(7),

Example: Searching only stopwords

IN: Sein oder nicht sein
OUT: "sein"(1), "oder"(2), "nicht"(3), "sein"(4),
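The search behavior from the examples above can be sketched as follows. This only illustrates the described rule (drop stopwords unless nothing would remain), not how Solr or ADITO implement it; the stopword set is a small illustrative subset and the function name is hypothetical.

```python
# Illustrative subset only; the real list is lang/adito/stopwords_mixed.txt.
STOPWORDS = {"eine", "über", "unser", "sein", "oder", "nicht"}


def filter_query_tokens(tokens, stopwords=STOPWORDS):
    """Remove stopwords from a query, but keep all tokens if every token is a stopword."""
    kept = [t for t in tokens if t.lower() not in stopwords]
    return kept if kept else list(tokens)


print(filter_query_tokens("eine neue infografik über unser logo soll".split()))
print(filter_query_tokens("sein oder nicht sein".split()))
```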

Solr field type: adito_text

Content types: TEXT, FILE, HTML

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	StandardTokenizer
Lowercasing	YES
Stopwords	YES
ASCII-Folding	YES
Normalization	YES	German
Word Delimiter	YES	only word splitting
Phonetic	NO
Synonyms	YES	Solr default → Internally empty
Leading Wildcards Support	NO

Solr configuration:

<!-- Default ADITO text field used by dynamic schema -->
<fieldType name="adito_text" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/adito/stopwords_mixed.txt" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

TEXT_NO_STOPWORDS

Description:
Standard text field (TEXT) without stopword filtering.

Example

IN: Eine neue Infografik über unser Logo soll …
OUT: "eine"(1), "neue"(2), "infografik"(3), "uber"(4), "unser"(5), "logo"(6), "soll"(7),

Solr field type: adito_text_nostopwords

Content types: TEXT, FILE, HTML

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	StandardTokenizer
Lowercasing	YES
Stopwords	NO
ASCII-Folding	YES
Normalization	YES	German
Word Delimiter	NO
Phonetic	NO
Synonyms	YES	Solr default → Internally empty
Leading Wildcards Support	NO

Solr configuration:

<fieldType name="adito_text_nostopwords" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

TEXT_PLAIN

Description:
Field type for texts whose content should not be analyzed.
This type only eliminates punctuation and transforms the text to lowercase.

This field type treats 'ä', 'ö', 'ü', and 'ß' as distinct characters.

Example

IN: Neue ADITO Schreibblöcke!
OUT: "neue"(1), "adito"(2), "schreibblöcke"(3)

Solr field type: text_plain

Content types: TEXT, FILE, HTML

Properties:

Attribute	Value	Notes
Type	TextField
Tokenizer	StandardTokenizer
Lowercasing	YES
Stopwords	NO
ASCII-Folding	NO
Normalization	NO
Word Delimiter	NO
Phonetic	NO
Synonyms	YES	Solr default → Internally empty
Leading Wildcards Support	NO

Solr configuration:

<fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Tokenizer

Tokenizers are responsible for splitting an input text into individual tokens. They operate at the character level and produce a TokenStream, which is then processed further by the filters of the analyzer. Unlike an analyzer, a tokenizer does not know the field context and processes only the raw character stream.

The following tokenizers are used in various ADITO field types:

WhitespaceTokenizer

Splits text exclusively at whitespace. Punctuation and special characters are retained.

Input:
"To be, or what?"
Token output:
"To", "be,", "or", "what?"

StandardTokenizer

Splits text at whitespace and most punctuation and special characters. Some characters, such as dots within domains or numeric formats, are not split. The @ character is a separator, so email addresses are fragmented.

Input:
"Please, email john.doe@foo.com by 03-09, re: m37-xq."
Token output:
"Please", "email", "john.doe", "foo.com", "by", "03", "09", "re", "m37", "xq"

KeywordTokenizer

Reads the entire input text as a single token. Used when no splitting should occur – e.g., for phone numbers, IDs, or strings to be stored exactly as entered.

Input:
"Please, email john.doe@foo.com by 03-09, re: m37-xq."
Token output:
"Please, email john.doe@foo.com by 03-09, re: m37-xq."
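The two simplest tokenizers can be imitated in a few lines, which makes the difference in their output easy to see. This is a sketch for illustration only; the StandardTokenizer follows Unicode text segmentation rules and is not reproduced here. The function names are hypothetical.

```python
def whitespace_tokenize(text: str) -> list:
    """Split at whitespace only; punctuation stays attached to the tokens."""
    return text.split()


def keyword_tokenize(text: str) -> list:
    """Return the entire input as a single token."""
    return [text]


print(whitespace_tokenize("To be, or what?"))
print(keyword_tokenize("To be, or what?"))
```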

Filter

Filters process token streams after the tokenizer. They transform, discard, or expand the tokens depending on their function. The filter chain is crucial for the behavior of the field type.

LowerCaseFilter

Converts all letters in a token to lowercase. Other characters remain unchanged.

Example:
Input: "ADITO"
Output: "adito"

ASCIIFoldingFilter

Converts all non-ASCII characters to their ASCII equivalents – e.g., diacritics (umlauts, accents).

Example:
Input: "français, südlich"
Output: "francais", "sudlich"
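A rough approximation of this folding is Unicode decomposition followed by removal of combining marks. The sketch below is not the Lucene implementation: characters without a decomposition, such as 'ß', pass through unchanged here, whereas the real filter has explicit mappings for many of them. The function name `ascii_fold` is hypothetical.

```python
import unicodedata


def ascii_fold(token: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(ascii_fold("français"), ascii_fold("südlich"))
```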

GermanNormalizationFilter

Normalizes German umlauts, ß, and similar spelling variants. The filter is based on the German2 Snowball algorithm.

Transformations:

ä, ae → a
ö, oe → o
ü, ue → u (ue only when it does not follow a vowel or q)
ß → ss
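The transformations can be sketched as a small state machine over the characters of a lowercase token. This is an approximation written from the rules listed above, under the assumption that 'ue' is only reduced when it does not follow a vowel or 'q'; it is not the Lucene source code, and the function name is hypothetical.

```python
def german_normalize(token: str) -> str:
    """Approximate German normalization for a lowercase token."""
    NORMAL, VOWEL, UMLAUT = 0, 1, 2   # UMLAUT: a following 'e' is dropped
    state = NORMAL
    out = []
    for c in token:
        if c in "ao":
            out.append(c)
            state = UMLAUT
        elif c == "u":
            out.append(c)
            # 'ue' counts as an umlaut only when 'u' does not follow a vowel or 'q'.
            state = UMLAUT if state == NORMAL else VOWEL
        elif c == "e":
            if state != UMLAUT:
                out.append(c)
            state = VOWEL
        elif c in "iqy":
            out.append(c)
            state = VOWEL
        elif c in "äöü":
            out.append({"ä": "a", "ö": "o", "ü": "u"}[c])
            state = VOWEL
        elif c == "ß":
            out.append("ss")
            state = NORMAL
        else:
            out.append(c)
            state = NORMAL
    return "".join(out)


for word in ("müller", "mueller", "straße", "neue", "quelle"):
    print(word, "->", german_normalize(word))
```

The effect is that spelling variants such as "Müller" and "Mueller" end up as the same token, while words like "neue" or "quelle" keep their 'ue'.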

WordDelimiterGraphFilter

Splits tokens at word and character boundaries. Typical splits occur at CamelCase, numeric transitions, or hyphens.

Example:
Input: "hotSpot-XL42"
Output: "hot", "Spot", "XL", "42", "hotSpot", "XL42", "hotSpotXL42"

Configurable via:

splitOnCaseChange
splitOnNumerics
preserveOriginal
catenateWords / catenateNumbers / catenateAll
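The basic splitting can be approximated with a regular expression. The sketch below covers only ASCII letters and digits and reproduces just the word and number parts plus the fully concatenated form; the intermediate concatenations and token positions of the real filter are left out. The function names are hypothetical.

```python
import re

# A capitalized or lowercase word, an uppercase run, or a digit run.
PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_parts(token: str) -> list:
    """Split at case changes, letter/digit transitions, and non-alphanumeric characters."""
    return PART.findall(token)


def catenate_all(token: str) -> str:
    """Join all parts without delimiters (the effect of catenateAll)."""
    return "".join(split_parts(token))


print(split_parts("hotSpot-XL42"))
print(catenate_all("hotSpot-XL42"))
```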

PhoneticFilter

Converts tokens into phonetic codes. Supported algorithms:

Algorithm	Solr filter factory	Notes
DoubleMetaphone	`DoubleMetaphoneFilterFactory`	for proper names like "Meyer" / "Meier"
RefinedSoundex	`PhoneticFilterFactory` with encoder="RefinedSoundex"	simple phonetic encoding
BeiderMorse	`BeiderMorseFilterFactory`	designed for personal/last names; higher precision than Soundex

SynonymGraphFilter

Assigns defined synonyms to existing tokens. Enables semantically equivalent search queries. ADITO currently uses empty synonym lists. The feature is prepared but not active.

Example configuration:

<filter class="solr.SynonymGraphFilterFactory"
        synonyms="mysynonyms.txt"
        ignoreCase="true" expand="true"/>
<filter class="solr.FlattenGraphFilterFactory"/>

Synonym list mysynonyms.txt:

couch,sofa,divan
teh => the
huge,ginormous,humungous => large
small => tiny,teeny,weeny

Input: "teh small couch"
Output: "the", "tiny", "teeny", "weeny", "couch", "sofa", "divan"
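How the rules in the synonym list lead to this output can be sketched for single-word synonyms. The sketch handles only the two rule forms shown above (comma-separated equivalents and explicit `=>` mappings) with expansion enabled; multi-word synonyms and token positions are not modeled. The function names are hypothetical.

```python
def parse_synonyms(lines):
    """Build a lookup table from synonym rules."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: every term on the left is replaced by the right side.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",")]
            for source in left.split(","):
                mapping[source.strip()] = targets
        else:
            # Equivalent terms: each term expands to the whole group.
            group = [t.strip() for t in line.split(",")]
            for term in group:
                mapping[term] = group
    return mapping


def expand(tokens, mapping):
    """Replace each token by its synonyms; unknown tokens pass through."""
    out = []
    for tok in tokens:
        out.extend(mapping.get(tok, [tok]))
    return out


RULES = [
    "couch,sofa,divan",
    "teh => the",
    "huge,ginormous,humungous => large",
    "small => tiny,teeny,weeny",
]
print(expand("teh small couch".split(), parse_synonyms(RULES)))
```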

ReversedWildcardFilter

Enables efficient search queries with leading wildcards (*foo). Tokens are indexed in reverse.

Input: "*bar"
Output: "rab*"
Tokens without wildcards remain unchanged.
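The idea behind the reversal is that a leading-wildcard query becomes a cheap prefix query on the reversed token. The sketch below illustrates this under the assumption that reversed tokens are prefixed with an internal marker character so they cannot be confused with regular tokens; the marker value and all function names are illustrative choices, not taken from the ADITO configuration.

```python
MARKER = "\u0001"  # assumed marker that distinguishes reversed tokens


def index_forms(token: str) -> set:
    """With withOriginal="true", the original and the reversed form are both indexed."""
    return {token, MARKER + token[::-1]}


def rewrite_leading_wildcard(pattern: str) -> str:
    """Turn '*bar' into a prefix pattern on the reversed form."""
    if pattern.startswith("*") and "*" not in pattern[1:]:
        return MARKER + pattern[::-1]
    return pattern


def matches(pattern: str, token: str) -> bool:
    """Check a (possibly rewritten) pattern against the indexed forms of a token."""
    rewritten = rewrite_leading_wildcard(pattern)
    forms = index_forms(token)
    if rewritten.endswith("*"):
        return any(form.startswith(rewritten[:-1]) for form in forms)
    return rewritten in forms


print(matches("*bar", "foobar"))
print(matches("*bar", "barfoo"))
```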

StopFilter

Filters defined stopwords out of the token stream. ADITO uses a combined German-English stopword list (lang/adito/stopwords_mixed.txt).

Example:
Input: "To be or what?"
Tokens before filter: "To", "be", "or", "what"
Tokens after filter: "what"
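The example can be reproduced with a small parser for the snowball list format, in which everything after a vertical bar is a comment. This is a sketch of the format handling and of case-insensitive filtering, not of Solr's implementation; the sample list is a short excerpt and the function names are hypothetical.

```python
SAMPLE_LIST = """\
 | Comments begin with vertical bar.
aber           |  but
als            |  than, as
a an and are as at be but by for if in into is it no not of on or
such that the their then there these they this to was will with
"""


def parse_snowball(text: str) -> set:
    """Read stopwords from snowball format: drop '|' comments, split on whitespace."""
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]
        words.update(content.split())
    return words


def stop_filter(tokens, stopwords, ignore_case=True):
    """Remove tokens that are contained in the stopword set."""
    def key(token):
        return token.lower() if ignore_case else token
    return [t for t in tokens if key(t) not in stopwords]


stopwords = parse_snowball(SAMPLE_LIST)
print(stop_filter(["To", "be", "or", "what"], stopwords))
```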

Stopwords

A list of German and English stopwords is used.

lang/adito/stopwords_mixed.txt

 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 | Comments begin with vertical bar. Each stop word is at the start of a line.

 | German stop word list.

aber           |  but

alle           |  all
allem
allen
aller
alles

als            |  than, as
also           |  so
am             |  an + dem
an             |  at

ander          |  other
andere
anderem
anderen
anderer
anderes
anderm
andern
anderr
anders

der            |  the
den
des
dem
die
das

daß            |  that

derselbe       |  the same
derselben
denselben
desselben
demselben
dieselbe
dieselben
dasselbe

dazu           |  to that

dein           |  thy
deine
deinem
deinen
deiner
deines

denn           |  because

derer          |  of those
dessen         |  of him

dich           |  thee
dir            |  to thee
du             |  thou

dies           |  this
diese
diesem
diesen
dieser
dieses

doch           |  (several meanings)
dort           |  (over) there

durch          |  through

ein            |  a
eine
einem
einen
einer
eines

einig          |  some
einige
einigem
einigen
einiger
einiges

einmal         |  once

er             |  he
ihn            |  him
ihm            |  to him

es             |  it
etwas          |  something

euer           |  your
eure
eurem
euren
eurer
eures

ich            |  I
mich           |  me
mir            |  to me

ihr            |  you, to her
ihre
ihrem
ihren
ihrer
ihres
euch           |  to you

jede           |  each, every
jedem
jeden
jeder
jedes

jene           |  that
jenem
jenen
jener
jenes

jetzt          |  now
kann           |  can

kein           |  no
keine
keinem
keinen
keiner
keines

können         |  can
könnte         |  could
machen         |  do
man            |  one

manche         |  some, many a
manchem
manchen
mancher
manches

mein           |  my
meine
meinem
meinen
meiner
meines

sein           |  his
seine
seinem
seinen
seiner
seines

selbst         |  self
sich           |  herself

sie            |  they, she
ihnen          |  to them

sind           |  are
so             |  so

solche         |  such
solchem
solchen
solcher
solches

uns            |  us
unse
unsem
unsen
unser
unses

welche         |  which
welchem
welchen
welcher
welches

 | English stop word list

a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with
