Solr & ZooKeeper Issues

Corrupted Index Configuration

When the ADITO index configuration becomes corrupted or inaccessible, the system may fail to display search configurations or experience degraded search performance.

Symptoms

The following issues indicate a corrupted index configuration:

  • No configuration is displayed in the ADITO Designer for the IndexSearch alias
  • The incremental indexer shows pending status without completing
  • Search functionality returns incomplete or no results
  • Collection creation fails during system startup

Resolution Process

The recovery process involves removing the corrupted collection and allowing ADITO to recreate it with the correct configuration.

Step 1: Remove Corrupted Collection

To resolve configuration corruption, follow these steps in sequence:

  1. Shut down the ADITO server through the System Service Portal (SSP) or disable index search functionality in the system configuration
  2. Access the Solr Administration UI and navigate to the Collections section
  3. Select the affected collection from the collection list
  4. Click "Delete Collection" to remove the corrupted collection
  5. Restart the ADITO server to trigger automatic collection recreation
note

The first rebuild attempt after deleting a corrupted collection may fail. This occurs because the collection structure is created during the first rebuild and populated during the second rebuild. If the initial rebuild fails, wait for the system to automatically retry or manually trigger a second rebuild.
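If you prefer the command line over the Admin UI, the same deletion can be issued through Solr's Collections API. This is a sketch, not part of the official procedure: the tunnel endpoint `localhost:8983` and the collection name `example_collection` are taken from the examples later in this guide and must be replaced with your own values.

```shell
#!/bin/sh
# Sketch: delete the corrupted collection via the Collections API
# instead of the Admin UI. SOLR_HOST and COLLECTION are assumptions --
# substitute your tunnel endpoint and the affected collection's name.
SOLR_HOST="${SOLR_HOST:-localhost:8983}"
COLLECTION="${COLLECTION:-example_collection}"

URL="http://${SOLR_HOST}/solr/admin/collections?action=DELETE&name=${COLLECTION}"
echo "DELETE request: ${URL}"

# --connect-timeout keeps the script from hanging if the tunnel is down.
curl -s --connect-timeout 5 "${URL}" || echo "Request failed; is the SSH tunnel up?"
```

As in the UI-based procedure, restart the ADITO server afterwards so the collection is recreated.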

Step 2: Alternative Collection Rename

If the collection cannot be deleted or continues to fail, rename the collection to force creation of a new one:

  1. Open the ADITO Designer and navigate to the project tree
  2. Double-click on the system node to open system properties
  3. Locate the IndexSearch alias configuration
  4. Modify the collectionName property in the Property window to a new, unique name
  5. Deploy the changes and restart the system

This approach is effective when Solr can no longer access the existing configuration due to version incompatibilities or corrupted metadata.


Solr Service Crash Loop

Solr pods may enter a crash loop state where they continuously restart without successfully initializing. This typically occurs due to ZooKeeper data corruption or configuration issues.

Identifying the Issue

Monitor the Solr logs to identify the specific cause of the crash loop. The most common indicators include connection failures and data corruption errors.

ZooKeeper Connection Failures

The following log pattern indicates ZooKeeper is not running properly:

ZooKeeperServer not running:

Node 127.0.0.1 is not currently serving requests
2024-08-20 12:55:56.249 WARN (NIOWorkerThread-2) [ ] o.a.z.s.NIOServerCnxn Close of session 0x0 => java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544) ~[?:?]
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332) ~[?:?]
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522) ~[?:?]
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]

When this exception occurs, check the directory /solr/zoo_data/version-2 in the file browser. It should contain several log files; if any of them is 0 bytes in size, delete it, then restart the system.
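The check above can also be scripted. This is a sketch, assuming the directory path from the text; override DATA_DIR if your data lives under a different mount point.

```shell
#!/bin/sh
# Sketch: delete zero-byte files from ZooKeeper's version-2 directory.
# The default path matches the file-browser location mentioned above.
purge_empty_files() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "skipping: $dir does not exist"
        return 0
    fi
    # -size 0 matches only truly empty files; -print lists each one
    # before -delete removes it.
    find "$dir" -maxdepth 1 -type f -size 0 -print -delete
}

purge_empty_files "${DATA_DIR:-/solr/zoo_data/version-2}"
```

Restart the system afterwards, as described above.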

Important: Immediately after Solr starts, the following EOFException occurs while ZooKeeper is starting up; this is what triggers the actual problem:

2025-03-18 09:08:05.521 ERROR (Thread-11) [ ] o.a.s.c.SolrZkServer ZooKeeper Server ERROR => java.io.EOFException
at java.base/java.io.DataInputStream.readInt(Unknown Source)
java.io.EOFException: null
at java.io.DataInputStream.readInt(Unknown Source) ~[?:?]
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) ~[?:?]
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:725) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:743) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:711) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:792) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258) ~[?:?]
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303) ~[?:?]
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285) ~[?:?]
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:494) ~[?:?]
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:665) ~[?:?]
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:758) ~[?:?]
at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130) ~[?:?]
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:159) ~[?:?]
at org.apache.solr.cloud.SolrZkServer$1.run(SolrZkServer.java:121) ~[?:?]

Connection Issues Between ADITO and Solr

ADITO systems may experience connection timeouts or complete connectivity loss to the Solr index service. These issues typically manifest as failed searches, query timeouts, or degraded search performance.

Symptoms

Connection problems between ADITO and Solr present with the following characteristics:

  • Search queries consistently time out after the configured timeout period
  • ADITO web pods report connection failures to the Solr service
  • Some searches work intermittently while others fail completely
  • Query response times significantly exceed normal performance benchmarks

Diagnostic Steps

To identify the root cause of connection issues, examine these key areas systematically.

ZooKeeper Transaction Log Cleanup

Excessive accumulation of ZooKeeper transaction logs frequently causes performance degradation and connection instability.

Navigate to the directory /var/solr/data/zoo_data/version-2 within the Solr container or pod. The system should maintain only the 3 most recent snapshot files and the 3 most recent transaction log files. If more than 10-15 files exist in this directory, log cleanup is required.

To resolve log accumulation:

  1. Stop the Solr and ZooKeeper services
  2. Access the container file system or mounted volume
  3. Remove older log and snapshot files, retaining only the 3 most recent files of each type
  4. Restart the Solr service
warning

Always preserve the most recent log and snapshot files by modification date. Removing current transaction logs will prevent ZooKeeper from starting properly.
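The retention steps above can be sketched as a small script. The path and the "keep 3 of each" policy come from the text; run it only while Solr and ZooKeeper are stopped.

```shell
#!/bin/sh
# Sketch: keep only the 3 newest transaction logs and the 3 newest
# snapshots in ZooKeeper's version-2 directory. Run with services stopped.
keep_newest_three() {
    dir="$1"; pattern="$2"
    [ -d "$dir" ] || { echo "skipping: $dir does not exist"; return 0; }
    # ls -t sorts newest first; tail -n +4 selects everything after the
    # third entry, i.e. the files to delete.
    ( cd "$dir" && ls -t $pattern 2>/dev/null | tail -n +4 | xargs -r rm -v )
}

DATA_DIR="${DATA_DIR:-/var/solr/data/zoo_data/version-2}"
keep_newest_three "$DATA_DIR" 'log.*'
keep_newest_three "$DATA_DIR" 'snapshot.*'
```

Note that ZooKeeper can also purge old snapshots and logs automatically via the autopurge.snapRetainCount and autopurge.purgeInterval settings in zoo.cfg, which avoids manual cleanup in the long run.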

Resource Allocation Analysis

Insufficient CPU and memory resources commonly cause query timeouts in large collections. Systems with millions of indexed documents require adequate resource allocation to maintain acceptable query response times.

Recommended resource allocation for production Solr instances:

Component  | Minimum Requirement | Optimal Configuration
---------- | ------------------- | ---------------------
CPU Cores  | 4 processors        | 4+ processors
Java Heap  | 6GB                 | 6-8GB
Total RAM  | 8GB (Heap + 2GB)    | 10-12GB

Configure these settings through JVM parameters:

  • CPU allocation: -XX:ActiveProcessorCount=4
  • Memory allocation: -Xms6000m -Xmx6000m
  • Maximum RAM: -XX:MaxRAM=8g
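How these flags reach the JVM depends on the deployment. For a bin/solr-based installation they typically go into solr.in.sh; in a container they may be environment variables instead. A sketch using the values above:

```shell
# Sketch of a solr.in.sh fragment applying the values above. Whether you
# edit solr.in.sh or set these as container environment variables is
# deployment-specific; treat this as an illustration.

# Heap: identical -Xms/-Xmx avoids heap resizing pauses.
SOLR_JAVA_MEM="-Xms6000m -Xmx6000m"

# Cap the CPU count and total RAM the JVM believes it has.
SOLR_OPTS="$SOLR_OPTS -XX:ActiveProcessorCount=4 -XX:MaxRAM=8g"
```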

Compare resource allocation with similar production systems that handle comparable document volumes to ensure appropriate sizing.

Search Timeout Configuration

The default search timeout of 10 seconds should accommodate most queries. However, complex searches on large collections may require longer processing times.

Adjust the search timeout in the ADITO IndexSearch alias configuration by modifying the searchTimeout property. Increase this value incrementally, as timeouts exceeding 30 seconds typically indicate underlying performance issues rather than insufficient timeout duration.

note

Extended search timeouts often mask performance problems. Address resource allocation and configuration optimization before increasing timeout values.


ConfigSet Configuration Issues

Incomplete or outdated Solr ConfigSet configurations can significantly impact search performance and cause intermittent connection failures.

Configuration Validation

The Solr collection configuration overlay must include complete ADITO-specific settings for optimal performance. Missing configuration elements, particularly tag search components, have historically caused performance degradation.

Current Configuration Assessment

Check the current configoverlay.json configuration through the Solr Admin UI or direct API access. The configuration should include comprehensive user properties and search components.
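One quick way to inspect the live overlay is the collection's config API endpoint. A sketch, with the tunnel endpoint and collection name as assumptions, that also greps for the tag_search component:

```shell
#!/bin/sh
# Sketch: fetch the live config overlay through Solr's config API and
# check whether the tag_search component is present. SOLR_HOST and
# COLLECTION are assumptions; adjust them to your system.
SOLR_HOST="${SOLR_HOST:-localhost:8983}"
COLLECTION="${COLLECTION:-example_collection}"

URL="http://${SOLR_HOST}/solr/${COLLECTION}/config/overlay"
echo "Fetching overlay from: ${URL}"

curl -s --connect-timeout 5 "${URL}" | grep -q '"tag_search"' \
    && echo "tag_search component present" \
    || echo "tag_search missing, or the request failed"
```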

A minimal configuration typically contains only basic user properties:

{
  "userProps": {
    "ADITO.schema.configset": "example_prod_config_v7.1.2",
    "ADITO.schema.version": 7,
    "ADITO.schema.isCustom": false,
    "ADITO.schema.isDynamic": false,
    "ADITO.schema.baseConfigSet": "adito_config_v7.1.1",
    "ADITO.schema.version.build": "7.1.2"
  },
  "props": {
    "updateHandler": {
      "autoCommit": {
        "openSearcher": true,
        "maxDocs": 10000
      }
    }
  }
}

Complete Configuration Requirements

A fully configured overlay should include tag search components and additional user properties:

{
  "userProps": {
    "ADITO.schema.configset": "example_prod_config_v7.1.2",
    "ADITO.schema.version": 7,
    "ADITO.schema.isCustom": false,
    "ADITO.schema.isDynamic": false,
    "ADITO.config.version": "7.0.1",
    "ADITO.schema.version.build": "7.1.2",
    "ADITO.schema.baseConfigSet": "adito_config_v7.1.1"
  },
  "props": {
    "updateHandler": {
      "autoCommit": {
        "openSearcher": true,
        "maxDocs": 10000
      }
    }
  },
  "searchComponent": {
    "tag_search": {
      "name": "tag_search",
      "class": "solr.SuggestComponent",
      "suggester": [
        {
          "name": "tag_fuzzy",
          "lookupImpl": "FuzzyLookupFactory",
          "dictionaryImpl": "DocumentDictionaryFactory",
          "maxEdits": "2",
          "unicodeAware": "true",
          "field": "_tags_suggest_",
          "suggestAnalyzerFieldType": "tag",
          "buildOnStartup": "true",
          "buildOnCommit": "false"
        },
        {
          "name": "tag_infix",
          "lookupImpl": "BlendedInfixLookupFactory",
          "dictionaryImpl": "DocumentDictionaryFactory",
          "indexPath": "TagLookupSearchBlendedIndexDir",
          "field": "_tags_suggest_",
          "contextField": "_tags_",
          "suggestAnalyzerFieldType": "tag",
          "queryAnalyzerFieldType": "tag_suggest",
          "buildOnStartup": "true",
          "buildOnCommit": "false",
          "highlight": "false",
          "blenderType": "position_linear"
        }
      ]
    }
  }
}

Manual Configuration Update

When automatic ConfigSet updates fail or incomplete configurations cause performance issues, manual configuration updates can resolve the problems.

Prerequisites

Ensure you have administrative access to the Solr service and the correct collection name before proceeding with manual updates.

Configuration Update Process

Execute the following PowerShell commands through an SSH tunnel to update the configuration. Replace example_collection with your actual collection name:

# Set additional user properties
Invoke-RestMethod -Uri "http://localhost:8983/solr/example_collection/config" `
    -Method Post -ContentType "application/json" `
    -Body '{"set-user-property": {"ADITO.config.version": "7.0.1"}}'

Invoke-RestMethod -Uri "http://localhost:8983/solr/example_collection/config" `
    -Method Post -ContentType "application/json" `
    -Body '{"set-user-property": {"ADITO.schema.version.build": "7.1.2"}}'

Update the tag search component configuration:

Invoke-RestMethod -Uri "http://localhost:8983/solr/example_collection/config" `
    -Method Post -ContentType "application/json" `
    -Body @'
{
  "update-searchcomponent": {
    "name": "tag_search",
    "class": "solr.SuggestComponent",
    "suggester": [
      {
        "name": "tag_fuzzy",
        "lookupImpl": "FuzzyLookupFactory",
        "dictionaryImpl": "DocumentDictionaryFactory",
        "maxEdits": "2",
        "unicodeAware": "true",
        "field": "_tags_suggest_",
        "suggestAnalyzerFieldType": "tag",
        "buildOnStartup": "true",
        "buildOnCommit": "false"
      },
      {
        "name": "tag_infix",
        "lookupImpl": "BlendedInfixLookupFactory",
        "dictionaryImpl": "DocumentDictionaryFactory",
        "indexPath": "TagLookupSearchBlendedIndexDir",
        "field": "_tags_suggest_",
        "contextField": "_tags_",
        "suggestAnalyzerFieldType": "tag",
        "queryAnalyzerFieldType": "tag_suggest",
        "buildOnStartup": "true",
        "buildOnCommit": "false",
        "highlight": "false",
        "blenderType": "position_linear"
      }
    ]
  }
}
'@

Configuration Activation

After updating the configuration, reload the collection through the Solr Admin UI:

  1. Navigate to the Collections section in the Solr Admin UI
  2. Select your collection from the available collections
  3. Click the "Reload" button to apply the configuration changes
  4. Verify that the collection loads without errors
tip

Configuration changes take effect immediately after reloading the collection. Monitor search performance to verify that the updates resolve the connectivity issues.
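The reload can also be triggered through the Collections API rather than the Admin UI. A sketch with the same assumed tunnel endpoint and collection name as the examples above:

```shell
#!/bin/sh
# Sketch: reload the collection via the Collections API so the updated
# configuration takes effect. SOLR_HOST and COLLECTION are assumptions.
SOLR_HOST="${SOLR_HOST:-localhost:8983}"
COLLECTION="${COLLECTION:-example_collection}"

URL="http://${SOLR_HOST}/solr/admin/collections?action=RELOAD&name=${COLLECTION}"
echo "RELOAD request: ${URL}"

curl -s --connect-timeout 5 "${URL}" || echo "Request failed; is the SSH tunnel up?"
```

As with the UI-based reload, verify afterwards that the collection loads without errors.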