Implement Dataset recommendation system in databus for several analysis... #64
base: master
Conversation
📝 Walkthrough

A new dataset recommendation feature is introduced that queries the Databus SPARQL endpoint for datasets matching a given keyword. The functionality is integrated through template routing, request type configuration, and RiveScript patterns supporting various user query variations.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant RiveScript
    participant TemplateHandler
    participant DatasetTemplateHandler
    participant DatabusService
    participant DatabusEndpoint as Databus SPARQL<br/>Endpoint
    User->>RiveScript: User query (e.g., "recommend datasets for health")
    RiveScript->>RiveScript: Pattern match to dbpedia-databus-recommendation
    RiveScript->>TemplateHandler: Request with TemplateType.DBPEDIA_DATABUS_RECOMMENDATION
    TemplateHandler->>DatasetTemplateHandler: Route to handler
    DatasetTemplateHandler->>DatabusService: Call getRecommendedDatasets(keyword)
    DatabusService->>DatabusService: buildQuery() with SPARQL prefixes
    DatabusService->>DatabusEndpoint: Execute SPARQL SELECT query with<br/>keyword filter and limit
    DatabusEndpoint->>DatabusService: Return ResultSet
    DatabusService->>DatabusService: Parse results, truncate descriptions,<br/>map to ResponseData objects
    DatabusService->>DatasetTemplateHandler: Return List<ResponseData>
    DatasetTemplateHandler->>DatasetTemplateHandler: Format as carousel or text response
    DatasetTemplateHandler->>RiveScript: Return ResponseData
    RiveScript->>User: Display recommendations
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/main/java/chatbot/lib/api/DatabusService.java`:
- Line 42: The line logger.info("Querying Databus: " + sparqlQuery) in
DatabusService logs raw user input; change it to avoid exposing user-provided
keywords by either logging at DEBUG level (logger.debug(...)) or logging a
sanitized/redacted version of the query (e.g., build a sanitizedQuery that masks
the user keyword) and use that in the log; update the call site where
sparqlQuery is constructed (in the same DatabusService method) to produce
sanitizedQuery or maskedKeyword and log that instead of the full sparqlQuery.
- Around line 38-40: The SPARQL query concatenates the user-provided variable
`keyword` directly, causing a SPARQL injection risk; update the code in
DatabusService where the query string is built (the fragment that uses
`keyword`) to avoid direct concatenation by using Jena's
ParameterizedSparqlString (or an equivalent parameterized API) and bind the
keyword as a literal (or escape/normalize it) before inserting into the FILTER
regex, or alternatively properly escape quotes/backslashes and regex
metacharacters in `keyword` before concatenation; ensure the code uses
ParameterizedSparqlString.setLiteral/setEscapedString (or a safe escaping
helper) instead of string concatenation so malicious input cannot alter the
query structure.
In `@src/main/java/chatbot/lib/handlers/TemplateHandler.java`:
- Around line 68-73: The switch in TemplateHandler that routes template types to
DatasetTemplateHandler is missing the TemplateType.DBPEDIA_DATASET_ONTOLOGY
case, causing ontology requests to fall back; update the switch in
TemplateHandler.java to include TemplateType.DBPEDIA_DATASET_ONTOLOGY alongside
DBPEDIA_DATASET, DBPEDIA_DATASET_NLP, and DBPEDIA_DATABUS_RECOMMENDATION so that
responseGenerator is set via new DatasetTemplateHandler(request, payload,
helper).handleTemplateMessage() for ontology requests.
🧹 Nitpick comments (2)
src/main/java/chatbot/lib/api/DatabusService.java (2)
34-37: Consider making `description` optional to avoid missing datasets.

The query requires both `?label` and `?description` to be present. Datasets without descriptions will be excluded from results. Consider using `OPTIONAL` for the description field.

♻️ Proposed fix

```diff
 String sparqlQuery = buildQuery(
     "SELECT DISTINCT ?dataset ?label ?description WHERE {\n" +
     " ?dataset a dataid:Group .\n" +
     " ?dataset rdfs:label ?label .\n" +
-    " ?dataset dct:description ?description .\n" +
+    " OPTIONAL { ?dataset dct:description ?description . }\n" +
     " FILTER (regex(?label, \"" + keyword + "\", \"i\") || regex(?description, \"" + keyword + "\", \"i\"))\n" +
     "} LIMIT 5");
```
21-25: Remove redundant `new String()` wrapper.

The `new String(...)` constructor is unnecessary when the argument is already a string literal. This adds no value and creates an extra object.

♻️ Proposed fix

```diff
-    private static final String PREFIXES = new String(
-            "PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>\n" +
-            "PREFIX dct: <http://purl.org/dc/terms/>\n" +
-            "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n" +
-            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n");
+    private static final String PREFIXES =
+            "PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>\n" +
+            "PREFIX dct: <http://purl.org/dc/terms/>\n" +
+            "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n" +
+            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n";
```
```java
    " FILTER (regex(?label, \"" + keyword + "\", \"i\") || regex(?description, \"" + keyword
        + "\", \"i\"))\n" +
    "} LIMIT 5");
```
Critical: SPARQL injection vulnerability.
The keyword parameter is directly concatenated into the SPARQL query without sanitization. A malicious user could inject SPARQL syntax to manipulate the query behavior.
For example, input like `test", "i")) . ?x <http://evil> ?y . FILTER(regex(?label, "` could alter the query structure.
🔒 Proposed fix to sanitize input
```diff
 public List<ResponseData> getRecommendedDatasets(String keyword) {
     List<ResponseData> responseDataList = new ArrayList<>();
+    // Sanitize keyword: escape backslashes and quotes for SPARQL string literal
+    String sanitizedKeyword = keyword
+            .replace("\\", "\\\\")
+            .replace("\"", "\\\"");
     String sparqlQuery = buildQuery(
         "SELECT DISTINCT ?dataset ?label ?description WHERE {\n" +
         " ?dataset a dataid:Group .\n" +
         " ?dataset rdfs:label ?label .\n" +
         " ?dataset dct:description ?description .\n" +
-        " FILTER (regex(?label, \"" + keyword + "\", \"i\") || regex(?description, \"" + keyword
+        " FILTER (regex(?label, \"" + sanitizedKeyword + "\", \"i\") || regex(?description, \"" + sanitizedKeyword
         + "\", \"i\"))\n" +
         "} LIMIT 5");
```
```java
        + "\", \"i\"))\n" +
        "} LIMIT 5");

    logger.info("Querying Databus: " + sparqlQuery);
```
Avoid logging user input directly.
The SPARQL query containing user input is logged at INFO level. If the keyword contains sensitive data or the injection issue is exploited, this could expose sensitive information in logs.
Consider logging at DEBUG level or redacting the user-provided keyword:
```diff
-    logger.info("Querying Databus: " + sparqlQuery);
+    logger.debug("Querying Databus with keyword: {}", keyword);
```
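If the full query must still be logged at INFO level for operational reasons, masking the user-supplied keyword is an alternative; the helper below is a hypothetical sketch, not code from `DatabusService`.

```java
// Hypothetical masking helper; not part of DatabusService.
public class LogMaskSketch {

    // Replace every occurrence of the user-supplied keyword in the query
    // text with a fixed placeholder before logging.
    public static String maskKeyword(String sparqlQuery, String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return sparqlQuery;
        }
        return sparqlQuery.replace(keyword, "***");
    }

    public static void main(String[] args) {
        String query = "FILTER (regex(?label, \"health\", \"i\"))";
        // prints: FILTER (regex(?label, "***", "i"))
        System.out.println(maskKeyword(query, "health"));
    }
}
```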
```java
        // DBpedia Datset Scenarios
        case TemplateType.DBPEDIA_DATASET:
        case TemplateType.DBPEDIA_DATASET_NLP:
        case TemplateType.DBPEDIA_DATABUS_RECOMMENDATION:
            responseGenerator = new DatasetTemplateHandler(request, payload, helper).handleTemplateMessage();
            break;
```
🧩 Analysis chain

🏁 Scripts executed (repository: dbpedia/chatbot):

```shell
rg -n "DBPEDIA_DATASET_ONTOLOGY" --type java
# Get line count and view the relevant sections of TemplateHandler.java
wc -l src/main/java/chatbot/lib/handlers/TemplateHandler.java
# Search for all case statements in TemplateHandler.java to see complete routing
rg -n "case TemplateType\." src/main/java/chatbot/lib/handlers/TemplateHandler.java
# Check DatasetTemplateHandler.java around line 51 to see DBPEDIA_DATASET_ONTOLOGY handling
sed -n '45,65p' src/main/java/chatbot/lib/handlers/templates/dbpedia/DatasetTemplateHandler.java
# View the full TemplateHandler.java to see the default case
sed -n '1,150p' src/main/java/chatbot/lib/handlers/TemplateHandler.java | tail -60
# Also check if there's a default case after all the cases
rg -n "default:" src/main/java/chatbot/lib/handlers/TemplateHandler.java -A 5
```
Add DBPEDIA_DATASET_ONTOLOGY to the routing switch statement.
The DatasetTemplateHandler contains a case for DBPEDIA_DATASET_ONTOLOGY (lines 51-60), but this template type is not included in the routing switch in TemplateHandler.java. Currently only DBPEDIA_DATASET, DBPEDIA_DATASET_NLP, and DBPEDIA_DATABUS_RECOMMENDATION are routed there. Without this case, ontology-related requests will fall through to the default handler and receive a generic fallback response instead of the proper ontology information.
Fix: Add the missing case

```java
case TemplateType.DBPEDIA_DATASET:
case TemplateType.DBPEDIA_DATASET_NLP:
case TemplateType.DBPEDIA_DATASET_ONTOLOGY:
case TemplateType.DBPEDIA_DATABUS_RECOMMENDATION:
    responseGenerator = new DatasetTemplateHandler(request, payload, helper).handleTemplateMessage();
    break;
```
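To illustrate why the missing label matters, here is a minimal, self-contained sketch of the case-grouping pattern; the enum and return values below are illustrative stand-ins, since the real TemplateHandler works with the project's own TemplateType constants and handler classes.

```java
// Illustrative sketch of the routing switch; names are stand-ins,
// not the chatbot's actual types.
public class RoutingSketch {

    enum TemplateType {
        DBPEDIA_DATASET,
        DBPEDIA_DATASET_NLP,
        DBPEDIA_DATASET_ONTOLOGY,
        DBPEDIA_DATABUS_RECOMMENDATION,
        SOMETHING_ELSE
    }

    // Grouped case labels fall through to the same handler; any label
    // left out of the group drops to the default branch instead.
    static String route(TemplateType type) {
        switch (type) {
            case DBPEDIA_DATASET:
            case DBPEDIA_DATASET_NLP:
            case DBPEDIA_DATASET_ONTOLOGY: // the label the review adds
            case DBPEDIA_DATABUS_RECOMMENDATION:
                return "DatasetTemplateHandler";
            default:
                return "generic fallback";
        }
    }

    public static void main(String[] args) {
        System.out.println(route(TemplateType.DBPEDIA_DATASET_ONTOLOGY)); // DatasetTemplateHandler
        System.out.println(route(TemplateType.SOMETHING_ELSE));           // generic fallback
    }
}
```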
The goal of this project is to implement a dataset recommendation system for the DBpedia Databus that assists users in selecting appropriate datasets for various types of analysis. By analyzing dataset metadata such as domain, structure, update frequency, and provenance, the system will recommend relevant datasets, improving discoverability, usability, and efficiency in data analysis workflows.
Issue link: #39