Reason for attaching the keyword to the question during context retrieval #25

@pradeepbansal1

def _find_most_similar_columns(self, question: str, evidence: str, keywords: List[str], top_k: int) -> Dict[str, Dict[str, Dict[str, str]]]:
        """
        Finds the most similar columns based on the question and evidence.

        Args:
            question (str): The question string.
            evidence (str): The evidence string.
            keywords (List[str]): The list of keywords.
            top_k (int): The number of top similar columns to retrieve.

        Returns:
            Dict[str, Dict[str, Dict[str, str]]]: A dictionary containing the most similar columns with descriptions.
        """
        logging.info("Finding the most similar columns")
        tables_with_descriptions = {}
        
        for keyword in keywords:
            question_based_query = f"{question} {keyword}"
            evidence_based_query = f"{evidence} {keyword}"
            
            retrieved_question_based_query = DatabaseManager().query_vector_db(question_based_query, top_k=top_k)
            retrieved_evidence_based_query = DatabaseManager().query_vector_db(evidence_based_query, top_k=top_k)
            
            tables_with_descriptions = self._add_description(tables_with_descriptions, retrieved_question_based_query)
            tables_with_descriptions = self._add_description(tables_with_descriptions, retrieved_evidence_based_query)
        
        return tables_with_descriptions

Hello,
What is the reason for querying the vector DB separately for every keyword, with the keyword appended to the question (and to the evidence)?
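
To make the behaviour in question concrete, the query strings that the loop builds can be sketched as below. This is only an illustration: the helper name `build_queries` and the sample question, evidence, and keywords are hypothetical and not taken from the repository. Only the two f-string patterns are copied from `_find_most_similar_columns`.

```python
from typing import List


def build_queries(question: str, evidence: str, keywords: List[str]) -> List[str]:
    """Hypothetical helper: list the query strings the loop above would send."""
    queries: List[str] = []
    for keyword in keywords:
        # Same f-string patterns as in _find_most_similar_columns.
        queries.append(f"{question} {keyword}")
        queries.append(f"{evidence} {keyword}")
    return queries


if __name__ == "__main__":
    # Made-up sample inputs, for illustration only.
    sample = build_queries(
        question="How many schools are in Alameda?",
        evidence="Alameda refers to County = 'Alameda'",
        keywords=["schools", "Alameda"],
    )
    for query in sample:
        print(query)
```

With k keywords the loop therefore issues 2k vector DB queries, each asking for up to `top_k` results, and every query repeats the full question or evidence text with one keyword appended.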
