    def _find_most_similar_columns(self, question: str, evidence: str, keywords: List[str], top_k: int) -> Dict[str, Dict[str, Dict[str, str]]]:
        """
        Finds the most similar columns based on the question and evidence.

        Args:
            question (str): The question string.
            evidence (str): The evidence string.
            keywords (List[str]): The list of keywords.
            top_k (int): The number of top similar columns to retrieve.

        Returns:
            Dict[str, Dict[str, Dict[str, str]]]: A dictionary containing the most similar columns with descriptions.
        """
        logging.info("Finding the most similar columns")
        tables_with_descriptions = {}
        for keyword in keywords:
            question_based_query = f"{question} {keyword}"
            evidence_based_query = f"{evidence} {keyword}"
            retrieved_question_based_query = DatabaseManager().query_vector_db(question_based_query, top_k=top_k)
            retrieved_evidence_based_query = DatabaseManager().query_vector_db(evidence_based_query, top_k=top_k)
            tables_with_descriptions = self._add_description(tables_with_descriptions, retrieved_question_based_query)
            tables_with_descriptions = self._add_description(tables_with_descriptions, retrieved_evidence_based_query)
        return tables_with_descriptions
Hello,
What is the reason for querying the vector DB once per keyword, with each keyword appended to the question (and to the evidence)?
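
To make the question concrete, here is a minimal sketch that reproduces only the query strings the loop builds. The helper `build_queries` and the sample question, evidence, and keywords are made up for illustration and are not part of the repository. As far as I can tell, the loop issues two vector DB queries per keyword, and every query repeats the full question or evidence text:

```python
from typing import List


def build_queries(question: str, evidence: str, keywords: List[str]) -> List[str]:
    """Hypothetical helper: collect the query strings built inside the loop."""
    queries = []
    for keyword in keywords:
        # Same f-string pattern as in _find_most_similar_columns.
        queries.append(f"{question} {keyword}")
        queries.append(f"{evidence} {keyword}")
    return queries


# Made-up inputs, only to show the shape of the queries.
queries = build_queries(
    question="How many schools are in Alameda?",
    evidence="Alameda refers to County = 'Alameda'",
    keywords=["schools", "Alameda"],
)

for query in queries:
    print(query)
```

With two keywords this yields four queries, so the number of vector DB calls grows linearly with the number of keywords.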