Understanding AI Agents and Building an NLP to SQL Agent
- Sairam Penjarla
- Oct 19, 2024
- 7 min read
Updated: Oct 20, 2024
Watch My YouTube Video
For a visual walkthrough of this project, check out my YouTube video.
What Are AI Agents?
At its core, an agent is a piece of code that performs an action for you. It can be something as simple as setting an alarm, creating a reminder about birthdays, or checking the weather. But an AI agent goes one step further, leveraging the power of Large Language Models (LLMs) to automatically perform complex tasks that were previously difficult or time-consuming.
For example, let’s say I want to create an architecture diagram or a flowchart. Traditionally, I would have to gather all the details, select the icons, draw the shapes, adjust colors, and make sure the flow is easy to understand. It’s a manual and often tedious process.
But with AI agents, we can streamline this entire workflow. Imagine feeding in your requirements, and the agent generating a polished diagram for you in seconds. AI agents make this magic happen by combining environmental context with LLM-generated outputs.
How Do AI Agents Work?
Let’s break down the process of how AI agents work, especially in the context of LLMs.
Input Request: First, the agent receives a user request. This could be anything from asking for a chart to generating a SQL query to summarizing a document.
Understand the Environment: Next, the agent gathers information about its environment. For instance, if we’re building an architecture diagram, the agent needs to know which shapes are available, what colors to use, and the rules for how different components should connect.
Invoke the LLM: The magic happens when the Large Language Model is called upon. It processes the input and generates an output that can be understood by the system. For something like an architecture diagram, it might create a structured JSON output that defines the shapes, their connections, and the content inside each block.
Execute the Task: Once the LLM output is ready, the environment can use it to execute the actual task. For example, a JSON file generated by the LLM could be used by a diagram creation tool to automatically build the architecture diagram.
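The four steps above can be sketched in a few lines of Python. Everything here is illustrative: `call_llm` stands in for a real LLM client, and the JSON shape is an assumption for a hypothetical diagramming tool, not a real format.

```python
import json

# Hypothetical stand-in for a real LLM API call; returns a canned
# structured answer so the loop below is runnable.
def call_llm(prompt: str) -> str:
    return json.dumps({
        "shapes": [
            {"id": "db", "type": "cylinder", "label": "Database"},
            {"id": "api", "type": "box", "label": "API Server"},
        ],
        "connections": [{"from": "api", "to": "db"}],
    })

def run_agent(user_request: str, environment: dict) -> dict:
    # Steps 1 & 2: combine the user request with environment context
    prompt = f"Request: {user_request}\nAvailable shapes: {environment['shapes']}"
    # Step 3: invoke the LLM to get machine-readable output
    llm_output = call_llm(prompt)
    # Step 4: parse the output so the environment can execute the task
    return json.loads(llm_output)

diagram = run_agent("Draw an API talking to a database",
                    {"shapes": ["box", "cylinder"]})
print(len(diagram["shapes"]))  # → 2
```

A real agent would hand `diagram` to a rendering tool; the key idea is that the LLM's job ends at producing structured output the environment can act on.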
Here’s a real-world example: dbdiagram.io is a tool that generates ER diagrams from a schema definition. If we use an LLM to generate that schema from a user’s description, the output can then be used to create the ER diagram automatically. The LLM can even account for details such as primary keys, foreign keys, column data types, and table relationships.
My NLP to SQL AI Agent Project
Now that we understand the fundamentals of AI agents, let’s jump into my project: the NLP to SQL AI Agent.
This Flask-based application uses NLP to allow users to interact with SQL databases using natural language. Essentially, you can type in a question like, “Show me the top 5 products sold last month,” and the AI agent will translate that into a SQL query, run it on the database, and return the results.
Let me break down how the project works.
How the NLP to SQL Agent Works
Extract Relevant Schema: The agent first understands the structure of the database by extracting the relevant schema. Based on user input, the agent figures out which parts of the database schema are required for the query.
Generate SQL Queries: Once the schema is extracted, the agent uses an LLM to transform the user’s natural language input into a valid SQL query. This query is then executed on the database.
Streaming Agent Responses: After retrieving the SQL results, the agent generates responses in real-time, offering explanations or additional details as needed.
Markdown to HTML: Finally, the agent converts any Markdown text from the output into HTML for better readability in the user interface.
Expanded Code Walkthrough
Let’s dive deeper into the key methods in this Flask-based NLP to SQL agent. The application uses various methods to interact with a natural language processing (NLP) system, convert user input into SQL queries, and manage the user interface.
1. @app.route("/") - Landing Page
```python
@app.route("/")
def landing_page():
    return render_template("index.html")
```
Purpose: This method is the entry point of the application. When users visit the root URL, it renders the index.html file.
What It Does: It simply serves the landing page where users can interact with the NLP to SQL agent by providing inputs and receiving results.
2. @app.route('/extract_relavant_schema', methods=['POST']) - Extract Relevant Schema
```python
@app.route('/extract_relavant_schema', methods=['POST'])
def extract_relavant_schema():
    data = request.get_json()            # Capture input data as JSON
    user_input = data.get('user_input')  # Extract user input

    # Generate a message to be passed to the LLM
    msg = utils.get_user_msg(
        content=COLUMN_GUIDELINES + COMPLETE_SCHEMA,  # Database schema and column guidelines
        conversations=agent_conversation,             # Context-specific conversation
        present_question=user_input,                  # User question for schema extraction
    )

    llm_output = utils.invoke_llm(conversations=[msg])       # Invoke LLM with message
    RELAVANT_SCHEMA = utils.get_relavant_schema(llm_output)  # Parse the LLM's output to get the schema

    return jsonify({"relavant_schema": RELAVANT_SCHEMA})     # Return the relevant schema
```
Purpose: This method takes user input and extracts the relevant database schema using an LLM.
What It Does:
It accepts a POST request with user input in JSON format.
Constructs a message to send to the Large Language Model (LLM), including column guidelines and schema information.
The LLM processes the input and returns relevant schema information, which is then extracted and returned as a JSON response.
Customization: You can modify the schema extraction logic in the utils.py file and adjust the guidelines in prompt_templates.py.
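To make the parsing step concrete, here is one hypothetical shape for `utils.get_relavant_schema`. The `<schema>` markers and the one-`table.column`-per-line format are my assumptions for illustration, not the project's actual prompt contract:

```python
# Assumed contract: the LLM is prompted to list relevant columns, one
# "table.column" per line, between <schema> ... </schema> markers.
def get_relavant_schema(llm_output: str) -> list[str]:
    start = llm_output.find("<schema>") + len("<schema>")
    end = llm_output.find("</schema>")
    body = llm_output[start:end]
    # Keep only non-empty lines, stripped of surrounding whitespace
    return [line.strip() for line in body.splitlines() if line.strip()]

reply = "Sure.\n<schema>\nproducts.name\nproducts.price\n</schema>"
print(get_relavant_schema(reply))  # → ['products.name', 'products.price']
```

The real parser in `utils.py` may use a different delimiter or JSON output; the point is that the LLM's free-form reply is reduced to a machine-usable subset of the schema.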
3. @app.route('/gather_sql_content', methods=['POST']) - Generate SQL Content
```python
@app.route('/gather_sql_content', methods=['POST'])
def gather_sql_content():
    data = request.get_json()                        # Capture input data
    RELAVANT_SCHEMA = data.get('relavant_schema')    # Extract relevant schema from request
    user_input = data.get('user_input')              # User query or input

    # Generate a message using the relevant schema and user query
    msg = utils.get_user_msg(
        content=RELAVANT_SCHEMA,
        conversations=queries_conversation,  # Conversation context for SQL generation
        present_question=user_input,
    )

    # Invoke the LLM to generate SQL queries
    llm_output = utils.invoke_llm(conversations=queries_conversation + [msg])
    query = utils.extract_query(llm_output)  # Extract SQL query from LLM output

    # Execute the query and retrieve results in a DataFrame
    df = utils.execute_query(query)
    SQL_CONTENT = utils.get_sql_content(df)  # Format DataFrame results for display

    return jsonify({"sql_content": SQL_CONTENT})  # Return SQL content in JSON format
```
Purpose: This method generates a SQL query based on the user's natural language input and the relevant schema.
What It Does:
Accepts a POST request containing the relevant schema and user input.
The system constructs a query message based on the schema and user input.
It sends the message to the LLM, which generates a SQL query.
The query is executed on the database, and the results are formatted into SQL content, which is then returned to the user.
Customization: You can modify the SQL content formatting logic in the utils.py file or adjust the LLM prompt settings to influence how SQL queries are generated.
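Here is a hedged sketch of what `extract_query` and `execute_query` could look like. These are assumptions, not the project's actual `utils.py`: I assume the LLM wraps its SQL in a fenced ```sql block, and I substitute an in-memory SQLite table with made-up sample data for the real dataset.

```python
import re
import sqlite3

def extract_query(llm_output: str) -> str:
    # Pull the SQL out of a fenced ```sql block; fall back to the raw text
    match = re.search(r"`{3}sql\s*(.*?)`{3}", llm_output, re.DOTALL)
    return match.group(1).strip() if match else llm_output.strip()

def execute_query(query: str):
    # In-memory SQLite with illustrative data; the real project queries
    # a Kaggle CSV dataset instead
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, units_sold INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [("phone", 120), ("case", 80)])
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

fence = "`" * 3  # avoids literal backtick fences inside this example
reply = f"Here you go:\n{fence}sql\nSELECT name FROM products ORDER BY units_sold DESC LIMIT 1\n{fence}"
print(execute_query(extract_query(reply)))  # → [('phone',)]
```

Keeping extraction and execution as separate functions, as the route does, also gives you a natural place to validate or log the generated SQL before running it.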
4. @app.route('/invoke_agent', methods=['POST']) - Invoke Agent for Streaming Output
```python
@app.route('/invoke_agent', methods=['POST'])
def invoke_agent():
    data = request.get_json()                # Capture input data
    SQL_CONTENT = data.get('sql_content')    # Extract SQL content from the request
    user_input = data.get('user_input')      # User question or input

    # Generate a message using the SQL content and user input
    msg = utils.get_user_msg(
        content=SQL_CONTENT,
        conversations=agent_conversation,  # Agent-specific conversation context
        present_question=user_input,
    )

    # Invoke the LLM to generate a real-time response (streaming)
    agent_output = utils.invoke_llm_stream(conversations=agent_conversation + [msg])

    # Return the streaming response
    return Response(agent_output, content_type='text/event-stream')
```
Purpose: This method invokes the agent to generate streaming responses based on the SQL content.
What It Does:
Accepts a POST request with SQL content and user input.
Constructs a message using the SQL content and the user’s query, which is then sent to the LLM.
The LLM generates a streaming response, which is returned to the user in real-time.
Streaming Responses: Streaming allows the user to see the response as it is generated, improving interactivity, especially for long or complex responses.
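A minimal sketch of how `invoke_llm_stream` might yield Server-Sent Events for Flask's `Response` object. The token stream is simulated here; a real implementation would iterate over chunks from an LLM streaming API.

```python
def invoke_llm_stream(conversations):
    # Stand-in for a streaming LLM call: yields hard-coded tokens
    simulated_tokens = ["The top ", "product ", "last month ", "was X."]
    for token in simulated_tokens:
        # Each SSE frame is "data: <payload>" followed by a blank line
        yield f"data: {token}\n\n"

chunks = list(invoke_llm_stream([]))
print(len(chunks))  # → 4
```

Because the function is a generator, Flask sends each frame as soon as it is yielded, which is what lets the browser render the answer incrementally.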
5. @app.route('/markdown_to_html', methods=['POST']) - Convert Markdown to HTML
```python
@app.route('/markdown_to_html', methods=['POST'])
def markdown_to_html():
    return jsonify({"agent_output": utils.markdown_text})  # Convert markdown to HTML and return it
```
Purpose: This method converts any markdown text generated by the agent into HTML format for better display.
What It Does:
It’s a simple method that takes markdown text (which could be the result of an agent's output) and returns it in HTML format.
This is useful when the agent outputs content that needs to be displayed in a web-friendly format.
Customization: If you need to modify how markdown is converted, you can adjust the logic inside utils.py.
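As a toy stand-in for that conversion step, the sketch below handles just bold text and inline code with regular expressions. The real project likely relies on a proper Markdown library (e.g. the `markdown` package), so treat this only as an illustration of the idea:

```python
import re

def markdown_to_html(text: str) -> str:
    # **bold** -> <strong>, `code` -> <code>; everything else untouched
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"`(.+?)`", r"<code>\1</code>", text)
    return f"<p>{text}</p>"

print(markdown_to_html("The **top** product is `phone`"))
# → <p>The <strong>top</strong> product is <code>phone</code></p>
```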
Customizing and Extending the Agent
The agent is built to be fully customizable:
Customizing the Database Connection:
In the utils.py file, you can modify the execute_query() function to connect to any database. Currently, the project uses a small CSV dataset from Kaggle, but this can be easily replaced by your own database connection.
```python
def execute_query(query):
    # Modify this function to connect to your own database
    pass
```
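One hedged way to fill in that stub while keeping the CSV-backed setup: load the rows into an in-memory SQLite table and run the generated SQL against it. The column names and data below are illustrative, not the actual Phone Search Dataset schema.

```python
import csv
import io
import sqlite3

CSV_DATA = "name,price\nphone,499\ncase,19\n"  # stands in for the CSV file

def execute_query(query: str):
    conn = sqlite3.connect(":memory:")
    reader = csv.reader(io.StringIO(CSV_DATA))
    header = next(reader)
    conn.execute(f"CREATE TABLE phones ({', '.join(header)})")
    # Store price as an integer so numeric comparisons behave as expected
    conn.executemany("INSERT INTO phones VALUES (?, ?)",
                     [(name, int(price)) for name, price in reader])
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(execute_query("SELECT name FROM phones WHERE price < 100"))  # → [('case',)]
```

Swapping SQLite for Postgres or MySQL only changes the connection setup; the rest of the agent pipeline is unaffected.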
Adjusting Prompt Guidelines:
In the prompt_templates.py file, you can modify COLUMN_GUIDELINES and COMPLETE_SCHEMA to tailor the guidelines to your database’s specific schema. This ensures that the LLM understands the structure of your database when generating SQL queries.
Changing Conversations:
The get_default_conversation() function in utils.py allows you to modify the conversation examples that are passed to the LLM. You can add or modify examples to improve the quality of responses.
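A hypothetical shape for `get_default_conversation`, using the common OpenAI-style role/content message format as few-shot examples. The exact structure in `utils.py` may differ:

```python
def get_default_conversation():
    # System instruction plus one worked example to steer SQL generation
    return [
        {"role": "system",
         "content": "You translate user questions into SQL for the given schema."},
        {"role": "user",
         "content": "Show me the top 5 products sold last month"},
        {"role": "assistant",
         "content": "SELECT name FROM products ORDER BY units_sold DESC LIMIT 5"},
    ]

conversation = get_default_conversation()
print(conversation[0]["role"])  # → system
```

Adding a few user/assistant pairs that match your own schema is usually the cheapest way to improve query quality.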
Running the Application
Clone the Repository:
```shell
git clone https://github.com/sairam-penjarla/nlp-to-sql-ai-agent.git
cd nlp-to-sql-ai-agent
```
Download the Dataset:
Download the Phone Search Dataset from Kaggle and place it in the appropriate directory.
Install Dependencies:
Install the required libraries using:
```shell
pip install -r requirements.txt
```
Run the Flask Application:
Start the app by running:
```shell
python app.py
```
The application will be accessible at http://0.0.0.0:8000.
Customizing the Agent
The project is fully customizable:
You can modify the execute_query function in utils.py to connect to your own database.
If you need different guidelines for your queries, update the COLUMN_GUIDELINES in prompt_templates.py.
Change the examples in the get_default_conversation function to suit your database needs.
Conclusion
AI agents are transforming how we interact with complex systems, making technology more accessible and intuitive. Whether it's generating SQL queries from natural language or automating tedious tasks like creating architecture diagrams, AI agents powered by LLMs are the future.
Feel free to explore the code, customize it, and let me know what you think! If you have any questions, drop a comment below or reach out via my YouTube channel.
That’s a wrap for this blog! Stay tuned for more exciting AI and data science projects.