MINISTRY OF EDUCATION AND TRAINING
HOA SEN UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY
--- * ---
AI205DE01 ARTIFICIAL INTELLIGENCE
FINAL PROJECT REPORT
HSU CHATBOT
Lecturer: Lê Thanh Tùng
Member List:
1. Lê Văn Niềm – 22207193
2. Phan Văn Khải – 22206077
3. Nguyễn Trần Trung Kiên – 22205375
JULY 2, 2024
PLEDGE
“We have read and understood the regulations on academic integrity violations. We pledge on our personal honor that this work was done by us and does not violate academic integrity.”
Day … month … year …
(Student’s full name and signature)
ABSTRACT
This project introduces an AI chatbot designed to improve user engagement and streamline access to information for prospective students and current members of the Hoa Sen University (HSU) community. The chatbot combines a large language model with retrieval over HSU's own content to serve as a tailored, thorough guide to the university.
The chatbot's underlying language model is Google's Gemini 1.5, which is known for its strong natural language understanding and generation capabilities. To build a complete knowledge base, the project collects data from approximately 300 webpages on HSU's official website; this data is processed and stored in ChromaDB, a vector database that enables efficient retrieval of the information relevant to each user query.
The chatbot is built with LangChain, a framework for developing applications powered by language models, and follows the Retrieval-Augmented Generation (RAG) approach: it retrieves the most relevant information from the ChromaDB database and integrates it into its responses. As a result, answers are grounded in HSU's own content and tailored to the specific question each user asks.
By making information easily accessible, engaging, and informative, the HSU AI Chatbot aims to increase user satisfaction and strengthen ties between the university and its community. Its ability to provide tailored answers about HSU's academic programs, facilities, student life, and other aspects of university life could substantially improve the user experience and promote a better understanding of the university's offerings and values.
ACKNOWLEDGEMENT
LECTURER’S REVIEW
……………………………………………………………………………………..
Ho Chi Minh City, Day … month … year 2023
REVIEWER
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LECTURER’S REVIEW
TABLE OF CONTENTS
LIST OF TABLES, DIAGRAMS, IMAGES
1. Introduction
2. System Description
2.1. Overview of the system
2.2. System Architecture/System Flow
2.3. Detailed Description of System Components
3. Project Scope
4. Results
5. Summary
6. Reference
LIST OF TABLES, DIAGRAMS, IMAGES
Image 1: Chatbot’s flow
Image 2: User’s input
Image 3: Chatbot’s output
1. Introduction
This project introduces an artificial intelligence chatbot that acts as a thorough guide for prospective students and current members of the Hoa Sen University (HSU) community. The chatbot is powered by Google's Gemini 1.5 language model and a Retrieval-Augmented Generation (RAG) pipeline built with LangChain, and it uses a vector database (ChromaDB) to retrieve information drawn from over 300 webpages on HSU's official website. It answers user questions with tailored responses covering HSU's academic programs, facilities, student life, and other topics. The project aims to improve the user experience by offering an easily accessible and useful resource, streamlining communication, and increasing engagement with the university.
2. System Description
2.1. Overview of the system
The Hoa Sen academic AI chatbot is a conversational assistant that aims to increase user engagement and provide quick access to academic information. It uses Google's Gemini 1.5 language model to produce natural and informative responses, and it draws on a knowledge base built from over 300 webpages scraped from HSU's official website. This information is stored in ChromaDB, a vector database that allows the chatbot to retrieve relevant information quickly in response to user questions.
The chatbot is built with LangChain, a framework for developing applications powered by language models. It follows the Retrieval-Augmented Generation (RAG) approach: for each question, it first retrieves relevant information from ChromaDB and then has the language model integrate that information into its response.
This approach enables the chatbot to give tailored and comprehensive answers to questions about HSU's academic programs, facilities, student life, and more, grounded in the university's own content. It offers prospective students and current members of the university community a convenient way to explore the university and find the information they need. By providing a readily accessible and informative resource, the chatbot aims to improve user satisfaction and strengthen the connection between HSU and its community.
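The retrieve-then-generate flow described above can be summarised in compact notation. The symbols are ours, not the report's: $q$ is the user's question, $H$ the chat history, $\mathcal{C}$ the set of chunks stored in ChromaDB, $E$ the embedding model, $\operatorname{sim}$ a similarity function between embeddings (for example cosine similarity), and $k$ the number of chunks retrieved, whose value the report does not state.

```latex
\[
\begin{aligned}
s(c) &= \operatorname{sim}\bigl(E(q),\, E(c)\bigr), && c \in \mathcal{C},\\
\mathcal{C}_k(q) &= \{c_{(1)}, \dots, c_{(k)}\}, && s(c_{(1)}) \ge s(c_{(2)}) \ge \dots \ge s(c_{(|\mathcal{C}|)}),\\
a &= \operatorname{LLM}\bigl(\operatorname{prompt}(q,\, \mathcal{C}_k(q),\, H)\bigr).
\end{aligned}
\]
```

The first two lines are the retrieval step handled by ChromaDB; the third is the generation step handled by Gemini 1.5.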
2.2. System Architecture/System Flow
Image 1: Chatbot’s flow
2.3. Detailed Description of System Components
1. Data Gathering and Preparation:
Webpage Scraping: The project begins by extracting relevant information from Hoa Sen University's official website. This involves systematically collecting text from its webpages, possibly using the BeautifulSoup library.
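A minimal sketch of the text-extraction step, assuming pages have already been downloaded as HTML strings. It deliberately uses Python's standard-library `html.parser` instead of BeautifulSoup so that it runs without extra dependencies; the class and function names are hypothetical and the sample page is invented. Downloading pages (for example with `urllib.request`) and removing site-wide menus and footers are left out.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect the human-visible text of an HTML page, skipping script/style content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a tag whose text is ignored
        self.parts: list[str] = []    # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as a single space-joined string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Invented sample page ("Tuyển sinh" = "Admissions"); real HTML would be downloaded per URL.
sample = (
    "<html><head><style>h1 {color: red}</style></head>"
    "<body><h1>Tuyển sinh</h1><p>Hoa Sen University</p><script>track();</script></body></html>"
)
print(extract_text(sample))  # Tuyển sinh Hoa Sen University
```

Skipping `script` and `style` content matters because otherwise JavaScript and CSS end up in the chunks and dilute their embeddings.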
Data Chunking: The scraped text is divided into manageable chunks for efficient processing and storage. Specifically, the text of each webpage is split into chunks of 1,000 words before being stored in the vector database.
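The chunking rule can be sketched as below, assuming chunks are counted in whitespace-separated words as the report describes. The `overlap` parameter is an assumption added for illustration (the report does not mention overlap) and defaults to 0. If the project used one of LangChain's text splitters instead, note that their `chunk_size` is measured in characters by default, so `chunk_size=1000` would produce chunks far shorter than 1,000 words.

```python
def chunk_words(text: str, chunk_size: int = 1000, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    The last `overlap` words of each chunk are repeated at the start of the next one,
    which can keep a sentence that straddles a boundary retrievable from either chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


page_text = " ".join(f"word{i}" for i in range(2500))  # stand-in for one scraped page
print([len(c.split()) for c in chunk_words(page_text)])  # [1000, 1000, 500]
```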
Embedding with Google AI: A Google AI embedding model converts each text chunk into a numerical vector (an embedding). Because texts with similar meanings are mapped to nearby vectors, ChromaDB can store and retrieve information by semantic similarity, matching on meaning rather than on exact keywords.
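Semantic similarity between two embedding vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ is commonly measured with cosine similarity, shown below with a toy three-dimensional example (real embedding vectors have hundreds of dimensions). The report does not say which measure its ChromaDB collection used; ChromaDB also supports squared Euclidean (L2) distance and inner product.

```latex
\[
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}
\]
\[
\mathbf{u} = (1, 2, 0),\ \mathbf{v} = (2, 1, 2):\qquad
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = \frac{1 \cdot 2 + 2 \cdot 1 + 0 \cdot 2}{\sqrt{5}\,\sqrt{9}}
  = \frac{4}{3\sqrt{5}} \approx 0.596
\]
```

The value ranges from $-1$ to $1$; chunks whose embeddings point in nearly the same direction as the query's score close to $1$ and are retrieved first.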
ChromaDB Storage: The embedded chunks are stored in the ChromaDB vector database, which is designed to hold large collections of embeddings and search them efficiently by semantic similarity.
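To make the storage-and-search idea concrete, here is a brute-force, in-memory stand-in for a vector-database collection. It is not ChromaDB's API: it only loosely mirrors the shape of a collection that stores ids, embeddings and documents and answers top-k queries, and it scans every vector, whereas ChromaDB uses an approximate nearest-neighbour (HNSW) index to stay fast on large collections. The class name and the three-dimensional toy vectors are invented for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Brute-force stand-in for a vector-database collection (hypothetical, not ChromaDB)."""

    ids: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add(self, ids: list[str], vectors: list[list[float]], documents: list[str]) -> None:
        """Store parallel lists of ids, embedding vectors and chunk texts."""
        if not (len(ids) == len(vectors) == len(documents)):
            raise ValueError("ids, vectors and documents must have the same length")
        self.ids.extend(ids)
        self.vectors.extend(vectors)
        self.documents.extend(documents)

    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, str, float]]:
        """Return up to k (id, document, score) tuples, most similar first."""
        scored = [
            (doc_id, doc, cosine(vector, vec))
            for doc_id, vec, doc in zip(self.ids, self.vectors, self.documents)
        ]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add(
    ids=["admissions-0", "library-0", "dormitory-0"],
    vectors=[[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]],  # toy 3-d "embeddings"
    documents=["Admissions page text", "Library page text", "Dormitory page text"],
)
top = store.query([0.8, 0.2, 0.0], k=2)  # a query vector close to the admissions chunk
print([doc_id for doc_id, _, _ in top])  # ['admissions-0', 'library-0']
```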
2. Agent Phase:
User Query: The user interacts with the chatbot by typing in a question about Hoa Sen
University.
Query Processing: The chatbot receives the user's query and uses its language model (Gemini 1.5) to interpret the query's meaning and intent.
ChromaDB Retrieval: The chatbot retrieves the chunks most relevant to the user's query from the ChromaDB database. This retrieval is implemented as a tool available to the custom agent and is driven by semantic similarity, so the chatbot can find the most relevant information even when the user's wording differs from that of the original text.
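The report describes retrieval as a tool available to a custom agent. The sketch below shows that pattern in plain Python under stated assumptions: a `Tool` pairs a name and description (what the language model reads when deciding which tool to call) with a function, and `make_retriever_tool` wraps any search function as such a tool. LangChain has its own tool abstractions, which this sketch does not reproduce; the tool name, the stub `fake_search`, and the hard-coded tool call are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A capability the agent can invoke; the LLM chooses tools by name and description."""
    name: str
    description: str
    func: Callable[[str], str]


def make_retriever_tool(search: Callable[[str, int], list[str]], k: int = 3) -> Tool:
    """Wrap any semantic-search function (e.g. a vector-store query) as an agent tool."""
    def run(query: str) -> str:
        chunks = search(query, k)
        if not chunks:
            return "No relevant HSU content found."
        # Number the chunks so the model can refer back to them in its answer.
        return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))

    return Tool(
        name="hsu_knowledge_base",  # hypothetical tool name
        description="Searches content scraped from Hoa Sen University webpages. Input: a question.",
        func=run,
    )


def dispatch(tools: list[Tool], call: dict) -> str:
    """Execute a tool call of the form {"tool": name, "input": text} emitted by the model."""
    registry = {tool.name: tool for tool in tools}
    tool = registry.get(call.get("tool"))
    if tool is None:
        return f"Unknown tool: {call.get('tool')}"
    return tool.func(call["input"])


# Stub search standing in for the ChromaDB-backed retriever.
def fake_search(query: str, k: int) -> list[str]:
    return ["Tuition information chunk.", "Scholarship information chunk."][:k]


tools = [make_retriever_tool(fake_search, k=2)]
# In a real agent the LLM would produce this call; it is hard-coded here.
# "Học phí là bao nhiêu?" = "How much is the tuition?"
print(dispatch(tools, {"tool": "hsu_knowledge_base", "input": "Học phí là bao nhiêu?"}))
```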
Response Generation: The retrieved chunks are passed to the Gemini model, which uses them to generate a comprehensive and informative answer for the user.
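One common way to combine retrieved chunks with the model is to place them in the prompt as numbered context together with grounding instructions. The sketch below assumes chunks arrive most-relevant first and caps the context at an approximate character budget; the instruction wording, the `max_chars` limit, and the function name are assumptions, since the project's actual prompt is not given in the report. The resulting string would then be sent to Gemini 1.5 through whichever client the project used.

```python
SYSTEM_INSTRUCTIONS = (
    "You are an assistant for Hoa Sen University (HSU). Answer in Vietnamese, using only "
    "the numbered context below. If the context does not contain the answer, say so."
)


def build_prompt(question: str, chunks: list[str], max_chars: int = 8000) -> str:
    """Combine grounding instructions, retrieved chunks and the question into one prompt.

    Chunks are assumed to be ordered most-relevant first; once adding the next chunk
    would push the context past `max_chars`, it and all lower-ranked chunks are left out.
    """
    context_blocks: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"
        if used + len(block) > max_chars:
            break
        context_blocks.append(block)
        used += len(block)
    context = "\n\n".join(context_blocks) or "(no relevant context found)"
    return f"{SYSTEM_INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


prompt = build_prompt(
    "Trường có những ngành nào?",  # "Which majors does the university offer?"
    ["Chunk about majors.", "Chunk about tuition."],
)
print(prompt)
```

Telling the model to admit when the context lacks an answer is a common way to reduce invented answers about the university.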
Chat History: The chatbot stores the history of the current conversation, so later questions can be answered in the context of earlier ones.
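A bounded chat history can be sketched with a `deque` that silently drops the oldest exchange once a limit is reached. The class name and the default limit of five exchanges are assumptions; keeping a single history object matches the report's note that separate sessions per user were not implemented.

```python
from collections import deque


class ChatHistory:
    """Keeps the most recent (user, assistant) exchanges of one conversation."""

    def __init__(self, max_turns: int = 5) -> None:
        # A deque with maxlen discards the oldest exchange automatically when full.
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, user_message: str, assistant_message: str) -> None:
        self.turns.append((user_message, assistant_message))

    def as_text(self) -> str:
        """Render stored exchanges as plain text suitable for inclusion in a prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)


history = ChatHistory(max_turns=2)
history.add("What majors does HSU offer?", "Answer 1")
history.add("Which of them is taught in English?", "Answer 2")
history.add("What is its tuition?", "Answer 3")  # the first exchange is dropped here
print(history.as_text())
```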
3. Project Scope
Data Gathering:
The project begins by acquiring important information about Hoa Sen University (HSU). Webpages are scraped from HSU's official website, with a focus on academics, student life, facilities, and general university information. In addition, documents containing essential information about HSU are collected. To support efficient storage and processing, the scraped data and document content are split into manageable chunks. These chunks are then converted into numerical representations (embeddings) using Google AI's embedding technology. Finally, the embedded chunks are saved in a ChromaDB vector database, which is designed to store and retrieve large volumes of text based on semantic similarity. This process builds a comprehensive knowledge base from which the chatbot can give users accurate and relevant information.
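Because both scraped webpages and collected documents end up in the same vector database, it helps to record where each chunk came from. The sketch below, with hypothetical names throughout, turns one source into chunk records carrying `source` metadata and an id derived from the source name and chunk position; with a store that supports upserts, re-ingesting a page during a periodic update then overwrites its earlier chunks instead of duplicating them.

```python
import hashlib


def to_records(source: str, text: str, chunk_size: int = 1000) -> list[dict]:
    """Split one webpage or document into chunk records ready for a vector store.

    Each record keeps its source (so answers can point back to the original page) and
    gets an id derived only from the source and chunk position, so ingesting the same
    source again later reproduces the same ids.
    """
    words = text.split()
    records = []
    for index, start in enumerate(range(0, len(words), chunk_size)):
        chunk_id = hashlib.sha1(f"{source}#{index}".encode("utf-8")).hexdigest()[:16]
        records.append(
            {
                "id": chunk_id,
                "document": " ".join(words[start:start + chunk_size]),
                "metadata": {"source": source, "chunk": index},
            }
        )
    return records


# Hypothetical source names; real ones would be page URLs or document file names.
page_records = to_records("hsu-page:admissions", "word " * 2500)
doc_records = to_records("hsu-doc:student-handbook.pdf", "word " * 300)
print(len(page_records), len(doc_records))  # 3 1
```

One limitation of this id scheme: if a page shrinks between updates, its old higher-numbered chunks are not overwritten and would need to be deleted separately.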
Chatbot Development:
The chatbot is built on LangChain, a framework for developing applications powered by language models. The project uses the ChromaDB database for information retrieval, giving the chatbot access to the relevant knowledge base during conversations. To interpret user questions and produce natural responses, the chatbot uses Google's Gemini 1.5 language model. A basic conversational flow guides user interactions, and a simple chat history function saves the current conversation for later reference. Together, these components produce a chatbot that can interact with users, interpret their questions, retrieve relevant information, and deliver informative responses.
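The components above can be wired into a single conversational turn as follows. This is a sketch, not the project's code: `retrieve` stands in for the ChromaDB-backed retriever and `generate` for the call to Gemini 1.5, both replaced here by stubs so the example runs offline; the prompt wording and the `max_history` limit are assumptions.

```python
from typing import Callable


def answer_turn(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
    history: list[tuple[str, str]],
    max_history: int = 5,
) -> str:
    """Run one conversational turn: retrieve context, build a prompt, generate, remember."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question))
    recent = history[-max_history:] if max_history > 0 else []
    past = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in recent)
    prompt = (
        "Answer questions about Hoa Sen University using the context.\n"
        f"Conversation so far:\n{past or '(none)'}\n"
        f"Context:\n{context or '(none)'}\n"
        f"Question: {question}"
    )
    answer = generate(prompt)
    history.append((question, answer))  # saved for the rest of this conversation
    return answer


# Offline stubs standing in for the ChromaDB retriever and the Gemini 1.5 call.
def fake_retrieve(question: str) -> list[str]:
    return ["Placeholder chunk retrieved for the question."]


def fake_generate(prompt: str) -> str:
    return "Echo: " + prompt.splitlines()[-1]  # echoes the final "Question: ..." line


conversation: list[tuple[str, str]] = []
print(answer_turn("Does HSU teach AI?", fake_retrieve, fake_generate, conversation))
print(len(conversation))  # 1
```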
Excluded:
Data from social platforms: The project will not include data from social media platforms such as Facebook or Instagram; it focuses on the official website and the provided documents.
Multi-lingual support: The chatbot will primarily operate in Vietnamese.
Advanced voice interaction: The chatbot will primarily be text-based.
Complex user authentication: User accounts and authentication will be kept simple for the
initial version.
Real-time data updates: The chatbot's knowledge base will be updated periodically, but it will
not have real-time access to dynamically changing information.
Separate chat sessions: The chatbot does not implement separate chat sessions for different users.
4. Results
Image 2: User’s input
Image 3: Chatbot’s output