Merge pull request #86 from merefield/topic_title_embedding
FEATURE: topic title embeddings and semantic title search
Showing 19 changed files with 470 additions and 97 deletions.
@@ -0,0 +1,20 @@
# frozen_string_literal: true

# Job creates or updates the title embedding for a Topic.
class ::Jobs::ChatbotTopicTitleEmbedding < Jobs::Base
  sidekiq_options retry: 5, dead: false, queue: 'low'

  def execute(opts)
    begin
      topic_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("100. Creating/updating a Topic Title Embedding for Topic id: #{topic_id}")

      process_topic_title_embedding = ::DiscourseChatbot::TopicTitleEmbeddingProcess.new

      process_topic_title_embedding.upsert(topic_id)
    rescue => e
      Rails.logger.error("Chatbot: Topic Title Embedding: There was a problem, but will retry until limit: #{e}")
      raise # re-raise so the failure is not swallowed and the configured retries can apply
    end
  end
end
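The job above declares `retry: 5`, and a retrying runner normally only retries when an exception escapes `execute`. The following is a minimal sketch in plain Ruby of why it matters whether a `rescue` re-raises. `TinyRetryRunner` and `flaky_upsert` are hypothetical names invented for this illustration; this is not Sidekiq's API, only a stand-in that mimics "retry when the block raises".

```ruby
# Hypothetical stand-in for a retrying job runner (NOT Sidekiq's API).
class TinyRetryRunner
  attr_reader :attempts

  def initialize(max_retries)
    @max_retries = max_retries
    @attempts = 0
  end

  # Runs the block; if it raises, runs it again until the first attempt
  # plus max_retries further attempts have been used.
  def run
    @attempts += 1
    yield @attempts
    :done
  rescue StandardError
    retry if @attempts <= @max_retries
    :gave_up
  end
end

# Hypothetical flaky operation: fails on the first two attempts.
def flaky_upsert(attempt)
  raise "embedding API unavailable" if attempt < 3

  :stored
end

# Variant 1: the job body rescues and swallows the error.
swallowing = TinyRetryRunner.new(5)
swallowed_outcome = nil
swallowing.run do |attempt|
  begin
    swallowed_outcome = flaky_upsert(attempt)
  rescue StandardError
    # error would be logged here, but it is not re-raised
  end
end

# Variant 2: the job body logs and then re-raises.
reraising = TinyRetryRunner.new(5)
reraised_outcome = nil
reraising.run do |attempt|
  begin
    reraised_outcome = flaky_upsert(attempt)
  rescue StandardError
    raise # let the runner see the failure
  end
end

puts "swallowing: attempts=#{swallowing.attempts}, outcome=#{swallowed_outcome.inspect}"
puts "re-raising: attempts=#{reraising.attempts}, outcome=#{reraised_outcome.inspect}"
```

In this sketch the swallowing variant stops after a single attempt without storing anything, while the re-raising variant keeps going until the operation succeeds.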
@@ -0,0 +1,18 @@
# frozen_string_literal: true

# Job is triggered on a Topic destruction.
class ::Jobs::ChatbotTopicTitleEmbeddingDelete < Jobs::Base
  sidekiq_options retry: false

  def execute(opts)
    begin
      topic_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("101. Deleting a Topic Title Embedding for Topic id: #{topic_id}")

      ::DiscourseChatbot::TopicTitleEmbedding.find_by(topic_id: topic_id)&.destroy!
    rescue => e
      Rails.logger.error("Chatbot: Topic Title Embedding: There was a problem deleting the embedding (no retry is configured): #{e}")
    end
  end
end
@@ -0,0 +1,9 @@
# frozen_string_literal: true

module ::DiscourseChatbot
  class TopicEmbeddingsBookmark < ActiveRecord::Base
    self.table_name = 'chatbot_topic_embeddings_bookmark'

    validates :topic_id, presence: true
  end
end
@@ -0,0 +1,9 @@
# frozen_string_literal: true

module ::DiscourseChatbot
  class TopicTitleEmbedding < ActiveRecord::Base
    self.table_name = 'chatbot_topic_title_embeddings'

    validates :topic_id, presence: true, uniqueness: true
  end
end
12 changes: 12 additions & 0 deletions
db/migrate/20240412010101_create_chatbot_topic_title_embeddings_table.rb
@@ -0,0 +1,12 @@
# frozen_string_literal: true

class CreateChatbotTopicTitleEmbeddingsTable < ActiveRecord::Migration[7.0]
  def change
    create_table :chatbot_topic_title_embeddings do |t|
      t.integer :topic_id, null: false, index: { unique: true }, foreign_key: true
      t.column :embedding, "vector(1536)", null: false
      t.column :model, :string, default: nil
      t.timestamps
    end
  end
end
10 changes: 10 additions & 0 deletions
db/migrate/20240412010103_create_chatbot_topic_embeddings_bookmark_table.rb
@@ -0,0 +1,10 @@
# frozen_string_literal: true

class CreateChatbotTopicEmbeddingsBookmarkTable < ActiveRecord::Migration[7.0]
  def change
    create_table :chatbot_topic_embeddings_bookmark do |t|
      t.integer :topic_id
      t.timestamps
    end
  end
end
16 changes: 16 additions & 0 deletions
db/migrate/20240412010105_create_cosine_pg_vector_chatbot_topic_title_embeddings_index.rb
@@ -0,0 +1,16 @@
# frozen_string_literal: true

class CreateCosinePgVectorChatbotTopicTitleEmbeddingsIndex < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE INDEX pgv_hnsw_index_on_chatbot_topic_title_embeddings ON chatbot_topic_title_embeddings USING hnsw (embedding vector_cosine_ops)
      WITH (m = 32, ef_construction = 64);
    SQL
  end

  def down
    execute <<-SQL
      DROP INDEX IF EXISTS pgv_hnsw_index_on_chatbot_topic_title_embeddings;
    SQL
  end
end
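The index above uses `vector_cosine_ops`, so it serves queries ordered by cosine distance. As a rough illustration of what that ordering means, here is a brute-force sketch in plain Ruby. The helper names (`cosine_distance`, `nearest_topic_ids`), the topic ids, and the 3-dimensional toy vectors are assumptions made for this example; the real column holds 1536-dimensional vectors, and an HNSW index returns approximate rather than exhaustive results.

```ruby
# Cosine distance between two equal-length numeric vectors: 1 - cos(angle).
def cosine_distance(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  raise ArgumentError, "zero vector has no direction" if norm_a.zero? || norm_b.zero?

  1.0 - dot / (norm_a * norm_b)
end

# Toy stand-in for the embeddings table: topic_id => embedding.
TITLE_EMBEDDINGS = {
  101 => [1.0, 0.0, 0.0],
  102 => [0.0, 1.0, 0.0],
  103 => [0.6, 0.8, 0.0],
}.freeze

# Exhaustive nearest-neighbour search: smallest cosine distance first.
def nearest_topic_ids(query, limit)
  TITLE_EMBEDDINGS
    .sort_by { |_topic_id, embedding| cosine_distance(query, embedding) }
    .first(limit)
    .map(&:first)
end

query_embedding = [1.0, 0.1, 0.0]
puts nearest_topic_ids(query_embedding, 2).inspect
```

The sketch scans every row, which is what the index is there to avoid; the `m` and `ef_construction` options trade index build cost against recall.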
@@ -0,0 +1,73 @@
# frozen_string_literal: true
require "openai"

module ::DiscourseChatbot

  class EmbeddingProcess

    def setup_api
      ::OpenAI.configure do |config|
        config.access_token = SiteSetting.chatbot_open_ai_token
      end
      if !SiteSetting.chatbot_open_ai_embeddings_model_custom_url.blank?
        ::OpenAI.configure do |config|
          config.uri_base = SiteSetting.chatbot_open_ai_embeddings_model_custom_url
        end
      end
      if SiteSetting.chatbot_open_ai_model_custom_api_type == "azure"
        ::OpenAI.configure do |config|
          config.api_type = :azure
          config.api_version = SiteSetting.chatbot_open_ai_model_custom_api_version
        end
      end
      @model_name = SiteSetting.chatbot_open_ai_embeddings_model
      @client = ::OpenAI::Client.new
    end

    def upsert(id)
      raise "Overwrite me!"
    end

    def get_embedding_from_api(id)
      raise "Overwrite me!"
    end


    def semantic_search(query)
      raise "Overwrite me!"
    end

    def in_scope(id)
      raise "Overwrite me!"
    end

    def is_valid(id)
      raise "Overwrite me!"
    end

    def in_categories_scope(id)
      raise "Overwrite me!"
    end

    def in_benchmark_user_scope(id)
      raise "Overwrite me!"
    end

    def benchmark_user
      cache_key = "chatbot_benchmark_user"
      benchmark_user = Discourse.cache.fetch(cache_key, expires_in: 1.hour) do
        allowed_group_ids = [0, 10, 11, 12, 13, 14]  # automated groups only
        barred_group_ids = ::Group.where.not(id: allowed_group_ids).pluck(:id) # no custom groups
        unsuitable_users = ::GroupUser.where(group_id: barred_group_ids).pluck(:user_id).uniq # don't choose someone in a custom group
        safe_users = ::User.where.not(id: unsuitable_users).distinct.pluck(:id) # exclude them and find a suitable vanilla, junior user
        user = ::User.where(id: safe_users).where(trust_level: SiteSetting.chatbot_embeddings_benchmark_user_trust_level, active: true, admin: false, suspended_at: nil)&.last
        if user.nil?
          raise StandardError, "Chatbot: No benchmark user exists for Post embedding suitability check, please add a basic user"
        end
        user
      end

      benchmark_user
    end
  end
end
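`EmbeddingProcess` reads as an abstract base: most methods raise "Overwrite me!" until a subclass supplies them. The sketch below shows that pattern in plain Ruby, without Discourse or the OpenAI client. `SketchEmbeddingProcess`, `SketchTitleEmbeddingProcess`, the in-memory `store`, and the letter-count "embedding" are all hypothetical and only illustrate the override mechanics; they are not the plugin's actual subclass.

```ruby
# Simplified stand-in for an abstract base class (hypothetical).
class SketchEmbeddingProcess
  def upsert(id)
    raise "Overwrite me!"
  end

  def get_embedding_from_api(id)
    raise "Overwrite me!"
  end

  def is_valid(id)
    raise "Overwrite me!"
  end
end

# Hypothetical concrete subclass working on topic titles held in memory.
class SketchTitleEmbeddingProcess < SketchEmbeddingProcess
  attr_reader :store

  def initialize(titles)
    @titles = titles # topic_id => title
    @store = {}      # topic_id => embedding
  end

  # Create or replace the stored embedding for a topic, skipping unknown ids.
  def upsert(id)
    return unless is_valid(id)

    @store[id] = get_embedding_from_api(id)
  end

  # Fake "embedding": counts of letters a-m and n-z in the title.
  # A real implementation would call an embeddings API here.
  def get_embedding_from_api(id)
    title = @titles.fetch(id).downcase
    [title.count("a-m"), title.count("n-z")]
  end

  def is_valid(id)
    @titles.key?(id)
  end
end

process = SketchTitleEmbeddingProcess.new({ 1 => "Hello" })
process.upsert(1)
process.upsert(99) # unknown topic id: ignored
puts process.store.inspect
```

Keeping `upsert` in the subclass while the base only defines the contract means a caller can hold any process object and call the same method names.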