Compositional Learning Journal Club at RIML Lab 🔥

We are pleased to announce the establishment of a new research group within RIML Lab, dedicated to the study and advancement of Compositional Learning.

Compositional learning is inspired by the inherent human ability to comprehend and generate complex ideas from simpler concepts. By enabling the recombination of learned components, compositional learning enhances a machine's ability to generalize to out-of-distribution samples encountered in real-world scenarios. This characteristic has spurred vibrant research in areas such as object-centric learning, compositional generalization, and compositional reasoning, with wide-ranging applications across various tasks, including controllable text generation, factual knowledge reasoning, image captioning, text-to-image generation, visual reasoning, speech processing, and reinforcement learning.

To promote collaboration and the exchange of knowledge, we are launching a weekly Journal Club. Sessions will be held every Sunday from 3:30 PM to 5:00 PM, and we will discuss the latest research papers and significant advancements in Compositional Learning.

For updates and additional information, please visit our blog: complearnjc.github.io.

For direct communication, you may contact us via Telegram at @amirkasaei and @arashmarioriyad.

We look forward to your participation.
"Teaching Assistantship for the Intelligent Medical Image Analysis Course"

⭕️ Students who would like to serve as teaching assistants for Dr. Rohban's Intelligent Medical Image Analysis course next semester (first semester of 1403-04) can fill out the form below.

https://docs.google.com/forms/d/e/1FAIpQLSekQsk7e-UavxTfliCGSPpK7-dABoMpsslGgyGPG7F71hyKkw/viewform?usp=sf_link
💠 Compositional Learning Journal Club

This Week's Presentation:

🔹 Title: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

🔸 Presenter: Arash Marioriyad

🌀 Abstract:
Diffusion models have achieved significant success in text-to-image generation. However, alleviating the misalignment between text prompts and generated images remains a challenging issue.
This presentation will focus on two observed causes of misalignment: concept ignorance and concept mis-mapping. To address these issues, we will discuss CoMat, an end-to-end diffusion model fine-tuning strategy that uses an image-to-text concept matching mechanism.
Using only 20K text prompts to fine-tune SDXL, CoMat significantly outperforms the baseline SDXL model on two text-to-image alignment benchmarks, achieving state-of-the-art performance.

📄 Paper:
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Session Details:
- 📅 Date: Sunday, 8 September 2024
- 🕒 Time: 3:30 - 5:00 PM (GMT+3:30)
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🧠 RL Journal Club: This Week's Session

🤝 We invite you to join us for this week's RL Journal Club session, where we will explore the intriguing synergies between Reinforcement Learning (RL) and Large Language Models (LLMs). This session will delve into how these two powerful fields intersect, offering new perspectives and opportunities for advancement in AI research.

This Week's Presentation:

🔹 Title: Synergies Between RL and LLMs
🔸 Presenter: Moein Salimi
🌀 Abstract: In this presentation, we will review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two domains that have been significantly propelled by deep neural networks. The discussion will center around a novel taxonomy proposed in the paper, categorizing the interaction between RL and LLMs into three main classes: RL4LLM, where RL enhances LLM performance in NLP tasks; LLM4RL, where LLMs assist in training RL models for non-NLP tasks; and RL+LLM, where both models work together within a shared planning framework. The presentation will explore the motivations behind these synergies, their successes, potential challenges, and avenues for future research.

The presentation will be based on the following paper:

▪️ The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models (https://arxiv.org/abs/2402.01874)

Session Details:

📅 Date: Tuesday
🕒 Time: 3:30 - 5:00 PM
🌐 Location: Online at https://vc.sharif.edu/ch/rohban
📍 For in-person attendance, please message me on Telegram at @alirezanobakht78

☝️ Note: The discussion is open to everyone, but we can only host students of Sharif University of Technology in person.

💯 This session promises to be an enlightening exploration of how RL and LLMs can work together to push the boundaries of AI research. Don’t miss this opportunity to deepen your understanding and engage in thought-provoking discussions!

✌️ We look forward to your participation!

#RLJClub #JClub #RIML #SUT #AI #RL #LLM
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

🔸 Presenter: Amir Kasaei

🌀 Abstract:
Recent advancements in text-conditioned image generation, particularly through latent diffusion models, have achieved significant progress. However, as text complexity increases, these models often struggle to accurately capture the semantics of prompts, and existing tools like CLIP frequently fail to detect these misalignments.

This presentation introduces the Decompositional-Alignment-Score, which breaks down complex prompts into individual assertions and evaluates the alignment of each with the generated image using a visual question answering (VQA) model. These scores are then combined to produce a final alignment score. Experimental results show that this method aligns better with human judgments than traditional CLIP and BLIP scores. Moreover, it enables an iterative process that improves text-to-image alignment by 8.7% over previous methods.

This approach not only enhances evaluation but also provides actionable feedback for generating more accurate images from complex textual inputs.
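
The scoring idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `da_score` and `toy_vqa_score` are hypothetical, the "image" is mocked as a set of tags, and combining per-assertion scores by a simple mean is an assumption, since the abstract only says the scores are "combined".

```python
from typing import Callable, List, Set, Tuple


def da_score(
    assertions: List[str],
    image: object,
    vqa_score: Callable[[object, str], float],
) -> Tuple[float, str]:
    """Score each assertion against the image and combine the scores.

    Returns the combined score and the weakest assertion, which is the
    kind of feedback an iterative refinement loop could act on.
    Assumption: the combination rule is a plain mean.
    """
    if not assertions:
        raise ValueError("need at least one assertion")
    scores = [vqa_score(image, a) for a in assertions]
    combined = sum(scores) / len(scores)
    weakest = assertions[scores.index(min(scores))]
    return combined, weakest


def toy_vqa_score(image: Set[str], assertion: str) -> float:
    """Hypothetical stand-in for a VQA model: 1.0 if the tag is present."""
    return 1.0 if assertion in image else 0.0


# Prompt "a red car next to a blue dog", split by hand into two assertions.
assertions = ["a red car", "a blue dog"]
generated_image = {"a red car"}  # mock image in which the dog is missing
score, weakest = da_score(assertions, generated_image, toy_vqa_score)
```

In this toy run only one of the two assertions holds, so the weakest assertion identifies which part of the prompt a refinement step would need to emphasize.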

📄 Paper: Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 2:00 - 3:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
🧠 RL Journal Club: This Week's Session

🤝 We invite you to join us for this week's RL Journal Club session, where we will dive into a minimalist approach to offline reinforcement learning. In this session, we will explore how simplifying algorithms can lead to more robust and efficient models in RL, challenging the necessity of complex modifications commonly seen in recent advancements.

This Week's Presentation:

🔹 Title: Revisiting the Minimalist Approach to Offline Reinforcement Learning
🔸 Presenter: Professor Mohammad Hossein Rohban
🌀 Abstract: This presentation will delve into the trade-offs between simplicity and performance in offline RL algorithms. We will review the minimalist approach proposed in the paper, which re-evaluates core algorithmic features and shows that simpler models can achieve performance on par with more intricate methods. The discussion will include experimental results that demonstrate how stripping away complexity can lead to more effective learning, providing fresh insights into the design of RL systems.

The presentation will be based on the following paper:

▪️ Revisiting the Minimalist Approach to Offline Reinforcement Learning (https://arxiv.org/abs/2305.09836)

Session Details:

📅 Date: Tuesday
🕒 Time: 4:00 - 5:00 PM
🌐 Location: Online at https://vc.sharif.edu/ch/rohban
📍 For in-person attendance, please message me on Telegram at @alirezanobakht78

☝️ Note: The discussion is open to everyone, but we can only host students of Sharif University of Technology in person.

💯 Join us for an insightful session where we rethink how much complexity is truly necessary for effective offline reinforcement learning! Don't miss this chance to deepen your understanding of RL methodologies.

✌️ We look forward to your participation!
#RLJClub #JClub #RIML #SUT #AI #RL
Forwarded from Rayan AI Course
🧠 Free registration opens for the Rayan International AI Contest | Sharif University of Technology

🪙 More than $35,000 in cash prizes
🎓 Publication of the top 10 teams' results in leading international AI conferences/journals
🗓 Contest starts on 26 Mehr 1403

💬 Topics covered, Trustworthiness in Deep Learning:
💬 Model Poisoning
💬 Compositional Generalization
💬 Zero-Shot Anomaly Detection

👀 The Rayan International AI Contest, supported by the Vice Presidency for Science and Technology and themed around Trustworthy AI, is organized by Sharif University of Technology. The contest runs in 3 stages (2 online and 1 in-person), starting on 26 Mehr.

⭐️ To support the top teams that reach the third stage, Rayan will cover travel and accommodation costs and will publish the top teams' scientific results in a leading conference or journal in the field, naming the team members in the resulting paper. These participants compete in the third phase for the $35,000 prize for the top teams.

👥 Participating teams have 2 to 4 members.

💬 Registration is completely free until the end of 25 Mehr at the address below:
ai.rayan.global

🌐Linkedin
🌐@Rayan_AI_Contest
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: A semiotic methodology for assessing the compositional effectiveness of generative text-to-image models

🔸 Presenter: Amir Kasaei

🌀 Abstract:
A new methodology for evaluating text-to-image generation models is being proposed, addressing limitations in current evaluation techniques. Existing methods, which use metrics such as fidelity and CLIPScore, often combine criteria like position, action, and photorealism in their assessments. This new approach adapts model analysis from visual semiotics, establishing distinct visual composition criteria. It highlights three key dimensions: plastic categories, multimodal translation, and enunciation, each with specific sub-criteria. The methodology is tested on Midjourney and DALL·E, providing a structured framework that can be used for future quantitative analyses of generated images.

📄 Paper: A semiotic methodology for assessing the compositional effectiveness of generative text-to-image models

Session Details:
- 📅 Date: Sunday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
🚨Open Position: Visual Compositional Generation Research 🚨

We are excited to announce an open research position for a project under Dr. Rohban at the RIML Lab (Sharif University of Technology). The project focuses on improving text-to-image generation in diffusion-based models by addressing compositional challenges.

🔍 Project Description:

Large-scale diffusion-based models excel at text-to-image (T2I) synthesis, but still face issues such as missing objects and improper attribute binding. This project aims to study and resolve these compositional failures to improve the quality of T2I models.

Key Papers:
- T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional T2I Generation
- Attend-and-Excite: Attention-Based Semantic Guidance for T2I Diffusion Models
- If at First You Don’t Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
- ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

🎯 Requirements:

- Must: PyTorch, Deep Learning
- Recommended: Transformers and Diffusion Models.
- Able to dedicate significant time to the project.


🗓 Important Dates:

- Application Deadline: 2024/10/12 (23:59 UTC+3:30)

📌 Apply here:
Application Form

For questions:
📧 a.kasaei@me.com
💬 @amirkasaei

@RIMLLab
#research_application
#open_position
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

🔸 Presenter: Amir Kasaei

🌀 Abstract:
Recent advancements in diffusion models, like Stable Diffusion, have shown impressive image generation capabilities, but ensuring precise alignment with text prompts remains a challenge. This presentation introduces Initial Noise Optimization (InitNO), a method that refines initial noise to improve semantic accuracy in generated images. By evaluating and guiding the noise using cross-attention and self-attention scores, the approach effectively enhances image-prompt alignment, as demonstrated through rigorous experimentation.


📄 Paper: InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Session Details:
- 📅 Date: Sunday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: Backdooring Bias into Text-to-Image Models

🔸 Presenter: Mehrdad Aksari Mahabadi

🌀 Abstract:
This paper investigates the misuse of text-conditional diffusion models, particularly text-to-image models, which create visually appealing images based on user descriptions. While these images generally represent harmless concepts, they can be manipulated for harmful purposes like propaganda. The authors show that adversaries can introduce biases through backdoor attacks, affecting even well-meaning users. Despite users verifying image-text alignment, the attack remains hidden by preserving the text's semantic content while altering other image features to embed biases, amplifying them by 4-8 times. The study reveals that current generative models make such attacks cost-effective and feasible, with costs ranging from 12 to 18 units. Various triggers, objectives, and biases are evaluated, with discussions on mitigations and future research directions.

📄 Paper: Backdooring Bias into Text-to-Image Models

Session Details:
- 📅 Date: Sunday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
Unfortunately, today's session will not be held.
Other sessions will be announced through this channel.
Research Position at the Sharif Information Systems and Data Science Center
 
Project Description: Anomaly detection in time series on various datasets, including those related to autonomous vehicle batteries and predictive maintenance, and estimation of remaining useful life (RUL) once an anomaly is detected in products, particularly electric vehicle batteries. The paper deadline for this project is the end of February. The project also involves the use of federated learning algorithms to support multiple local devices in anomaly detection, RUL estimation, and predictive maintenance on each local device.
 
Technical Requirements: Two electrical or computer engineering students with strong skills in deep learning, robustness concepts, time series anomaly detection, and federated learning algorithms, along with a creative mindset and strong, clean implementation skills.
 
Benefits: Access to a new, well-equipped lab and research under the supervision of three professors in Electrical and Computer Engineering.

Dr. Babak Khalaj
Dr. Siavash Ahmadi
Dr. Mohammad Hossein Rohban

Please send your CV, with the subject line "Research Position in Time Series Anomaly Detection,"
to the email address: data-icst@sharif.edu.
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

🔸 Presenter: Arshia Hemmat

🌀 Abstract:
This presentation introduces advancements in addressing compositional challenges in text-to-image (T2I) generation models. Current diffusion models often struggle to associate attributes accurately with the intended objects based on text prompts. To address this, a new Edge Prediction Vision Transformer (EPViT) is introduced for improved image-text alignment evaluation. Additionally, the proposed Focused Cross-Attention (FCA) mechanism uses syntactic constraints from input sentences to enhance visual attention maps. DisCLIP embeddings further disentangle multimodal embeddings, improving attribute-object alignment. These innovations integrate seamlessly into state-of-the-art diffusion models, enhancing T2I generation quality without additional model training.

📄 Paper: Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
🚨 Open Research Position: Visual Anomaly Detection

We announce that there is an open research position in the RIML lab at Sharif University of Technology, supervised by Dr. Rohban.

🔍 Project Description:
Industrial inspection and quality control are among the most prominent applications of visual anomaly detection. In this context, the model is given a training set of solely normal samples to learn their distribution. During inference, any sample that deviates from this learned normal distribution should be recognized as an anomaly.
This project aims to improve the capabilities of existing models, allowing them to detect intricate anomalies that extend beyond conventional defects.
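
The train-on-normal-only setup described above can be illustrated with a deliberately simple sketch. It is not the project's method: it assumes each image has already been reduced to a single scalar feature (real systems use deep feature embeddings), models the normal data as a Gaussian, and uses a hypothetical threshold of 3 standard deviations.

```python
import statistics
from typing import Sequence


class GaussianAnomalyDetector:
    """Toy one-class detector: fit on normal samples, flag large deviations."""

    def fit(self, normal_values: Sequence[float]) -> "GaussianAnomalyDetector":
        # Learn the "normal" distribution from defect-free samples only.
        self.mean = statistics.fmean(normal_values)
        self.std = statistics.stdev(normal_values)
        if self.std == 0:
            raise ValueError("normal samples must not all be identical")
        return self

    def score(self, value: float) -> float:
        # Anomaly score: distance from the normal mean in standard deviations.
        return abs(value - self.mean) / self.std

    def is_anomaly(self, value: float, threshold: float = 3.0) -> bool:
        # The threshold is an assumption; in practice it is tuned on validation data.
        return self.score(value) > threshold


# Hypothetical scalar features extracted from normal training images.
detector = GaussianAnomalyDetector().fit([9.8, 10.1, 10.0, 9.9, 10.2])
normal_flag = detector.is_anomaly(10.1)
defect_flag = detector.is_anomaly(12.0)
```

A sample close to the training values scores low, while one far outside them exceeds the threshold and is flagged.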

Introductory Paper:
Deep Industrial Image Anomaly Detection: A Survey

Requirements:
- Good understanding of deep learning concepts
- Fluency in Python, PyTorch
- Willingness to dedicate significant time

Submit your application here:
Application Form

Application Deadline:
2024/11/22 (23:59 UTC+3:30)

If you have any questions, contact:
@sehbeygi79
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: Counting Understanding in Vision-Language Models

🔸 Presenter: Arash Marioriyad

🌀 Abstract:
Counting-related challenges represent some of the most significant compositional understanding failure modes in vision-language models (VLMs) such as CLIP. While humans, even in early stages of development, readily generalize over numerical concepts, these models often struggle to accurately interpret numbers beyond three, with the difficulty intensifying as the numerical value increases. In this presentation, we explore the counting-related limitations of VLMs and examine the proposed solutions within the field to address these issues.

📄 Papers:
- Teaching CLIP to Count to Ten (ICCV, 2023)
- CLIP-Count: Towards Text-Guided Zero-Shot Object Counting (ACM-MM, 2023)


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban


We look forward to your participation! ✌️
💠 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

This Week's Presentation:

🔹 Title: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

🔸 Presenter: Dr. Rohban

🌀 Abstract:
This innovative framework addresses the limitations of current image generation models in handling intricate text prompts and ensuring reliability through verification and self-correction mechanisms. Coordinated by a multimodal large language model (MLLM) agent, GenArtist integrates a diverse library of tools, enabling seamless task decomposition, step-by-step execution, and systematic self-correction. With its tree-structured planning and advanced use of position-related inputs, GenArtist achieves state-of-the-art performance, outperforming models like SDXL and DALL-E 3. This session will delve into the system’s architecture and its groundbreaking potential for advancing image generation and editing tasks.


📄 Paper: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing


Session Details:
- 📅 Date: Wednesday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
Research Week 1403.pdf
2.4 MB
Hello. Here are the slides from the Research Week presentation on RIML's accepted NeurIPS paper. I have also given some explanation of the paper in this tweet thread: https://x.com/MhRohban/status/1867803097596338499
Forwarded from Arash
📣 TA Application Form

🤖 Deep Reinforcement Learning
🧑🏻‍🏫 Dr. Mohammad Hossein Rohban
Deadline: December 31st

https://docs.google.com/forms/d/e/1FAIpQLSduvRRAnwi6Ik9huMDFWOvZqAWhr7HHlHjXdZbst55zSv5Hmw/viewform