---
license: apache-2.0
title: Long Context Caching Gemini PDF QA
sdk: docker
emoji: 📄
colorFrom: yellow
---
# 📄 Smart Document Analysis Platform

A modern web application that leverages the Google Gemini API's caching capabilities to provide efficient document analysis. Upload documents once, ask questions forever!

## 🚀 Features

- **Document Upload**: Upload PDF files via drag-and-drop or URL
- **Gemini API Caching**: Documents are cached using Gemini's explicit caching feature
- **Cost-Effective**: Save on API costs by reusing cached document tokens
- **Real-time Chat**: Ask multiple questions about your documents
- **Beautiful UI**: Modern, responsive design with smooth animations
- **Token Tracking**: See how many tokens are cached for cost transparency
- **Smart Error Handling**: Graceful handling of small documents that don't meet caching requirements

## 🎯 Use Cases

This platform is perfect for:

- **Research Analysis**: Upload research papers and ask detailed questions
- **Legal Document Review**: Analyze contracts, legal documents, and policies
- **Academic Studies**: Study course materials and textbooks
- **Business Reports**: Analyze quarterly reports, whitepapers, and presentations
- **Technical Documentation**: Review manuals, specifications, and guides

## ⚡️ Deploy on Hugging Face Spaces

You can deploy this app on [Hugging Face Spaces](https://huggingface.co/spaces) using the **Docker** SDK.

### 1. **Select Docker SDK**
- When creating your Space, choose **Docker** (not Gradio, not Static).

### 2. **Project Structure**
Make sure your repo includes:
- `app.py` (Flask app)
- `requirements.txt`
- `Dockerfile`
- `.env.example` (for reference, do not include secrets)

### 3. **Dockerfile**
A sample Dockerfile is provided:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```

### 4. **Port Configuration**
The app will run on the port provided by the `PORT` environment variable (default 7860), as required by Hugging Face Spaces.
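A minimal sketch of how `app.py` can resolve that port (the helper name is illustrative, not from the actual codebase):

```python
import os

def resolve_port(env):
    """Return the port Spaces injects via PORT, defaulting to 7860."""
    return int(env.get("PORT", "7860"))

port = resolve_port(os.environ)
# Then, at the bottom of app.py:
#   app.run(host="0.0.0.0", port=port)
```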

### 5. **Set Environment Variables**
- In your Space settings, add your `GOOGLE_API_KEY` as a secret environment variable.

### 6. **Push to Hugging Face**
- Push your code to the Space's Git repository.
- The build and deployment will happen automatically.

---

## 📋 Prerequisites

- Python 3.8 or higher
- Google Gemini API key
- Internet connection for API calls

## 🔧 Local Installation

1. **Clone the repository**
```bash
git clone <repository-url>
cd smart-document-analysis
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
```

3. **Set up environment variables**
```bash
cp .env.example .env
```

Edit `.env` and add your Google Gemini API key:
```
GOOGLE_API_KEY=your_actual_api_key_here
```

4. **Get your API key**
- Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
- Create a new API key
- Copy it to your `.env` file
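Assuming `app.py` reads the key from the environment (for example after `python-dotenv` has loaded `.env`), a fail-fast lookup might look like this; the helper name is illustrative:

```python
def read_api_key(env):
    """Fetch the Gemini API key, failing fast with a clear message."""
    key = env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; see .env.example")
    return key
```

Failing at startup rather than on the first upload makes a missing key much easier to diagnose.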

## 🚀 Running the Application Locally

1. **Start the server**
```bash
python app.py
```

2. **Open your browser**
Navigate to `http://localhost:7860`

3. **Upload a document**
- Drag and drop a PDF file, or
- Click to select a file, or
- Provide a URL to a PDF

4. **Start asking questions**
Once your document is cached, you can ask unlimited questions!

## 💡 How It Works

### 1. Document Upload
When you upload a PDF, the application:
- Uploads the file to Gemini's File API
- Checks if the document meets the minimum token requirement (4,096 tokens)
- If eligible, creates a cache with the document content
- If too small, returns a helpful error message with suggestions
- Stores cache metadata locally
- Returns a cache ID for future reference
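The upload-and-cache flow above can be sketched with the `google-genai` SDK. The function name, file path, and system instruction below are placeholders, and the SDK is imported lazily so the sketch can be read without it installed:

```python
def create_document_cache(pdf_path, api_key,
                          model="models/gemini-2.0-flash-001"):
    """Upload a PDF to the File API and create an explicit cache for it."""
    # Lazy import: the SDK is only needed when the function actually runs.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    document = client.files.upload(file=pdf_path)  # Gemini File API
    cache = client.caches.create(                  # explicit caching
        model=model,
        config=types.CreateCachedContentConfig(
            system_instruction="Answer questions about the uploaded document.",
            contents=[document],
        ),
    )
    return cache.name  # cache ID to store for later questions
```

The `caches.create` call fails if the document is below the minimum cacheable token count, which is where the small-document error handling hooks in.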

### 2. Question Processing
When you ask a question:
- The question is sent to the Gemini API
- The cached document content is automatically included
- You only pay for the question tokens, not the document tokens
- Responses are generated based on the cached content
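A corresponding sketch for answering a question against an existing cache, again with a lazy import and illustrative names:

```python
def ask_cached(question, cache_name, api_key,
               model="models/gemini-2.0-flash-001"):
    """Generate an answer grounded in a previously cached document."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model=model,
        contents=question,
        # Attach the cached document by name; its tokens are billed at
        # the cached rate instead of being resent on every request.
        config=types.GenerateContentConfig(cached_content=cache_name),
    )
    return response.text
```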

### 3. Cost Savings
- **Without caching**: You pay for document tokens + question tokens every time
- **With caching**: You pay for document tokens once + question tokens for each question

## 📚 API Endpoints

- `GET /` - Main application interface
- `POST /upload` - Upload PDF file
- `POST /upload-url` - Upload PDF from URL
- `POST /ask` - Ask question about cached document
- `GET /caches` - List all cached documents
- `DELETE /cache/<cache_id>` - Delete specific cache

## 📊 Cost Analysis

### Example Scenario
- Document: 10,000 tokens
- Question: 50 tokens
- 10 questions asked

**Without Caching:**
- Cost = (10,000 + 50) × 10 = 100,500 tokens

**With Caching:**
- Cost = 10,000 + (50 × 10) = 10,500 tokens
- **Savings: 90% cost reduction!**
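The arithmetic above can be checked with a simplified cost model that, like the example, counts the document tokens only once when caching is used (real billing also charges a reduced per-token rate for cached tokens plus cache storage):

```python
def total_tokens(doc_tokens, question_tokens, n_questions, cached):
    """Total billed tokens for n questions, with or without caching."""
    if cached:
        # Document tokens counted once, question tokens per question.
        return doc_tokens + question_tokens * n_questions
    # Document tokens resent with every question.
    return (doc_tokens + question_tokens) * n_questions

without_cache = total_tokens(10_000, 50, 10, cached=False)  # 100,500
with_cache = total_tokens(10_000, 50, 10, cached=True)      # 10,500
savings = 1 - with_cache / without_cache                    # ~0.90
```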

### Token Requirements
- **Minimum for caching**: 4,096 tokens
- **Recommended minimum**: 5,000 tokens for cost-effectiveness
- **Optimal range**: 10,000 - 100,000 tokens
- **Maximum**: Model-specific limits (check Gemini API docs)

## 🎨 Customization

### Changing the Model
Edit `app.py` and change the model name:
```python
model="models/gemini-2.0-flash-001"  # Current
model="models/gemini-2.0-pro-001"    # Alternative
```

### Custom System Instructions
Modify the system instruction in the cache creation:
```python
system_instruction="Your custom instruction here"
```

### Cache TTL
Add TTL configuration to cache creation (the `ttl` field takes a duration in seconds with an `s` suffix):
```python
config=types.CreateCachedContentConfig(
    system_instruction=system_instruction,
    contents=[document],
    ttl='86400s'  # Cache for 24 hours
)
```

## 🔒 Security Considerations

- API keys are stored in environment variables
- File uploads are validated for PDF format
- Cached content is managed securely through the Gemini API
- No sensitive data is stored locally

## 🚧 Production Deployment

For production deployment:

1. **Use a production WSGI server**
```bash
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:7860 app:app
```

2. **Add database storage**
- Replace in-memory storage with PostgreSQL/MySQL
- Add user authentication
- Implement session management

3. **Add monitoring**
- Log API usage and costs
- Monitor cache hit rates
- Track user interactions

4. **Security enhancements**
- Add rate limiting
- Implement file size limits
- Add input validation

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📝 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Google Gemini API for providing the caching functionality
- Flask community for the excellent web framework
- The open-source community for inspiration and tools

## 📞 Support

If you encounter any issues:

1. Check the [Gemini API documentation](https://ai.google.dev/docs)
2. Verify your API key is correct
3. Ensure your PDF files are valid
4. Check the browser console for JavaScript errors
5. **For small document errors**: Upload a larger document or combine multiple documents

## 🔮 Future Enhancements

- [ ] Support for multiple file formats (Word, PowerPoint, etc.)
- [ ] User authentication and document sharing
- [ ] Advanced analytics and usage tracking
- [ ] Integration with cloud storage (Google Drive, Dropbox)
- [ ] Mobile app version
- [ ] Multi-language support
- [ ] Advanced caching strategies
- [ ] Real-time collaboration features
- [ ] Document preprocessing to meet token requirements
- [ ] Batch document processing