[Groonga-commit] ranguba/chupa-text-decomposer-pdf at d6cefcf [master] Add new line between page

Kouhei Sutou null+****@clear*****
Sun Feb 16 22:52:41 JST 2014


Kouhei Sutou	2014-02-16 22:52:41 +0900 (Sun, 16 Feb 2014)

  New Revision: d6cefcf01717126b2eed02e023cc2dfb04a011c1
  https://github.com/ranguba/chupa-text-decomposer-pdf/commit/d6cefcf01717126b2eed02e023cc2dfb04a011c1

  Message:
    Add new line between page

  Modified files:
    lib/chupa-text/decomposers/pdf.rb
    test/test-pdf.rb

  Modified: lib/chupa-text/decomposers/pdf.rb (+4 -1)
===================================================================
--- lib/chupa-text/decomposers/pdf.rb    2014-01-05 16:06:58 +0900 (49b1981)
+++ lib/chupa-text/decomposers/pdf.rb    2014-02-16 22:52:41 +0900 (94c864f)
@@ -32,7 +32,10 @@ module ChupaText
         document = Poppler::Document.new(data.body)
         text = ""
         document.each do |page|
-          text << page.get_text
+          page_text = page.get_text
+          next if page_text.empty?
+          text << "\n" unless text.empty?
+          text << page_text
         end
         text_data = TextData.new(text)
         text_data.uri = data.uri

  Modified: test/test-pdf.rb (+1 -1)
===================================================================
--- test/test-pdf.rb    2014-01-05 16:06:58 +0900 (212fef7)
+++ test/test-pdf.rb    2014-02-16 22:52:41 +0900 (1b71db4)
@@ -125,7 +125,7 @@ class TestPDF < Test::Unit::TestCase
 
     sub_test_case("multi pages") do
       def test_body
-        assert_equal(["Page1Page2"], decompose.collect(&:body))
+        assert_equal(["Page1\nPage2"], decompose.collect(&:body))
       end
 
       private
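
  Note: the joining behavior introduced in lib/chupa-text/decomposers/pdf.rb can be
  tried without Poppler. The following is a minimal sketch, not part of the commit:
  `join_page_texts` is a hypothetical helper name, and it takes plain strings in
  place of the values returned by `page.get_text`. It skips empty pages and puts a
  single "\n" between the non-empty ones, as the patched loop does.

```ruby
# Hypothetical helper (not in the commit): mirrors the patched loop,
# but operates on an array of strings instead of Poppler pages.
def join_page_texts(page_texts)
  text = String.new            # mutable buffer, like `text = ""` in the patch
  page_texts.each do |page_text|
    next if page_text.empty?   # empty pages contribute nothing, not even a separator
    text << "\n" unless text.empty?  # separator only between non-empty pages
    text << page_text
  end
  text
end

puts join_page_texts(["Page1", "Page2"])
```

  For an array of strings, `page_texts.reject(&:empty?).join("\n")` should give the
  same result; the explicit loop is kept here because it matches the shape of the diff.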