Masafumi Yokoyama
null+****@clear*****
Sun Apr 19 03:37:19 JST 2015
Masafumi Yokoyama 2015-04-19 03:37:19 +0900 (Sun, 19 Apr 2015) New Revision: f3539b3be9442d493973f47022e40c1843598157 https://github.com/ranguba/rroonga/commit/f3539b3be9442d493973f47022e40c1843598157 Message: doc: migrate to Markdown from Textile GitHub: fix #48 Modified files: doc/text/tutorial.md Modified: doc/text/tutorial.md (+248 -351) =================================================================== --- doc/text/tutorial.md 2015-04-19 03:36:35 +0900 (da885fe) +++ doc/text/tutorial.md 2015-04-19 03:37:19 +0900 (95edac7) @@ -1,165 +1,122 @@ -h1. Tutorial +# Tutorial This page introduce how to use Rroonga via a simple application making. -h2. Install +## Install You can install Rroonga in your compter with RubyGems. -<pre> -!!!command_line -% sudo gem install rroonga -</pre> + % sudo gem install rroonga -h2. Create Database +## Create Database Let's create database for simple bookmark application. Please execute irb with loading Rroonga with this command: -<pre> -!!!irb -% irb --simple-prompt -r groonga ->> -</pre> + % irb --simple-prompt -r groonga + >> Now you use UTF-8 as the encoding of database. -<pre> -!!!irb ->> Groonga::Context.default_options = {:encoding => :utf8} -=> {:encoding=>:utf8} -</pre> + >> Groonga::Context.default_options = {:encoding => :utf8} + => {:encoding=>:utf8} Then, try to create database in a file. -<pre> -!!!irb ->> Groonga::Database.create(:path => "/tmp/bookmark.db") -=> #<Groonga::Database ...> -</pre> + >> Groonga::Database.create(:path => "/tmp/bookmark.db") + => #<Groonga::Database ...> From now, the created database is used implicitly. You don't have to be aware of it after you created a database first. -h2. Define table +## Define table Groonga supports 4 types of tables. -- Groonga::Hash := - Hash table. It manages records via each primary key. It supports - very quickly exact match search. - =: - -- Groonga::PatriciaTrie := - Patricia Trie. 
It supports some search such as predictive search and - common prefix search, but it provides a little slowly exact match search - than Groonga::Hash. It provides cursor to take records in ascending - or descending order. - =: - -- Groonga::DoubleArrayTrie := - Double Array Trie. It requires large spaces rather than other - tables, but it can update key without ID change. It provides exract - match search, predictive search and common prefix search and cursor - like Groonga::PatriciaTrie. - =: - -- Groonga::Array := - Array. It doesn't have primary keys. It manages records by ID. - =: - -Now, you use Groonga::Hash and create the table named @Items@. The type +Groonga::Hash +: Hash table. It manages records via each primary key. It supports + very quickly exact match search. + +Groonga::PatriciaTrie +: Patricia Trie. It supports some search such as predictive search and + common prefix search, but it provides a little slowly exact match search + than Groonga::Hash. It provides cursor to take records in ascending + or descending order. + +Groonga::DoubleArrayTrie +: Double Array Trie. It requires large spaces rather than other + tables, but it can update key without ID change. It provides exract + match search, predictive search and common prefix search and cursor + like Groonga::PatriciaTrie. + +Groonga::Array +: Array. It doesn't have primary keys. It manages records by ID. + +Now, you use Groonga::Hash and create the table named `Items`. The type of its primary key is String. -<pre> -!!!irb ->> Groonga::Schema.create_table("Items", :type => :hash) -=> [...] -</pre> + >> Groonga::Schema.create_table("Items", :type => :hash) + => [...] -You have @Items@ table by this code. +You have `Items` table by this code. You can refer the defined table with Groonga.[] like below: -<pre> -!!!irb ->> items = Groonga["Items"] -=> #<Groonga::Hash ...> -</pre> + >> items = Groonga["Items"] + => #<Groonga::Hash ...> You can treat it like Hash.
-For example, let's type @items.size@ to get the number of records in +For example, let's type `items.size` to get the number of records in the table. -<pre> -!!!irb ->> items.size -=> 0 -</pre> + >> items.size + => 0 -h2. Add records +## Add records -Let's add records to @Items@ table. +Let's add records to `Items` table. -<pre> -!!!irb ->> items.add("http://en.wikipedia.org/wiki/Ruby") -=> #<Groonga::Record ...> ->> items.add("http://www.ruby-lang.org/") -=> #<Groonga::Record ...> -</pre> + >> items.add("http://en.wikipedia.org/wiki/Ruby") + => #<Groonga::Record ...> + >> items.add("http://www.ruby-lang.org/") + => #<Groonga::Record ...> Please check the number of records. It increases from 0 to 2. -<pre> -!!!irb ->> items.size -=> 2 -</pre> + >> items.size + => 2 If you can get record by primary key, type like below: -<pre> -!!!irb ->> items["http://en.wikipedia.org/wiki/Ruby"] -=> #<Groonga::Record ...> -</pre> + >> items["http://en.wikipedia.org/wiki/Ruby"] + => #<Groonga::Record ...> -h2. Full text search +## Full text search Let's add item's title to full text search. -first, you add the @Text@ type column "@title@" to @Items@ table. +first, you add the `Text` type column "`title`" to `Items` table. -<pre> -!!!irb ->> Groonga::Schema.change_table("Items") do |table| -?> table.text("title") ->> end -=> [...] -</pre> + >> Groonga::Schema.change_table("Items") do |table| + ?> table.text("title") + >> end + => [...] -Defined columns is named as @#{TABLE_NAME}.#{COLUMN_NAME}@. +Defined columns is named as `#{TABLE_NAME}.#{COLUMN_NAME}`. You can refer them with {Groonga.[]} as same as tables. -<pre> -!!!irb ->> title_column = Groonga["Items.title"] -=> #<Groonga::VariableSizeColumn ...> -</pre> + >> title_column = Groonga["Items.title"] + => #<Groonga::VariableSizeColumn ...> Secondly, let's add the table containing terms from splited from texts. -Then you define the @Terms@ for it. +Then you define the `Terms` for it.
-<pre> -!!!irb ->> Groonga::Schema.create_table("Terms", -?> :type => :patricia_trie, -?> :normalizer => :NormalizerAuto, -?> :default_tokenizer => "TokenBigram") -</pre> + >> Groonga::Schema.create_table("Terms", + ?> :type => :patricia_trie, + ?> :normalizer => :NormalizerAuto, + ?> :default_tokenizer => "TokenBigram") -You specify @:default_tokenzier => "TokenBigram"@ for "Tokenizer" in +You specify `:default_tokenzier => "TokenBigram"` for "Tokenizer" in the above code. "Tokenizer" is the object to split terms from texts. The default value for it is none. @@ -169,71 +126,56 @@ Full text search with N-gram uses splited N characters and their position in texts. "N" in N-gram specifies the number of each terms. Groonga supports Unigram (N=1), Bigram (N=2) and Trigram (N=3). -You also specify @:normalizer => :NormalizerAuto@ to search texts with +You also specify `:normalizer => :NormalizerAuto` to search texts with ignoring the case. Now, you ready table for terms, so you define the index of -@Items.tiltle@ column. +`Items.tiltle` column. -<pre> -!!!irb ->> Groonga::Schema.change_table("Terms") do |table| -?> table.index("Items.title") ->> end -=> [...] -</pre> + >> Groonga::Schema.change_table("Terms") do |table| + ?> table.index("Items.title") + >> end + => [...] -You may feel a few unreasonable code. The index of @Items@ table's -column is defined as the column in @Terms@. +You may feel a few unreasonable code. The index of `Items` table's +column is defined as the column in `Terms`. -When a record is added to @Items@, groonga adds records associated -each terms in it to @Terms@ automatically. +When a record is added to `Items`, groonga adds records associated +each terms in it to `Terms` automatically. -@Terms@ is a few particular table, but you can add some columns to term -table such as @Terms@ and manage many attributes of each terms.
It is +`Terms` is a few particular table, but you can add some columns to term +table such as `Terms` and manage many attributes of each terms. It is very useful to process particular search. Now, you finished table definition. -Let's put some values to @title@ of each record you added before. +Let's put some values to `title` of each record you added before. -<pre> -!!!irb ->> items["http://en.wikipedia.org/wiki/Ruby"].title = "Ruby" -=> "Ruby" ->> items["http://www.ruby-lang.org/"].title = "Ruby Programming Language" -"Ruby Programming Language" -</pre> + >> items["http://en.wikipedia.org/wiki/Ruby"].title = "Ruby" + => "Ruby" + >> items["http://www.ruby-lang.org/"].title = "Ruby Programming Language" + "Ruby Programming Language" Now, you can do full text search like above: -<pre> -!!!irb ->> ruby_items = items.select {|record| record.title =~ "Ruby"} -=> #<Groonga::Hash ..., normalizer: (nil)> -</pre> + >> ruby_items = items.select {|record| record.title =~ "Ruby"} + => #<Groonga::Hash ..., normalizer: (nil)> Groonga returns the search result as Groonga::Hash. -Keys in this hash table is records of hitted @Items@. +Keys in this hash table is records of hitted `Items`. -<pre> -!!!irb ->> ruby_items.collect {|record| record.key.key} -=> ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] -</pre> + >> ruby_items.collect {|record| record.key.key} + => ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] -In above example, you get records in @Items@ with @record.key@, and -keys of them with @record.key.key@. +In above example, you get records in `Items` with `record.key`, and +keys of them with `record.key.key`. -You can access a refered key in records briefly with @record["_key"]@. +You can access a refered key in records briefly with `record["_key"]`.
-<pre> -!!!irb ->> ruby_items.collect {|record| record["_key"]} -=> ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] -</pre> + >> ruby_items.collect {|record| record["_key"]} + => ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] -h2. Improve the simple bookmark application +## Improve the simple bookmark application Let's try to improve this simple application a little. You can create bookmark application for multi users and they can comment to each @@ -243,268 +185,223 @@ First, you add tables for users and for comments like below: !http://qwik.jp/senna/senna2.files/rect4605.png! -Let's add the table for users, @Users@. +Let's add the table for users, `Users`. -<pre> -!!!irb ->> Groonga::Schema.create_table("Users", :type => :hash) do |table| -?> table.text("name") ->> end -=> [...] -</pre> + >> Groonga::Schema.create_table("Users", :type => :hash) do |table| + ?> table.text("name") + >> end + => [...] -Next, let's add the table for comments as @Comments@. +Next, let's add the table for comments as `Comments`. -<pre> -!!!irb ->> Groonga::Schema.create_table("Comments") do |table| -?> table.reference("item") ->> table.reference("author", "Users") ->> table.text("content") ->> table.time("issued") ->> end -=> [...] -</pre> + >> Groonga::Schema.create_table("Comments") do |table| + ?> table.reference("item") + >> table.reference("author", "Users") + >> table.text("content") + >> table.time("issued") + >> end + => [...] -Then you define the index of @content@ column in @Comments@ for full +Then you define the index of `content` column in `Comments` for full text search. -<pre> -!!!irb ->> Groonga::Schema.change_table("Terms") do |table| -?> table.index("Comments.content") ->> end -=> [...] -</pre> + >> Groonga::Schema.change_table("Terms") do |table| + ?> table.index("Comments.content") + >> end + => [...] You finish table definition by above code. -Secondly, you add some users to @Users@.
+Secondly, you add some users to `Users`. -<pre> -!!!irb ->> users = Groonga["Users"] -=> #<Groonga::Hash ...> ->> users.add("alice", :name => "Alice") -=> #<Groonga::Record ...> ->> users.add("bob", :name => "Bob") -=> #<Groonga::Record ...> -</pre> + >> users = Groonga["Users"] + => #<Groonga::Hash ...> + >> users.add("alice", :name => "Alice") + => #<Groonga::Record ...> + >> users.add("bob", :name => "Bob") + => #<Groonga::Record ...> Now, let's write the process to bookmark by a user. -You assume that the user, @moritan@, bookmark a page including +You assume that the user, `moritan`, bookmark a page including infomation related Ruby. -First, you check if the page has been added @Items@ already. +First, you check if the page has been added `Items` already. -<pre> -!!!irb ->> items.has_key?("http://www.ruby-doc.org/") -=> false -</pre> + >> items.has_key?("http://www.ruby-doc.org/") + => false -The page hasn't been added, so you add it to @Items@. +The page hasn't been added, so you add it to `Items`. -<pre> -!!!irb ->> items.add("http://www.ruby-doc.org/", -?> :title => "Ruby-Doc.org: Documenting the Ruby Language") + >> items.add("http://www.ruby-doc.org/", + ?> :title => "Ruby-Doc.org: Documenting the Ruby Language") => #<Groonga::Record ...> -</pre> - -Next, you add the record to @Comments@. This record contains this page -as its @item@ column. - -<pre> -!!!irb ->> require "time" -=> true ->> comments = Groonga["Comments"] -=> #<Groonga::Array ...> ->> comments.add(:item => "http://www.ruby-doc.org/", -?> :author => "alice", -?> :content => "Ruby documents", -?> :issued => Time.parse("2010-11-20T18:01:22+09:00")) -=> #<Groonga::Record ...> -</pre> -h2. Define methods for this process +Next, you add the record to `Comments`. This record contains this page +as its `item` column.
+ + >> require "time" + => true + >> comments = Groonga["Comments"] + => #<Groonga::Array ...> + >> comments.add(:item => "http://www.ruby-doc.org/", + ?> :author => "alice", + ?> :content => "Ruby documents", + ?> :issued => Time.parse("2010-11-20T18:01:22+09:00")) + => #<Groonga::Record ...> + +## Define methods for this process For usefull, you define methods for above processes. -<pre> -!!!irb ->> @items = items -=> #<Groonga::Hash ...> ->> @comments = comments -=> #<Groonga::Array ...> ->> def add_bookmark(url, title, author, content, issued) ->> item = @items[url] || @items.add(url, :title => title) ->> @comments.add(:item => item, -?> :author => author, -?> :content => content, -?> :issued => issued) ->> end -=> nil -</pre> - -You assign @items@ and @comments@ to each instance variable, so you can -use them in @add_bookmark@ method. - -@add_bookmark@ executes processes like below: - -* Check if the record associated the page exists in @Items@ table. + >> @items = items + => #<Groonga::Hash ...> + >> @comments = comments + => #<Groonga::Array ...> + >> def add_bookmark(url, title, author, content, issued) + >> item = @items[url] || @items.add(url, :title => title) + >> @comments.add(:item => item, + ?> :author => author, + ?> :content => content, + ?> :issued => issued) + >> end + => nil + +You assign `items` and `comments` to each instance variable, so you can +use them in `add_bookmark` method. + +`add_bookmark` executes processes like below: + +* Check if the record associated the page exists in `Items` table. * If not, add the record to it. -* Add the record to @Comments@ table. +* Add the record to `Comments` table. With this method, lets bookmark some pages.
-<pre> -!!!irb ->> add_bookmark("https://rubygems.org/", -?> "RubyGems.org | your community gem host", "alice", "Ruby gems", -?> Time.parse("2010-10-07T14:18:28+09:00")) -=> #<Groonga::Record ...> ->> add_bookmark("http://ranguba.org/", -?> "Fulltext search by Ruby with groonga - Ranguba", "bob", -?> "Ruby groonga fulltextsearch", -?> Time.parse("2010-11-11T12:39:59+09:00")) -=> #<Groonga::Record ...> ->> add_bookmark("http://www.ruby-doc.org/", -?> "ruby-doc", "bob", "ruby documents", -?> Time.parse("2010-07-28T20:46:23+09:00")) -=> #<Groonga::Record ...> -</pre> - -h2. Full text search part 2 + >> add_bookmark("https://rubygems.org/", + ?> "RubyGems.org | your community gem host", "alice", "Ruby gems", + ?> Time.parse("2010-10-07T14:18:28+09:00")) + => #<Groonga::Record ...> + >> add_bookmark("http://ranguba.org/", + ?> "Fulltext search by Ruby with groonga - Ranguba", "bob", + ?> "Ruby groonga fulltextsearch", + ?> Time.parse("2010-11-11T12:39:59+09:00")) + => #<Groonga::Record ...> + >> add_bookmark("http://www.ruby-doc.org/", + ?> "ruby-doc", "bob", "ruby documents", + ?> Time.parse("2010-07-28T20:46:23+09:00")) + => #<Groonga::Record ...> + +## Full text search part 2 Let's do full text search for added records. 
-<pre> -!!!irb ->> records = comments.select do |record| -?> record["content"] =~ "Ruby" ->> end -=> #<Groonga::Hash ...> ->> records.each do |record| -?> comment = record ->> p [comment.id, -?> comment.issued, -?> comment.item.title, -?> comment.author.name, -?> comment.content] ->> end -[1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] -[2, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] -[3, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] -[4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] -</pre> + >> records = comments.select do |record| + ?> record["content"] =~ "Ruby" + >> end + => #<Groonga::Hash ...> + >> records.each do |record| + ?> comment = record + >> p [comment.id, + ?> comment.issued, + ?> comment.item.title, + ?> comment.author.name, + ?> comment.content] + >> end + [1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] + [2, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] + [3, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] + [4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] You can access the columns with the same name method as each them. These methods suport to access the complex data type. -(In usually RDB, you should namage JOIN tables, @Items@, @Comments@, -@Users@.) +(In usually RDB, you should namage JOIN tables, `Items`, `Comments`, +`Users`.) The search is finished when the first sentence in this codes. The results of this search is the object as records set.
-<pre> -!!!irb ->> records -#<Groonga::Hash ..., size: <4>> -</pre> + >> records + #<Groonga::Hash ..., size: <4>> You can arrange this records set before output. For example, sort these records in the descending order by date. -<pre> -!!!irb ->> records.sort([{:key => "issued", :order => "descending"}]).each do |record| -?> comment = record ->> p [comment.id, -?> comment.issued, -?> comment.item.title, -?> comment.author.name, -?> comment.content] ->> end -[1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] -[2, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] -[3, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] -[4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] -=> [...] -</pre> + >> records.sort([{:key => "issued", :order => "descending"}]).each do |record| + ?> comment = record + >> p [comment.id, + ?> comment.issued, + ?> comment.item.title, + ?> comment.author.name, + ?> comment.content] + >> end + [1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] + [2, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] + [3, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] + [4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] + => [...] Let's group the result by each item for easy view. 
-<pre> -!!!irb ->> records.group("item").each do |record| -?> item = record.key ->> p [record.n_sub_records, -?> item.key, -?> item.title] ->> end -[2, "http://www.ruby-doc.org/", "Ruby-Doc.org: Documenting the Ruby Language"] -[1, "https://rubygems.org/", "RubyGems.org | your community gem host"] -[1, "http://ranguba.org/", "Fulltext search by Ruby with groonga - Ranguba"] -=> nil -</pre> - -@n_sub_records@ is the number of records in each group. + >> records.group("item").each do |record| + ?> item = record.key + >> p [record.n_sub_records, + ?> item.key, + ?> item.title] + >> end + [2, "http://www.ruby-doc.org/", "Ruby-Doc.org: Documenting the Ruby Language"] + [1, "https://rubygems.org/", "RubyGems.org | your community gem host"] + [1, "http://ranguba.org/", "Fulltext search by Ruby with groonga - Ranguba"] + => nil + +`n_sub_records` is the number of records in each group. It is similar value as count() function of a query including "GROUP BY" in SQL. -h2. more complex search +## more complex search Now, you challenge the more useful search. You should calcurate goodness of fit of search explicitly. -You can use @Items.title@ and @Comments.content@ as search targets now. -@Items.title@ is the a few reliable information taken from each -original pages. On the other hands, @Comments.content@ is the less +You can use `Items.title` and `Comments.content` as search targets now. +`Items.title` is the a few reliable information taken from each +original pages. On the other hands, `Comments.content` is the less reliable information because this depends on users of bookmark application. Then, you search records with this policy: -* Search item match @Items.title@ or @Comments.content@. +* Search item matched `Items.title` or `Comments.content`. * Add 10 times heavier weight to socres of each record matched - @Items.title@ than ones of @Comments.comment@.
-* If multi @comment@ of one item are matched keyword, specify the sum - of scores of each @coments@ as score of the item. + `Items.title` than ones of `Comments.comment`. +* If multi `comment` of one item are matched keyword, specify the sum + of scores of each `coments` as score of the item. On this policy, you try to type below: -<pre> -!!!irb ->> ruby_comments = @comments.select {|record| record.content =~ "Ruby"} -=> #<Groonga::Hash ..., size: <4> ->> ruby_items = @items.select do |record| -?> target = record.match_target do |match_record| -?> match_record.title * 10 ->> end ->> target =~ "Ruby" ->> end -#<Groonga::Hash ..., size: <4>> -</pre> - -You group the results of _ruby_comments_ in each item and union -_ruby_items_ . - -<pre> -!!!irb ->> ruby_items = ruby_comments.group("item").union!(ruby_items) + >> ruby_comments = @comments.select {|record| record.content =~ "Ruby"} + => #<Groonga::Hash ..., size: <4> + >> ruby_items = @items.select do |record| + ?> target = record.match_target do |match_record| + ?> match_record.title * 10 + >> end + >> target =~ "Ruby" + >> end + #<Groonga::Hash ..., size: <4>> + +You group the results of *ruby_comments* in each item and union +*ruby_items* .
+ + >> ruby_items = ruby_comments.group("item").union!(ruby_items) #<Groonga::Hash ..., size: <5>> ->> ruby_items.sort([{:key => "_score", :order => "descending"}]).each do |record| ->> p [record.score, record.title] ->> end -[22, "Ruby-Doc.org: Documenting the Ruby Language"] -[11, "Fulltext search by Ruby with groonga - Ranguba"] -[10, "Ruby Programming Language"] -[10, "Ruby"] -[1, "RubyGems.org | your community gem host"] -</pre> + >> ruby_items.sort([{:key => "_score", :order => "descending"}]).each do |record| + >> p [record.score, record.title] + >> end + [22, "Ruby-Doc.org: Documenting the Ruby Language"] + [11, "Fulltext search by Ruby with groonga - Ranguba"] + [10, "Ruby Programming Language"] + [10, "Ruby"] + [1, "RubyGems.org | your community gem host"] Then, you get the result.